Search Results for author: Haojie Duanmu

Found 2 papers, 0 papers with code

SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models

No code implementations. 10 May 2024. Haojie Duanmu, Zhihang Yuan, Xiuhong Li, Jiangfei Duan, Xingcheng Zhang, Dahua Lin

Large language models (LLMs) can now handle longer sequences of tokens, enabling complex tasks like book understanding and generating lengthy novels.

WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More

No code implementations. 19 Feb 2024. Yuxuan Yue, Zhihang Yuan, Haojie Duanmu, Sifan Zhou, Jianlong Wu, Liqiang Nie

Large Language Models (LLMs) face significant deployment challenges due to their substantial memory requirements and the computational demands of the auto-regressive text generation process.

Tasks: Quantization, Text Generation
