Search Results for author: Jiangfei Duan

Found 2 papers, 1 paper with code

SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models

no code implementations • 10 May 2024 • Haojie Duanmu, Zhihang Yuan, Xiuhong Li, Jiangfei Duan, Xingcheng Zhang, Dahua Lin

Large language models (LLMs) can now handle longer sequences of tokens, enabling complex tasks like book understanding and generating lengthy novels.

Quantization


SpotServe: Serving Generative Large Language Models on Preemptible Instances

1 code implementation • 27 Nov 2023 • Xupeng Miao, Chunan Shi, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, Zhihao Jia

This paper aims to reduce the monetary cost of serving LLMs by leveraging preemptible GPU instances on modern clouds, which offer access to spare GPUs at a much cheaper price than regular instances but may be preempted by the cloud at any time.

Graph Matching

