no code implementations • 23 Jan 2024 • Hanchen Li, YuHan Liu, Yihua Cheng, Siddhant Ray, Kuntai Du, Junchen Jiang
To render each generated token in real time, the LLM server generates response tokens one by one and streams each generated token (or group of a few tokens) through the network to the user right after it is generated, which we refer to as LLM token streaming.
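A minimal sketch of the streaming pattern described above, assuming the decoding loop and the network send can be stood in for by a plain Python iterable and a generator `yield`. The names `stream_tokens` and `group_size` are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Iterator, List


def stream_tokens(tokens: Iterable[str], group_size: int = 1) -> Iterator[List[str]]:
    """Yield tokens in groups of `group_size` as soon as each group is complete."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    group: List[str] = []
    for token in tokens:  # stands in for the server generating tokens one by one
        group.append(token)
        if len(group) == group_size:
            yield group  # stands in for sending one message over the network
            group = []
    if group:  # flush a final, partially filled group
        yield group


if __name__ == "__main__":
    # Illustrative input only; a real server would produce tokens from a model.
    for message in stream_tokens(["LLM", " token", " streaming"], group_size=2):
        print(message)
```

With `group_size=1` every token becomes its own message, which corresponds to the token-by-token case; larger values correspond to streaming a group of a few tokens at a time.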
1 code implementation • 11 Oct 2023 • YuHan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, YuYang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang
Compared to the recent systems that reuse the KV cache, CacheGen reduces the KV cache size by 3.5-4.3x and the total delay in fetching and processing contexts by 3.2-3.7x while having negligible impact on the LLM response quality in accuracy or perplexity.
no code implementations • 7 Oct 2023 • YuHan Liu, Chengcheng Wan, Kuntai Du, Henry Hoffmann, Junchen Jiang, Shan Lu, Michael Maire
ML APIs have greatly relieved application developers of the burden of designing and training their own neural network models -- classifying objects in an image can now be as simple as one line of Python code that calls an API.
no code implementations • 3 Oct 2023 • Kuntai Du, YuHan Liu, Yitian Hao, Qizheng Zhang, Haodong Wang, YuYang Huang, Ganesh Ananthanarayanan, Junchen Jiang
While the high demand for network bandwidth and GPU resources could be substantially reduced by optimally adapting the configuration knobs, such as video resolution and frame rate, current adaptation techniques fail to meet three requirements simultaneously: (i) adapt configurations with minimum extra GPU or bandwidth overhead; (ii) reach near-optimal decisions based on how the data affects the final DNN's accuracy; and (iii) do so for a range of configuration knobs.
no code implementations • 26 Apr 2022 • Kuntai Du, Qizheng Zhang, Anton Arapin, Haodong Wang, Zhengxu Xia, Junchen Jiang
This paper presents AccMPEG, a new video encoding and streaming system that meets all three requirements.