1 code implementation • 28 Feb 2024 • Siqi Kou, Lanxiang Hu, Zhezhi He, Zhijie Deng, Hao Zhang
Parallel decoding methods such as Jacobi decoding show promise for more efficient LLM inference as it breaks the sequential nature of the LLM decoding process and transforms it into parallelizable computation.
no code implementations • 26 Oct 2023 • Ligeng Zhu, Lanxiang Hu, Ji Lin, Wei-Chen Wang, Wei-Ming Chen, Chuang Gan, Song Han
On-device learning and efficient fine-tuning enable continuous and privacy-preserving customization (e. g., locally fine-tuning large language models on personalized data).
no code implementations • 11 Oct 2023 • Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang
We develop a prototype of online speculative decoding based on online knowledge distillation and evaluate it using both synthetic and real query data on several popular LLMs.