no code implementations • 5 Apr 2024 • Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella
In this work, we observe the saturation of the computationally expensive feed-forward blocks in LLM layers and propose FFN-SkipLLM, a novel fine-grained skip strategy for autoregressive LLMs.
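A minimal sketch of the idea behind skipping saturated FFN blocks is shown below. It assumes a layer interface with `layer.attn` and `layer.ffn` sub-modules, a hypothetical cosine-similarity `threshold`, and a calibration-based skip decision; the paper's actual criterion and skip schedule may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def find_skippable_ffn_layers(layers, calib_hidden, threshold=0.99):
    """Flag FFN blocks whose output is nearly identical to their input on a
    calibration batch; those blocks become candidates for skipping at decode time."""
    skip = []
    h = calib_hidden
    for layer in layers:
        h = h + layer.attn(h)
        ffn_out = h + layer.ffn(h)
        sim = F.cosine_similarity(h.flatten(1), ffn_out.flatten(1), dim=-1).mean()
        skip.append(sim.item() > threshold)   # saturated -> skip this FFN later
        h = ffn_out
    return skip

@torch.no_grad()
def decode_step(layers, hidden, skip):
    """One decoding step that omits the FFN sub-block in layers flagged as saturated."""
    for layer, skip_ffn in zip(layers, skip):
        hidden = hidden + layer.attn(hidden)
        if not skip_ffn:
            hidden = hidden + layer.ffn(hidden)
    return hidden
```

In this sketch the skip mask is computed once on calibration tokens and reused during generation, so the saved FFN compute is realized at decode time.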
no code implementations • CVPR 2022 • Yongkweon Jeon, Chungman Lee, Eulrang Cho, Yeonju Ro
We thus propose Mr. BiQ, a new post-training non-uniform quantization method that enables low-bit-width quantization even for Transformer models.
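Non-uniform quantization of this kind is commonly expressed in a binary-coding form, where a weight tensor is approximated as a sum of scaled sign tensors. The sketch below uses a simple greedy residual fit to illustrate that representation; it is an illustrative baseline, not Mr. BiQ's actual optimization procedure.

```python
import torch

def greedy_binary_coding_quantize(w: torch.Tensor, num_bits: int = 3):
    """Approximate w as sum_i alpha_i * b_i with b_i in {-1, +1}.

    The resulting reconstruction levels are non-uniformly spaced, unlike
    a single-scale uniform quantizer.  Greedy residual fitting is used
    here purely for illustration.
    """
    residual = w.clone()
    alphas, codes = [], []
    for _ in range(num_bits):
        b = torch.sign(residual)
        b[b == 0] = 1.0                      # avoid zero codes
        alpha = (residual * b).mean()        # least-squares optimal scale for this code
        alphas.append(alpha)
        codes.append(b)
        residual = residual - alpha * b
    w_hat = sum(a * b for a, b in zip(alphas, codes))
    return w_hat, alphas, codes
```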
no code implementations • 5 May 2021 • Byeongwook Kim, Dongsoo Lee, Yeonju Ro, Yongkweon Jeon, Se Jung Kwon, Baeseong Park, Daehwan Oh
When the number of quantization bits is relatively low, however, non-convex optimization becomes unavoidable for improving model accuracy.
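To see why the objective becomes non-convex at low bit widths, consider tuning the clipping threshold of a symmetric uniform quantizer to minimize reconstruction error: rounding makes the error a non-convex function of the threshold with many local minima. The sketch below is a generic illustration using a simple grid search, not the optimization method proposed in the paper.

```python
import torch

def uniform_quantize(w: torch.Tensor, clip: float, num_bits: int) -> torch.Tensor:
    """Symmetric uniform quantization of w with clipping range [-clip, clip]."""
    levels = 2 ** (num_bits - 1) - 1
    scale = clip / levels
    return torch.clamp(torch.round(w / scale), -levels, levels) * scale

def search_clip(w: torch.Tensor, num_bits: int, num_candidates: int = 100) -> float:
    """Pick the clipping threshold that minimizes reconstruction MSE.

    Because of rounding, the MSE is non-convex in the threshold, so a
    simple grid search is used instead of a closed-form or convex solver.
    """
    max_abs = w.abs().max().item()
    best_clip, best_err = max_abs, float("inf")
    for i in range(1, num_candidates + 1):
        clip = max_abs * i / num_candidates
        err = (w - uniform_quantize(w, clip, num_bits)).pow(2).mean().item()
        if err < best_err:
            best_clip, best_err = clip, err
    return best_clip
```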
no code implementations • 1 Jan 2021 • Se Jung Kwon, Dongsoo Lee, Yongkweon Jeon, Byeongwook Kim, Bae Seong Park, Yeonju Ro
As a practical model compression technique, parameter quantization is especially effective for language models, which are associated with a large memory footprint.