no code implementations • 14 Feb 2024 • Junhan Kim, Kyungphil Park, Chungman Lee, Ho-young Kim, Joonyoung Kim, Yongkweon Jeon
Through extensive experiments on various language models and complexity analysis, we demonstrate that aespa is accurate and efficient in quantizing Transformer models.
1 code implementation • CVPR 2023 • Yongkweon Jeon, Chungman Lee, Ho-young Kim
We also propose a post-training quantization algorithm to enhance the performance of quantized models.
no code implementations • CVPR 2022 • Yongkweon Jeon, Chungman Lee, Eulrang Cho, Yeonju Ro
We thus propose a new post-training non-uniform quantization method, called Mr. BiQ, that allows low-bit-width quantization even for Transformer models.
no code implementations • 5 May 2021 • Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, Baeseong Park, Yongkweon Jeon
While model compression is increasingly important because of the large size of neural networks, compression-aware training is challenging because it requires sophisticated model modifications and longer training time. In this paper, we introduce regularization frequency (i.e., how often compression is performed during training) as a new regularization technique for a practical and efficient compression-aware training method.
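To make the idea concrete, here is a minimal sketch (not the paper's algorithm) of compression-aware training in which the regularization frequency is simply how often the weights are projected onto a compressed format; the toy regression problem, the magnitude-pruning projection, and the `compress_every` value are all illustrative assumptions.

```python
# Illustrative sketch only: plain SGD on a toy regression problem, with the
# weights periodically projected onto a compressed format (magnitude pruning).
# `compress_every` controls the regularization frequency described above.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))
true_w = rng.normal(size=32) * (rng.random(32) > 0.5)      # sparse target
y = X @ true_w

def compress(w, sparsity=0.5):
    """Project weights onto the compression format (zero the smallest half)."""
    out = w.copy()
    out[np.argsort(np.abs(w))[: int(len(w) * sparsity)]] = 0.0
    return out

w, lr, compress_every = np.zeros(32), 0.01, 50              # assumed values
for step in range(1, 2001):
    grad = X.T @ (X @ w - y) / len(X)
    w -= lr * grad
    if step % compress_every == 0:                          # the "frequency" knob
        w = compress(w)                                     # acts as a regularizer

print("final loss:", float(np.mean((X @ w - y) ** 2)))
```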
no code implementations • 5 May 2021 • Byeongwook Kim, Dongsoo Lee, Yeonju Ro, Yongkweon Jeon, Se Jung Kwon, Baeseong Park, Daehwan Oh
When the number of quantization bits is relatively low, however, non-convex optimization becomes unavoidable if model accuracy is to be improved.
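As a rough illustration of why the problem is non-convex, the sketch below alternates between the discrete integer codes and the continuous scale of a low-bit symmetric quantizer; this is a common heuristic, not necessarily the method used in the paper.

```python
# Rough illustration (not necessarily the paper's method): alternating
# optimization of a low-bit symmetric uniform quantizer. The rounding step
# makes the objective non-convex, so the loop is only a heuristic.
import numpy as np

def quantize_alternating(w, n_bits=2, n_iters=20):
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / max(qmax, 1)                 # naive initialization
    for _ in range(n_iters):
        q = np.clip(np.round(w / scale), -qmax - 1, qmax)  # fix scale, update codes
        scale = float(w @ q) / float(q @ q + 1e-12)        # fix codes, refit scale
    return scale * q

rng = np.random.default_rng(0)
w = rng.normal(size=1024)
print("MSE:", float(np.mean((w - quantize_alternating(w)) ** 2)))
```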
no code implementations • 1 Jan 2021 • Se Jung Kwon, Dongsoo Lee, Yongkweon Jeon, Byeongwook Kim, Bae Seong Park, Yeonju Ro
As a practical model compression technique, parameter quantization is especially effective for language models, which are associated with a large memory footprint.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Insoo Chung, Byeongwook Kim, Yoonjung Choi, Se Jung Kwon, Yongkweon Jeon, Baeseong Park, Sangha Kim, Dongsoo Lee
Our analysis shows that, for a given number of quantization bits, each block of the Transformer contributes to translation quality and inference computation in a different manner.
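One practical consequence, illustrated below with made-up sensitivity numbers rather than figures from the paper, is that a fixed bit budget can be spread unevenly across blocks, giving more bits to the blocks whose quality degrades most when quantized.

```python
# Hedged illustration (not the paper's procedure): greedily hand out bits
# under a total budget, always giving the next bit to the block whose
# quality currently suffers most. Sensitivity values are hypothetical.
sensitivity = {            # assumed quality drop per removed bit
    "embedding": 0.9, "encoder_attn": 0.4,
    "encoder_ffn": 0.6, "decoder_attn": 0.7, "decoder_ffn": 0.5,
}
budget = 14                # total bits to distribute across the five blocks
bits = {name: 1 for name in sensitivity}        # start everything at 1 bit
while sum(bits.values()) < budget:
    name = max(sensitivity, key=lambda n: sensitivity[n] / bits[n])
    bits[name] += 1
print(bits)
```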
no code implementations • NeurIPS 2020 • Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Yongkweon Jeon, Baeseong Park, Jeongin Yun
Quantization based on binary codes is gaining attention because each quantized bit can be directly utilized for computations, using look-up tables, without dequantization.
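For reference, a minimal sketch of binary-coding quantization in this sense: a weight vector is approximated as a sum of scaled ±1 bit planes. The greedy residual fitting below is a standard baseline, not the paper's training scheme.

```python
# Minimal sketch of binary-coding quantization (greedy residual fitting):
# w is approximated as sum_i alpha_i * b_i with b_i in {-1, +1}, so each
# quantized bit plane can enter computations directly.
import numpy as np

def binary_code_quantize(w, n_bits=3):
    residual = w.copy()
    alphas, bit_planes = [], []
    for _ in range(n_bits):
        b = np.sign(residual)
        b[b == 0] = 1.0
        alpha = float(np.abs(residual).mean())   # least-squares optimal for +-1 codes
        alphas.append(alpha)
        bit_planes.append(b)
        residual -= alpha * b
    return np.array(alphas), np.stack(bit_planes)

rng = np.random.default_rng(0)
w = rng.normal(size=1024)
alphas, B = binary_code_quantize(w, n_bits=3)
w_hat = alphas @ B                               # reconstruction, only to check error
print("MSE:", float(np.mean((w - w_hat) ** 2)))
```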
no code implementations • 20 May 2020 • Yongkweon Jeon, Baeseong Park, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, Dongsoo Lee
The success of quantization in practice hence relies on an efficient computation engine design, especially for matrix multiplication, the basic computation in most DNNs.
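To give a feel for what such an engine can look like, here is a hedged sketch of a lookup-table-based dot product for binary-coded weights: partial sums of each activation group are precomputed once for every ±1 pattern, and each weight's bits then only index the table. The group size `mu` and the layout are assumptions, not the paper's exact design.

```python
# Hedged sketch of a lookup-table-based multiply for binary-coded weights:
# the activation vector is split into groups of `mu` elements, all 2^mu
# signed partial sums are precomputed once, and every binary weight code
# then just indexes the table instead of doing multiplications.
import numpy as np

def lut_dot(x, bit_codes, mu=8):
    """x: (n,) activations; bit_codes: (m, n) entries in {-1, +1}.
    Returns y: (m,) with y[j] = x @ bit_codes[j], using table lookups."""
    n = len(x)
    assert n % mu == 0 and bit_codes.shape[1] == n
    n_groups = n // mu
    # Precompute partial sums for every +-1 pattern of each activation group.
    patterns = np.array([[1 if (p >> i) & 1 else -1 for i in range(mu)]
                         for p in range(2 ** mu)])             # (2^mu, mu)
    luts = patterns @ x.reshape(n_groups, mu).T                # (2^mu, n_groups)
    # Turn each weight row's bits into one table index per group.
    bits01 = (bit_codes.reshape(-1, n_groups, mu) > 0).astype(np.int64)
    idx = (bits01 << np.arange(mu)).sum(-1)                    # (m, n_groups)
    return luts[idx, np.arange(n_groups)].sum(-1)              # gather + accumulate

rng = np.random.default_rng(0)
x = rng.normal(size=64)
codes = rng.choice([-1.0, 1.0], size=(16, 64))
print(np.allclose(lut_dot(x, codes), codes @ x))               # True
```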
no code implementations • 25 Sep 2019 • Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Yongkweon Jeon, Baeseong Park, Jeongin Yun, Gu-Yeon Wei
Using various models, we show that simple weight updates to comply with compression formats, along with a long NR period, are enough to achieve a high compression ratio and high model accuracy.