Search Results for author: Yuezhou Hu

Found 1 paper, 1 paper with code

Accelerating Transformer Pre-training with 2:4 Sparsity

2 code implementations • 2 Apr 2024 • Yuezhou Hu, Kang Zhao, Weiyu Huang, Jianfei Chen, Jun Zhu

Utilizing this metric, we propose three techniques to preserve accuracy: modifying the sparse-refined straight-through estimator by applying the masked decay term on gradients, determining a feasible decay factor in the warm-up stage, and enhancing the model's quality through a dense fine-tuning procedure near the end of pre-training.
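
As a rough illustration of the first technique, the sketch below builds a 2:4 mask (keep the two largest-magnitude weights in every group of four) and adds a decay term to the gradients of the pruned weights. The function names, the magnitude-based selection rule, and the exact form of the decay term (`decay * w` on pruned entries) are assumptions made for illustration; the snippet above does not spell them out, and this is not the authors' implementation.

```python
def two_four_mask(weights):
    """Return a 0/1 mask keeping the 2 largest-magnitude entries per group of 4.

    Assumption: selection is by absolute value; ties go to the earlier index.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    mask = [0] * len(weights)
    for start in range(0, len(weights), 4):
        group = range(start, start + 4)
        # sorted() is stable, so equal magnitudes keep their original order.
        keep = sorted(group, key=lambda i: -abs(weights[i]))[:2]
        for i in keep:
            mask[i] = 1
    return mask


def masked_decay_gradient(weights, grads, mask, decay):
    """Add decay * w to the gradient of every pruned (mask == 0) weight.

    Assumption: the decay term is applied to the gradient itself, before any
    optimizer step, and leaves the gradients of kept weights unchanged.
    """
    return [g + decay * (1 - m) * w for w, g, m in zip(weights, grads, mask)]


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]
grads = [0.0] * len(weights)

mask = two_four_mask(weights)
print(mask)  # [1, 0, 0, 1, 0, 1, 1, 0]
print(masked_decay_gradient(weights, grads, mask, decay=0.5))
```

With zero task gradients, only the pruned positions receive a nonzero update, which pushes those weights toward zero while the kept weights are left alone. The decay factor of 0.5 is an arbitrary value chosen for the example.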
