Search Results for author: Xiangyu Jiang

Found 1 paper, 0 papers with code

SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models

no code implementations • 29 Oct 2023 • Zhixu Du, Shiyu Li, Yuhao Wu, Xiangyu Jiang, Jingwei Sun, Qilin Zheng, Yongkai Wu, Ang Li, Hai "Helen" Li, Yiran Chen

Specifically, SiDA attains a remarkable speedup in MoE inference, with up to a 3.93X throughput increase, up to 75% latency reduction, and up to 80% GPU memory savings, at a performance drop of as little as 1%.
