no code implementations • 16 Jan 2024 • Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K., Panda
Unlike previous methods, our solution can be directly applied to pre-trained MoE models without any fine-tuning or accuracy degradation.
1 code implementation • 22 May 2023 • Jinghan Yao, Nawras Alnaasan, Tian Chen, Aamir Shafi, Hari Subramoni, Dhabaleswar K., Panda
Inference on these models, by design, harnesses a temporal dependency, where the current token's probability distribution is conditioned on preceding tokens.
1 code implementation • 21 Jan 2021 • Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K., Panda
This paper presents the design and implementation of a new communication backend for Dask -- called MPI4Dask -- that is targeted for modern HPC clusters built with GPUs.