no code implementations • 13 May 2024 • Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, XiangYu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Edison Chen, Kaizhao Liang, Swayambhoo Jain, Urmish Thakker, Dawei Huang, Sumti Jairath, Kevin J. Brown, Kunle Olukotun
In this paper, we describe how combining CoE, streaming dataflow, and a three-tier memory system scales the AI memory wall.
no code implementations • 11 Apr 2023 • Venkat Srinivasan, Darshan Gandhi, Urmish Thakker, Raghu Prabhakar
We show that we can successfully train GPT 13B to the same quality as the dense GPT 13B model, while achieving an end-to-end speedup of 4.5x over the dense A100 baseline.