no code implementations • 30 Jan 2024 • Suchita Pati, Shaizeen Aga, Mahzabeen Islam, Nuwan Jayasena, Matthew D. Sinclair
One approach to hide this serialized communication is to interleave it with the producer operation (of the communicated data) in a fine-grained manner.
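The fine-grained interleaving idea above can be sketched with a toy producer/communicator pipeline. This is a minimal illustration, not the paper's implementation: `produce_chunk` and the chunk size are hypothetical stand-ins, and a background thread plays the role of the communication engine so that sending chunk *i* overlaps with producing chunk *i+1*.

```python
import queue
import threading

def produce_chunk(i):
    # Hypothetical stand-in for one slice of the producer's output
    # (e.g., a tile of a GEMM result).
    return [i * 10 + j for j in range(4)]

def overlapped_produce_and_send(num_chunks):
    """Produce output chunk by chunk; a background thread 'communicates'
    each finished chunk while the next one is still being produced."""
    q = queue.Queue()
    received = []

    def communicator():
        while True:
            chunk = q.get()
            if chunk is None:          # sentinel: producer finished
                break
            received.extend(chunk)     # stand-in for the actual send/reduce

    t = threading.Thread(target=communicator)
    t.start()
    for i in range(num_chunks):
        q.put(produce_chunk(i))        # communication of chunk i overlaps
                                       # with production of chunk i + 1
    q.put(None)
    t.join()
    return received

result = overlapped_produce_and_send(3)
```

Because the queue preserves order and a single communicator thread drains it, the received data matches a fully serialized produce-then-send, while the two stages overlap in time.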
no code implementations • 14 Apr 2021 • Suchita Pati, Shaizeen Aga, Nuwan Jayasena, Matthew D. Sinclair
Further, we identify heterogeneity in compute-intensive BERT computations and discuss software and possible hardware mechanisms to further optimize these computations.
13 code implementations • 18 Nov 2018 • Jonathan Lew, Deval Shah, Suchita Pati, Shaylin Cattell, Mengchi Zhang, Amruth Sandhupatla, Christopher Ng, Negar Goli, Matthew D. Sinclair, Timothy G. Rogers, Tor Aamodt
Most deep neural networks deployed today are trained using GPUs via high-level frameworks such as TensorFlow and PyTorch.
Distributed, Parallel, and Cluster Computing