1 code implementation • 18 Apr 2024 • Yubo Gao, Maryam Haghifam, Christina Giannoula, Renbo Tu, Gennady Pekhimenko, Nandita Vijaykumar
Development of new DL models typically involves two parties: the model developers and performance optimizers.
no code implementations • 26 Feb 2024 • Christina Giannoula, Peiming Yang, Ivan Fernandez Vega, Jiacheng Yang, Yu Xin Li, Juan Gomez Luna, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko
Graph Neural Network (GNN) execution involves both compute-intensive and memory-intensive kernels, the latter dominates the total time, being significantly bottlenecked by data movement between memory and processors.
1 code implementation • 1 Dec 2023 • Jiacheng Yang, Christina Giannoula, Jun Wu, Mostafa Elhoushi, James Gleeson, Gennady Pekhimenko
Minuet proposes to (i) replace the hash tables used in the Map step with a novel segmented sorting double-traversed binary search algorithm that highly utilizes the on-chip memory hierarchy of GPUs, (ii) use a lightweight scheme to autotune the tile size in the Gather and Scatter operations of the GMaS step, such that to adapt the execution to the particular characteristics of each SC layer, dataset, and GPU architecture, and (iii) employ a padding-efficient GEMM grouping approach that reduces both memory padding and kernel launching overheads.
no code implementations • 28 Oct 2023 • Qidong Su, Christina Giannoula, Gennady Pekhimenko
Large Language Models (LLMs) like GPT are state-of-the-art text generation models that provide significant assistance in daily routines.