3 code implementations • 12 Jun 2023 • George E. Dahl, Frank Schneider, Zachary Nado, Naman Agarwal, Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, Runa Eschenhagen, Priya Kasimbeg, Daniel Suo, Juhan Bae, Justin Gilmer, Abel L. Peirson, Bilal Khan, Rohan Anil, Mike Rabbat, Shankar Krishnan, Daniel Snider, Ehsan Amid, Kongtao Chen, Chris J. Maddison, Rakshith Vasudev, Michal Badura, Ankush Garg, Peter Mattson
In order to address these challenges, we introduce a new, competitive, time-to-result benchmark using multiple workloads running on fixed hardware, the AlgoPerf: Training Algorithms benchmark.
no code implementations • 30 Jan 2023 • Daniel Snider, Ruofan Liang
Machine learning (ML) compilers are an active area of research because they offer the potential to automatically speedup tensor programs.
no code implementations • 15 Jul 2022 • James Gleeson, Daniel Snider, Yvonne Yang, Moshe Gabel, Eyal de Lara, Gennady Pekhimenko
We show that simulator kernel fusion speedups with a simple simulator are $11. 3\times$ and increase by up to $1024\times$ as simulator complexity increases in terms of memory bandwidth requirements.