1 code implementation • 2 Jun 2020 • Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, Bharat Kaul
However, given the constant emergence of new DNN architectures, creating hand-optimized code is expensive, slow, and does not scale.
no code implementations • 6 Feb 2020 • Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, Bharat Kaul
In this paper, we develop a hybrid approach to deep learning kernel development that achieves the best of both worlds: expert-coded microkernels are used for the innermost loops of kernels, while advanced polyhedral technology automatically tunes the outer loops for performance.
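The division of labor described in the abstract can be sketched as follows. This is an illustrative NumPy matmul, not the paper's actual system: `microkernel` stands in for an expert-coded (e.g. JIT-assembled) fixed-size inner kernel, and the surrounding tiling loops are the part a polyhedral tuner would tile and reorder. All names and tile sizes here are assumptions for illustration.

```python
import numpy as np

MB, NB, KB = 4, 4, 4  # illustrative microkernel tile sizes

def microkernel(A_blk, B_blk, C_blk):
    # stand-in for an expert-coded innermost microkernel:
    # accumulates C_blk += A_blk @ B_blk on one fixed-size tile
    C_blk += A_blk @ B_blk

def tiled_matmul(A, B):
    # these outer tiling loops are the portion a polyhedral
    # optimizer would automatically tune for locality/parallelism
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % MB == N % NB == K % KB == 0
    C = np.zeros((M, N))
    for i in range(0, M, MB):
        for j in range(0, N, NB):
            for k in range(0, K, KB):
                microkernel(A[i:i+MB, k:k+KB],
                            B[k:k+KB, j:j+NB],
                            C[i:i+MB, j:j+NB])
    return C
```

The key design point is the interface boundary: the microkernel sees only fixed-size tiles, so the automatic tuner can rearrange the outer loops freely without touching the hand-optimized code.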
no code implementations • 11 Jun 2019 • Sanket Tavarageri, Srinivas Sridharan, Bharat Kaul
We model the computation and communication costs of a dataflow graph that embodies the neural network training process, and then partition the graph using heuristics so that communication between compute devices is minimized and the load is well balanced.
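A minimal sketch of such a cost-driven partitioning, assuming a simplified model: each node carries a compute cost, each edge a communication cost, and a greedy heuristic (not the paper's actual algorithm) places each node on the device that minimizes added load plus the cost of edges cut to already-placed neighbors.

```python
def partition(nodes, edges, num_devices):
    # nodes: {name: compute_cost}; edges: {(u, v): comm_cost}
    # greedy illustrative heuristic, not the paper's method
    load = [0.0] * num_devices
    placement = {}
    for n, cost in nodes.items():
        best, best_score = 0, float("inf")
        for d in range(num_devices):
            # communication incurred if n lands on d: edges to
            # neighbors already placed on a different device
            comm = sum(w for (u, v), w in edges.items()
                       if (u == n and placement.get(v, d) != d)
                       or (v == n and placement.get(u, d) != d))
            score = load[d] + cost + comm
            if score < best_score:
                best, best_score = d, score
        placement[n] = best
        load[best] += cost
    return placement
```

With heavy edges inside subgraphs and light edges between them, the heuristic keeps tightly coupled operators on the same device while spreading compute cost across devices.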
no code implementations • 29 May 2019 • Sanket Tavarageri
Consequently, the performance of the produced code matches the performance obtained when all the aggressive optimization passes are applied over the entire input program.
no code implementations • 16 Sep 2018 • Sanket Tavarageri, Nag Mani, Anand Ramasubramanian, Jaskiran Kalsi
In this paper, we present an end-to-end pipeline that processes aggregate data to derive individual-level statistics, and then uses the inferred data to train machine learning models that answer questions of interest.
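The first stage of such a pipeline can be sketched as below. This is a toy stand-in for the inference step, under the assumption that aggregate rows are simple (attributes, count) pairs; the paper's actual derivation of individual-level statistics is not shown here. The resulting individual-level records could then feed any standard ML training pipeline.

```python
def expand_aggregates(groups):
    # groups: list of (attributes_dict, count) aggregate rows
    # expands each aggregate row into `count` synthetic
    # individual-level records (illustrative only)
    individuals = []
    for attrs, count in groups:
        individuals.extend(dict(attrs) for _ in range(count))
    return individuals

# hypothetical aggregate input
agg = [({"age_band": "18-25", "region": "N"}, 3),
       ({"age_band": "26-40", "region": "S"}, 2)]
people = expand_aggregates(agg)
```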