28 May 2020 • Behnam Pourghassemi, Chenghao Zhang, Joo Hwan Lee, Aparna Chandramowlishwaran
However, popular deep learning (DL) frameworks such as TensorFlow and PyTorch launch the majority of neural network operations, especially convolutions, serially on GPUs and do not exploit this inter-op parallelism.