no code implementations • 2 Mar 2024 • Ahmed F. AbouElhamayed, Susanne Balle, Deshanand Singh, Mohamed S. Abdelfattah
Our results consistently demonstrate that end-to-end application performance can easily be dominated by data processing and data movement functions (up to 56% of end-to-end latency in a medium-sized image, and $\sim$ 80% impact on system throughput in a large image), even though these functions have been conventionally overlooked in deep learning system design.
1 code implementation • 25 May 2023 • Ahmed F. AbouElhamayed, Angela Cui, Javier Fernandez-Marques, Nicholas D. Lane, Mohamed S. Abdelfattah
We identify PQ configurations that improve performance-per-area for ResNet20 by up to 3. 1$\times$, even when compared to a highly optimized conventional DNN accelerator, with similar improvements on two additional compact DNNs.