no code implementations • 23 Nov 2023 • Cyrus Zhou, Pedro Savarese, Vaughn Richard, Zack Hassman, Xin Yuan, Michael Maire, Michael DiBrino, Yanjing Li
We present an end-to-end co-design approach encompassing computer architecture, training algorithm, and inference optimization to efficiently execute networks with fine-grained heterogeneous precisions.
no code implementations • 1 Oct 2023 • Cyrus Zhou, Zack Hassman, Ruize Xu, Dhirpal Shah, Vaugnn Richard, Yanjing Li
Our results demonstrate that the dataflow that keeps outputs in SIMD registers while also maximizing both input and weight reuse consistently yields the best performance for a wide variety of inference workloads, achieving up to 3x speedup for 8-bit neural networks, and up to 4. 8x speedup for binary neural networks, respectively, over the optimized implementations of neural networks today.