no code implementations • 28 Apr 2022 • Miloš Nikolić, Enrique Torres Sanchez, Jiahui Wang, Ali Hadi Zadeh, Mostafa Mahmoud, Ameer Abdelhadi, Kareem Ibrahim, Andreas Moshovos
Quantum Mantissa and Quantum Exponent are machine learning compression methods that tap into the gradient descent algorithm to learn the minimal mantissa and exponent bitlengths on a per-layer granularity.
no code implementations • 23 Mar 2022 • Ali Hadi Zadeh, Mostafa Mahmoud, Ameer Abdelhadi, Andreas Moshovos
Mokey reduces the footprint of state-of-the-art 32-bit or 16-bit floating-point transformer models by quantizing all values to 4-bit indexes into dictionaries of representative 16-bit fixed-point centroids.