Pre-Quantized Deep Learning Models Codified in ONNX to Enable Hardware/Software Co-Design

This paper presents a methodology to separate the quantization process from the hardware-specific model compilation stage via a pre-quantized deep learning model description in standard ONNX format. Separating the two stages allows each to be developed independently. The methodology is expressive enough to convey hardware-specific operations and to embed key quantization parameters into an ONNX model, which enables hardware/software co-design. Detailed examples are given for both MLP-based and CNN-based networks, and they extend to other networks in a straightforward fashion.
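
To make the idea of a pre-quantized model description more concrete, here is a minimal sketch in plain Python of an integer-only fully connected layer whose rescaling parameters travel with the layer. The abstract does not spell out the operator sequence, so everything here is an assumption made for illustration: the function names (`quantize_scale`, `prequantized_dense`), the 8-bit output range, and the choice to represent the rescale factor as an integer multiplier followed by a rounding right shift.

```python
def quantize_scale(real_scale, shift_bits=16):
    """Approximate a float rescale factor as (integer multiplier, right shift).

    Assumption for this sketch: real_scale ~= multiplier / 2**shift_bits.
    """
    multiplier = round(real_scale * (1 << shift_bits))
    return multiplier, shift_bits


def clamp_int8(value):
    """Saturate an integer to the signed 8-bit range."""
    return max(-128, min(127, value))


def prequantized_dense(x_q, w_q, bias_q, multiplier, shift):
    """Integer-only dense layer: multiply-accumulate, rescale, saturate.

    x_q:    list of int8 activations, length n_in
    w_q:    n_in x n_out nested list of int8 weights
    bias_q: list of wide-integer biases, length n_out
    """
    outputs = []
    for j in range(len(bias_q)):
        # Accumulate in a wide integer (Python ints do not overflow).
        acc = bias_q[j] + sum(x_q[i] * w_q[i][j] for i in range(len(x_q)))
        # Rescale using only an integer multiply and a rounding right shift.
        scaled = (acc * multiplier + (1 << (shift - 1))) >> shift
        outputs.append(clamp_int8(scaled))
    return outputs


# Hypothetical quantized tensors and rescale factor, chosen for illustration.
x_q = [10, -20, 30]
w_q = [[1, 2], [3, -4], [5, 6]]
bias_q = [100, -50]
multiplier, shift = quantize_scale(0.05)

y_q = prequantized_dense(x_q, w_q, bias_q, multiplier, shift)
print(y_q)
```

Because the multiplier and shift are ordinary integers, a description of this kind could in principle be stored as constants inside the model file, so a hardware-specific compiler would consume them rather than re-derive them. That is the separation of concerns the abstract describes, though the paper's actual encoding may differ from this sketch.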
