no code implementations • 25 May 2024 • Chenqi Lin, Tianshi Xu, Zebin Yang, Runsheng Wang, Ru Huang, Meng Li
We observe that the overhead mainly comes from neglecting 1) the one-hot nature of user queries and 2) the robustness of the embedding table to low bit-width quantization noise.
no code implementations • 23 May 2024 • Tianshi Xu, Lemeng Wu, Runsheng Wang, Meng Li
Homomorphic encryption (HE)-based deep neural network (DNN) inference protects data and model privacy but suffers from significant computation overhead.
no code implementations • 29 Jan 2024 • Tianshi Xu, Meng Li, Runsheng Wang
Compared with prior-art HE-based protocols, e.g., CrypTFlow2, Cheetah, and Iron, HEQuant achieves $3.5\sim 23.4\times$ communication reduction and $3.0\sim 9.3\times$ latency reduction.
no code implementations • 25 Aug 2023 • Tianshi Xu, Meng Li, Runsheng Wang, Ru Huang
Efficient networks, e.g., MobileNetV2 and EfficientNet, achieve state-of-the-art (SOTA) accuracy with lightweight computation.