21 Feb 2024 • Shuzhang Zhong, Zebin Yang, Meng Li, Ruihao Gong, Runsheng Wang, Ru Huang
Additionally, it introduces a dynamic token tree generation algorithm that balances the computation and parallelism of the verification phase in real time, maximizing overall efficiency across different batch sizes, sequence lengths, and tasks.
26 Aug 2023 • Shuzhang Zhong, Meng Li, Yun Liang, Runsheng Wang, Ru Huang
Memory-aware network scheduling is becoming increasingly important for deep neural network (DNN) inference on resource-constrained devices.