Search Results for author: Shang Yang

Found 4 papers, 3 papers with code

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

1 code implementation • 7 May 2024 • Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han

The key insight driving QServe is that the efficiency of LLM serving on GPUs is critically influenced by operations on low-throughput CUDA cores.

Language Modelling • Large Language Model +1

225


TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

1 code implementation • 25 Oct 2023 • Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han

On top of this, we design the Sparse Autotuner, which extends the design space of existing sparse convolution libraries and searches for the best dataflow configurations for training and inference workloads.

Autonomous Driving • Recommendation Systems

1,125


AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

6 code implementations • 1 Jun 2023 • Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han

We then propose to search for the optimal per-channel scaling that protects the salient weights by observing the activation, not weights.

Autonomous Driving • Common Sense Reasoning +3

19,791


FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

no code implementations • CVPR 2023 • Zhijian Liu, Xinyu Yang, Haotian Tang, Shang Yang, Song Han

Transformer, as an alternative to CNN, has been proven effective in many modalities (e.g., texts and images).

Autonomous Driving

