no code implementations • 25 Apr 2024 • Mostafa Elhoushi, Akshat Shrivastava, Diana Liskovich, Basil Hosmer, Bram Wasti, Liangzhen Lai, Anas Mahmoud, Bilge Acun, Saurabh Agarwal, Ahmed Roman, Ahmed A Aly, Beidi Chen, Carole-Jean Wu
We present LayerSkip, an end-to-end solution to speed up inference of large language models (LLMs).
no code implementations • 22 Feb 2024 • Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra
The resultant models, denoted as MobileLLM-LS, demonstrate a further accuracy improvement of 0.7%/0.8% over MobileLLM 125M/350M.
no code implementations • 20 Feb 2024 • Yang Li, Yuan Shangguan, Yuhao Wang, Liangzhen Lai, Ernie Chang, Changsheng Zhao, Yangyang Shi, Vikas Chandra
This study delves into how weight parameters in speech recognition models influence the overall power consumption of these models.
no code implementations • 14 Sep 2023 • Yang Li, Liangzhen Lai, Yuan Shangguan, Forrest N. Iandola, Zhaoheng Ni, Ernie Chang, Yangyang Shi, Vikas Chandra
Instead, the bottleneck lies in the linear projection layers of multi-head attention and feedforward networks, constituting a substantial portion of the model size and contributing significantly to computation, memory, and power usage.
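As a rough illustration of that claim, the sketch below counts the parameters of the attention projections and feed-forward layers in a single standard transformer block; the hidden size and 4x expansion factor are illustrative assumptions, not values taken from the paper.

```python
# Rough parameter count for one standard transformer block, illustrating why
# the linear projection layers dominate model size. The hidden dimension and
# FFN expansion factor below are assumed, illustrative values.
d_model = 768          # hidden dimension (assumed)
d_ff = 4 * d_model     # feed-forward inner dimension (common 4x expansion)

# Multi-head attention: Q, K, V, and output projections, each d_model x d_model.
attn_proj_params = 4 * d_model * d_model

# Feed-forward network: two linear layers, d_model -> d_ff -> d_model.
ffn_params = 2 * d_model * d_ff

# Everything else in the block (layer-norm scales/shifts) is comparatively tiny.
other_params = 4 * d_model

total = attn_proj_params + ffn_params + other_params
print(f"attention projections: {attn_proj_params / total:.1%} of block parameters")
print(f"feed-forward layers:   {ffn_params / total:.1%} of block parameters")
```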
1 code implementation • 26 Jan 2023 • Maximilian Lam, Jeff Johnson, Wenjie Xiong, Kiwan Maeng, Udit Gupta, Yang Li, Liangzhen Lai, Ilias Leontiadis, Minsoo Rhu, Hsien-Hsin S. Lee, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks, G. Edward Suh
Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to $100,000$ queries per second -- a $>100 \times$ throughput improvement over a CPU-based baseline -- while maintaining model accuracy.
no code implementations • 7 Dec 2022 • Seah Kim, Hyoukjun Kwon, Jinook Song, Jihyuck Jo, Yu-Hsin Chen, Liangzhen Lai, Vikas Chandra
Such dynamic behaviors introduce new challenges to the system software in an ML system since the overall system load is not completely predictable, unlike traditional ML workloads.
no code implementations • 16 Nov 2022 • Hyoukjun Kwon, Krishnakumar Nair, Jamin Seo, Jason Yik, Debabrata Mohapatra, Dongyuan Zhan, Jinook Song, Peter Capak, Peizhao Zhang, Peter Vajda, Colby Banbury, Mark Mazumder, Liangzhen Lai, Ashish Sirasao, Tushar Krishna, Harshit Khaitan, Vikas Chandra, Vijay Janapa Reddi
We hope that our work will stimulate research and lead to the development of a new generation of ML systems for XR use cases.
no code implementations • 2 Nov 2021 • Cole Hawkins, Haichuan Yang, Meng Li, Liangzhen Lai, Vikas Chandra
Low-rank tensor compression has been proposed as a promising approach to reduce the memory and compute requirements of neural networks for their deployment on edge devices.
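A minimal sketch of the underlying idea, using a plain rank-r factorization of a single linear layer (the paper considers more general low-rank tensor decompositions); the shapes and rank below are illustrative assumptions.

```python
# Minimal sketch of low-rank compression: replace a dense d_out x d_in linear
# layer with two rank-r factors. Shapes and rank are illustrative; the paper
# works with more general low-rank tensor decompositions.
import torch
import torch.nn as nn

d_in, d_out, rank = 1024, 1024, 64

dense = nn.Linear(d_in, d_out)                 # full d_out x d_in weight
low_rank = nn.Sequential(
    nn.Linear(d_in, rank, bias=False),         # d_in -> r
    nn.Linear(rank, d_out),                    # r    -> d_out
)

def num_params(m):
    return sum(p.numel() for p in m.parameters())

x = torch.randn(8, d_in)
y = low_rank(x)                                # same interface as dense(x)
print(f"dense: {num_params(dense):,} params, "
      f"low-rank: {num_params(low_rank):,} params "
      f"({num_params(low_rank) / num_params(dense):.1%} of dense)")
```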
1 code implementation • CVPR 2022 • Jiaqi Gu, Hyoukjun Kwon, Dilin Wang, Wei Ye, Meng Li, Yu-Hsin Chen, Liangzhen Lai, Vikas Chandra, David Z. Pan
Therefore, we propose HRViT, which enhances ViTs to learn semantically-rich and spatially-precise multi-scale representations by integrating high-resolution multi-branch architectures with ViTs.
Ranked #25 on Semantic Segmentation on Cityscapes val
no code implementations • 13 Feb 2020 • Meng Li, Yilei Li, Pierce Chuang, Liangzhen Lai, Vikas Chandra
Neural network accelerators are a key enabler for on-device AI inference, for which energy efficiency is an important metric.
no code implementations • 10 Feb 2020 • Lei Yang, Zheyu Yan, Meng Li, Hyoukjun Kwon, Liangzhen Lai, Tushar Krishna, Vikas Chandra, Weiwen Jiang, Yiyu Shi
Neural Architecture Search (NAS) has demonstrated its power on various AI acceleration platforms such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs).
no code implementations • 13 Sep 2019 • Hyoukjun Kwon, Liangzhen Lai, Tushar Krishna, Vikas Chandra
The results suggest that HDAs form an alternative class of Pareto-optimal accelerators to RDAs, with particular strength in energy efficiency, and can be the better choice depending on the use case.
Distributed, Parallel, and Cluster Computing
no code implementations • 20 Jun 2018 • Liangzhen Lai, Naveen Suda
Machine learning (ML), especially deep learning, is made possible by the availability of big data, enormous compute power and, often overlooked, development tools or frameworks.
2 code implementations • 2 Jun 2018 • Yue Zhao, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, Vikas Chandra
Experiments show that accuracy can be increased by 30% for the CIFAR-10 dataset with only 5% globally shared data.
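A hedged sketch of the data-sharing strategy described above: each client trains on its label-skewed local shard plus a small globally shared subset. The helper below is a hypothetical illustration of the split, not the authors' code; only the 5% shared fraction comes from the sentence above.

```python
# Sketch of the data-sharing idea: every client receives its non-IID local
# shard plus the same small globally shared subset drawn from a common pool.
import numpy as np

def build_client_datasets(labels, num_clients, shared_fraction=0.05, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    n_shared = int(shared_fraction * len(labels))
    shared_idx, private_idx = idx[:n_shared], idx[n_shared:]

    # Non-IID split: sort the private portion by label so each client sees
    # only a few classes (a common way to simulate label skew).
    private_idx = private_idx[np.argsort(labels[private_idx], kind="stable")]
    shards = np.array_split(private_idx, num_clients)

    # Every client gets its skewed shard plus the globally shared subset.
    return [np.concatenate([shard, shared_idx]) for shard in shards]

# Usage with dummy labels standing in for CIFAR-10 (only indices matter here).
labels = np.random.randint(0, 10, size=50_000)
client_indices = build_client_datasets(labels, num_clients=10)
print([len(c) for c in client_indices])
```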
1 code implementation • 19 Jan 2018 • Liangzhen Lai, Naveen Suda, Vikas Chandra
Deep Neural Networks are becoming increasingly popular in always-on IoT edge devices performing data analytics right at the source, reducing latency as well as energy consumption for data communication.
no code implementations • 12 Jan 2018 • Liangzhen Lai, Naveen Suda, Vikas Chandra
Efficient and compact neural network models are essential for enabling the deployment on mobile and embedded devices.
no code implementations • 5 Dec 2017 • Hardik Sharma, Jongse Park, Naveen Suda, Liangzhen Lai, Benson Chau, Joon Kyung Kim, Vikas Chandra, Hadi Esmaeilzadeh
Compared to Stripes, BitFusion provides 2.6x speedup and 3.9x energy reduction at the 45 nm node when BitFusion's area and frequency are set to those of Stripes.
18 code implementations • 20 Nov 2017 • Yundong Zhang, Naveen Suda, Liangzhen Lai, Vikas Chandra
We train various neural network architectures for keyword spotting published in literature to compare their accuracy and memory/compute requirements.
Ranked #14 on Keyword Spotting on Google Speech Commands
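The memory side of the comparison described in the keyword-spotting entry above can be sketched as follows: parameter counts (a proxy for model memory) for two simplified stand-in models on MFCC input. The layer sizes are illustrative and are not the configurations evaluated in the paper.

```python
# Compare parameter counts of two illustrative keyword-spotting models on
# 49x10 MFCC features with 12 output classes. These are simplified stand-ins,
# not the paper's exact architectures.
import torch.nn as nn

num_classes, mfcc_shape = 12, (49, 10)

# Plain fully connected network on flattened MFCCs.
dnn = nn.Sequential(
    nn.Flatten(),
    nn.Linear(mfcc_shape[0] * mfcc_shape[1], 144), nn.ReLU(),
    nn.Linear(144, 144), nn.ReLU(),
    nn.Linear(144, num_classes),
)

# Depthwise-separable CNN: depthwise 3x3 conv followed by pointwise 1x1 conv.
ds_cnn = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=(10, 4), stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64), nn.ReLU(),  # depthwise
    nn.Conv2d(64, 64, kernel_size=1), nn.ReLU(),                        # pointwise
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, num_classes),
)

def count(m):
    return sum(p.numel() for p in m.parameters())

print(f"DNN:    {count(dnn):,} parameters")
print(f"DS-CNN: {count(ds_cnn):,} parameters")
```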
no code implementations • ICLR 2018 • Meng Li, Liangzhen Lai, Naveen Suda, Vikas Chandra, David Z. Pan
Massive amounts of data reside on users' local platforms, which usually cannot support deep neural network (DNN) training due to computation and storage resource constraints.
no code implementations • 8 Mar 2017 • Liangzhen Lai, Naveen Suda, Vikas Chandra
To alleviate these problems to some extent, prior research utilizes low-precision fixed-point numbers to represent the CNN weights and activations.
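A minimal sketch of what such a fixed-point representation can look like, using symmetric int8 quantization with a power-of-two scale; the bit width and rounding scheme are illustrative assumptions rather than the paper's exact method.

```python
# Symmetric fixed-point (int8) quantization of a weight tensor with a
# power-of-two scale, as a minimal illustration of low-precision weights.
import numpy as np

def quantize_fixed_point(w, num_bits=8):
    """Map float weights to signed integers with a power-of-two scale."""
    qmax = 2 ** (num_bits - 1) - 1                      # 127 for int8
    # Choose the number of fractional bits so the largest weight still fits.
    frac_bits = num_bits - 1 - int(np.ceil(np.log2(np.max(np.abs(w)) + 1e-12)))
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(w * scale), -qmax - 1, qmax).astype(np.int8)
    return q, frac_bits

def dequantize(q, frac_bits):
    return q.astype(np.float32) / (2.0 ** frac_bits)

w = np.random.randn(64, 64).astype(np.float32) * 0.1
q, frac_bits = quantize_fixed_point(w)
error = np.max(np.abs(dequantize(q, frac_bits) - w))
print(f"fractional bits: {frac_bits}, max abs quantization error: {error:.5f}")
```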