no code implementations • 29 May 2024 • ChonLam Lao, Jiaqi Gao, Ganesh Ananthanarayanan, Aditya Akella, Minlan Yu
Modeless ML inference is rapidly growing in popularity, as it hides the complexity of model inference from users and caters to diverse user and application accuracy requirements.
no code implementations • 5 Apr 2024 • Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella
In this work, we observe that the computationally expensive feed-forward blocks of LLM layers saturate, and propose FFN-SkipLLM, a novel fine-grained skip strategy for autoregressive LLMs.
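The skip idea can be illustrated with a minimal sketch: if applying attention barely changes a layer's hidden state (it has "saturated"), the expensive FFN block for that layer is skipped. The saturation test via cosine similarity and the threshold value are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two activation vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def layer_forward(h, attn, ffn, skip_threshold=0.99):
    """One transformer layer with a fine-grained FFN skip.

    If the post-attention hidden state is nearly identical to the input
    (cosine similarity above skip_threshold), the FFN block is skipped.
    Returns the output and whether the FFN was skipped.
    """
    h_attn = h + attn(h)                      # residual attention sub-block
    if cosine_sim(h, h_attn) > skip_threshold:
        return h_attn, True                   # saturated: skip the FFN
    return h_attn + ffn(h_attn), False        # otherwise run the FFN
```

In a real autoregressive decoder the decision would be made per token and per layer, so the savings accumulate across the many FFN blocks of a deep model.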
1 code implementation • 12 Feb 2024 • Haoyu Li, Yuchen Xu, Jiayi Chen, Rohit Dwivedula, Wenfei Wu, Keqiang He, Aditya Akella, Daehyeok Kim
As deep neural networks (DNNs) grow in complexity and size, the resultant increase in communication overhead during distributed training has become a significant bottleneck, challenging the scalability of distributed training systems.
no code implementations • 13 Dec 2023 • Divyanshu Saxena, Nihal Sharma, Donghyun Kim, Rohit Dwivedula, Jiayi Chen, Chenxi Yang, Sriram Ravula, Zichao Hu, Aditya Akella, Sebastian Angel, Joydeep Biswas, Swarat Chaudhuri, Isil Dillig, Alex Dimakis, P. Brighten Godfrey, Daehyeok Kim, Chris Rossbach, Gang Wang
This paper lays down the research agenda for a domain-specific foundation model for operating systems (OSes).
no code implementations • 27 Oct 2023 • Bodun Hu, Le Xu, Jeongyoon Moon, Neeraja J. Yadwadkar, Aditya Akella
Rapid advancements over the years have helped machine learning models reach previously hard-to-achieve goals, sometimes even exceeding human capabilities.
no code implementations • 1 Aug 2023 • Sudarsanan Rajasekaran, Manya Ghobadi, Aditya Akella
We present CASSINI, a network-aware job scheduler for machine learning (ML) clusters.
no code implementations • 29 Oct 2022 • Jiachen Liu, Fan Lai, Yinwei Dai, Aditya Akella, Harsha Madhyastha, Mosharaf Chowdhury
In this paper, we explore an additional layer of complexity to mitigate such heterogeneity by grouping clients with statistically similar data distributions (cohorts).
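One simple way to form such cohorts is to cluster clients by their label distributions. The sketch below uses plain k-means over per-client label histograms as a stand-in for "statistically similar data distributions"; the clustering method and the use of label histograms as the similarity signal are assumptions for illustration.

```python
import numpy as np

def cohort_clients(label_dists, num_cohorts, iters=20, seed=0):
    """Group clients into cohorts via k-means over label distributions.

    label_dists: one probability vector per client (its label histogram).
    Returns an array mapping each client to a cohort index.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(label_dists, dtype=float)
    # Initialize centroids from randomly chosen clients.
    centroids = X[rng.choice(len(X), num_cohorts, replace=False)]
    for _ in range(iters):
        # Assign each client to its nearest centroid.
        assign = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        # Recompute each centroid as the mean of its assigned clients.
        for k in range(num_cohorts):
            if (assign == k).any():
                centroids[k] = X[assign == k].mean(axis=0)
    return assign
```

A federated training loop could then run separate (or specialized) rounds per cohort, so that each aggregation step mixes updates only from statistically similar clients.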
no code implementations • 22 Jul 2022 • Tarannum Khan, Saeed Rashidi, Srinivas Sridharan, Pallavi Shurpali, Aditya Akella, Tushar Krishna
Our results indicate that previously proposed RoCE congestion control schemes have little impact on the end-to-end performance of training workloads, motivating the necessity of designing an optimized, yet low-overhead, congestion control scheme based on the characteristics of distributed training platforms and workloads.
no code implementations • 28 May 2022 • Chi Zhang, Olga Papaemmanouil, Josiah P. Hanna, Aditya Akella
This paper addresses the question: "Is it possible to design a database consisting of various learned components that cooperatively work to improve end-to-end query latency?"
1 code implementation • 20 Nov 2021 • Adarsh Kumar, Kausik Subramanian, Shivaram Venkataraman, Aditya Akella
This simultaneously reduces network bandwidth, compute utilization, and memory footprint while preserving model quality.
no code implementations • 18 Jan 2021 • Arjun Balasubramanian, Adarsh Kumar, YuHan Liu, Han Cao, Shivaram Venkataraman, Aditya Akella
We present the design of GATI, an end-to-end prediction serving system that incorporates learned caches for low-latency DNN inference.
no code implementations • 7 Feb 2020 • Adarsh Kumar, Arjun Balasubramanian, Shivaram Venkataraman, Aditya Akella
In this work, we observe that caching intermediate layer outputs can help us avoid running all the layers of a DNN for a sizeable fraction of inference requests.
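The core serving idea can be sketched as follows: after each layer, look up the (quantized) intermediate activation in a cache; on a hit, return the cached prediction and skip the remaining layers. The quantization-based cache key and the eager population of every depth are illustrative assumptions, not the system's actual cache design.

```python
import numpy as np

def serve_with_layer_cache(x, layers, classifier, cache, tol=1e-3):
    """Run a DNN layer by layer, consulting a cache of intermediate
    activations; on a hit, skip the remaining layers entirely.

    Returns (prediction, number_of_layers_actually_executed).
    """
    h = x
    keys = []
    for i, layer in enumerate(layers):
        h = layer(h)
        # Quantize the activation so near-identical states share a key.
        key = (i, tuple(np.round(h / tol).astype(int)))
        if key in cache:
            return cache[key], i + 1     # hit: remaining layers skipped
        keys.append(key)
    pred = classifier(h)
    for key in keys:                     # miss: memoize at every depth
        cache[key] = pred
    return pred, len(layers)
```

On a repeated or similar request, the lookup succeeds after the first layer, so only a fraction of the network runs; the cache trades memory for saved compute on that sizeable fraction of requests.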