no code implementations • 12 Apr 2024 • Juntaek Lim, Youngeun Kwon, Ranggi Hwang, Kiwan Maeng, G. Edward Suh, Minsoo Rhu
Differential privacy (DP) is widely employed in industry as a practical standard for privacy protection.
no code implementations • 10 May 2022 • Youngeun Kwon, Minsoo Rhu
Prior work proposed caching frequently accessed embeddings inside GPU memory as a means to filter embedding layer traffic to CPU memory, but this paper observes several limitations with such a cache design.
no code implementations • 25 Oct 2020 • Youngeun Kwon, Yunjae Lee, Minsoo Rhu
Personalized recommendations are one of the most widely deployed machine learning (ML) workloads serviced from cloud datacenters.
no code implementations • 12 May 2020 • Ranggi Hwang, Taehun Kim, Youngeun Kwon, Minsoo Rhu
Personalized recommendations are the backbone machine learning (ML) algorithm powering several important application domains (e.g., ads, e-commerce) serviced from cloud datacenters.
no code implementations • 15 Nov 2019 • Bongjoon Hyun, Youngeun Kwon, Yujeong Choi, John Kim, Minsoo Rhu
To satisfy the compute and memory demands of deep neural networks, neural processing units (NPUs) are widely used for accelerating deep learning algorithms.
no code implementations • 8 Aug 2019 • Youngeun Kwon, Yunjae Lee, Minsoo Rhu
Recent studies from several hyperscalers point to embedding layers as the most memory-intensive deep learning (DL) algorithm being deployed in today's datacenters.
no code implementations • 18 Feb 2019 • Youngeun Kwon, Minsoo Rhu
As deep learning (DL) models and the datasets used to train them scale, system architects are faced with new challenges, one of which is the memory capacity bottleneck, where the limited physical memory inside the accelerator device constrains the algorithm that can be studied.