Search Results for author: Youngeun Kwon

Found 7 papers, 0 papers with code

Training Personalized Recommendation Systems from (GPU) Scratch: Look Forward not Backwards

no code implementations10 May 2022 Youngeun Kwon, Minsoo Rhu

Prior work proposed caching frequently accessed embeddings inside GPU memory as a means to filter down the embedding-layer traffic to CPU memory, but this paper observes several limitations with such a cache design.

Recommendation Systems
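The caching idea the abstract refers to can be illustrated with a minimal sketch. This is a hypothetical LRU embedding cache, not the paper's actual design: hot embedding rows live in a small fast tier (standing in for GPU memory), and misses fall back to the full table (standing in for CPU memory). The class name and structure are illustrative assumptions.

```python
from collections import OrderedDict
import numpy as np

class EmbeddingCache:
    """Hypothetical LRU cache over embedding rows (illustrative sketch only):
    hot rows live in a small fast tier ("GPU memory"); misses read from the
    full table ("CPU memory")."""

    def __init__(self, table: np.ndarray, capacity: int):
        self.table = table          # full embedding table (CPU-memory tier)
        self.capacity = capacity    # number of rows that fit in the fast tier
        self.cache = OrderedDict()  # row_id -> embedding vector
        self.hits = 0
        self.misses = 0

    def lookup(self, row_id: int) -> np.ndarray:
        if row_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(row_id)  # mark as most recently used
            return self.cache[row_id]
        self.misses += 1
        vec = self.table[row_id]            # traffic to the CPU-memory tier
        self.cache[row_id] = vec
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used row
        return vec

# Skewed access pattern: repeated lookups of a few hot rows mostly hit.
table = np.random.rand(1000, 8).astype(np.float32)
cache = EmbeddingCache(table, capacity=64)
for rid in [3, 7, 3, 3, 7, 42]:
    cache.lookup(rid)
print(cache.hits, cache.misses)  # → 3 3
```

A cache like this helps only while the access distribution stays skewed toward a small hot set; the paper's observation is that such designs have limitations in practice.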

Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training

no code implementations25 Oct 2020 Youngeun Kwon, Yunjae Lee, Minsoo Rhu

Personalized recommendations are one of the most widely deployed machine learning (ML) workloads serviced from cloud datacenters.

Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations

no code implementations12 May 2020 Ranggi Hwang, Taehun Kim, Youngeun Kwon, Minsoo Rhu

Personalized recommendations are the backbone machine learning (ML) algorithm powering several important application domains (e.g., ads, e-commerce) serviced from cloud datacenters.

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

no code implementations15 Nov 2019 Bongjoon Hyun, Youngeun Kwon, Yujeong Choi, John Kim, Minsoo Rhu

To satisfy the compute and memory demands of deep neural networks, neural processing units (NPUs) are being widely utilized to accelerate deep learning algorithms.

Management

Translation

TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning

no code implementations8 Aug 2019 Youngeun Kwon, Yunjae Lee, Minsoo Rhu

Recent studies from several hyperscalers pinpoint embedding layers as the most memory-intensive deep learning (DL) algorithm deployed in today's datacenters.

Recommendation Systems

Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning

no code implementations18 Feb 2019 Youngeun Kwon, Minsoo Rhu

As the models and datasets used to train deep learning (DL) models scale, system architects face new challenges, one of which is the memory capacity bottleneck: the limited physical memory inside the accelerator device constrains the algorithms that can be studied.
