Search Results for author: Keisuke Kamahori

Found 1 paper, 1 paper with code

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

1 code implementation • 10 Feb 2024 • Keisuke Kamahori, Yile Gu, Kan Zhu, Baris Kasikci

Large Language Models (LLMs) based on Mixture-of-Experts (MoE) architecture are showing promising performance on various tasks.
