1 code implementation • 24 Jun 2023 • Daniel Zou, Xinchen Jin, Xueyang Yu, Hao Zhang, James Demmel
In anticipation of workloads that involve serving many of such large models to handle different tasks, we develop Computron, a system that uses memory swapping to serve multiple distributed models on a shared GPU cluster.