[2402.07033] Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models