[2402.14797] Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis