[2407.01921v1] GVDIFF: Grounded Text-to-Video Generation with Diffusion Models