[2407.01921] GVDIFF: Grounded Text-to-Video Generation with Diffusion Models