[2406.05615] Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives