[2401.01578] Context-Guided Spatio-Temporal Video Grounding