[2406.00038] ViSpeR: Multilingual Audio-Visual Speech Recognition