[2205.05586] End-to-End Multi-Person Audio/Visual Automatic Speech Recognition