[2407.05782] Sequential Contrastive Audio-Visual Learning