[2406.15725] Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes