[2309.16265] Semantic Proximity Alignment: Towards Human Perception-consistent Audio Tagging by Aligning with Label Text Description