[2407.03188] MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation