[2305.19602] Learning Music Sequence Representation from Text Supervision