Learning Music Sequence Representation from Text Supervision

Chen, Tianyu; Xie, Yuan; Zhang, Shuai; Huang, Shaohan; Zhou, Haoyi; Li, Jianxin

doi:10.1109/ICASSP43922.2022.9746131

arXiv:2305.19602 (cs)

[Submitted on 31 May 2023]

Abstract: Music representation learning is notoriously difficult because complex human-related concepts are embedded in sequences of numerical signals. To extract a better MUsic SEquence Representation from labeled audio, we propose a novel text-supervision pre-training method, namely MUSER. MUSER adopts an audio-spectrum-text tri-modal contrastive learning framework, in which the text input can be any form of metadata, converted to text with the help of templates, while the spectrum is derived from the audio sequence. Our experiments show that MUSER can be adapted to downstream tasks more flexibly than the current data-hungry pre-training method, and that it requires only 0.056% of the pre-training data to achieve state-of-the-art performance.
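
The tri-modal contrastive idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the loss, so the code assumes a CLIP-style symmetric InfoNCE objective averaged over the three modality pairs. The template string, the function names (`fill_template`, `info_nce`, `tri_modal_loss`), and the temperature value are all hypothetical, and the embeddings are plain lists standing in for encoder outputs.

```python
import math


def fill_template(metadata, template="a {genre} track by {artist}"):
    """Turn a metadata record into a text prompt (hypothetical template)."""
    return template.format(**metadata)


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE over a batch; a[i] and b[i] form the matched pair.

    Assumption: a CLIP-style loss, not confirmed by the abstract.
    """
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    n = len(a)
    # Cosine similarity of every a[i] with every b[j], scaled by temperature.
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy(rows):
        # Mean negative log-probability of the diagonal (matched) entries.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


def tri_modal_loss(audio, spectrum, text):
    """Average the pairwise losses over the three modality pairs (assumed)."""
    return (
        info_nce(audio, text)
        + info_nce(spectrum, text)
        + info_nce(audio, spectrum)
    ) / 3.0


prompt = fill_template({"genre": "jazz", "artist": "Miles Davis"})
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = tri_modal_loss(aligned, aligned, aligned)
loss_mismatched = tri_modal_loss(aligned, aligned, shuffled)
```

In this toy batch the loss is lower when the text embeddings line up with their audio and spectrum counterparts than when they are shuffled, which is the behavior a contrastive objective is meant to reward.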

Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2305.19602 [cs.SD]
	(or arXiv:2305.19602v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2305.19602
Journal reference:	IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022: 4583-4587
Related DOI:	https://doi.org/10.1109/ICASSP43922.2022.9746131

Submission history

From: Yuan Xie
[v1] Wed, 31 May 2023 07:15:06 UTC (5,112 KB)
