[2303.01508] Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities