[2305.12311] i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data