[2406.02547] Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning