[2405.14715] Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models