[2407.08156] AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization