Multi-level Cross-modal Alignment for Image Clustering

Qiu, Liping; Zhang, Qin; Chen, Xiaojun; Cai, Shaotian

Computer Science > Computer Vision and Pattern Recognition

arXiv:2401.11740 (cs)

[Submitted on 22 Jan 2024]

Abstract: Recently, cross-modal pretrained models have been employed to produce meaningful pseudo-labels that supervise the training of image clustering models. However, numerous erroneous alignments in a cross-modal pretrained model can produce poor-quality pseudo-labels and degrade clustering performance. To address this issue, we propose a novel Multi-level Cross-modal Alignment method that improves the alignments in a cross-modal pretrained model for downstream tasks. It does so by building a smaller but better semantic space and aligning images and texts at three levels: instance level, prototype level, and semantic level. Theoretical results show that our proposed method converges and suggest effective means of reducing its expected clustering risk. Experimental results on five benchmark datasets demonstrate the superiority of our new method.
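The first sentence of the abstract refers to using a cross-modal (image-text) pretrained model to pseudo-label images. The sketch below illustrates only that generic starting point, not the Multi-level Cross-modal Alignment method proposed in the paper. Each image embedding is compared by cosine similarity with one text embedding per candidate class, and the most similar class becomes the pseudo-label. The embeddings are assumed to come from pretrained image and text encoders that are not shown. The function names and the `temperature` value are hypothetical choices for illustration. The softmax confidence score is a common heuristic for flagging unreliable pseudo-labels. It is included here only to make the abstract's point about erroneous alignments concrete.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity as the dot product of the two normalized vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def pseudo_labels(
    image_embs: Sequence[Sequence[float]],
    text_embs: Sequence[Sequence[float]],
    temperature: float = 0.01,  # hypothetical value; sharpens the softmax over classes
) -> List[Tuple[int, float]]:
    """Hypothetical helper: return (class index, softmax confidence) for each image.

    image_embs and text_embs are assumed to be outputs of pretrained image and
    text encoders (one text embedding per candidate class); they are not computed here.
    """
    results = []
    for img in image_embs:
        # Similarity of this image to every class-text embedding.
        sims = [cosine_similarity(img, txt) for txt in text_embs]
        # Numerically stable softmax over temperature-scaled similarities.
        logits = [s / temperature for s in sims]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Most probable class; ties resolve to the lowest index.
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((best, probs[best]))
    return results


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoder outputs (illustrative values only).
    class_text_embs = [[1.0, 0.0], [0.0, 1.0]]  # one embedding per class prompt
    image_embs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    for label, conf in pseudo_labels(image_embs, class_text_embs):
        print(label, round(conf, 3))
```

In this toy run, the third image lies exactly halfway between the two class embeddings. It therefore receives a confidence of 0.5, and a confidence threshold would discard it as a likely unreliable pseudo-label. The first two images are assigned confidently to classes 0 and 1.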

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2401.11740 [cs.CV]
	(or arXiv:2401.11740v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2401.11740

Submission history

From: Dr. Xiaojun Chen
[v1] Mon, 22 Jan 2024 07:37:25 UTC (3,259 KB)
