ODFormer: Semantic Fundus Image Segmentation Using Transformer for Optic Nerve Head Detection

Wang, Jiayi; Mao, Yi-An; Ma, Xiaoyu; Guo, Sicen; Shao, Yuting; Lv, Xiao; Han, Wenting; Christopher, Mark; Zangwill, Linda M.; Bi, Yanlong; Fan, Rui

arXiv:2405.09552v1 (eess)

[Submitted on 15 Apr 2024 (this version), latest version 2 Jun 2024 (v2)]

Abstract: Optic nerve head (ONH) detection has been an important topic in the medical community for many years. Previous approaches in this domain primarily center on the analysis, localization, and detection of fundus images. However, the non-negligible discrepancy between fundus image datasets, all exclusively generated using a single type of fundus camera, challenges the generalizability of ONH detection approaches. Furthermore, despite the numerous recent semantic segmentation methods employing convolutional neural networks (CNNs) and Transformers, there is currently a lack of benchmarks for these state-of-the-art (SoTA) networks trained specifically for ONH detection. Therefore, in this article, we first introduce ODFormer, a network based on the Swin Transformer architecture. ODFormer is designed to enhance the extraction of correlation information between features, leading to improved generalizability. In our experimental evaluations, we compare our top-performing implementation with 13 SoTA CNNs and Transformers. The results indicate that our proposed ODFormer outperforms all other approaches in ONH detection. Subsequently, we release the TongjiU-DCOD dataset, the first multi-resolution mixed fundus image dataset with corresponding ONH ground-truth annotations. This dataset comprises 400 fundus images captured using two different types of fundus cameras with varying resolutions. This diversity enhances the availability of data regularity information, contributing to the improved generalizability of the model. Moreover, through extensive experiments, we establish a benchmark that thoroughly evaluates the ONH detection performance of SoTA networks designed for semantic segmentation.

Subjects:	Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.09552 [eess.IV]
	(or arXiv:2405.09552v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2405.09552

Submission history

From: Rui Fan
[v1] Mon, 15 Apr 2024 11:49:37 UTC (1,358 KB)
[v2] Sun, 2 Jun 2024 10:49:47 UTC (1,445 KB)
