UHNet: An Ultra-Lightweight and High-Speed Edge Detection Network

Fuzhang Li (stoneLi20@163.com), Chuan Lin

Abstract

Edge detection is crucial in medical image processing, enabling precise extraction of structural information to support lesion identification and image analysis. Traditional edge detection models typically rely on complex Convolutional Neural Network and Vision Transformer architectures. Because of their large parameter counts and high computational demands, these models are difficult to apply on resource-constrained devices. This paper presents an ultra-lightweight edge detection model (UHNet), characterized by its minimal parameter count, rapid computation speed, negligible pre-training cost, and commendable performance. UHNet has 42.3k parameters, runs at 166 FPS, and requires 0.79G FLOPs. By employing an innovative feature extraction module and an optimized residual connection method, UHNet significantly reduces model complexity and computational requirements. Additionally, a lightweight feature fusion strategy is explored, enhancing detection accuracy. Experimental results on the BSDS500, NYUD, and BIPED datasets validate that UHNet achieves remarkable edge detection performance while maintaining high efficiency. This work not only provides new insights into the design of lightweight edge detection models but also demonstrates the potential and application prospects of the UHNet model in engineering applications such as medical image processing. The code is available at https://github.com/stoneLi20cv/UHNet.

Keywords: Edge detection, Ultra-lightweight, High-speed, Feature extraction, Feature fusion, Residual connection

Affiliation 1: School of Electronic Engineering, Guangxi University of Science and Technology, Liuzhou 545006, Guangxi, China

Affiliation 2: School of Automation, Guangxi University of Science and Technology, Liuzhou 545006, Guangxi, China

1 Introduction

Edge detection, a crucial foundational technique in computer vision, has had a profound impact on various medical image processing domains, such as X-rays, CT scans, and MRI images. These images contain rich structural information, and edges are significant manifestations of these structures. Accurate edge detection not only aids in the clear identification of pathological regions but also provides robust support for subsequent image analysis and diagnosis. Therefore, designing edge detection models that are lightweight, fast, and accurate at the same time is an increasingly pressing challenge in practical deployment scenarios.

Specifically, convolutional neural networks (CNNs) [1], as a mainstream deep learning architecture, have seen numerous derivative algorithms applied to edge detection tasks. Traditional algorithms [2–29] based on pre-trained models use backbone networks from the VGG [30] and ResNet [31] series (often referred to as encoder networks) and focus on developing decoder network structures. This results in a series of large-parameter, computationally intensive edge detection models, most of which run at relatively slow speeds. Consequently, they consume substantial GPU resources and training time, making them difficult to deploy on low-computation edge devices.

Furthermore, Vision Transformers (ViTs) [32, 33], which leverage self-attention mechanisms to model long-range dependencies in images, excel at understanding global context. Despite their superior performance over CNNs in some tasks, their higher computational complexity and memory requirements often limit their use in real-time applications or resource-constrained environments.

Thus, achieving both efficient training and inference while maintaining considerable detection performance is a challenging problem in edge detection tasks. Exploring this issue is essential for cost-effectively deploying efficient network models in real-world applications. A direct solution is designing lightweight detection models based on CNNs [1]. Lightweight network design, as an effective solution, provides new insights for reducing model complexity and computational load by optimizing network structures and reducing parameter numbers. However, despite some progress in lightweight network design for edge detection tasks, there remains a trade-off between achieving ultra-high speed and maintaining high accuracy. Given the complexity and diversity of images, how to achieve high-precision edge detection while ensuring ultra-high processing speed is the core issue this paper focuses on.

This paper proposes an ultra-lightweight network model with minimal parameters, extremely fast computation speed, no pre-training cost, and considerable performance, aimed at detecting target edges. Our innovative work primarily focuses on the following four aspects:

(1) Proposing an ultra-lightweight feature extraction module, PDDP block. This module evolves from the Bottleneck structure in ResNet [31], extracting and integrating target edge features in images with few parameters and high speed.

(2) Replacing the original 1x1 convolution for channel transformation with max pooling (MaxPool) and average pooling (AvgPool) operations in different stages of the backbone network, further reducing computation while enhancing feature diversity.

(3) Exploring lightweight fusion methods for features between different stages, improving edge detection accuracy through effective feature fusion strategies with fewer parameters compared to other fusion methods.

(4) Experiments demonstrate that the proposed ultra-lightweight network model (UHNet) with minimal parameters (42.3k), high speed (166 FPS), and low FLOPs (0.79G) shows strong competitiveness across multiple public datasets.

The rest of this paper is structured as follows: Section 2 reviews work related to edge detection. In Section 3, we describe our proposed method in detail. In Section 4, we conduct detailed experimental verification and comparative analysis of the proposed method on three datasets: BSDS500, NYUD, and BIPED. In Section 5, we summarize the paper and discuss directions worthy of further exploration.

2 Related Work

With the development of deep learning in computer vision, deep learning-based edge detection methods have achieved significant progress. Xie et al. proposed the first CNN-based edge detection model, HED [2], using VGG16 [30] as the backbone network, demonstrating strong capabilities in extracting target edges from RGB images. Subsequently, numerous edge detection methods emerged, employing VGG [30] or ResNet [31] as backbone networks. These methods [2–29] use transfer learning: they are pre-trained on the ImageNet dataset and then fine-tuned on specialized edge detection datasets to further enhance performance. Compared to CNN-based detection models, edge detection methods based on Transformer architectures, such as DPED [34] and EDTER [35], perform well but have large parameter counts, which leads to higher computational resource demands and poses deployment challenges and cost concerns in practical applications.

Addressing the challenges posed by large models, designing lightweight architectures becomes key to improving edge detection efficiency. Some existing studies [36, 37, 38, 39, 40] have optimized encoding and decoding network structures or introduced learnable differential convolution operators to reduce model complexity and computational requirements while maintaining detection accuracy. PiDiNet [37] proposed an innovative differential convolution operator, dynamically adjusting convolution kernel parameters to better capture edge features, achieving high-performance edge detection under lightweight conditions with a novel network architecture. DexiNed [38] abandoned the traditional paradigm relying on large-scale dataset pre-training, demonstrating excellent detection performance directly from edge detection datasets through carefully designed network structures and training strategies, offering new insights into lightweight edge detection model design. Inspired by biological vision pathways, research like LNRFM [36], BLEDNet [39] and XYW-Net [40] constructed lightweight and efficient edge detection network models by simulating hierarchical structures of the visual system, reducing computational resource consumption and enhancing adaptability to complex scenes.

However, existing lightweight models commonly face two key issues: 1) network design relies heavily on personal experience; 2) model structures are fixed and single, lacking flexibility, which makes them difficult to transfer and extend directly to other research tasks. Regarding the first issue, PiDiNet’s experiments [37] show that differential convolution operators do not improve performance in every convolution layer. Similarly, models such as LNRFM [36], BLEDNet [39] and XYW-Net [40], which are constructed by simulating biological visual mechanisms, rely on a deep understanding of how the visual system processes information, and network models built simply by simulating the physiological mechanisms of the visual system may even perform worse than traditional methods such as HED [2]. Regarding the second issue, fixed and single network structures, like LNRFM [36], DexiNed [38], BLEDNet [39], and XYW-Net [40], cannot form diverse, general-purpose foundational model families like the VGG [30] and ResNet [31] series. Unlike these models, we prioritize simplicity, transferability, and scalability in network design, and propose an ultra-lightweight network model (UHNet) with minimal parameters, high processing speed, and considerable performance, aiming for efficient detection and acquisition of target edges in images.

3 Method

3.1 PDDP Block

Fig. 1: (a) Bottleneck structure in ResNet [31]; (b) Lightweight Bottleneck structure; (c) PDDP block.

A common lightweight strategy is to replace standard convolution operations with Depthwise Separable Convolution. This convolution method decomposes standard convolution into depthwise and pointwise convolutions, significantly reducing computational load and parameter count. Based on this strategy, the 3x3 standard convolution kernel in the Bottleneck structure of ResNet series networks [31] (Fig. 1(a)) is replaced by the depthwise convolution in Depthwise Separable Convolution, resulting in a lightweight Bottleneck structure (Fig. 1(b)).

The receptive field is particularly important in convolutional neural networks as it directly relates to feature extraction effectiveness. As the number of network layers increases, deeper neurons can see larger input areas, capturing more contextual information and improving the model’s representation and generalization capabilities. In the lightweight Bottleneck structure (Fig. 1(b)), the receptive field size is 3x3. To capture more contextual information, we increase the receptive field size by adding another depthwise convolution layer, which slightly increases the parameter count (Fig. 1(c)). This structure is called a PDDP block.
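The effect of the extra depthwise layer on the receptive field can be checked with a short calculation. The sketch below is illustrative only: it assumes stride 1 and no dilation in every layer, and it assumes the PDDP block is ordered as 1x1, depthwise 3x3, depthwise 3x3, 1x1, which is inferred from Fig. 1's description and the parameter counts rather than stated explicitly. The function name `receptive_field` is hypothetical.

```python
def receptive_field(layers):
    """Receptive field of a stack of conv layers given as (kernel, stride) pairs."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        # each layer widens the field by (kernel - 1) input-pixel steps
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Assumed layer orders (stride 1 everywhere); 1x1 layers add nothing.
lightweight_bottleneck = [(1, 1), (3, 1), (1, 1)]
pddp_block = [(1, 1), (3, 1), (3, 1), (1, 1)]

rf_lightweight = receptive_field(lightweight_bottleneck)
rf_pddp = receptive_field(pddp_block)
print(rf_lightweight, rf_pddp)
```

Under these assumptions the second depthwise layer enlarges the receptive field of one block from 3x3 to 5x5.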

Assuming the number of input channels is 32 and the number of output channels in the final 1x1 convolution layer is 64 (other convolution layers have 32 output channels), and ignoring Norm+ReLU layers, the parameter counts for the three structures in Fig. 1 are: for the Bottleneck structure in ResNet [31], the parameter count is 12288; for the lightweight structure (Fig. 1(b)), the parameter count is 3360; and for the structure with an added depthwise convolution layer to enlarge the receptive field (Fig. 1(c)), the parameter count is 3648. Increasing the receptive field by adding a small number of parameters can improve detection performance while minimally impacting the network’s processing speed. We believe this is significant for detecting and acquiring edge information of targets in images.
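The three parameter counts above can be reproduced with a few lines of arithmetic. This is a minimal sketch assuming bias-free convolutions (the quoted figures only match without bias terms) and ignoring Norm+ReLU layers, as in the text; the helper name `conv_params` is hypothetical.

```python
def conv_params(c_in, c_out, k, depthwise=False):
    """Weight count of a k x k convolution without bias."""
    if depthwise:
        # one k x k filter per channel; the channel count is preserved
        assert c_in == c_out
        return k * k * c_in
    return k * k * c_in * c_out


C_IN, C_MID, C_OUT = 32, 32, 64  # channel widths taken from the text

# (a) ResNet Bottleneck: 1x1 -> standard 3x3 -> 1x1
bottleneck = (conv_params(C_IN, C_MID, 1)
              + conv_params(C_MID, C_MID, 3)
              + conv_params(C_MID, C_OUT, 1))

# (b) Lightweight Bottleneck: 1x1 -> depthwise 3x3 -> 1x1
lightweight = (conv_params(C_IN, C_MID, 1)
               + conv_params(C_MID, C_MID, 3, depthwise=True)
               + conv_params(C_MID, C_OUT, 1))

# (c) PDDP block: (b) plus one more depthwise 3x3 layer
pddp = lightweight + conv_params(C_MID, C_MID, 3, depthwise=True)

print(bottleneck, lightweight, pddp)
```

The extra depthwise layer costs only 3 x 3 x 32 = 288 weights, which is why the PDDP block stays close to the lightweight Bottleneck in size.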

3.2 UHNet Architecture

Fig. 2 shows the proposed model structure, which is mainly composed of the backbone network on the left and the feature decoding network on the right. The backbone network is divided into three stages, with a PoolBlock between every two adjacent stages. Each stage contains four consecutive PDDP blocks, with the first 1x1 convolution layer in the first stage responsible for initial channel transformation of the input image.

In the PoolBlock, the Fusion layer decides the feature fusion method based on the channel numbers of two adjacent stages. Specifically, there are two cases: 1) when the channel numbers of the two adjacent stages are the same, the Fusion layer fuses features by addition; 2) when the channel number of the latter stage is twice that of the former stage, the Fusion layer fuses features by concatenation.
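The two fusion cases can be illustrated with a small pure-Python sketch. Several details here are assumptions rather than the authors' implementation: that the two fused inputs are the MaxPool and AvgPool outputs of the former stage, that pooling uses a non-overlapping 2x2 window, and the names `pool2x2` and `pool_block`. Feature maps are represented as nested lists (channels, rows, columns) to keep the example dependency-free.

```python
def pool2x2(channel, reduce_fn):
    """Apply a non-overlapping 2x2 pooling window to one channel (list of rows)."""
    h, w = len(channel), len(channel[0])
    return [
        [reduce_fn([channel[i][j], channel[i][j + 1],
                    channel[i + 1][j], channel[i + 1][j + 1]])
         for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]


def mean(values):
    return sum(values) / len(values)


def pool_block(features, c_next):
    """Hypothetical PoolBlock: pool the features, then fuse by add or concat."""
    c_prev = len(features)
    max_out = [pool2x2(ch, max) for ch in features]
    avg_out = [pool2x2(ch, mean) for ch in features]
    if c_next == c_prev:
        # case 1: same channel count, fuse by element-wise addition
        return [[[m + a for m, a in zip(m_row, a_row)]
                 for m_row, a_row in zip(m_ch, a_ch)]
                for m_ch, a_ch in zip(max_out, avg_out)]
    if c_next == 2 * c_prev:
        # case 2: channel count doubles, fuse by channel concatenation
        return max_out + avg_out
    raise ValueError("next stage must have 1x or 2x the channels of the previous stage")


x = [[[1, 2], [3, 4]]]                     # one channel, 2x2 spatial size
added = pool_block(x, c_next=1)            # addition keeps 1 channel
concatenated = pool_block(x, c_next=2)     # concatenation yields 2 channels
print(added, concatenated)
```

Neither branch introduces learnable weights, which is consistent with the goal of avoiding a 1x1 convolution for channel transformation.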

In the feature decoding network, the deep features of the latter stage are first processed by the FBlock, then directly added to the features of the former stage. This process is sequentially applied to three different stages, fusing the output features of each stage to finally obtain the edge detection output.

4 Experiment

4.1 Experimental Details

We evaluated our model using three widely adopted public datasets: BSDS500 [41], NYUD [42], and BIPED [43].

The BSDS500 dataset [41] comprises 500 images, divided into 200 training images, 100 validation images, and 200 testing images. Each image is annotated by 4 to 9 annotators to ensure accuracy and diversity. To enhance the model’s generalization ability, we followed the data augmentation methods from [2, 37, 40], applying flipping, scaling, and rotation to the training images, expanding the training set by 96 times. Additionally, to further enrich the training data’s diversity and quantity, we integrated the PASCAL VOC dataset [44], which includes 10K labeled images, and augmented it through flipping to 20K images. Ultimately, we obtained a new dataset, BSDS-VOC, containing a total of 48,548 images, providing a richer and more comprehensive data foundation for training edge detection models.

The NYUD dataset [42] includes 1449 pairs of aligned RGB and depth images, all of which have been densely annotated. The dataset is split into 381 training images, 414 validation images, and 654 testing images. For data augmentation, we merged the training and validation sets and applied flipping (2x), scaling (3x), and rotation (4x) to obtain a more diverse training subset.

The BIPED dataset [43] consists of 250 outdoor images with a resolution of 1280×720, each with expert-provided edge annotations. Following the method [43, 37, 40], we used 200 images for training and the remaining 50 for testing.

To evaluate model performance, we employed the Optimal Dataset Scale (ODS), Optimal Image Scale (OIS), and Average Precision (AP) metrics to comprehensively assess the model’s accuracy and effectiveness. Additionally, to analyze the model’s computational efficiency and size, we used Floating Point Operations (FLOPs) and parameter count as evaluation metrics. The number of parameters directly reflects the model size, while FLOPs measure the computational workload during data processing. We also considered the model’s Frames Per Second (FPS) as an essential performance metric to evaluate overall efficiency.
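The difference between ODS and OIS can be made concrete with a simplified sketch. It assumes that per-image, per-threshold match counts (true positives, false positives, false negatives) are already available; the real benchmark obtains them by matching predicted edge pixels to ground truth within a distance tolerance, which is omitted here. The toy counts and the names `f_measure`, `ods`, and `ois` are hypothetical.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall computed from match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def ods(counts):
    """Best F-measure when ONE threshold is shared by the whole dataset."""
    best = 0.0
    for t in range(len(counts[0])):
        tp = sum(img[t][0] for img in counts)
        fp = sum(img[t][1] for img in counts)
        fn = sum(img[t][2] for img in counts)
        best = max(best, f_measure(tp, fp, fn))
    return best


def ois(counts):
    """F-measure when EACH image uses its own best threshold."""
    tp = fp = fn = 0
    for img in counts:
        best = max(img, key=lambda c: f_measure(*c))
        tp, fp, fn = tp + best[0], fp + best[1], fn + best[2]
    return f_measure(tp, fp, fn)


# toy_counts[image][threshold] = (tp, fp, fn); made-up numbers for illustration
toy_counts = [
    [(8, 2, 2), (5, 0, 5)],
    [(4, 6, 6), (6, 2, 4)],
]
ods_score = ods(toy_counts)
ois_score = ois(toy_counts)
print(round(ods_score, 4), round(ois_score, 4))
```

Because OIS may pick a different threshold for every image, it is never lower than ODS on the same counts, which matches the ordering of the two columns in the result tables.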

We validated the proposed algorithm using the PaddlePaddle deep learning framework [45] on a computer with 32GB RAM, an NVIDIA GeForce RTX 4090 D 24GB GPU, and an Intel 12th Gen Core i5-12600KF CPU. The parameter settings included the AdamW optimizer [46], 15 iterations, a learning rate (lr) of 0.001, a batch size of 1, and a cross-entropy loss function. For the BSDS500 [41] and BIPED [43] datasets, the maximum allowable error between predicted results and ground truth during Non-Maximum Suppression (NMS) was set to 0.0075. For the NYUD [42] dataset, this value was set to 0.011.

4.2 Ablation Study

We conducted a comprehensive ablation study and analysis of the proposed UHNet on the BSDS-VOC dataset. Notably, in all ablation studies, the number of channels in all three stages was set to 32.

Table 1: Performance analysis of different structures. Ns describes the number of standard convolution kernels, Nd describes the number of depthwise convolution kernels.

Architecture	Ns	Nd	Params	ODS	OIS	AP
RB1	1	0	11.3k	.770	.791	.823
RB2	2	0	20.5k	.780	.802	.834
LB	0	1	2.3k	.779	.801	.835
PDDP	0	2	2.6k	.784	.804	.840

As shown in Table 1 and Fig. 3, we examined the impact of the ResNet Bottleneck structure [31] (RB), the Lightweight Bottleneck (LB), and the PDDP block (PDDP) on model performance. The experiment varied three factors: 1) convolution type (standard vs. depthwise); 2) the number of convolution kernels; 3) convolution kernel size. We compared four metrics: parameter count (Params), ODS, OIS, and AP. Without additional pre-training, a single depthwise convolution (LB) clearly outperformed a single standard convolution (RB1) and was almost equivalent to two standard convolutions (RB2), while its parameter count was only 20.4% of RB1 and 11.2% of RB2. Adding another depthwise convolution layer to the LB to form the PDDP structure improved ODS, OIS, and AP (Table 1) with only a 0.3k increase in parameters. We further increased the depthwise convolution kernel size of the LB from 3×3 to 5×5 to verify its impact on model performance. As shown in Fig. 3, with a single 5×5 depthwise convolution the ODS was 0.782, lower than the PDDP’s 0.784, while the parameter count rose to 2.8k, 0.2k more than the PDDP’s 2.6k. This finding demonstrates the effectiveness and potential of the PDDP block in lightweight network design.
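The parameter ratios quoted above follow directly from the Params column of Table 1. The snippet below simply redoes that arithmetic; the dictionary name `params_k` is hypothetical and the values are copied from the table (in thousands).

```python
# Params column of Table 1, in thousands of parameters
params_k = {"RB1": 11.3, "RB2": 20.5, "LB": 2.3, "PDDP": 2.6}

# LB size as a percentage of each standard-convolution variant
lb_vs_rb1 = round(100 * params_k["LB"] / params_k["RB1"], 1)
lb_vs_rb2 = round(100 * params_k["LB"] / params_k["RB2"], 1)

# extra cost of the second depthwise layer in the PDDP block
pddp_extra_k = round(params_k["PDDP"] - params_k["LB"], 1)

print(lb_vs_rb1, lb_vs_rb2, pddp_extra_k)
```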

In ResNet [31], the network is divided into multiple stages, each containing a series of residual blocks. For the first residual block between adjacent stages, a 1×1 convolution layer is typically used to adjust the input channels to ensure dimension matching for shortcut connections. In lightweight network design, the goal is usually to reduce parameter count or improve computational efficiency, but using a 1×1 convolution layer to adjust input channels increases the parameter count. As shown in Fig. 4, using a 1×1 convolution layer (Shortcut 1×1) increased the parameters without surpassing the performance of our proposed PoolBlock. Therefore, in lightweight network design for edge detection, it is effective to omit the use of a 1×1 convolution layer to adjust the channels for the first residual block between adjacent stages.

Table 2: Comparison analysis of different feature fusion methods. ”$\times$” indicates no FBlock processing, ”$\checkmark$” indicates FBlock processing.

X1	X2	ODS	OIS	AP
$\times$	$\checkmark$	.784	.804	.840
$\checkmark$	$\checkmark$	.784	.805	.840

Feature Fusion aims to integrate different feature information into an effective feature representation to enhance the model’s performance and understanding. How to fuse the output features of different stages in the backbone network is a critical element. For the output features of adjacent stages, we tested two different feature fusion methods. As shown in Table 2, X1 is the output feature of the previous stage, and X2 is the output feature of the next stage. The output feature X2 needs to be processed by the FBlock before feature fusion with X1. The experimental results in Table 2 show that processing only the output feature X2 of the subsequent stage with FBlock achieves performance comparable to processing both X1 and X2 with FBlock. Therefore, in lightweight network design, not processing the output feature X1 of the previous stage can reduce the model parameters without significant performance loss.

4.3 Comparison with Other Models

The proposed method aims to achieve parameter efficiency and substantial detection performance. To evaluate its effectiveness, we compared it with two types of models: non-lightweight methods and lightweight methods. Non-lightweight methods include: HED [2], RCF [7], CED [8], DRNet [16], LRNet [17], BDCN [12], CATS [23], DexiNed [38], EDTER [35], DPED [34], CHRNet [27]. Lightweight methods include: PiDiNet [37], TIN2 [47], FINED [21], BDCN2 [12], BLEDNet [39], XYW-Net [40].

In this paper, we present three different versions of experimental results: UHNet, UHNet-M, and UHNet-L. Their parameter counts increase sequentially, determined by the number of channels in the backbone network’s different stages: UHNet has 32, 32, 32 channels in its three stages; UHNet-M has 32, 64, 128 channels; and UHNet-L has 64, 128, 256 channels.

Table 3: Comparison with other methods on BSDS500 dataset.

Method	ODS	OIS	AP
HED [2]	.788	.808	.840
RCF [7]	.806	.823	-
BDCN [12]	.828	.844	.890
DRNet [16]	.817	.832	.836
LRNet [17]	.820	.838	.874
EDTER [35]	.832	.847	.886
DPED [34]	.823	.840	.832
CHRNet [27]	.787	.788	-
TIN2 [47]	.772	.795	-
FINED [21]	.790	.808	-
PiDiNet-Baseline [37]	.798	.816	-
PiDiNet [37]	.807	.823	-
PiDiNet-Tiny [37]	.789	.806	-
PiDiNet-Tiny-L [37]	.787	.804	-
BDCN2 [12]	.766	.787	-
BLEDNet [39]	.805	.823	.851
XYW-Net [40]	.812	.827	.873
UHNet	.784	.804	.840
UHNet-M	.793	.814	.849
UHNet-L	.798	.818	.840

Table 4: Comparison with other methods on NYUD dataset.

Method	Input	ODS	OIS	AP
HED [2]	RGB	.717	.732	.704
	HHA	.681	.685	.674
	RGB-HHA	.741	.757	.749
BDCN [12]	RGB	.748	.763	.770
	HHA	.707	.719	.731
	RGB-HHA	.765	.781	.813
TIN2 [47]	RGB	.729	.745	-
	HHA	.705	.722	-
	RGB-HHA	.753	.773	-
PiDiNet-Tiny [37]	RGB	.721	.736	-
	HHA	.700	.714	-
	RGB-HHA	.745	.763	-
PiDiNet-Tiny-L [37]	RGB	.714	.729	-
	HHA	.693	.706	-
	RGB-HHA	.741	.759	-
CHRNet [27]	RGB	.730	.737	-
	HHA	.710	.719	-
	RGB-HHA	.757	.769	-
BLEDNet [39]	RGB	.730	.747	.716
	HHA	.710	.728	.698
	RGB-HHA	.757	.775	.772
XYW-Net [40]	RGB	.730	.747	-
	HHA	.701	.715	-
	RGB-HHA	.756	.775	-
UHNet	RGB	.694	.713	.683
	HHA	.681	.699	.671
	RGB-HHA	.727	.747	.740
UHNet-M	RGB	.704	.721	.693
	HHA	.690	.706	.674
	RGB-HHA	.734	.753	.745
UHNet-L	RGB	.707	.724	.698
	HHA	.692	.709	.683
	RGB-HHA	.735	.755	.748

Table 5: Comparison with other methods on BIPED dataset.

Method	ODS	OIS	AP
HED [2]	.829	.847	.869
RCF [7]	.849	.861	.906
CED [8]	.795	.815	.830
BDCN [12]	.890	.899	.934
CATS [23]	.887	.892	.817
DexiNed-f [38]	.895	.900	.927
DexiNed-a [38]	.893	.897	.940
XYW-Net [40]	.887	.896	.925
UHNet	.882	.894	.931
UHNet-M	.889	.900	.926
UHNet-L	.892	.903	.910

To verify the performance of the three versions of UHNet, we conducted quantitative and qualitative experiments on the BSDS500 [41], NYUD [42], and BIPED [43] datasets. The results are given in Tables 3, 4, and 5 alongside those of other network models. UHNet has the smallest parameter count among the compared deep learning-based methods (Table 6): with only 42.3k parameters, it is about 42% smaller than PiDiNet-Tiny-L [37] (73.0k) while matching its OIS on the BSDS500 [41] dataset. Additionally, on the BIPED [43] dataset, UHNet-M, with 232.9k parameters, outperforms XYW-Net [40], which has 0.79M parameters. Notably, the ultra-lightweight version of UHNet is highly competitive in the OIS and AP metrics compared to other methods, both lightweight and non-lightweight. Although UHNet’s accuracy on the NYUD [42] dataset is slightly below that of other models, especially the state-of-the-art ones, the UHNet models remain competitive on the BSDS500 [41] and BIPED [43] datasets given their parameter count and computational complexity, which is the focus of our research. Fig. 5 presents the qualitative detection results of the UHNet series models compared to other methods.

Table 6: Comparison of different methods in FLOPs and FPS performance metrics. Calculated based on 200×200 images.

Method	Params	FLOPs(G)	FPS
HED [2]	14.7M	24.3	54
CED [8]	21.8M	60.8	22
BDCN [12]	16.3M	37.0	23
DPED [34]	67.9M	63.4	5
FINED [21]	1.4M	22.8	26
TIN2 [47]	240k	10.0	47
PiDiNet [37]	720.0k	6.5	42
BLEDNet [39]	440.0k	3.4	50
XYW-Net [40]	790.0k	6.3	27
UHNet	42.3k	0.79	166
UHNet-M	232.9k	1.83	151
UHNet-L	873.4k	6.69	136

Table 6 compares the different methods in terms of model size (Params), FLOPs, and FPS. All values were calculated on 200×200 input images. The data show that the UHNet series models achieve high FPS and low FLOPs. In particular, UHNet requires only 0.79G FLOPs and runs at 166 FPS while maintaining competitive ODS, OIS, and AP, making it the most efficient of all compared methods.
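To put the efficiency gap in Table 6 into numbers, the short calculation below compares UHNet against the fastest and the cheapest of the other methods. All FLOPs and FPS values are copied from Table 6, and the 73.0k figure for PiDiNet-Tiny-L is the one quoted in the text; the variable names are hypothetical.

```python
# method: (FLOPs in G, FPS), copied from Table 6 (200x200 inputs)
table6 = {
    "HED": (24.3, 54), "CED": (60.8, 22), "BDCN": (37.0, 23),
    "DPED": (63.4, 5), "FINED": (22.8, 26), "TIN2": (10.0, 47),
    "PiDiNet": (6.5, 42), "BLEDNet": (3.4, 50), "XYW-Net": (6.3, 27),
}
uhnet_flops, uhnet_fps = 0.79, 166

# strongest competitors on each axis
fastest = max(table6, key=lambda m: table6[m][1])
lightest = min(table6, key=lambda m: table6[m][0])

fps_gain = uhnet_fps / table6[fastest][1]        # speed-up over the fastest baseline
flops_ratio = table6[lightest][0] / uhnet_flops  # FLOPs saving over the cheapest baseline

# parameter reduction relative to PiDiNet-Tiny-L (73.0k, quoted in the text)
param_cut_percent = round(100 * (73.0 - 42.3) / 73.0)

print(fastest, round(fps_gain, 2), lightest, round(flops_ratio, 2), param_cut_percent)
```

Even against the best competitor on each axis, UHNet is roughly three times faster and needs about a quarter of the FLOPs.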

5 Conclusion and Discussion

This paper proposes an ultra-lightweight edge detection model with minimal parameters and fast speed. The smallest version of UHNet demonstrates strong competitiveness on the BSDS500 [41] and BIPED [43] datasets with 42.3k parameters, 166 FPS, and 0.79G FLOPs. Our contributions to lightweight edge detection model design are threefold. First, based on the Bottleneck structure in the ResNet network [31], we propose an ultra-lightweight feature extraction module (PDDP block) with advantages such as fewer parameters, scalability, and transferability. Second, we optimized the residual connection between different stages in the backbone network, removing the 1×1 convolutions otherwise needed for channel transformation. Third, for the output features of different stages, we explored more lightweight feature fusion methods that achieve performance comparable to other feature fusion methods while further reducing parameter count. We conducted extensive edge detection experiments on the BSDS500 [41], NYUD [42], and BIPED [43] datasets. We believe that the UHNet series models are highly competitive in terms of accuracy and efficiency.

Additionally, this study highlights several promising directions for further exploration. First, integrating traditional edge detectors (e.g., PiDiNet [37]) or drawing on biological visual physiological mechanisms (e.g., BLEDNet [39], XYW-Net [40]) with convolutional neural networks can achieve robust and accurate edge detection. Second, the loss functions used for edge detection tasks are almost all borrowed from other computer vision tasks, with certain limitations. Future work can explore loss functions more suitable for edge detection tasks to extract effective target edge information from abundant texture information. Third, the combination of lightweight network design and training with limited data samples from scratch is a key research focus in edge detection. Exploring more efficient lightweight network models based on lightweight network design and training with limited data samples, and drawing on efficient methods from other computer vision tasks (e.g., lightweight ViT [32, 33], Mamba [48, 49, 50]), is also a worthwhile direction for research.

We hope that the proposal of UHNet provides more and newer insights into the design of ultra-lightweight edge detection network models. We also believe that efficient edge detection and extraction will play an even more significant role in engineering applications, particularly in medical image processing, as target edges serve as the foundation for other advanced visual tasks.

CRediT Authorship Contribution Statement

Fuzhang Li: Conceptualization, Methodology, Validation, Funding acquisition, Writing – original draft, Writing – review and editing. Chuan Lin: Conceptualization, Writing – review and editing, Funding acquisition.

Declaration of Interest Statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability

Data will be made available on request.

Acknowledgement

This work was supported by The Basic Ability Enhancement Program for Young and Middle-aged Teachers of Guangxi (2023KY0356), National Natural Science Foundation of China (Grant No.62266006), Guangxi Natural Science Foundation (Grant No.2020GXNSFDA297006) and National Natural Science Foundation of China (Grant No.61866002).

References

[1] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, et al., Recent advances in convolutional neural networks, Pattern recognition 77 (2018) 354–377.
[2] S. Xie, Z. Tu, Holistically-nested edge detection, in: Proceedings of the IEEE international conference on computer vision, 2015, pp. 1395–1403.
[3] I. Kokkinos, Pushing the boundaries of boundary detection using deep learning, arXiv preprint arXiv:1511.07386 (2015).
[4] K.-K. Maninis, J. Pont-Tuset, P. Arbeláez, L. Van Gool, Convolutional oriented boundaries, in: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, Springer, 2016, pp. 580–596.
[5] J. Yang, B. Price, S. Cohen, H. Lee, M.-H. Yang, Object contour detection with a fully convolutional encoder-decoder network, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 193–202.
[6] Y. Liao, S. Fu, X. Lu, C. Zhang, Z. Tang, Deep-learning-based object-level contour detection with ccg and crf optimization, in: 2017 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2017, pp. 859–864.
[7] Y. Liu, M.-M. Cheng, X. Hu, K. Wang, X. Bai, Richer convolutional features for edge detection, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 3000–3009.
[8] Y. Wang, X. Zhao, K. Huang, Deep crisp boundaries, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 3892–3900.
[9] D. Xu, W. Ouyang, X. Alameda-Pineda, E. Ricci, X. Wang, N. Sebe, Learning deep structured multi-scale features using attention-gated crfs for contour prediction, Advances in neural information processing systems 30 (2017).
[10] Z. Yu, C. Feng, M.-Y. Liu, S. Ramalingam, Casenet: Deep category-aware semantic edge detection, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 5964–5973.
[11] R. Deng, C. Shen, S. Liu, H. Wang, X. Liu, Learning to predict crisp boundaries, in: Proceedings of the European conference on computer vision (ECCV), 2018, pp. 562–578.
[12] J. He, S. Zhang, M. Yang, Y. Shan, T. Huang, Bi-directional cascade network for perceptual edge detection, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 3828–3837.
[13] A. P. Kelm, V. S. Rao, U. Zölzer, Object contour and edge detection with refinecontournet, in: Computer Analysis of Images and Patterns: 18th International Conference, CAIP 2019, Salerno, Italy, September 3–5, 2019, Proceedings, Part I 18, Springer, 2019, pp. 246–258.
[14] Y. Wang, X. Zhao, Y. Li, K. Huang, Deep crisp boundaries: From boundaries to higher-level tasks, IEEE Transactions on Image Processing 28 (3) (2018) 1285–1298.
[15] H. Yang, Y. Li, X. Yan, F. Cao, Contourgan: Image contour detection with generative adversarial network, Knowledge-Based Systems 164 (2019) 21–28.
[16] Y.-J. Cao, C. Lin, Y.-J. Li, Learning crisp boundaries using deep refinement network and adaptive weighting loss, IEEE Transactions on Multimedia 23 (2020) 761–771.
[17] C. Lin, L. Cui, F. Li, Y. Cao, Lateral refinement network for contour detection, Neurocomputing 409 (2020) 361–371.
[18] D. Linsley, J. Kim, A. Ashok, T. Serre, Recurrent neural circuits for contour detection, arXiv preprint arXiv:2010.15314 (2020).
[19] K. Li, Y. Tian, B. Wang, Z. Qi, Q. Wang, Bi-directional pyramid network for edge detection, Electronics 10 (3) (2021) 329.
[20] L. Gao, Z. Zhou, H. T. Shen, J. Song, Bottom-up and top-down: Bidirectional additive net for edge detection, in: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2021, pp. 594–600.
[21] J. K. Wibisono, H.-M. Hang, Fined: Fast inference network for edge detection, in: 2021 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2021, pp. 1–6.
[22] M. Pu, Y. Huang, Q. Guan, H. Ling, Rindnet: Edge detection for discontinuity in reflectance, illumination, normal and depth, in: Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 6879–6888.
[23] L. Huan, N. Xue, X. Zheng, W. He, J. Gong, G.-S. Xia, Unmixing convolutional features for crisp edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (10) (2021) 6602–6609.
[24] S.-S. Bao, Y.-R. Huang, G.-Y. Xu, Bidirectional multiscale refinement network for crisp edge detection, IEEE Access 10 (2022) 26282–26293.
[25] C. Lin, Z. Zhang, Y. Hu, Bio-inspired feature enhancement network for edge detection, Applied Intelligence 52 (10) (2022) 11027–11042.
[26] L. Zhou, C. Lin, X. Pang, H. Yang, Y. Pan, Y. Zhang, Learning parallel and hierarchical mechanisms for edge detection, Frontiers in Neuroscience 17 (2023) 1194713.
[27] O. Elharrouss, Y. Hmamouche, A. K. Idrissi, B. El Khamlichi, A. El Fallah-Seghrouchni, Refined edge detection with cascaded and high-resolution convolutional network, Pattern Recognition 138 (2023) 109361.
[28] Y. Ye, K. Xu, Y. Huang, R. Yi, Z. Cai, Diffusionedge: Diffusion probabilistic model for crisp edge detection, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, 2024, pp. 6675–6683.
[29] R. Xian, X. Xiong, H. Peng, J. Wang, A. R. de Arellano Marrero, Q. Yang, Feature fusion method based on spiking neural convolutional network for edge detection, Pattern Recognition 147 (2024) 110112.
[30] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556 (2014).
[31] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
[32] K. Han, Y. Wang, H. Chen, X. Chen, J. Guo, Z. Liu, Y. Tang, A. Xiao, C. Xu, Y. Xu, et al., A survey on vision transformer, IEEE transactions on pattern analysis and machine intelligence 45 (1) (2022) 87–110.
[33] S. Khan, M. Naseer, M. Hayat, S. W. Zamir, F. S. Khan, M. Shah, Transformers in vision: A survey, ACM computing surveys (CSUR) 54 (10s) (2022) 1–41.
[34] Y. Chen, C. Lin, Y. Qiao, Dped: Bio-inspired dual-pathway network for edge detection, Frontiers in Bioengineering and Biotechnology 10 (2022) 1008140.
[35] M. Pu, Y. Huang, Y. Liu, Q. Guan, H. Ling, Edter: Edge detection with transformer, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 1402–1412.
[36] Q. Tang, N. Sang, H. Liu, Learning nonclassical receptive field modulation for contour detection, IEEE Transactions on Image Processing 29 (2019) 1192–1203.
[37] Z. Su, W. Liu, Z. Yu, D. Hu, Q. Liao, Q. Tian, M. Pietikäinen, L. Liu, Pixel difference networks for efficient edge detection, in: Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 5117–5127.
[38] X. Soria, A. Sappa, P. Humanante, A. Akbarinia, Dense extreme inception network for edge detection, Pattern Recognition 139 (2023) 109461.
[39] Z. Luo, C. Lin, F. Li, Y. Pan, Blednet: Bio-inspired lightweight neural network for edge detection, Engineering Applications of Artificial Intelligence 124 (2023) 106530.
[40] X. Pang, C. Lin, F. Li, Y. Pan, Bio-inspired xyw parallel pathway edge detection network, Expert Systems with Applications 237 (2024) 121649.
[41] P. Arbelaez, M. Maire, C. Fowlkes, J. Malik, Contour detection and hierarchical image segmentation, IEEE transactions on pattern analysis and machine intelligence 33 (5) (2010) 898–916.
[42] N. Silberman, D. Hoiem, P. Kohli, R. Fergus, Indoor segmentation and support inference from rgbd images, in: Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part V 12, Springer, 2012, pp. 746–760.
[43] X. S. Poma, E. Riba, A. Sappa, Dense extreme inception network: Towards a robust cnn model for edge detection, in: Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2020, pp. 1923–1932.
[44] R. Mottaghi, X. Chen, X. Liu, N.-G. Cho, S.-W. Lee, S. Fidler, R. Urtasun, A. Yuille, The role of context for object detection and semantic segmentation in the wild, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 891–898.
[45] Y. Ma, D. Yu, T. Wu, H. Wang, Paddlepaddle: An open-source deep learning platform from industrial practice, Frontiers of Data and Computing 1 (1) (2019) 105–115.
[46] I. Loshchilov, F. Hutter, Decoupled weight decay regularization, arXiv preprint arXiv:1711.05101 (2017).
[47] J. K. Wibisono, H.-M. Hang, Traditional method inspired deep neural network for edge detection, in: 2020 IEEE international conference on image processing (ICIP), IEEE, 2020, pp. 678–682.
[48] A. Gu, T. Dao, Mamba: Linear-time sequence modeling with selective state spaces, arXiv preprint arXiv:2312.00752 (2023).
[49] L. Zhu, B. Liao, Q. Zhang, X. Wang, W. Liu, X. Wang, Vision mamba: Efficient visual representation learning with bidirectional state space model, arXiv preprint arXiv:2401.09417 (2024).
[50] W. Yu, X. Wang, Mambaout: Do we really need mamba for vision?, arXiv preprint arXiv:2405.07992 (2024).