FedDBL: Communication and Data Efficient Federated Deep-Broad Learning for Histopathological Tissue Classification

License: CC BY 4.0
arXiv:2302.12662v2 [eess.IV] 18 Dec 2023

Tianpeng Deng†, Yanqi Huang†, Zhenwei Shi†, Jiatai Lin, Qi Dou, Zaiyi Liu, Xiao-jing Guo, C. L. Philip Chen, Chu Han

This work was supported by Key-Area Research and Development Program of Guangdong Province (No. 2021B0101420006), Regional Innovation and Development Joint Fund of National Natural Science Foundation of China (No. U22A20345), National Science Foundation for Young Scientists of China (No. 62102103), Natural Science Foundation for Distinguished Young Scholars of Guangdong Province (No. 2023B1515020043). (Equal contribution: Tianpeng Deng, Yanqi Huang and Guoqiang Han.) (Corresponding author: Xiao-jing Guo, C. L. Philip Chen and Chu Han.)

Tianpeng Deng, Jiatai Lin, Guoqiang Han and C. L. P. Chen are with the School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, 510006, China.

Yanqi Huang, Zhenwei Shi, Zaiyi Liu and Chu Han are with the Department of Radiology, Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou 510080, China and Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, Guangzhou 510080, China.

Qi Dou is with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, 999077, China.

Xiao-jing Guo is with the Department of Breast Pathology and Lab, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center of Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, Tianjin’s Clinical Research Center for Cancer, Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin 300060, China.

Abstract

Histopathological tissue classification is a fundamental task in computational pathology. Deep learning-based models have achieved superior performance, but centralized training requires pooling data in one place and therefore suffers from the privacy leakage problem. Federated learning (FL) can safeguard privacy by keeping training samples local, but existing FL-based frameworks require a large number of well-annotated training samples and numerous rounds of communication, which hinder their practicability in real-world clinical scenarios. In this paper, we propose a universal and lightweight federated learning framework, named Federated Deep-Broad Learning (FedDBL), to achieve superior classification performance with limited training samples and only one round of communication. By simply combining a pre-trained deep learning feature extractor, a fast and lightweight broad learning inference system and a classical federated aggregation approach, FedDBL can dramatically reduce data dependency and improve communication efficiency. Five-fold cross-validation demonstrates that FedDBL greatly outperforms its competitors with only one round of communication and limited training samples, and it even achieves performance comparable to competitors trained with multiple communication rounds. Furthermore, due to the lightweight design and one-round communication, FedDBL reduces the communication burden from 4.6GB to only 276.5KB per client using the ResNet-50 backbone at 50-round training. Since neither data nor deep models are shared across clients, the privacy issue is well solved and model security is guaranteed, with no risk of model inversion attacks. Code is available at https://github.com/tianpeng-deng/FedDBL.

Index Terms:
Federated Learning, Data and Communication Efficiency, Deep-Broad Learning, Histopathological Tissue Classification

I Introduction

Tissue classification [gurcan2009histopathological, FUCHS2011515], also known as tissue phenotyping, aims to use computer algorithms to automatically recognize different tissue types in Whole Slide Images (WSIs). It is one of the fundamental tasks in computational pathology [srinidhi2021deep, wang2019weakly], as it parses the landscape of the tumor microenvironment to support precise cancer diagnosis [bulten2022artificial], prognosis [fu2020pan, pages2018international] and treatment response prediction [vanguri2022multimodal]. With the advancement of deep learning algorithms and the growing amount of open data [kather2019predicting, zhao2020artificial, yang2022two], this problem has been well studied, with outstanding classification performance [hatami2021deep]. In clinical practice, however, it still faces ethical, regulatory and legal obstacles, because centralized data collection may lead to privacy leakage, especially of the raw data.

The federated learning (FL) framework [yang2019federated, le2021federated] provides a promising solution for protecting user privacy by sharing only intermediate results or model parameters instead of the raw data, and it has been widely studied in medical image analysis [pati2022federated, sheller2020federated]. However, only a few attempts [saldanha2022swarm, shen2022tmi, ke2021isbi] have been made in computational pathology, and research progress still lags behind other medical image modalities [rauniyar2022federated] due to the following two obstacles.

The first one is the data dependency problem. Most existing FL frameworks are built on deep learning models, which are data-hungry and commonly require a large number of well-annotated samples. However, labeling histopathological images is time-consuming, expertise-dependent and expensive [greenwald2022whole, pati2021reducing]. Without enough training samples, existing models may not achieve favorable performance. The second obstacle is the communication overhead. The training procedure of traditional FL models needs multiple cloud-client iterations to achieve global convergence. However, deep learning models have tens of millions of parameters, which greatly increases the communication burden over multiple communication rounds. Although some recent works use self-supervised learning (SSL) methods with unlabelled data to reduce the demand for labelled data [yan2023label, kassem2022federated], they still need multiple communication rounds to train a stable domain-specific pre-trained model. A lack of training samples may further amplify the communication burden, because deep learning models commonly require more iterations to converge when training samples are limited. Moreover, frequent communication may increase the chance of being attacked, for example by man-in-the-middle attacks [wang2020man].

Therefore, there is an urgent need for a data-efficient and communication-efficient FL model for histopathological tissue classification. In this paper, we propose a simple and effective solution that considers not only the data sharing problem, but also data dependency, communication efficiency, model robustness and model inversion attacks. Our proposed model, Federated Deep-Broad Learning (FedDBL for short), contains three integrated components: a common federated learning framework, a pre-trained deep learning (DL) backbone and a broad learning (BL) inference system [BLS, gong2022research]. The federated learning framework provides decentralized training to avoid data sharing across different medical centers or institutions. The pre-trained DL backbone can provide stable and robust deep features when there are not enough training labels, even if the backbone was pre-trained on an irrelevant domain. It can also effectively avoid model inversion attacks, since no gradients are computed by back-propagation [zhu2019deep]. The BL system is a lightweight classifier with good approximation capability, which can greatly shorten the transmission time and overcome the data dependency problem. Fig. 1 summarizes the strengths of FedDBL compared with centralized learning and conventional federated learning.

Figure 1: Overall comparison among centralized training, traditional DL-based FL and our proposed FedDBL paradigms. (a) Centralized learning gathers data from all the clients, which cannot protect the patient’s privacy. (b) Traditional DL-based FL preserves privacy by transmitting the model parameters to the central server without sharing the raw data. However, the communication overhead highly depends on the model size and the number of communication rounds. (c) Our proposed FedDBL not only protects privacy, but also dramatically reduces the communication burden through a super lightweight trainable broad learning system.

Extensive experiments with five-fold cross-validation are conducted to demonstrate the superiority of FedDBL in several aspects, including data dependency, communication efficiency, flexibility and the practicability of model encryption. With enough training data, FedDBL mostly outperforms conventional FL strategies and achieves comparable or even better classification performance than the centralized learning strategy. When the training samples are reduced in the data dependency experiment, FedDBL still maintains a high level of performance and greatly outperforms both centralized learning and conventional FL frameworks, even with only 1% of the training samples. FedDBL is also flexible enough to work with any deep learning architecture to support data- and communication-efficient histopathological tissue classification. Another highlight of FedDBL is communication efficiency. With a ResNet-50 backbone, FedDBL’s one-round training reduces the upload workload per client from 4.609GB, for conventional 50-round iterative training, to 276.5KB, a reduction of roughly 17,000 times. Thanks to the tiny model size, FedDBL is also computationally efficient in model encryption, which can further upgrade the privacy protection level. The main contributions of this paper can be summarized as follows:

  • We propose a novel federated learning approach (FedDBL) for histopathological tissue classification to preserve patients’ privacy.

  • To the best of our knowledge, FedDBL is the first study that considers communication efficiency and data efficiency simultaneously. It reduces the communication overhead of each client by around 17,000 times on ResNet-50 while working with extremely limited training samples (only 1%).

  • FedDBL is a simple, effective and easy-to-use algorithm that combines three classical modules: a robust pre-trained deep learning feature extractor, a fast broad learning inference system and a simple federated learning framework. It is highly extensible, allowing any module to be replaced with a more advanced one.

  • Extensive experiments demonstrate that FedDBL drastically relieves the dependence on training data and reduces the communication overhead while maintaining outstanding classification performance, which promotes its clinical practicability.
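
As a quick sanity check on the 17,000-fold figure quoted above, the following sketch recomputes the ratio from the two upload volumes reported in this section (4.609GB and 276.5KB). The unit convention is an assumption, since the text does not say whether binary or decimal units are meant, so both are shown.

```python
# Upload volumes per client as reported in the text above.
GIB = 1024 ** 3
KIB = 1024

conventional_bytes = 4.609 * GIB   # 50 rounds with a ResNet-50 backbone
feddbl_bytes = 276.5 * KIB         # one round, broad learning weights only

# Assumption: binary units (GiB/KiB). Decimal units are computed as well,
# because the text does not state which convention it uses.
ratio_binary = conventional_bytes / feddbl_bytes
ratio_decimal = (4.609 * 1e9) / (276.5 * 1e3)

# Average upload per round in the conventional 50-round setting.
per_round_mib = conventional_bytes / 50 / (1024 ** 2)

print(f"reduction (binary units):  {ratio_binary:,.0f}x")
print(f"reduction (decimal units): {ratio_decimal:,.0f}x")
print(f"conventional upload per round: {per_round_mib:.1f} MiB")
```

Either convention lands close to the "around 17,000 times" stated in the contributions.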

II Related Works

II-A Histopathological Tissue Classification

High-resolution WSIs offer a wide range of tissue phenotypes where the pixel-level annotation is time-consuming and requires a great deal of biomedical knowledge ([srinidhi2021deep]), making patch-level histopathological tissue classification an alternate solution for automated analysis in computer-aided tumor diagnosis ([kather2019predicting, xue2021selective, abdeltawab2021pyramidal]).

Due to the rapid development of computer vision, the most popular natural image classification models can be transferred to histopathological tissue phenotyping. However, this still suffers from the data dependency problem, with a huge annotation burden ([ayyad2021role]). Thus, various approaches have been proposed to reduce the annotation effort. [han2022multi] proposed a multi-layer pseudo-supervision approach with a progressive dropout attention mechanism to convert patch-level labels into pseudo-pixel-level labels. An extra classification gate mechanism was presented, which reduced the false-positive rate for non-predominant category classification and improved the segmentation performance in return. [xue2021selective] utilized a generative adversarial network (GAN) to generate pseudo samples to expand the training data. [dolezal2022uncertainty] cropped WSIs into tiles for training an uncertainty quantification model and addressed the problem of domain shift in external validation data. To cope with the lack of image annotations, [wang2022transformer] employed unsupervised contrastive learning to obtain a robust initialized model with a moderate feature representation of the histopathological feature space, with no annotation burden. Our previous study ([lin2022pdbl]) introduced pyramidal deep-broad learning (PDBL) as a pluggable module for any CNN backbone to further improve histopathological tissue classification performance.

Besides that, another unexplored challenge is the patient privacy issue. Only a few attempts ([saldanha2022swarm, saldanha2022direct]) have been made in federated learning for computational pathology, which will be discussed in the following subsection. To the best of our knowledge, this is the first study to consider privacy protection in histopathological tissue classification.

II-B Federated Learning

II-B1 Federated Learning in Medical Image Analysis

Because of ethical concerns, federated learning (FL) has been widely adopted in medical applications to preserve patients’ privacy ([pati2022federated, warnat2021swarm, sheller2020federated]). In medical imaging, FL has seen a surge of interest ([kaissis2020secure]), for example in MRI reconstruction ([guo2021multi, li2020multi]) and CT lesion segmentation ([yang2021federated]). During the COVID-19 pandemic, COVID-19-related applications using data from different medical centers, or even from different countries, became the most urgent demand in real-world clinical scenarios, and FL greatly advanced diagnostic performance ([bai2021advancing]). [dayan2021federated] used data from 20 institutes across the globe to predict the future oxygen requirements of symptomatic patients suffering from COVID-19. [dou2021federated] proposed a federated model to detect COVID-19 lung abnormalities with good generalization capability on unseen multinational datasets.

II-B2 Federated Learning in Computational Histopathology

In histopathological images, a swarm learning architecture with blockchain protocols has been proposed to predict the mutational status ([saldanha2022swarm]). However, compared with other medical imaging modalities, there are few studies ([saldanha2022direct]) that adopt federated learning in histopathological images, for the following reasons. First, digital pathology is not yet widely adopted; pathological diagnosis still relies on observing specimens under a microscope. Second, image annotation is also an obstacle for histopathological image processing, since only pathologists are capable of labeling WSIs, which greatly increases the difficulty of acquiring well-annotated data. Third, due to the gigapixel resolution of WSIs, the size of the deep learning model is generally large, which increases the communication burden in networking.

There are technical solutions in FL to the high communication overhead problem, such as compressing the model size ([reisizadeh2020fedpaq, jhunjhunwala2021adaptive]). [reisizadeh2020fedpaq] proposed FedPAQ to reduce the interactive overhead of FL by compressing the model with lower bit-precision and [jhunjhunwala2021adaptive] proposed an adaptive quantization strategy to achieve communication efficiency.

However, the underlying assumption of existing studies is that there are enough samples for model training, so they may not be able to address both communication efficiency and the limited data issue ([kamp2021federated, zhang2023two]). In this study, we fully consider the specific characteristics of histopathological images, the difficulty of data labeling and the communication efficiency in real-world clinical scenarios, which has never been discussed in decentralized computational pathology.

III Methodology

In this section, we introduce our framework, Federated Deep-Broad Learning (FedDBL). This framework is designed for privacy-preserving tissue classification with limited training samples and extremely low communication overhead. In the following subsections, we first describe the intuition and problem setting in Section III-A. The overall framework and the methodology of FedDBL are presented in Section III-B. Finally, we give the implementation details in Section III-C.

III-A Problem Setting

As a classical upstream task in computational pathology, existing tissue classification approaches have achieved outstanding performance under an ideal condition with enough training samples by centralized learning. However, they might face the following obstacles in the real-world clinical scenario.

Annotation burden: Collecting enough well-labeled training samples is expensive and time-consuming because it requires labelers with medical background.

Privacy preservation: The raw data should not be shared across different medical institutions (or clients) to preserve the patient’s privacy. Transmitting raw data may violate the principles of medical ethics.

Communication cost: Communication overhead has always been a challenge in federated learning, and it is affected by many compounding factors, such as the model size, the number of communication rounds, the model convergence speed and the network bandwidth.

To resolve the aforementioned challenges, we propose a simple and effective FL-based framework, shown in Fig. 2. First, we abandon the conventional end-to-end training manner, since limited training samples may harm the robustness of the deep learning model and slow down convergence. Instead, we separate feature extraction and inference for local training in each client. A pre-trained deep feature extractor (CNN backbone) is introduced so that the feature extractor is not affected by the training sample bias of different clients, which guarantees the robustness of the extracted features. Then an independent broad learning inference system [BLS, lin2022pdbl] serves for fast inference. Finally, we apply classical weighted averaging, as in FedAvg [mcmahan2017communication], to fuse the broad learning inference systems from all the clients.

Figure 2: The overall architecture of FedDBL with three modules, deep feature extraction module, broad inference module and federated decentralized module. (a) Deep feature extraction module serves for extracting multi-scale deep-broad features from low level to high level by a pre-trained DL backbone. Features of all the patches are stored in a local feature bank. (b) Broad inference module is introduced for fast inference by a broad learning system. (c) Federated decentralized module applies a classical federated average approach to aggregate the broad learning weights from different clients.

III-B FedDBL Architecture and Formulation

As shown in Fig. 2, FedDBL consists of three modules: the deep feature extraction module (DL-module), the broad inference module (BL-module) and the federated decentralized module (Fed-module). The DL-module together with the BL-module serves for local training on the client side. The Fed-module is executed on the server side. Algorithm 1 provides the details of the entire FedDBL pipeline.

Let $\mathcal{D}_{1},\mathcal{D}_{2},\cdots,\mathcal{D}_{k},\cdots,\mathcal{D}_{K}$ denote the local training sets from $K$ clients with the dataset size of $n_{k}$ for each client $\mathcal{D}_{k}$. The total number of training samples is denoted as $N=\sum_{k=1}^{K}{n_{k}}$. For each sample $X$ with ground truth $Y$ in $\mathcal{D}$, DL-module with pre-trained parameters $\Theta$ extracts the features and stores them in the local feature bank $\mathbf{B}$. Then BL-module calculates the weights $W_{client}$ of broad learning system. By the federated aggregation approach, we can obtain the global weight $W_{global}$. The workflows of the server and the clients are demonstrated in Algorithm 1 and Algorithm 2, respectively.

Algorithm 1 FedDBL framework (Server Execution)
Input: A set of $K$ clients
Output: A global model $W_{global}$
Prepare pre-trained DL backbone parameters $\Theta$
Initialize BL system setting
for each client $k$ in parallel do
      $W_{client}^{k} \leftarrow$ ClientExecution$\left(\Theta, \mathcal{D}_{k}\right)$
end for
$W_{global} \leftarrow$ Fed-module$(W_{client}^{1}, \cdots, W_{client}^{K})$
return $W_{global}$

Algorithm 2 FedDBL framework (Client Execution)
Input: Pre-trained DL backbone $\Theta$, training set $\mathcal{D}$ with $n$ training samples
Output: Deep-broad learning model $W_{client}$
/* DL-module */
for training sample $X$ in $\mathcal{D}$ do
      for $s$-th stage in $\Theta$ do
            $f_{X}^{s} \leftarrow \Theta^{s}\left(X\right)$   // Feature extraction
            $\mathbf{e}_{X}^{s} = \frac{1}{H_{X}^{s} \times W_{X}^{s}} \sum_{i=1}^{H_{X}^{s}} \sum_{j=1}^{W_{X}^{s}} f_{X}^{s}(i,j)$   // Adaptive global average pooling
      end for
      $\mathbf{b}_{X} = \mathbf{e}_{X}^{1} \parallel \mathbf{e}_{X}^{2} \parallel \cdots \parallel \mathbf{e}_{X}^{S}$   // Concatenation
end for
Obtain $\left\{\mathbf{b}_{i} \mid i = 1, 2, \cdots, n\right\}$
$\mathbf{B} \leftarrow \sigma\left(\left[\mathbf{b}_{1}^{\mathbf{T}}, \mathbf{b}_{2}^{\mathbf{T}}, \dots, \mathbf{b}_{n}^{\mathbf{T}}\right]\right)^{\mathbf{T}}$   // Normalization transformation
/* BL-module */
Initialize BL system setting defined by central server
$\mathbf{B}^{+} \leftarrow \lim_{\lambda \to 0}\left(\mathbf{B}\mathbf{B}^{\mathbf{T}} + \lambda E\right)^{-1}\mathbf{B}^{\mathbf{T}}$   // Solve pseudo-inverse
$W_{client} \leftarrow \mathbf{B}^{+} Y$   // Calculate BL model weight
return $W_{client}$

III-B1 Deep Feature Extraction Module

A large number of samples and repeated backpropagation are required in standard DL training to achieve good feature representation ability. When data are insufficient, the training procedure might be unstable, which leads to poor feature representation and model overfitting. Our previous study [lin2022pdbl] reveals that directly adopting a stable pre-trained model for feature extraction is more favorable to model performance than training the model with limited samples, even if the pre-trained model was trained on an irrelevant image domain (ImageNet, https://image-net.org/). Inspired by this idea, we use a pre-trained CNN model with no further training to extract the deep features. Note that the selection of the pre-trained model is flexible, and it can come from any image domain. We have conducted an experiment to justify this flexibility in Section IV. Another advantage of using pre-trained models is to avoid model inversion attacks, since the training samples are all unseen by the backbone. To enrich the feature representation, we extract multi-stage features from low level to high level, as detailed below.

As illustrated in the DL-module of Algorithm 2, each client $k$ $(k\in\left[1,\cdots,K\right])$ downloads the pre-trained DL backbone as the feature extractor $\Theta$ and extracts the multi-stage deep features $\mathbf{b}_{X}$ of each training sample $X$ locally (we omit $k$ for simplicity), where $\mathbf{b}_{X}$ consists of the features of multiple stages, $\Theta^{s}(X)$ $(s\in\left[1,\cdots,S\right])$. The features of the entire dataset $\mathcal{D}_{k}$ are stored in the local feature bank $\mathbf{B}$. The local feature bank $\mathbf{B}$ is then passed to the broad inference module. Since neither the training data nor the feature bank is shared across different clients, there is no privacy leakage risk for the raw data in the deep feature extraction module.
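
The pooling and concatenation steps of the DL-module can be sketched in NumPy as follows. This is only an illustration: random arrays stand in for the stage outputs of a frozen backbone, the stage shapes are hypothetical, and the normalization $\sigma$ is assumed to be per-feature standardization because the text only calls it a normalization transformation. The function names are not taken from the FedDBL code base.

```python
import numpy as np

def pool_and_concat(stage_maps):
    """Global-average-pool each stage's (C, H, W) map and concatenate."""
    pooled = [fmap.mean(axis=(1, 2)) for fmap in stage_maps]
    return np.concatenate(pooled)

def build_feature_bank(all_stage_maps, eps=1e-8):
    """Stack per-sample vectors into an (n, d) bank and normalize it.

    Assumption: sigma is taken to be per-feature standardization; the
    text only describes it as a 'normalization transformation'.
    """
    bank = np.stack([pool_and_concat(maps) for maps in all_stage_maps])
    mean = bank.mean(axis=0, keepdims=True)
    std = bank.std(axis=0, keepdims=True)
    return (bank - mean) / (std + eps)

# Stand-in for a frozen backbone: random maps with hypothetical shapes.
rng = np.random.default_rng(0)
stage_shapes = [(8, 16, 16), (16, 8, 8), (32, 4, 4)]  # (C, H, W) per stage
samples = [[rng.normal(size=s) for s in stage_shapes] for _ in range(5)]

B = build_feature_bank(samples)
print(B.shape)  # (5, 56): 5 samples, 8 + 16 + 32 pooled channels
```

Because every stage is reduced to one value per channel, the length of each feature vector depends only on the channel counts of the chosen stages, not on the input resolution.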

III-B2 Broad Inference Module

With the local feature bank $\mathbf{B}$, each client $k$ can construct a local BL system [BLS] through the BL-module (Algorithm 2) for fast inference. By solving the optimization problem in Eq. (1), an optimal BL model $W_{client}$ can be obtained rapidly through the pseudo-inverse method (Eq. (2)).

$$W_{client} = \underset{W_{init}}{\arg\min} \left\| \mathbf{B} W_{init} - Y \right\|_{2}^{2} + \gamma \left\| W_{init} \right\|_{2}^{2} \tag{1}$$

$$W_{client} = \mathbf{B}^{+} Y = \lim_{\lambda \to 0} \left( \mathbf{B}\mathbf{B}^{\mathbf{T}} + \lambda E \right)^{-1} \mathbf{B}^{\mathbf{T}} Y \tag{2}$$

where $Y$ is the ground-truth label matrix, $\mathbf{B}$ is the local feature bank in matrix form, $W_{init}$ is the initialized broad learning weight matrix, $E$ is the identity matrix, $\lambda$ is a constant parameter and $\gamma$ is the regularization parameter. Solving the BL model with the pseudo-inverse method considerably reduces the computational burden while achieving high communication efficiency. For inference, the deep features of the test samples are extracted first, the outputs are computed as $Y_{test}=\mathbf{B}_{test}W_{client}$, and each sample is assigned the class with the largest probabilistic value.

Thanks to the lightweight broad learning model $W_{client}$, the communication efficiency is drastically improved compared with conventional DL-based FL frameworks.
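The closed-form solve in Eqs. (1) and (2) and the inference step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes $\mathbf{B}$ stores one sample per row (so the regularized Gram matrix is written as $\mathbf{B}^{\mathbf{T}}\mathbf{B}+\lambda E$), it uses one-hot labels, and the function names, toy data and the value of `lam` are hypothetical.

```python
import numpy as np

def fit_bl_weights(B, Y, lam=1e-3):
    """Ridge solution W = (B^T B + lam * I)^{-1} B^T Y.

    Assumption: B has shape (n_samples, n_features), Y is one-hot
    with shape (n_samples, n_classes).
    """
    d = B.shape[1]
    gram = B.T @ B + lam * np.eye(d)
    # Solve the linear system instead of forming an explicit inverse.
    return np.linalg.solve(gram, B.T @ Y)

def predict(B_test, W):
    """Score test features and pick the class with the largest value."""
    scores = B_test @ W
    return scores.argmax(axis=1)

# Toy feature bank (hypothetical data): class-dependent means plus noise.
rng = np.random.default_rng(0)
n, d, c = 200, 16, 3
labels = rng.integers(0, c, size=n)
centers = 3.0 * rng.normal(size=(c, d))
B = centers[labels] + rng.normal(size=(n, d))
Y = np.eye(c)[labels]

W = fit_bl_weights(B, Y)
pred = predict(B, W)
train_acc = float((pred == labels).mean())
```

As `lam` approaches zero and $\mathbf{B}$ has full column rank, the result approaches `np.linalg.pinv(B) @ Y`, which is the limit expressed in Eq. (2). Only the small matrix `W` (features by classes) would need to leave the client.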

III-B3 Federated Decentralized Module

In this module, we build a federated learning framework for decentralized learning. Given the broad learning model $W_{client}^{k}$ of each client $k$, we first upload the models from all the clients to the central server. General federated aggregation methods can then be applied to aggregate them. Here, we use the most common approach, weighted averaging, as adopted in FedAvg [mcmahan2017communication], FedProx [li2020federated] and FedPAQ [reisizadeh2020fedpaq].

$$W_{global}=\sum_{k=1}^{K}\frac{n_{k}}{N}W_{client}^{k} \tag{3}$$

where $W_{global}$ is the global model on the server, $n_{k}$ is the number of training samples in client $k$ and $N$ is the total number of training samples. A client with a larger training dataset therefore contributes more to the global model. Since we share the broad learning model only once, the communication efficiency and the patient's privacy are guaranteed.
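The weighted averaging of Eq. (3) can be illustrated with a short NumPy sketch. The function name `aggregate` and the toy client weights and sample counts are hypothetical and only serve to show the arithmetic.

```python
import numpy as np

def aggregate(client_weights, client_sizes):
    """Weighted average of client models, W_global = sum_k (n_k / N) * W_k."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()           # n_k / N for each client
    stacked = np.stack(client_weights)     # shape (K, n_features, n_classes)
    # Contract the client axis against the coefficients.
    return np.tensordot(coeffs, stacked, axes=1)

# Two toy clients: the second holds three times as many samples.
W1 = np.ones((4, 2))
W2 = 3.0 * np.ones((4, 2))
W_global = aggregate([W1, W2], [100, 300])
```

Here every entry of `W_global` equals 0.25 * 1 + 0.75 * 3 = 2.5, so the larger client pulls the average toward its own weights.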

III-C Implementation Details

All of our experiments are implemented in PyTorch on a workstation with an NVIDIA RTX 3090 and the i9-11900K CPU with 16 cores. We use the cross-entropy loss for the baseline centralized training with a batch size of 20. The SGD optimizer is configured with a learning rate of $10^{-3}$, a weight decay of $10^{-4}$ and a momentum of 0.9. The patches are $224\times 224$ pixels, extracted from WSIs at $20\times$ magnification. Different client numbers are used depending on the datasets.

We adopt three well-known federated aggregation methods, FedAvg [mcmahan2017communication], FedProx [li2020federated] and FedPAQ [reisizadeh2020fedpaq], for comparison, and train the centralized model as the baseline. FedProx has a parameter $\mu$ that adjusts the effect of the proximal term on the loss function. Here we set $\mu$ to 1, which gives better performance.
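For readers unfamiliar with the role of $\mu$, the FedProx local objective adds the proximal term $\frac{\mu}{2}\|w-w_{global}\|_{2}^{2}$ to the client's own loss, which penalizes drift away from the current global model. The sketch below only shows this arithmetic; the function name and the numbers are hypothetical and it is not the training code used in the experiments.

```python
import numpy as np

def fedprox_objective(local_loss, w, w_global, mu=1.0):
    """Local loss plus the FedProx proximal term (mu / 2) * ||w - w_global||^2."""
    prox = 0.5 * mu * np.sum((w - w_global) ** 2)
    return local_loss + prox

# Toy values: the local weights have squared distance 5 from the global model.
w = np.array([1.0, 2.0])
w_global = np.zeros(2)
value = fedprox_objective(local_loss=0.5, w=w, w_global=w_global, mu=1.0)
```

With $\mu = 1$ the example evaluates to 0.5 + 0.5 * 5 = 3.0, and setting $\mu = 0$ recovers the plain local loss.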

IV Experiments

In this section, we present the details of the datasets and conduct various experiments to demonstrate the performance and efficiency of the proposed FedDBL. Section IV-A shows two open datasets and the experimental settings in the federated learning framework. In Section LABEL:sub:one-round, we compare FedDBL with centralized learning baselines, conventional federated learning baselines and one-round federated learning baselines. The effectiveness is comprehensively discussed in Section LABEL:sub:multi-round. We use Matthews Correlation Coefficient (MCC), Accuracy and F1-score as the evaluation metrics in all the experiments.
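The three evaluation metrics can all be derived from a confusion matrix, as the following NumPy sketch shows. It is an illustration under stated assumptions: the text does not say how the F1-score is averaged across classes, so macro averaging is assumed here, the MCC uses the standard multi-class generalization, and the function names and toy labels are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def metrics(cm):
    """Return (accuracy, macro F1, multi-class MCC) from a confusion matrix."""
    cm = cm.astype(float)
    total = cm.sum()
    correct = np.trace(cm)
    true_counts = cm.sum(axis=1)   # samples per true class
    pred_counts = cm.sum(axis=0)   # samples per predicted class

    accuracy = correct / total

    # Per-class F1 = 2 * TP / (true count + predicted count); macro average is assumed.
    denom = true_counts + pred_counts
    f1 = np.divide(2.0 * np.diag(cm), denom, out=np.zeros_like(denom), where=denom > 0)
    macro_f1 = f1.mean()

    # Multi-class Matthews Correlation Coefficient.
    num = correct * total - true_counts @ pred_counts
    den = np.sqrt((total**2 - pred_counts @ pred_counts) * (total**2 - true_counts @ true_counts))
    mcc = num / den if den > 0 else 0.0
    return accuracy, macro_f1, mcc

# Toy labels: one class-1 sample is misclassified as class 2.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
acc, macro_f1, mcc = metrics(confusion_matrix(y_true, y_pred, 3))
```

MCC uses the full confusion matrix rather than per-class hits alone, so it is generally less inflated than accuracy when the tissue classes are imbalanced.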

IV-A Datasets and Experimental Settings

TABLE I: Statistics of MC-CRC. #1 denotes TCGA, #2 denotes Kather, #3 denotes Guangdong Provincial People’s Hospital and #4 denotes Yunnan Cancer Hospital.

ADI | BACK | DEB | LYM | MUC | MUS | NORM | STR | TUM