
Semi-supervised Multi-view Clustering based on NMF with Fusion Regularization

Published: 27 April 2024
  Abstract

    Multi-view clustering has attracted significant attention and wide application. Nonnegative matrix factorization (NMF) is a popular feature learning technique in pattern recognition. In recent years, many semi-supervised NMF algorithms have been proposed by considering label information, and they have achieved outstanding performance for multi-view clustering. However, most of these existing methods either fail to use discriminative information effectively or involve too many hyper-parameters. Addressing these issues, a semi-supervised multi-view nonnegative matrix factorization with a novel fusion regularization (FRSMNMF) is developed in this article. In this work, we uniformly constrain the alignment of multiple views and the discriminative information among clusters with the designed fusion regularization. Meanwhile, to align the multiple views effectively, two kinds of compensating matrices are used to normalize the feature scales of different views. Additionally, we preserve the geometric structure information of labeled and unlabeled samples by simultaneously introducing graph regularization. For the proposed methods, two effective optimization strategies based on multiplicative update rules are designed. Experiments on six real-world datasets demonstrate the effectiveness of our FRSMNMF compared with several state-of-the-art unsupervised and semi-supervised approaches.

    1 Introduction

    Clustering is a very important multi-variate analysis technique in pattern recognition and machine learning. In past decades, many clustering methods [1, 21, 50, 63] have been proposed, such as k-means [15], the Gaussian mixture model, spectral clustering, and hierarchical clustering. However, most of them are designed for single-view data. In many cases, real-world data are collected from multiple sources [18, 34, 55], which characterize the objects from various perspectives and are usually termed multiple views or modalities in the literature [9, 47, 51, 61]. The different views generally contain interactive and complementary information that can be integrated to enhance recognition performance for different tasks. Therefore, for multi-view clustering tasks, how to effectively fuse the information from the multiple views to enhance the clustering performance is the most important issue in the learning process [2, 59].
    Among various outstanding multi-view clustering methods [7, 8, 45, 48, 56, 57], nonnegative matrix factorization (NMF) [24, 25] based methods are widely studied owing to the excellent feature learning ability of NMF. Under the framework of multi-view nonnegative matrix factorization (MVNMF) [35, 38, 64], each point can be represented by an efficient low-dimensional feature vector. From the perspective of whether label information is utilized, MVNMF methods [32] can be divided into unsupervised methods [46] and semi-supervised methods [28]. Many unsupervised MVNMFs focus on exploring the fusion mechanism of multiple views in a hidden space. For example, in [46], a novel pair-wise co-regularization was proposed to align the multiple views, which considers inter-view similarity. In [14], a nonredundancy regularization was developed to discover the distinct contributions of the different views. In [42] and [27], inter-view diversity terms are developed to learn more comprehensive information from the data. Additionally, unsupervised MVNMFs usually take some normalizing operations to improve the performance of the model. For instance, in [41], to ensure the numerical stability of NMF, in each iteration the column summation of the basis matrix is normalized to 1, i.e., \(\sum \nolimits _i {{{\bf W}}_{i\cdot }^v = 1}\) ( \(v\) denotes the \(v\) th view). Furthermore, to make the representations of the different views comparable at the same scale, in [52], [39], and [32], the same normalization operation as in [41] is adopted and the norms of the basis matrix are compensated into the coefficient matrix in the co-regularization term. To guarantee the solution uniqueness of NMF, in [17], the row summation of the coefficient matrix is normalized to 1, i.e., \(\sum \nolimits _j {{{\bf H}}_{\cdot j}^v = 1}\) , where each column denotes a data point. These methods fail to consider the label information that the data may provide. In fact, by effectively utilizing the label information provided by a small proportion of the data, the clustering performance can be further enhanced. Under this consideration, in recent years, some semi-supervised MVNMFs have been developed to employ the label information.
    Most recently proposed semi-supervised MVNMF methods are based on constrained NMF (CNMF) [30], which is a semi-supervised NMF method for single-view data. The shortcoming of CNMF-based semi-supervised MVNMFs [5, 6, 43, 54] is that they all fail to discover the discriminative information among different classes. In these methods, the data points of the same class are represented by the same feature vectors. By this means, the distances among data points residing in the same class are zeroed; however, the distances among data points residing in different classes are not maximized. Another problem of these CNMF-based methods is that they all fail to make use of the local geometrical information of the data. In [22], Jiang et al. tried to discover the discriminative information of data by regressing the partial coefficient matrix to the label matrix. The work in [31] also considered the discriminative information of data. Unfortunately, both of these methods ignore the geometrical information of data. This defect has been overcome in the work of [28], which extends the work of [31] by including a graph regularization. Liang et al. [29] tried to make use of the geometrical information of data by involving an adaptive local structure learning module. In [58], hypergraph regularization is used to encode the structure of data.
    Although the methods in [28], [29], and [58] consider both the discriminative information and the geometrical information of the data, the price is that too many hyper-parameters make it hard for these methods to achieve satisfying performance on different datasets. Another problem is that, while normalizing strategies are widely discussed in unsupervised MVNMF methods, they are rarely explored under the framework of semi-supervised MVNMF. Addressing these issues, in this article, a semi-supervised multi-view nonnegative matrix factorization with fusion regularization (FRSMNMF) is proposed. In this method, the discriminative term and the feature alignment term are fused as one regularization prior termed fusion regularization. The meaning of this regularization comes from two aspects. On the one hand, the features of different views are fused by this regularization. On the other hand, the feature fusion constraint and the discriminative constraint are integrated as one constraint with this regularization, which effectively reduces the number of hyper-parameters. Additionally, to further make use of the geometrical information of the data, a graph regularization is also constructed for each view. To align the feature scales of different views, two different feature normalizing strategies are adopted, corresponding to two variants denoted as FRSMNMF_N1 and FRSMNMF_N2, and the influences of these two normalizing strategies are compared in the experiments. For the two variants, corresponding specific iterative optimization schemes are also designed. The contributions of this article are summarized as follows:
    A novel fusion regularization based semi-supervised MVNMF is presented. In this work, the exploration of discriminative information and feature alignment are achieved in one term which reduces the number of hyper-parameters and enlarges the inter-class distinction;
    Two feature normalizing strategies are adopted to align the feature scales of different views, which produces two variants of the proposed framework;
    For the two variants of the proposed framework, two specific iterative optimizing schemes are designed to solve the corresponding minimization problems effectively;
    The effectiveness of the proposed methods is evaluated by comparing with some recently proposed representative unsupervised and semi-supervised MVNMFs on six datasets.
    The rest of this article is organized as follows: Section 2 introduces some related works on multi-view clustering. In Section 3, the details of the proposed semi-supervised multi-view nonnegative matrix factorization with fusion regularization (FRSMNMF) and its two specific variants are introduced. Section 4 gives the corresponding optimizing strategy, convergence proof and computational complexity analysis. In Section 5, the experimental results are demonstrated. Finally, the conclusion of this article is made in Section 6.

    2 Related Work

    In this section, we briefly review the related unsupervised and semi-supervised multi-view clustering methods.

    2.1 Unsupervised Multi-view Clustering

    Most unsupervised MVNMF methods are based on the idea of aligning the multiple views in a shared common subspace to fuse the information. In [32], Liu et al. proposed a pioneering MVNMF method that tries to learn a consensus representation by minimizing a centroid co-regularization. With this regularization, the feature matrices of the different views are pushed toward a consensus feature matrix, and multi-view alignment is thereby achieved. Furthermore, a feature scale normalization matrix is introduced to constrain the feature scales of different views to be similar, hoping to facilitate the feature alignment. In [39], Rai et al. adopted the same strategy to make the scales of different views comparable. In [52], Yang et al. explored another way to reduce the distribution divergences of the feature matrices. They presented an MVNMF based on nonnegative matrix tri-factorization [53], in which, to restrict the feature scales of different views to be similar, the column summation of the product of the basis matrix and the shared embedding matrix is constrained to be 1. Essentially, Liu et al. [32], Rai et al. [39], and Yang et al. [52] adopt the same feature normalizing strategy, in which the column summation of the basis matrix is constrained to be 1, i.e., \(\sum \nolimits _i {{{\bf W}}_{i\cdot }^v = 1}\) , and the norms are compensated to the coefficient matrix in the co-regularization term. Actually, to improve the performance of unsupervised MVNMFs, various normalizing operations have been explored. For example, to guarantee the numerical stability of NMF, Shao et al. [41] constrain the column summation of the basis matrix to be 1 and do not compensate the norms to the coefficient matrix in the co-regularization term. To ensure the solution uniqueness of NMF, Hu et al. [17] constrain the row summation of the coefficient matrix to be 1, i.e., \(\sum \nolimits _j {{{\bf H}}_{\cdot j}^v = 1}\) . Most existing methods are based on the centroid co-regularization. In contrast, in [46], a pair-wise co-regularization was proposed to align the multiple views pair-wisely; by minimizing this co-regularization, the alignment is acquired by pushing the representations of different views together. From the above, we can find that most MVNMF methods were developed by aligning the multiple views and ignore the diversity of the different views. Addressing this issue, in [42], an MVNMF called diverse nonnegative matrix factorization has been proposed. In this method, a diverse term is designed to encourage the heterogeneity of different views and to learn more comprehensive information. Recently, with the flourishing of deep learning in computer vision and pattern recognition, some “deep” models have been proposed for the multi-view clustering task [13]. For example, a deep multi-view semi-nonnegative matrix factorization is proposed in [60]; in this method, an adaptive weighting strategy is adopted to balance the effect of different views. Similarly, a deep MVNMF is proposed in [26], in which the graph regularization is applied in all layers, not only in the final layer as in [60], and all factor matrices are restricted to be nonnegative. All of the previously reviewed methods focus on dealing with “well-established” data, meaning that all views have the correct correspondence at the sample level. In [20], Huang et al. developed a partially view-aligned clustering method that can tackle multi-view data that are not fully view-aligned, where “partially view-aligned” means that only a part of the data have the correct correspondence at the sample level.

    2.2 Semi-Supervised Multi-View Clustering

    In recent years, several semi-supervised MVNMFs have been proposed, and promising results have been obtained with label information. Wang et al. [43] proposed a semi-supervised MVNMF based on CNMF for the first time. In this work, an adaptive weighting strategy is applied to balance the effect of different views, and a centroid co-regularization similar to [32] is utilized to learn a consensus representation. In [5] and [6], Cai et al. developed two semi-supervised MVNMF methods, both also based on CNMF. Both methods adopt pair-wise co-regularization with Euclidean distance to align multiple views as in [46]. The difference between [5] and [6] lies in the second regularizing term: in [5], \(\ell _{2,1}\) -norm regularization of the auxiliary matrix is used, while an orthonormality constraint on the auxiliary matrix is used in [6]. The former tries to obtain sparse representations for each view, and the latter tries to constrain the feature scales of different views to be similar. Yang et al. [54] improved the work in [6] by relaxing the label constraint matrix. The common drawback of the above methods is that they all ignore the discriminative information of data. To make use of this information, Jiang et al. [22] proposed a unified latent factor learning model that tries to reconstruct the label matrix with the product of the partially shared coefficient matrix and an auxiliary matrix. Further, Liu et al. [31] improved the above work by splitting the coefficient matrix into a private part and a common part. Based on [31], Liang et al. [28] tried to improve the performance of the method by imposing a graph regularization on the coefficient matrix. The problem with [28] is that it uses too many hyper-parameters, which makes this method very complex and hard to fine-tune. Wang et al. [44] presented a semi-supervised multi-view clustering model with weighted anchor graph embedding, in which the anchors are constructed based on label information. Nie et al. [36] developed a semi-supervised multi-view learning method that can model the structure of data adaptively. Based on this work, Liang et al. [29] presented a semi-supervised MVNMF method based on label propagation. Zhang et al. [58] proposed a semi-supervised MVNMF method, called dual hypergraph regularized partially shared NMF, in which hypergraph regularization is imposed on both the coefficient matrix and the basis matrix. Similar to [28], these two methods also suffer from involving too many hyper-parameters. Recently, some deep semi-supervised multi-view clustering methods have also been developed. Zhao et al. [62] proposed a deep semi-supervised MVNMF method that encodes the discriminative information of data by constructing an affinity graph for intra-class compactness and a penalty graph for inter-class distinctness. Chen et al. [11] presented an autoencoder-based semi-supervised multi-view clustering method that encodes the discriminative information of data by introducing a pairwise constraint.
    From the above review, we can find that, although various matrix normalizing strategies have been widely explored in unsupervised MVNMF methods, these strategies are rarely discussed under the semi-supervised MVNMF framework. In the previously introduced normalizing strategies, the column summation of the basis matrix is usually normalized to be 1, and sometimes the norms are compensated to the coefficient matrix in the co-regularization term. However, it is more reasonable to constrain the columns of the basis matrix to be unit vectors, i.e., \(||{{\bf W}}_{\cdot j}^v|{|_2} = 1\) . Additionally, recently proposed semi-supervised MVNMF methods involve too many hyper-parameters, which makes these methods very complex and hard to fine-tune. Addressing these issues, in the following sections, we introduce a novel semi-supervised MVNMF framework with fusion regularization. This framework can be implemented with two different feature normalizing strategies, and its fusion regularization makes use of the discriminative information of data while reducing the number of hyper-parameters.

    3 Semi-Supervised Multi-View Clustering with Fusion Regularization

    For a traditional MVNMF framework, the objective function can be written as follows:
    \begin{equation} \sum \limits _v {{\rm {||}}{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}{\rm {||}}_F^2} + \lambda \mathcal {R}({\bf H}^v), \end{equation}
    (1)
    where \({\bf X}^v\in \mathbb {R}_{+}^{ m^v\times n}\) , \({\bf W}^v\in \mathbb {R}_{+}^{ m^v\times d^v}\) and \({\bf H}^v\in \mathbb {R}_{+}^{ d^v\times n}\) are the data matrix, the basis matrix, and the coefficient matrix of the \(v\) th view, respectively. \(m^v\) and \(d^v\) are the dimensions of the original space and the latent space for the \(v\) th view, and \(n\) is the size of the dataset. \(\mathcal {R}({\bf H}^v)\) denotes the regularization term, which is employed to introduce extra priors or constraints for enhancing the clustering performance, such as manifold regularization or co-regularization. \(\lambda\) is the hyper-parameter that regulates the participation of these constraints in the optimization process. Both the selection of the hyper-parameter \(\lambda\) and the regularization prior \(\mathcal {R}({\bf H}^v)\) play an important role in the design of the model. In order to achieve satisfactory clustering results, it is crucial to enlarge the inter-class distinction, shrink the intra-class distances, and achieve multi-view feature alignment effectively. However, most existing multi-view NMF methods fail to consider discriminative information and feature alignment simultaneously. Although some algorithms such as AMVNMF, MVCNMF, and MVOCNMF adopt label information of the data, they do not further consider the distinguishability among classes; therefore, their clustering performance is limited. In addition, the model should avoid introducing too many hyper-parameters while integrating more regularization priors \(\mathcal {R}({\bf H}^v)\) for feature learning.
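    To make the generic framework in Equation (1) concrete, the following minimal numpy sketch evaluates the per-view objective for given factor matrices. All names (mvnmf_objective, X_views, W_views, H_views) are hypothetical, and the regularizer \(\mathcal {R}({\bf H}^v)\) is illustrated with a simple Frobenius penalty rather than any specific prior used later in this article.
```python
# Minimal sketch of the generic MVNMF objective in Equation (1); names are illustrative.
import numpy as np

def mvnmf_objective(X_views, W_views, H_views, lam=0.1):
    """Sum over views of ||X^v - W^v H^v||_F^2 + lam * R(H^v)."""
    total = 0.0
    for Xv, Wv, Hv in zip(X_views, W_views, H_views):
        recon = np.linalg.norm(Xv - Wv @ Hv, 'fro') ** 2
        reg = np.linalg.norm(Hv, 'fro') ** 2   # placeholder for R(H^v)
        total += recon + lam * reg
    return total
```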
    For enforcing the inter-class distinctness with multi-view features, in this article, we first construct a novel fusion regularization that jointly considers the discriminative information among clusters and the alignment of different views. It can be formulated as follows:
    \begin{equation} {\rm {||}}[{\bf Y}, {\bf H}_c] -[{{\bf H}^v_l}, {{\bf H}^v_{ul}}]{\rm {||}}_F^2 \end{equation}
    (2)
    where \({\bf Y}\in \lbrace 0, 1\rbrace ^{d^v\times n_l}\) is the label matrix and \({\bf H}^v_l\in \mathbb {R}_{+}^{d^v\times n_l}\) is the feature matrix of labeled samples for the \(v\) th view. \(n_l\) is the number of labeled data points. \({\bf H}_c\in \mathbb {R}_{+}^{d^v\times n_{ul}}\) denotes the common feature matrix and \({\bf H}^v_{ul}\in \mathbb {R}_{+}^{d^v\times n_{ul}}\) is the feature matrix of unlabeled data points for the \(v\) th view. \(n_{ul}\) is the number of unlabeled data points. Obviously, in this work, \(d^1=d^2=\cdots =d^V=C\) . \(C\) is the number of classes. A general way to simultaneously consider discriminative information and feature alignment is to define them separately as follows:
    \begin{equation} ||{{\bf Y}} - {{\bf H}}_l^v{\rm {||}}_F^2 + ||{{\bf H}}_c^* - {{{\bf H}}^v}{\rm {||}}_F^2, \end{equation}
    (3)
    where \({{{\bf H}}^v} = [{{\bf H}}_l^v,{{\bf H}}_{ul}^v]\) and \({{\bf H}}_c^* = [{{\bf H}}_{cl}^*,{{\bf H}}_{c}]\) . Then, Equation (3) can be further written as follows:
    \begin{equation} ||{{\bf Y}} - {{\bf H}}_l^v{\rm {||}}_F^2 + ||[{{\bf H}}_{cl}^*,{{\bf H}}_{c}] - [{{\bf H}}_l^v,{{\bf H}}_{ul}^v]{\rm {||}}_F^2. \end{equation}
    (4)
    In Equation (4), the first term is used to align the coefficient matrix of the labeled samples (i.e., \({{\bf H}}_l^v\) ) to the label matrix (i.e., \({\bf Y}\) ), and the second term is used to align the whole coefficient matrix of all samples (i.e., \({{{\bf H}}^v} = [{{\bf H}}_l^v,{{\bf H}}_{ul}^v]\) ) to the common consensus matrix \({{\bf H}}_c^* = [{{\bf H}}_{cl}^*,{{\bf H}}_c]\) . We can see that the representation of the labeled samples ( \({{\bf H}}_l^v\) ) is aligned to both \({\bf Y}\) and \({{\bf H}}_{cl}^*\) . The goal of aligning \({{\bf H}}_l^v\) to \({\bf Y}\) is to discover the discriminative information of data, while the goal of aligning \({{\bf H}}_l^v\) to \({{\bf H}}_{cl}^*\) is to align the multiple views by pushing the representations of different views ( \({{\bf H}}_l^v\) , \(v=1, 2, \ldots , V\) ) close to each other. Actually, aligning \({{\bf H}}_l^v\) to \({\bf Y}\) can also achieve this goal, because the label matrix \({\bf Y}\) is shared by all views. With this intuition, we substitute \({{\bf H}}_{cl}^*\) with \({\bf Y}\) in Equation (4) and remove the first term, which yields the proposed Equation (2). We can now see that by aligning \({{\bf H}}_l^v\) with \({\bf Y}\) , the discriminative information can be discovered and the \({{\bf H}}_l^v\) ( \(v=1, 2, \ldots , V\) ) of different views are pushed close to each other, while the \({{\bf H}}_{ul}^v\) ( \(v=1, 2, \ldots , V\) ) of different views are pushed together by aligning them to \({{\bf H}}_c\) . The meaning of the fusion regularization is twofold: (1) the features of different views are fused by this regularization; (2) the feature fusion constraint and the discriminative constraint are integrated as one constraint with this regularization, which reduces the number of hyper-parameters.
    In Equations (2), (3), and (4), the label matrix \({\bf Y}\) is approximated by \({{\bf H}}_l^v\) . We can denote the data matrix \({{\bf X}}^v\) as \({{{\bf X}}^v} = [{{\bf X}}_l^v,{{\bf X}}_{ul}^v]\) , corresponding to the low-dimensional representation \({{{\bf H}}^v} = [{{\bf H}}_l^v,{{\bf H}}_{ul}^v]\) ; then \({{\bf H}}_l^v\) is the low-dimensional representation of \({{\bf X}}_l^v\) . Actually, the label matrix \({\bf Y}\) can also be seen as a feature representation of \({{\bf X}}_l^v\) , which contains the discriminative information of the data. So, by minimizing the difference between \({\bf Y}\) and \({{\bf H}}_l^v\) , i.e., minimizing \(||{{\bf Y}} - {{\bf H}}_l^v{\rm {||}}_F^2\) , \({\bf Y}\) is reconstructed by \({{\bf H}}_l^v\) , and the discriminative information in \({\bf Y}\) can be learned by \({{\bf H}}_l^v\) . Note that we do not constrain the column summation of \({{\bf H}}_l^v\) to be 1, i.e., \(\sum \nolimits _i {{\rm {H}}_{l(i)}^v \ne 1}\) , so \({{\bf H}}_l^v\) is a relaxed representation of \({\bf Y}\) .
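    As an illustration of how the fusion target \([{\bf Y}, {\bf H}_c]\) in Equation (2) can be assembled, the following small sketch builds the one-hot label matrix \({\bf Y}\) and concatenates it with \({\bf H}_c\) ; the helper name build_fusion_target is hypothetical.
```python
# Hedged sketch: assemble the fusion target [Y, H_c] of Equation (2).
import numpy as np

def build_fusion_target(labels, H_c, n_classes):
    """labels: length-n_l array with values in {0, ..., C-1}; H_c: C x n_ul matrix."""
    n_l = len(labels)
    Y = np.zeros((n_classes, n_l))
    Y[labels, np.arange(n_l)] = 1.0          # one-hot columns of the label matrix Y
    return np.hstack([Y, H_c])               # H_yc = [Y, H_c], shape C x n
```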
    Additionally, to make use of the geometry information of the multi-view data in each view for decreasing the intra-class diversities, a graph regularizer is constructed based on both labeled samples and unlabeled samples, which is defined as follows:
    \begin{equation} \frac{1}{2}\sum \limits _{i = 1}^n {\sum \limits _{j = 1}^n {\big |\big |h_i^v - h_j^v\big |\big |_2^2{\bf S}_{ij}^v} } = {\rm {tr}}({{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T), \end{equation}
    (5)
    where \({\bf L}^v={\bf D}^v-{\bf S}^v\) , \({\bf D}_{ii}^v=\sum _{j}{\bf S}_{ij}^v\) (or \({\bf D}_{ii}^v=\sum _{j}{\bf S}_{ji}^v\) ). \({\bf S}^v\) is defined as follows:
    \begin{equation} {\bf S}_{ij}^v \!=\! \left\lbrace \!\! \begin{array}{*{20}{l}} {{1}}&{{\rm {if}} \ {h_i^v} \!\in \! {N_k}(h_j^v) \ {\rm {or}} \ {h_j^v} \!\in \! {N_k}(h_i^v)}\\ 0&{{\rm {otherwise}}} \end{array} \right., \end{equation}
    (6)
    where \({N_k}(h_i^v)\) consists of \(k\) nearest neighbors of \(h^v_i\) .
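    A minimal sketch (with illustrative names) of how the symmetric \(k\) -nearest-neighbor graph in Equation (6) and the Laplacian \({\bf L}^v={\bf D}^v-{\bf S}^v\) could be built with plain numpy is given below; it assumes Euclidean distances between the columns of the view matrix.
```python
# Hedged sketch: k-NN graph S^v, degree matrix D^v, and Laplacian L^v = D^v - S^v.
import numpy as np

def knn_graph_laplacian(X, k=5):
    """X: m x n data matrix of one view (columns are samples)."""
    n = X.shape[1]
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # pairwise squared distances
    S = np.zeros((n, n))
    for j in range(n):
        idx = np.argsort(d2[:, j])[1:k + 1]   # k nearest neighbors of sample j (skip itself)
        S[idx, j] = 1.0
    S = np.maximum(S, S.T)                    # "or" rule in Equation (6): symmetrize
    D = np.diag(S.sum(axis=1))                # D_ii = sum_j S_ij
    return S, D, D - S
```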
    Considering the above properties of inter- and intra-class, the overall objective function of our proposed semi-supervised multi-view nonnegative matrix factorization with fusion regularization (FRSMNMF) can be described as follows:
    \begin{equation} \begin{array}{l} {O_{{\rm {FRSMNMF}}}} = \sum \limits _{v = 1}^{{V}} \lbrace ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T)\\ \qquad \qquad \qquad +\,\, \beta ||[{\bf Y}, {\bf H}_c] - [{{\bf H}^v_l}, {{\bf H}^v_{ul}}]||_F^2\rbrace \\ s.t. \ {\bf W}^v,{\bf H}^v,{{\bf H}_c} \ge 0. \end{array} \end{equation}
    (7)
    To tackle multi-view tasks, another important issue is to align multiple features effectively for aggregating the information of different views. Instead of optimizing Equation (7) directly, we first restrict the scales of the multiple features to a comparable level. For this purpose, the column vectors of \({\bf W}^v\) are constrained to satisfy \(||{\bf W}_{\cdot j}^v||_2=1\) or \(||{\bf W}_{\cdot j}^v||_1=1\) . One possible way to account for the above constraints is to add an extra regularization term to Equation (7), but this would make the optimization problem very complicated. Instead of taking this direct strategy, an alternative scheme is to compensate the norms of the basis matrix into the coefficient matrix. Then, the objective function of FRSMNMF in Equation (7) can be rewritten as:
    \begin{equation} \begin{array}{l} {O_{{\rm {FRSMNMF}}}} = \sum \limits _{v = 1}^{{V}} {\lbrace ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2} + \alpha {\rm {tr}}({{\bf Q}^v}{{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T{{\bf Q}^v}^T)\\ \qquad \qquad \qquad +\,\, \beta ||[{\bf Y}, {\bf H}_c] - {{\bf Q}^v}{{\bf H}^v}||_F^2\rbrace \\ s.t. \ {{\bf W}^v}, {{\bf H}^v},{{\bf H}_c} \ge 0, \end{array} \end{equation}
    (8)
    where \({\bf Q}^v\) is defined as
    \begin{equation} {{{\bf Q}}^v} = Diag\left(\sqrt {\sum \limits _{i = 1}^{{m^v}} {{{\bf W}}{{_{i,1}^v}^2}} } ,\sqrt {\sum \limits _{i = 1}^{{m^v}} {{{\bf W}}{{_{i,2}^v}^2}} } ,\ldots ,\sqrt {\sum \limits _{i = 1}^{{m^v}} {{{\bf W}}{{_{i,d^v}^v}^2}} }\right)\!, \end{equation}
    (9)
    or
    \begin{equation} {{{\bf Q}}^v} = Diag\left({\sum \limits _{i = 1}^{{m^v}} {{{\bf W}}{_{i,1}^v} } , {\sum \limits _{i = 1}^{{m^v}} {{{\bf W}}{_{i,2}^v}} } ,\ldots , {\sum \limits _{i = 1}^{{m^v}} {{{\bf W}}{_{i,d^v}^v}} }}\right)\!. \end{equation}
    (10)
    For simplicity, the proposed methods with \(||{\bf W}_{\cdot j}^v||_2=1\) and \(||{\bf W}_{\cdot j}^v||_1=1\) are denoted as FRSMNMF_N1 and FRSMNMF_N2, respectively, corresponding to two different normalization manners. In the following sections, the optimization algorithms of FRSMNMF_N1 and FRSMNMF_N2 will be introduced in detail.
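    The two compensating matrices in Equations (9) and (10) and the normalization step of Equation (21) amount to simple column scalings; the following hedged numpy sketch (illustrative function names, columns of \({\bf W}^v\) assumed nonzero) shows both variants.
```python
# Hedged sketch: Q^v of Equations (9)/(10) and the compensation W <- W Q^{-1}, H <- Q H.
import numpy as np

def compensation_matrix(W, variant="N1"):
    """N1: Q = Diag of column l2-norms of W; N2: Q = Diag of column sums (Equations (9)/(10))."""
    if variant == "N1":
        q = np.sqrt((W ** 2).sum(axis=0))
    else:
        q = W.sum(axis=0)
    return np.diag(q)

def normalize_factors(W, H, variant="N1"):
    """Equation (21): convey the column norms of W into H (columns of W assumed nonzero)."""
    q = np.diag(compensation_matrix(W, variant))
    return W / q, H * q[:, None]   # W Q^{-1} and Q H without explicit inverses
```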

    4 Optimization Procedure

    4.1 Optimization Algorithm of FRSMNMF_N1

    The optimization problem of minimizing Equation (8) is not straightforward. In this section, an alternating optimization strategy is developed, which breaks the original problem into several subproblems such that each subproblem is tractable.
    For a specific view \(v\) , its optimization with respect to \({\bf W}^{v}\) and \({\bf H}^{v}\) does not depend on other views. The problem of minimizing Equation (8) can be written as follows:
    \begin{equation} \begin{array}{l} \mathop {\min }\limits _{{{\bf W}^v},{{\bf H}^v},{{\bf H}_c} \ge 0} ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf Q}^v}{{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T{{\bf Q}^v}^T) \\ \qquad \qquad +\,\, \beta ||[{\bf Y}, {\bf H}_c] - {{\bf Q}^v}{{\bf H}^v}||_F^2 \end{array} \end{equation}
    (11)

    4.1.1 Fixing \({\bf H}_c\) , updating \({\bf W}^v\) and \({\bf H}^v\) .

    (1) Fixing \({\textbf {H}_c}\) and \({\bf H}^v\) , updating \({\bf W}^v\)
    When \({{\bf H}_c}\) and \({\bf H}^v\) are fixed, the above equation is equivalent to the following problem:
    \begin{equation} \begin{array}{l} \mathop {\min }\limits _{{{\bf W}^v}\ge 0} ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf Q}^v}{{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T{{\bf Q}^v}^T)\\ \qquad +\,\, \beta {\rm {tr}}({{\bf Q}^v}{{\bf H}^v}{{\bf H}^v}^T{{\bf Q}^v}^T - 2{{\bf H}^v}{{\bf H}_{yc}^T}{{\bf Q}^v}^T), \end{array} \end{equation}
    (12)
    where \({\bf H}_{yc}=[{\bf Y}, {\bf H}_c]\) .
    Let
    \begin{equation} \begin{array}{l} {{\bf Y}^v_1} = [{{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T] \odot {\bf I}\\ \quad \;\,={\bf Y}^{v+}_1-{\bf Y}^{v-}_1\\ {{\bf Y}^v_2} = [{{\bf H}^v}{{\bf H}^v}^T] \odot {\bf I}, \end{array} \end{equation}
    (13)
    where \(\odot\) is the element-wise product operator, \({\bf I}\) is the identity matrix and the matrices \({\bf Y}^{v+}_1\) and \({\bf Y}^{v-}_1\) are defined as
    \begin{equation} \begin{array}{l} {\bf Y}^{v+}_1 = [{{\bf H}^v}{{\bf D}^v}{{\bf H}^v}^T] \odot {\bf I}\\ {\bf Y}^{v-}_1 = [{{\bf H}^v}{{\bf S}^v}{{\bf H}^v}^T] \odot {\bf I}. \end{array} \end{equation}
    (14)
    Then, Equation (12) with \({\bf Q}^v\) defined as in Equation (9) can be further rewritten as
    \begin{equation} \begin{array}{l} \mathop {\min }\limits _{{{\bf W}^v} \ge 0} ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf W}^v}{{\bf Y}^v_1}{{\bf W}^v}^T)\\ \qquad +\,\, \beta {\rm {tr}}({{\bf W}^v}{{\bf Y}^v_2}{{\bf W}^v}^T - 2{{\bf H}^v}{{\bf H}_{yc}^T}{{\bf Q}^v}^T). \end{array} \end{equation}
    (15)
    As \({{\bf W}^v} \ge 0\) , we introduce the Lagrangian multipliers \(\psi _{ih}\) for the constraints \({\bf W}^v_{ih} \ge 0\) and let \(\Psi =[\psi _{ih}]\) . Then, the Lagrangian function \({\mathcal {L}}\) is as follows:
    \begin{equation} \begin{array}{l} {\mathcal {L}} = ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf W}^v}{{\bf Y}^v_1}{{\bf W}^v}^T)\\ \qquad +\,\, \beta {\rm {tr}}({{\bf W}^v}{{\bf Y}^v_2}{{\bf W}^v}^T - 2{{\bf H}^v}{{\bf H}_{yc}^T}{{\bf Q}^v}^T) + {\rm {tr}}(\Psi {{\bf W}^v}^T) \end{array} \end{equation}
    (16)
    The partial derivative of \({\rm {tr}}({{\bf H}^v}{{\bf H}_{yc}^T}{{\bf Q}^v}^T)\) with respect to \({\bf W}^v_{ih}\) is
    \begin{equation} \begin{array}{l} \displaystyle {\frac{{\partial {\rm {tr}}({{{\bf H}}^v}{{\bf H}}_{yc}^T{{{\bf Q}}^v}^T)}}{{\partial {{\bf W}}_{ih}^v}} = \frac{{\partial {{({{\bf H}^v}{{\bf H}_{yc}^T})}_{hh}}\sqrt {\sum \nolimits _{k = 1}^{{m^v}} {{\bf W}{{_{kh}^v}^2}} } }}{{\partial {\bf W}_{ih}^v}}}\\ \quad \;\,\displaystyle {= {({{\bf H}^v}{{\bf H}_{yc}^T})_{hh}}\frac{{{\bf W}_{ih}^v}}{{\sqrt {\sum \nolimits _{k = 1}^{{m^v}} {{\bf W}{{_{kh}^v}^2}} } }}}\\ \quad \;\,= ({\bf W}^v({{\bf Q}^v})^{-1}({{\bf H}^v}{{\bf H}_{yc}^T}\odot {\bf I}))_{ih}\\ \quad \;\,= ({\bf W}^v({{\bf Q}^v})^{-1}{{{\bf Y}}^v_3})_{ih}, \end{array} \end{equation}
    (17)
    where \({{{\bf Y}}^v_3}=[{{\bf H}^v}{{\bf H}_{yc}^T}]\odot {\bf I}\) .
    Until now, the partial derivative of (16) with respect to \({\bf W}^v\) can be written as follows:
    \begin{equation} \begin{array}{*{20}{l}} \displaystyle {\frac{{\partial \mathcal {L}}}{{\partial {{{\bf W}}^v}}}}=&\!\!\!\! 2{{\bf W}}^v{{\bf H}}^v{{\bf H}^v}^T - 2{{\bf X}^v}{{\bf H}^v}^T + 2\alpha {{{\bf W}}^v}{{{\bf Y}}^v_1}\\ &\!\!\!\!+\,\, 2\beta {{{\bf W}}^v}{{{\bf Y}}^v_2} { - 2\beta {{{\bf W}}^v}({{\bf Q}^v})^{ - 1}{{{\bf Y}}^v_3}} { + \Psi .} \end{array} \end{equation}
    (18)
    Setting the above equation to zero and using the Karush–Kuhn–Tucker (KKT) condition [3] \(\psi _{ih} {\bf W}^v_{ih}=0\) , we have
    \begin{equation} \begin{array}{l} 2({{\bf W}}^v{{\bf H}}^v{{\bf H}^v}^T)_{ih}{{\bf W}^v_{ih}} - 2({{\bf X}^v}{{\bf H}^v}^T)_{ih}{{\bf W}^v_{ih}}\\ +\,\, 2\alpha ({{\bf W}^v}{{\bf Y}^v_1})_{ih}{{\bf W}^v_{ih}} + 2\beta ({{\bf W}^v}{{\bf Y}^v_2})_{ih}{{\bf W}^v_{ih}}\\ -\,\, 2\beta ({{{\bf W}}^v}({{\bf Q}^v})^{ - 1}{{{\bf Y}}^v_3})_{ih}{{\bf W}^v_{ih}}\\ =0, \end{array} \end{equation}
    (19)
    which leads to the following update rule:
    \begin{equation} {{\bf W}}_{ih}^v = {{\bf W}}_{ih}^v\frac{{{{({{{\bf X}}^v}{{{\bf H}}^v}^T + \alpha {{{\bf W}}^v}{{\bf Y}}_1^{v - } + \beta {{{\bf W}}^v}({{{\bf Q}}^v})^{ - 1}{{\bf Y}}_3^v)}_{ih}}}}{{{{({{{\bf W}}^v}{{{\bf H}}^v}{{{\bf H}}^v}^T + \alpha {{{\bf W}}^v}{{\bf Y}}_1^{v + } + \beta {{{\bf W}}^v}{{\bf Y}}_2^v)}_{ih}}}} \end{equation}
    (20)
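    A hedged numpy sketch of this multiplicative update is given below; it recomputes the diagonal matrices \({\bf Q}^v\) , \({\bf Y}_1^{v\pm }\) , \({\bf Y}_2^v\) , and \({\bf Y}_3^v\) from Equations (9), (13), (14), and (17), and the small constant eps (not part of the derivation) only guards against division by zero.
```python
# Hedged sketch of the FRSMNMF_N1 W^v update in Equation (20); names are illustrative.
import numpy as np

def update_W_n1(W, H, X, D, S, H_yc, alpha, beta, eps=1e-10):
    dvec = lambda M: np.diag(M)                       # diagonal of [.] ⊙ I as a vector
    q = np.sqrt((W ** 2).sum(axis=0)) + eps           # diagonal of Q^v, Equation (9)
    y1p, y1m = dvec(H @ D @ H.T), dvec(H @ S @ H.T)   # Y_1^{v+}, Y_1^{v-}, Equation (14)
    y2, y3 = dvec(H @ H.T), dvec(H @ H_yc.T)          # Y_2^v and Y_3^v, Equations (13), (17)
    numer = X @ H.T + W * (alpha * y1m + beta * y3 / q)[None, :]   # X H^T + a W Y1- + b W Q^{-1} Y3
    denom = W @ (H @ H.T) + W * (alpha * y1p + beta * y2)[None, :] # W H H^T + a W Y1+ + b W Y2
    return W * numer / (denom + eps)
```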
    (2) Fixing \({\textbf {H}_c}\) and \({\bf W}^v\) , updating \({\bf H}^v\)
    After updating \({\bf W}^v\) , the column vectors of \({\bf W}^v\) are normalized with \({\bf Q}^v\) in Equation (9), and the norms are conveyed to the coefficient matrix \({\bf H}^v\) , that is:
    \begin{equation} {{{\bf W}}^v} \Leftarrow {{{\bf W}}^v}({{{\bf Q}}^v})^{ - 1},{{{\bf H}}^v} \Leftarrow {{{\bf Q}}^v}{{{\bf H}}^v}. \end{equation}
    (21)
    When \({{\bf H}_c}\) and \({\bf W}^v\) are fixed, Equation (11) is equivalent to the following problem:
    \begin{equation} \begin{array}{l} \mathop {\min }\limits _{{{\bf H}^v} \ge 0} ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T)\\ \qquad +\,\, \beta {\rm {tr}}({{\bf H}^v}{{\bf H}^v}^T - 2{{\bf H}^v}{{\bf H}_{yc}^{T}}). \end{array} \end{equation}
    (22)
    As \({{\bf H}^v} \ge 0\) , we introduce the Lagrangian multipliers \(\phi _{jh}\) for the constraints \({\bf H}^v_{jh} \ge 0\) and let \(\Phi =[\phi _{jh}]\) . Then, the Lagrangian function \({\mathcal {L}}\) is as follows:
    \begin{equation} \begin{array}{l} {\mathcal {L}} = ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T)\\ \qquad +\,\, \beta {\rm {tr}}({{\bf H}^v}{{\bf H}^v}^T - 2{{\bf H}^v}{{\bf H}_{yc}^{T}}) + {\rm {tr}}(\Phi {{\bf H}^v}^T). \end{array} \end{equation}
    (23)
    The partial derivative of (23) with respect to \({\bf H}^v\) is as follows:
    \begin{equation} \begin{array}{*{20}{l}} \displaystyle {\frac{{\partial \mathcal {L}}}{{\partial {{{\bf H}}^v}}} }=&\!\!\!\! 2{{{\bf W}}^v}^T{{{\bf W}}^v}{{{\bf H}}^v} -2{{{\bf W}}^v}^T{{{\bf X}}^v}\\ &\!\!\!\!+\,\, 2\alpha {{{\bf H}}^v}{{{\bf D}}^v} - 2\alpha {{{\bf H}}^v}{{\bf S}^v} + 2\beta {{{\bf H}}^v} - 2\beta {{{\bf H}}_{yc}} + \Phi . \end{array} \end{equation}
    (24)
    Similarly, setting \(\displaystyle {\frac{{\partial \mathcal {L}}}{{\partial {{{\bf H}}^v}}}} = 0\) and using KKT condition of \(\phi _{hj} {\bf H}^v_{hj}=0\) , then we have
    \begin{equation} \begin{array}{l} 2({{{\bf W}}^v}^T{{{\bf W}}^v}{{{\bf H}}^v})_{hj}{{\bf H}^v_{hj}} -2({{{\bf W}}^v}^T{{{\bf X}}^v})_{hj}{{\bf H}^v_{hj}} + 2\alpha ({{{\bf H}}^v}{{{\bf D}}^v})_{hj}{{\bf H}^v_{hj}}\\ -\,\, 2\alpha ({{{\bf H}}^v}{{\bf S}^v})_{hj}{{\bf H}^v_{hj}} + 2\beta ({{{\bf H}}^v})_{hj}{{\bf H}^v_{hj}} - 2\beta ({{{\bf H}}_{yc}})_{hj}{{\bf H}^v_{hj}}\\ =0, \end{array} \end{equation}
    (25)
    which leads to the following update rules:
    \begin{equation} \begin{array}{l} {{\bf H}}_{hj}^v = \displaystyle {{{\bf H}}_{hj}^v\frac{{({{{\bf W}}^v}^T{{{\bf X}}^v} + \alpha {{{\bf H}}^v}{{{\bf S}}^v} + \beta {{{\bf H}}_{yc}}{)_{hj}}}}{{({{{\bf W}}^v}^T{{{\bf W}}^v}{{{\bf H}}^v} + \alpha {{{\bf H}}^v}{{{\bf D}}^v} + \beta {{{\bf H}}^v}{)_{hj}}}}}. \end{array} \end{equation}
    (26)
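    The corresponding numpy sketch of the \({\bf H}^v\) update is shown below; it assumes \({\bf W}^v\) has already been normalized by Equation (21), and the names and the eps safeguard are illustrative.
```python
# Hedged sketch of the multiplicative H^v update in Equation (26).
import numpy as np

def update_H(H, W, X, D, S, H_yc, alpha, beta, eps=1e-10):
    numer = W.T @ X + alpha * H @ S + beta * H_yc          # W^T X + a H S + b H_yc
    denom = W.T @ W @ H + alpha * H @ D + beta * H         # W^T W H + a H D + b H
    return H * numer / (denom + eps)
```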

    4.1.2 Fixing \({\bf W}^v\) and \({\bf H}^v\) , updating \({\bf H}_c\) .

    As \({\bf W}^v\) is normalized in each iteration, the partial derivative of (8) with respect to \({\bf H}_c\) is as follows:
    \begin{equation} \begin{array}{{l}} \displaystyle {\frac{{\partial {O_{\rm {FRSMNMF}}}}}{{\partial {{{\bf H}}_c}}} = \frac{{\partial \sum \nolimits _{v = 1}^{{V}} {\beta ||{{{\bf H}_{ul}^v}} - {{{\bf H}}_c}||_F^2} }}{{\partial {{{\bf H}}_c}}}}\\ \qquad \qquad \quad = \sum \limits _{v = 1}^{{V}} {[ - 2\beta {{{\bf H}_{ul}^v}} + 2\beta {{{\bf H}}_c}]} = 0. \end{array} \end{equation}
    (27)
    Then, the exact solution for \({{\bf H}}_c\) is
    \begin{equation} {{{\bf H}}_c} = \frac{{\sum \nolimits _{v = 1}^{{V}} {{{{\bf H}_{ul}^v}}} }}{{{V}}} \ge 0. \end{equation}
    (28)
    The optimizing scheme of FRSMNMF_N1 is summarized in Algorithm 1.
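    The \({\bf H}_c\) step of Equation (28) is a simple per-element average, over views, of the representations of the unlabeled samples; a minimal sketch is given below (one outer iteration of Algorithm 1 would alternate, per view, the \({\bf W}^v\) update (20), the normalization (21), and the \({\bf H}^v\) update (26), followed by this step). The helper name and the column-ordering convention are assumptions.
```python
# Hedged sketch of the closed-form H_c update in Equation (28).
import numpy as np

def update_Hc(H_views, n_labeled):
    """H_views: list of d x n coefficient matrices H^v, labeled columns assumed first."""
    H_ul = [H[:, n_labeled:] for H in H_views]     # unlabeled parts H_ul^v
    return np.mean(H_ul, axis=0)                   # nonnegative since every H^v is nonnegative
```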

    4.2 Optimization Algorithm of FRSMNMF_N2

    Compared with FRSMNMF_N1, the main difference in the optimization procedure of FRSMNMF_N2 lies in the update rule of \({\bf W}^v\) , so only this update rule is derived in this section. Fixing \({\bf H}_c\) and \({\bf H}^v\) , the objective function of FRSMNMF_N2 with respect to \({\bf W}^v\) is written as follows:
    \begin{equation} \begin{array}{l} \mathop {\min }\limits _{{{\bf W}^v} \ge 0} ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf Q}^v}{{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T{{\bf Q}^v}^T)\\ +\,\, \beta ||[{\bf Y}, {\bf H}_c] -{{\bf Q}^v}{{\bf H}^v}||_F^2, \end{array} \end{equation}
    (29)
    where \({\bf Q}^v\) is defined as in Equation (10).
    As \({{\bf W}^v} \ge 0\) , we introduce the Lagrangian multipliers \(\psi _{ih}\) for the constraints \({\bf W}^v_{ih} \ge 0\) and let \(\Psi =[\psi _{ih}]\) . Then, the Lagrangian function \({\mathcal {L}}\) is as follows:
    \begin{equation} \begin{array}{l} {\mathcal {L}} = ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf Q}^v}{{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T{{\bf Q}^v}^T)\\ \qquad +\,\, \beta {\rm {tr}}({{\bf Q}^v}{{\bf H}^v}{{\bf H}^v}^T{{\bf Q}^v}^T - 2{{\bf H}^v}{{\bf H}_{yc}}^T{{\bf Q}^v}^T) + {\rm {tr}}(\Psi {{\bf W}^v}^T). \end{array} \end{equation}
    (30)
    The partial derivative of (30) with respect to \({\bf W}^v\) can be written as follows:
    \begin{equation} \begin{array}{l} \displaystyle {\frac{{\partial \mathcal {L}}}{{\partial {{\bf W}}_{ih}^v}} }= 2{({{{\bf W}}^v}{{{\bf H}}^v}{{{\bf H}}^v}^T)_{ih}} - 2({{{\bf X}}^v}{{{\bf H}}^{vT}})_{ih}\\ +\,\, 2\alpha {({{\bf Q}^v}{{\bf Y}}_1^v)_{hh}} + 2\beta {({{\bf Q}^v}{{\bf Y}}_2^v)_{hh}} - 2\beta {({{\bf Y}}_3^v)_{hh}} + {\psi _{ih}}, \end{array} \end{equation}
    (31)
    where \({\bf Y}^v_1\) , \({\bf Y}^v_2\) and \({{\bf Y}^v_3}\) are defined in Equation (13) and Equation (17).
    Setting the above equation to zero and using the KKT condition \(\psi _{ih} {\bf W}^v_{ih}=0\) , we have
    \begin{equation} \begin{array}{l} 2{({{{\bf W}}^v}{{{\bf H}}^v}{{{\bf H}}^v}^T)_{ih}}{{\bf W}}_{ih}^v - 2{({{{\bf X}}^v}{{{\bf H}}^{vT}})_{ih}}{{\bf W}}_{ih}^v\\ +\,\, 2\alpha {({{\bf Q}^v}{{\bf Y}}_1^v)_{hh}}{{\bf W}}_{ih}^v + 2\beta {({{\bf Q}^v}{{\bf Y}}_2^v)_{hh}}{{\bf W}}_{ih}^v\\ -\,\, 2\beta {({{\bf Y}}_3^v)_{hh}}{{\bf W}}_{ih}^v = 0, \end{array} \end{equation}
    (32)
    which leads to the following update rule:
    \begin{equation} {{\bf W}}_{ih}^v = {{\bf W}}_{ih}^v\frac{{{{({{{\bf X}}^v}{{{\bf H}}^{vT}})}_{ih}} + {{(\alpha {{{\bf Q}}^v}{{\bf Y}}_1^{v - } + \beta {{\bf Y}}_3^v)}_{hh}}}}{{{{({{{\bf W}}^v}{{{\bf H}}^v}{{{\bf H}}^v}^T)}_{ih}} + {{(\alpha {{{\bf Q}}^v}{{\bf Y}}_1^{v + } + \beta {{{\bf Q}}^v}{{\bf Y}}_2^v)}_{hh}}}}. \end{equation}
    (33)
    The updating rules of \({{\bf H}^v}\) and \({{\bf H}_c}\) in FRSMNMF_N2 are the same as those in Equation (26) and Equation (28). The optimizing scheme of FRSMNMF_N2 is summarized in Algorithm 2. The convergence proofs for FRSMNMF_N1 and FRSMNMF_N2 are presented in the Appendix.
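    For completeness, a hedged numpy sketch of the FRSMNMF_N2 update in Equation (33) is given below; the \((\cdot)_{hh}\) terms are diagonal and are broadcast along the rows of \({\bf W}^v\) , and the names and the eps safeguard are illustrative.
```python
# Hedged sketch of the FRSMNMF_N2 W^v update in Equation (33).
import numpy as np

def update_W_n2(W, H, X, D, S, H_yc, alpha, beta, eps=1e-10):
    dvec = lambda M: np.diag(M)                        # diagonal of [.] ⊙ I as a vector
    q = W.sum(axis=0)                                  # diagonal of Q^v, Equation (10)
    y1p, y1m = dvec(H @ D @ H.T), dvec(H @ S @ H.T)    # Y_1^{v+}, Y_1^{v-}, Equation (14)
    y2, y3 = dvec(H @ H.T), dvec(H @ H_yc.T)           # Y_2^v and Y_3^v, Equations (13), (17)
    numer = X @ H.T + (alpha * q * y1m + beta * y3)[None, :]            # (.)_{hh} terms broadcast over rows
    denom = W @ (H @ H.T) + (alpha * q * y1p + beta * q * y2)[None, :]
    return W * numer / (denom + eps)
```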

    4.3 Computational Complexity Analysis of FRSMNMF_N1 and FRSMNMF_N2

    In this section, the computational complexity of FRSMNMF_N1 and FRSMNMF_N2 is analyzed and expressed in big \(O\) notation [12]. For a specific view \(v\) of FRSMNMF_N1 in one iteration, updating \({\bf W}^v\) requires computing \({{\bf W}^v}{{\bf H}^v}{{\bf H}^v}^T\) , \({{\bf X}^v}{{\bf H}^v}^T\) , \({{\bf W}^v}{\bf Y}_1^v\) , \({{\bf W}^v}{\bf Y}_2^v\) , and \({{\bf W}^v}({{\bf Q}^v})^{ - 1}{\bf Y}_3^v\) . \({{\bf W}^v}{{\bf H}^v}{{\bf H}^v}^T\) and \({{\bf X}^v}{{\bf H}^v}^T\) cost \(O((n + m^v)d^{v2})\) and \(O(m^v n d^v)\) , respectively. The total cost of \({{\bf W}^v}{\bf Y}_1^v\) , \({{\bf W}^v}{\bf Y}_2^v\) , and \({{\bf W}^v}({{\bf Q}^v})^{ - 1}{\bf Y}_3^v\) is \(O(m^vd^v + Kd^v n + d^{v2} n)\) , where \(K\) is the number of nearest neighbors in the graph. So, the cost for updating \({\bf W}^v\) is \(O(Kd ^v n + m^v n d^v)\) . Normalizing \({\bf W}^v\) and \({\bf H}^v\) in Equation (21) needs \(O(nd^v + m^v d^v)\) computation. The main cost of updating \({\bf H}^v\) lies in the calculation of \({{\bf W}^v}^T{{\bf X}^v}\) , \({{\bf W}^v}^T{{\bf W}^v}{{\bf H}^v}\) , \({\bf H}^v{\bf S}^v\) , and \({\bf H}^v{\bf D}^v\) . The costs of \({{\bf W}^v}^T{{\bf X}^v}\) and \({{\bf W}^v}^T{{\bf W}^v}{{\bf H}^v}\) are \(O(m^v n d^v)\) and \(O((n + m^v)d^{v2})\) , while \({{\bf H}^v}{{\bf S}^v}\) and \({{\bf H}^v}{{\bf D}^v}\) need \(O(Kd^v n)\) and \(O(d^v n)\) computation, respectively. The cost for updating \({\bf H}^v\) is thus \(O(Kd^v n + m^v n d^v)\) . Compared with the cost of updating \({\bf W}^v\) and \({\bf H}^v\) , the computation cost of updating \({\bf H}_c\) is negligible. So, the final cost of FRSMNMF_N1 is \(O(t V m^v n d^v)\) , where \(t\) is the number of iterations and \(V\) denotes the number of views.
    For FRSMNMF_N2, the main difference in computation cost lies in the updating of \({\bf W}^v\) . Besides the cost of calculating \({{\bf W}^v}{{\bf H}^v}{{\bf H}^v}^T\) and \({{\bf X}^v}{{\bf H}^v}^T\) , it also needs to compute \({{\bf Q}^v}{\bf Y}_1^v\) , \({{\bf Q}^v}{\bf Y}_2^v\) , and \({\bf Y}_3^v\) . The total computation cost of these additional terms is \(O(Kd^v n + d^{v2}n)\) . So, the cost for updating \({\bf W}^v\) in FRSMNMF_N2 is \(O(Kd^v n + m^v n d^v)\) , and the final cost of FRSMNMF_N2 is \(O(t V m^v n d^v)\) .

    5 Experiments

    5.1 Evaluation Metrics

    Two widely used clustering metrics are adopted to evaluate the performance of the methods: clustering accuracy (AC) [49] and normalized mutual information (NMI) [40].
    AC is defined as
    \begin{equation} {\rm {AC}} = \frac{{\sum \nolimits _{i = 1}^n {\delta (gn{d_i},map({z_i}))} }}{n}, \end{equation}
    (34)
    where \(n\) is the number of samples in the multiview dataset, \(gnd_i\) is the ground-truth label of \(x_i\) , and \(z_i\) is the label obtained by the clustering method for sample \(x_i\) . \(map(\cdot)\) is an optimal mapping function [33] that maps the obtained label \(z_i\) to match the ground-truth label provided by the dataset. \(\delta (a,b)\) is an indicator function: \(\delta (a,b) = 1\) when \(a=b\) ; otherwise, \(\delta (a,b) = 0\) .
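    One common way to realize the optimal mapping \(map(\cdot)\) is the Hungarian algorithm; the following sketch (not necessarily the implementation used in [33]) computes AC with scipy's linear_sum_assignment.
```python
# Hedged sketch of the clustering accuracy (AC) metric in Equation (34).
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(gnd, z):
    """gnd, z: integer arrays of ground-truth and predicted cluster labels."""
    gnd_ids, gnd_idx = np.unique(gnd, return_inverse=True)
    z_ids, z_idx = np.unique(z, return_inverse=True)
    cost = np.zeros((len(z_ids), len(gnd_ids)), dtype=int)
    for zi, gi in zip(z_idx, gnd_idx):
        cost[zi, gi] += 1                       # contingency counts
    row, col = linear_sum_assignment(-cost)     # optimal mapping: maximize matched counts
    return cost[row, col].sum() / len(gnd)
```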
    NMI is defined as follows:
    \begin{equation} {\rm {NMI}}(C,\hat{C}){\rm { = }}\frac{{{\rm {MI}}(C,\hat{C})}}{{\sqrt {{\rm {H}}(C){\rm {H}}(\hat{C})} }}, \end{equation}
    (35)
    where \(C\) and \(\hat{C}\) are two cluster sets, one for ground truth and another for the label obtained by clustering methods. \(\rm {H}(C)\) and \(\rm {H}(\hat{C})\) denote the entropy of cluster sets \(C\) and \(\hat{C}\) . \(\rm {MI}(C,\hat{C})\) is the mutual information between \(C\) and \(\hat{C}\) , which is defined as
    \begin{equation} {\rm {MI}}(C,\hat{C}) = \sum \limits _{{c_i} \in C,{{\hat{c}}_j} \in \hat{C}} {p({c_i},{{\hat{c}}_j}){{\log }_2}\frac{{p({c_i},{{\hat{c}}_j})}}{{p({c_i})p({{\hat{c}}_j})}}}, \end{equation}
    (36)
    where \(p(c_i,\hat{c}_j)\) denotes the joint probability that a randomly selected item belongs to both \(c_i\) and \(\hat{c}_j\) , while \(p(c_i)\) and \(p(\hat{c}_j)\) denote the probabilities that a randomly selected item belongs to \(c_i\) and \(\hat{c}_j\) , respectively.
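    A direct numpy sketch of Equations (35) and (36), with the mutual information normalized by the geometric mean of the two cluster entropies, is shown below; the function name is illustrative and degenerate single-cluster cases are not handled.
```python
# Hedged sketch of NMI as defined in Equations (35) and (36).
import numpy as np

def nmi(labels_a, labels_b):
    a_ids, a_idx = np.unique(labels_a, return_inverse=True)
    b_ids, b_idx = np.unique(labels_b, return_inverse=True)
    n = len(labels_a)
    joint = np.zeros((len(a_ids), len(b_ids)))
    for ai, bi in zip(a_idx, b_idx):
        joint[ai, bi] += 1
    joint /= n                                  # joint probabilities p(c_i, c_j)
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = (joint[nz] * np.log2(joint[nz] / np.outer(pa, pb)[nz])).sum()   # Equation (36)
    h_a = -(pa[pa > 0] * np.log2(pa[pa > 0])).sum()
    h_b = -(pb[pb > 0] * np.log2(pb[pb > 0])).sum()
    return mi / np.sqrt(h_a * h_b)              # Equation (35)
```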

    5.2 Datasets

    In the experiments, six multiview datasets are used to evaluate the effectiveness of the proposed method on the clustering task. The statistics are summarized in Table 1.
    Table 1. Database Description
    Dataset    Size   Views  Features          Classes
    Yale       165    3      2048/256/1024     15
    ORL        400    3      2048/256/1024     40
    FEI        700    3      2048/256/1024     50
    3sources   169    3      3560/3631/3068    6
    Texas      187    2      187/1703          5
    Carotid    1378   2      17/17             2
    (1) Yale Dataset. This dataset contains 165 grayscale images collected from 15 people. Each person has 11 images with different facial expressions or configurations, and all images are normalized to a 32 \(\times\) 32 pixel array.
    (2) FEI Part 1 Dataset. The FEI part 1 dataset is a subset of the original FEI database. It contains 700 color images captured from 50 people. Each person has 14 images taken from different views. The original 640 \(\times\) 480 resolution images were downsampled to 32 \(\times\) 24 grayscale pixels.
    (3) ORL Dataset. This dataset has 400 grayscale face images collected from 40 people, each with 10 images. These images were taken under different lighting conditions, with different facial expressions, and with/without glasses. From this database, two subsets are produced for testing.
    (4) Texas Dataset. This dataset contains 187 documents over 5 labels (student, project, course, staff, faculty). The documents are described by 1,703 words in the content view and by 187 links between them in the cites view.
    (5) 3sources [16] Dataset. This dataset has 948 texts collected from three news sources, i.e., BBC, Reuters, and the Guardian. In this article, the experiments are conducted on the subset with 169 news articles and six topical labels that are reported in all three sources. The dimensions of this dataset's three views are 3,560; 3,631; and 3,068, respectively.
    (6) Carotid Dataset. This dataset is an electronic medical record dataset with 1,378 records and 34 attributes. It is a subset of the raw data collected from the stroke screening and prevention project conducted by Shenzhen Second People's Hospital (ethical approval number: 20200116002). There are two classes in this dataset, i.e., abnormal carotid artery and normal carotid artery. Two views are constructed by splitting the whole attribute set evenly, each with 17 attributes. The attributes are ranked by the mean feature importance score of six feature importance scoring methods [19]; odd-ranked attributes are used to construct one view and even-ranked attributes the other. The feature importance scores are provided by the work in Reference [19].
    For Yale, ORL, and FEI datasets, the multiple views are the image intensity, Gabor [23], and LBP [37] with the dimensions of 1,024; 2,048; and 256, respectively.
    The clustering results are obtained by conducting the k-means clustering method on the learned representations of the above algorithms. To avoid errors from randomness, all unsupervised methods were tested 10 times, and the average value is reported. To evaluate the performance of the semi-supervised methods, for the parameter analysis experiments, 10 \(\%\) , 20 \(\%\) , and 30 \(\%\) of the data points are labeled randomly 5 times, and the average clustering results are reported. For the comparison experiments, 10 \(\%\) , 20 \(\%\) , and 30 \(\%\) of the data points are labeled randomly 20 times, and the average clustering results and statistically significant differences (p-values) are reported.
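    For illustration only, the following sketch shows one way such random labeled/unlabeled splits could be generated for a given labeling ratio; it is a plain uniform sampling and is not claimed to reproduce the exact splits used in the experiments.
```python
# Hypothetical sketch of the random labeling protocol (uniform sampling, illustrative names).
import numpy as np

def random_label_masks(n_samples, ratio, n_repeats, seed=0):
    rng = np.random.default_rng(seed)
    masks = []
    for _ in range(n_repeats):
        labeled = rng.choice(n_samples, size=int(round(ratio * n_samples)), replace=False)
        mask = np.zeros(n_samples, dtype=bool)
        mask[labeled] = True                   # True marks a labeled sample
        masks.append(mask)
    return masks                               # e.g., ratio=0.1 with n_repeats=20 for comparisons
```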

    5.3 Comparative Algorithms

    To evaluate the effectiveness of the proposed methods, they are compared with several representative NMF-based multi-view clustering approaches, including seven unsupervised ones and four semi-supervised ones. These methods are described as follows:
    VAGNMF. Conducts GNMF [4] on each view individually; the average of the features of the multiple views is treated as the final representation.
    VCGNMF. Conducts GNMF on each view individually; the concatenation of the features of the multiple views is treated as the final representation.
    LP-DiNMF [42]. The objective function of this method involves a diverse term which is defined to enforce the heterogeneity of the different views. Local geometric structure information is also utilized to improve the performance.
    rNNMF [10]. This method attempts to deal with the noise and outliers among the views through defining a reconstruction term and a neighbor-structure-preserving term with \(\ell _{2,1}\) -norm.
    MPMNMF_1 [46]. This method has adopted the Euclidean distance based pair-wise co-regularization to align multiple features. Graph regularization is constructed to encode the geometry information of each view.
    MPMNMF_2 [46]. This method has adopted the kernel based pair-wise co-regularization to align multiple features. Graph regularization is constructed to encode the geometry information of each view.
    UDNMF [52]. This method factorizes each view into three matrices and constrains the columns of the product of the basis matrix and the shared embedding matrix to be unit vectors. Graph regularization is imposed on the coefficient matrix of each view. Centroid co-regularization is used to learn a common consensus matrix as the final representation.
    MVCDMF [60]. This is a deep multi-view semi-NMF algorithm. “semi-nonnegative” means that it only constrains the shared coefficient matrix to be nonnegative. Graph regularization is also used to utilize the geometrical structure information of data. The shared coefficient matrix is used as the final representation.
    MvDGNMF [26]. This is a deep multi-view NMF algorithm. It iteratively factorizes the coefficient matrix in each layer to get hierarchical representations of data. Similar to UDNMF, it adopts centroid co-regularization to fuse the multiple views to obtain the final representation.
    AMVNMF [43]. This method factorizes each view in the CNMF [30] framework and tries to learn a consensus auxiliary matrix. The entries of each column of the auxiliary matrix are constrained to sum to 1. An auto-weighting strategy is imposed on the consensus learning terms.
    MVCNMF [5]. This method factorizes each view in the CNMF framework and aligns multiple views with Euclidean distance based pair-wise co-regularization. \(\ell _{2,1}\) -norm regularization is imposed on the auxiliary matrix of each view.
    MVOCNMF [6]. This method factorizes each view in the CNMF framework and aligns multiple views with Euclidean distance based pair-wise co-regularization. An orthonormality constraint is imposed on the auxiliary matrix of each view to normalize the feature scale.
    GPSNMF [28]. This method is a semi-supervised partially shared MVNMF, which tries to make use of both distinct and shared information of multiple views. A graph regularization is also constructed for each view.
    The results of the above methods with optimal parameter settings (obtained by grid search) are reported. Note that, before running all methods, the data points of the original data are normalized to unit vectors. The source code of this paper is available at: https://github.com/GuoshengCui/FRSMNMF.

    5.4 Convergence Analysis

    To verify the convergence of FRSMNMF_N1 and FRSMNMF_N2, the convergence curves on six datasets with labeling ratios of 10 \(\%\) , 20 \(\%\) , and 30 \(\%\) are visualized in Figure 1, together with the variation of the clustering performance over iterations. In each figure, the left y-axis denotes the objective function value, the right y-axis denotes the clustering accuracy, and the x-axis is the iteration number. We can see that the proposed methods converge on all datasets and that the clustering performance becomes stable within 300 iterations.
    Fig. 1. Convergence curves and clustering performances of FRSMNMF_N1 and FRSMNMF_N2 with labeling ratios of 10 \(\%\) , 20 \(\%\) and 30 \(\%\) on six datasets. (Best viewed in color.)

    5.5 Parameter Analysis and Comparisons

    5.5.1 Parameter Analysis.

    There are three hyper-parameters in FRSMNMF_N1 and FRSMNMF_N2: the graph regularization parameter \(\alpha\) , the fusion regularization parameter \(\beta\) , and the number of nearest neighbors \(k\) . The number of hyper-parameters in our methods is the same as in most unsupervised MVNMFs, such as LP-DiNMF, MPMNMF \(\_\) 1, MPMNMF \(\_\) 2, and UDNMF, and smaller than in most semi-supervised MVNMFs, such as MVCNMF, MVOCNMF, GPSNMF, and the method in [31]. Figure 2 demonstrates the influence of the parameters \(\alpha\) and \(\beta\) on FRSMNMF_N1 and FRSMNMF_N2 on six datasets with different labeling ratios, i.e., 10 \(\%\) , 20 \(\%\) , and 30 \(\%\) . From this figure, we can see that the performance trends of the proposed methods over \(\alpha\) and \(\beta\) are very similar, which indicates that the ratio \(\alpha /\beta\) is relatively stable on all datasets. The best parameter settings of the proposed methods on the six datasets with different labeling ratios are given in Table 2. We can see that the values of \(\alpha /\beta\) with the best results always lie in the set \(\lbrace 1, 10, 100\rbrace\) , with \(\alpha\) around \(10^5\) or \(10^1\) .
    Table 2. The Best Parameter Settings of FRSMNMF_N1 and FRSMNMF_N2 on Six Datasets, with Labeling Ratios of 10 \(\%\) , 20 \(\%\) , and 30 \(\%\) . The values in brackets are (log( \(\alpha\) ), log( \(\beta\) )).
                 FRSMNMF_N1                     FRSMNMF_N2
                 10%       20%       30%        10%       20%       30%
    Yale         (0, -1)   (5, 4)    (5, 4)     (2, 1)    (2, 0)    (2, 1)
    ORL          (0, -1)   (0, -1)   (0, -1)    (3, 2)    (5, 5)    (5, 4)
    FEI          (5, 4)    (5, 3)    (5, 4)     (5, 4)    (5, 3)    (5, 4)
    3sources     (0, -1)   (0, -1)   (0, -1)    (0, -2)   (0, -2)   (0, -1)
    Texas        (5, 5)    (1, -1)   (0, -1)    (5, 5)    (5, 5)    (1, -1)
    Carotid      (5, 5)    (5, 5)    (5, 5)     (5, 5)    (5, 5)    (5, 5)
    Fig. 2. Clustering performances of FRSMNMF_N1 and FRSMNMF_N2 versus parameters \(\alpha\) and \(\beta\) with labeling ratios of 10 \(\%\) , 20 \(\%,\) and 30 \(\%\) on six datasets.
    Figure 3 demonstrates the performance variation of GPSNMF, FRSMNMF_N1, and FRSMNMF_N2 versus the parameter \(k\) on six datasets with different labeling ratios. We can see that on most datasets, FRSMNMF_N1 and FRSMNMF_N2 obtain better performance than GPSNMF over a large range of \(k\) . Basically, the varying trends of these three methods on the six datasets are similar.
    Fig. 3. Clustering performances of FRSMNMF_N1, FRSMNMF_N2, and GPSNMF versus parameter \(k\) nearest neighbors with labeling ratios of 10 \(\%\) , 20 \(\%\) and 30 \(\%\) on six datasets.

    5.5.2 Comparisons.

    In Table 3, the comparison results of our methods and several recently proposed semi-supervised MVNMFs are reported, together with the statistically significant differences (p-values) between FRSMNMF_N1 and the other methods. AMVNMF, MVCNMF, and MVOCNMF are all CNMF-based MVNMFs, and their performance is obviously not as good as that of GPSNMF, FRSMNMF_N1, and FRSMNMF_N2. One reason is that the former three methods do not consider the geometrical information of the data. The second reason is that they all fail to explore the discriminative information of the data, although they do consider intra-class compactness owing to their CNMF basis. On the Yale, ORL, FEI, and 3sources datasets, the advantage of FRSMNMF_N1 is significant compared with the other semi-supervised MVNMF methods (except FRSMNMF_N2). On the Texas and Carotid datasets, the advantage of FRSMNMF_N1 is significant compared with AMVNMF, MVCNMF, and MVOCNMF; on these two datasets, the performance of GPSNMF approaches that of FRSMNMF_N1 as the labeling ratio increases. Overall, FRSMNMF_N1 usually reports the best results on all datasets, and in many cases the performance differences between FRSMNMF_N1 and FRSMNMF_N2 are not significant, although FRSMNMF_N1 is slightly better than FRSMNMF_N2, which indicates that the former feature normalizing strategy works better.
    Table 3.
    Yale and ORL, AC (%) with p-values vs. FRSMNMF_N1 in parentheses:
    Method | Yale 10% | Yale 20% | Yale 30% | ORL 10% | ORL 20% | ORL 30%
    AMVNMF [43] | 49.97 (<0.001) | 55.80 (<0.001) | 62.95 (<0.001) | 63.65 (<0.001) | 68.02 (<0.001) | 72.57 (<0.001)
    MVCNMF [5] | 50.93 (<0.001) | 57.93 (<0.001) | 63.56 (<0.001) | 66.17 (<0.001) | 70.07 (<0.001) | 74.22 (<0.001)
    MVOCNMF [6] | 51.14 (<0.001) | 58.03 (<0.001) | 64.98 (<0.001) | 66.40 (<0.001) | 70.75 (<0.001) | 75.30 (<0.001)
    GPSNMF [28] | 57.73 (<0.001) | 67.50 (<0.001) | 74.06 (<0.001) | 68.82 (<0.001) | 81.24 (<0.001) | 84.09 (<0.001)
    FRSMNMF_N2 | 62.25 (0.777) | 70.79 (0.484) | 78.00 (0.903) | 72.05 (<0.001) | 82.65 (<0.001) | 87.98 (0.015)
    FRSMNMF_N1 | 62.01 | 71.41 | 78.07 | 76.40 | 85.48 | 89.24
    Yale and ORL, NMI (%) with p-values vs. FRSMNMF_N1 in parentheses:
    Method | Yale 10% | Yale 20% | Yale 30% | ORL 10% | ORL 20% | ORL 30%
    AMVNMF [43] | 53.11 (<0.001) | 59.62 (<0.001) | 66.42 (<0.001) | 79.27 (<0.001) | 82.23 (<0.001) | 84.81 (<0.001)
    MVCNMF [5] | 54.60 (<0.001) | 61.62 (<0.001) | 67.19 (<0.001) | 81.07 (<0.001) | 83.69 (<0.001) | 86.01 (<0.001)
    MVOCNMF [6] | 54.59 (<0.001) | 61.51 (<0.001) | 68.13 (<0.001) | 81.13 (<0.001) | 84.02 (<0.001) | 86.64 (<0.001)
    GPSNMF [28] | 59.27 (<0.001) | 66.00 (<0.001) | 72.05 (<0.001) | 83.35 (<0.001) | 90.04 (<0.001) | 91.46 (<0.001)
    FRSMNMF_N2 | 61.11 (0.001) | 68.03 (0.298) | 75.15 (0.980) | 83.97 (<0.001) | 89.77 (<0.001) | 92.33 (0.020)
    FRSMNMF_N1 | 63.14 | 68.77 | 75.16 | 86.63 | 90.96 | 93.05
    FEI and 3sources, AC (%) with p-values vs. FRSMNMF_N1 in parentheses:
    Method | FEI 10% | FEI 20% | FEI 30% | 3sources 10% | 3sources 20% | 3sources 30%
    AMVNMF [43] | 56.65 (<0.001) | 60.06 (<0.001) | 66.36 (<0.001) | 59.27 (<0.001) | 65.83 (<0.001) | 70.79 (<0.001)
    MVCNMF [5] | 59.19 (<0.001) | 63.58 (<0.001) | 69.60 (<0.001) | 61.65 (<0.001) | 69.83 (<0.001) | 72.72 (<0.001)
    MVOCNMF [6] | 59.88 (<0.001) | 63.47 (<0.001) | 69.68 (<0.001) | 57.44 (<0.001) | 72.30 (<0.001) | 81.12 (<0.001)
    GPSNMF [28] | 81.45 (<0.001) | 84.56 (<0.001) | 88.84 (<0.001) | 80.79 (<0.001) | 88.30 (<0.001) | 91.05 (<0.001)
    FRSMNMF_N2 | 83.67 (0.506) | 87.03 (0.428) | 91.78 (0.796) | 87.91 (0.020) | 92.06 (0.083) | 93.45 (<0.001)
    FRSMNMF_N1 | 83.88 | 86.79 | 91.83 | 89.91 | 93.30 | 95.24
    FEI and 3sources, NMI (%) with p-values vs. FRSMNMF_N1 in parentheses:
    Method | FEI 10% | FEI 20% | FEI 30% | 3sources 10% | 3sources 20% | 3sources 30%
    AMVNMF [43] | 76.13 (<0.001) | 78.16 (<0.001) | 82.36 (<0.001) | 55.62 (<0.001) | 61.02 (<0.001) | 65.74 (<0.001)
    MVCNMF [5] | 77.43 (<0.001) | 80.33 (<0.001) | 84.35 (<0.001) | 60.00 (<0.001) | 66.76 (<0.001) | 66.67 (<0.001)
    MVOCNMF [6] | 77.79 (<0.001) | 80.16 (<0.001) | 84.37 (<0.001) | 51.77 (<0.001) | 59.36 (<0.001) | 68.13 (<0.001)
    GPSNMF [28] | 90.04 (0.009) | 91.51 (0.111) | 93.60 (<0.001) | 67.39 (<0.001) | 75.00 (<0.001) | 78.56 (<0.001)
    FRSMNMF_N2 | 90.53 (0.739) | 91.93 (0.659) | 95.04 (0.895) | 74.57 (<0.001) | 82.21 (0.029) | 84.04 (<0.001)
    FRSMNMF_N1 | 90.88 | 91.84 | 95.06 | 79.39 | 84.70 | 88.06
    Texas and Carotid, AC (%) with p-values vs. FRSMNMF_N1 in parentheses:
    Method | Texas 10% | Texas 20% | Texas 30% | Carotid 10% | Carotid 20% | Carotid 30%
    AMVNMF [43] | 45.55 (<0.001) | 49.66 (<0.001) | 55.73 (<0.001) | 53.47 (<0.001) | 56.61 (<0.001) | 54.53 (<0.001)
    MVCNMF [5] | 64.61 (<0.001) | 67.80 (<0.001) | 70.77 (<0.001) | 56.47 (<0.001) | 60.49 (<0.001) | 61.65 (<0.001)
    MVOCNMF [6] | 66.32 (<0.001) | 67.91 (<0.001) | 69.80 (<0.001) | 56.43 (<0.001) | 60.01 (<0.001) | 65.45 (<0.001)
    GPSNMF [28] | 64.57 (<0.001) | 70.12 (<0.001) | 75.25 (0.032) | 70.23 (0.062) | 75.56 (0.086) | 78.88 (0.194)
    FRSMNMF_N2 | 69.29 (0.224) | 72.23 (0.002) | 73.54 (<0.001) | 71.04 (0.870) | 75.91 (0.687) | 79.19 (1.000)
    FRSMNMF_N1 | 69.96 | 73.96 | 77.20 | 71.11 | 76.03 | 79.19
    Texas and Carotid, NMI (%) with p-values vs. FRSMNMF_N1 in parentheses:
    Method | Texas 10% | Texas 20% | Texas 30% | Carotid 10% | Carotid 20% | Carotid 30%
    AMVNMF [43] | 20.31 (<0.001) | 23.37 (<0.001) | 28.57 (<0.001) | 0.93 (<0.001) | 3.41 (<0.001) | 2.11 (<0.001)
    MVCNMF [5] | 32.27 (<0.001) | 36.23 (<0.001) | 41.71 (<0.001) | 1.98 (<0.001) | 4.88 (<0.001) | 8.65 (<0.001)
    MVOCNMF [6] | 32.49 (<0.001) | 36.14 (<0.001) | 40.78 (<0.001) | 1.90 (<0.001) | 4.51 (<0.001) | 7.06 (<0.001)
    GPSNMF [28] | 26.44 (0.098) | 39.96 (0.185) | 53.80 (<0.001) | 12.50 (0.125) | 20.29 (0.333) | 26.00 (0.501)
    FRSMNMF_N2 | 28.87 (0.731) | 37.01 (<0.001) | 47.10 (0.525) | 13.41 (0.945) | 20.49 (0.656) | 26.31 (1.000)
    FRSMNMF_N1 | 29.16 | 41.52 | 46.42 | 13.45 | 20.70 | 26.31
    Table 3. Clustering Performance (AC( \(\%\) ) and NMI( \(\%\) )) of Several Semi-supervised MVNMFs on Six Datasets with Different Labeling Ratios
    The best results are highlighted in bold and the second best results are in italic.
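    The p-values reported in Table 3 compare FRSMNMF_N1 against each competing method over repeated runs. The exact test is not restated in this excerpt, so the following is only a hedged sketch assuming a paired two-sample t-test on per-run accuracy scores; the scores below are synthetic placeholders, not experimental data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# hypothetical AC (%) of two methods over 20 repeated runs on the same splits
ac_frsmnmf_n1 = 62.0 + rng.normal(0.0, 2.3, size=20)
ac_gpsnmf = 57.7 + rng.normal(0.0, 2.5, size=20)

# paired test: pairing by run/split removes split-to-split variance
t_stat, p_value = stats.ttest_rel(ac_frsmnmf_n1, ac_gpsnmf)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```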
    Our methods are also compared with several recently proposed unsupervised MVNMFs, and the results are shown in Table 4. The last two rows correspond to our methods with 10% labeled data points. Again, statistically significant differences (p-values) between FRSMNMF_N1 and the other methods are reported. The advantage of the proposed FRSMNMF_N1 and FRSMNMF_N2 over the other methods is evident. On the ORL and 3sources datasets, FRSMNMF_N1 is superior to FRSMNMF_N2, while on the Yale, FEI, Texas, and Carotid datasets the performance differences between FRSMNMF_N1 and FRSMNMF_N2 are not significant. On all datasets, FRSMNMF_N1 is significantly better than the other unsupervised MVNMFs.
    Table 4.
    AC (%) with p-values vs. FRSMNMF_N1 in parentheses:
    Method | Yale | ORL | FEI | 3sources | Texas | Carotid
    VCGNMF [4] | 54.24 (<0.001) | 71.33 (<0.001) | 69.54 (<0.001) | 59.17 (<0.001) | 62.51 (<0.001) | 61.51 (<0.001)
    VAGNMF [4] | 50.00 (<0.001) | 69.13 (<0.001) | 68.83 (<0.001) | 62.66 (<0.001) | 54.28 (<0.001) | 57.17 (<0.001)
    LP-DiNMF [42] | 50.85 (<0.001) | 70.43 (<0.001) | 69.54 (<0.001) | 62.78 (<0.001) | 55.35 (<0.001) | 57.03 (<0.001)
    rNNMF [10] | 50.42 (<0.001) | 65.58 (<0.001) | 55.26 (<0.001) | 62.07 (<0.001) | 60.37 (<0.001) | 57.89 (<0.001)
    UDNMF [52] | 47.88 (<0.001) | 67.95 (<0.001) | 74.71 (<0.001) | 62.78 (<0.001) | 54.71 (<0.001) | 60.86 (<0.001)
    MPMNMF_1 [46] | 50.85 (<0.001) | 70.40 (<0.001) | 69.04 (<0.001) | 72.13 (<0.001) | 65.72 (<0.001) | 57.20 (<0.001)
    MPMNMF_2 [46] | 50.12 (<0.001) | 69.20 (<0.001) | 68.76 (<0.001) | 65.27 (<0.001) | 61.71 (<0.001) | 56.99 (<0.001)
    MVCDMF [60] | 53.52 (<0.001) | 69.28 (<0.001) | 68.84 (<0.001) | 77.28 (<0.001) | 56.74 (<0.001) | 57.49 (<0.001)
    MvDGNMF [26] | 50.79 (<0.001) | 70.45 (<0.001) | 74.17 (<0.001) | 71.66 (<0.001) | 58.13 (<0.001) | 56.79 (<0.001)
    FRSMNMF_N2 | 62.25 (0.777) | 72.05 (<0.001) | 83.67 (0.254) | 87.91 (<0.001) | 69.29 (0.050) | 71.04 (0.333)
    FRSMNMF_N1 | 62.01 | 76.40 | 83.88 | 89.91 | 69.96 | 71.11
    NMI (%) with p-values vs. FRSMNMF_N1 in parentheses:
    Method | Yale | ORL | FEI | 3sources | Texas | Carotid
    VCGNMF [4] | 57.55 (<0.001) | 84.68 (<0.001) | 85.40 (<0.001) | 58.68 (<0.001) | 28.70 (<0.001) | 3.97 (<0.001)
    VAGNMF [4] | 52.74 (<0.001) | 83.67 (<0.001) | 84.48 (<0.001) | 56.50 (<0.001) | 21.48 (<0.001) | 2.36 (<0.001)
    LP-DiNMF [42] | 53.81 (<0.001) | 84.65 (<0.001) | 84.50 (<0.001) | 56.37 (<0.001) | 24.49 (<0.001) | 2.34 (<0.001)
    rNNMF [10] | 52.56 (<0.001) | 80.91 (<0.001) | 74.13 (<0.001) | 57.83 (<0.001) | 27.89 (<0.001) | 2.31 (<0.001)
    UDNMF [52] | 50.69 (<0.001) | 82.22 (<0.001) | 87.17 (<0.001) | 58.41 (<0.001) | 22.93 (<0.001) | 4.44 (<0.001)
    MPMNMF_1 [46] | 54.39 (<0.001) | 84.21 (<0.001) | 85.09 (<0.001) | 65.71 (<0.001) | 31.48 (<0.001) | 2.38 (<0.001)
    MPMNMF_2 [46] | 52.54 (<0.001) | 84.46 (<0.001) | 85.08 (<0.001) | 58.61 (<0.001) | 26.46 (<0.001) | 2.59 (<0.001)
    MVCDMF [60] | 56.05 (<0.001) | 84.27 (<0.001) | 84.54 (<0.001) | 71.59 (<0.001) | 25.12 (<0.001) | 2.16 (<0.001)
    MvDGNMF [26] | 53.44 (<0.001) | 84.53 (<0.001) | 87.15 (<0.001) | 61.67 (<0.001) | 28.39 (<0.001) | 2.11 (<0.001)
    FRSMNMF_N2 | 61.11 (0.001) | 83.97 (<0.001) | 90.53 (0.372) | 74.57 (<0.001) | 28.87 (0.026) | 13.41 (0.691)
    FRSMNMF_N1 | 63.14 | 86.63 | 90.88 | 79.39 | 29.16 | 13.45
    Table 4. Clustering Performance (AC( \(\%\) ) and NMI( \(\%\) )) of the Proposed Methods and Several Unsupervised MVNMFs on Six Datasets
    The best results are highlighted in bold and the second best results are in italic.
    We have also tested how the performance of all semi-supervised MVNMFs varies with the labeling ratio. The curves are shown in Figure 4, with the mean result of all unsupervised methods as the baseline. Across the whole range of labeling ratios, GPSNMF, FRSMNMF_N1, and FRSMNMF_N2 stay above the baseline. When the number of labeled data points is small, AMVNMF, MVCNMF, and MVOCNMF sometimes perform worse than the baseline. Note that all unsupervised MVNMFs considered here exploit the geometrical information of the data, which suggests that both label information and geometrical information are important for learning meaningful representations.
    Fig. 4.
    Fig. 4. Clustering performances of FRSMNMF_N1 and FRSMNMF_N2 versus different labeling ratios on six datasets.

    5.5.3 Studying the Effects of Individual Views.

    To study the effect of each individual view, we apply the k-means algorithm to the features learned for each view and report the clustering results, selecting MVCNMF and MVOCNMF as baselines. Statistically significant differences (p-values) between FRSMNMF_N1 and the other methods are also reported. Table 5 provides the clustering results on the Yale, ORL, 3sources, and Carotid datasets with the labeling ratio set to 30%. From Table 5, we can see that the proposed FRSMNMF_N1 and FRSMNMF_N2 outperform MVCNMF and MVOCNMF in every view, which verifies the quality of the per-view features learned by FRSMNMF_N1 and FRSMNMF_N2.
    Table 5.
    Yale and ORL, AC (%) with p-values vs. FRSMNMF_N1 in parentheses:
    Method | Yale view1 | Yale view2 | Yale view3 | ORL view1 | ORL view2 | ORL view3
    MVCNMF [5] | 60.95 (<0.001) | 64.70 (<0.001) | 56.56 (<0.001) | 74.13 (<0.001) | 74.54 (<0.001) | 74.14 (<0.001)
    MVOCNMF [6] | 63.91 (<0.001) | 63.97 (<0.001) | 64.07 (<0.001) | 75.06 (<0.001) | 75.37 (<0.001) | 74.42 (<0.001)
    FRSMNMF_N2 | 77.84 (0.954) | 77.96 (0.954) | 76.71 (0.954) | 88.01 (0.133) | 88.41 (0.133) | 87.35 (0.133)
    FRSMNMF_N1 | 77.98 | 77.93 | 76.67 | 89.16 | 89.61 | 88.17
    Yale and ORL, NMI (%) with p-values vs. FRSMNMF_N1 in parentheses:
    Method | Yale view1 | Yale view2 | Yale view3 | ORL view1 | ORL view2 | ORL view3
    MVCNMF [5] | 64.68 (<0.001) | 68.03 (<0.001) | 61.01 (<0.001) | 85.99 (<0.001) | 86.22 (<0.001) | 86.02 (<0.001)
    MVOCNMF [6] | 67.64 (<0.001) | 67.64 (<0.001) | 67.63 (<0.001) | 86.65 (<0.001) | 86.70 (<0.001) | 86.28 (<0.001)
    FRSMNMF_N2 | 74.69 (0.952) | 74.93 (0.952) | 73.88 (0.952) | 92.24 (0.084) | 92.55 (0.084) | 91.77 (0.084)
    FRSMNMF_N1 | 74.76 | 74.92 | 73.85 | 92.92 | 93.32 | 92.34
    3sources and Carotid, AC (%) with p-values vs. FRSMNMF_N1 in parentheses:
    Method | 3sources view1 | 3sources view2 | 3sources view3 | Carotid view1 | Carotid view2
    MVCNMF [5] | 71.65 (<0.001) | 69.54 (<0.001) | 68.89 (<0.001) | 63.78 (<0.001) | 64.73 (<0.001)
    MVOCNMF [6] | 81.75 (<0.001) | 81.72 (<0.001) | 81.74 (<0.001) | 65.22 (<0.001) | 65.22 (<0.001)
    FRSMNMF_N2 | 93.01 (0.004) | 92.86 (0.004) | 93.30 (0.004) | 76.58 (1.000) | 72.96 (1.000)
    FRSMNMF_N1 | 95.18 | 94.58 | 94.87 | 76.58 | 72.96
    3sources and Carotid, NMI (%) with p-values vs. FRSMNMF_N1 in parentheses:
    Method | 3sources view1 | 3sources view2 | 3sources view3 | Carotid view1 | Carotid view2
    MVCNMF [5] | 67.38 (<0.001) | 65.82 (<0.001) | 64.77 (<0.001) | 12.01 (<0.001) | 10.71 (<0.001)
    MVOCNMF [6] | 69.13 (<0.001) | 69.07 (<0.001) | 69.09 (<0.001) | 6.82 (<0.001) | 6.82 (<0.001)
    FRSMNMF_N2 | 82.64 (<0.001) | 82.54 (<0.001) | 83.82 (<0.001) | 21.81 (1.000) | 15.96 (1.000)
    FRSMNMF_N1 | 87.99 | 86.11 | 87.46 | 21.81 | 15.96
    Table 5. Individual View Clustering Performance (AC( \(\%\) ) and NMI( \(\%\) )) of MVCNMF, MVOCNMF, FRSMNMF_N1, and FRSMNMF_N2 on Yale, ORL, 3sources and Carotid Datasets
    The labeling ratios are all set to 30 \(\%\) . The best results are highlighted in bold and the second best results are in italic.
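    The per-view scores in Table 5 are obtained by clustering the features of a single view. Below is a minimal sketch of such an evaluation, assuming the learned coefficient matrix of one view is available with one row per sample; AC is computed after the optimal label matching found by the Hungarian algorithm, and NMI with scikit-learn. Function and variable names are placeholders, not the authors' code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """AC under the best one-to-one mapping of cluster ids to labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n_cls = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((n_cls, n_cls), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1
    rows, cols = linear_sum_assignment(-cost)   # maximize agreement
    return cost[rows, cols].sum() / len(y_true)

def evaluate_single_view(H_view, y_true, n_clusters):
    """H_view: (n_samples, dim) features of one view; y_true: integer labels."""
    y_pred = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(H_view)
    ac = clustering_accuracy(y_true, y_pred)
    nmi = normalized_mutual_info_score(y_true, y_pred)
    return ac, nmi
```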

    5.5.4 Ablation Study.

    In this subsection, we further explore the influence of the graph regularization, the fusion regularization, and the constraints \(||{\bf W}_{\cdot j}^v||_2=1\) and \(||{\bf W}_{\cdot j}^v||_1=1\). We denote by FRSMNMF_N1 w/o GR and FRSMNMF_N1 w/o FR the variants of FRSMNMF_N1 without graph regularization and without fusion regularization, respectively, and define FRSMNMF_N2 w/o GR and FRSMNMF_N2 w/o FR analogously. We further denote by FRSMNMF the variant of FRSMNMF_N1 or FRSMNMF_N2 without the constraint \(||{\bf W}_{\cdot j}^v||_2=1\) or \(||{\bf W}_{\cdot j}^v||_1=1\). The results of these methods on the Yale, ORL, 3sources, and Carotid datasets with a labeling ratio of 10% are reported in Table 6. From Table 6, we can see that, on all datasets, the performance of FRSMNMF_N1 and FRSMNMF_N2 decreases without the graph regularization or the fusion regularization, indicating that both terms contribute to their performance. Comparing FRSMNMF_N1, FRSMNMF_N2, and FRSMNMF, we can also see that the performance of FRSMNMF drops once the constraint \(||{\bf W}_{\cdot j}^v||_2=1\) or \(||{\bf W}_{\cdot j}^v||_1=1\) is removed, which means that these constraints contribute to the performance of FRSMNMF_N1 and FRSMNMF_N2, respectively.
    Table 6.
    Yale and ORL:
    Norm | Method | Yale AC | Yale NMI | ORL AC | ORL NMI
    None | FRSMNMF | 61.55 ± 3.02 | 60.97 ± 2.17 | 69.69 ± 2.03 | 82.35 ± 1.40
    \(||{\bf W}_{\cdot j}^v||_2=1\) | FRSMNMF_N1 | 62.01 ± 2.31 | 63.14 ± 1.62 | 76.40 ± 1.39 | 86.63 ± 0.71
     | FRSMNMF_N1 w/o GR | 58.43 ± 2.40 | 59.74 ± 1.91 | 71.24 ± 1.50 | 82.97 ± 0.73
     | FRSMNMF_N1 w/o FR | 49.78 ± 0.91 | 53.23 ± 0.54 | 67.97 ± 0.71 | 82.96 ± 0.29
    \(||{\bf W}_{\cdot j}^v||_1=1\) | FRSMNMF_N2 | 62.25 ± 3.01 | 61.11 ± 2.00 | 72.05 ± 1.51 | 83.97 ± 0.76
     | FRSMNMF_N2 w/o GR | 44.48 ± 1.14 | 48.05 ± 0.71 | 62.56 ± 0.79 | 78.60 ± 0.40
     | FRSMNMF_N2 w/o FR | 51.47 ± 0.99 | 54.97 ± 0.86 | 68.64 ± 0.71 | 83.51 ± 0.28
    3sources and Carotid:
    Norm | Method | 3sources AC | 3sources NMI | Carotid AC | Carotid NMI
    None | FRSMNMF | 84.35 ± 3.90 | 67.82 ± 5.68 | 69.07 ± 1.48 | 10.91 ± 1.74
    \(||{\bf W}_{\cdot j}^v||_2=1\) | FRSMNMF_N1 | 89.91 ± 2.49 | 79.39 ± 3.41 | 71.11 ± 1.49 | 13.45 ± 1.96
     | FRSMNMF_N1 w/o GR | 85.04 ± 4.17 | 73.76 ± 3.63 | 61.11 ± 1.09 | 4.14 ± 0.58
     | FRSMNMF_N1 w/o FR | 60.55 ± 2.53 | 56.42 ± 1.86 | 55.76 ± 0.12 | 2.54 ± 0.07
    \(||{\bf W}_{\cdot j}^v||_1=1\) | FRSMNMF_N2 | 87.91 ± 2.71 | 74.57 ± 3.95 | 71.04 ± 1.46 | 13.41 ± 1.90
     | FRSMNMF_N2 w/o GR | 53.68 ± 1.76 | 50.50 ± 1.60 | 58.48 ± 0.29 | 3.32 ± 0.17
     | FRSMNMF_N2 w/o FR | 59.10 ± 0.64 | 55.08 ± 0.85 | 57.17 ± 0.81 | 2.60 ± 0.41
    Table 6. Ablation Study of FRSMNMF_N1 and FRSMNMF_N2 on Yale, ORL, 3sources and Carotid Datasets
    The labeling ratio is set to 10 \(\%\) . The best results are highlighted in bold and the second best results are in italic.
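    To make the ablated constraints concrete, here is a minimal sketch of how a basis matrix can be column-normalized under either \(||{\bf W}_{\cdot j}^v||_2=1\) or \(||{\bf W}_{\cdot j}^v||_1=1\) while the removed scale is compensated into the coefficient matrix, so that the product \({\bf W}^v{\bf H}^v\) is unchanged. This illustrates the general rescaling trick under stated assumptions, not the paper's exact update rules.

```python
import numpy as np

def normalize_basis(W, H, norm="l2", eps=1e-12):
    """Column-normalize W and compensate the scale into H.

    W: (n_features, dim) basis matrix of one view.
    H: (dim, n_samples) coefficient matrix (columns are data points).
    norm="l2" enforces ||W_{.j}||_2 = 1 (FRSMNMF_N1-style),
    norm="l1" enforces ||W_{.j}||_1 = 1 (FRSMNMF_N2-style).
    """
    if norm == "l2":
        scale = np.linalg.norm(W, axis=0)
    elif norm == "l1":
        scale = np.abs(W).sum(axis=0)
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    scale = np.maximum(scale, eps)       # guard against all-zero columns
    W_n = W / scale                      # columns of W_n have unit norm
    H_n = H * scale[:, None]             # W_n @ H_n equals W @ H
    return W_n, H_n
```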

    5.5.5 Performance on Noise Datasets.

    In this section, we evaluate the performance of the proposed methods on noisy datasets. Specifically, we manually pollute the Yale, ORL, FEI, and Carotid datasets with different levels of noise, where each level is simulated by randomly discarding 10%, 20%, or 30% of the data points in each view. FRSMNMF_N1 and FRSMNMF_N2 are evaluated on these corrupted datasets with MVCNMF and MVOCNMF as baselines, and the labeling ratio of all methods is set to 20%. Before applying the above methods, the missing data points in each view are filled with the average of the existing data points in the corresponding view. The experimental results are reported in Table 7. As can be seen from Table 7, the performance of all methods decreases as the noise level increases, and FRSMNMF_N1 is superior to FRSMNMF_N2 in most cases. On the Yale dataset, the performance of FRSMNMF_N2 decreases dramatically, indicating that the constraint \(||{\bf W}_{\cdot j}^v||_2=1\) is more robust to noise than the constraint \(||{\bf W}_{\cdot j}^v||_1=1\).
    Table 7.
    Yale:
    Method | 10% noise AC | 10% noise NMI | 20% noise AC | 20% noise NMI | 30% noise AC | 30% noise NMI
    MVCNMF [5] | 56.45 ± 1.69 | 60.18 ± 1.68 | 53.32 ± 1.71 | 57.85 ± 1.34 | 49.01 ± 1.46 | 53.40 ± 1.17
    MVOCNMF [6] | 56.61 ± 1.66 | 60.44 ± 1.37 | 53.82 ± 1.68 | 58.22 ± 1.53 | 50.02 ± 1.57 | 54.22 ± 1.31
    FRSMNMF_N2 | 57.59 ± 2.54 | 56.16 ± 1.79 | 54.60 ± 2.78 | 52.14 ± 2.23 | 52.37 ± 2.62 | 50.03 ± 2.40
    FRSMNMF_N1 | 65.59 ± 1.86 | 64.85 ± 1.97 | 61.71 ± 1.94 | 61.15 ± 1.89 | 59.38 ± 2.55 | 58.39 ± 2.03
    ORL:
    Method | 10% noise AC | 10% noise NMI | 20% noise AC | 20% noise NMI | 30% noise AC | 30% noise NMI
    MVCNMF [5] | 67.06 ± 1.11 | 81.10 ± 0.51 | 64.21 ± 1.09 | 78.64 ± 0.69 | 58.94 ± 1.16 | 74.74 ± 0.49
    MVOCNMF [6] | 66.92 ± 0.76 | 80.92 ± 0.45 | 64.12 ± 1.00 | 78.64 ± 0.53 | 58.90 ± 0.92 | 74.92 ± 0.41
    FRSMNMF_N2 | 78.42 ± 1.69 | 85.35 ± 1.12 | 75.70 ± 1.92 | 83.24 ± 1.12 | 71.82 ± 1.51 | 80.01 ± 1.14
    FRSMNMF_N1 | 82.26 ± 1.45 | 88.68 ± 0.84 | 80.09 ± 1.20 | 86.91 ± 0.82 | 75.43 ± 1.24 | 83.69 ± 0.78
    FEI:
    Method | 10% noise AC | 10% noise NMI | 20% noise AC | 20% noise NMI | 30% noise AC | 30% noise NMI
    MVCNMF [5] | 60.09 ± 0.85 | 77.25 ± 0.51 | 54.76 ± 1.20 | 73.23 ± 0.62 | 51.48 ± 0.87 | 70.29 ± 0.54
    MVOCNMF [6] | 59.85 ± 0.90 | 77.04 ± 0.60 | 54.63 ± 1.08 | 73.27 ± 0.51 | 51.08 ± 0.86 | 70.22 ± 0.57
    FRSMNMF_N2 | 83.68 ± 1.05 | 90.18 ± 0.70 | 79.54 ± 1.36 | 87.16 ± 0.76 | 76.93 ± 1.21 | 84.74 ± 0.75
    FRSMNMF_N1 | 83.79 ± 1.13 | 90.21 ± 0.72 | 79.72 ± 1.59 | 87.20 ± 0.81 | 76.87 ± 1.11 | 84.70 ± 0.67
    Carotid:
    Method | 10% noise AC | 10% noise NMI | 20% noise AC | 20% noise NMI | 30% noise AC | 30% noise NMI
    MVCNMF [5] | 59.88 ± 1.65 | 4.54 ± 1.06 | 59.07 ± 1.12 | 3.89 ± 0.78 | 58.62 ± 0.81 | 5.06 ± 0.65
    MVOCNMF [6] | 59.69 ± 1.54 | 4.39 ± 0.96 | 58.82 ± 1.12 | 3.71 ± 0.74 | 59.54 ± 0.56 | 4.18 ± 0.31
    FRSMNMF_N2 | 74.56 ± 1.02 | 18.28 ± 1.47 | 73.88 ± 1.36 | 17.26 ± 1.90 | 72.74 ± 1.97 | 15.70 ± 2.71
    FRSMNMF_N1 | 74.54 ± 0.88 | 18.23 ± 1.28 | 73.95 ± 0.96 | 17.34 ± 1.39 | 73.23 ± 1.29 | 16.33 ± 1.86
    Table 7. Performances of FRSMNMF_N1 and FRSMNMF_N2 on Yale, ORL, FEI, and Carotid Datasets Polluted by Different Levels of Noise (10%, 20%, and 30%). The Labeling Ratio is Set to 20%
    The best results are highlighted in bold and the second best results are in italic.
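    A hedged sketch of the corruption protocol described above: in each view, a fraction of the data points is randomly discarded and later filled with the mean of the surviving points of that view. The sampling details below are assumptions for illustration and may differ from the authors' setup.

```python
import numpy as np

def corrupt_view(X, drop_ratio=0.2, rng=None):
    """Randomly discard a fraction of data points in one view and
    mean-impute them from the remaining points.

    X: (n_samples, n_features) matrix of a single view.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    drop_idx = rng.choice(n, size=int(round(drop_ratio * n)), replace=False)
    keep = np.ones(n, dtype=bool)
    keep[drop_idx] = False
    X_noisy = X.astype(float, copy=True)
    X_noisy[drop_idx] = X[keep].mean(axis=0)   # per-view mean filling
    return X_noisy
```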

    5.5.6 Feature Visualization.

    In this section, we visualize the fused features learned by MVOCNMF, MVCNMF, FRSMNMF_N1, and FRSMNMF_N2 in Figure 5 (on the Yale dataset) and Figure 6 (on the FEI dataset). In these figures, rows are features and columns are data points. In Figures 5(a) and 5(b), the features of the first 60 data points, whose labels are given, show a clear block diagonal structure. The block diagonal structure means that the distances between data points with the same label are minimized, approaching zero, while the distances between data points with different labels are maximized. In Figures 5(a) and 5(b), the feature matrix of the remaining unlabeled data points also shows a block diagonal structure, although with some noise. From Figures 5(c) and 5(d), we can see that, for the first 60 data points, although the distances between data points with the same label are zero (points with the same label are represented by identical feature vectors), the distances between data points of different classes are not maximized. This means that the label information is not effectively used by MVOCNMF and MVCNMF. A similar phenomenon can be observed in Figure 6, in which the labels of the first 250 data points are given.
    Fig. 5.
    Fig. 5. The feature visualization of FRSMNMF_N1, FRSMNMF_N2, MVCNMF, and MVOCNMF on the Yale dataset. The labeling ratio is 30 \(\%\) . Rows are features and columns are data points.
    Fig. 6.
    Fig. 6. The feature visualization of FRSMNMF_N1, FRSMNMF_N2, MVCNMF, and MVOCNMF on FEI dataset. The labeling ratio is 30 \(\%\) . Rows are features and columns are data points.
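    A minimal sketch of the visualization in Figures 5 and 6, assuming the fused coefficient matrix has shape (dim, n_samples) with columns as data points; sorting the columns by class label makes any block diagonal structure visible. The plotting details are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_fused_features(H, labels, title="fused features"):
    """Heatmap of the coefficient matrix with columns grouped by label.

    H: (dim, n_samples) fused feature matrix; labels: (n_samples,) integers.
    """
    order = np.argsort(labels)                 # group columns by class
    plt.figure(figsize=(7, 3))
    plt.imshow(H[:, order], aspect="auto", cmap="viridis")
    plt.xlabel("data points (sorted by label)")
    plt.ylabel("features")
    plt.title(title)
    plt.colorbar()
    plt.tight_layout()
    plt.show()
```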

    6 Conclusion

    In this article, a novel semi-supervised MVNMF framework with fusion regularization (FRSMNMF) is proposed. In our work, the discriminative term and the feature alignment term are fused into a single regularizing term, which effectively enhances the learning of discriminative features and reduces the number of hyper-parameters, making the proposed framework easier to tune than existing semi-supervised MVNMFs. Geometrical information is also considered by constructing a graph regularizer for each view. To align the multiple views effectively, two feature scale normalizing strategies are adopted, and two corresponding implementations with iterative optimization schemes are presented. The effectiveness of our methods is evaluated by comparison with several state-of-the-art unsupervised and semi-supervised MVNMFs on six datasets.

    Acknowledgments

    Thanks to Dr. Lijie Ren of Shenzhen Second People’s Hospital for helping to collect and organize the medical dataset used in this article.


    References

    [1]
    H. F. Bassani and A. F. R. Araujo. 2015. Dimension selective self-organizing maps with time-varying structure for subspace and projected clustering. IEEE Transactions on Neural Networks and Learning Systems 26, 3 (2015), 458–471.
    [2]
    Steffen Bickel and Tobias Scheffer. 2004. Multi-view clustering. In Proceedings of the 2004 SIAM International Conference on Data Mining. SIAM, 19–26.
    [3]
    Stephen Boyd and Lieven Vandenberghe. 2004. Convex Optimization. Cambridge University Press.
    [4]
    Deng Cai, Xiaofei He, Jiawei Han, and Thomas S. Huang. 2011. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 8 (2011), 1548–1560.
    [5]
    Hao Cai, Bo Liu, Yanshan Xiao, and LuYue Lin. 2019. Semi-supervised multi-view clustering based on constrained nonnegative matrix factorization. Knowledge-Based Systems 182 (2019), 104798.
    [6]
    Hao Cai, Bo Liu, Yanshan Xiao, and LuYue Lin. 2020. Semi-supervised multi-view clustering based on orthonormality-constrained nonnegative matrix factorization. Information Sciences 536 (2020), 171–184.
    [7]
    Xiao Cai, Feiping Nie, and Heng Huang. 2013. Multi-view k-means clustering on big data. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Citeseer.
    [8]
    Xiaochun Cao, Changqing Zhang, Huazhu Fu, Si Liu, and Hua Zhang. 2015. Diversity-induced multi-view subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 586–594.
    [9]
    Guoqing Chao, Shiliang Sun, and Jinbo Bi. 2021. A survey on multi-view clustering. IEEE Transactions on Artificial Intelligence 2, 2 (2021), 146–168.
    [10]
    Feiqiong Chen, Guopeng Li, Shuaihui Wang, and Zhisong Pan. 2019. Multiview clustering via robust neighboring constraint nonnegative matrix factorization. Mathematical Problems in Engineering 2019 (2019).
    [11]
    Rui Chen, Yongqiang Tang, Wensheng Zhang, and Wenlong Feng. 2022. Deep multi-view semi-supervised clustering with sample pairwise constraints. Neurocomputing 500 (2022), 832–845.
    [12]
    Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms. MIT Press.
    [13]
    Beilei Cui, Hong Yu, Tiantian Zhang, and Siwen Li. 2019. Self-weighted multi-view clustering with deep matrix factorization. In Proceedings of the Asian Conference on Machine Learning. 567–582.
    [14]
    Guosheng Cui and Ye Li. 2022. Nonredundancy regularization based nonnegative matrix factorization with manifold learning for multiview data representation. Information Fusion 82 (2022), 86–98.
    [15]
    Charles Elkan. 2003. Using the triangle inequality to accelerate k-means. In Proceedings of the 20th International Conference on Machine Learning, Vol. 3. 147–153.
    [16]
    Derek Greene and Padraig Cunningham. 2006. Practical solutions to the problem of diagonal dominance in kernel document clustering. In Proceedings of the 23rd International Conference on Machine Learning, Vol. 148. ACM, 377–384.
    [17]
    Menglei Hu and Songcan Chen. 2018. Doubly aligned incomplete multi-view clustering. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI). 2262–2268.
    [18]
    Ling Huang, Hong-Yang Chao, and Chang-Dong Wang. 2019. Multi-view intact space clustering. Pattern Recognition 86 (2019), 344–353.
    [19]
    Xiaoxiang Huang, Guosheng Cui, Dan Wu, and Ye Li. 2020. A semi-supervised approach for early identifying the abnormal carotid arteries using a modified variational autoencoder. In Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine. IEEE, 595–600.
    [20]
    Zhenyu Huang, Peng Hu, Joey Tianyi Zhou, Jiancheng Lv, and Xi Peng. 2020. Partially view-aligned clustering. Advances in Neural Information Processing Systems 33 (2020), 2892–2902.
    [21]
    A. K. Jain, M. N. Murty, and P. J. Flynn. 1999. Data clustering: A review. ACM Computing Surveys. 31, 3 (1999).
    [22]
    Yu Jiang, Jing Liu, Zechao Li, and Hanqing Lu. 2014. Semi-supervised unified latent factor learning with multi-view data. Machine Vision and Applications 25 (2014), 1635–1645.
    [23]
    Martin Lades, Jan C. Vorbruggen, Joachim Buhmann, Jörg Lange, Christoph Von Der Malsburg, Rolf P. Wurtz, and Wolfgang Konen. 1993. Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers 42, 3 (1993), 300–311.
    [24]
    Daniel D. Lee and H. Sebastian Seung. 1999. Learning the parts of objects by non-negative matrix factorization. Nature 401, 6755 (1999), 788–791.
    [25]
    Daniel D. Lee and H. Sebastian Seung. 2000. Algorithms for non-negative matrix factorization. In Proceedings of the Advances in Neural Information Processing Systems. 556–562.
    [26]
    Jianqiang Li, Guoxu Zhou, Yuning Qiu, Yanjiao Wang, Yu Zhang, and Shengli Xie. 2020. Deep graph regularized non-negative matrix factorization for multi-view clustering. Neurocomputing 390 (2020), 108–116.
    [27]
    Naiyao Liang, Zuyuan Yang, Zhenni Li, Weijun Sun, and Shengli Xie. 2020. Multi-view clustering by non-negative matrix factorization with co-orthogonal constraints. Knowledge-Based Systems 194 (2020), 105582.
    [28]
    Naiyao Liang, Zuyuan Yang, Zhenni Li, Shengli Xie, and Chun-Yi Su. 2020. Semi-supervised multi-view clustering with graph-regularized partially shared non-negative matrix factorization. Knowledge-Based Systems 190 (2020), 105185.
    [29]
    Naiyao Liang, Zuyuan Yang, Zhenni Li, Shengli Xie, and Weijun Sun. 2021. Semi-supervised multi-view learning by using label propagation based non-negative matrix factorization. Knowledge-Based Systems 228 (2021), 107244.
    [30]
    Haifeng Liu, Zhaohui Wu, Xuelong Li, Deng Cai, and Thomas S. Huang. 2012. Constrained nonnegative matrix factorization for image representation. Transactions on Pattern Analysis and Machine Intelligence 34, 7 (2012), 1299–1311.
    [31]
    Jing Liu, Yu Jiang, Zechao Li, Zhi-Hua Zhou, and Hanqing Lu. 2014. Partially shared latent factor learning with multiview data. IEEE Transactions on Neural Networks and Learning Systems 26, 6 (2014), 1233–1246.
    [32]
    Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. 2013. Multi-view clustering via joint nonnegative matrix factorization. In Proceedings of the 2013 SIAM International Conference on Data Mining. SIAM, 252–260.
    [33]
    László Lovász and Michael D. Plummer. 2009. Matching Theory. Vol. 367. American Mathematical Society.
    [34]
    Chun-Ta Lu, Lifang He, Hao Ding, Bokai Cao, and Philip S. Yu. 2018. Learning from multi-view multi-way data via structural factorization machines. In Proceedings of the 2018 World Wide Web Conference. 1593–1602.
    [35]
    Peng Luo, Jinye Peng, Ziyu Guan, and Jianping Fan. 2018. Dual regularized multi-view non-negative matrix factorization for clustering. Neurocomputing 294 (2018), 1–11.
    [36]
    Feiping Nie, Guohao Cai, Jing Li, and Xuelong Li. 2017. Auto-weighted multi-view learning for image clustering and semi-supervised classification. IEEE Transactions on Image Processing 27, 3 (2017), 1501–1511.
    [37]
    Timo Ojala, Matti Pietikäinen, and Topi Mäenpää. 2002. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 7 (2002), 971–987.
    [38]
    Weihua Ou, Shujian Yu, Gai Li, Jian Lu, Kesheng Zhang, and Gang Xie. 2016. Multi-view non-negative matrix factorization by patch alignment framework with view consistency. Neurocomputing 204 (2016), 116–124.
    [39]
    Nishant Rai, Sumit Negi, Santanu Chaudhury, and Om Deshmukh. 2016. Partial multi-view clustering using graph regularized NMF. In Proceedings of the International Conference on Pattern Recognition (ICPR). 2192–2197.
    [40]
    Farial Shahnaz, Michael W. Berry, V. Paul Pauca, and Robert J. Plemmons. 2006. Document clustering using non-negative matrix factorization. Information Processing and Management 42, 2 (2006), 373–386.
    [41]
    Weixiang Shao, Lifang He, and S. Yu Philip. 2015. Multiple incomplete views clustering via weighted nonnegative matrix factorization with \(L_{2, 1}\) regularization. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 318–334.
    [42]
    Jing Wang, Feng Tian, Hongchuan Yu, Chang Hong Liu, Kun Zhan, and Xiao Wang. 2017. Diverse non-negative matrix factorization for multiview data representation. IEEE Transactions on Cybernetics 48, 9 (2017), 2620–2632.
    [43]
    Jing Wang, Xiao Wang, Feng Tian, Chang Hong Liu, Hongchuan Yu, and Yanbei Liu. 2016. Adaptive multi-view semi-supervised nonnegative matrix factorization. In Proceedings of the International Conference on Neural Information Processing. Springer, 435–444.
    [44]
    Senhong Wang, Jiangzhong Cao, Fangyuan Lei, Qingyun Dai, Shangsong Liang, and Bingo Wing-Kuen Ling. 2021. Semi-supervised multi-view clustering with weighted anchor graph embedding. Computational Intelligence and Neuroscience 2021 (2021).
    [45]
    Xiaobo Wang, Xiaojie Guo, Zhen Lei, Changqing Zhang, and Stan Z. Li. 2017. Exclusivity-consistency regularized multi-view subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 923–931.
    [46]
    Xiumei Wang, Tianzhen Zhang, and Xinbo Gao. 2019. Multiview clustering based on non-negative matrix factorization and pairwise measurements. IEEE Transactions on Cybernetics 49, 9 (2019), 3333–3346.
    [47]
    Chang Xu, Dacheng Tao, and Chao Xu. 2013. A survey on multi-view learning. CoRR abs/1304.5634 (2013).
    [48]
    Jinglin Xu, Junwei Han, and Feiping Nie. 2016. Discriminatively embedded k-means for multi-view clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5356–5364.
    [49]
    Wei Xu, Xin Liu, and Yihong Gong. 2003. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 267–273.
    [50]
    Xiaojun Yang, Weizhong Yu, Rong Wang, Guohao Zhang, and Feiping Nie. 2020. Fast spectral clustering learning with hierarchical bipartite graph for large-scale data. Pattern Recognition Letters 130 (2020), 345–352.
    [51]
    Yan Yang and Hao Wang. 2018. Multi-view clustering: A survey. Big Data Mining and Analytics 1, 2 (2018), 83–107.
    [52]
    Zuyuan Yang, Naiyao Liang, Wei Yan, Zhenni Li, and Shengli Xie. 2020. Uniform distribution non-negative matrix factorization for multiview clustering. IEEE Transactions on Cybernetics 51, 6 (2020), 3249–3262.
    [53]
    Zuyuan Yang, Yong Xiang, Kan Xie, and Yue Lai. 2016. Adaptive method for nonsmooth nonnegative matrix factorization. IEEE Transactions on Neural Networks and Learning Systems 28, 4 (2016), 948–960.
    [54]
    Zuyuan Yang, Huimin Zhang, Naiyao Liang, Zhenni Li, and Weijun Sun. 2023. Semi-supervised multi-view clustering by label relaxation based non-negative matrix factorization. The Visual Computer 39, 4 (2023), 1409–1422.
    [55]
    Shan Zeng, Xiuying Wang, Hui Cui, Chaojie Zheng, and David Feng. 2017. A unified collaborative multikernel fuzzy clustering for multiview data. IEEE Transactions on Fuzzy Systems 26, 3 (2017), 1671–1687.
    [56]
    Hongyuan Zha, Xiaofeng He, Chris Ding, Ming Gu, and Horst D. Simon. 2002. Spectral relaxation for k-means clustering. In Proceedings of the Advances in Neural Information Processing Systems. 1057–1064.
    [57]
    Changqing Zhang, Qinghua Hu, Huazhu Fu, Pengfei Zhu, and Xiaochun Cao. 2017. Latent multi-view subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4279–4287.
    [58]
    DongPing Zhang, YiHao Luo, YuYuan Yu, QiBin Zhao, and GuoXu Zhou. 2022. Semi-supervised multi-view clustering with dual hypergraph regularized partially shared non-negative matrix factorization. Science China Technological Sciences 65, 6 (2022), 1349–1365.
    [59]
    Zheng Zhang, Li Liu, Fumin Shen, Heng Tao Shen, and Ling Shao. 2018. Binary multi-view clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 7 (2018), 1774–1782.
    [60]
    Handong Zhao, Zhengming Ding, and Yun Fu. 2017. Multi-view clustering via deep matrix factorization. In Proceedings of the 31st AAAI Conference on Artificial Intelligence.
    [61]
    Jing Zhao, Xijiong Xie, Xin Xu, and Shiliang Sun. 2017. Multi-view learning overview: Recent progress and new challenges. Information Fusion 38 (2017), 43–54.
    [62]
    Wei Zhao, Cai Xu, Ziyu Guan, and Ying Liu. 2020. Multiview concept learning via deep matrix factorization. IEEE Transactions on Neural Networks and Learning Systems 32, 2 (2020), 814–825.
    [63]
    X. Zhu, S. Zhang, Y. Li, J. Zhang, L. Yang, and Y. Fang. 2019. Low-rank sparse subspace for spectral clustering. IEEE Transactions on Knowledge and Data Engineering 31, 8 (2019), 1532–1543.
    [64]
    Linlin Zong, Xianchao Zhang, Long Zhao, Hong Yu, and Qianli Zhao. 2017. Multi-view clustering via multi-manifold regularized non-negative matrix factorization. Neural Networks 88 (2017), 74–89.
