Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Large-scale spectral clustering

General · Introduced 2000 · 5 papers
Source Paper

Description

Spectral Clustering

Spectral clustering aims to partition the data points into $k$ clusters using the spectrum of the graph Laplacian. Given a dataset $X$ with $N$ data points, a spectral clustering algorithm first constructs a similarity matrix $W$, where $w_{ij}$ indicates the similarity between data points $x_i$ and $x_j$ under a chosen similarity measure.

Let $L = D - W$, where $L$ is called the graph Laplacian and $D$ is a diagonal matrix with $d_{ii} = \sum_{j=1}^{N} w_{ij}$. The objective function of spectral clustering can be formulated in terms of the graph Laplacian as follows: \begin{equation} \label{eq:SC_obj} \min_{U} \operatorname{tr}\left(U^{T} L U\right), \quad \text{s.t.} \quad U^{T} U = I, \end{equation} where $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. The rows of the matrix $U$ are the low-dimensional embedding of the original data points. Generally, spectral clustering computes $U$ as the bottom $k$ eigenvectors of $L$, and finally applies $k$-means on $U$ to obtain the clustering result.
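The steps above can be sketched in a few lines of NumPy. This is a minimal, illustrative implementation: the Gaussian (RBF) similarity, the unnormalized Laplacian, and SciPy's `kmeans2` are assumed choices, not prescribed by the description.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clustering(X, k, sigma=1.0, seed=0):
    """Naive spectral clustering; builds the full N x N similarity matrix."""
    # Gaussian similarity matrix W from pairwise squared distances
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    W = np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)

    # Graph Laplacian L = D - W, with D the diagonal degree matrix
    D = np.diag(W.sum(axis=1))
    L = D - W

    # Spectral embedding: bottom-k eigenvectors of L (eigh sorts ascending)
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]

    # k-means on the rows of U yields the final cluster labels
    _, labels = kmeans2(U, k, minit='++', seed=seed)
    return labels
```

Note the cost: the dense `W` alone takes $O(N^2)$ memory, which is exactly the bottleneck the large-scale variants below address.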

Large-scale Spectral Clustering

To capture the relationships between all data points in $X$, conventional spectral clustering must construct an $N \times N$ similarity matrix, which costs $O(N^2 d)$ time and $O(N^2)$ memory and is not feasible for large-scale clustering tasks. Instead of a full similarity matrix, many accelerated spectral clustering methods use a similarity sub-matrix that represents each data point by its cross-similarity to a set of representative data points (i.e., landmarks) under some similarity measure: \begin{equation} \label{eq:cross-similarity} B = \Phi(X, R), \end{equation} where $R = \{r_1, r_2, \dots, r_p\}$ ($p \ll N$) is a set of landmarks with the same dimension as $X$, $\Phi(\cdot)$ denotes a similarity measure, and $B \in \mathbb{R}^{N \times p}$ is the similarity sub-matrix representing $X \in \mathbb{R}^{N \times d}$ with respect to $R \in \mathbb{R}^{p \times d}$.
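Constructing $B = \Phi(X, R)$ can be sketched as follows. Uniform random sampling of landmarks and a Gaussian kernel are assumed here for illustration; $k$-means centers are another common landmark choice, and the description above does not fix either.

```python
import numpy as np

def cross_similarity(X, p=50, sigma=1.0, seed=0):
    """Build the N x p cross-similarity sub-matrix B = Phi(X, R)."""
    rng = np.random.default_rng(seed)
    # Landmarks R: p points sampled uniformly from X (p << N)
    R = X[rng.choice(len(X), size=p, replace=False)]

    # Squared distances from every point to every landmark: O(Npd) time,
    # O(Np) memory -- versus O(N^2 d) / O(N^2) for the full matrix
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(R**2, axis=1)[None, :]
          - 2.0 * X @ R.T)
    B = np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))
    return B, R
```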

For large-scale spectral clustering using such a sub-matrix, a symmetric similarity matrix $W$ can be designed as \begin{equation} \label{eq:WusedB} W = \left[\begin{array}{cc} \mathbf{0} & B \\ B^{T} & \mathbf{0} \end{array}\right]. \end{equation} The size of the matrix $W$ is $(N+p) \times (N+p)$. Taking advantage of the bipartite structure, fast eigen-decomposition methods can then be used to obtain the spectral embedding. Finally, $k$-means is conducted on the embedding to obtain the clustering result.
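One common way to exploit the bipartite structure is to never form $W$ at all: the embedding of the $N$ data points can be read off the top-$k$ left singular vectors of the degree-normalized $B$, costing roughly $O(Np^2)$ instead of an eigen-decomposition of the $(N+p) \times (N+p)$ matrix. A sketch under that assumption (it presumes every row and column of $B$ has a positive sum):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def bipartite_spectral_clustering(B, k, seed=0):
    """Cluster via the bipartite graph W = [[0, B], [B^T, 0]] without forming W."""
    d1 = B.sum(axis=1)   # degrees of the N data points
    d2 = B.sum(axis=0)   # degrees of the p landmarks
    # Degree-normalized sub-matrix: D1^{-1/2} B D2^{-1/2}
    Bn = B / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]

    # Top-k left singular vectors of Bn = spectral embedding of the data points
    U, _, _ = np.linalg.svd(Bn, full_matrices=False)
    emb = U[:, :k]

    # k-means on the embedding gives the final clusters
    _, labels = kmeans2(emb, k, minit='++', seed=seed)
    return labels
```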

The clustering result is directly related to the quality of $B$, which consists of the similarities between data points and landmarks. Thus, the landmark selection strategy is crucial to clustering performance.

Papers Using This Method

- Clustering and classification of low-dimensional data in explicit feature map domain: intraoperative pixel-wise diagnosis of adenocarcinoma of a colon in a liver (2022-03-07)
- LSEC: Large-scale spectral ensemble clustering (2021-06-18)
- Ultra-Scalable Spectral Clustering and Ensemble Clustering (2019-03-04)
- Large-scale spectral clustering using diffusion coordinates on landmark-based bipartite graphs (2018-06-01)