Electronic data

  • 2020pengphd

    Final published version, 2.2 MB, PDF document

Text available via DOI: 10.17635/lancaster/thesis/1206

Subspace Clustering and Active Learning with Constraints

Research output: Thesis › Doctoral Thesis

Published

Standard

Subspace Clustering and Active Learning with Constraints. / Peng, Hankui.

Lancaster University, 2020. 201 p.

Bibtex

@phdthesis{72bc1315d40d4f02b2c5ab960155e730,
title = "Subspace Clustering and Active Learning with Constraints",
abstract = "Data representations can often be high-dimensional, whether it is due to the large number of collected / recorded features or due to how the data sources (e.g. images, texts) are processed. It is often the case that the main structure of the data can be summarised well in a lower dimensional subspace or multiple lower dimensional subspaces. Subspace clustering addresses the problem of simultaneously uncovering multiple subspace structures in the data and grouping the data according to their underlying subspace structures. The first contribution of this thesis is the development of a Subspace Clustering with Active Learning (SCAL) framework that is designed for Subspace Clustering. This framework allows clustering performance to improve in an effective and efficient manner over time, with the need to query only a small amount of labelling information. It also has the potential to be applied to more general subspace clustering methods, which has been further explored and developed in our next methodological contribution. The second contribution of this thesis is a unified active learning and constrained clustering framework for spectral-based subspace clustering methods. In this work, we propose a spectral-based subspace clustering methodology named Weighted Sparse Simplex Representation (WSSR). It has been demonstrated to have favourable performance against state-of-the-art spectral-based subspace clustering methods on both synthetic and real data. We also propose a flexible weighting scheme that can incorporate external information into the problem formulation, which leads to a constrained clustering extension of WSSR. We show that it can be applied in conjunction with our previously proposed SCAL strategy when labelling information can be queried sequentially. The third contribution of this thesis is the development of an algebraic subspace clustering methodology – Minimum Angle Clustering (MAC). It is motivated by the application of clustering Amazon products based on their titles when represented using the TF-IDF matrix, which is both sparse and high-dimensional. The proposed methodology is composed of two stages. In the first stage, it identifies a large number of subspaces in the data through the Reduced Row Echelon Form technique. In the second stage, we propose a new subspace proximity measure to construct an affinity matrix for the formed subspaces before spectral clustering is applied to obtain the final cluster labels. The proposed methodology has been shown to enjoy competitive performance against a number of well-established subspace clustering and document clustering techniques on the application of clustering Amazon product names.",
author = "Hankui Peng",
year = "2020",
month = dec,
day = "9",
doi = "10.17635/lancaster/thesis/1206",
language = "English",
publisher = "Lancaster University",
school = "Lancaster University",

}

RIS

TY - THES

T1 - Subspace Clustering and Active Learning with Constraints

AU - Peng, Hankui

PY - 2020/12/9

Y1 - 2020/12/9

N2 - Data representations can often be high-dimensional, whether it is due to the large number of collected / recorded features or due to how the data sources (e.g. images, texts) are processed. It is often the case that the main structure of the data can be summarised well in a lower dimensional subspace or multiple lower dimensional subspaces. Subspace clustering addresses the problem of simultaneously uncovering multiple subspace structures in the data and grouping the data according to their underlying subspace structures. The first contribution of this thesis is the development of a Subspace Clustering with Active Learning (SCAL) framework that is designed for Subspace Clustering. This framework allows clustering performance to improve in an effective and efficient manner over time, with the need to query only a small amount of labelling information. It also has the potential to be applied to more general subspace clustering methods, which has been further explored and developed in our next methodological contribution. The second contribution of this thesis is a unified active learning and constrained clustering framework for spectral-based subspace clustering methods. In this work, we propose a spectral-based subspace clustering methodology named Weighted Sparse Simplex Representation (WSSR). It has been demonstrated to have favourable performance against state-of-the-art spectral-based subspace clustering methods on both synthetic and real data. We also propose a flexible weighting scheme that can incorporate external information into the problem formulation, which leads to a constrained clustering extension of WSSR. We show that it can be applied in conjunction with our previously proposed SCAL strategy when labelling information can be queried sequentially. The third contribution of this thesis is the development of an algebraic subspace clustering methodology – Minimum Angle Clustering (MAC). It is motivated by the application of clustering Amazon products based on their titles when represented using the TF-IDF matrix, which is both sparse and high-dimensional. The proposed methodology is composed of two stages. In the first stage, it identifies a large number of subspaces in the data through the Reduced Row Echelon Form technique. In the second stage, we propose a new subspace proximity measure to construct an affinity matrix for the formed subspaces before spectral clustering is applied to obtain the final cluster labels. The proposed methodology has been shown to enjoy competitive performance against a number of well-established subspace clustering and document clustering techniques on the application of clustering Amazon product names.

U2 - 10.17635/lancaster/thesis/1206

DO - 10.17635/lancaster/thesis/1206

M3 - Doctoral Thesis

PB - Lancaster University

ER -