Subspace Clustering of Very Sparse High-Dimensional Data

Associated organisational units

Electronic data

Peng2018_final
Rights statement: ©2018 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Accepted author manuscript, 110 KB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Keywords

Subspace Clustering, Principal Angles, High-dimensionality, Short texts

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Subspace Clustering of Very Sparse High-Dimensional Data. / Peng, Hankui ; Pavlidis, Nicos Georgios ; Eckley, Idris Arthur et al.
2018 IEEE International Conference on Big Data (Big Data). IEEE, 2019. p. 3780-3783.

Harvard

Peng, H, Pavlidis, NG, Eckley, IA & Tsalamanis, I 2019, Subspace Clustering of Very Sparse High-Dimensional Data. in 2018 IEEE International Conference on Big Data (Big Data). IEEE, pp. 3780-3783, Advances in High-Dimensional Big Data in conjunction with the 2018 IEEE International Conference on Big Data (IEEE BigData 2018), Seattle, United States, 10/12/18.

APA

Peng, H., Pavlidis, N. G., Eckley, I. A., & Tsalamanis, I. (2019). Subspace Clustering of Very Sparse High-Dimensional Data. In 2018 IEEE International Conference on Big Data (Big Data) (pp. 3780-3783). IEEE.

Vancouver

Peng H, Pavlidis NG, Eckley IA, Tsalamanis I. Subspace Clustering of Very Sparse High-Dimensional Data. In 2018 IEEE International Conference on Big Data (Big Data). IEEE. 2019. p. 3780-3783

Author

Peng, Hankui ; Pavlidis, Nicos Georgios ; Eckley, Idris Arthur et al. / Subspace Clustering of Very Sparse High-Dimensional Data. 2018 IEEE International Conference on Big Data (Big Data). IEEE, 2019. pp. 3780-3783

Bibtex

@inproceedings{4db983bebf864925a3675adc4bbd4f53,

title = "Subspace Clustering of Very Sparse High-Dimensional Data",

abstract = "In this paper we consider the problem of clustering collections of very short texts using subspace clustering. This problem arises in many applications such as product categorisation, fraud detection, and sentiment analysis. The main challenge lies in the fact that the vectorial representation of short texts is both high-dimensional, due to the large number of unique terms in the corpus, and extremely sparse, as each text contains a very small number of words with no repetition. We propose a new, simple subspace clustering algorithm that relies on linear algebra to cluster such datasets. Experimental results on identifying product categories from product names obtained from the US Amazon website indicate that the algorithm can be competitive against state-of-the-art clustering algorithms.",

keywords = "Subspace Clustering, Principal Angles, High-dimensionality, Short texts",

author = "Hankui Peng and Pavlidis, {Nicos Georgios} and Eckley, {Idris Arthur} and Ioannis Tsalamanis",

note = "{\textcopyright}2018 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.; Advances in High-Dimensional Big Data in conjunction with the 2018 IEEE International Conference on Big Data (IEEE BigData 2018) ; Conference date: 10-12-2018 Through 13-12-2018",

year = "2019",

month = jan,

day = "24",

language = "English",

pages = "3780--3783",

booktitle = "2018 IEEE International Conference on Big Data (Big Data)",

publisher = "IEEE",

url = "https://sites.google.com/site/adhdbigdata3/home",

}

RIS

TY  - GEN
T1  - Subspace Clustering of Very Sparse High-Dimensional Data
AU  - Peng, Hankui
AU  - Pavlidis, Nicos Georgios
AU  - Eckley, Idris Arthur
AU  - Tsalamanis, Ioannis
N1  - ©2018 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
PY  - 2019/1/24
Y1  - 2019/1/24
N2  - In this paper we consider the problem of clustering collections of very short texts using subspace clustering. This problem arises in many applications such as product categorisation, fraud detection, and sentiment analysis. The main challenge lies in the fact that the vectorial representation of short texts is both high-dimensional, due to the large number of unique terms in the corpus, and extremely sparse, as each text contains a very small number of words with no repetition. We propose a new, simple subspace clustering algorithm that relies on linear algebra to cluster such datasets. Experimental results on identifying product categories from product names obtained from the US Amazon website indicate that the algorithm can be competitive against state-of-the-art clustering algorithms.
AB  - In this paper we consider the problem of clustering collections of very short texts using subspace clustering. This problem arises in many applications such as product categorisation, fraud detection, and sentiment analysis. The main challenge lies in the fact that the vectorial representation of short texts is both high-dimensional, due to the large number of unique terms in the corpus, and extremely sparse, as each text contains a very small number of words with no repetition. We propose a new, simple subspace clustering algorithm that relies on linear algebra to cluster such datasets. Experimental results on identifying product categories from product names obtained from the US Amazon website indicate that the algorithm can be competitive against state-of-the-art clustering algorithms.
KW  - Subspace Clustering
KW  - Principal Angles
KW  - High-dimensionality
KW  - Short texts
M3  - Conference contribution/Paper
SP  - 3780
EP  - 3783
BT  - 2018 IEEE International Conference on Big Data (Big Data)
PB  - IEEE
T2  - Advances in High-Dimensional Big Data in conjunction with the 2018 IEEE International Conference on Big Data (IEEE BigData 2018)
Y2  - 10 December 2018 through 13 December 2018
ER  -
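The abstract says that short texts become high-dimensional and extremely sparse vectors. The keywords also list principal angles. The Python below is a minimal sketch of those two ingredients only. It is not the authors' algorithm, which this record does not describe. It rests on these illustrative assumptions:

- Texts are split on whitespace in lowercase and stored as binary presence/absence vectors.
- Principal angles between the subspaces spanned by two groups of texts are computed with the standard QR-plus-SVD method.
- The function names `binary_term_matrix` and `principal_angles`, the example product names and the NumPy dependency are all hypothetical.

```python
import numpy as np


def binary_term_matrix(texts):
    """Return a 0/1 document-term matrix and its sorted vocabulary.

    Assumption (not from the paper): lowercase whitespace tokenisation; a word
    repeated within one text is recorded once, matching the abstract's setting
    of short texts with no repetition.
    """
    tokenised = [set(text.lower().split()) for text in texts]
    vocab = sorted(set().union(*tokenised))
    column = {word: j for j, word in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for i, words in enumerate(tokenised):
        for word in words:
            X[i, column[word]] = 1.0
    return X, vocab


def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B.

    Standard QR + SVD method: orthonormalise each basis, then the singular
    values of Qa^T Qb are the cosines of the principal angles.
    Assumes A and B have full column rank.
    """
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip guards against rounding pushing a cosine slightly outside [-1, 1].
    return np.arccos(np.clip(cosines, -1.0, 1.0))


# Hypothetical product names, invented for illustration.
texts = ["red cotton shirt", "blue cotton shirt", "usb charging cable", "usb c cable"]
X, vocab = binary_term_matrix(texts)
print(X.shape, f"density={X.mean():.3f}")

# Each group of texts spans a subspace of term space (texts as columns).
clothing = X[[0, 1]].T
cables = X[[2, 3]].T
print(principal_angles(clothing, cables))    # no shared words: both angles are pi/2
print(principal_angles(clothing, X[[0]].T))  # text 0 lies in the clothing subspace: angle ~0
```

Real corpora have vocabularies of thousands of terms, so their density would be far lower than in this toy example. At that scale a sparse matrix type would typically replace the dense NumPy array. A small principal angle means a text, or a group of texts, lies close to another group's subspace. Angles near pi/2 mean the two share almost no terms.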
