Low-Density Cluster Separators for Large, High-Dimensional, Mixed and Non-Linearly Separable Data.

Associated organisational unit

STOR-i Centre for Doctoral Training

Electronic data

2018katieyatesphd
Final published version, 14.7 MB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Text available via DOI:

https://doi.org/10.17635/lancaster/thesis/204
Final published version

Research output: Thesis › Doctoral Thesis

Published

Katie Yates

Publication date	2018
Number of pages	218
Qualification	PhD
Awarding Institution	Lancaster University
Supervisors/Advisors	Pavlidis, Nicos (Supervisor); Sherlock, Chris (Supervisor)
Place of Publication	Lancaster
Publisher	Lancaster University
Original language	English

Abstract

The location of groups of similar observations (clusters) in data is a well-studied problem,
and has many practical applications. There is a wide range of approaches to clustering,
which rely on different definitions of similarity, and are appropriate for datasets with different
characteristics. Despite a rich literature, there exist a number of open problems in
clustering, and limitations to existing algorithms.
This thesis develops methodology for clustering high-dimensional, mixed datasets with
complex clustering structures using low-density cluster separators. Each separator
bi-partitions a dataset along a cluster boundary that passes through a region of minimal
density, separating the regions of high probability density associated with clusters. The
bi-partitions arising from a succession of minimum density cluster separators are combined
using divisive hierarchical and partitional algorithms to locate a complete clustering while
estimating the number of clusters.
The proposed algorithms locate cluster separators using one-dimensional arbitrarily oriented
subspaces, circumventing the challenges associated with clustering in high-dimensional
spaces. This requires continuous observations; thus, to extend the applicability of the proposed
algorithms to mixed datasets, methods for producing an appropriate continuous
representation of datasets containing non-continuous features are investigated. The exact
evaluation of the density intersected by a cluster boundary is restricted to linear separators.
This limitation is lifted by a non-linear mapping of the original observations into a feature
space, in which a linear separator permits the correct identification of non-linearly separable
clusters in the original dataset.
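
To illustrate why a linear separator allows the intersected density to be evaluated exactly, the following sketch assumes (the abstract does not state this) that the density is estimated with an isotropic Gaussian kernel of bandwidth $h$ on observations $x_1, \dots, x_n \in \mathbb{R}^d$. The symbols $v$ (a unit vector normal to the hyperplane) and $b$ (its offset) are notation introduced here for illustration. Under that assumption, integrating the density estimate over the hyperplane reduces to a one-dimensional kernel density estimate of the projected data:

```latex
\hat{p}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{(2\pi h^{2})^{d/2}}
             \exp\!\left( -\frac{\lVert x - x_i \rVert^{2}}{2h^{2}} \right),
\qquad
\int_{\{x \,:\, v^{\top} x = b\}} \hat{p}(x)\, \mathrm{d}x
  = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\sqrt{2\pi h^{2}}}
    \exp\!\left( -\frac{(b - v^{\top} x_i)^{2}}{2h^{2}} \right),
\qquad \lVert v \rVert = 1 .
```

The right-hand side depends on the data only through the projections $v^{\top} x_i$, so each evaluation costs $O(n)$ regardless of the dimension $d$. No comparable closed form is available for a general curved boundary.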
In large, high-dimensional datasets, searching for a one-dimensional subspace that results
in a minimum density separator is computationally expensive. Therefore, a computationally
efficient approach to low-density cluster separation using approximately optimal
projection directions is proposed, which searches over a collection of one-dimensional random
projections for an appropriate subspace for cluster identification. The proposed approaches
produce high-quality partitions that are competitive with well-established and
state-of-the-art algorithms.
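
The idea of searching over random one-dimensional projections for a low-density separator can be sketched roughly as follows. This is a minimal illustration, not the thesis's algorithm: the function names, the Silverman-style bandwidth, the restriction of candidate split points to the central 80% of the projected data, and the relative-density score are all assumptions made here to keep the example short.

```python
import numpy as np


def gaussian_kde_1d(points, grid, bandwidth):
    """Evaluate a 1-D Gaussian kernel density estimate of `points` on `grid`."""
    z = (grid[:, None] - points[None, :]) / bandwidth
    norm = len(points) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / norm


def best_split_on_projection(projected, n_grid=256, tail=0.1):
    """Return (split_point, score) for one projection.

    The score is the lowest density found between the `tail` and `1 - tail`
    quantiles, divided by the peak density (lower means a clearer gap).
    Both the quantile restriction and the score are illustrative assumptions.
    """
    n = projected.size
    # Silverman-style rule-of-thumb bandwidth (an assumption of this sketch).
    bandwidth = 0.9 * projected.std() * n ** (-0.2)
    grid = np.linspace(projected.min(), projected.max(), n_grid)
    dens = gaussian_kde_1d(projected, grid, bandwidth)
    lo, hi = np.quantile(projected, [tail, 1.0 - tail])
    candidates = np.flatnonzero((grid >= lo) & (grid <= hi))
    i = candidates[np.argmin(dens[candidates])]
    return grid[i], dens[i] / dens.max()


def random_projection_separator(X, n_projections=200, seed=0):
    """Pick the random unit direction whose best split has the lowest score."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_projections):
        v = rng.standard_normal(X.shape[1])
        v /= np.linalg.norm(v)  # unit-length projection direction
        split, score = best_split_on_projection(X @ v)
        if best is None or score < best[2]:
            best = (v, split, score)
    v, split, _ = best
    labels = (X @ v > split).astype(int)  # bi-partition induced by the separator
    return v, split, labels
```

Calling `random_projection_separator(X)` on an `(n, d)` array returns the chosen direction, the split point along it, and a 0/1 label per observation. Applying the same routine recursively to each side would give a simple divisive hierarchy, although a stopping rule for estimating the number of clusters is not shown here.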
