Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY  - JOUR
T1  - Incremental estimation of low-density separating hyperplanes for clustering large data sets
AU  - Hofmeyr, David P.
PY  - 2023/7/31
Y1  - 2023/7/31
N2  - An efficient unsupervised method for obtaining low-density hyperplane separators is proposed. The method is based on a modified stochastic gradient descent applied on a convolution of the empirical distribution function with a smoothing kernel. Low-density hyperplanes are motivated by the fact that they avoid intersecting high density regions, and so tend to pass between high density clusters, thus separating them from one another, while keeping the individual clusters intact. Multiple hyperplanes can be combined in a hierarchical model to obtain a complete clustering solution. A simple post-processing of solutions induced by large collections of hyperplanes yields an efficient and accurate clustering method, capable of automatically selecting the number of clusters. Experiments show that the proposed method is highly competitive in terms of both speed and accuracy when compared with relevant benchmarks. Code is available in the form of an R package at https://github.com/DavidHofmeyr/iMDH
AB  - An efficient unsupervised method for obtaining low-density hyperplane separators is proposed. The method is based on a modified stochastic gradient descent applied on a convolution of the empirical distribution function with a smoothing kernel. Low-density hyperplanes are motivated by the fact that they avoid intersecting high density regions, and so tend to pass between high density clusters, thus separating them from one another, while keeping the individual clusters intact. Multiple hyperplanes can be combined in a hierarchical model to obtain a complete clustering solution. A simple post-processing of solutions induced by large collections of hyperplanes yields an efficient and accurate clustering method, capable of automatically selecting the number of clusters. Experiments show that the proposed method is highly competitive in terms of both speed and accuracy when compared with relevant benchmarks. Code is available in the form of an R package at https://github.com/DavidHofmeyr/iMDH
U2  - 10.1016/j.patcog.2023.109471
DO  - 10.1016/j.patcog.2023.109471
M3  - Journal article
VL  - 139
JO  - Pattern Recognition
JF  - Pattern Recognition
SN  - 0031-3203
M1  - 109471
ER  -
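The abstract describes searching for a hyperplane that passes through a region of low kernel-smoothed density, using stochastic gradient descent. The Python sketch below illustrates only that general idea. It uses plain mini-batch SGD with a Gaussian kernel on the projected data, not the paper's modified SGD. The bandwidth `h`, the learning rate, the constraint keeping `b` within `alpha` projected standard deviations of the projected mean, and both function names are hypothetical choices. None of this is the API of the iMDH R package.

```python
import numpy as np


def projected_density(X, v, b, h):
    """Gaussian-kernel density estimate of the projections X @ v, evaluated at b."""
    u = (X @ v - b) / h
    return float(np.mean(np.exp(-0.5 * u**2)) / (h * np.sqrt(2 * np.pi)))


def sgd_low_density_hyperplane(X, h=0.3, lr=0.05, alpha=1.0, batch=32,
                               epochs=50, seed=0):
    """Hypothetical sketch: plain mini-batch SGD on the kernel-smoothed density
    at the hyperplane {x : v @ x = b}, with ||v|| = 1 and b kept within alpha
    standard deviations of the projected mean (an assumed constraint)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    # Start from the first principal direction, split at the projected mean.
    v = np.linalg.svd(X - mu, full_matrices=False)[2][0]
    b = float(mu @ v)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch)):
            xb = X[idx]
            u = (xb @ v - b) / h
            phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
            # phi(u) / h has derivative -u * phi(u) / h**2 with respect to the
            # projection x @ v; the chain rule gives the gradients in v and b.
            w = -u * phi / h**2
            grad_v = (w[:, None] * xb).mean(axis=0)
            grad_b = -w.mean()
            v = v - lr * grad_v
            v = v / np.linalg.norm(v)  # project back onto the unit sphere
            b = b - lr * grad_b
            # Projected mean and sd come from precomputed moments,
            # so no full pass over X is needed per step.
            m, s = mu @ v, np.sqrt(v @ Sigma @ v)
            b = float(np.clip(b, m - alpha * s, m + alpha * s))
    return v, b
```

Points with `X @ v > b` fall on one side of the hyperplane. Combining several such splits hierarchically into a full clustering, as the abstract describes, is not shown here.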