Stochastic Neighbourhood Components Analysis

Management Science

Associated organisational units

Electronic data

SNCA_accepted_manuscript.pdf
Accepted author manuscript, 4.03 MB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Text available via DOI:

https://doi.org/10.1287/ijds.2023.0018
Final published version

Research output: Contribution to Journal/Magazine › Journal article › peer-review

E-pub ahead of print

Standard

Stochastic Neighbourhood Components Analysis. / Laidler, Graham; Morgan, Lucy; Pavlidis, Nicos et al.
In: INFORMS Journal on Data Science, 05.05.2025.

Harvard

Laidler, G, Morgan, L, Pavlidis, N & Nelson, B 2025, 'Stochastic Neighbourhood Components Analysis', INFORMS Journal on Data Science. https://doi.org/10.1287/ijds.2023.0018

APA

Laidler, G., Morgan, L., Pavlidis, N., & Nelson, B. (2025). Stochastic Neighbourhood Components Analysis. INFORMS Journal on Data Science. Advance online publication. https://doi.org/10.1287/ijds.2023.0018

Vancouver

Laidler G, Morgan L, Pavlidis N, Nelson B. Stochastic Neighbourhood Components Analysis. INFORMS Journal on Data Science. 2025 May 5. Epub 2025 May 5. doi: 10.1287/ijds.2023.0018

Author

Laidler, Graham; Morgan, Lucy; Pavlidis, Nicos et al. / Stochastic Neighbourhood Components Analysis. In: INFORMS Journal on Data Science. 2025.

Bibtex

@article{8c510b4d88724d7bbe902996c7dc1d7c,
title = "Stochastic Neighbourhood Components Analysis",
abstract = "Distance metric learning is a fundamental task in data mining and is known to enhance the performance of various distance-based algorithms. In this paper, we consider stochastic training data in which repeated feature vectors can belong to different classes, a scenario in which existing methods of metric learning are known to struggle. This type of data is common in stochastic simulations, where multidimensional, recurrent system states are subject to inherent randomness. Classification models on such high-resolution simulation-generated data play a critical role in real-time decision making across diverse applications. This paper presents and implements a stochastic version of the popular neighbourhood components analysis. We demonstrate its behaviour on stochastic data using simulation models and reveal its advantages when used for nearest neighbour classification. Meanwhile, the assumptions of stochastic labelling and repeated feature vectors extend to data from various domains, suggesting that the method can attain broad impact. For example, beyond its applications to system control and decision making with digital twin simulation, it may enhance the analysis of data from sensor networks, recommender systems, and crowdsourced platforms, where stochasticity and recurring feature patterns are typical.",
author = "Graham Laidler and Lucy Morgan and Nicos Pavlidis and Barry Nelson",
year = "2025",
month = may,
day = "5",
doi = "10.1287/ijds.2023.0018",
language = "English",
journal = "INFORMS Journal on Data Science",
issn = "2694-4030",
publisher = "INFORMS Institute for Operations Research and the Management Sciences",
}

RIS

TY - JOUR
T1 - Stochastic Neighbourhood Components Analysis
AU - Laidler, Graham
AU - Morgan, Lucy
AU - Pavlidis, Nicos
AU - Nelson, Barry
PY - 2025/5/5
Y1 - 2025/5/5
N2 - Distance metric learning is a fundamental task in data mining and is known to enhance the performance of various distance-based algorithms. In this paper, we consider stochastic training data in which repeated feature vectors can belong to different classes, a scenario in which existing methods of metric learning are known to struggle. This type of data is common in stochastic simulations, where multidimensional, recurrent system states are subject to inherent randomness. Classification models on such high-resolution simulation-generated data play a critical role in real-time decision making across diverse applications. This paper presents and implements a stochastic version of the popular neighbourhood components analysis. We demonstrate its behaviour on stochastic data using simulation models and reveal its advantages when used for nearest neighbour classification. Meanwhile, the assumptions of stochastic labelling and repeated feature vectors extend to data from various domains, suggesting that the method can attain broad impact. For example, beyond its applications to system control and decision making with digital twin simulation, it may enhance the analysis of data from sensor networks, recommender systems, and crowdsourced platforms, where stochasticity and recurring feature patterns are typical.

U2 - 10.1287/ijds.2023.0018
DO - 10.1287/ijds.2023.0018
M3 - Journal article
JO - INFORMS Journal on Data Science
JF - INFORMS Journal on Data Science
SN - 2694-4030
ER -
