Rights statement: This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of Computational and Graphical Statistics on 19/11/2021, available online: https://www.tandfonline.com/doi/full/10.1080/10618600.2021.1987257
Accepted author manuscript, 2.76 MB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License
Final published version
Licence: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY - JOUR
T1 - Subset Multivariate Collective And Point Anomaly Detection
AU - Fisch, Alex
AU - Eckley, Idris
AU - Fearnhead, Paul
N1 - This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of Computational and Graphical Statistics on 19/11/2021, available online: https://www.tandfonline.com/doi/full/10.1080/10618600.2021.1987257
PY - 2022/6/30
Y1 - 2022/6/30
AB - In recent years, there has been a growing interest in identifying anomalous structure within multivariate data sequences. We consider the problem of detecting collective anomalies, corresponding to intervals where one, or more, of the data sequences behaves anomalously. We first develop a test for a single collective anomaly that has power to simultaneously detect anomalies that are either rare, that is affecting few data sequences, or common. We then show how to detect multiple anomalies in a way that is computationally efficient but avoids the approximations inherent in binary segmentation-like approaches. This approach is shown to consistently estimate the number and location of the collective anomalies -- a property that has not previously been shown for competing methods. Our approach can be made robust to point anomalies and can allow for the anomalies to be imperfectly aligned. We show the practical usefulness of allowing for imperfect alignments through a resulting increase in power to detect regions of copy number variation.
KW - Copy number variations
KW - Dynamic programming
KW - Epidemic changepoints
KW - Outliers
KW - Robust statistics
U2 - 10.1080/10618600.2021.1987257
DO - 10.1080/10618600.2021.1987257
M3 - Journal article
VL - 31
SP - 574
EP - 585
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
SN - 1061-8600
IS - 2
ER -