Electronic data

  • 1008

    Rights statement: ©2016 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

    Accepted author manuscript, 1.06 MB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Empirical data analysis: a new tool for data analytics

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Empirical data analysis: a new tool for data analytics. / Angelov, Plamen Parvanov; Gu, Xiaowei; Principe, Jose et al.
2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2016. p. 52-59 1008.

Harvard

Angelov, PP, Gu, X, Principe, J & Kangin, D 2016, Empirical data analysis: a new tool for data analytics. in 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC)., 1008, IEEE, pp. 52-59, IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, Hungary, 9/10/16. https://doi.org/10.1109/SMC.2016.7844219

APA

Angelov, P. P., Gu, X., Principe, J., & Kangin, D. (2016). Empirical data analysis: a new tool for data analytics. In 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (pp. 52-59). Article 1008 IEEE. https://doi.org/10.1109/SMC.2016.7844219

Vancouver

Angelov PP, Gu X, Principe J, Kangin D. Empirical data analysis: a new tool for data analytics. In 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE. 2016. p. 52-59. 1008 doi: 10.1109/SMC.2016.7844219

Author

Angelov, Plamen Parvanov ; Gu, Xiaowei ; Principe, Jose et al. / Empirical data analysis : a new tool for data analytics. 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2016. pp. 52-59

Bibtex

@inproceedings{195627bd64a74f9d89f0cb98541e23b0,
title = "Empirical data analysis: a new tool for data analytics",
abstract = "In this paper, a novel empirical data analysis approach (abbreviated as EDA) is introduced which is entirely data-driven and free from restricting assumptions and predefined problem- or user-specific parameters and thresholds. It is well known that the traditional probability theory is restricted by strong prior assumptions which are often impractical and do not hold in real problems. Machine learning methods, on the other hand, are closer to the real problems but they usually rely on problem- or user-specific parameters or thresholds making it rather art than science. In this paper we introduce a theoretically sound yet practically unrestricted and widely applicable approach that is based on the density in the data space. Since the data may have exactly the same value multiple times we distinguish between the data points and unique locations in the data space. The number of data points, k is larger or equal to the number of unique locations, l and at least one data point occupies each unique location. The number of different data points that have exactly the same location in the data space (equal value), f can be seen as frequency. Through the combination of the spatial density and the frequency of occurrence of discrete data points, a new concept called multimodal typicality, τ MM is proposed in this paper. It offers a closed analytical form that represents ensemble properties derived entirely from the empirical observations of data. Moreover, it is very close (yet different) from the histograms, from the probability density function (pdf) as well as from fuzzy set membership functions. Remarkably, there is no need to perform complicated pre-processing like clustering to get the multimodal representation. Moreover, the closed form for the case of Euclidean, Mahalanobis type of distance as well as some other forms (e.g. cosine-based dissimilarity) can be expressed recursively making it applicable to data streams and online algorithms.
Inference/estimation of the typicality of data points that were not present in the data so far can be made. This new concept allows to rethink the very foundations of statistical and machine learning as well as to develop a series of anomaly detection, clustering, classification, prediction, control and other algorithms.",
keywords = "empirical data analysis, multimodal typicality, data-driven, recursive calculation, inference, estimation",
author = "Angelov, {Plamen Parvanov} and Xiaowei Gu and Jose Principe and Dmitry Kangin",
note = "{\textcopyright}2016 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.; IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS ; Conference date: 09-10-2016",
year = "2016",
month = oct,
day = "9",
doi = "10.1109/SMC.2016.7844219",
language = "English",
isbn = "9781509018987",
pages = "52--59",
booktitle = "2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC)",
publisher = "IEEE",

}

RIS

TY - GEN

T1 - Empirical data analysis

T2 - IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS

AU - Angelov, Plamen Parvanov

AU - Gu, Xiaowei

AU - Principe, Jose

AU - Kangin, Dmitry

N1 - ©2016 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

PY - 2016/10/9

Y1 - 2016/10/9

N2 - In this paper, a novel empirical data analysis approach (abbreviated as EDA) is introduced which is entirely data-driven and free from restricting assumptions and predefined problem- or user-specific parameters and thresholds. It is well known that the traditional probability theory is restricted by strong prior assumptions which are often impractical and do not hold in real problems. Machine learning methods, on the other hand, are closer to the real problems but they usually rely on problem- or user-specific parameters or thresholds making it rather art than science. In this paper we introduce a theoretically sound yet practically unrestricted and widely applicable approach that is based on the density in the data space. Since the data may have exactly the same value multiple times we distinguish between the data points and unique locations in the data space. The number of data points, k is larger or equal to the number of unique locations, l and at least one data point occupies each unique location. The number of different data points that have exactly the same location in the data space (equal value), f can be seen as frequency. Through the combination of the spatial density and the frequency of occurrence of discrete data points, a new concept called multimodal typicality, τ MM is proposed in this paper. It offers a closed analytical form that represents ensemble properties derived entirely from the empirical observations of data. Moreover, it is very close (yet different) from the histograms, from the probability density function (pdf) as well as from fuzzy set membership functions. Remarkably, there is no need to perform complicated pre-processing like clustering to get the multimodal representation. Moreover, the closed form for the case of Euclidean, Mahalanobis type of distance as well as some other forms (e.g. cosine-based dissimilarity) can be expressed recursively making it applicable to data streams and online algorithms.
Inference/estimation of the typicality of data points that were not present in the data so far can be made. This new concept allows to rethink the very foundations of statistical and machine learning as well as to develop a series of anomaly detection, clustering, classification, prediction, control and other algorithms.

AB - In this paper, a novel empirical data analysis approach (abbreviated as EDA) is introduced which is entirely data-driven and free from restricting assumptions and predefined problem- or user-specific parameters and thresholds. It is well known that the traditional probability theory is restricted by strong prior assumptions which are often impractical and do not hold in real problems. Machine learning methods, on the other hand, are closer to the real problems but they usually rely on problem- or user-specific parameters or thresholds making it rather art than science. In this paper we introduce a theoretically sound yet practically unrestricted and widely applicable approach that is based on the density in the data space. Since the data may have exactly the same value multiple times we distinguish between the data points and unique locations in the data space. The number of data points, k is larger or equal to the number of unique locations, l and at least one data point occupies each unique location. The number of different data points that have exactly the same location in the data space (equal value), f can be seen as frequency. Through the combination of the spatial density and the frequency of occurrence of discrete data points, a new concept called multimodal typicality, τ MM is proposed in this paper. It offers a closed analytical form that represents ensemble properties derived entirely from the empirical observations of data. Moreover, it is very close (yet different) from the histograms, from the probability density function (pdf) as well as from fuzzy set membership functions. Remarkably, there is no need to perform complicated pre-processing like clustering to get the multimodal representation. Moreover, the closed form for the case of Euclidean, Mahalanobis type of distance as well as some other forms (e.g. cosine-based dissimilarity) can be expressed recursively making it applicable to data streams and online algorithms.
Inference/estimation of the typicality of data points that were not present in the data so far can be made. This new concept allows to rethink the very foundations of statistical and machine learning as well as to develop a series of anomaly detection, clustering, classification, prediction, control and other algorithms.

KW - empirical data analysis

KW - multimodal typicality

KW - data-driven

KW - recursive calculation

KW - inference

KW - estimation

U2 - 10.1109/SMC.2016.7844219

DO - 10.1109/SMC.2016.7844219

M3 - Conference contribution/Paper

SN - 9781509018987

SP - 52

EP - 59

BT - 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

PB - IEEE

Y2 - 9 October 2016

ER -