Electronic data

  • bare_conf

    Rights statement: ©2016 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

    Final published version, 911 KB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Unsupervised classification of data streams based on typicality and eccentricity data analytics

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper

Published

Abstract

In this paper, we propose a novel approach to unsupervised and online data classification. The algorithm is based on the statistical analysis of selected features and development of a self-evolving fuzzy-rule-basis. It starts learning from an empty rule basis and, instead of offline training, it learns “on-the-fly”.

It is free of parameters, so the fuzzy rules and the number, size or radius of the classes do not need to be predefined. It is well suited to the classification of online data streams with real-time constraints. Past data do not need to be stored in memory because the algorithm is recursive, which makes it efficient in both memory and computation. Owing to its evolving nature, it can handle concept drift and concept evolution: not only can existing rules/classes be updated, but new classes can be created as new concepts emerge from the data. It can perform fuzzy classification (soft labeling), which is preferred over traditional crisp classification in many areas of application. The algorithm was validated on an industrial pilot plant, where the online-calculated period and amplitude of the control signal were used as inputs to a fault diagnosis application. The approach, however, is generic and can be applied to different problems and to much higher-dimensional inputs. The results obtained from the real data are very significant.
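The abstract gives no formulas, so the following is only a minimal sketch of how a recursive update can avoid storing past samples. It assumes the mean and scalar-variance recursions, and the eccentricity definition, commonly used in the typicality and eccentricity data analytics (TEDA) literature; the class name `RecursiveStats` and its methods are hypothetical and are not taken from the paper.

```python
class RecursiveStats:
    """Running mean and scalar variance of a vector stream.

    Only the sample count, the mean and the variance are kept,
    so memory use does not grow with the length of the stream.
    (Hypothetical helper; not the paper's implementation.)
    """

    def __init__(self, dim):
        self.k = 0                 # number of samples seen so far
        self.mean = [0.0] * dim    # running mean vector
        self.var = 0.0             # mean squared distance to the mean

    def update(self, x):
        """Fold one new sample into the running statistics."""
        self.k += 1
        k = self.k
        self.mean = [((k - 1) * m + xi) / k for m, xi in zip(self.mean, x)]
        if k > 1:
            d2 = sum((xi - m) ** 2 for xi, m in zip(x, self.mean))
            self.var = (k - 1) / k * self.var + d2 / (k - 1)

    def eccentricity(self, x):
        """Eccentricity of x under the assumed TEDA definition.

        Returns None while it is undefined (fewer than two samples,
        or zero variance).
        """
        if self.k < 2 or self.var == 0.0:
            return None
        d2 = sum((xi - m) ** 2 for xi, m in zip(x, self.mean))
        return 1.0 / self.k + d2 / (self.k * self.var)


# Illustrative one-dimensional stream (made-up values).
samples = [[0.0], [2.0], [4.0]]
stats = RecursiveStats(dim=1)
for sample in samples:
    stats.update(sample)

scores = [stats.eccentricity(sample) for sample in samples]
```

Under this assumed definition, a sample far from the running mean receives a high eccentricity (low typicality), which is the kind of signal an evolving classifier could use to decide whether a new class is needed. How the paper turns such scores into fuzzy rules is not stated in the abstract and is not shown here.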
