Home > Research > Publications & Outputs > Hybrid self-organizing feature map (SOM) for an...

Electronic data

  • Binder1

    Rights statement: This is the author’s version of a work that was accepted for publication in Information Sciences. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Information Sciences, 494, 2019 DOI: 10.1016/j.ins.2019.03.069

    Accepted author manuscript, 1.75 MB, PDF document

    Available under license: CC BY-NC-ND

Links

Text available via DOI:

View graph of relations

Hybrid self-organizing feature map (SOM) for anomaly detection in cloud infrastructures using granular clustering based upon value-difference metrics

Research output: Contribution to Journal/MagazineJournal articlepeer-review

Published
Close
<mark>Journal publication date</mark>1/08/2019
<mark>Journal</mark>Information Sciences
Volume494
Number of pages31
Pages (from-to)247-277
Publication StatusPublished
Early online date18/04/19
<mark>Original language</mark>English

Abstract

We have witnessed an increase in the availability of data from diverse sources over the past few years. Cloud computing, big data and Internet-of-Things (IoT) are distinctive cases of such an increase which demand novel approaches for data analytics in order to process and analyze huge volumes of data for security and business use. Cloud computing has been becoming popular for critical structure IT mainly due to cost savings and dynamic scalability. Current offerings, however, are not mature enough with respect to stringent security and resilience requirements. Mechanisms such as anomaly detection hybrid systems are required in order to protect against various challenges that include network based attacks, performance issues and operational anomalies. Such hybrid AI systems include Neural Networks, blackboard systems, belief (Bayesian) networks, case-based reasoning and rule-based systems and can be implemented in a variety of ways. Traffic in the cloud comes from multiple heterogeneous domains and changes rapidly due to the variety of operational characteristics of the tenants using the cloud and the elasticity of the provided services. The underlying detection mechanisms rely upon measurements drawn from multiple sources. However, the characteristics of the distribution of measurements within specific subspaces might be unknown. We argue in this paper that there is a need to cluster the observed data during normal network operation into multiple subspaces each one of them featuring specific local attributes, i.e. granules of information. Clustering is implemented by the inference engine of a model hybrid NN system. Several variations of the so-called value-difference metric (VDM) are investigated like local histograms and the Canberra distance for scalar attributes, the Jaccard distance for binary word attributes, rough sets as well as local histograms over an aggregate ordering distance and the Canberra measure for vectorial attributes. Low-dimensional subspace representations of each group of points (measurements) in the context of anomaly detection in critical cloud implementations is based upon VD metrics and can be either parametric or non-parametric. A novel application of a Self-Organizing-Feature Map (SOFM) of reduced/aggregate ordered sets of objects featuring VD metrics (as obtained from distributed network measurements) is proposed. Each node of the SOFM stands for a structured local distribution of such objects within the input space. The so-called Neighborhood-based Outlier Factor (NOOF) is defined for such reduced/aggregate ordered sets of objects as a value-difference metric of histogrammes. Measurements that do not belong to local distributions are detected as anomalies, i.e. outliers of the trained SOFM. Several methods of subspace clustering using Expectation-Maximization Gaussian Mixture Models (a parametric approach) as well as local data densities (a non-parametric approach) are outlined and compared against the proposed method using data that are obtained from our cloud testbed in emulated anomalous traffic conditions. The results—which are obtained from a model NN system—indicate that the proposed method performs well in comparison with conventional techniques.

Bibliographic note

This is the author’s version of a work that was accepted for publication in Information Sciences. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Information Sciences, 494, 2019 DOI: 10.1016/j.ins.2019.03.069