
Electronic data

  • PowerTech_Revised

    Rights statement: ©2021 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

    Accepted author manuscript, 1.05 MB, PDF document

    Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License



Big Data Analytics for Electricity Theft Detection in Smart Grids

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Publication date: 29/07/2021
Host publication: 2021 IEEE Madrid PowerTech - 14th IEEE Power and Energy Society PowerTech Conference
Number of pages: 6
ISBN (Electronic): 9781665435970
Original language: English


In Smart Grids (SG), Electricity Theft Detection (ETD) is of great importance because it makes the SG cost-efficient. Existing ETD methods cannot efficiently handle the data imbalance, missing values, variance and non-linearity problems found in smart meter data. An effective integrated strategy is therefore required to address these underlying issues and accurately detect electricity theft using big data. In this work, a simple yet effective approach is proposed that integrates two modules, data pre-processing and classification, in a single framework. The first module performs data imputation, outlier handling, standardization and class balancing to generate quality data for classifier training. The second module classifies honest and dishonest users with a Support Vector Machine (SVM) classifier. To improve the classifier’s learning trend and accuracy, a Bayesian optimization algorithm is used to tune the SVM’s hyperparameters. Simulation results confirm that the proposed framework for ETD significantly outperforms previous machine learning approaches such as random forest, logistic regression and SVM in terms of accuracy.
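The four pre-processing steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. The abstract does not say which technique the authors use for each step, so every choice here is an assumption made for illustration: missing readings are filled by averaging the nearest observed neighbours, outliers are capped with a three-sigma rule, standardization is per column (time step), and classes are balanced by random oversampling. All function names are hypothetical.

```python
import random
import statistics


def impute(series):
    """Fill missing readings (None) with the average of the nearest observed
    neighbours; fall back to the series mean at the edges (assumed rule)."""
    observed = [v for v in series if v is not None]
    fallback = statistics.fmean(observed) if observed else 0.0
    filled = []
    for i, value in enumerate(series):
        if value is not None:
            filled.append(float(value))
            continue
        prev = next((series[j] for j in range(i - 1, -1, -1)
                     if series[j] is not None), None)
        nxt = next((series[j] for j in range(i + 1, len(series))
                    if series[j] is not None), None)
        if prev is not None and nxt is not None:
            filled.append((prev + nxt) / 2)
        else:
            filled.append(fallback)
    return filled


def clip_outliers(series, k=3.0):
    """Cap readings above mean + k * std (assumed k-sigma rule)."""
    upper = statistics.fmean(series) + k * statistics.pstdev(series)
    return [min(v, upper) for v in series]


def standardize_columns(rows):
    """Scale every column (time step) to zero mean and unit variance."""
    scaled = []
    for col in zip(*rows):
        mean = statistics.fmean(col)
        std = statistics.pstdev(col)
        scaled.append([(v - mean) / std if std > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled)]


def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def preprocess(raw_rows, labels):
    """Run imputation, outlier capping, standardization and class balancing."""
    cleaned = [clip_outliers(impute(row)) for row in raw_rows]
    return oversample(standardize_columns(cleaned), labels)


# Toy data (invented for illustration): one row per consumer, None = missing reading.
raw = [
    [1.0, None, 3.0, 2.0],
    [2.0, 2.5, None, 3.5],
    [0.1, 0.2, 0.1, None],
]
labels = [0, 0, 1]  # 0 = honest, 1 = dishonest (assumed encoding)
X, y = preprocess(raw, labels)
```

The resulting `X` and `y` could then be passed to an SVM classifier, for example scikit-learn's `SVC`, with hyperparameters such as `C` and `gamma` tuned by a Bayesian optimizer. That second stage is omitted here because the abstract gives no details of the search space or optimizer settings.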