A Bayesian Nonparametric Approach to Differentially Private Data

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN, Conference contribution/Paper (peer-reviewed)

Published
Publication date: 16/09/2020
Host publication: Privacy in Statistical Databases: UNESCO Chair in Data Privacy, International Conference, PSD 2020, Proceedings
Editors: Josep Domingo-Ferrer, Krishnamurty Muralidhar
Publisher: Springer
Pages: 32-48
Number of pages: 17
ISBN (print): 9783030575205
Original language: English
Event: International Conference on Privacy in Statistical Databases, PSD 2020 - Tarragona, Spain
Duration: 23/09/2020 to 25/09/2020

Conference

Conference: International Conference on Privacy in Statistical Databases, PSD 2020
Country/Territory: Spain
City: Tarragona
Period: 23/09/20 to 25/09/20

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12276 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

The protection of private and sensitive data is an important problem of increasing interest due to the vast amount of personal data collected. Differential Privacy is arguably the dominant approach to privacy protection and is currently implemented in both industry and government. In a decentralized paradigm, the sensitive information belonging to each individual is locally transformed by a known privacy-maintaining mechanism Q. The analyst's objective is then to recover the distribution of the raw data, or some functionals of it, while having access only to the transformed data. In this work, we propose a Bayesian nonparametric methodology to perform inference on the distribution of the sensitive data, reformulating the differentially private estimation problem as a latent variable Dirichlet Process mixture model. This methodology has the advantage that it can be applied to any mechanism Q and works as a “black box” procedure, estimating the distribution and functionals thereof from the same MCMC draws with very little tuning. Moreover, being fully nonparametric, it requires very few assumptions on the distribution of the raw data. For the most popular mechanisms Q, such as Laplace and Gaussian, we describe efficient specialized MCMC algorithms and provide theoretical guarantees. Experiments on both synthetic and real datasets show good performance of the proposed method.
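The latent-variable reformulation can be illustrated with a minimal sketch, assuming a Gaussian mechanism Q (each record receives independent N(0, sigma^2) noise) and a Gaussian base measure N(m0, s0^2) for the Dirichlet Process. The raw values X_i are treated as latent draws from P ~ DP(alpha, N(m0, s0^2)) and the privatized Z_i are observed through Q, so Q plays the role of the mixture kernel. This is not the authors' specialized algorithm: it is the textbook conjugate Pólya-urn Gibbs sampler (cf. Neal 2000, Algorithm 1) plus a step that resamples each distinct value. The names `gaussian_mechanism` and `dp_gibbs` and the hyperparameters `alpha`, `m0`, `s0` are hypothetical, and calibrating `sigma` to an (epsilon, delta) budget is not handled. The Laplace mechanism is not conjugate to a Gaussian base measure and would need a non-conjugate update (for example Neal's Algorithm 8).

```python
import numpy as np


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x (vectorised)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def gaussian_mechanism(x, sigma, rng):
    """Hypothetical local privatiser Q: add independent N(0, sigma^2) noise to each record.
    Choosing sigma to meet a given (epsilon, delta) budget is NOT done here."""
    return x + rng.normal(0.0, sigma, size=x.shape)


def dp_gibbs(z, sigma, alpha=1.0, m0=0.0, s0=3.0, n_iter=300, burn_in=100, seed=0):
    """Conjugate Polya-urn Gibbs sampler (cf. Neal 2000, Algorithm 1) for
    X_i ~ P, P ~ DP(alpha, N(m0, s0^2)), Z_i | X_i ~ N(X_i, sigma^2).
    Returns posterior draws of the latent X vector and of the functional E_P[X]."""
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    var_noise, var0 = sigma ** 2, s0 ** 2
    x = z.copy()  # initialise each latent value at its privatised observation
    x_draws, mean_draws = [], []
    for it in range(n_iter):
        for i in range(n):
            others = np.delete(x, i)
            # Weight for re-using an existing value X_j: q(z_i | X_j).
            log_w_old = normal_logpdf(z[i], others, var_noise)
            # Weight for a fresh value: alpha * integral of q(z_i | x) G0(dx).
            log_w_new = np.log(alpha) + normal_logpdf(z[i], m0, var_noise + var0)
            log_w = np.append(log_w_old, log_w_new)
            w = np.exp(log_w - log_w.max())  # stabilise before normalising
            k = rng.choice(n, p=w / w.sum())  # n options: n-1 existing + 1 new
            if k < n - 1:
                x[i] = others[k]
            else:
                # Fresh value from the conjugate posterior of x given z_i alone.
                prec = 1.0 / var0 + 1.0 / var_noise
                mean = (m0 / var0 + z[i] / var_noise) / prec
                x[i] = rng.normal(mean, np.sqrt(1.0 / prec))
        # Resample each distinct value given all observations sharing it.
        for u in np.unique(x):
            members = x == u
            prec = 1.0 / var0 + members.sum() / var_noise
            mean = (m0 / var0 + z[members].sum() / var_noise) / prec
            x[members] = rng.normal(mean, np.sqrt(1.0 / prec))
        if it >= burn_in:
            x_draws.append(x.copy())
            # E[E_P[X] | X_1..X_n] under the DP posterior, from the same draw.
            mean_draws.append((alpha * m0 + x.sum()) / (alpha + n))
    return np.array(x_draws), np.array(mean_draws)


rng = np.random.default_rng(1)
x_true = np.concatenate([rng.normal(-2.0, 0.5, 50), rng.normal(2.0, 0.5, 50)])
z = gaussian_mechanism(x_true, sigma=1.0, rng=rng)
x_draws, mean_draws = dp_gibbs(z, sigma=1.0)
print("posterior mean of E[X]:", mean_draws.mean(), "raw-data mean:", x_true.mean())
```

The same stored draws serve both goals mentioned above: `x_draws` approximates the posterior over the latent raw values (and hence the distribution of the sensitive data), while `mean_draws` is one functional computed from those draws. Sampling each X_i directly keeps every update in closed form but can mix slowly for large n; a sampler over cluster indicators usually mixes better.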