Using saturated models for data synthesis

School Of Mathematical Sciences

Associated organisational unit

Medical and Social Statistics

Electronic data

Jackson et al. (2022) IWSM
Final published version, 711 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Keywords

Synthetic data, Data privacy, Count models


Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper

Published

James Jackson
Brian Francis
Robin Mitra
Iain Dove


Publication date	18/07/2022
Host publication	Proceedings of the 36th International Workshop on Statistical Modelling: July 18-22, 2022 - Trieste, Italy
Editors	Nicola Torelli, Ruggero Bellio, Vito Muggeo
Publisher	EUT Edizioni Università di Trieste, Trieste 2022
Pages	205-210
Number of pages	6
ISBN (electronic)	9788855113090
Original language	English
Event	36th International Workshop on Statistical Modelling: July 18-22, 2022 - Trieste, Italy - Università di Trieste, Trieste, Italy. Duration: 18/07/2022 → 22/07/2022. Conference number: 36. https://www.iwsm2022.com/

Conference

Conference	36th International Workshop on Statistical Modelling
Abbreviated title	IWSM
Country/Territory	Italy
City	Trieste
Period	18/07/22 → 22/07/22
Internet address	https://www.iwsm2022.com/


Abstract

The use of synthetic data sets is becoming ever more prevalent, as regulations such as the General Data Protection Regulation (GDPR), which place greater demands on the protection of individuals’ personal data, are coupled with the conflicting demand to make more data available to researchers. This paper discusses the approach of synthesizing categorical data at the aggregated (contingency table) level using a saturated count model, which adds noise - and hence protection - to cell counts. The paper also discusses how distributional properties of synthesis models are intrinsic to generating synthetic data with suitable risk and utility profiles.
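
As a rough illustration of the idea in the abstract, the sketch below draws a synthetic contingency table from a saturated count model. It assumes a Poisson distribution, which the abstract does not specify; the function name `synthesize_counts` and the example table are hypothetical and are not taken from the paper. Because a saturated model has one parameter per cell, its fitted cell means equal the observed counts, so synthesis reduces to independent draws centred on those counts.

```python
import numpy as np


def synthesize_counts(observed, rng):
    """Draw one synthetic table from a saturated Poisson model (assumed).

    The fitted mean of each cell in a saturated model equals its observed
    count, so each synthetic count is an independent Poisson draw whose
    mean is the corresponding observed count.
    """
    observed = np.asarray(observed)
    if np.any(observed < 0):
        raise ValueError("cell counts must be non-negative")
    return rng.poisson(lam=observed)


# Hypothetical 2x3 contingency table (illustrative values only).
observed = np.array([[12, 0, 5],
                     [30, 7, 1]])

rng = np.random.default_rng(seed=0)
synthetic = synthesize_counts(observed, rng)
print(synthetic)
```

Under this Poisson assumption the variance of each synthetic count equals its mean, and a cell observed as zero is always synthesized as zero. Properties of this kind are one example of how the choice of count distribution shapes the risk and utility of the resulting synthetic data.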
