Electronic data

  • Jackson et al. (2022) IWSM

    Final published version, 711 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

Using saturated models for data synthesis

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper

Published

Standard

Using saturated models for data synthesis. / Jackson, James; Francis, Brian; Mitra, Robin et al.
Proceedings of the 36th International Workshop on Statistical Modelling: July 18-22, 2022 - Trieste, Italy. ed. / Nicola Torelli; Ruggero Bellio; Vito Muggeo. EUT Edizioni Università di Trieste, Trieste 2022, 2022. p. 205-210 34.

Harvard

Jackson, J, Francis, B, Mitra, R & Dove, I 2022, Using saturated models for data synthesis. in N Torelli, R Bellio & V Muggeo (eds), Proceedings of the 36th International Workshop on Statistical Modelling: July 18-22, 2022 - Trieste, Italy., 34, EUT Edizioni Università di Trieste, Trieste 2022, pp. 205-210, 36th International Workshop on Statistical Modelling, Trieste, Italy, 18/07/22.

APA

Jackson, J., Francis, B., Mitra, R., & Dove, I. (2022). Using saturated models for data synthesis. In N. Torelli, R. Bellio, & V. Muggeo (Eds.), Proceedings of the 36th International Workshop on Statistical Modelling: July 18-22, 2022 - Trieste, Italy (pp. 205-210). Article 34 EUT Edizioni Università di Trieste, Trieste 2022.

Vancouver

Jackson J, Francis B, Mitra R, Dove I. Using saturated models for data synthesis. In Torelli N, Bellio R, Muggeo V, editors, Proceedings of the 36th International Workshop on Statistical Modelling: July 18-22, 2022 - Trieste, Italy. EUT Edizioni Università di Trieste, Trieste 2022. 2022. p. 205-210. 34

Author

Jackson, James ; Francis, Brian ; Mitra, Robin et al. / Using saturated models for data synthesis. Proceedings of the 36th International Workshop on Statistical Modelling: July 18-22, 2022 - Trieste, Italy. editor / Nicola Torelli ; Ruggero Bellio ; Vito Muggeo. EUT Edizioni Università di Trieste, Trieste 2022, 2022. pp. 205-210

Bibtex

@inproceedings{1080db38d5534d348c495f3f8f80b50e,
title = "Using saturated models for data synthesis",
abstract = "The use of synthetic data sets is becoming ever more prevalent, as regulations such as the General Data Protection Regulation (GDPR), which place greater demands on the protection of individuals{\textquoteright} personal data, are coupled with the conflicting demand to make more data available to researchers. This paper discusses the approach of synthesizing categorical data at the aggregated (contingency table) level using a saturated count model, which adds noise - and hence protection - to cell counts. The paper also discusses how distributional properties of synthesis models are intrinsic to generating synthetic data with suitable risk and utility profiles.",
keywords = "Synthetic data, Data privacy, Count models",
author = "James Jackson and Brian Francis and Robin Mitra and Iain Dove",
year = "2022",
month = jul,
day = "18",
language = "English",
pages = "205--210",
editor = "Nicola Torelli and Ruggero Bellio and Vito Muggeo",
booktitle = "Proceedings of the 36th International Workshop on Statistical Modelling",
publisher = "EUT Edizioni Universit{\`a} di Trieste, Trieste 2022",
note = "36th International Workshop on Statistical Modelling : July 18-22, 2022 - Trieste, Italy, IWSM ; Conference date: 18-07-2022 Through 22-07-2022",
url = "https://www.iwsm2022.com/",

}

RIS

TY - GEN

T1 - Using saturated models for data synthesis

AU - Jackson, James

AU - Francis, Brian

AU - Mitra, Robin

AU - Dove, Iain

N1 - Conference code: 36

PY - 2022/7/18

Y1 - 2022/7/18

N2 - The use of synthetic data sets is becoming ever more prevalent, as regulations such as the General Data Protection Regulation (GDPR), which place greater demands on the protection of individuals’ personal data, are coupled with the conflicting demand to make more data available to researchers. This paper discusses the approach of synthesizing categorical data at the aggregated (contingency table) level using a saturated count model, which adds noise - and hence protection - to cell counts. The paper also discusses how distributional properties of synthesis models are intrinsic to generating synthetic data with suitable risk and utility profiles.

AB - The use of synthetic data sets is becoming ever more prevalent, as regulations such as the General Data Protection Regulation (GDPR), which place greater demands on the protection of individuals’ personal data, are coupled with the conflicting demand to make more data available to researchers. This paper discusses the approach of synthesizing categorical data at the aggregated (contingency table) level using a saturated count model, which adds noise - and hence protection - to cell counts. The paper also discusses how distributional properties of synthesis models are intrinsic to generating synthetic data with suitable risk and utility profiles.

KW - Synthetic data

KW - Data privacy

KW - Count models

M3 - Conference contribution/Paper

SP - 205

EP - 210

BT - Proceedings of the 36th International Workshop on Statistical Modelling

A2 - Torelli, Nicola

A2 - Bellio, Ruggero

A2 - Muggeo, Vito

PB - EUT Edizioni Università di Trieste, Trieste 2022

T2 - 36th International Workshop on Statistical Modelling

Y2 - 18 July 2022 through 22 July 2022

ER -