Using saturated models for data synthesis

School of Mathematical Sciences

Associated organisational unit

Medical and Social Statistics

Electronic data

Jackson et al. (2022) IWSM
Final published version, 711 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Keywords

Synthetic data, Data privacy, Count models


Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper

Published

Standard

Using saturated models for data synthesis. / Jackson, James; Francis, Brian; Mitra, Robin et al.
Proceedings of the 36th International Workshop on Statistical Modelling: July 18-22, 2022 - Trieste, Italy. ed. / Nicola Torelli; Ruggero Bellio; Vito Muggeo. EUT Edizioni Università di Trieste, Trieste 2022, 2022. p. 205-210 34.


Harvard

Jackson, J, Francis, B, Mitra, R & Dove, I 2022, Using saturated models for data synthesis. in N Torelli, R Bellio & V Muggeo (eds), Proceedings of the 36th International Workshop on Statistical Modelling: July 18-22, 2022 - Trieste, Italy, 34, EUT Edizioni Università di Trieste, Trieste 2022, pp. 205-210, 36th International Workshop on Statistical Modelling, Trieste, Italy, 18/07/22.

APA

Jackson, J., Francis, B., Mitra, R., & Dove, I. (2022). Using saturated models for data synthesis. In N. Torelli, R. Bellio, & V. Muggeo (Eds.), Proceedings of the 36th International Workshop on Statistical Modelling: July 18-22, 2022 - Trieste, Italy (pp. 205-210). Article 34. EUT Edizioni Università di Trieste, Trieste 2022.

Vancouver

Jackson J, Francis B, Mitra R, Dove I. Using saturated models for data synthesis. In Torelli N, Bellio R, Muggeo V, editors, Proceedings of the 36th International Workshop on Statistical Modelling: July 18-22, 2022 - Trieste, Italy. EUT Edizioni Università di Trieste, Trieste 2022. 2022. p. 205-210. 34

Author

Jackson, James ; Francis, Brian ; Mitra, Robin et al. / Using saturated models for data synthesis. Proceedings of the 36th International Workshop on Statistical Modelling: July 18-22, 2022 - Trieste, Italy. editor / Nicola Torelli ; Ruggero Bellio ; Vito Muggeo. EUT Edizioni Università di Trieste, Trieste 2022, 2022. pp. 205-210

Bibtex

@inproceedings{1080db38d5534d348c495f3f8f80b50e,

title = "Using saturated models for data synthesis",

abstract = "The use of synthetic data sets is becoming ever more prevalent, as regulations such as the General Data Protection Regulation (GDPR), which place greater demands on the protection of individuals{\textquoteright} personal data, are coupled with the conflicting demand to make more data available to researchers. This paper discusses the approach of synthesizing categorical data at the aggregated (contingency table) level using a saturated count model, which adds noise, and hence protection, to cell counts. The paper also discusses how distributional properties of synthesis models are intrinsic to generating synthetic data with suitable risk and utility profiles.",

keywords = "Synthetic data, Data privacy, Count models",

author = "James Jackson and Brian Francis and Robin Mitra and Iain Dove",

year = "2022",

month = jul,

day = "18",

language = "English",

pages = "205--210",

editor = "Nicola Torelli and Ruggero Bellio and Vito Muggeo",

booktitle = "Proceedings of the 36th International Workshop on Statistical Modelling",

publisher = "EUT Edizioni Universit{\`a} di Trieste, Trieste 2022",

note = "36th International Workshop on Statistical Modelling : July 18-22, 2022 - Trieste, Italy, IWSM ; Conference date: 18-07-2022 Through 22-07-2022",

url = "https://www.iwsm2022.com/",

}

RIS

TY  - GEN
T1  - Using saturated models for data synthesis
AU  - Jackson, James
AU  - Francis, Brian
AU  - Mitra, Robin
AU  - Dove, Iain
N1  - Conference code: 36
PY  - 2022/7/18
Y1  - 2022/7/18
N2  - The use of synthetic data sets is becoming ever more prevalent, as regulations such as the General Data Protection Regulation (GDPR), which place greater demands on the protection of individuals’ personal data, are coupled with the conflicting demand to make more data available to researchers. This paper discusses the approach of synthesizing categorical data at the aggregated (contingency table) level using a saturated count model, which adds noise, and hence protection, to cell counts. The paper also discusses how distributional properties of synthesis models are intrinsic to generating synthetic data with suitable risk and utility profiles.
AB  - The use of synthetic data sets is becoming ever more prevalent, as regulations such as the General Data Protection Regulation (GDPR), which place greater demands on the protection of individuals’ personal data, are coupled with the conflicting demand to make more data available to researchers. This paper discusses the approach of synthesizing categorical data at the aggregated (contingency table) level using a saturated count model, which adds noise, and hence protection, to cell counts. The paper also discusses how distributional properties of synthesis models are intrinsic to generating synthetic data with suitable risk and utility profiles.
KW  - Synthetic data
KW  - Data privacy
KW  - Count models
M3  - Conference contribution/Paper
SP  - 205
EP  - 210
BT  - Proceedings of the 36th International Workshop on Statistical Modelling
A2  - Torelli, Nicola
A2  - Bellio, Ruggero
A2  - Muggeo, Vito
PB  - EUT Edizioni Università di Trieste, Trieste 2022
T2  - 36th International Workshop on Statistical Modelling
Y2  - 18 July 2022 through 22 July 2022
ER  - 
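The abstract's central idea, synthesizing a contingency table from a saturated count model, can be illustrated with a minimal Python sketch. In a saturated Poisson log-linear model there is one free parameter per cell, so the maximum-likelihood fitted mean of each cell equals its observed count; drawing a fresh count for every cell from a distribution with that mean perturbs the counts while preserving them in expectation. The function names (`poisson_draw`, `synthesize_table`), the optional negative binomial `size` parameter and the toy table are hypothetical and chosen for illustration only; this is an assumed setup, not the authors' implementation or their exact choice of count distribution.

```python
import random
from typing import Dict, Hashable, Optional

# Hypothetical helpers illustrating synthesis from a saturated count model;
# an assumed sketch, not the authors' code.


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Sample Poisson(lam) by counting unit-rate exponential arrivals in [0, lam]."""
    if lam <= 0:
        return 0
    count = 0
    t = rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def synthesize_table(
    counts: Dict[Hashable, int],
    size: Optional[float] = None,
    seed: Optional[int] = None,
) -> Dict[Hashable, int]:
    """Replace each cell count with a draw centred on its saturated-model fit.

    size=None -> Poisson(n) for a cell with observed count n.
    size=k    -> negative binomial with mean n and variance n + n**2 / k,
                 built as a gamma-Poisson mixture (an assumed, noisier option).
    """
    if size is not None and size <= 0:
        raise ValueError("size must be positive")
    rng = random.Random(seed)
    synthetic: Dict[Hashable, int] = {}
    for cell, n in counts.items():
        # Saturated model: one parameter per cell, so the fitted mean equals n.
        mean = float(n)
        if size is not None and mean > 0:
            # Gamma(shape=k, scale=n/k) has mean n and variance n**2 / k.
            mean = rng.gammavariate(size, n / size)
        synthetic[cell] = poisson_draw(mean, rng)
    return synthetic


if __name__ == "__main__":
    # Toy 2x2 table keyed by (sex, response); the counts are made up.
    table = {("F", "yes"): 12, ("F", "no"): 30, ("M", "yes"): 0, ("M", "no"): 25}
    print(synthesize_table(table, seed=1))
    print(synthesize_table(table, size=5.0, seed=1))
```

Under these assumptions, passing a smaller `size` widens the spread of each synthetic count (variance n + n²/size rather than n), trading utility for protection; this is one way to see why the distribution chosen for the synthesis model shapes the risk and utility profile. Note that in this sketch a cell with an observed count of zero always synthesizes to zero.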
