The use of saturated count models for synthesis of large confidential administrative databases

School Of Mathematical Sciences

Associated organisational unit

Medical and Social Statistics

Electronic data

2022jacksonphd
Final published version, 1.87 MB, PDF document

Text available via DOI:

https://doi.org/10.17635/lancaster/thesis/1860
Final published version

Keywords

Synthetic data, Statistical disclosure control, count distributions, tabular data

Research output: Thesis › Doctoral Thesis

Published

James Jackson

Publication date	2022
Number of pages	151
Qualification	PhD
Awarding Institution	Lancaster University
Supervisors/Advisors	Mitra, Robin (Supervisor, External person); Francis, Brian (Supervisor); Dove, Iain (Supervisor, External person)
Award date	7/12/2022
Publisher	Lancaster University
Original language	English

Abstract

Synthetic data sets are being increasingly used to protect data confidentiality. In the three decades since they were first introduced, methods for synthetic data generation have evolved, but mainly within the domain of survey data sets. As greater interest is being taken in utilising administrative data for statistical purposes, there is inevitably greater interest in creating synthetic administrative databases. Yet there are characteristics of these databases that require special attention from a synthesis perspective, such as their size and the presence of structural zeros. This thesis, through the fitting of saturated models in conjunction with overdispersed count distributions, presents a mechanism that allows large administrative databases to be synthesized efficiently. This thesis also proposes a concept of satisfying risk and utility metrics a priori - that is, prior to synthetic data generation - using the synthesis mechanism’s tuning parameters, allowing a more formalized approach to synthesis. The methods are demonstrated empirically throughout, primarily through synthesizing a database that can be viewed as a close substitute to the English School Census.
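
The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes the negative binomial as the overdispersed count distribution, and the names `synthesize_counts`, `sample_poisson`, and `sigma` are hypothetical. Under a saturated model each cell's fitted mean equals its observed count, so each count is replaced by a draw with that mean, and the dispersion parameter acts as the tuning parameter that is chosen before any data are generated.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) by Knuth's multiplication method.

    Adequate for small cell means only; exp(-lam) underflows for very
    large lam, so a real application would need a different sampler.
    """
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def synthesize_counts(observed, sigma, seed=None):
    """Replace each observed cell count with an overdispersed draw.

    Assumption: negative binomial noise, generated as a gamma-Poisson
    mixture with mean mu and variance mu + sigma * mu**2, where mu is
    the observed count (the saturated-model fitted value).
    Cells with a count of zero have mean zero and stay zero, so
    structural zeros are never turned into positive counts.
    """
    rng = random.Random(seed)
    synthetic = []
    for mu in observed:
        if mu == 0:
            synthetic.append(0)
            continue
        if sigma == 0:
            rate = mu  # no overdispersion: plain Poisson draw
        else:
            # Gamma with shape 1/sigma and scale sigma*mu has mean mu.
            rate = rng.gammavariate(1.0 / sigma, sigma * mu)
        synthetic.append(sample_poisson(rate, rng))
    return synthetic


# Flattened contingency-table cells (illustrative values, not real data).
observed_cells = [0, 3, 12, 40, 0]
synthetic_cells = synthesize_counts(observed_cells, sigma=0.5, seed=1)
```

In this sketch a larger `sigma` adds more noise to every non-zero cell, which would be expected to lower disclosure risk at the cost of utility, while `sigma = 0` reduces to Poisson sampling around the observed counts.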
