Standard
On integrating the number of synthetic data sets m into the a priori synthesis approach. /
Jackson, James; Mitra, Robin; Francis, Brian et al.
Privacy in Statistical Databases: International Conference, PSD 2022, Paris, France, September 21–23, 2022, Proceedings. ed. / Josep Domingo-Ferrer; Maryline Laurent. Cham: Springer, 2022. p. 205-219 (Lecture Notes in Computer Science).
Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
Harvard
Jackson, J, Mitra, R, Francis, B & Dove, I 2022,
On integrating the number of synthetic data sets m into the a priori synthesis approach. in J Domingo-Ferrer & M Laurent (eds),
Privacy in Statistical Databases: International Conference, PSD 2022, Paris, France, September 21–23, 2022, Proceedings. Lecture Notes in Computer Science, Springer, Cham, pp. 205-219.
https://doi.org/10.1007/978-3-031-13945-1_15
APA
Jackson, J., Mitra, R., Francis, B., & Dove, I. (2022).
On integrating the number of synthetic data sets m into the a priori synthesis approach. In J. Domingo-Ferrer, & M. Laurent (Eds.),
Privacy in Statistical Databases: International Conference, PSD 2022, Paris, France, September 21–23, 2022, Proceedings (pp. 205-219). (Lecture Notes in Computer Science). Springer.
https://doi.org/10.1007/978-3-031-13945-1_15
Vancouver
Jackson J, Mitra R, Francis B, Dove I.
On integrating the number of synthetic data sets m into the a priori synthesis approach. In Domingo-Ferrer J, Laurent M, editors, Privacy in Statistical Databases: International Conference, PSD 2022, Paris, France, September 21–23, 2022, Proceedings. Cham: Springer. 2022. p. 205-219. (Lecture Notes in Computer Science). doi: 10.1007/978-3-031-13945-1_15
Author
Jackson, James ; Mitra, Robin ; Francis, Brian et al. /
On integrating the number of synthetic data sets m into the a priori synthesis approach. Privacy in Statistical Databases: International Conference, PSD 2022, Paris, France, September 21–23, 2022, Proceedings. editor / Josep Domingo-Ferrer ; Maryline Laurent. Cham : Springer, 2022. pp. 205-219 (Lecture Notes in Computer Science).
Bibtex
@inproceedings{7802fbfdd28d4772811a22d0f319b76d,
title = "On integrating the number of synthetic data sets m into the a priori synthesis approach",
abstract = "The synthesis mechanism given in Jackson et al. (2022) uses saturated models, along with overdispersed count distributions, to generate synthetic categorical data. The mechanism is controlled by tuning parameters, which can be tuned according to a specific risk or utility metric. Thus expected properties of synthetic data sets can be determined analytically a priori, that is, before they are generated. While Jackson et al. (2022) considered the case of generating $m = 1$ data set, this paper considers generating $m > 1$ data sets. In effect, $m$ becomes a tuning parameter and the role of $m$ in relation to the risk-utility trade-off can be shown analytically. The paper introduces a pair of risk metrics, $\tau_3(k,d)$ and $\tau_4(k,d)$, that are suited to $m > 1$ data sets; and also considers the more general issue of how best to analyse categorical data sets: average the data sets pre-analysis or average results post-analysis. Finally, the methods are demonstrated empirically with the synthesis of a constructed data set which is used to represent the English School Census.",
author = "James Jackson and Robin Mitra and Brian Francis and Iain Dove",
year = "2022",
month = sep,
day = "14",
doi = "10.1007/978-3-031-13945-1_15",
language = "English",
isbn = "9783031139444",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
address = "Cham",
pages = "205--219",
editor = "Domingo-Ferrer, Josep and Laurent, Maryline",
booktitle = "Privacy in Statistical Databases: International Conference, PSD 2022, Paris, France, September 21--23, 2022, Proceedings",
}
RIS
TY - CONF
T1 - On integrating the number of synthetic data sets m into the a priori synthesis approach
AU - Jackson, James
AU - Mitra, Robin
AU - Francis, Brian
AU - Dove, Iain
PY - 2022/9/14
Y1 - 2022/9/14
AB - The synthesis mechanism given in Jackson et al. (2022) uses saturated models, along with overdispersed count distributions, to generate synthetic categorical data. The mechanism is controlled by tuning parameters, which can be tuned according to a specific risk or utility metric. Thus expected properties of synthetic data sets can be determined analytically a priori, that is, before they are generated. While Jackson et al. (2022) considered the case of generating m = 1 data set, this paper considers generating m > 1 data sets. In effect, m becomes a tuning parameter and the role of m in relation to the risk-utility trade-off can be shown analytically. The paper introduces a pair of risk metrics, τ3(k,d) and τ4(k,d) that are suited to m > 1 data sets; and also considers the more general issue of how best to analyse categorical data sets: average the data sets pre-analysis or average results post-analysis. Finally, the methods are demonstrated empirically with the synthesis of a constructed data set which is used to represent the English School Census.
U2 - 10.1007/978-3-031-13945-1_15
DO - 10.1007/978-3-031-13945-1_15
M3 - Conference contribution/Paper
SN - 9783031139444
T3 - Lecture Notes in Computer Science
SP - 205
EP - 219
BT - Privacy in Statistical Databases
A2 - Domingo-Ferrer, Josep
A2 - Laurent, Maryline
PB - Springer
CY - Cham
ER -