
Make assurance double sure: combination of two disclosure limitation methods and estimation of general regression models

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published
Journal publication date: 12/2008
Journal: AStA Advances in Statistical Analysis
Issue number: 4
Volume: 92
Number of pages: 18
Pages (from-to): 405-422
Publication status: Published
Original language: English

Abstract

To guarantee the confidentiality and privacy of firm-level data, statistical offices apply various disclosure limitation techniques. However, each anonymization technique has protection limits: for some observations, the probability of disclosing individual information remains undesirably high. To overcome this problem, we propose combining two separate disclosure limitation techniques, blanking and multiplication by independent noise, to protect the original dataset. The proposed approach reduces the probability of reidentifying or disclosing individual information and can be applied to both linear and nonlinear regression models.
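The two masking steps can be illustrated on a single numeric variable. The following is a minimal sketch under assumptions that the abstract does not state: each value is blanked (withheld from the released file) independently with a known, record-specific probability, and every remaining value is multiplied by mean-one lognormal noise whose log-scale standard deviation `noise_sd` is known. The function names `mask_value` and `mask_column` are hypothetical, and the paper's actual blanking rule and noise distribution may differ.

```python
import math
import random


def mask_value(x, blank_prob, noise_sd, rng):
    """Hypothetical masking of one value: blanking, then multiplicative noise."""
    if rng.random() < blank_prob:
        return None  # blanked: the value is withheld from the released file
    # Assumed noise: mean-one lognormal factor u = exp(z - noise_sd**2 / 2),
    # with z ~ N(0, noise_sd**2), so that E[u] = 1 and u > 0.
    u = math.exp(rng.gauss(0.0, noise_sd) - noise_sd ** 2 / 2)
    return x * u


def mask_column(values, blank_probs, noise_sd, seed=0):
    """Apply both disclosure limitation steps to a column of firm-level values."""
    rng = random.Random(seed)
    return [mask_value(x, p, noise_sd, rng) for x, p in zip(values, blank_probs)]
```

For example, `mask_column([120.0, 85.5, 300.0], [0.1, 0.5, 0.9], noise_sd=0.2)` returns a list in which some entries may be `None` (blanked) and the others are perturbed versions of the originals. A mean-one factor is chosen so that the noise does not shift the expected value of the released variable; a lognormal factor also keeps positive quantities such as turnover positive.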

We show how to combine the blanking method with the multiplicative measurement error method, and how to estimate the model by combining the multiplicative Simulation-Extrapolation (M-SIMEX) approach of Nolte (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=969599, 2007) either with the Inverse Probability Weighting (IPW) approach, which goes back to Horvitz and Thompson (J. Am. Stat. Assoc. 47:663–685, 1952), or, as an alternative to IPW, with matching methods such as the semiparametric M-estimator proposed by Flossmann (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=917326, 2007). Based on Monte Carlo simulations, we show that multiplicative measurement error combined with blanking as a masking procedure does not necessarily lead to a severe loss of estimation quality, provided that the effects of the masking on the data-generating process are known.
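The combination of M-SIMEX with IPW can be sketched for the simplest case, a linear regression of `y` on a single masked regressor. This is a simplified illustration, not the authors' estimator, and it rests on several assumptions: the multiplicative noise is mean-one lognormal with known log-scale standard deviation `sigma` (so the error is additive on the log scale and SIMEX remeasurement can add further normal noise there), every record's blanking probability is known, the IPW weight of a non-blanked record is the inverse of its probability of not being blanked, and a quadratic extrapolant is evaluated at λ = -1. The names `ipw_slope`, `quadratic_extrapolate` and `msimex_ipw_slope` are hypothetical, and the matching-based alternative is not shown.

```python
import math
import random


def ipw_slope(xs, ys, weights):
    """Weighted least-squares intercept and slope for y = b0 + b1 * x."""
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quadratic_extrapolate(lams, vals, target=-1.0):
    """Least-squares fit of a + b*lam + c*lam**2, evaluated at `target`."""
    # Normal equations S @ (a, b, c) = t, solved with Cramer's rule.
    s = [[sum(l ** (i + j) for l in lams) for j in range(3)] for i in range(3)]
    t = [sum(v * l ** i for l, v in zip(lams, vals)) for i in range(3)]
    d = _det3(s)
    coef = []
    for k in range(3):
        mk = [row[:] for row in s]
        for i in range(3):
            mk[i][k] = t[i]
        coef.append(_det3(mk) / d)
    return coef[0] + coef[1] * target + coef[2] * target ** 2


def msimex_ipw_slope(w, y, blank_probs, sigma,
                     lams=(0.0, 0.5, 1.0, 1.5, 2.0), n_rep=50, seed=0):
    """Hypothetical M-SIMEX + IPW slope; w holds None for blanked records.

    Returns (extrapolated slope, naive IPW slope at lam = 0).
    """
    rng = random.Random(seed)
    # IPW: keep non-blanked records, weighted by 1 / P(record not blanked).
    kept = [(wi, yi, 1.0 / (1.0 - p))
            for wi, yi, p in zip(w, y, blank_probs) if wi is not None]
    ws = [k[0] for k in kept]
    ys = [k[1] for k in kept]
    weights = [k[2] for k in kept]
    means = []
    for lam in lams:
        reps = []
        for _ in range(n_rep if lam > 0 else 1):
            # Extra mean-one lognormal noise raises the log-scale error
            # variance from sigma**2 to (1 + lam) * sigma**2.
            s = math.sqrt(lam) * sigma
            wl = [wi * math.exp(rng.gauss(0.0, s) - s * s / 2) for wi in ws]
            reps.append(ipw_slope(wl, ys, weights)[1])
        means.append(sum(reps) / len(reps))
    # Extrapolating to lam = -1 corresponds to zero measurement error.
    return quadratic_extrapolate(list(lams), means), means[0]
```

Returning the naive IPW slope alongside the extrapolated one makes the attenuation caused by the multiplicative noise directly visible. The grid of λ values and the number of replicates `n_rep` are arbitrary choices here, and the quadratic extrapolant only approximates the true dependence of the estimate on λ, so some bias can remain.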