
Make assurance double sure: combination of two disclosure limitation methods and estimation of general regression models

Research output: Journal article (contribution to journal)


Journal publication date: 12/2008
Journal: AStA Advances in Statistical Analysis
Number of pages: 18
Original language: English


To guarantee the confidentiality and privacy of firm-level data, statistical offices apply various disclosure limitation techniques. However, each anonymization technique has its own protection limits, so that for some observations the probability of disclosing individual information is not minimized. To overcome this problem, we propose protecting the original dataset by combining two separate disclosure limitation techniques: blanking and multiplication by independent noise. The proposed approach reduces the probability of reidentifying observations or disclosing individual information, and it can be applied to both linear and nonlinear regression models.
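The two-step mask can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper. Each value is blanked independently with a fixed probability `blank_prob`, whereas in practice blanking may deliberately target high-risk observations. Every surviving value is then multiplied by independent log-normal noise whose logarithm has variance `sigma_u**2` and is centred so that the noise has mean one. The function name `mask` and its parameters are hypothetical.

```python
import math
import random


def mask(values, blank_prob, sigma_u, seed=0):
    """Hypothetical two-step mask: blanking, then multiplicative noise.

    Assumption: each value is blanked (replaced by None) independently
    with probability blank_prob. Every surviving value x is replaced by
    x * u, where ln u ~ N(-sigma_u**2 / 2, sigma_u**2), so that E[u] = 1.
    """
    rng = random.Random(seed)
    masked = []
    for x in values:
        if rng.random() < blank_prob:
            masked.append(None)  # blanked: value withheld entirely
        else:
            log_u = rng.gauss(-sigma_u ** 2 / 2, sigma_u)
            masked.append(x * math.exp(log_u))  # mean-one multiplicative noise
    return masked


print(mask([120.0, 45.0, 980.0, 310.0], blank_prob=0.25, sigma_u=0.1))
```

Blanked entries come back as `None`, so downstream estimation can tell withheld values apart from values that were only perturbed by noise.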

We show how to combine the blanking method with the multiplicative measurement error method. We also show how to estimate the model by combining the multiplicative Simulation-Extrapolation (M-SIMEX) approach of Nolte (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=969599, 2007) with one of two further methods. The first is the Inverse Probability Weighting (IPW) approach, which goes back to Horvitz and Thompson (J. Am. Stat. Assoc. 47:663–685, 1952). The second, an alternative to IPW, is a matching method such as the semiparametric M-estimator proposed by Flossmann (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=917326, 2007). Monte Carlo simulations show that multiplicative measurement error combined with blanking as a masking procedure does not necessarily lead to a severe loss of estimation quality, provided that its effects on the data-generating process are known.
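The estimation idea can be sketched for a simple linear regression with a single masked regressor. This sketch is a generic SIMEX adapted to multiplicative log-normal noise, combined with inverse-probability weights on the complete cases. It does not reproduce Nolte's M-SIMEX or the paper's estimator. Several assumptions are hypothetical:

* The log-noise variance `sigma_u**2` is known, which mirrors the condition that the masking's effects are known.
* The probability `keep_prob[i]` that row `i` was not blanked is known.
* The extrapolation uses a quadratic through λ = 0, 1, 2.

The helper names `wls_slope` and `msimex_ipw_slope` are illustrative.

```python
import math
import random


def wls_slope(x, y, w):
    """Weighted least-squares slope of y on x (model with intercept)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def msimex_ipw_slope(x_masked, y, keep_prob, sigma_u, n_sim=200, seed=0):
    """Hypothetical SIMEX-style slope correction with IPW for blanked rows.

    Assumes x_masked[i] is None if blanked, otherwise x_i * u_i with
    ln u_i ~ N(-sigma_u**2 / 2, sigma_u**2), and that keep_prob[i] is the
    known probability that row i was not blanked.
    """
    rng = random.Random(seed)
    # Complete cases only, each weighted by 1 / P(not blanked).
    rows = [(xm, yi, 1.0 / p)
            for xm, yi, p in zip(x_masked, y, keep_prob) if xm is not None]
    xs, ys, ws = zip(*rows)

    def mean_slope(lam):
        # Add extra mean-one log-normal noise with log-variance
        # lam * sigma_u**2 and average the slope over n_sim replicates.
        if lam == 0:
            return wls_slope(xs, ys, ws)
        s = math.sqrt(lam) * sigma_u
        total = 0.0
        for _ in range(n_sim):
            xb = [xi * math.exp(rng.gauss(-s * s / 2, s)) for xi in xs]
            total += wls_slope(xb, ys, ws)
        return total / n_sim

    b0, b1, b2 = mean_slope(0), mean_slope(1), mean_slope(2)
    # Quadratic through lambda = 0, 1, 2, evaluated at lambda = -1
    # (the error-free point): 3*b(0) - 3*b(1) + b(2).
    return 3 * b0 - 3 * b1 + b2
```

Adding noise with log-variance λσ² on top of the mask gives a total log-variance of (1 + λ)σ², so λ = -1 corresponds to error-free data. The three-point quadratic is the simplest extrapolant. A real analysis would usually use a finer λ grid and check how sensitive the result is to the choice of extrapolation function.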