Minimizing Disclosure Risk in HHS Open Data Initiatives. C. Recent Advances in Protecting Microdata


Much of the recent research on protecting microdata has focused on how applying statistical disclosure limitation methods affects the usefulness of the data, a topic addressed in the next chapter. Research on improving the protection afforded to public use microdata has concentrated on enhancing existing approaches rather than on developing entirely new ones.

Singh (2009) proposes an enhanced version of MASSC that generalizes the risk measures used to decide which records are altered, so that they also cover records with "partial risk," defined as having risk scores between 0 and 1. All records with nonzero risk are eligible for treatment (that is, alteration of data values), but only a random subset of them is actually treated. Both disclosure risk and information loss are assessed in producing the final dataset.
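To make the idea of random treatment of partial-risk records concrete, the sketch below selects records for alteration under a simple rule that is an assumption, not Singh's formula: a record with risk score r is treated with probability r, so zero-risk records are never touched and records with a score of 1 are always treated. The treatment step is likewise a hypothetical stand-in (substituting a value drawn from a random donor record); the function names `select_for_treatment` and `treat` are illustrative only.

```python
import random


def select_for_treatment(risk_scores, seed=None):
    """Return indices of records selected for treatment.

    Hypothetical rule (not MASSC's actual procedure): a record with risk
    score r in [0, 1] is selected with probability r.
    """
    rng = random.Random(seed)
    selected = []
    for i, r in enumerate(risk_scores):
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"risk score out of range: {r}")
        # Only nonzero-risk records are eligible; a random draw decides
        # whether an eligible record is actually treated.
        if r > 0.0 and rng.random() < r:
            selected.append(i)
    return selected


def treat(records, key, selected, seed=None):
    """Hypothetical substitution step: replace `key` in each selected record
    with the value from a randomly chosen donor record. The input records
    are copied, so the original data are left unchanged."""
    rng = random.Random(seed)
    out = [dict(rec) for rec in records]
    for i in selected:
        donor = rng.randrange(len(records))
        out[i][key] = records[donor][key]
    return out


# Usage: two zero-risk, two full-risk, and one partial-risk record.
scores = [0.0, 1.0, 0.0, 1.0, 0.5]
recs = [{"age": a} for a in (30, 41, 52, 63, 74)]
chosen = select_for_treatment(scores, seed=1)
released = treat(recs, "age", chosen, seed=2)
```

In a real application the released file would then be compared with the original on both a disclosure risk measure and an information loss measure, as the text describes.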

Machanavajjhala et al. (2005) show two situations in which k-anonymity fails to protect respondents: (1) the k individuals who share a combination of quasi-identifier values are homogeneous with respect to a sensitive attribute, so an intruder learns that attribute for anyone in the group (attribute disclosure); and (2) the intruder possesses background knowledge that makes it possible to distinguish the target individual from the other k-1 individuals in the group. To overcome these limitations, the authors propose the concept of l-diversity, which requires that the values of sensitive attributes be well-represented in each group. The authors identify extending l-diversity to multiple sensitive attributes and to continuous sensitive attributes as directions for future work.
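The homogeneity problem can be seen in a small check. The sketch below groups records by their quasi-identifier values and tests k-anonymity (every group has at least k records) and the simplest reading of "well-represented," often called distinct l-diversity (every group has at least l distinct sensitive values). The original paper also defines stricter versions, such as entropy l-diversity; those are not implemented here. The table and function names are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records by their combination of quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        groups[key].append(rec)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    groups = equivalence_classes(records, quasi_identifiers)
    return all(len(g) >= k for g in groups.values())


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every equivalence class has at least l distinct sensitive values."""
    groups = equivalence_classes(records, quasi_identifiers)
    return all(len({rec[sensitive] for rec in g}) >= l for g in groups.values())


# Hypothetical generalized table: both groups have 3 records (3-anonymous),
# but everyone in the first group has the same condition.
table = [
    {"zip": "130**", "age": "<30", "condition": "heart disease"},
    {"zip": "130**", "age": "<30", "condition": "heart disease"},
    {"zip": "130**", "age": "<30", "condition": "heart disease"},
    {"zip": "148**", "age": ">=40", "condition": "cancer"},
    {"zip": "148**", "age": ">=40", "condition": "flu"},
    {"zip": "148**", "age": ">=40", "condition": "heart disease"},
]
qi = ["zip", "age"]
```

Here `is_k_anonymous(table, qi, 3)` holds, yet `is_distinct_l_diverse(table, qi, "condition", 2)` fails: an intruder who knows a target is under 30 and lives in a 130** ZIP code learns the target's condition without identifying the record.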

Efforts to improve the quality of synthetic data have also received attention. Zayatz (2008) notes that this is one of three areas of current research on disclosure avoidance at the Census Bureau, the other two being the use of noise addition for tabular magnitude data and the development of a system for remote microdata analysis. The Census Bureau uses synthetic data methods to produce two databases that incorporate data from administrative records, and it is also applying these methods to produce group quarters microdata from the American Community Survey.
