Minimizing Disclosure Risk in HHS Open Data Initiatives. III. Protecting Data Against Disclosure


Data producers use two general approaches to release microdata while protecting them from disclosure: restricting access to the data, and restricting the data that are released for public access (National Research Council 2005). The latter approach encompasses a wide range of techniques, including suppressing variables and altering their values. Legal restrictions may be relevant to either approach, although, as we note below, federal regulations place much more responsibility on the data producer than on the user.

A common way to view disclosure risk is to express the probability of disclosure as the product of two terms: (1) the probability of a successful re-identification, conditional on someone trying to re-identify a record, and (2) the probability that someone will try to re-identify a record (Marsh et al. 1991). A data producer can reduce the risk of disclosure by reducing either of these probabilities. For example, charging a high fee for a public use file reduces the probability that a potential intruder will even acquire the file. Sampling reduces the certainty that a person of interest is included in the file, which also discourages potential intruders. Altering the data values in various ways reduces the likelihood of a re-identification and, in so doing, may also discourage attempts at re-identification. In addition, altering the data reduces the potential value of the information gained by re-identification, which may further reduce the likelihood that a would-be intruder will attempt a re-identification.
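The decomposition above can be made concrete with a small calculation. The sketch below is illustrative only: the function names, the example probabilities, and the simplifying assumption that an intruder can succeed only when the target is actually in the file (so that the sampling fraction scales the conditional success probability) are hypothetical and are not taken from Marsh et al. (1991).

```python
# Minimal sketch of the two-term disclosure-risk decomposition:
#   P(disclosure) = P(re-identification | attempt) * P(attempt)
# Function names and example values are hypothetical, for illustration only.


def disclosure_risk(p_reid_given_attempt: float, p_attempt: float) -> float:
    """Return P(disclosure) as the product of the two probabilities."""
    for p in (p_reid_given_attempt, p_attempt):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_reid_given_attempt * p_attempt


def reid_given_attempt(sampling_fraction: float, p_match_if_included: float) -> float:
    """Hypothetical simplification: a target can be re-identified only if the
    target is on the file, so the sampling fraction scales the chance that
    an attempt succeeds."""
    return sampling_fraction * p_match_if_included


if __name__ == "__main__":
    p_attempt = 0.10  # hypothetical chance that someone tries to re-identify a record
    for f in (1.0, 0.1):  # full census versus a 10 percent sample
        risk = disclosure_risk(reid_given_attempt(f, 0.5), p_attempt)
        print(f"sampling fraction {f}: disclosure risk {risk:.4f}")
```

With these made-up inputs the script prints a risk of 0.0500 for the full file and 0.0050 for the 10 percent sample. Holding the probability of an attempt fixed, sampling alone cuts the risk tenfold. The text notes that sampling may also discourage attempts, which would lower `p_attempt` as well; this sketch does not model that second effect.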

In this chapter we review strategies for restricting access and restricting the data. We also examine the legal environment and discuss approaches to assessing disclosure risk.

