Minimizing Disclosure Risk in HHS Open Data Initiatives. B. Concluding Observations


Because federal agencies have worked hard to develop, maintain, and update their procedures for protecting the confidentiality of their public use data, releasing multiple, well-protected files under the Open Data Initiative and related programs does not appear to produce a significant increase in disclosure risk. A greater threat comes from the personal information that individuals reveal about themselves and others through social media, because that information is attached to identified people. However, an intruder cannot know whether a given set of characteristics found on social media uniquely identifies an individual in the population. In general, incomplete population coverage reduces the threat, just as it does with voter registration records. Files released with inadequate protection by states, local areas, and commercial organizations also pose a threat if large numbers of their records can be re-identified. However, the few examples discussed earlier in this report indicate that such public files are not common, and their rarity and generally small size limit the threat they present.

Skilled hackers present more of a concern because they may be able to break into nonpublic databases and obtain data that has not been protected by the removal of identifiers or the application of more sophisticated disclosure limitation techniques. Whether they would also be interested in uncovering identities in public use files is not clear. In fact, the highly publicized thefts of credit card numbers and other personal identifiers suggest that the threat from hackers breaking into internal, fully identified databases may be greater than the risk of their re-identifying records in public use files protected with the most effective methods.

There is little question that the threat from these sources is growing. But at the same time, federal agencies have demonstrated that they remain vigilant and forward-looking in their evaluation and application of disclosure limitation techniques to the data that they release to the public. The track record for federal public use files is unblemished, but agencies have not rested on these accomplishments. Protective measures that have worked well in the past can become less effective over time. To guard against this possibility, many agencies regularly assess whether their procedures have become vulnerable to new data sources, new software, or expanded computational capacity. Furthermore, being well aware that once a dataset is released to the public it cannot be recalled, federal agencies have devoted resources to anticipating future threats. Such active engagement promises continued security for the data that federal agencies release.
