Minimizing Disclosure Risk in HHS Open Data Initiatives. VI. Synthesis and Conclusions


The Open Data Initiative launched by the Executive Office of the President and OMB has encouraged the release of increasing numbers of datasets containing individual records (microdata) collected or sponsored by federal agencies from survey respondents, doctor and hospital visits, and medical claims. At the same time, federal agencies that release data collected from individuals and establishments have an obligation under the law to protect the confidentiality of those supplying the data as well as any information provided that could disclose the identity of individuals. The challenge faced by HHS and other federal agencies is to achieve an appropriate balance between providing the public with useful datasets and protecting the confidentiality of the individuals and establishments whose information is contained in the data.

Information in an individual dataset may not, in isolation, pose a risk of identifying an individual, but it may pose such a risk when combined with other available information. As stated by OMB in M-13-13, before disclosing potentially identifiable information, “agencies must consider other publicly available data—in any medium and from any source—to determine whether some combination of existing data and the data intended to be publicly released could allow for the identification of an individual or pose another security concern” (OMB 2013a). The concern is that datasets released in large numbers may provide the pieces of information that, in combination with other publicly available data, disclose information that the federal government is required to keep confidential; by mid-2014, HHS alone had released more than 1,000 datasets (HHS 2014).
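The way two individually harmless datasets can combine into a disclosure can be illustrated with a minimal sketch. Everything in it is hypothetical and invented for illustration: the records, the field names (`zip`, `birth_year`, `sex`), and the helper `link_unique` come from neither the report nor any real release. The sketch assumes a released file that carries no names but retains a few quasi-identifiers, and a separate public list that carries names alongside the same fields.

```python
from collections import defaultdict

# Hypothetical released microdata: names removed, quasi-identifiers retained.
released = [
    {"zip": "20001", "birth_year": 1950, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "20001", "birth_year": 1950, "sex": "M", "diagnosis": "asthma"},
    {"zip": "20002", "birth_year": 1982, "sex": "F", "diagnosis": "hypertension"},
    {"zip": "20002", "birth_year": 1982, "sex": "F", "diagnosis": "asthma"},
]

# Hypothetical public list that includes names and the same fields.
public = [
    {"name": "Person A", "zip": "20001", "birth_year": 1950, "sex": "F"},
    {"name": "Person B", "zip": "20002", "birth_year": 1982, "sex": "F"},
]

# Fields assumed to appear in both sources (an assumption of this sketch).
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Return the tuple of quasi-identifier values for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link_unique(released_rows, public_rows):
    """Map each named person to a diagnosis when exactly one released row matches."""
    by_key = defaultdict(list)
    for row in released_rows:
        by_key[key(row)].append(row)

    matches = {}
    for person in public_rows:
        candidates = by_key.get(key(person), [])
        # Only a unique match attributes the sensitive value to a named person.
        if len(candidates) == 1:
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


matches = link_unique(released, public)
print(matches)
```

In this toy data, Person A matches exactly one released record and is linked to a diagnosis, while Person B shares the same quasi-identifier values with two released records and is not linked. Neither file discloses anything about a named individual on its own; the disclosure arises only from the combination.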

ASPE established this project to better understand how federal agencies are meeting the challenge of releasing ever more data while maintaining the confidentiality of those who provided it. The project had three goals: (1) to obtain a balanced, scientifically sound assessment of the mosaic effect, that is, the risk that data released separately can be combined to identify individuals; (2) to identify any unique increased risk associated with the mosaic effect; and (3) to identify data release policies and best practices that can prevent or reduce the risk of disclosure through the mosaic effect.

In assessing whether the Open Data Initiative creates an increased risk of disclosure through the mosaic effect, we and our expert panelists reviewed what is known about the sources of disclosure risk and the effectiveness of various ways to control such risk. This chapter presents a synthesis drawn from the discussion at the TEP meeting and the materials we reviewed in producing the two background papers. We close with some concluding observations intended to assist federal agencies as they comply with open data policies while maintaining the confidentiality of the data they release.
