Minimizing Disclosure Risk in HHS Open Data Initiatives

I. Introduction and Background


On May 9, 2013, President Obama issued the executive order “Making Open and Machine Readable the New Default for Government Information,” which directed the Office of Management and Budget (OMB) to issue a government-wide Open Data Policy. The order had three objectives. The first was to advance the management of government information as an asset throughout its life cycle. The second was to promote interoperability and openness. The third was to ensure, wherever possible and legally permissible, that data are released to the public in ways that make them easy to find, accessible, and usable. Federal agencies have a long history of releasing data to the public. They also have a legal obligation to protect the confidentiality of the individuals and organizations from which the data were collected, and they have successfully balanced these two objectives for decades. Public access to federal data is now being expanded, and data from other sources are increasingly available. Federal agencies must therefore continue to ensure that the combination of data already available and the data they are preparing to release does not enable the identification of individuals or other entities through what has been termed the “mosaic effect.” 1
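The mosaic effect can be illustrated with a toy record-linkage sketch. Everything in it is hypothetical. The records, the field names, and the choice of ZIP code, birth year, and sex as the shared "quasi-identifier" fields are assumptions made for illustration. They are not HHS data or an HHS method. Neither file pairs a name with a diagnosis. Joining the two files on their shared fields attaches a diagnosis to a name whenever a combination of those fields is unique in both files.

```python
from collections import Counter

# Hypothetical released file: names removed, quasi-identifiers retained.
released = [
    {"zip": "20001", "birth_year": 1961, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "20001", "birth_year": 1961, "sex": "F", "diagnosis": "asthma"},
    {"zip": "20002", "birth_year": 1975, "sex": "M", "diagnosis": "hypertension"},
    {"zip": "20003", "birth_year": 1988, "sex": "F", "diagnosis": "depression"},
]

# Hypothetical auxiliary file from another source that does carry names.
auxiliary = [
    {"name": "Person A", "zip": "20001", "birth_year": 1961, "sex": "F"},
    {"name": "Person B", "zip": "20001", "birth_year": 1961, "sex": "F"},
    {"name": "Person C", "zip": "20002", "birth_year": 1975, "sex": "M"},
    {"name": "Person D", "zip": "20003", "birth_year": 1988, "sex": "F"},
]

# Assumed quasi-identifier columns shared by both files.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Project a record onto the quasi-identifier columns."""
    return tuple(record[col] for col in QUASI_IDENTIFIERS)


def link_unique_matches(released, auxiliary):
    """Return (name, diagnosis) pairs for quasi-identifier combinations that
    appear exactly once in each file, i.e. one-to-one links that re-identify."""
    released_counts = Counter(key(r) for r in released)
    aux_counts = Counter(key(a) for a in auxiliary)
    # Safe to index by key: it is only used when the key is unique in auxiliary.
    aux_by_key = {key(a): a for a in auxiliary}
    matches = []
    for r in released:
        k = key(r)
        if released_counts[k] == 1 and aux_counts[k] == 1:
            matches.append((aux_by_key[k]["name"], r["diagnosis"]))
    return matches


if __name__ == "__main__":
    for name, diagnosis in link_unique_matches(released, auxiliary):
        print(f"{name} -> {diagnosis}")
```

In this toy case, Persons C and D are linked to their diagnoses. Persons A and B share a combination and cannot be matched one-to-one. This suggests why releases that leave few records sharing each combination of fields carry more linkage risk.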

To gain more insight into the mosaic effect and its implications for continuing to release data to the public while minimizing the risk of disclosing personal information, the Office of the Assistant Secretary for Planning and Evaluation (ASPE) in the U.S. Department of Health and Human Services (HHS) contracted with Mathematica Policy Research. Mathematica was to convene a technical expert panel (TEP), prepare background materials, and summarize in a final report what was learned from the panel discussion and the background research. 2 The goals of the project were (1) to provide a balanced and scientifically sound assessment of the mosaic effect, (2) to identify any increased risk uniquely associated with the mosaic effect, and (3) to identify data release policies and best practices that can prevent or reduce disclosure due to the mosaic effect.

1 The concept of a mosaic effect is derived from the mosaic theory of intelligence gathering, in which disparate pieces of information—though individually of limited utility—become significant when combined with other types of information (Pozen 2005).

2 More specifically, the project’s components are: (1) a pair of background papers, one reviewing federal policies and procedures regarding the use and protection of personal data and the other presenting an environmental scan of the literature relevant to releasing federal microdata in light of the risks posed by the mosaic effect; (2) a TEP tasked with addressing the mosaic effect through a discussion of best practices for protecting confidentiality in open data initiatives; and (3) this report, which synthesizes the findings from the background papers and the proceedings of the TEP meeting. The TEP meeting was held on June 27, 2014. The meeting agenda is reproduced in Appendix A, a list of attendees is included in Appendix B, and minutes from the meeting are presented in Appendix C. The two background papers are included in Appendices D and E.

