Minimizing Disclosure Risk in HHS Open Data Initiatives. Wrap-Up


Jim Scanlon, ASPE

John Czajka, Mathematica



This has been the progression of the day: we began with policies, proceeded to discuss disclosure risk assessment, and then addressed agency practices to protect data and ideas for further research. Our discussion of risk assessment went beyond current procedures. We discussed broad issues of privacy and transparency. Those issues will be referred to other parts of HHS and to the FTC to address. The HHS recommendations to Congress for HIPAA medical privacy were not implemented, so we did the best we could with our regulatory authority.

The portfolio of disclosure prevention practices is where we thought it was, but we heard some interesting ideas for advancing disclosure avoidance techniques. We also heard about the continuum of data release: for example, quasi-public use (a public use file with terms of use) and the CMS virtual RDC (although users have to go through ResDAC and pay for access to the data).


Asiala: Could property rights for government agency data be used as a means to hold users accountable?

El Emam: This approach worked at the Louisiana code fest. Privacy Analytics produced a file for CMS for Louisiana with terms of agreement.

Love: DUAs are used by the states, but enforcement has been uneven. A DUA is not only a means of restricting the use of the data; it can also be a tool to communicate with and educate data users about the data’s importance, about ethics and proper handling of data, and about the agency’s values in protecting and securing the data. Washington State did not have a DUA. This was an example of why we need multiple layers of protection.

Brooklyn Lupari (SAMHSA): We have aggregated data, public use files, restricted use files, and a virtual RDC. We have confidence in our data protection technology—but the concern is about human behavior, and how users might avoid monitoring protocols. This can only be mitigated through training and education.

Jenny Schnaier (AHRQ): We communicate our rules to data users, and we can write other things into the DUA—including that they are personally responsible (not just their organizations). In fact, AHRQ will not accept requests from organizations—we insist on a person taking responsibility. The data we distribute is considered limited use.

Nancy Donovan (GAO): We are tracking data methodology issues. There are a variety of technologies for pooling data. There are also barriers to linkage across agencies—how to combine datasets and deal with unstructured data. Are there examples of data pooling?

Scanlon: The underlying statutes make it difficult to share identifiable information with another federal agency.

Craig Schneider (Mathematica): Two types of users have been discussed: researchers and commercial organizations. Clinical users were not discussed, and they may start to do more analysis.

Ben Busby (NIH): The purpose of the data commons at NIH is to get large datasets such as genomes in and out for research. We are trying to make it easier for investigators to upload and download data and starting to put public use files in the cloud—but no PII.

Malin: NIH doesn’t want to host the data for everyone. There will need to be a host for all this data, and it may not even be a federal agency. Amazon? Google? Trust will be a big issue. BD2K (Big Data to Knowledge) will fund national centers for biomedical computing.

El Emam: Data brokers haven’t re-identified data. Some have actually participated in re-identification studies using their own data. Researchers push back against changes in data access and the ability to do work on their own computers.

Ann Waldo (Privacy Analytics): What about consumer-generated health data? This is unregulated by HIPAA. The ONC (Office of the National Coordinator for Health Information Technology) will need to consider the impact of this.

Scanlon and Czajka: Thank you for your participation today. This has been a very informative discussion.

