Minimizing Disclosure Risk in HHS Open Data Initiatives. 1. Restricted Access


Federal agencies use three basic mechanisms to provide researchers with restricted access to data that are not released to the public: licensing, research data centers (RDCs), and secure remote access.

Under licensing arrangements, prospective users request restricted data files through a formal application process. To obtain such data, users must demonstrate that the data will be stored and used in a secure environment that meets the issuing agency’s standards. As part of the application, the user generally has to explain why the data are needed and how they will be used, and access may be limited to variables and records for which the user can demonstrate a critical need. To receive the data, the user typically has to sign a nondisclosure agreement.

Several federal agencies maintain RDCs, in which approved users can access agency data that are not released to the public. The data never leave the site, and output produced from data held in the RDC cannot be removed without a disclosure review, which can take different forms. For example, RDC staff may be authorized to review output, or the output may have to be screened by an agency disclosure review board. The types of data manipulation that RDC users may perform are limited. Linkages between databases may be prohibited or restricted. Users may not be allowed to attach portable storage devices to the computers or terminals that they use, and even printing of output may not be permitted (one RDC emails output to users after it has been reviewed). Obtaining access to an RDC requires submission of a proposal, and acceptable uses may be restricted to applications that carry potential benefits to the agency. Some agencies require their RDC users to undergo a background check and obtain employee-like status. The entire approval process may require several months.

A number of federal agencies allow users remote access to agency data that are not released on public use files. This access can take several forms. For example, the Census Bureau allows users to request tabulations from decennial census files that include more detail than the numerous tabulations that can be obtained from the bureau website (FCSM 2005). The requests are reviewed to ensure that the tabulations do not present a disclosure risk. The National Center for Health Statistics allows approved RDC users to submit programs remotely, although the software that can be used for this purpose is more limited than what is available in the RDC, and certain functions are not accessible. The advantage to the user lies in not having to travel to an RDC. This may be important when the research involves submission of a series of programs that take little time to run but require extensive review of the results before the next program can be prepared. Some RDCs charge a daily fee for in-person visits; together with long-distance travel costs and overnight accommodations, this can make a series of brief visits to an RDC very costly.
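As a rough illustration of what a review of requested tabulations might involve, the sketch below flags table cells with small counts. This is not the Census Bureau's procedure, which is not described here. The minimum cell count of 5, the function names, and the records are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical threshold; no agency's actual rule is stated in the text.
MIN_CELL_COUNT = 5


def tabulate(records, dims):
    """Count the records falling in each cell defined by the given dimensions."""
    return Counter(tuple(r[d] for d in dims) for r in records)


def review_tabulation(table, min_count=MIN_CELL_COUNT):
    """Return the nonempty cells whose counts fall below the threshold."""
    return sorted(cell for cell, n in table.items() if 0 < n < min_count)


# Made-up records standing in for a restricted file.
records = (
    [{"county": "A", "age": "65+"}] * 12
    + [{"county": "A", "age": "<65"}] * 40
    + [{"county": "B", "age": "65+"}] * 2
    + [{"county": "B", "age": "<65"}] * 9
)

table = tabulate(records, ["county", "age"])
flagged = review_tabulation(table)
print(flagged)  # cells a reviewer would examine before releasing the table
```

In this sketch a flagged cell only marks the request for closer review. Whether such a cell is suppressed, combined with another cell, or released is a decision the sketch does not attempt to model.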

Additional information on modes of restricted access is provided in Appendix E. Because open data initiatives for the most part involve public use data, the rest of this chapter discusses the procedures used to prepare data for public release.

