The NCHS disclosure manual focuses on the use of the agency’s Research Data Center (RDC), which provides restricted access to data elements not included in public-use files and lets users work with datasets that contain linked survey and administrative records.
Part 1: Confidentiality and the RDC. There are several RDC procedures to prevent disclosure. Restricted data cannot leave the secure access modes, and output, programs, and files cannot be saved to transportable electronic media. A research proposal is required, and the Review Committee will carefully examine the variables requested, the plan of analysis, and the desired output. The RDC provides confidentiality training and requires researchers to complete confidentiality paperwork. Each mode of access has specific policies, procedures, and rules designed to prevent disclosure. Analytic datasets will be created for researchers, and all output must be reviewed by the remote access system or an RDC Analyst before it can be released to the researcher.
Two laws govern the NCHS RDC: Section 308(d) of the Public Health Service Act and the Confidential Information Protection and Statistical Efficiency Act (CIPSEA). The Public Health Service Act asserts the importance of protecting confidentiality and requires anyone who accesses confidential data to become a Designated Agent, while CIPSEA sets the penalties for violating confidentiality at up to five years in prison, a fine of up to $250,000, or both.
Part 2: The RDC Research Process. Researchers are required to follow the proposal process instructions and use the proposal format provided on the RDC website. A list of variables to be used in the analysis must be provided, including a description of how they will be used. If restricted merge variables can be removed, coarsened, or substituted with randomized versions, this must be stated in the proposal. Although analysis plans may change, the RDC Analyst must be made aware of these changes throughout the process.
Part 3: Approved Proposals: Next Steps. Approval of a proposal does not explicitly or implicitly guarantee that all output generated by the analysis will be released. Once a proposal is approved, the principal investigator and all research team members who come into contact with the data must take the confidentiality orientation and complete the confidentiality forms. The forms are specific to the proposal, so they must be completed each time a different proposal is approved by the RDC. The analytic dataset will be created by NCHS staff, who will follow certain policies to protect geographic, temporal, and perturbed and masked information. Researchers are responsible for providing the RDC with an extract from the NCHS public dataset as well as any non-NCHS data. The extract from the public dataset may include only variables that were specified in the proposal; original NCHS variables must keep the names they have in the public dataset; and any derived variables should be clearly defined. The data files, along with a list of variables, must be emailed to the RDC Analyst, and any other questions should be discussed with the RDC Analyst as well.
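Before sending an extract, a researcher could check its columns against the proposal's variable list. The following is only a minimal sketch of such a check, not an NCHS tool: the function name and the variable names are hypothetical placeholders, and the comparison is assumed to be case-sensitive because original variable names must be retained exactly.

```python
def unlisted_columns(extract_columns, proposal_variables):
    """Return extract columns that do not appear in the proposal's variable list.

    The comparison is case-sensitive (an assumption), so a renamed or
    re-cased variable is reported as unlisted.
    """
    allowed = set(proposal_variables)
    return [name for name in extract_columns if name not in allowed]


# Hypothetical variable names, for illustration only.
proposal_variables = ["ID", "AGE", "SEX", "BMI_GROUP"]
extract_columns = ["ID", "AGE", "sex", "INCOME"]

# "sex" is flagged because its case differs; "INCOME" was never proposed.
print(unlisted_columns(extract_columns, proposal_variables))
```

An empty list means every column in the extract was named in the proposal. It does not confirm that derived variables are adequately defined, which still needs to be documented separately.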
Part 4: Working with Restricted Data. When working at an NCHS RDC, the researcher must abide by a number of rules developed to decrease the likelihood of a disclosure. Only one research project may be worked on at a time, and no individual-level data can leave the RDC facilities. No communication devices are allowed, and any items that may enable the identification of individuals and/or establishments are prohibited. Researchers cannot introduce new data through their computer code, and they are not allowed to put any content in code that would facilitate re-identification of a subject or establishment. Output must be submitted in a human-readable plain text file. All of the NCHS rules and restrictions also apply to researchers working at a Census Bureau RDC. When working through the Remote Access System, researchers may submit only statistical code that is related to the analysis plan outlined in the research proposal. Remote access rights are granted to only one person, and any output that poses a disclosure risk will be suppressed.
Part 5: Disclosure Review Policies and Procedures. Several general output policies protect the confidentiality of NCHS study participants. Datasets, including output in the form of datasets, will not be released, and no output will leave the RDC facilities without first being reviewed by an RDC Analyst or the Remote Access System. Before output is submitted for review, it must be in a form that can be released by the RDC, and any individual-level data or extreme values representing an individual must be removed. Furthermore, every cell with a frequency less than 5 should be replaced with an asterisk. Approved output is returned via email and must match the research questions and output described in the proposal.
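The small-cell rule can be illustrated with a short sketch. This is not the RDC's own procedure or software: the function names and counts below are hypothetical, and the sketch assumes that "frequency less than 5" is applied literally to every cell, including zero counts. It masks cells below the threshold and writes the table as tab-separated plain text.

```python
def suppress_small_cells(table, threshold=5, marker="*"):
    """Return a copy of a frequency table with counts below `threshold` masked."""
    suppressed = {}
    for row_label, cells in table.items():
        suppressed[row_label] = {
            col_label: (marker if count < threshold else count)
            for col_label, count in cells.items()
        }
    return suppressed


def to_plain_text(table):
    """Render the table as tab-separated, human-readable plain text."""
    columns = list(next(iter(table.values())))
    lines = ["\t".join(["group"] + columns)]
    for row_label, cells in table.items():
        lines.append("\t".join([row_label] + [str(cells[c]) for c in columns]))
    return "\n".join(lines)


# Hypothetical counts, for illustration only.
counts = {
    "region_a": {"yes": 42, "no": 3},
    "region_b": {"yes": 4, "no": 57},
}
print(to_plain_text(suppress_small_cells(counts)))
```

With these counts, the cells holding 3 and 4 are printed as asterisks and the other cells are unchanged. The sketch only masks individual cells; it does not check whether a masked value could be recovered from row or column totals, which remains a matter for the disclosure review.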
Part 6: Publishing Research. When publishing, researchers must adhere to all additional requirements specified in the approval email. Publications must not reveal information that could identify individuals, establishments, or geographic areas, or information about specific dates from external data sources that were merged with NCHS data based on temporal or geographic components. Citations for all publications, presentations, and reports that refer to research conducted using the RDC must be emailed to firstname.lastname@example.org and the RDC Analyst as soon as possible. In the publication, the methods section should specify which restricted variables were accessed through the RDC and why they were essential to the research questions. Finally, the following disclaimer must be added to the conclusion of the publication: “The findings and conclusions in this paper are those of the author(s) and do not necessarily represent the views of the Research Data Center, the National Center for Health Statistics, or the Centers for Disease Control and Prevention.”