Federal policies covering the use and protection of personal data focus on data collected or obtained by the federal government, but legislation extends to data collected at lower levels of government and by non-governmental entities. This review covers the key legislation that has helped to shape federal policy on the use and protection of personal data; additional laws governing data use; illustrative examples of agency regulations and guidelines; the major documents defining federal open data policy; an overview of datasets released by the Department of Health and Human Services (HHS); and methods of disclosure limitation used by federal agencies.
1 This background paper was prepared by Kevin Collins, John L. Czajka, Bonnie Harvey, Melissa Medeiros, Craig Schneider, and Amang Sukasih.
-
A. Key Legislation
-
Several key pieces of federal legislation govern the types of personal information that government and other organizations, such as health providers and educational institutions, can disclose about individual citizens or consumers. Most privacy laws focus on an individual’s rights over the privacy of personal information (including the ability to access and correct information) and the circumstances under which an entity may be allowed to disclose information, with or without consent from the individual. This summary provides an overview of the acts that created the foundation for U.S. privacy law as it relates to data held by the federal government. These include the Privacy Act of 1974, the Computer Matching and Privacy Protection Act of 1988, the Health Insurance Portability and Accountability Act (HIPAA) of 1996, the Confidential Information Protection and Statistical Efficiency Act (CIPSEA) of 2002, and the Health Information Technology for Economic and Clinical Health Act (HITECH Act) of 2009.
-
-
B. Additional Laws and Proposed Legislation
-
A number of additional laws govern the use and protection of data from a variety of specific sources. Several examples are presented below, followed by a summary of a recent federal report that proposes new legislation to improve the protection afforded to data provided by or collected from consumers.
-
-
C. Agency Regulations and Guidelines
-
Federal regulations governing individual agencies sometimes include specific provisions regarding the collection and use of personal data. Prominent examples include Title 13 (The Census Act), Title 26 (The Internal Revenue Code), and a section of Title 20 that applies to the National Center for Education Statistics (NCES) and is notable for how it assigns legal liability for disclosure. Some agencies, like the National Center for Health Statistics (NCHS), have documented their internal rules in manuals, two of which are discussed here. To provide agency-wide guidance, the Federal Committee on Statistical Methodology (FCSM) developed a “Checklist on Disclosure Potential of Proposed Data Releases,” which was intended primarily for public use data products. Recently, the Statistical and Science Policy Office in OMB issued a proposed statistical policy directive that reaffirms the importance of protecting the confidentiality of the information that statistical agencies collect from the public. All of these regulations and guidelines are discussed below.
-
-
D. Open Data Documents
-
Four documents issued by the Executive Office of the President over a six-month period in 2013 define the scope and provide guidance on implementation of the new open data policy. These four documents were:
- Increasing Access to the Results of Federally Funded Scientific Research (Office of Science and Technology Policy 2013)
- Making Open and Machine Readable the New Default for Government Information (White House 2013)
- Open Data Policy—Managing Information as an Asset (OMB 2013a)
- Supplemental Guidance on the Implementation of M-13-13 “Open Data Policy—Managing Information as an Asset” (OMB 2013b)
Summaries of the four documents are presented below.
-
-
E. Datasets Released by HHS
-
HHS and the Institute of Medicine launched the Health Data Initiative (also known as the Open Data Initiative) in 2010. The purpose of the Health Data Initiative is to encourage “innovators to utilize health data to develop applications to raise awareness of health and health system performance and spark community action to improve health.” 10
The HHS Open Data Initiative is part of a broader federal government movement to make datasets available to the public. President Obama issued an executive order on May 9, 2013 that directed OMB to issue the Open Data Policy throughout the federal government. The objectives of the executive order are to advance the management of government information as an asset throughout its life cycle, to promote interoperability and openness, and to make the data accessible to and usable for the public.
These data are made accessible in machine-readable form, and innovators are encouraged to use the data to create new products and services, and thereby to create jobs. In addition to health care, open data efforts have been implemented in the fields of energy, education, finance, public safety, and global development. The executive order was refined by OMB memorandum M-13-13 (discussed above), which gave federal agencies guidance on definitions, scope, policy requirements, and implementation guidelines.
HHS alone has made an enormous amount of data available. As of June 2014 there were 1,530 datasets accessible at www.healthdata.gov, up from 428 one year earlier. The Centers for Medicare and Medicaid Services offers the most data, with 625 datasets, 515 of which are related to Medicare. The HHS agency with the next largest number of datasets is the Centers for Disease Control and Prevention, with 118. One reason for the large growth in the offerings of healthdata.gov since 2013 is the addition of state-specific databases, which totaled 425, including 111 from New York.
10 U.S. Department of Health and Human Services, Healthdata.gov, Washington, DC. Available at [http://www.hhs.gov/open/initiatives/hdi/index.html].
-
-
F. Methods of Disclosure Limitation Used by Federal Agencies
-
Table D.1 summarizes the methods of disclosure limitation used by federal statistical agencies at the time of the CDAC 2005 update of Statistical Policy Working Paper 22. Mathematica surveyed representatives of 14 agencies to ask whether each agency had changed its procedures since then. The responses are reported in the final column of the table.
Table D.1. Summary of Agency Practices for Protecting Public Use Microdata as Reported in Statistical Policy Working Paper 22 (2005), with Updates
Energy Information Administration (EIA)
- Public use microdata and who reviews: Yes – office review
- Restricted access allowed for researchers: No
- Statistical disclosure limitation methods for public use microdata: EIA does not have a standard for statistical disclosure limitation techniques for microdata files. The only microdata files for confidential data released by EIA are for the Residential Energy Consumption Survey (RECS) and the Commercial Buildings Energy Consumption Survey (CBECS). In these files, various standard statistical disclosure limitation procedures are used to protect the confidentiality of the data for individual households and buildings. These procedures include: eliminating identifiers, limiting geographic detail, omitting or collapsing data items, top-coding, bottom-coding, interval-coding, rounding, substituting weighted average numbers (blurring), and introducing noise through a data adjustment method that randomly adjusts respondent level data within a controlled maximum percentage level around the actual published estimate. After applying the randomized adjustment method to the data, the mean values for broad population groups based on the adjusted data are the same as the mean values generated from the unadjusted data.
- Any update? No updates. EIA still applies the same methodologies for protecting the CBECS and RECS public use files.

National Science Foundation (NSF)
- Public use microdata and who reviews: Yes – Meet or exceed Census public use products that are merged
- Restricted access allowed for researchers: Yes
- Statistical disclosure limitation methods for public use microdata: When releasing public-use microdata files, individual identifiers are removed from all records and other high risk variables that contain distinguishing characteristics are modified to prevent identification of survey respondents and their responses. Top codes and bottom codes are employed for numeric fields to avoid showing extreme field values on a data record. Values beyond the top code or bottom code are replaced either by the average of the values in excess of the respective top code or bottom code or through the application of various imputation methodologies.
- Any update? No updates; 2005 description remains accurate.

U.S. Census Bureau
- Public use microdata and who reviews: Yes – Disclosure Review Board
- Restricted access allowed for researchers: Yes
- Statistical disclosure limitation methods for public use microdata: “Microdata cannot show geography below a population of 100,000. For the most detailed microdata, that threshold is raised to 250,000 or higher.” “For small populations or rare characteristics noise may be added to identifying variables, data may be swapped, or an imputation applied to the characteristic. Census data, which lacks the component of protection provided by sampling, employs targeted swapping in addition to the combination of table design and thresholds described above.”
Hawala, Zayatz, and Rowland (2004): “To insure that any data tabulation requested by external users will not disclose respondents’ identities, the U.S. Census Bureau uses data recoding and data swapping (Zayatz 2003).”
Zayatz (2005): “There are several disclosure avoidance techniques that we are currently using for our microdata files including geographic thresholds, rounding, noise addition, categorical thresholds, topcoding, and data swapping.”
Subsampling (only a fraction of the full microdata file from the survey/census is released) was used for the Decennial long form prior to ACS and is now used for ACS releases.
Synthetic data use is limited to the production of partially synthetic estimates for certain small, specialized subpopulations. These subpopulations comprise only a small subset of the microdata files released. Synthetic data, in its various forms, is not widely used to protect Census microdata files.

Bureau of Labor Statistics (BLS)
- Public use microdata and who reviews: BOC Collects Title 13
- Restricted access allowed for researchers: Yes
- Statistical disclosure limitation methods for public use microdata: BLS releases very few public use microdata files. Most of these microdata files contain data collected by the Census Bureau under an interagency agreement and Census's Title 13 authority. For these surveys (Current Population Survey, Consumer Expenditure Survey) the Census Bureau determines the statistical disclosure limitation procedures that are used. BLS releases public-use data files from three surveys in the family of the National Longitudinal Surveys. Disclosure limitation methods used for the public use microdata files containing data from the National Longitudinal Survey of Youth, collected under contract by Ohio State University and Research Center at the University of Chicago, are similar to those used by the Census Bureau.
- Any update? No update, but 2005 description has been edited.

National Center for Education Statistics (NCES)
- Public use microdata and who reviews: Yes – Disclosure Review Board
- Restricted access allowed for researchers: Yes
- Statistical disclosure limitation methods for public use microdata: All direct individually identifiable information (for example, school name, individual name, addresses) is stripped from the public use file. Continuous variables are top and bottom coded to protect against identification of outliers. After this has been done, a casual data intruder might identify an individual respondent by first identifying the sampled institution for the individual. To prevent identification of the sampled institution, all known publicly available lists of education institutions that contain institutions’ names and addresses are gathered. Each list is matched with the sample file using all common variables between the two files. If an institution can be identified to within 2 other institutions, using an appropriate distance measure, then that is a disclosure risk and must be resolved before releasing the data. If too many disclosure risks are obtained, then one or more common variables may be dropped from the public-use file, or the variables may be coarsened.
If only a few disclosure risks are found, the appropriate action is to selectively perturb a set of the common variables until all disclosure risks are resolved. This analysis is repeated sequentially for each list file until it can be applied to each list file without identifying any disclosure risks.
Whenever institution head, teacher, student, or parent data are clustered, a subsampling of respondents is required. Data from respondents selected into this subsample are reviewed using an additional disclosure edit. The edit is either: (1) a blanking and imputing, or data swapping of a sample of sensitive items collected; or (2) a data swapping of the key identification variable of the respondent or institution. The amount of editing is set at a level sufficient to protect the confidentiality of the respondent, while not compromising the analytic usefulness of the data file.
- Any update? The basic procedures are still the same. NCES has added additional measures as diagnostics to determine which of several trial data perturbations to select to meet the requirement to protect the confidentiality of the respondent, while not compromising the analytic usefulness of the data file.

National Center for Health Statistics (NCHS)
- Public use microdata and who reviews: Yes – Confidentiality Officer and Disclosure Review Board
- Restricted access allowed for researchers: Yes
- Statistical disclosure limitation methods for public use microdata: It is NCHS policy to make microdata files available to the scientific community so that additional analyses can be made for the country’s benefit. Such files follow guidance and principles contained in the NCHS Staff Manual on Confidentiality (September, 2004), Section 9 "Avoiding Inadvertent Disclosures Through Release of Microdata," and the NCHS Checklist for the Release of Micro Data Files. These guidelines require that detailed information that could be used to identify individuals (for example, date of birth) should not be included in microdata files. The identities of geographic places and characteristics of areas with fewer than 100,000 people are never to be identified, and it may be necessary to set this minimum at a higher number if research or other considerations so indicate. Information on the drawing of the sample that could identify data subjects should not be included.
- Any update? The techniques, methods, and guidance used to protect NCHS’s public use microdata have largely remained unchanged since 2004, although there are a few exceptions. The changes detailed below have been in response to changes in technology and proliferation of external data, and were made to reduce disclosure risk to individuals in NCHS data systems.
1. Vital record (birth, death, fetal death and linked birth/infant death) public use microdata files beginning with the 2005 data year contain individual-level vital event data at the national level only. The files for births, deaths, fetal deaths and linked birth/infant death generally include most other items from the vital record with the exception of exact dates.
2. Some NCHS surveys collect information on observable health conditions/limitations or rare conditions. This information is often excluded from public use microdata files because the information, in combination with the extensive information for other characteristics, is considered to pose too great a risk of respondent re-identification by knowledgeable insiders or from media coverage.
3. The level of detail for some variables has been reduced on public use microdata files. This includes geographic information for almost all files, but also includes items such as household relationships, race/ethnic categories and other observable characteristics that could increase risk of identification when combined with other indirect identifying information.
Compared to 2004, NCHS staff responsible for developing public use microdata files spend more time identifying and researching external files available via the Internet to assess whether external sources can be used to re-identify NCHS survey respondents. Advances in computer technology and the introduction of Big Data and Open Data initiatives pose new challenges for preparing public use microdata files that were not present 10 years ago.

National Center for Health Statistics (NCHS) continued
- Public use microdata and who reviews: Yes – Confidentiality Officer and Disclosure Review Board
- Restricted access allowed for researchers: Yes
- Statistical disclosure limitation methods for public use microdata: Refer back to previous row.
- Any update? Although NCHS has reduced the level of detail available on public use microdata files since 2004, we have attempted to balance this by making non-public use microdata files more available through expansion of RDC sites, use of special agreements permitting access under controlled conditions (e.g., Designated Agent Agreements or DUAs), and development of new access tools.
a. The NCHS RDC now offers researchers four modes of access to restricted use NCHS microdata: (1) on-site at the NCHS RDC, (2) on-site at a Census RDC, (3) remote access, and (4) a staff-assisted research option. Additional information about each access mode can be found at the following location: http://www.cdc.gov/rdc/B2AccessMod/ACs200.htm.
b. NCHS is developing new tools for data access. For example, NCHS is developing a National Health Interview Survey Online Analytic Real-Time System (OARS) to help meet the need for state-level estimates. This tool will allow health experts, policymakers, journalists, and others to search and compare health statistics by county, region, and state nationwide for grant proposals, needs assessments, research, news reporting, and policymaking. Additional information on OARS can be found at: http://www.cdc.gov/nchs/data/bsc/nhis_online_analytic_realtime_system.pdf
NCHS remains committed to making data as widely available as possible while protecting the confidentiality of respondents. Approximately 95 percent of NCHS collected data are released in public use microdata files and most of the remaining data are available under controlled conditions that meet our legislative mandates to protect respondent identity.

Agency for Healthcare Research and Quality (AHRQ)
- Public use microdata and who reviews: Yes – Disclosure Review Board
- Restricted access allowed for researchers: Yes
- Statistical disclosure limitation methods for public use microdata: The disclosure limitation procedures used by AHRQ are similar to those of NCHS.
- Any update? No updates; AHRQ continues to use procedures similar to NCHS but without the NCHS-specific revisions detailed above.

National Agricultural Statistics Service (NASS)
- Public use microdata and who reviews: No
- Restricted access allowed for researchers: Yes
- Statistical disclosure limitation methods for public use microdata: NA
- Any update? No updates

Economic Research Service (ERS)
- Public use microdata and who reviews: No
- Restricted access allowed for researchers: Yes
- Statistical disclosure limitation methods for public use microdata: NA
- Any update? No updates

Bureau of Economic Analysis (BEA)
- Public use microdata and who reviews: No
- Restricted access allowed for researchers: Yes
- Statistical disclosure limitation methods for public use microdata: NA
- Any update? No updates

Social Security Administration (SSA)
- Public use microdata and who reviews: Yes – 2 Disclosure Review Boards. One handles Title 13 data; the other does not.
- Restricted access allowed for researchers: Yes
- Statistical disclosure limitation methods for public use microdata: When releasing public use microdata files, individual identifiers are removed from all records, and other distinguishing characteristics are modified to prevent identification of persons to whom a record pertains. Records are sequenced in random order to avoid revealing information due to the ordering of records on the file. Top codes and bottom codes are employed for numeric fields to avoid showing extreme field values on a data record. Values beyond the top code or bottom code are replaced by the average of the values in excess of the respective top code or bottom code. Top code and bottom code values are derived at the national level and the replacement values are derived and applied at the state level when appropriate. Values shown for some categorical fields are combined into broader groupings than those present on the internal file, and dollar amounts are rounded. Top code and bottom code values, replacement values, and related information are provided to users as part of the file documentation.
- Any update? Since 2010, the DRB has built a working relationship with the Office of Open Government in part to prevent the mosaic effect. Based on White House Open Government initiatives, SSA has enhanced its procedures for releasing data on the Agency website and onto Data.gov. The Data.gov National/Homeland Security and Privacy/Confidentiality Checklist and Guidance (referred to as the NHSP Checklist) is part of the guidance from the White House and is to be used by departments and agencies submitting datasets for publication on Data.gov. This Checklist augments the processes SSA is using to meet its existing statutory, regulatory or policy requirements for protecting national/homeland security and privacy/confidentiality interests. Since 2012, the DRB has included an external voting board member from the U.S. Census Bureau. This provides an avenue for the DRB to ensure that agency staff are informed of the latest disclosure avoidance techniques utilized and recommended by the Bureau's DRB.

Internal Revenue Service (IRS)
- Public use microdata and who reviews: Yes – Legislatively Controlled
- Restricted access allowed for researchers: No
- Statistical disclosure limitation methods for public use microdata: SOI produces one annual public-use microdata file, known as the SOI “tax model”, containing a sample of data based on the Form 1040 series of individual tax returns. The disclosure protection procedures applied to this file include: (1) subsampling certainty records at a 33 percent rate; (2) removing certain records having extreme values; (3) suppressing certain fields from all records and geographical fields from high income records; (4) top coding and modifying some fields; (5) blurring some fields of high income records by locally averaging across records; and (6) rounding amount fields to four significant digits. To help ensure that taxpayer privacy is protected in the SOI tax model file, SOI has periodically contracted with experts who employ “professional intruder” techniques to both verify that confidentiality is protected and to inform the techniques to be applied to future releases of the SOI tax model file.
- Any update? SOI reviews its statistical disclosure limitation procedures for its public use microdata file and introduces enhancements on an ongoing basis. For example, the maximum sampling rate was changed to 10 percent several years ago, and multivariate blurring replaced univariate blurring for key fields on high-income returns. SOI is currently redesigning its public use file.

Bureau of Transportation Statistics (BTS)
- Public use microdata and who reviews: Yes – Disclosure Review Board
- Restricted access allowed for researchers: No
- Statistical disclosure limitation methods for public use microdata: The BTS Confidentiality Procedures Manual documents the confidentiality procedures for the agency. For most microdata and tabular data products, BTS program managers are required to complete a checklist identifying potential disclosure risks and outline any steps taken to mitigate such risks. The BTS’s DRB reviews the data product and checklist and makes a final determination on disclosure risk. The DRB can recommend application of SDL methods prior to public dissemination. BTS uses various microdata SDL methods based on the disclosure review findings and the unique characteristics of the data files. Some SDL procedures used include data suppression and modification. Data modification includes recoding continuous variables into categorical variables, collapsing categories, top and bottom coding, introduction of noise, and data swapping. BTS program managers must also identify any external data that could be matched to BTS datasets and take steps to minimize the ability to match.
- Any update? No updates; 2005 description remains accurate.

Bureau of Justice Statistics (BJS)
- Public use microdata and who reviews: Yes – legislatively controlled agency review
- Restricted access allowed for researchers: No
- Statistical disclosure limitation methods for public use microdata: The same requirements under Title 13 of the U.S.C. that cover the Census Bureau are followed by BJS for those data collected for BJS by the Census Bureau. Standards for microdata protection are incorporated in BJS enabling legislation. Individual identifiers are routinely stripped from all microdata files before they are released for public use.
- Any update? BJS has allowed access to restricted files since at least the year 2000, if not before. Direct identifiers are routinely removed from all microdata files prior to release. Indirect identifiers (for example, geographic identifiers, dates of unique events, or age) undergo disclosure avoidance measures commensurate with the level of release (public, restricted, or enclaved). Measures commonly used include categorization of continuous variables, top- or bottom-coding, rounding, addition of noise, and data swapping.
Most restricted microdata files are available by application from the National Archive of Criminal Justice Data (NACJD). The Archive is in the process of implementing and expanding technology that allows remote access to restricted microdata files. With this technology, the user does not receive or download the microdata, but rather logs into and analyzes the data on a secure NACJD server. In 2011, BJS began making extremely sensitive microdata files available onsite at the University of Michigan in the Interuniversity Consortium for Political and Social Research (ICPSR) Data Enclave in Ann Arbor, MI (also by application).
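
Two of the techniques named in Table D.1 are simple enough to illustrate in a few lines: top and bottom coding with replacement by the average of the values beyond the code (as described in the NSF and SSA entries), and rounding amounts to four significant digits (as described in the IRS entry). The following Python sketch is illustrative only. The function names, thresholds, and sample values are assumptions made for this example and are not drawn from any agency's actual procedures.

```python
from math import floor, log10
from statistics import mean


def top_bottom_code(values, bottom, top):
    """Replace each value above `top` (or below `bottom`) with the mean of
    all values above `top` (or below `bottom`).

    `bottom` and `top` are hypothetical thresholds chosen by the analyst.
    """
    high = [v for v in values if v > top]
    low = [v for v in values if v < bottom]
    high_fill = mean(high) if high else None
    low_fill = mean(low) if low else None

    coded = []
    for v in values:
        if v > top:
            coded.append(high_fill)
        elif v < bottom:
            coded.append(low_fill)
        else:
            coded.append(v)
    return coded


def round_sig(x, digits=4):
    """Round `x` to `digits` significant digits."""
    if x == 0:
        return 0
    exponent = floor(log10(abs(x)))  # position of the leading digit
    return round(x, digits - 1 - exponent)


# Hypothetical amounts; the two largest exceed the assumed top code of 100.
amounts = [20, 35, 50, 400, 600]
coded = top_bottom_code(amounts, bottom=25, top=100)
print(coded)               # extreme values share a common replacement
print(round_sig(123456))   # amount rounded to four significant digits
```

Because each extreme value is replaced by the mean of its group, the total (and therefore the overall mean) of the coded values equals that of the original values, while individual extreme values no longer appear on the file.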
-
-
References
-
Federal Trade Commission. “Data Brokers: A Call for Transparency and Accountability.” Washington, DC: Federal Trade Commission. Available at [http://www.ftc.gov/system/files/documents/reports/data-brokers-call-tran.... May 2014.
Gates, Gerald. “How Uncertainty about Privacy and Confidentiality Is Hampering Efforts to More Effectively Use Administrative Records in Producing U.S. National Statistics.” Journal of Privacy and Confidentiality, vol. 3, no. 2, 2011, pp. 3-40.
Hawala, Sam, Laura Zayatz, and Sandra Rowland. “American FactFinder: Disclosure Limitation for the Advanced Query System.” Journal of Official Statistics, vol. 20, no. 1, March 2004, pp. 115-124.
Johnson, Barry W. “Presentation to the Council of Professional Associations on Federal Statistics.” Presented at the COPAFS Quarterly Meeting, Washington, DC, June 6, 2014.
Office of Management and Budget, Executive Office of the President. “Open Data Policy––Managing Information as an Asset.” Memorandum Number M-13-13. Washington, DC: OMB, May 9, 2013a. Available at [http://www.whitehouse.gov/system/files/omb/memoranda/2013/m-13-13.pdf]. Accessed July 15, 2013.
Office of Management and Budget, Executive Office of the President. “Supplemental Guidance on the Implementation of M-13-13 ‘Open Data Policy––Managing Information as an Asset.’” Washington, DC: OMB, May 9, 2013b. Available at [http://www.whitehouse.gov/system/files/omb/memoranda/2013/m-13-13.pdf]. Accessed July 15, 2013.
Office of Management and Budget, Executive Office of the President. “Statistical Policy Directive: Fundamental Responsibilities of Federal Statistical Agencies and Recognized Statistical Units.” Federal Register, vol. 79, no. 98, May 21, 2014, pp. 29308-29312.
Office of Science and Technology Policy, Executive Office of the President. “Increasing Access to the Results of Federally Funded Scientific Research.” Memorandum. Washington, DC: OSTP, February 22, 2013. Available at [http://www.whitehouse.gov/system/files/microsites/ostp/ostp_public_access_memo_2013.pdf].
Ohm, Paul. “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization.” UCLA Law Review, vol. 57, 2010, pp. 1701-1777.
Solove, Daniel J., and Chris Jay Hoofnagle. “A Model Regime of Privacy Protection.” University of Illinois Law Review, vol. 2006, no. 2, 2006, pp. 357-404.
U.S. Department of Health and Human Services. “Summary of the HIPAA Privacy Rule.” Washington, DC: U.S. Department of Health and Human Services. Last revised May 2003. Available at [http://www.hhs.gov/ocr/privacy/hipaa/understanding/summary/privacysummary.pdf]. Accessed June 3, 2014.
U.S. Department of Health and Human Services, Office for Civil Rights. “Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule.” Washington, DC: U.S. Department of Health and Human Services, November 26, 2012. Available at [http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-id.... Accessed June 3, 2014.
U.S. Department of Health and Human Services. “Summary of the HIPAA Security Rule.” Available at [http://www.hhs.gov/ocr/privacy/hipaa/understanding/srsummary.html]. Accessed June 3, 2014.
U.S. Department of Health, Education and Welfare. “Records, Computers, and the Rights of Citizens.” Report of the Secretary’s Advisory Committee on Automated Personal Data Systems. Washington, DC: U.S. Department of Health, Education and Welfare, July 1973. Available at [http://www.justice.gov/opcl/docs/rec-com-rights.pdf].
White House. “Executive Order 13642—Making Open and Machine Readable the New Default for Government Information.” Washington, DC: May 9, 2013 (White House 2013). Available at [http://www.whitehouse.gov/the-press-office/2013/05/09/executive-order-ma.... Accessed December 26, 2013.
Zayatz, Laura. “Disclosure Limitation for Census 2000 Tabular Data.” Paper presented at the Joint United Nations Economic Commission for Europe and EUROSTAT Work Session on Statistical Data Confidentiality. Working Paper No. 15. 2003. Available at [http://www.unece.org/stats/documents/2003/04/confidentiality/wp.15.e.pdf].
Zayatz, Laura. “Disclosure Avoidance Practices and Research at the U.S. Census Bureau: An Update.” Research Report Series, Statistics #2005-06. Washington, DC: U.S. Census Bureau, August 31, 2005.