- Centers for Disease Control and Prevention (CDC)
- 5/12/2020
- Primary: Goal 2: Data Standards and Linkages for Longitudinal Research
- Secondary: Goal 3: Technology Solutions to Advance Research
STATUS: Completed Project
BACKGROUND
This project builds upon previously funded OS-PCORTF efforts to expand data capacity for research studies. It focuses specifically on patient health outcomes across the continuum of care through linking disparate data sources. The first part of the project focuses on privacy-preserving record linkage (PPRL). To ensure the accuracy of linked data sets, linkage algorithms traditionally rely on the exchange and matching of personally identifiable information (PII). While this heightens researchers’ capability to examine individual health outcomes, concerns remain regarding privacy and the exchange of identifiable information. In an effort to move away from reliance on PII, groups such as Datavant and the CDC’s Childhood Obesity Data Initiative (CODI) have been working to develop linkage techniques that use PPRL, which eliminates PII sharing among organizations. This project assesses and compares linkage results from PPRL against results from earlier algorithms that use PII, to identify discrepancies and their impact on subsequent analysis of the linked data.
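To make the idea of linking without sharing PII concrete, the following is a minimal sketch of one common PPRL pattern: each organization derives a keyed hash (token) from normalized identifiers and exchanges only the tokens. It is an illustration under stated assumptions, not the method used by Datavant, CODI, or this project. The field names (`first_name`, `last_name`, `dob`), the shared secret key, and the exact-match rule are all hypothetical.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Lowercase, strip accents, and drop non-alphanumerics so that
    trivial formatting differences do not change the token."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(record: dict, secret_key: bytes) -> str:
    """Build a keyed hash from hypothetical PII fields.
    Only this token, never the PII itself, leaves the organization."""
    fields = (record["first_name"], record["last_name"], record["dob"])
    message = "|".join(normalize(f) for f in fields).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def link_by_token(tokens_a: dict, tokens_b: dict) -> list:
    """Each argument maps a local record ID to its token.
    Returns (id_a, id_b) pairs whose tokens are identical."""
    index = {}
    for rec_id, token in tokens_b.items():
        index.setdefault(token, []).append(rec_id)
    return [
        (a_id, b_id)
        for a_id, token in tokens_a.items()
        for b_id in index.get(token, [])
    ]


if __name__ == "__main__":
    key = b"shared-secret-agreed-out-of-band"  # assumption: both sites hold this key
    site_a = {"a1": {"first_name": "José", "last_name": "O'Neil", "dob": "1980-01-02"}}
    site_b = {
        "b1": {"first_name": "jose", "last_name": "ONEIL", "dob": "19800102"},
        "b2": {"first_name": "Maria", "last_name": "Lopez", "dob": "1975-06-30"},
    }
    tokens_a = {rid: make_token(rec, key) for rid, rec in site_a.items()}
    tokens_b = {rid: make_token(rec, key) for rid, rec in site_b.items()}
    print(link_by_token(tokens_a, tokens_b))
```

Exact-match tokens like these miss records with typos or changed names, which is one reason a comparison against a PII-based linkage is informative.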
The second part of the project will create new linked data sets to support patient-centered outcomes research (PCOR). In 2017 and 2019, the National Center for Health Statistics (NCHS) supported the linkage of the National Hospital Care Survey (NHCS) with three different data sets: 1) the National Death Index (NDI); 2) data regarding participation in Department of Housing and Urban Development (HUD) housing programs; and 3) Medicare enrollment, prescription medication, and claims data from the Centers for Medicare & Medicaid Services (CMS). These linkages allow researchers to analyze individual-level outcomes using a broad array of patient-specific information. The linked data also enhance public understanding of the interplay between health determinants and post-hospitalization outcomes. This project seeks to create new data sets through the linkage of Medicaid claims data from the CMS Transformed Medicaid Statistical Information System (T-MSIS) and the 2014 and 2016 NHCS. This expansion of existing infrastructure diversifies and widens the range of PCOR topics that researchers can investigate. These topics include interventions for opioid use, evaluation of medication protocols, use of social programs as a health determinant, and health disparities among understudied demographic groups.
PROJECT PURPOSE & GOALS
This project aims to evaluate privacy-preserving record linkage (PPRL) methodologies and to broaden existing data resources that are matched at the individual level across factors that may influence health outcomes.
The project objectives are to:
- Evaluate a PPRL technique, using past PCORTF-funded NHCS-NDI linkages as the gold standard.
- Disseminate output showing the suitability of PPRL as a linkage technique and the creation of new data sets to conduct PCOR.
- Conduct patient-level record linkages of 2014 and 2016 NHCS hospital administrative claims and EHR data to CMS’ T-MSIS data from 2014-2017 (and 2018, if available).
- Develop research and user guidance materials to aid the use of new and existing NHCS linked data sets in PCOR.
KEY IMPACTS
Enhancing analytical resources: New machine learning-based linkage algorithm
To expand the analytic resources available for linking data while maintaining individual privacy, the project developed a machine learning-based linkage algorithm (a sequential coverage algorithm) that improved the accuracy and efficiency of linking NHCS files to CMS data.
Providing more relevant, comprehensive data for PCOR: NHCS-CMS linked datasets
The project augmented the previously developed 2014 and 2016 NHCS-Medicare linked datasets with additional years of data to fill data gaps and achieve consistency across files, including 2014 and 2015 Medicare claims, prescription drug and assessment data, and 2017 Medicare Advantage enrollment data. It also developed a new 2016 NHCS-T-MSIS linked dataset to support studies on Medicaid insurance and hospital utilization. These additional years of data will support trend analyses and longitudinal studies of utilization and expenditures over time.
Improved data quality: Examining the accuracy of a PPRL method
The project team compared the results of a PPRL method to results from an established “gold standard” linkage method to examine the potential impact on data quality. The findings demonstrated the promise of PPRL to facilitate record linkage while maintaining privacy.
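A comparison of this kind is often summarized with precision and recall computed against the gold-standard links. The sketch below shows that calculation under stated assumptions: linked pairs are represented as `(record_id_a, record_id_b)` tuples, and the function name `evaluate_linkage` and the sample pairs are hypothetical. The project's own evaluation metrics are described in the publications listed below, not here.

```python
def evaluate_linkage(candidate_pairs, gold_pairs):
    """Compare candidate links (e.g., from a PPRL method) with
    gold-standard links. Both arguments are iterables of
    (record_id_a, record_id_b) tuples."""
    candidate = set(candidate_pairs)
    gold = set(gold_pairs)

    true_pos = len(candidate & gold)    # links found by both methods
    false_pos = len(candidate - gold)   # links only the candidate method made
    false_neg = len(gold - candidate)   # gold-standard links the candidate missed

    precision = true_pos / len(candidate) if candidate else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical link sets for illustration only.
    gold = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
    pprl = [(1, "a"), (2, "b"), (3, "x")]
    for name, value in evaluate_linkage(pprl, gold).items():
        print(f"{name}: {value}")
```

Missed links (false negatives) and spurious links (false positives) affect downstream analyses differently, so reporting both rates separately is usually more informative than a single agreement figure.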
PUBLICATIONS
Linked Dataset: 2014 NHCS and 2014/2015 Medicare Data Files. The dataset provides additional linked data files containing 2014 NHCS inpatient and emergency department claims and EHR data linked with 2014 and 2015 Medicare claims, prescription drug, and assessment data.
Linked Dataset: 2016 NHCS and 2017 Medicare Advantage Data Files. The project team added 2017 Medicare Advantage encounter data to the set of linked 2016 NHCS-Medicare files to achieve consistency with data years for fee-for-service claims.
Linked Dataset: 2016 NHCS and 2015-2017 T-MSIS Data Files. This dataset includes 2016 NHCS data linked with 2015-2017 Medicaid and Children’s Health Insurance Program (CHIP) claims data. The dataset includes five Medicaid/CHIP files: a demographic and eligibility file, an inpatient hospital file, a long-term care file, a pharmacy file, and an other services file.
Data Dictionaries for Linked NHCS–Medicaid Dataset. Six data dictionaries are available for the linked NHCS-CMS data and include information on match status and the five Medicaid/CHIP files.
The Linkage of the 2016 National Hospital Care Survey to 2015-2017 CMS T-MSIS Claims Data: Matching Methodology and Analytic Considerations. This report describes the linkage of data from the 2016 NHCS to 2015-2017 CMS T-MSIS claims data. The report includes a brief overview of the data sources, a description of the linkage methodology, and analytic considerations to assist researchers when using the files.
A Methodological Assessment of Privacy Preserving Record Linkage Using Survey and Administrative Data. This journal article, published in Statistical Journal of the IAOS, describes the methodology and results of the PPRL evaluation. The project compared the results of commercially available PPRL software to the results of a clear-text matching approach for linking survey and administrative records, using the OS-PCORTF-funded NHCS-NDI linkage as the gold standard.
Project Final Report. The final report describes the project’s goals, activities, and key accomplishments as well as lessons learned and a summary of key resources and publications.