- Centers for Disease Control and Prevention (CDC)
- 6/1/2019
- Linking of Clinical and Other Data for Research
STATUS: Completed Project
BACKGROUND
The National Hospital Care Survey (NHCS), conducted by the National Center for Health Statistics (NCHS), is designed to provide accurate and reliable health care statistics describing national patterns of health care delivery in hospital-based settings, including the prevalence of conditions, the health status of patients, and health services utilization. The NHCS collects patient-level identifiers, which enables the linkage of patient episodes of care within hospital inpatient and emergency department (ED) settings to other administrative data sources, providing a more complete picture of patient care. Previously funded OS-PCORTF projects have linked NHCS data to death certificate information collected by the National Death Index (NDI), creating a unique data resource to support the study of post-hospitalization mortality outcomes in more than 3.2 million patients.
This project expanded on previously funded OS-PCORTF projects that increased the capacity of the NHCS to support a wide range of OS-PCORTF research objectives. It linked the 2016 NHCS with Medicare enrollment, claims, encounter, and assessment data collected by the Centers for Medicare & Medicaid Services (CMS) and with federal housing assistance program data collected by the U.S. Department of Housing and Urban Development (HUD). The files include a unique patient identifier, which makes it possible to link information on mortality, health care service utilization, prescription drug use, facility-based patient health assessments, and receipt of federal housing assistance with a given patient’s hospital Uniform Bill (UB)-04 administrative claims or electronic health records (EHRs). Linking the NHCS to CMS Medicare and HUD data sources expands data capacity to support research on a wide range of patient health outcomes, including initiatives targeting opioid use and mental health care services; the efficacy of treatment protocols, medical interventions, and drugs; health outcomes associated with different types of post-acute care services; and disparities in efficacy disaggregated by critical and previously unexamined subpopulations. The linked data sources also allow researchers to examine the role of federal social support programs in health outcomes and treatment efficacy for persons with stable housing, with the ability to focus on specific subpopulations, including persons with substance use disorders.
PROJECT PURPOSE & GOALS
The project focused on the following objectives:
- Conduct a patient-level record linkage of the 2016 NHCS hospital administrative claims and EHR data to 2016/2017 CMS Medicare enrollment, claims, encounters, and assessment data.
- Conduct a patient-level record linkage of the 2016 NHCS hospital administrative claims and EHR data to the 2016/2017 HUD administrative records on federal housing assistance program participation.
- Refine probabilistic matching algorithms and disseminate a detailed statistical methodology report to support high-quality future data linkage activities within and beyond the patient-centered outcomes research (PCOR) community.
- Create research files and user guidance documents to support PCOR researchers in using the new NHCS linked data resources. The linked data sets will be available through the NCHS and Federal Statistical Research Data Center (FSRDC) Network, and the documentation will be made available on the NCHS website.
- Disseminate tools and lessons learned to stimulate the application of these methods to a wider array of use cases by PCOR researchers.
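The objectives above refer to probabilistic matching algorithms. As a generic, textbook illustration only (not the NCHS production algorithm, which is described in the linkage methodology reports listed under Resources), the Fellegi-Sunter framework scores a candidate record pair by summing log-likelihood-ratio weights over the compared fields and then applying upper and lower thresholds. The field names, m- and u-probabilities, and thresholds below are hypothetical values chosen for illustration.

```python
import math

# Hypothetical m- and u-probabilities per comparison field (illustrative only).
# m = P(field agrees | the two records belong to the same person)
# u = P(field agrees | the two records belong to different people)
FIELD_PARAMS = {
    "ssn_last4": (0.95, 0.001),
    "birth_date": (0.97, 0.003),
    "sex": (0.98, 0.5),
    "zip_code": (0.90, 0.01),
}


def field_weight(agree: bool, m: float, u: float) -> float:
    """Fellegi-Sunter log2 likelihood-ratio weight for one field comparison."""
    if agree:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum field weights using exact agreement; missing values add 0."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value is treated as neutral evidence
        total += field_weight(a == b, m, u)
    return total


def classify(weight: float, lower: float = 0.0, upper: float = 10.0) -> str:
    """Hypothetical thresholds: >= upper is a link, < lower is a non-link."""
    if weight >= upper:
        return "link"
    if weight < lower:
        return "non-link"
    return "possible link (clerical review)"


# Hypothetical hospital record and candidate Medicare record (ZIP codes differ).
hospital = {"ssn_last4": "1234", "birth_date": "1950-03-02", "sex": "F", "zip_code": "20782"}
medicare = {"ssn_last4": "1234", "birth_date": "1950-03-02", "sex": "F", "zip_code": "20783"}

w = match_weight(hospital, medicare)
print(round(w, 2), classify(w))
```

In this sketch three agreeing fields outweigh the single ZIP code disagreement, so the pair is classified as a link. The sketch scores only exact agreement; adjusting weights for partial string agreement, the subject of one of the project's 2020 Joint Statistical Meetings presentations, is not modeled here.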
PROJECT ACHIEVEMENTS AND HIGHLIGHTS
- The project produced an enhanced data linkage methodology that uses machine learning techniques for record linkage, which improved linkage accuracy.
- The project produced data files linking NHCS data to CMS Medicare and HUD data covering several years, which are available to researchers as restricted-use files through the NCHS and FSRDC Network.
- The project has produced several manuscripts and presentations covering a range of topics, including the novel linkage methodology, lessons learned, and original research based on the linked data files on topics such as opioid-involved emergency department visits and emergency department visits for respiratory illness.
PUBLICATIONS, PRESENTATIONS, AND OTHER PUBLICLY AVAILABLE RESOURCES
Resources:
- A final report, which summarizes major project accomplishments, lessons learned, and future considerations, is available here: https://aspe.hhs.gov/sites/default/files/documents/34696d3d740fcecbb2b950e21e602994/aspe-final-report-fy19-pcortf.pdf
- Information about restricted-use data files for linked 2016 NHCS data to 2016-2017 CMS Medicare enrollment, claims/encounter, and patient assessment data, with accompanying data dictionaries, is available here: https://www.cdc.gov/nchs/data-linkage/CMS-Medicare-Restricted.htm.
- A report that summarizes the data sources, describes the novel methods used to link NHCS and CMS Medicare datasets, and presents analytic considerations to assist researchers in using the files is available here: https://www.cdc.gov/nchs/data/datalinkage/2016-nhcs-cms-linkage-methodology.pdf.
- Information about restricted-use data files for linked 2014 NHCS data to 2013-2015 HUD data and 2016 NHCS data to 2015-2017 HUD data, with accompanying data dictionaries, is available here: https://www.cdc.gov/nchs/data-linkage/nhcs-hud.htm.
- A report that summarizes the data sources, describes the methods used to link NHCS and HUD datasets, and presents analytic considerations to assist researchers in using the files is available here: https://www.cdc.gov/nchs/data/datalinkage/NHCS-HUD-Linkage-Methods-and-Analytic-Considerations.pdf.
- The algorithms used to search clinical notes for opioid involvement, substance use disorders, and mental health issues, along with the ICD-10-CM codes related to opioid involvement, substance use disorders, and mental health issues, are available on GitHub.
- The project produced two National Health Statistics Reports:
- “Opioid-involved Emergency Department Visits in the National Hospital Care Survey and the National Hospital Ambulatory Medical Care Survey,” available here: https://www.cdc.gov/nchs/data/nhsr/nhsr149-508.pdf.
- “Respiratory Illness Emergency Department Visits in the National Hospital Care Survey and the National Hospital Ambulatory Medical Care Survey,” available here: https://www.cdc.gov/nchs/data/nhsr/nhsr151-508.pdf.
Publications:
- The project produced the manuscript, “Using supervised machine learning to identify efficient blocking schemes for record linkage,” published in Statistical Journal of the IAOS, available here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8371678/pdf/nihms-1718902.pdf.
- The project produced the manuscript, “Using Synthetic Data to Replicate Linkage Derived Elements: A Case Study,” published in Health Services and Outcomes Research Methodology, available here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8563018/pdf/nihms-1670775.pdf.
Presentations:
- The project had three presentations at the International Conference on Health Policy Statistics in January 2020:
- “Mortality for Women within One Year after Delivery in the National Hospital Care Survey, 2016,” the abstract is available here: https://ww2.amstat.org/meetings/ichps/2020/onlineprogram/AbstractDetails.cfm?AbstractID=306772.
- “Using Synthetic Data to Replace Linkage Derived Elements, a Case Study,” the abstract is available here: https://ww2.amstat.org/meetings/ichps/2020/onlineprogram/AbstractDetails.cfm?AbstractID=306675.
- “Leveraging Linked Data for Evidence Based Policymaking,” the abstract is available here: https://ww2.amstat.org/meetings/ichps/2020/onlineprogram/AbstractDetails.cfm?AbstractID=306597.
- The project team presented “Using Linked Hospital Care and Mortality Data to Enhance Identification of Opioid-Involved Health Outcomes” at the Rx Drug Abuse & Heroin Summit in April 2020.
- The project team presented “Assessing National Hospital Care Survey and National Ambulatory Medical Care Survey Data: A Comparison of Opioid and Respiratory Disease Encounters” at the American Statistical Association’s 2020 Joint Statistical Meetings in August 2020; the abstract is available here: https://ww2.amstat.org/meetings/jsm/2020/onlineprogram/AbstractDetails.cfm?abstractid=312915.
- The project team presented “Adjusting Records Linkage Match Weights to Partial Levels of String Agreement” at the American Statistical Association’s 2020 Joint Statistical Meetings in August 2020; the abstract is available here: https://ww2.amstat.org/meetings/jsm/2020/onlineprogram/AbstractDetails.cfm?abstractid=312203.
- An October 2020 webinar provided information on the NCHS Data Linkage Program including linked NCHS data sources created by the project. The webinar is available here: https://www.cdc.gov/nchs/data-linkage/datalinkage-webinar.htm.
- The project team presented “Demonstration of the National Hospital Care Survey: Inpatient and Emergency Department Encounters for Congestive Heart Failure, 2016” at the International Conference on Establishment Statistics in June 2021; the abstract is available here: https://ww2.amstat.org/meetings/ices/2021/onlineprogram/AbstractDetails.cfm?AbstractID=308142.
RELATED PROJECTS
Below is a list of ASPE-funded PCORTF projects related to this project:
Building infrastructure and evidence for COVID-19 related research, using integrated data from National Center for Health Statistics (NCHS) Data Linkage Program - Currently, most of the linked datasets (including those previously funded by the Patient-Centered Outcomes Research Trust Fund (PCORTF)) are available only as restricted-use files that must be accessed through the NCHS and Federal Statistical Research Data Centers (RDCs), which creates barriers and reduces the utility of linked data. To mitigate this barrier, this project will develop publicly available synthetic linked data products that protect participant privacy while integrating social determinants of health, health-related, and administrative data. This project will also produce a public-facing dashboard that uses the linked data, making it accessible to a wider community of users. Both products will be available on the NCHS website.
Enhancing Data Resources for Researching Patterns of Mortality in Patient Centered Outcomes Research – Through collaboration between the CDC, CMS, and the Food and Drug Administration, the overall goal of this project was to increase the availability of information on cause of death by linking NDI data to other sources. These linkages allow researchers to develop national estimates of cause-specific death rates following ED visits and/or hospital stays for specific conditions. The project linked patient EHRs with national mortality data and linked the NDI’s death and cause-of-death data with the Master Beneficiary Summary File and the Medicaid Enrollee Supplemental File. The project team also created new methods to optimize data linkages when using large national data files.
Enhancing Identification of Opioid-Involved Health Outcomes Using Linked Hospital Care and Mortality Data – National-level statistics on opioid-related hospitalizations are often incomplete. EHR data contain clinical notes and laboratory results, which provide a more complete view of each hospitalization. This project aimed to improve surveillance and expand researchers’ access to data on hospital care patterns and risk factors associated with opioid overdose deaths. To accomplish this, the project linked NHCS, NDI, and Drug-Involved Mortality data. The linked data support research examining the characteristics of individuals who have opioid-related events, patterns of hospital use in the months before death, and comparisons of patients and services.