- National Institutes of Health (NIH)/National Cancer Institute (NCI)
- National Institutes of Health (NIH)/National Center for Advancing Translational Sciences (NCATS)
- 11/01/2023
- Linking Clinical and Other Data for Research
- Use of Clinical Data for Research
- Use of Publicly-Funded Data Systems for Research
STATUS: Active Project
BACKGROUND
Cancer is a significant cause of morbidity and the second leading cause of mortality in the U.S. Because relevant data are dispersed across disparate databases, it is difficult to generate and share evidence on the comparative effectiveness of cancer treatments and treatment variations, or on the range of patient outcomes over time. Linking data from complementary sources to create more robust, detailed, high-quality datasets would therefore enable researchers to conduct, more effectively and efficiently, the studies needed to capture the full trajectory of cancer patients’ care, characterize the variations in cancer treatments, and ascertain the impact of treatment variations on patient outcomes.
In response to this need, this project will link data from the National Cancer Institute’s (NCI) Surveillance, Epidemiology, and End Results (SEER) program with electronic health record (EHR) data maintained by the National Center for Advancing Translational Sciences’ (NCATS) National Clinical Cohort Collaborative (N3C). The SEER program is a comprehensive population-based cancer reporting system that collects and publishes cancer data covering nearly 50% of the U.S. population. Although the SEER cancer registries have detailed data on a cancer patient’s diagnosis, stage of cancer, and initial treatment, they lack data on subsequent treatments and on patient outcomes other than mortality. The NCATS N3C platform offers one of the largest collections of secure and deidentified clinical data for COVID-19 research, and this project will expand its use for patient-centered outcomes research (PCOR) on cancer in individuals with and without COVID-19. The EHR data on N3C contain additional patient-level data (e.g., demographics, medical history, clinical notes, medications, and test results) not available in the SEER registry. By (1) linking the EHR data with SEER registry data and (2) assessing the capability of the resulting linked datasets, this project will significantly enhance the data infrastructure for PCOR studies on cancer.
PROJECT PURPOSE & GOALS
Purpose
This project will build and strengthen data infrastructure for patient-centered outcomes research in cancer by:
- Linking SEER cancer registry and EHR data using privacy-preserving record linkage (PPRL) methods
- Creating cancer treatment cohorts and extracting data on patient outcomes from the linked datasets (the cohorts will include one common cancer and one rare cancer)
- Assessing the capability of the linked datasets to address PCOR questions
- Establishing a data use request process to make the linked datasets available to, and facilitate their use by, researchers
Developing and providing access to a high-quality, longitudinal cancer database will improve (1) the efficiency with which PCOR studies can be designed and conducted, given the availability of accessible, standardized, and linked data, and (2) the robustness of the evidence that is generated, by leveraging a more comprehensive set of data and analytic resources.
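The project description does not specify which PPRL technique will be used. As a rough illustration of the general idea only, the sketch below assumes a simple keyed-hash token approach: each data holder normalizes a few identifiers, hashes them with a shared secret key, and the two datasets are then joined on the resulting tokens rather than on the identifiers themselves. The function names, field names, key, and records are all hypothetical and synthetic; they are not drawn from SEER, N3C, or any real linkage tool.

```python
import hashlib
import hmac

# Hypothetical shared secret for illustration; a real key would be managed securely.
SECRET_KEY = b"example-key-not-for-real-use"


def make_token(first_name, last_name, birth_date):
    """Return a keyed SHA-256 hash of normalized identifiers (illustrative only)."""
    normalized = "|".join(
        part.strip().lower() for part in (first_name, last_name, birth_date)
    )
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(registry_rows, ehr_rows):
    """Join two lists of dicts on their 'token' field and return matched pairs."""
    ehr_by_token = {}
    for row in ehr_rows:
        ehr_by_token.setdefault(row["token"], []).append(row)
    matches = []
    for reg in registry_rows:
        for ehr in ehr_by_token.get(reg["token"], []):
            matches.append((reg, ehr))
    return matches


# Synthetic registry-style records: tokens plus diagnosis details, no raw identifiers.
registry = [
    {"token": make_token("Ana", "Lopez", "1960-02-03"), "stage": "II"},
    {"token": make_token("Sam", "Reed", "1955-11-30"), "stage": "I"},
]

# Synthetic EHR-style records; the first differs only in case and whitespace.
ehr = [
    {"token": make_token(" ana ", "LOPEZ", "1960-02-03"), "medication": "drug A"},
    {"token": make_token("Lee", "Chan", "1971-07-14"), "medication": "drug B"},
]

linked = link_records(registry, ehr)
print(len(linked))
```

In this sketch only exact matches after normalization are linked, so a misspelled name or a mistyped birth date would produce a different token and a missed match. Handling such cases generally requires additional techniques beyond what is shown here.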