Training Data for Machine Learning to Enhance Patient-Centered Outcomes Research (PCOR) Data Infrastructure

Creating a Foundation to Advance the Use of Artificial Intelligence for Patient-Centered Outcomes Research and Clinical Practice
Agency
  • Office of the National Coordinator for Health Information Technology (ONC)
  • National Institutes of Health National Library of Medicine (NLM)
Start Date
  • 8/1/2019
Functionality
  • Use of Clinical Data for Research

STATUS: Active Project

BACKGROUND

Artificial intelligence (AI) and associated innovative technologies such as machine learning can consume large amounts of data in varied, complex formats to identify effective treatments more quickly, potentially accelerating clinical innovation by speeding up the research lifecycle and the application of evidence in clinical settings. Industry experts have acknowledged that large amounts of high-quality training data are a critical part of the foundation that will support researchers’ use of machine learning to accelerate the discovery of novel disease-outcome correlations and associations and to inform the design of prevention and treatment studies.

High-quality training data sets that are well labeled and structured, use common data models and common data elements annotated by domain experts, and combine previously unconnected data resources can be used to train algorithms to elucidate knowledge and extract relevant data points for research. This project will curate high-quality training data sets for two use cases: (1) kidney disease, which ONC will lead together with the National Institutes of Health, and (2) drug resistance in patients infected with tuberculosis, which will be implemented by NLM with the National Institute of Allergy and Infectious Diseases (NIAID). The lead agencies will use these training data sets to develop and train algorithms and to improve their performance. The project will also develop and disseminate papers and a final report that discuss the current strengths and limitations of AI for patient-centered outcomes research (PCOR), industry, and the Department of Health and Human Services (HHS). The final report will also include a forward-looking section that provides an initial high-level blueprint identifying the potential for HHS to use AI in discovery and safety surveillance and to address key issues facing the people served by its programs (e.g., Medicare, Medicaid).

PROJECT PURPOSE & GOALS

This project will enhance the capacity of PCOR researchers to use machine learning by developing and disseminating resources that present not only training data and methods but also lessons learned. Evidence generated from this application of AI will support multiple federal and HHS investments in precision medicine, kidney disease, and tuberculosis research programs so that providers can match patients to the best treatments based on their specific health conditions, life experiences, and genetic/phenotypic profiles.

This project will address the following objectives:

  • Develop high-quality training data sets, capture lessons learned from best practices in data annotation and curation, and compile insights on the data quantity and quality requirements for machine learning as applied in PCOR.

  • Develop machine learning algorithms that will be trained and tested on the curated data sets.

  • Develop implementation guides detailing each method used and the generic aspects of the data that each method leverages, with detail sufficient to facilitate its application to a wider array of use cases.

  • Disseminate tools, training data, and lessons learned to stimulate the application of these methods to a wider array of use cases by PCOR researchers.