Development of a Natural Language Processing (NLP) Web Service for Public Health Use

Designing a web service through which the public and researchers can share interoperable technologies to address public health issues.
  • Centers for Disease Control and Prevention (CDC) 
  • Food and Drug Administration (FDA)


Start Date
  • 6/1/2016


  • Use of Clinical Data for Research
  • Use of Publicly Funded Data Systems for Research


STATUS: Active Project


While Meaningful Use and other activities have advanced the adoption of standardized electronic health record (EHR) systems, parts of the medical record, laboratory reports, and other clinical reports are still recorded as free-text narratives.


This is a collaborative project between the Centers for Disease Control and Prevention (CDC) and the Food and Drug Administration (FDA).

This project proposes to develop a natural language processing (NLP) web service that will be publicly available to researchers on the Public Health Community Platform (PHCP), a cooperative platform for sharing interoperable technologies that address public health priority areas (e.g., tobacco use) and aim to improve population health outcomes and health equity. The NLP service will process spontaneous report narratives, extract clinical and temporal information from the text, format the data for presentation, and map unstructured medical concepts (e.g., cancer data and safety surveillance data) to structured, standardized terminologies (i.e., International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM), Logical Observation Identifiers Names and Codes (LOINC), Systematized Nomenclature of Medicine (SNOMED), and the Medical Dictionary for Regulatory Activities (MedDRA)).
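
To make the processing steps described above more concrete, the following is a minimal Python sketch of the general idea: find known clinical terms in a narrative, pull out simple relative time expressions, and return the result as structured data. It is an illustration only and does not represent the project's actual design. The lookup table, the function names, and the codes (DEMO-001, DEMO-002) are all hypothetical placeholders, not real ICD-10-CM, LOINC, SNOMED, or MedDRA entries, and a production service would likely rely on far more capable NLP methods than keyword and pattern matching.

```python
import re

# Hypothetical lookup table. The coding system name and the codes are
# placeholders for illustration, not entries from any real terminology.
TERM_TO_CODE = {
    "fever": ("DEMO-SYSTEM", "DEMO-001"),
    "headache": ("DEMO-SYSTEM", "DEMO-002"),
}

# Matches simple relative time expressions such as "2 days after".
TIME_PATTERN = re.compile(
    r"(\d+)\s+(hour|day|week)s?\s+(after|before)", re.IGNORECASE
)


def extract_concepts(narrative):
    """Return a list of coded concepts whose terms appear in the narrative."""
    text = narrative.lower()
    found = []
    for term, (system, code) in TERM_TO_CODE.items():
        # Whole-word match so that "fever" does not match inside "feverish".
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.append({"term": term, "system": system, "code": code})
    return found


def extract_times(narrative):
    """Return (amount, unit, relation) tuples for relative time expressions."""
    return [
        (int(amount), unit.lower(), relation.lower())
        for amount, unit, relation in TIME_PATTERN.findall(narrative)
    ]


def process_narrative(narrative):
    """Combine concept and temporal extraction into one structured record."""
    return {
        "concepts": extract_concepts(narrative),
        "temporal": extract_times(narrative),
    }


if __name__ == "__main__":
    example = "Patient developed fever and headache 2 days after vaccination."
    print(process_narrative(example))
```

The sketch separates concept extraction from temporal extraction so that either step could be replaced independently, for example by swapping the placeholder table for a lookup against a licensed terminology.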

The project objectives are to:

  • Conduct an “as-is” environmental scan and literature review of existing NLP algorithms, methods, and tools to inform the development of the NLP Web Service (FDA lead; CDC contributor).

  • Design the NLP Web Service technical requirements (CDC lead; FDA contributor).

  • Build structured datasets using CDC and FDA resources to capture data.

  • Pilot the NLP Web Service on the PHCP.

  • Evaluate the pilot.

  • Release the final NLP Web Service.