Standardization and Querying of Data Quality Metrics and Characteristics for Electronic Health Data

Creating and implementing a data capture and querying system to support research on data quality and characteristics, and on data source and institutional characteristics.

Agency

Food and Drug Administration (FDA)

Start Date

8/20/2016

Functionality

Use of Clinical Data for Research
Use of Enhanced Publicly Funded Data Systems for Research

STATUS: Completed Project

BACKGROUND

Currently, no standards exist for describing and characterizing the quality and completeness of electronic health data. To make the best use of data, researchers and investigators need to know its fitness for use and reliability. To address this gap, the FDA built upon the connections and knowledge gained from the Cross Network Directory Service (CNDS) to discover and design data quality metrics for metadata standards. These standards govern the “data about data” that helps in tagging, structuring, formatting, and coding data according to its categorization and fitness.

PROJECT PURPOSE & GOALS

Under the FDA, this project created and implemented a metadata standards data capture and querying system for: 1) data quality and characteristics, 2) data source and institutional characteristics, and 3) “fitness for use.”

Project Objectives:

Develop and test metadata standards and technical specifications.
Implement the new metadata standards in at least two distributed networks.
Incorporate the standards in an open‑source software release.

PROJECT ACHIEVEMENTS & HIGHLIGHTS

The project team evaluated existing data quality frameworks and processes, and developed a data quality data model to enable the exploration of Data Quality Metrics (DQM). The model is organized around a central table that captures measurements (e.g., counts of patients, maximum or minimum values) and that is surrounded by tables that identify for each measurement the source system, the context (e.g., patient, member, encounter, claim), relevant stratifications (e.g., age ranges), and other qualifiers.
The team developed and tested a web-based toolkit and accompanying database to store data quality information. The tool, which allows for the exploration of data source characteristics for multiple data sources, was released for open-source use.
The project team created several publicly available resources. Technical system documentation targeted toward software developers and other technical audiences, and user documentation targeted toward system end users, were posted on GitHub. The team also published the software code for the DQM Authoring and Querying Platform, a cloud-based, open-source tool that allows users to develop and author new metrics, capture data quality metric measures, and evaluate and visualize supplied measures.
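The data quality data model described above, with a central measurement table surrounded by tables for source system, context, and stratification, can be illustrated with a minimal sketch. Every table name, column name, and data value below is a hypothetical stand-in chosen for illustration; the actual DQM schema is documented in the project's GitHub repository and may differ.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the DQM model's own.
SCHEMA = """
CREATE TABLE source (source_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE context (context_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE stratification (
    stratum_id INTEGER PRIMARY KEY,
    dimension TEXT NOT NULL,   -- e.g. age range
    label TEXT NOT NULL
);
-- Central table: one row per measurement, qualified by the tables above.
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    metric TEXT NOT NULL,      -- e.g. count, min, max
    value REAL NOT NULL,
    source_id INTEGER NOT NULL REFERENCES source(source_id),
    context_id INTEGER NOT NULL REFERENCES context(context_id),
    stratum_id INTEGER REFERENCES stratification(stratum_id)
);
"""


def build_demo_db():
    """Create an in-memory database with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO source VALUES (?, ?)",
                     [(1, "Site A"), (2, "Site B")])
    conn.executemany("INSERT INTO context VALUES (?, ?)",
                     [(1, "patient"), (2, "encounter")])
    conn.executemany("INSERT INTO stratification VALUES (?, ?, ?)",
                     [(1, "age_range", "0-17"), (2, "age_range", "18-64")])
    conn.executemany(
        "INSERT INTO measurement (metric, value, source_id, context_id, stratum_id) "
        "VALUES (?, ?, ?, ?, ?)",
        [("count", 120, 1, 1, 1), ("count", 340, 1, 1, 2),
         ("count", 95, 2, 1, 1), ("count", 410, 2, 1, 2)],
    )
    conn.commit()
    return conn


def patient_counts_by_source(conn):
    """Sum patient-context count measurements per source system."""
    rows = conn.execute(
        """
        SELECT s.name, SUM(m.value)
        FROM measurement m
        JOIN source s ON s.source_id = m.source_id
        JOIN context c ON c.context_id = m.context_id
        WHERE m.metric = 'count' AND c.name = 'patient'
        GROUP BY s.name
        ORDER BY s.name
        """
    ).fetchall()
    return {name: total for name, total in rows}
```

Calling patient_counts_by_source(build_demo_db()) returns the per-source totals across age strata. Keeping measurements in one table and the qualifiers in separate tables means a new metric or stratification can be added as rows rather than as schema changes, which is one plausible reason for a design of this shape.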

PUBLICATIONS, PRESENTATIONS, AND OTHER PUBLICLY AVAILABLE RESOURCES

Resources:

The project team published a final report, “Standardization and Querying of Data Quality Metrics and Characteristics for Electronic Health Data,” in December 2019: https://aspe.hhs.gov/sites/default/files/private/pdf/259016/FDAs-Data-Quality-Metric-Final-Report.pdf
The project team operationalized the DQM, creating a publicly available tool: https://dataquality.healthdatacollaboration.net/
The DQM Authoring and Querying Platform is available on GitHub: https://github.com/PopMedNet-Team/DataQualityMetrics
The Technical Documentation report provides technical documentation to facilitate use of the DQM system. The report is available on GitHub: https://github.com/PopMedNet-Team/DataQualityMetrics
The User Documentation report provides detailed user documentation related to the use of the web-based DQM system. The report describes all elements of the web-based system and provides instructional detail on use by an individual. The report is available on GitHub: https://github.com/PopMedNet-Team/DataQualityMetrics
The project team presented and demonstrated the system during four stakeholder sessions in September 2019. All four sessions were recorded and posted on the site at the following link under “Community Engagement”: https://dataquality.healthdatacollaboration.net/resources

RELATED PROJECTS

Below is a list of ASPE-funded PCORTF projects that are related to this project:

Harmonization of Various Common Data Models and Open Standards for Evidence Generation - This project was a collaborative effort among the Food and Drug Administration (FDA), National Cancer Institute (NCI), National Institutes of Health/National Center for Advancing Translational Sciences (NIH/NCATS), Office of the National Coordinator for Health Information Technology (ONC), and the National Library of Medicine (NLM). The project built a data infrastructure for conducting patient-centered outcomes research (PCOR) using observational data derived from the delivery of health care in routine clinical settings. The sources of these data include, but are not limited to, insurance billing claims, electronic health records (EHRs), and patient registries. In addition, the project team harmonized several existing common data models, including those of PCORnet and other networks.

Cross Network Directory Service - The FDA developed and implemented a Cross Network Directory Service (CNDS) that addressed the standalone nature of existing distributed research networks and the barriers to working across these networks. The CNDS enables individual networks to become a community of interoperable networks, allowing end users to participate in and move from network to network as research needs dictate. This product enhances network scalability and enables data partners to engage with multiple networks by allowing them to decide their level of participation and their governance rules. This project created an open-source interoperable service that allows data partners to participate in multiple data research networks, query across the networks, and share analytic capabilities and knowledge across networks. The project was piloted across two existing networks: FDA’s Sentinel System and the National Patient-Centered Clinical Research Network (PCORnet).
