The Feasibility of Using Electronic Health Data for Research on Small Populations
Potential for Future Research on Small Populations


Despite existing challenges to meeting the conditions needed to use EHR and other electronic health data for research, our interviews and literature review illustrate that innovative solutions are being developed through a variety of publicly supported and private efforts. In particular, a number of large delivery systems and research networks have taken substantial steps toward developing the infrastructure and methods needed to conduct this type of research.

Experts have suggested ways to advance research using EHR or other electronic health data, both in general and for studying specific small or minority populations. These suggestions fall into four categories of potential studies: data validation, new tools and methods for mining and extracting data, descriptive studies of specific populations, and outcomes research. Experts also recommended exploring the types of research for which EHR data are best suited, as well as ways the data can be combined with other sources, including survey data. Beyond potential studies, there have been recommendations to engage clinicians in order to improve the quality of data available for EHR research. Educating physicians about the importance of the data may motivate them to enter data into structured fields rather than free text. Opportunities also exist to update the current legal framework regulating the use of electronic health data for research, both to promote patients' ability to make meaningful choices and to minimize the burden on patients and researchers.

For research using EHR and other electronic health data to reach its full potential, both in general and with small populations, key stakeholders must remain engaged. Many of these stakeholders are working to identify critical next steps and promising pilots through an effort led by the Assistant Secretary for Planning and Evaluation (ASPE), which includes the development of this report with input from technical experts. Other key stakeholders include government agencies, EHR vendors, health plans, providers, researchers, and consumer/patient groups, all of which play an important role in achieving the conditions needed for research using EHR and other electronic health data.

Table ES.1. Major Conditions Required for Research Using EHR and Other Electronic Health Data on Small Populations

Condition | Challenges | Solutions Being Tested
Data extraction | Requires IT skills, data storage, vendor cooperation, and identification of desired records and variables | Central data warehouse within an organization; software to extract data from distributed data systems
Processing unstructured data | Highly heterogeneous; uses acronyms and abbreviations; may include typing and spelling errors | Tools for natural language processing
High-quality, complete data | Errors of omission and commission; data limited to the population receiving care from the organization, whose care received elsewhere is not captured; limited generalizability | Careful interpretation of results; linkage to other data sources; use of data from integrated delivery systems and research networks
Privacy and Security
Protection of patient privacy | Informed consent required for traditional research is too burdensome for EHR-based research and may result in biased samples when only consenting patients are included; information needed to identify small populations may threaten individuals' privacy | Obtaining general consent from patients for research using EHR data; use of de-identified data; classifying analysis as quality improvement rather than research
Governance | Resource investment and cooperation needed for infrastructure specifying who owns, controls, and regulates the data for research use | HIPAA provides some guidance; some organizations have developed a separate institute or company to conduct research
Combining Multiple Data Sources
Data sharing | A central warehouse for multiple organizations is resource intensive to build, maintain, and govern; privacy and data ownership concerns | Virtual/distributed data warehouses; practice-based research networks; regional health information exchange
EHR interoperability | Large variety of EHR systems and vendors; lack of standards | Federal incentives; voluntary consensus standards; efforts across organizations and vendors to standardize

Table ES.2. Ability of Federal Survey and EHR/Other Electronic Health Data to Address Challenges in Studying Small Populations

Challenge | Survey Data | EHR and Other Electronic Health Data
Sampling Challenges
Small size of population | Difficult to obtain an adequate sample through random sampling | Larger (although nonrandom) samples increase the potential to obtain enough records from a small population
Uneven distribution of some small populations across the country | Difficult to obtain an adequate sample through random sampling | Can use data from providers where the targeted subpopulation is concentrated
Information Challenges
Ability to identify members of small populations | Lack of consistent classification categories makes this challenging; at times, categories are also not granular enough to identify specific small populations | Same challenge, although natural language processing and the use of multiple electronic data sources have shown some promise in identifying certain small populations; training providers and staff to collect the needed information is also challenging
Detail available to understand health and health care needs | Limits on survey length and reliance on self-reported information keep the level of detail low | Large volume of detailed information, documented by providers, registration staff, and patients
Validity of data | Relatively strong, although self-reported information has weaknesses | Varies by type of electronic health data, because providers document information for non-research purposes
Research Challenges
Ability to study small populations over time | Cross-sectional nature of most surveys does not allow this | Longitudinal nature of electronic health records is well suited to following populations over time
Need for different types of research | Data collection designed for generalization across the broader population and for hypothesis testing | Better suited to studying unique populations than to generalization, and to descriptive or hypothesis-generating research
Privacy | Access to information needed to identify small populations may risk identifying individuals | Secondary use of EHR and other electronic health data for research is challenging under the current legal framework

