The Feasibility of Using Electronic Health Data for Research on Small Populations

Characteristics of EHR and Other Electronic Health Data That Make Them Useful for Research


EHR and other electronic health data are increasingly used for quality measurement and improvement, but until recently the potential benefit of EHRs for research had received little attention outside a few innovative, early-adopting health care organizations. However, the use of EHRs for quality improvement has provided a foundation for extracting and formatting EHR data so that they can be used for other purposes, including research. In an EHR-based system, all quality improvement activities are implemented using the EHR. The wealth of information being collected has the potential to enable great leaps forward in both the scope and efficiency of clinical, health services, and policy research.270 However, the answer to the fundamental question of whether EHR data are currently good enough for research on small-n populations may depend on how research is defined and on the specific kinds of research of interest. EHRs may be well suited for some types of research but poorly suited for others. Although the field has recognized this concept of “fit” between purposes and data, it is still working out which kinds of research EHRs and other electronic health data currently support well and where further work is needed.

Health services research has been defined as “the multidisciplinary field of scientific investigation that studies how social factors, financing systems, organizational structures and processes, health technologies, and personal behaviors affect access to health care, the quality and cost of health care, and ultimately our health and well-being.”271 EHR data, for example, have great potential value for comparative effectiveness research (CER) about drugs, medical devices, tests, surgeries, or ways to deliver health care.272 However, CER may require more precise and complete information than EHRs necessarily contain, and so may require additional investment to ensure that the data quality in a given system is adequate for the specific type or aims of the research. Even so, less precise and complete information may still be useful for identifying patient populations or potential areas for further study.

Today’s medical and pharmaceutical research largely consists of relatively small clinical studies of highly selected patients with only one health condition. Findings based on such participants may have limited generalizability to patients in the real world, who often have multiple conditions. The large volume of information entering an EHR makes it possible to examine rich clinical information about large numbers of patients over time. Although EHR-based research faces a number of challenges and may not replace traditional methods of advancing medical knowledge, innovative health systems and researchers have begun to demonstrate its potential. For example, data analytics engines have been developed to mine EHR data warehouses and provide information about how patients with certain characteristics respond to a given medication or treatment.273

Analyses of data collected in routine patient care have the potential to greatly increase the speed of research. For example, researchers at MetroHealth Medical Center in Cleveland, Ohio, were able to study, in 11 weeks, patient characteristics associated with venous thromboembolic events over 13 years among almost one million patients.274 Without EHR data, the resources required to recruit and follow so many patients over time would have been vastly greater. EHR data can also be analyzed to identify risks missed in clinical trials; for example, Kaiser Permanente’s review of internal medical records revealed the connection between Vioxx and cardiac complications.275 Another benefit of EHR data, particularly in organized delivery systems, is that once a population is identified, years of data may already be available, rather than researchers having to wait many years to collect the information.276

Because EHR data are already computerized and available in real time, they substantially increase the efficiency of research by eliminating the need to extract information from paper records and enter it by hand. Resources that would otherwise go to data collection can instead go to the programming and database work needed to prepare EHR data for analysis.277 The data are also timelier than claims or survey data, which often involve a significant lag in collection and processing. Collecting data in real time also eliminates the need for patients to recall past events, as survey research often requires.278 EHRs also include much detail about processes of care that is not available in claims data, as well as information on the uninsured. For this reason, HRSA has made a substantial effort to invest in the data capabilities of safety net providers, and research networks such as CHARN provide an opportunity to better understand populations about which there might otherwise be very limited information. Clinical data from EHRs can also help reduce or mitigate traditional coding problems with claims and other administrative data.279

The availability of medical record data on all patients in a health system also allows identification of small subpopulations, such as people in uncommon demographic groups or with rare conditions, when identifying information is available in the EHR.280 The EHR may contain information about patients who might otherwise be excluded from research because they would not meet the narrow eligibility requirements of a clinical trial.281 For example, EHR data have been used for observational comparative effectiveness research among patients with hard-to-detect comorbidities, to identify patients for recruitment into interventions, and for population management research.282 The population covered by an EHR system may also provide more representative information than traditional research samples.283 As EHR use increases and efforts continue to improve interoperability and to create networks for pooling data, future research may be based on actual populations rather than small samples.284

Another important aspect of EHRs is their longitudinal nature, which allows populations of patients to be followed efficiently over time so that, for example, treatment outcomes can be studied. In contrast, surveys collect information at one point in time, typically asking whether someone has ever been diagnosed with, or currently has, a condition. However, diagnoses change over time. At KP-NW, for example, every diagnosis has a date stamp that begins an episode of care, and an end date is recorded when the episode is resolved. The EHR’s health problem list displays, in one central place, a patient’s entire history of diagnoses and whether each is ongoing or resolved, rather than requiring a reviewer to search thousands of pages in a thick chart for this information. In addition, the recent change that allows children to remain on a parent’s insurance coverage until age 26 increases the likelihood that they will remain in a given record system through their transition to adulthood, making it possible to follow those with a condition such as ASD through this transition.285 As the number of years covered by an organization’s EHR system increases, opportunities will grow for research spanning multiple generations of family members.286 Longitudinal data also offer the potential to support causal inferences, which cross-sectional data generally cannot. However, other factors must be carefully considered in interpreting longitudinal EHR data, such as organizational or national changes that may account for an observed trend. For example, an apparent increase in smokers in EHR data may reflect more complete documentation prompted by meaningful use incentives rather than an actual increase in smoking.287
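The date-stamped problem list described above can be illustrated with a small sketch. The `DiagnosisEpisode` class, its fields, the helper functions, and the patient history below are all hypothetical and do not represent KP-NW’s actual data model; the sketch assumes only that each diagnosis carries a start date and, once resolved, an end date. It contrasts a survey-style “ever diagnosed” question with a longitudinal “active on this date” question.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DiagnosisEpisode:
    """One entry on a hypothetical problem list: a diagnosis with a start
    date and, once resolved, an end date (None while still ongoing)."""
    condition: str
    start: date
    end: Optional[date] = None

    def active_on(self, day: date) -> bool:
        # Active if it began on or before `day` and had not ended before `day`.
        return self.start <= day and (self.end is None or day <= self.end)


def ever_diagnosed(problem_list: List[DiagnosisEpisode], condition: str) -> bool:
    """Survey-style question: was this condition ever recorded?"""
    return any(ep.condition == condition for ep in problem_list)


def active_conditions(problem_list: List[DiagnosisEpisode], day: date) -> List[str]:
    """Longitudinal question: which conditions were active on a given date?"""
    return sorted({ep.condition for ep in problem_list if ep.active_on(day)})


# Hypothetical patient history (illustrative labels and dates only).
history = [
    DiagnosisEpisode("asthma", date(2005, 3, 1), date(2009, 6, 30)),
    DiagnosisEpisode("hypertension", date(2010, 1, 15)),  # ongoing
]

print(ever_diagnosed(history, "asthma"))           # True
print(active_conditions(history, date(2008, 1, 1)))  # ['asthma']
print(active_conditions(history, date(2012, 1, 1)))  # ['hypertension']
```

Because each episode keeps its own dates, the same record can answer both questions, whereas a single survey response typically captures only the first.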

A limitation of EHR data, compared with survey data, is that the information is not collected or structured for research, which presents a number of challenges. While EHRs do include information of great potential value for research on small populations, a number of conditions must be in place at the technical, legal, and organizational levels for such research to reach its full potential. These conditions, and the challenges of meeting them, are described in the following sections of this report, which are organized by these three categories. The report reviews technical conditions, such as the need to convert EHR data into an analyzable format; legal conditions, such as agreement on privacy standards; and organizational conditions, such as the infrastructure needed to share data across multiple institutions. Examples, drawn from our interviews and the literature, of organizations that have begun to use EHR data for research demonstrate how these conditions are coming together to allow research opportunities to move forward. However, as we discuss in the conclusion, hurdles remain and additional steps are needed to take advantage of the opportunities at hand.
