The Feasibility of Using Electronic Health Data for Research on Small Populations

Potential for Future Research on Small Populations


Despite existing challenges to meeting the conditions needed to use EHR data for research, the experts we interviewed provided examples of innovative ways in which barriers were being overcome. They were also cautiously optimistic that some other barriers could be overcome in relatively short time frames, potentially resulting in a “tipping point” or “major paradigm shift” in how clinical and health services and policy research is conducted in the not-so-distant future. Specifically, the experts had a number of suggestions for moving forward in EHR-based research in general and/or for studying specific small or minority populations. These suggestions fall into four categories of potential studies: data validation, new tools and methods for mining and extracting data, descriptive studies of specific populations, and outcomes research. There were also a number of recommendations for engaging key stakeholders (clinicians, small populations, and vendors) and encouraging collaboration among them to improve the quality of data collected, as well as for improving the legal framework and addressing other policy issues around secondary uses of electronic health data.

Data validation

The most commonly suggested studies were those aimed at further examining the strengths and limits of EHR data and at identifying potential methods to strengthen the data for research use. Research networks such as the HMO Research Network,422 the Community Health Applied Research Network,423 and Practice-Based Research Networks or DARTNet may be good places to conduct this kind of research because of the volume and variety of data available to them and the expertise they have already developed through other projects and studies. The Health Care Systems Collaboratory was also identified as a good starting point for these types of projects because its participants are already advanced in EHR-based research and can demonstrate its potential.424

A related potential area for research was the development and testing of patient surveys and/or patient-completed instruments, possibly including a catalog of items that patients could self-report, which would be integrated into the EHR and combined with other data. For example, it has been shown that patients report their height accurately, so it does not need to be measured by a nurse, but they are less likely to report their weight accurately.425 A study to examine whether meaningful use has increased documentation of targeted variables was also suggested.426 In one such study, Kaiser is conducting targeted interviews with patients as they leave a doctor’s office to ask whether they smoke (a meaningful use measure) and whether the doctor talked with them about it, giving Kaiser a better sense of how to interpret its EHR data. Interviews and other methods of hearing directly from patients are an important form of validation: although electronic health data can provide a great deal of information, the only way to know how patients feel is to talk to them or to their caregivers. Collecting health-related quality of life data and/or patient experience data provides additional information from the patient’s perspective.

One suggestion for research funders from the technical expert panel was to take some studies that have been conducted on small populations using survey methods and issue requests for proposals to see whether anyone could examine the same issue and population using EHRs or other electronic health data, allowing the results of the two methods to be compared. Similar rapid-response requests for proposals could be used when a pressing issue arises for a particular small population that EHR networks could potentially examine. Interviewees also suggested studies to examine the validity of data used to identify specific small populations for research, such as:

  • A large, prospective study to understand how sexual orientation and gender identity data captured in EHRs differ from patients’ own views427
  • Research to identify how patients are identified as having an ASD and the data elements needed to study ASD patients, both to assess what data are available and how complete these data are428
  • Examination of the potential of natural language processing to identify ASD patients429 and sexual orientation430
  • Studies on whether and how physicians are collecting information around sexual practices and sexual orientation431

New tools and/or methods

As several examples briefly described in the report illustrate, the field is developing a variety of new methods and tools to identify priority small n populations in EHR databases and to transform key EHR data into analytic files for research. For example, researchers described algorithms and natural language processing software that identified small n populations of interest more reliably and validly, as well as ways to use well-validated surveys to collect key information and integrate it into EHRs. They also described a variety of databases and some of their relative strengths and weaknesses. These and other tools could be developed further, and the substantial experience gained from current projects could be used to build a clearer picture, from a variety of perspectives, of the strengths and weaknesses of different approaches to extracting and using the data, and of the conditions under which one approach may be more advantageous or more likely to succeed.

There is also work under way to explore new methods that incorporate EHRs and other electronic health data into more traditional research methods, and to better understand which types of studies EHR data are or are not well suited for. Research study designs also need further development in order to study small populations. While randomized controlled trials have traditionally been the “gold standard,” there is growing agreement that this discipline must evolve, particularly so that trials can focus on specific subgroups to look for differences. For example, the Health Care Systems Collaboratory has been exploring the use of EHRs for more pragmatic, real-world approaches to clinical trials. Although these approaches may not produce generalizable results, much can be learned about small populations in particular when they can be studied as the unique groups they are, using quasi-experimental designs where the opportunity exists. Ease of access to a population may also create opportunities to study small, unique populations that are concentrated in certain areas or in a health system or plan with good data. For example, Kaiser Hawaii may offer opportunities for research on Asian subpopulations because it serves a large concentration of Asian patients and has collected good ethnicity data for years.

In addition, researchers should consider what would make a useful control group in studies of small populations. Drawing controls from within the same electronic health data set may be advantageous because any bias in the data is likely to affect cases and controls alike rather than being systematically skewed toward the controls. Although these biases may not be quantifiable, they can at least be described qualitatively based on what is known about the limitations of the data.432
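
As a rough illustration of drawing controls from within the same data set, the sketch below pairs each member of a small population with up to two controls from the same extract, matched on sex and five-year age band and sampled without replacement. It is a minimal sketch under stated assumptions: the record layout (`Patient`, `in_small_population`), the matching variables, and the function names are hypothetical and were chosen for illustration, not taken from any study cited here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical record layout; real EHR extracts will differ.
    patient_id: str
    age: int
    sex: str
    in_small_population: bool


def age_band(age: int, width: int = 5) -> int:
    """Bucket ages into bands (e.g. ages 40-44 -> 40) for coarse matching."""
    return (age // width) * width


def match_internal_controls(patients, controls_per_case=2):
    """Pair each case with controls drawn from the SAME data set, matched on
    sex and age band. Controls are taken without replacement, so no control
    is reused; cases with no eligible controls get an empty list."""
    pool = defaultdict(list)
    for p in patients:
        if not p.in_small_population:
            pool[(p.sex, age_band(p.age))].append(p)
    matches = {}
    for case in patients:
        if case.in_small_population:
            candidates = pool[(case.sex, age_band(case.age))]
            chosen = candidates[:controls_per_case]
            del candidates[:controls_per_case]  # remove used controls from the pool
            matches[case.patient_id] = [c.patient_id for c in chosen]
    return matches


# Usage with made-up records.
patients = [
    Patient("A", 42, "F", True),
    Patient("B", 43, "F", False),
    Patient("C", 44, "F", False),
    Patient("D", 41, "F", False),
    Patient("E", 67, "M", True),
    Patient("F", 30, "M", False),
]
print(match_internal_controls(patients))  # {'A': ['B', 'C'], 'E': []}
```

Reporting unmatched cases (such as "E" above) rather than silently dropping them keeps the limitations of the control pool visible, in the spirit of describing biases qualitatively when they cannot be quantified.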

It would also be helpful to identify study components for which EHRs and other electronic health data can supplement other information that is collected, such as providing utilization information for clinical trials or helping to develop high-risk cohorts. EHRs may offer a viable first-stage screen based on proxies, such as use of a treatment as a proxy for having a rare condition. EHRs may also help identify research questions, potentially by revealing the distribution of comorbidities or how delivery of care differs across subpopulations. There may also be ways to combine EHR data with other types of data, such as survey data: for example, using EHRs to identify a population for a more targeted survey, or conducting a survey and then supplementing the responses with information available in medical records. Using a combination of data sources may also make it easier to identify small populations. In addition, while geospatial approaches have typically been used to study rural populations, they may also be useful for studying other small populations, because these populations are often unevenly distributed across the country.433
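
A proxy-based first-stage screen, followed by a second stage that verifies a subset of candidates, might look roughly like the sketch below. The first function flags patients with any order for a proxy treatment; the second estimates the proxy's positive predictive value among candidates whose status was checked (for instance, by chart review or a targeted survey). This is a minimal sketch, assuming hypothetical placeholder codes (`RX_PROXY_1`, `RX_PROXY_2`) and simple `(patient_id, code)` order records; a real code list would need clinical review and validation.

```python
from typing import Iterable, Optional

# Hypothetical proxy codes: treatments used almost exclusively for the rare
# condition of interest. Placeholders only, not real drug codes.
PROXY_TREATMENT_CODES = {"RX_PROXY_1", "RX_PROXY_2"}


def first_stage_screen(med_orders: Iterable[tuple[str, str]],
                       proxy_codes: set[str] = PROXY_TREATMENT_CODES) -> set[str]:
    """Return IDs of patients with at least one order for a proxy treatment."""
    return {pid for pid, code in med_orders if code in proxy_codes}


def proxy_ppv(candidates: set[str], verification: dict[str, bool]) -> Optional[float]:
    """Share of verified candidates who truly have the condition (positive
    predictive value). Returns None if no candidate has been verified yet."""
    checked = [verification[pid] for pid in candidates if pid in verification]
    if not checked:
        return None
    return sum(checked) / len(checked)


# Usage with made-up orders and verification results.
orders = [
    ("p1", "RX_PROXY_1"),
    ("p2", "RX_OTHER"),
    ("p3", "RX_PROXY_2"),
    ("p4", "RX_PROXY_1"),
    ("p1", "RX_PROXY_2"),
]
candidates = first_stage_screen(orders)
print(sorted(candidates))  # ['p1', 'p3', 'p4']
print(proxy_ppv(candidates, {"p1": True, "p3": False, "p4": True}))
```

Keeping the verification step separate makes it easy to report how well the proxy performs before relying on it to define a study population.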

Descriptive studies

Interviewees also suggested a number of studies using EHR data to better understand the health and health care of specific small populations. For example, Kaiser has used sophisticated sampling with its EHR data to stratify patients into subgroups according to how likely they are to have COPD; presumably, the same could be done for other health outcomes. Such studies could examine how various subpopulations fare relative to the majority population and identify disparities so that they can be addressed. Some examples include:

  • Health: studies to examine comorbidities of adults with ASDs,434 or common diagnoses among different Asian subpopulations435
  • Social determinants of health: studies to better understand the patient complexity and risk associated with social determinants of health barriers (e.g., limited English proficiency, poverty level, insurance status) among different Asian subpopulations, many of whom are immigrants436
  • Health care utilization: studies to examine use of pediatric services by adolescents with ASDs during the transition to adulthood,437 use of psychotropic and ADHD medication among young children with ASDs,438 as well as referrals to mental health services and outside behavioral diagnostic testing439
  • Enabling services: studies to examine the impact of supportive health services (e.g., insurance eligibility, interpretation, case management) on health for Asian subpopulations440
  • Quality: research on the receipt of recommended care by Asian subpopulations, LGBT individuals, and other minority or disadvantaged groups441
  • Patient experience: use of satisfaction surveys linked to encounter data to examine the experience of LGBT patients442

Outcomes research

Finally, a number of interviewees pointed to the potential of EHR data for research examining outcomes and how these outcomes may differ across subgroups of the population. This would include examining the outcomes of medications, types of treatments or care processes,443 interventions such as smoking cessation or medications,444 and new models of care such as telemedicine for rural patients.445

The information in EHRs is well suited for research on clinical topics, health services, delivery system issues, and quality of care. The volume of information makes it useful both for high-level, broad utilization benchmarking and for more detailed analyses of small populations.446 The ability to identify small populations also creates an opportunity for comparison studies to identify disparities in health and/or health care that certain groups may experience, such as differences in access to or quality of care. These data are also useful for descriptive epidemiology that examines the prevalence of certain conditions and their trends over time by demographic or other characteristics,447 as well as for quality improvement research to improve care for certain populations.448

EHRs also provide a unique opportunity to look for undiagnosed conditions. For example, CHARN is looking for people with possible undiagnosed hypertension by identifying people in EHRs who have high blood pressure readings but have not been tested for hypertension. These patients are then targeted for testing and therapeutic intervention.449
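
The case-finding logic described above can be sketched as a simple filter: flag patients with repeated elevated blood pressure readings and no recorded hypertension diagnosis. This is a minimal sketch, not CHARN's actual algorithm, which is not described here. The cutoffs (140/90 mmHg), the two-reading rule, the `BPReading` record, and the use of a set of already-diagnosed patient IDs as a stand-in for "tested for hypertension" are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class BPReading:
    # Hypothetical vital-signs record; real EHR extracts will differ.
    patient_id: str
    systolic: int
    diastolic: int


# Assumed criteria for illustration only; not CHARN's published rule.
SYSTOLIC_CUTOFF = 140
DIASTOLIC_CUTOFF = 90
MIN_ELEVATED_READINGS = 2


def is_elevated(r: BPReading) -> bool:
    return r.systolic >= SYSTOLIC_CUTOFF or r.diastolic >= DIASTOLIC_CUTOFF


def flag_possible_undiagnosed(readings, diagnosed_ids):
    """Return sorted IDs of patients with at least MIN_ELEVATED_READINGS
    elevated readings and no hypertension diagnosis on record."""
    counts = {}
    for r in readings:
        if is_elevated(r):
            counts[r.patient_id] = counts.get(r.patient_id, 0) + 1
    return sorted(pid for pid, n in counts.items()
                  if n >= MIN_ELEVATED_READINGS and pid not in diagnosed_ids)


# Usage with made-up readings: p2 is already diagnosed, p3 has only one
# elevated reading, and p4 has none, so only p1 is flagged for follow-up.
readings = [
    BPReading("p1", 150, 85), BPReading("p1", 145, 92),
    BPReading("p2", 150, 95), BPReading("p2", 160, 100),
    BPReading("p3", 120, 80), BPReading("p3", 142, 88),
    BPReading("p4", 118, 76),
]
print(flag_possible_undiagnosed(readings, diagnosed_ids={"p2"}))  # ['p1']
```

Requiring more than one elevated reading is one common way to reduce false positives from a single high measurement; the flagged list would then feed the testing and intervention step.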

Stakeholder engagement and collaboration

In addition to potential studies, those interviewed recommended efforts to further engage key stakeholders to improve the quality of data collected and to direct the research agenda for using electronic health data to study small populations. In particular, clinician engagement was recommended in order to improve the quality of data available for EHR research. Educating physicians about the importance of the data may motivate them to enter data into structured fields rather than free text. An additional incentive may be to give them feedback on their data quality along with reports on the quality of care.450 Encouraging clinicians to use their data will lead to improvement as they identify and correct errors. Gaining participants’ trust is a major issue: for example, a CHARN representative we interviewed noted that the participating community health centers (CHCs) are still watching to make sure that the coordinating center is not simply writing reports using their data rather than engaging the CHCs in research.451 Clinicians could also be given information to help them manage their patient populations more effectively, so that they can see the usefulness of high-quality data. For example, reports could identify complex, chronically ill patients for follow-up.452 Engaging clinicians in the development of research may help identify research questions that address the challenges they face in clinical practice. Practices that participate in research networks should also receive financial and infrastructure support to ensure that they collect the data researchers need. Obtaining providers’ buy-in and support requires relationship building, as well as some benefit to the providers from the data.

Some interviewees also suggested being purposive about which practices contribute data for research: partnering with those that are interested in using their EHRs to generate evidence, and with practices whose patient populations might otherwise be underrepresented in research, such as those serving children or ethnic minorities.453

In addition to engaging providers who treat small populations, engaging the small populations themselves is important to improve the quality of data collected. One recommendation from the technical expert panel was to work with the LGBT community to develop respectful ways to identify its members, as well as to reach consensus on what information to collect and what categories to use. With HHS piloting questions to identify the LGBT population on national surveys, there may be an opportunity to compare these findings with EHR-based methods of identifying LGBT patients. Another suggestion was to convene a task force to identify the data needed to study small populations; such a task force might also establish common data elements, such as specific demographic variables, for each population. Vendors must also be engaged around the need for common data elements and to promote the development of EHRs that support a learning health care system.454

The legal framework and other policy issues

Although the technical expert panel identified a potential role for the federal government in disseminating best practices on how research has been successfully conducted thus far within the legal framework, there was agreement that in the long run, these “work-arounds” would not be sufficient. Elements of the law that have been suggested as ripe for revision include the overemphasis on informed consent relative to other fair information practices, the preferential treatment of quality improvement and other internal uses over research, and the lack of guidance on network architecture, governance, and IRB structure.455 There is also an opportunity for the government to educate the public about the benefits of using their health data for research and the barriers that overprotection of privacy poses to progress in medical and public health research. Privacy concerns that keep patients from allowing their data to be shared also lead to a number of health risks, such as errors that occur when a patient’s multiple providers do not know what the others are doing. While the younger generation has grown up in the age of social media and may have fewer concerns about privacy, recent events such as the publicity around PRISM (the National Security Agency’s electronic surveillance program mining telecommunications data) have brought existing public concerns about privacy to light.

Implementation of policies such as the HITECH Act that aim to close the digital divide experienced by rural and safety net providers will also improve the availability of electronic health data to study small populations. A viable business model for EHRs in rural practice is still needed. One solution may be the development of subscription-based EHRs operated over secure web portals that require only web appliances in the physician’s office. Further development of networks like CHARN, and support for such networks to learn from the experiences of better-resourced research enterprises such as Kaiser or the HMO Research Network, is also important for studying these populations. The government may also consider supporting the development of decentralized data warehouses and other IT infrastructure to link health systems in specific geographic areas, such as underserved urban areas or sparsely populated rural areas. Funding “Centers of Research Excellence” to support EHR-based research on small populations may also help build infrastructure.

Finally, closing the gaps that occur when children age out of their parents’ insurance will improve the continuity of electronic information available to study small populations over time. While additional opportunities and subsidies to purchase insurance through the Affordable Care Act may help address gaps in coverage, delivery systems must also make efforts to close gaps in information. The development of personal health records and more robust information exchanges, as incentivized in the HITECH Act, will help. Simpler solutions exist as well, such as giving patients a copy of their information that they can share with new providers. This has been done in cancer care and may also help adolescents with ASDs as they transition to adulthood.
