The Feasibility of Using Electronic Health Data for Research on Small Populations. Information Available in an Electronic Health Record


To be useful for research on small populations, EHRs must include information identifying individuals as fitting into those populations, as well as information about their health and health care. For example, even if members of an Asian subpopulation were identifiable using EHRs, if they rarely seek health care or tend to seek care from places where there is less EHR penetration, or if language is a barrier to communication when they do seek care, limited information may have been recorded on their actual health and health care.

Much relevant information is routinely collected in EHRs in the process of patient care. In 2003, the Institute of Medicine identified eight core functions that EHR systems should be capable of performing in order to promote safety, quality and efficiency in health care. These functions include:241

  • health information and data
  • result management
  • order management
  • decision support
  • electronic communication and connectivity
  • patient support
  • administrative processes and reporting
  • reporting and population health

Additional functions common to EHRs include alerts for clinical preventive services, drug-drug interactions and drug allergies. Organizations have taken several approaches to obtaining a system with the needed functionalities. Purchasing a comprehensive system (often referred to as the “single-vendor strategy”) has been the most common approach among U.S. hospitals,242 but some piece together elements from different systems (e.g., scheduling, billing, and EHRs) and there is variation in what information is included in EHRs in different organizations.

EHRs typically include a patient’s demographic information, personal and family medical history, allergies, immunizations, medications, health conditions, and contact and insurance information, as well as a record of what has occurred during visits with the provider.243 Information may be collected both at sign-in at the registration desk and during the visit with the provider.

Patient-reported data

Basic contact, insurance, and demographic information about patients is collected at the registration desk or in the waiting room. Patients may also be asked for pertinent information about their health. Some providers use iPads or computer kiosks that allow patients to enter information directly into their EHR. Some also have patient portals that allow patients to view their information and to communicate with their health care providers. These can be set up to directly interface with the EHR,244 creating a source of information within the EHR. At this stage of EHR use, not all patients are equally likely to use patient portals; minority patients may be less likely to use them and younger patients more likely.245

One benefit of collecting some information directly from patients through a written or computerized telephone questionnaire or patient portal is that it gets around the difficulty of getting staff to ask patients for information about such topics as race/ethnicity or sexual orientation.246 While challenges remain with how to word questions in order to identify LGBT populations, the bigger challenge remains training providers and other staff to ask the questions when there are common biases that may prevent them from wanting to ask or document this information.247 Both UC Davis and Vanderbilt health systems are beginning to collect information about patients’ sexual orientation and have opted to use patient portals for doing so.248 Given the opportunity to answer questions from home, patients may be more comfortable reporting certain information. An added benefit of reporting from home is that family members may help if there are language barriers. Geisinger Health System has started using patient portals to collect information about existing medications, and this information gets put into the EHR. Patient reporting may both save clinician time and include information that would not otherwise get entered. Vendors have developed tools such as clinical prediction rules and analytics engines to prompt clinicians based on information a patient enters.249

In recent years, there has been increasing effort to promote standardized collection of race, ethnicity and language data by registration staff in response to policy initiatives as well as accreditation requirements. Efforts often include staff training and patient education. For example, the Hospital Association of Rhode Island received funding for a five-hospital pilot to improve collection of race and ethnicity data. Its pilot included input from stakeholders on which granular ethnicity categories should be collected, standard interview scripts for staff to collect patient information, and materials to educate patients on why the hospitals were collecting the data.250

Clinical encounter data

Data collected during office visits and entered by the clinician into patient records during a visit may include reason for the visit, height, weight, vital signs, patient-reported symptoms and characteristics (such as behavior and lifestyle), diagnoses, treatments and tests ordered, and medications prescribed. Information from the pharmacy, laboratory and radiology is often incorporated into the EHR. This should include test results and imaging from other systems.

Clinical information may be entered in a structured format where the clinician can select from standard, predetermined categories such as diagnosis or procedure codes or a medication list. Clinicians may also enter information in free-text notes in their own words or the patient’s words. For a condition such as autism spectrum disorder, relevant information may be entered as a diagnostic code or in free text about symptoms that suggest the diagnosis or about patient or parental reports of such a diagnosis in the past. Diagnostic information may also be implied by the clinician’s prescription choices.
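To make the distinction between structured and free-text evidence concrete, the following is a minimal sketch in Python. It is illustrative only: the `PatientRecord` layout, the field names, the code set, the text pattern, and the medication list are all assumptions made for this example, not a validated case definition and not the data model of any particular EHR product.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified EHR record; field names are illustrative."""
    patient_id: str
    diagnosis_codes: list = field(default_factory=list)  # structured entries
    medications: list = field(default_factory=list)      # structured entries
    notes: list = field(default_factory=list)            # free-text entries


# Assumed lookup values for illustration only, not a validated definition.
ASD_CODES = {"F84.0", "299.00"}
ASD_NOTE_PATTERN = re.compile(r"\b(autism|autistic|asd)\b", re.IGNORECASE)
ASD_RELATED_MEDS = {"risperidone"}  # weak, non-specific signal at best


def evidence_sources(record):
    """Return which parts of the record hint at the condition."""
    found = []
    if ASD_CODES & set(record.diagnosis_codes):
        found.append("diagnosis_code")
    if any(ASD_NOTE_PATTERN.search(note) for note in record.notes):
        found.append("free_text")
    if ASD_RELATED_MEDS & {m.lower() for m in record.medications}:
        found.append("medication")
    return found


coded = PatientRecord("p1", diagnosis_codes=["F84.0"])
noted = PatientRecord("p2", notes=["Mother reports prior autism diagnosis."])
print(evidence_sources(coded))
print(evidence_sources(noted))
```

A simple keyword match like this one would also flag negated statements (for example, a note saying there are no signs of autism), which is one reason free-text evidence is typically harder to use than coded fields.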

Although the use of electronic health records creates opportunities for standardizing much patient care information by setting requirements for data fields, many clinicians prefer to record information in the unstructured manner that was used when entering information into paper charts. Many clinicians have traditionally audio-recorded their notes from the visit, and voice recognition software can now transcribe audio recordings into free-text fields in the EHR.251 This preference may disappear over time as younger medical students who grew up using computers enter clinical practice. Whether information in an EHR is structured or unstructured has important implications for research, which will be described later in this report, but today most information contained in EHRs is unstructured.

Claims/billing information

Many providers have electronic practice management systems that handle functions like scheduling, billing, and collections. Such systems are increasingly being integrated with electronic health records. Although this is being done for practice management purposes, it can make the overall data system more useful for research. Billing systems can have more complete diagnostic and procedure information than do EHRs.


Figure II.1. Example: Potential Structure and Information in an EHR

Figure II.1 is a diagram showing the types of information that could potentially be in an electronic health record and how it may be organized and retrieved. Each person’s record is shown to include administrative, pharmacy, laboratory, radiology, and narrative information. This information is displayed as collected from multiple people over time and stored in a database, where data can be extracted on an individual over time at the point of care, across multiple individuals over time from one category of information (such as laboratory data) for statistics, or across multiple people and categories of information over time for research.

Source: Jensen PB, Jensen LJ, and Brunak S. Mining electronic health records: towards better research applications and clinical care. Nature Reviews, June 2012 (13): 395-403.
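The three retrieval patterns described for the figure can be sketched as simple filters over a flattened list of record entries. This is a minimal sketch under stated assumptions: the tuple layout `(patient_id, entry_date, category, value)`, the sample entries, and the function names are hypothetical and chosen only to mirror the figure description, not taken from the cited article or from any EHR system.

```python
from datetime import date

# Hypothetical flattened entries: (patient_id, entry_date, category, value).
ENTRIES = [
    ("p1", date(2012, 1, 5), "laboratory", "HbA1c 7.2"),
    ("p1", date(2012, 3, 9), "pharmacy", "metformin"),
    ("p2", date(2012, 2, 1), "laboratory", "HbA1c 6.1"),
    ("p2", date(2012, 2, 1), "narrative", "Follow-up visit"),
]


def patient_timeline(entries, patient_id):
    """Point of care: one individual's entries in all categories, by date."""
    selected = [e for e in entries if e[0] == patient_id]
    return sorted(selected, key=lambda e: e[1])


def category_slice(entries, category):
    """Statistics: one category of information across all individuals."""
    return [e for e in entries if e[2] == category]


def research_extract(entries, patient_ids, categories):
    """Research: selected individuals and selected categories over time."""
    return [e for e in entries if e[0] in patient_ids and e[2] in categories]


print(patient_timeline(ENTRIES, "p1"))
print(category_slice(ENTRIES, "laboratory"))
print(research_extract(ENTRIES, {"p1", "p2"}, {"laboratory", "pharmacy"}))
```

The same stored entries serve all three purposes; only the selection changes, which is the point the figure makes about organizing records for care, statistics, and research.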
