The Feasibility of Using Electronic Health Data for Research on Small Populations. Privacy and Security Conditions Required for Research Using EHR and Other Electronic Health Data


In addition to technical requirements for data extraction and analysis, there are legal requirements that complicate the repurposing of EHR data for research. Privacy and security may be of particular concern for small populations, where individuals may be easily identified with just a few variables. In addition, particularly where stigma may be an issue, individuals from small populations may not want to be identifiable by their employer, school, or others who may access the data. Institutional review boards are accustomed to requiring that data be used only for the single project to which patients have consented, and that identifiable data be destroyed at the end of the study. Such requirements create barriers to the use of EHR-based data for research. Usual practices for protecting privacy and security may need to be reconsidered when EHR-based data are to be used for research. This data source will have increasing potential to answer additional research questions as more information is collected over time. Alternatives to study-by-study review and consent requirements will need to be found if the potential of EHR-based data is to be realized.
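The point that a few variables can single out members of a small population can be illustrated with a simple cell-size check. The sketch below is only an illustration: the field names (`zip3`, `age_band`, `sex`) and the threshold `k` are assumptions chosen for the example, not values taken from this report or from any regulation.

```python
from collections import Counter

def small_cells(records, quasi_identifiers, k=5):
    """Return each combination of quasi-identifier values shared by fewer than k records.

    records: list of dicts, one per patient (hypothetical structure).
    quasi_identifiers: field names that are not identifying alone but may be in combination.
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}

# Hypothetical example: three patients described by three variables.
example = [
    {"zip3": "551", "age_band": "15-19", "sex": "M"},
    {"zip3": "551", "age_band": "15-19", "sex": "M"},
    {"zip3": "565", "age_band": "15-19", "sex": "F"},
]
risky = small_cells(example, ["zip3", "age_band", "sex"], k=2)
```

Any combination returned by the check describes so few people that a record in that group could plausibly be tied back to an individual, which is the situation described above for small populations.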

Legal landscape

Presently, the two federal laws most relevant to the use of electronic health data for research are the Health Insurance Portability and Accountability Act (HIPAA) and the Common Rule.324 In addition, state laws that govern the use of health data tend to go beyond the protections provided by HIPAA. While HIPAA allows covered entities (including most health care providers) to access, use, and disclose identifiable personal health information for treatment, payment, and health care operations (including quality improvement), the HIPAA Privacy Rule requires that informed consent be obtained from individuals before this information is used for research. The Common Rule covers research conducted using federal funding from certain agencies, and defines research as “systematic investigation, including research development, testing, and evaluation, designed to develop or contribute to generalizable knowledge.” Together, these two laws broadly define what is legally considered research today.

The original HIPAA legislation was passed in 1996, before the use of EHR-based data for research was foreseen. Concern is growing about how much the HIPAA rules and their local application may deter important research based on secondary use of patient records.325 The HIPAA omnibus rule was changed earlier this year with the intention of increasing protection and control of personal health information, particularly in light of the growth of electronic data. Individual rights are expanded so that patients can ask for a copy of their electronic medical record, and can instruct their provider not to share their information with their insurance company if they pay in cash. In addition, the new rule aims to reduce the burden on individuals by allowing them to authorize the use of their health information for future research purposes.326 This, however, does not address the need for consent for secondary research uses of data that have already been collected.

There is ongoing legal/ethical debate about the role of restrictions based on HIPAA and human subjects’ protection in governing the use of EHRs for research, as well as on the blurring line between the use of the information for quality improvement and for research. The IOM has suggested that in a learning health care system, the distinction between research and quality improvement or other internal uses is artificial, and the laws remain unclear on this difference as well. Out of caution, IRBs tend to treat all secondary uses of data as research—a practice supported by publication policies of many academic journals that require IRB approval for results to be published.327 Other countries such as the UK and Canada are in the midst of similar debates around balancing the need to protect privacy with secondary uses of data for research. Some countries such as Denmark have concluded that database-driven research should be allowed without the consent typically needed to protect research subjects because of its contribution to the common good without disrupting people’s everyday lives. Because studies entirely based on national registries or clinical databases can be done without patient consent, a growing number of population-based studies using EHR data are being done in Denmark.328, 329

In addition, where the details of the laws are unclear or unfamiliar, researchers tend to err on the conservative side whenever they perceive a potential issue for their IRB. At times researchers go through the IRB out of caution even when review is not legally required, which creates unnecessary expense and an unnecessary burden on patients and providers.

Opportunities for patients to make meaningful choices

While the intent of informed consent is to respect patient autonomy, it has been argued that the public benefit of health research is greater, particularly if adequate provisions for protecting data confidentiality are present.330, 331, 332 The burden would be intolerable if patients had to be re-contacted for consent for each new research use of a database that contained their records. The ability of patients to give consent for future research, following the update to HIPAA, may help relieve this burden. However, patients may want their information to be used for certain purposes and not others, or may change their minds over time. Interestingly, there is some evidence that patients view the use of medical records as part of the health care routine and a necessary part of receiving good treatment, rather than considering it in terms of the costs and benefits of participation in research.333 There is a full range of practices that can help patients make a meaningful choice, such as transparency about how their information will be used and who will use it, and allowing patients access to their own data.334

The benefits of seeking patients' individual informed consent before using their EHR-based data for research are increasingly seen as coming at too high an administrative burden on research.335, 336 Of even greater concern is the potential for bias when records of patients who have not consented are excluded. One national survey found strong support for, and willingness to share, one’s electronic health information for research,337 and evidence is accumulating that patients who refuse to agree to the use of their records in research differ in various ways from those who agree. A recent review of 17 such studies from around the world (including 5 from the United States) found differences by age, sex, race, education, income, and health status between patients who did and did not consent to the use of their medical records for research.338 Such differences could bias research results or limit the generalizability of findings. This could be particularly problematic in research on small populations. In addition, there are specific issues with including child populations (such as adolescents with ASDs) in research because they are not legally able to provide informed consent, which implies understanding of the potential risks of participating in research. Parents must provide consent on their behalf, but may be uncomfortable with their children being included in research studies. Until recently, children were rarely included in medical studies. Agencies such as the FDA are making an effort to educate parents on the importance of including children in research.339

Both HIPAA and the Common Rule have been criticized for over-emphasizing patient consent rather than providing more comprehensive opportunities for patients to make meaningful choices.340 Organizations that conduct a lot of research using EHR data have taken a number of approaches to issues of meaningful choice and protecting patient privacy. These approaches include obtaining general consent from patients at the time care is being provided for the use of their records for research, standardizing IRB documents, classifying studies as quality improvement rather than research, and using de-identified data. For example, Essentia Health asks patients to sign a general consent form each year to use their data for research purposes. Only 1–2 percent of Essentia’s patients have been opting out, and those who opt out do not appear to differ demographically from those who do not. This general consent applies only to research conducted within the health system and its research institute, and IRB approval is still needed for use of the data for research.341 Geisinger Health System requires IRB approval for each research project, but has standardized the needed documentation to streamline the process. Geisinger also takes additional steps to protect patient information, such as altering dates in the copy of the data used for research to protect confidentiality.342
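The report does not say how dates are altered in a research copy. One common technique, shown here purely as an assumed illustration and not as Geisinger's actual method, is to shift all of a patient's dates by a single per-patient offset so that the intervals between events are preserved. The function names, the secret key handling, and the 180-day maximum shift are all hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta

def patient_offset_days(patient_id, secret, max_shift=180):
    """Derive a stable offset in [-max_shift, +max_shift] days from a keyed hash of the ID."""
    digest = hmac.new(secret.encode("utf-8"), patient_id.encode("utf-8"), hashlib.sha256).digest()
    span = 2 * max_shift + 1
    return int.from_bytes(digest[:4], "big") % span - max_shift

def shift_dates(patient_id, dates, secret):
    """Shift every date for one patient by the same offset, preserving intervals."""
    offset = timedelta(days=patient_offset_days(patient_id, secret))
    return [d + offset for d in dates]

# Hypothetical example: an admission and a discharge one week apart.
original = [date(2013, 3, 1), date(2013, 3, 8)]
shifted = shift_dates("patient-001", original, secret="example-key")
```

Because the offset is derived from a key held only by the data holder, the same patient always receives the same shift, while someone without the key cannot readily recover the true dates. A zero offset is possible under this scheme, so a production design would need to decide whether to exclude it.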

At Kaiser Permanente, people who sign up as members are informed that their data will be used for “approved research purposes.” Members may request to be excluded from all future research projects or from all genetic research. IRB approval is not needed when identifying information in EHR-based studies is used only to make linkages and is then removed.343 Vanderbilt has also granted a waiver of consent under the Common Rule to allow research on LGBT patients without consent, since the data are de-identified after extraction. However, patients do have the opportunity to opt out of studies.344 New York’s Health and Hospital Corporation makes only de-identified data available to researchers.345

Other health systems such as Intermountain Healthcare and UC Davis conduct some studies that are classified as quality improvement rather than research, and these do not require IRB approval or informed consent.346 Although classifying studies as serving operational purposes may avoid the privacy protections needed for research (defined as intended to generate generalizable knowledge or new findings for publication), there are tradeoffs. If the activity is conducted for quality improvement or other in-house purposes, the investigator may lose the ability to set priorities, to invest the time needed for a rigorous study, or to candidly share findings externally. This disincentive to share knowledge externally prevents much of this type of work from contributing to a learning health care system.347 On the other hand, analytics performed for internal uses such as quality improvement may have the benefit of leveraging available data to facilitate studies that are quicker and less costly than traditional research.348, 349

De-identified data

HIPAA’s Privacy Rule does not regulate de-identified data, and it specifies that data can be de-identified using safe harbor criteria (the removal of 18 specified data fields that could be used to identify an individual) or statistical methods (demonstrating extremely small statistical risk that an individual could be identified). Statistical methods are less commonly used because the description in the rule is vague and a standard approach is still lacking.350 In addition, individuals with the knowledge needed to make an expert determination that the statistical risk is sufficiently small are in short supply. However, some organizations such as Vanderbilt’s Multicenter Perioperative Outcomes Group, a consortium of 30 medical centers aggregating EHR data, patient-reported outcomes, and administrative outcomes,351 have opted to seek this expert determination after finding the safe harbor criteria more challenging to use, particularly when pooling data from multiple centers. The Privacy Rule does allow the alternative of using a limited data set that includes certain geographic and date information considered important for patient-centered outcomes research, but it then requires a data use agreement between the data holder and the recipient. Researchers at Kaiser Permanente have found that, where full dates are not allowed, data sets remain useful for research when the length of time between events can be included instead.
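The two ideas in this paragraph, removing identifying fields and keeping the length of time between events in place of full dates, can be sketched as follows. This is a minimal illustration under stated assumptions: the column names are hypothetical, the listed fields stand in for only a few of the 18 safe harbor categories, and the code is not a complete or compliant de-identification procedure.

```python
from datetime import date

# Hypothetical column names covering only a few of the 18 safe harbor categories.
DIRECT_IDENTIFIERS = {"name", "street_address", "zip_code", "phone", "email", "ssn", "mrn"}

def strip_identifiers(record):
    """Return a copy of the record without the listed identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

def dates_to_intervals(record, anchor_field, other_date_fields):
    """Drop all full dates and keep only the days elapsed since the anchor event."""
    anchor = record[anchor_field]
    dropped = {anchor_field, *other_date_fields}
    out = {k: v for k, v in record.items() if k not in dropped}
    for field in other_date_fields:
        out["days_to_" + field] = (record[field] - anchor).days
    return out

# Hypothetical example record.
record = {
    "name": "Example Patient",
    "zip_code": "00000",
    "admit_date": date(2013, 3, 1),
    "discharge_date": date(2013, 3, 8),
    "diagnosis": "E11",
}
research_copy = dates_to_intervals(strip_identifiers(record), "admit_date", ["discharge_date"])
```

In this sketch the research copy retains the diagnosis and the seven-day length of stay but none of the original dates or listed identifiers. Whether such a copy meets the safe harbor criteria would depend on all 18 categories and on the other conditions of the rule, which are not modeled here.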

While de-identifying data eliminates the need for informed consent, it may also remove the information needed to identify small populations. For instance, removal of geographic identifiers makes it impossible to identify residents of rural communities. In addition, de-identification complicates the linkage of patient records from multiple sources, such as lab or pharmacy data that are not integrated into the EHR, or records from the multiple institutions where a patient may receive care.


Governance processes specifying who owns, controls, and regulates the data must also be in place in order to use EHR data for research. Data governance is generally understood to include legal and regulatory concerns, the structure and role of governance bodies, IRB issues, properties of data, data sharing considerations, business issues, stakeholder engagement and participation, and sustainability.352 Institutions may designate committees or have designated employees responsible for these issues. Data governance has also been described as the process designated for the data steward (such as a health care organization) to carry out its responsibilities. A data steward has fiduciary responsibilities toward the data, or has been trusted with information that patients consider private. The role of a data steward continues to evolve both conceptually and legally, particularly as health care data have potential not only for research, but are already used for many purposes in the public interest such as for quality monitoring and improvement.353 There remains a lack of coherent policies and standards to help govern the secondary use of health data.354

In the absence of specific governance structures for research processes, some organizations such as New York’s Health and Hospital Corporation have developed a data warehouse and use the data for quality improvement; their data are used less frequently for research.355 However, building this infrastructure is resource intensive, and obtaining funding for this type of development may be difficult for health systems. One of the reasons Essentia developed a separate research institute was that grant funders are often unwilling to pay for programming at the site of day-to-day operations.356 Geisinger has also developed a separate Research Center based on an honest broker system, in which researchers can request to look at a topic (such as diabetes and a specific genome), and the broker then queries the database and shares the results.357 Some health systems are creating new companies that house and mine their electronic health record data and combine them with other sources, such as EHRs from other health care organizations. Two examples of health systems with such companies are Montefiore (Emerging Health Information Technology) and MetroHealth (Explorys).
