The Feasibility of Using Electronic Health Data for Research on Small Populations. Barriers to research on the LGBT population


Collecting valid and reliable survey data about LGBT populations has been complicated by several problems. The first is a historical reluctance to seek information about sexual orientation and gender identity in national health-related surveys. For example, one study that used data from the Medical Expenditure Panel Survey to examine LGBT people’s health care experiences fell back on membership in a same-sex couple as a crude measure.97 However, the reluctance to collect relevant information in national surveys appears to be changing. For additional federal surveys that may be used to identify members of the LGBT population, see Table I.3. Asking questions about sexual orientation, gender identity, and behavior is crucial to identifying this population.98 After several years of research and literature review, the National Center for Health Statistics has adopted a basic question and two follow-up questions regarding “sexual identity” (“Do you think of yourself as….”) for use in the 2013 National Health Interview Survey.99 In the clinical context, however, sexual behavior may be more important than orientation or identity. Specific risk factors are associated with same-sex sexual behavior regardless of whether individuals self-identify as gay, while people who can be identified as gay may be stigmatized regardless of their sexual behavior. These multiple dimensions may call for multiple questions, depending upon the purpose of the research.100

The smaller surveys that have included questions about sexual orientation, same-sex sexual behavior, and gender identity have varied in their focus and measures used. The choice of language in survey questions matters—affecting, for example, the extent to which respondents will identify themselves as lesbian or gay.101

The reluctance of some LGBT people (particularly, but not only, adolescents) to identify themselves as such to researchers has also made survey research more difficult, and it could complicate the collection and research use of relevant information in electronic health records, as we will discuss in Part II of this report. It is possible that such reluctance will decrease over time as societal acceptance of LGBT people increases. Challenges in identifying LGBT populations in both surveys and EHRs likely differ by age, gender, and sexual orientation. For example, gender identity may be better measured as a scale rather than a categorical question for women, who tend to have greater fluidity in their gender identity.102 Bisexuals may be the least likely to identify themselves, as they are less likely than other people in this population to be “out” in their workplace or to health care providers. Sexual behavior is particularly relevant to health concerns among men, but collecting information about sexual orientation and gender identity is also important for research on the impact of discrimination, stigma, and stress on health and health care.103

A third problem that is particularly important in survey research (though it could also arise in EHR-based research) is the difficulty of obtaining high-quality samples of small populations. As previously discussed, small numbers and the need to break the resulting sample into smaller units by sex, age, or race/ethnicity (and perhaps other factors) create the need for oversampling or combining years of data. Obtaining sufficient numbers of respondents in small categories through surveys of the general population is both inefficient and expensive. Because of the lack of good alternatives, researchers may draw samples from people who have had contact with organizations whose missions focus on the LGBT population. The representativeness of such samples is not known.

The records of service providers are also a potential source of data related to health and health care, but to date information about patients’ gender identity has not been a routine, structured field in medical records. Vanderbilt University Medical Center found that the time between when patients were first seen and when their LGBT status appeared in medical records averaged 30 months. This delay may have occurred because of fluidity of sexual orientation, because patients were not comfortable disclosing the information, or because the provider did not ask about or document the information. Little is known about the extent to which questions about sexual behavior are asked in clinical encounters or recorded in medical records, although some methodological research is under way that attempts to identify sexual orientation, gender identity, and sexual behavior from narrative notes or unstructured electronic health record data using natural language processing (NLP) software.104 Training medical and administrative staff to ask about and record information related to sexual orientation is a substantial task.105 Medical records will become much more useful for research as the health care industry moves from paper to electronic form and develops other strategies and tools to structure data or mine unstructured data. We explore this potential in Part II of this report.
