The Feasibility of Using Electronic Health Data for Research on Small Populations

The Need for Research on Small Populations


Research has found differences among segments of the population on nearly all aspects of health and health care. The ability to identify and document such differences is an essential starting point for improving people’s health. The four small populations that we selected illustrate a range of unanswered health and health care questions as well as the challenges in conducting research to answer these questions, both with existing federal data sources and potentially with EHR data. While small relative to the U.S. population, these populations have each reached a size where research on their health and health care needs has become both increasingly important and increasingly possible, particularly as new data sources are becoming available. Members of these groups are eager to be recognized and to better understand the particular characteristics and needs of their populations.

These populations were identified based on discussions with government officials at the Office of the Assistant Secretary for Planning and Evaluation (ASPE), the Agency for Healthcare Research and Quality (AHRQ), the Centers for Disease Control and Prevention’s National Center for Health Statistics (NCHS), and the Health Resources and Services Administration (HRSA), all of whom have received requests for better information about populations that have been difficult to study in existing federal surveys. Here we provide a brief overview of the distinct characteristics and the health and health care needs of our four example populations. More detail can be found in Part I of this report.

Asian subpopulations such as Filipinos and Vietnamese

Asian Americans are the fastest growing racial group,210 making up about 4.4 percent of the American population but including more than 50 different ethnicities and 100 languages.211 Language and cultural barriers to accessing health care are important concerns generally among immigrant populations, but their health and health care needs are poorly understood because of a lack of disaggregated data about ethnic subgroups.212 There is nonetheless evidence that various ethnic subpopulations have distinct patterns of disease and health care use. For example, one study found the prevalence of diabetes was three times higher among Filipino men than among Japanese men.213 Other research has shown that Vietnamese women have both higher cervical cancer rates (the highest among Asian-American women) and low screening rates.214

Small numbers relative to the total population, uneven geographic distribution, and language barriers combine to make it difficult to obtain adequate samples of Asian-American subgroups in national surveys. In claims data or health records, subpopulations may remain difficult to identify because ethnicity and language are not routinely or accurately collected. These factors, along with the time and cost of manual data abstraction, have been barriers for records-based research.

Lesbian, gay, bisexual, and transgender people

The health and health care needs of lesbian, gay, bisexual, and transgender (LGBT) people are not well documented, and even basic survey-based estimates of the size of these populations are inconsistent. However, there is evidence that stigma, discrimination, and violence are common experiences among LGBT populations, with significant implications for their health and access to care. For example, elevated rates of suicide attempts, depression, and substance use have been reported among LGBT youth, as well as among those in early/middle adulthood, compared to their heterosexual counterparts. Elevated rates of HIV/AIDS among men, particularly young black men who have sex with men, have been a concern for many years. There is also evidence that lesbian and bisexual women use fewer preventive services than heterosexual women and have higher rates of obesity and breast cancer. The associated stigma may make LGBT individuals hesitant to seek care, or lead them to withhold information from their provider when they do.215 As a result, the information needed to identify this population is seldom present in medical records. Some experts believe that LGBT people may be more willing to identify themselves in a written or online survey than in a face-to-face encounter. At present, however, there is no well-validated way to reliably collect data on LGBT populations, and numbers vary depending on whether information is collected on behavior, identity, or relationships. In addition, small numbers relative to the whole population make it difficult to obtain adequate samples for basic analyses, much less for analyses split by age or gender, although there is evidence that subgroups of LGBT populations have distinct health care needs.

While transgender people have much in common with LGB populations, they also experience a number of distinct challenges with their health and health care. Although we have included them with LGB populations for illustrative purposes, there are additional issues regarding research for transgender populations that we were unable to fully cover in this report.

Adolescents with autism spectrum disorders

Autism spectrum disorders (ASDs) are a group of developmental disabilities characterized by difficulty communicating and repetitive motions or other unusual behaviors, and range from mild to severe.216 ASDs are lifelong chronic conditions that often require significant medical and psychological care. Over 95 percent of children with autism also have co-occurring conditions such as attention deficit disorder, learning disability, or mental retardation.217 Children with autism are also more likely to experience depression, anxiety, and behavioral problems,218 often as a result of difficulty being understood or bullying.219 As a result, children with ASDs use many more health care services and much more therapy, counseling, and medication than children without ASDs.220, 221 Prescription medication use among children with ASDs is high, with the most commonly prescribed drugs being psychotropic medications, antidepressants, stimulants, and antipsychotics.222

Most research on ASDs focuses on children, but the health care transition between adolescence and adulthood is a particularly vulnerable period for this population as they move from pediatric to adult care and from child to adult special services.223 However, transition planning for this population is not common.224 This transition has been difficult to study because most national health-related surveys do not have a longitudinal design, making it impossible to follow youth with ASDs over time. In addition, because the condition is difficult to diagnose and diagnostic criteria have evolved over time, there are concerns about the validity and reliability of cases reported in parental surveys. There may be opportunities to use health records alone or in combination with other records (e.g., education, social service) to study people with ASDs over time, although the lack of biologic markers and shifting definitions of ASDs may continue to pose challenges in identification, even using clinical data.

Residents of rural areas

Rural communities are generally less densely populated and more geographically isolated than urban areas, often limiting economic opportunities. The out-migration of younger residents has left many of these communities with declining and generally older populations. In addition to the higher rates of chronic conditions associated with age, rural populations are more likely than urban residents to report fair to poor health status225 and to have higher rates of mortality, disability, and smoking and lower rates of physical activity.226 The rural residents of some parts of the country also face environmental health risks associated with agriculture, mining, and industrial pollution. Access to health care services is a serious concern as many rural communities lack the economic resources needed to support expensive medical services. Difficulty attracting and retaining clinicians further limits access to care. Telemedicine has the potential to help with some access problems, but Internet connectivity and adoption of HIT lag behind in many rural areas.

Research on rural populations has been limited by small numbers in some research activities and by a lack of consistency in defining rural populations. More than two dozen definitions are used for different purposes by federal agencies, with criteria ranging from population size or density to land use to commuting distance. In addition, although granular geographic identifiers (such as county and zip code) are needed to examine rural communities, such variables about individuals are not included in public-use data sets because of concerns that those living in sparsely populated areas could be identified.
