The Feasibility of Using Electronic Health Data for Research on Small Populations: Limitations in Federal Survey Data


Primary survey data have a number of strengths compared with other primary data sources (e.g., focus groups, case studies) and with secondary data (e.g., administrative and claims data). Surveys give the researcher more control than administrative, claims, or other secondary data sources over who is included (i.e., the sampling frame and sample), what information is collected from respondents (e.g., data domains, elements, or specific questions), and key aspects of the data elements (e.g., standardization and quality). Consequently, survey findings are often easier to generalize to the nation or other large populations, and survey research is often easier to replicate.

All research approaches and data sources have limitations, and survey research is no exception. Although many important research questions (e.g., about the outcomes of treatment or the consequences of being uninsured) require longitudinal data, most surveys are designed to collect cross-sectional data at a single point in time. The Medical Expenditure Panel Survey (MEPS), which follows each panel of households for two years, is a rare example of a survey that tracks cohorts over time; such efforts are few and expensive. There are also limits on the kinds of data that surveys can collect. For health matters, for example, surveys are most often limited to self-reports about individuals’ overall health status, so the resulting data lack the clinical information (e.g., diagnoses, services and procedures, laboratory results, drugs, genetic information) needed for some kinds of studies. Selection bias, which arises from people’s decisions about whether to participate in a survey, can lead to misleading data.16 Self-reported survey data also have weaknesses resulting, for example, from respondents’ limited knowledge or from recall bias. Finally, except in highly specialized studies, surveys generally obtain data from too few people to produce separate results for small populations. As a result, even valid inferences about the population (or major segments of it) drawn from well-designed survey samples may not apply to small populations such as those considered in this report.

Problems in studying small populations do not necessarily stem from a population’s absolute size, but rather from its size relative to the total population (or sampling frame) from which the survey sample is drawn. Sample sizes calculated to produce estimates for the general U.S. population often yield too few members of a small population to support reliable estimates for that group. The problem is compounded when the goal is to study specific health conditions within these small populations. Standard approaches exist for increasing the chance of including members of small populations, such as drawing on a list of group members to target them directly or adding screening questions to increase the group’s representation. However, these strategies are not typically used in national surveys.
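To make the relative-size point concrete, the short Python sketch below computes how many members of a subgroup a sample is expected to contain and the approximate standard error of a prevalence estimate within that subgroup. It is an illustration under stated assumptions, not a method from this report: the helper names are hypothetical, the sample size of 30,000, the 0.5 percent population share, and the 10 percent condition prevalence are made-up values rather than figures from any federal survey, and the formula assumes simple random sampling with no design effect or finite-population correction.

```python
import math


def expected_subgroup_count(sample_size: int, population_share: float) -> float:
    """Expected number of subgroup members in a simple random sample
    (hypothetical helper; assumes no oversampling)."""
    return sample_size * population_share


def proportion_standard_error(p: float, n: float) -> float:
    """Approximate standard error of an estimated proportion p based on n
    respondents, assuming simple random sampling with no design effect."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1 - p) / n)


# Hypothetical inputs chosen for illustration only.
SAMPLE_SIZE = 30_000          # total respondents in a national survey
SUBGROUP_SHARE = 0.005        # subgroup makes up 0.5% of the sampling frame
CONDITION_PREVALENCE = 0.10   # share of the subgroup with the condition studied

subgroup_n = expected_subgroup_count(SAMPLE_SIZE, SUBGROUP_SHARE)
cases = subgroup_n * CONDITION_PREVALENCE
se = proportion_standard_error(CONDITION_PREVALENCE, subgroup_n)

print(f"Expected subgroup respondents: {subgroup_n:.0f}")
print(f"Expected respondents with the condition: {cases:.0f}")
print(f"Approx. 95% margin of error on prevalence: +/- {1.96 * se:.3f}")
```

With these hypothetical inputs, a sample of 30,000 is expected to contain only about 150 subgroup members and about 15 with the condition, and the 95 percent margin of error on a 10 percent prevalence is roughly plus or minus 4.8 percentage points. The same subgroup drawn from a smaller sampling frame, where its share is larger, would yield more respondents from an equally sized sample, which is why relative rather than absolute size drives the problem.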

Standard “solutions” for obtaining adequate numbers of small-population members for analysis include oversampling17 and combining data from multiple years. Oversampling, however, may require the researcher to screen out large numbers of people who do not belong to the target group in order to reach the desired number who do. The smaller the group’s share of the population being screened, the more screening contacts (e.g., calls) are needed, and costs rise further as the number of required subgroups (e.g., by age, gender, or language) increases. Combining data from multiple years becomes problematic if the population is changing from year to year or if survey questions change. A third alternative, sampling from an organization that specializes in serving the population in question, raises questions of representativeness.
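The screening arithmetic behind oversampling, and the precision gained by pooling years, can be sketched as follows. This is a minimal Python illustration under stated assumptions: `expected_screens` and `pooled_standard_error` are hypothetical helper names, the 500-complete target and the incidence values are illustrative, eligible people are assumed to be spread evenly through the screened population, and pooled years are treated as independent, equally sized samples of an unchanging population asked the same question, which is exactly the condition the paragraph above warns may not hold.

```python
import math


def expected_screens(target_completes: int, incidence: float,
                     response_rate: float = 1.0) -> float:
    """Expected screening contacts needed to obtain target_completes eligible
    respondents, when a fraction `incidence` of screened people qualify and a
    fraction `response_rate` of eligible people complete the interview.
    (Hypothetical helper for illustration.)"""
    if not (0 < incidence <= 1 and 0 < response_rate <= 1):
        raise ValueError("incidence and response_rate must be in (0, 1]")
    return target_completes / (incidence * response_rate)


def pooled_standard_error(single_year_se: float, years: int) -> float:
    """Standard error after pooling `years` independent, equally sized annual
    samples, assuming the population and the question do not change."""
    if years < 1:
        raise ValueError("years must be at least 1")
    return single_year_se / math.sqrt(years)


# Hypothetical target: 500 completed interviews from the target group.
for incidence in (0.10, 0.01, 0.005):
    print(f"incidence {incidence:.3f}: ~{expected_screens(500, incidence):,.0f} screens")

# Splitting a 1% group into 4 equally sized cells (e.g., by language), each
# needing 500 completes from one shared screening stream, means each cell
# qualifies only 1 in 400 people, so expected screening effort rises fourfold.
print(f"1% group split into 4 cells: ~{expected_screens(500, 0.01 / 4):,.0f} screens")

# Pooling 3 years, starting from a hypothetical single-year SE of 0.024.
print(f"Pooled SE over 3 years: {pooled_standard_error(0.024, 3):.4f}")
```

Because required screens scale with the inverse of incidence, halving a group’s share of the screened population doubles the expected screening effort, while pooling years improves precision only with the square root of the number of years combined.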

In general, the limitations of national surveys for studying small populations fall into two categories: issues related to coverage of the target population and issues related to data collection.18 Table I.2 shows how these issues apply to our four example populations, and they are discussed in greater detail later in this report.
