Primary survey data have a number of strengths compared with other primary data sources (e.g., focus groups, case studies) and secondary data (e.g., administrative and claims data). Survey data allow the researcher more control over who is included (i.e., sample frame and sample), the kinds of information that are collected from them (e.g., data domains, elements, or specific questions), and key aspects of data elements (e.g., standardization and quality) than administrative, claims, or other secondary data sources do. Consequently, it is often easier to generalize to the nation or other large populations and to replicate survey research.
All research approaches and data sources have limitations, and survey research is no exception. Although many important research questions (e.g., about outcomes of treatment or the consequences of being uninsured) require longitudinal data, most surveys are designed to collect cross-sectional data at a single point in time. The Medical Expenditure Panel Survey (MEPS) is a two-year panel and a rare example of a study that attempts to follow cohorts (of households) over time. Such efforts are few and expensive. There are also limitations regarding the kinds of data that can be collected via survey research. For health matters, for example, surveys most often are limited to collecting self-reports about individuals’ overall health status, so the resulting data do not include the kinds of clinical information (e.g., about diagnoses, services and procedures, laboratory results, drugs, genetic information) needed for some kinds of studies. Selection bias, which results from survey respondents’ decisions about whether to participate, can lead to misleading data.16 Self-reported survey data have weaknesses resulting, for example, from limitations in knowledge or from recall bias. Finally, with the exception of highly specialized studies, surveys generally obtain data from too few people to break out separate results for small populations. As a result, even valid inferences drawn about the population (or major segments thereof) based on well-designed survey samples may not apply to small populations such as those we are considering in this report.
General problems with small populations do not necessarily stem from the absolute size of the population, but rather from its size relative to the total population (or sampling frame) from which the survey sample is drawn. Samples sized to collect information on the general population of Americans often include too few members of small populations to describe them accurately. The problem is compounded when the goal is to study specific health conditions within these small populations. There are standard approaches to increasing the chances of including people from small populations, such as using a list of group members to target them specifically or using screening questions to increase representation of the groups. However, these strategies are not typically used in national surveys.
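The arithmetic behind this point can be sketched as follows. This is a minimal illustration that assumes simple random sampling and uses hypothetical figures (a national sample of 10,000 and a group that makes up 0.5 percent of the sampling frame); neither number comes from this report, and real national surveys use more complex designs.

```python
import math


def expected_subgroup_size(sample_size: int, population_share: float) -> float:
    """Expected number of sample members who belong to the small population."""
    return sample_size * population_share


def worst_case_margin_of_error(n: float, z: float = 1.96) -> float:
    """Approximate 95% half-width for an estimated proportion near 0.5,
    assuming simple random sampling (an assumption, not a survey's actual design)."""
    return z * math.sqrt(0.25 / n)


# Hypothetical inputs, chosen only for illustration.
total_sample = 10_000
share_of_frame = 0.005

n_sub = expected_subgroup_size(total_sample, share_of_frame)
moe_all = worst_case_margin_of_error(total_sample)
moe_sub = worst_case_margin_of_error(n_sub)

print(f"Expected subgroup respondents: {n_sub:.0f}")
print(f"Margin of error, full sample:  +/-{moe_all:.1%}")
print(f"Margin of error, subgroup:     +/-{moe_sub:.1%}")
```

Under these assumptions the full sample supports estimates within about one percentage point, while the roughly 50 subgroup members support estimates only within about 14 points, and any further breakdown by health condition shrinks the usable count again.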
Standard “solutions” for getting adequate numbers for analysis from small populations include oversampling17 and combining data from multiple years. But oversampling subgroups may require the researcher to screen out large numbers of people who do not fit the category in order to obtain the sought-after number of those who do. The smaller a group’s presence in the population being screened, the more calls are needed to obtain the desired number of respondents, and the cost rises further as the number of needed subgroups (e.g., by age, gender, or language) increases. Combining data from multiple years becomes problematic if year-to-year changes are taking place within that population or if survey questions change. A third alternative, sampling from an organization that specializes in service to the population in question, raises questions of representativeness.
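The relationship between a group’s share of the screened population and the screening effort can be sketched as below. The function name, the target of 400 completed interviews, the 50 percent response rate, and the prevalence values are all hypothetical choices for illustration, and the calculation assumes that each contact is eligible and responds independently of the others.

```python
def expected_screening_contacts(target_completes: int,
                                prevalence: float,
                                response_rate: float = 1.0) -> float:
    """Expected number of people who must be contacted and screened to
    obtain `target_completes` respondents from the target group."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return target_completes / (prevalence * response_rate)


# Hypothetical scenario: 400 completed interviews, half of eligible people respond.
for prevalence in (0.10, 0.01, 0.001):
    contacts = expected_screening_contacts(400, prevalence, response_rate=0.5)
    print(f"group share {prevalence:>6.1%}: about {contacts:,.0f} screening contacts")
```

Because the effort is inversely proportional to the group’s share, each tenfold drop in prevalence multiplies the expected number of screening contacts by ten, and requiring the same target within each of several subgroups multiplies it again.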
In general, the limitations of national surveys for studying small populations can be summarized as issues related to coverage of the target population and issues related to data collection.18 These issues as they relate to our four example populations are presented in Table I.2 and are discussed in greater detail later in this report.
Surveys typically use a list of landline telephone numbers and/or addresses as the frame from which the sample will be drawn. Certain population segments (e.g., migrant workers) may be underrepresented if their members disproportionately lack a landline phone or a stable or documented address. (The increased use of cellular phones has presented general challenges and issues for survey research.)19 Federal household surveys typically select their samples by first selecting a sample of geographic areas, then households within those areas, and finally individuals within those households. Target populations that are geographically segregated, such as remote rural communities or neighborhoods where an Asian subpopulation may be concentrated,20 may be underrepresented in the sample if their geographic area is not selected.
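The risk to geographically concentrated populations at the first sampling stage can be illustrated with a simplified calculation. The sketch assumes that areas are drawn by simple random sampling without replacement, which is a simplification (federal surveys generally stratify and select areas with unequal probabilities), and the counts of areas used here are hypothetical.

```python
from math import comb


def prob_no_area_selected(total_areas: int,
                          areas_with_group: int,
                          areas_sampled: int) -> float:
    """Probability that none of the areas where the group lives is drawn,
    under simple random sampling of areas without replacement."""
    if not 0 <= areas_with_group <= total_areas:
        raise ValueError("areas_with_group must be between 0 and total_areas")
    if not 0 <= areas_sampled <= total_areas:
        raise ValueError("areas_sampled must be between 0 and total_areas")
    # math.comb returns 0 when the sample is too large to avoid every group area.
    return (comb(total_areas - areas_with_group, areas_sampled)
            / comb(total_areas, areas_sampled))


# Hypothetical frame: 3,000 areas, 400 of them sampled at the first stage.
for areas_with_group in (1, 5, 50):
    p_missed = prob_no_area_selected(3000, areas_with_group, 400)
    print(f"group concentrated in {areas_with_group:>2} areas: "
          f"P(no such area sampled) = {p_missed:.3f}")
```

The fewer areas a group is concentrated in, the more likely it is that the first stage of selection excludes the group before any household is contacted.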
Data collection problems
Even if members of small populations are included in the sample, challenges remain in collecting information through a survey questionnaire. These challenges include:
Certain populations may be less likely to participate in a survey even if invited. For instance, functional limitations may prevent individuals with autism from participating, so proxy respondents are typically used. Even greater challenges arise in getting individuals to respond to a survey repeatedly, as is needed to study health issues over time, such as through the transition into adulthood.21 In addition, most surveys are conducted in English and perhaps Spanish, making it difficult for some non-English speakers in Asian subpopulations to participate.22 Some federal surveys, such as the National Health and Nutrition Examination Survey, the National Health Interview Survey, and the Medical Expenditure Panel Survey, address this issue by making translation options available for Asian subpopulations or by allowing family members to answer for respondents.
Some members of small populations may be unwilling to answer certain questions around sensitive topics (e.g., citizenship or immigration status, risky behaviors, cultural norms and mores, where one works and lives) due to privacy and other concerns. There have been efforts to address this challenge; for example, the National Survey of Family Growth has adopted audio computer-assisted self-interviewing technology, which lets respondents listen to a set of prerecorded questions through a computer and enter their own answers, to collect sensitive information such as drug use. In some cases, sensitive information may be needed to identify the subpopulation in the survey data or to answer the pressing health and health care questions about it. When survey data are used to study health issues, there may also be health conditions or behaviors that individuals are less willing or able to disclose in a survey. The survey method used may make a difference, with some people more willing to make sensitive disclosures online or in written surveys than in a telephone survey, particularly if interviewer hesitancy or other non-verbal communication creates discomfort.23
Even when individuals are willing to answer each question on a survey, it is often difficult to design questions that collect the desired information. For instance, the variety of definitions used to understand each of the four small populations discussed in this report makes it difficult to design questions that will identify them.24 Rare characteristics or conditions may not be included as response options, or may be included in a larger category (such as “Asian” or “conditions on the autism spectrum”), making more granular analysis of sub-categories impossible. There is also a lack of alignment in how key questions are asked in different national surveys or over time, which affects comparability and the ability to combine these data sources. In addition, there are cognitive limitations in people’s ability to understand, remember, and self-report much of the information needed to study health issues, such as diagnoses25 and other detailed clinical information, as well as what services were used and when. There are a number of federal efforts to address these limitations in national survey data. As discussed later, Section 4302 of the Affordable Care Act (ACA) required the adoption of data collection standards on race, ethnicity, sex, primary language, and disability status in national population health surveys sponsored by HHS. Under the auspices of the Department of Health and Human Services Data Council, the data standards are being implemented in the major surveys.
To illustrate the need for research on small populations and the challenges that such populations pose for research, the following section summarizes the health care needs of these populations and discusses the limitations of the sources of data commonly used by researchers. A comprehensive examination of the health and health care needs of these populations is beyond the scope of this report. It should also be noted that there is great heterogeneity (for example, by age, gender, or place of residence) within the small populations we have selected, as there will be in any population. Small numbers are a problem for many research efforts that would explore variations within small populations, as well as for attempts to make comparisons with other, often larger populations.
In Part II of this report, we consider the potential usefulness of electronic health information collected by health care providers as a source of data about these four groupings. The intent of that part of the report is to describe the challenges of doing research on small subpopulations and to consider the extent to which past limitations might be overcome by the growing use of electronic technologies within the health care system, even if the organizations that have successfully implemented such technologies are not typical.