The Feasibility of Using Electronic Health Data for Research on Small Populations. Data collection problems


Even if members of small populations are included in the sample, challenges remain in collecting information through a survey questionnaire. These challenges include:

Unit Nonresponse

Certain populations may be less likely to participate in a survey even if invited. For instance, functional limitations may prevent individuals with autism from participating, and proxy respondents are typically used. Even greater challenges occur in getting individuals to respond repeatedly to a survey, as is needed to study health issues over time, such as through the transition into adulthood.21 In addition, most surveys are conducted in English and perhaps Spanish, making it difficult for some non-English speakers in Asian subpopulations to participate.22 Some federal surveys, such as the National Health and Nutrition Examination Survey, the National Health Interview Survey, and the Medical Expenditure Panel Survey, address this issue by making translation options available for Asian subpopulations or by allowing family members to answer for respondents.

Item Nonresponse

Some members of small populations may be unwilling to answer questions on sensitive topics (e.g., citizenship or immigration status, risky behaviors, cultural norms and mores, where one works and lives) because of privacy and other concerns. There have been efforts to address this challenge; for example, to collect sensitive information such as drug use, the National Survey of Family Growth has adopted audio computer-assisted self-interviewing technology, which lets respondents listen to prerecorded questions through a computer and enter their answers. In some cases, sensitive information may be needed to identify the subpopulation in the survey data or to answer the pressing health and health care questions about it. When survey data are used to study health issues, there may also be health conditions or behaviors that individuals are less willing or able to disclose in a survey. The survey method may make a difference: some people are more willing to make sensitive disclosures online or in written surveys than in a telephone survey, particularly if interviewer hesitancy or other non-verbal communication creates discomfort.23


Even when individuals are willing to answer each question on a survey, it is often difficult to design questions that collect the desired information. For instance, the variety of definitions used to understand each of the four small populations discussed in this report makes it difficult to design questions that will identify them.24 Rare characteristics or conditions may not be included as response options, or may be folded into a larger category (such as “Asian” or “conditions on the autism spectrum”), making more granular analysis of sub-categories impossible. There is also a lack of alignment in how key questions are asked across national surveys or over time, which affects comparability and the ability to combine these data sources. In addition, there are cognitive limitations in people’s ability to understand, remember, and self-report much of the information needed to study health issues, such as diagnoses25 and other detailed clinical information, as well as what services were used and when. There are a number of federal efforts to address these limitations in national survey data. As discussed later, Section 4302 of the Affordable Care Act (ACA) required the adoption of data collection standards on race, ethnicity, sex, primary language, and disability status in national population health surveys sponsored by HHS. Under the auspices of the Department of Health and Human Services Data Council, the data standards are being implemented in the major surveys.

To illustrate the need for research on small populations and the challenges that such populations pose for research, the following section summarizes the health care needs of these populations and discusses the limitations of the data sources commonly used by researchers. The summary is illustrative only; a comprehensive examination of the health and health care needs of these populations is beyond the scope of this report. It should also be noted that there is great heterogeneity (for example, by age, gender, or place of residence) within the small populations we have selected, as there will be in any population. Small numbers are a problem for many research efforts that explore variations within small populations, as well as for attempts to make comparisons with other, often larger populations.

In Part II of this report, we consider the potential usefulness of electronic health information collected by health care providers as a source of data about these four groupings. The intent of this part of the report is to describe the challenges of doing research on small subpopulations and to consider the extent to which past limitations might be overcome by the growing use of electronic technologies within the health care system, even if the organizations that have successfully implemented such technologies are not typical.
