The Feasibility of Using Electronic Health Data for Research on Small Populations

Data Issues Regarding Rural Health


Rural areas are covered by federal surveys conducted by the U.S. Census Bureau, the Appalachian Regional Commission, the Environmental Protection Agency, the Department of Agriculture, the Centers for Disease Control and Prevention, the Agency for Healthcare Research and Quality, the Health Resources and Services Administration, and the Substance Abuse and Mental Health Services Administration.194 Larger surveys such as the Current Population Survey, the American Community Survey, and the National Health Interview Survey include census tract or county identifiers that allow rural populations to be identified. However, smaller surveys such as the National Health and Nutrition Examination Survey and the National Survey of Family Growth support estimates only at the national level (see Table I.3).

But the relatively small size of the rural population, combined with its diversity, creates a distinctive problem. Rural areas differ from one another in many ways, including their racial/ethnic and socioeconomic composition and their proximity to urban areas. It may matter whether a survey's rural respondents live in a Texas border county or in Litchfield County, Connecticut. Yet adding a geographic identifier such as county or zip code to a data set raises the concern that this information could be combined with other collected data to identify the individuals from whom the data were collected. The data security practices that agencies have developed to prevent such identification make rural research much more difficult and expensive, and some federal agencies restrict researcher access to the data needed to study certain populations.

For example, because of statutory requirements to protect personally identifiable information, the public use files from some federal surveys omit items such as zip code or date of birth that could be used to identify individual data subjects. Yet this excluded information may be needed to study certain populations. Researchers can gain access to it only by going to the designated data use center of the agency that collected the data (e.g., the Agency for Healthcare Research and Quality or the National Center for Health Statistics [NCHS]), paying a user fee, and analyzing the data on the agency’s own computers. Restrictions designed to protect individual privacy also limit what researchers can take with them when they leave the agency’s offices. Together, these requirements constitute a significant logistical and financial barrier to research on small subpopulations using data from large federal surveys. To address this issue, NCHS has tried approaches such as offering remote access options that let researchers analyze restricted data.195

A second problem for rural research is that minority populations, particularly those facing language barriers, are underrepresented in many surveys of rural areas. For example, meatpacking communities in Iowa and some other states have seen an influx of Southeast Asian refugees, who are filling jobs once held by Mexican and Central American workers who left after federal immigration law enforcement increased. These towns are struggling to provide language services for the new refugees and often have difficulty even identifying which languages are being spoken.196 The same challenges affect data collection, leaving gaps in information on the health and health care needs of rural racial and ethnic minorities.

The lack of consensus and consistency on how to define “rural” makes it difficult to identify rural populations within federal data, even where geographic identifiers are available. There does appear to be agreement that no single definition suffices for all purposes and that the definition used should align with the goals and needs of the analysis at hand.197, 198 More than two dozen rural definitions are currently used by federal agencies, each identifying a different population as rural (see Table I.5 for a list of the most commonly used taxonomies).199 While geographic isolation and population size or density are common elements of these definitions, they vary in whether administrative concepts (such as municipalities), land-use concepts (such as population size), or economic concepts (such as commuting areas) are used to define the boundaries of rural areas. County-level economic definitions (such as nonmetropolitan areas) are the most commonly used in rural research because county-level data are widely available.200

