Extending the Utility of Federal Data Bases. Examples of Precision Requirements in Federally Sponsored Surveys


Common standards for precision do not exist for U.S. Government surveys. Each survey is viewed as a unique data system designed to meet specific needs. In most cases, the sample size is determined by agreement on the key analytic requirements of the survey weighed against available funding or a realistic budget, rather than to satisfy abstract notions of analytic goals. In a few cases, Government agencies have articulated the principal analytic and policy uses expected of the data and the sampling errors that would permit these analyses. Some examples follow:

  • Sample used in the long form of the U.S. Census. The 1990 long-form sample contained about 16 percent of the U.S. population, and the same proportion was used in the 2000 census. Two factors led to the use of sampling and, in particular, to the choice of 16 percent. First, the existence of considerable nonsampling errors, arising from coverage problems and from reporting errors by respondents and interviewers, indicated that, beyond a certain point, sampling would have only a minor effect on the overall quality of the data. Second, a crucial goal of the census is to provide data for small areas, such as tracts, small towns, and villages. A 16 percent sample was viewed as mostly satisfying that goal. However, income distributions are frequently used by states and the Federal Government for fund allocation among municipalities, and a 16 percent sample appeared to be inadequate for this purpose in very small municipalities. The sampling rate in these small places was therefore increased, to a maximum of 50 percent.
  • Current Population Survey (CPS). Although the CPS is a general-purpose social and economic survey with emphasis on labor force characteristics, statistics on unemployment are viewed as particularly important, since movements in the monthly unemployment rate can have a major effect on economic policy. Two aspects of unemployment data determined the monthly sample size for the CPS. First, it was considered necessary to detect a month-to-month change in the national unemployment rate of 0.2 percentage points as being statistically significant at the 95 percent confidence level. Second, it was desirable to have reasonably precise data (level of precision not specifically defined) on unemployment rates for blacks and Hispanics, including separate data for minority youths.
  • National Health and Nutrition Examination Survey (NHANES). Since health characteristics are strongly associated with age and sex, the major analyses of the data involve age-sex breakdowns. In addition, as is the case in many other national surveys, separate data are considered necessary for the dominant minority groups: blacks and Mexican-Americans in the case of NHANES. A set of 52 age-sex-race/ethnicity classes was defined, and a major goal of the sample design was to provide about the same level of precision for each of these 52 classes. The precision specifications for NHANES III were:
    • A prevalence statistic of 10 percent should have a relative standard error (RSE) less than 30 percent; and
    • Differences of at least 10 percent in health or nutrition statistics between any two classes should be detected with a type I error of no more than 0.05 and a type II error of no more than 0.10.

    A sample size of 560 was determined to satisfy these requirements in most classes. Many of the Mexican-American age-sex classes had higher design effects than other classes and needed somewhat larger samples.

  • National Health Interview Survey (NHIS). Precision Specifications in Considering the Feasibility of Producing Data for States. The National Center for Health Statistics (NCHS) contracted with Westat to examine the feasibility of producing state data from the NHIS, and to propose methods of enhancing the feasibility for states that currently do not satisfy the precision standards. NCHS specified several alternative precision requirements for this project to give the agency information on what could be accomplished at various budget levels. The requirements were:

    • To achieve specified levels of precision (described below) for four crucial statistics:
      • Percent of the total population without health insurance;
      • Percent of persons under 19 years of age without health insurance;
      • Percent of low-income children under 19 years of age without health insurance; and
      • Percent of persons 18 years of age and over who are smokers.
    • To achieve the same levels of precision for five "generic" prevalence rates (0.01, 0.05, 0.10, 0.15, and 0.20) for statistics that are likely to have design effects of 1.0, 1.5, 3.0, and 6.0.

    Three levels of precision are specified for the statistics described above: coefficients of variation (CV) of 30 percent, 20 percent, and 10 percent.
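
The generic NHIS specifications above can be turned into approximate sample sizes. The sketch below is illustrative only and is not the method used by NCHS or Westat. It assumes the variance of an estimated proportion p is the simple binomial variance inflated by the design effect, so that CV = sqrt(deff × (1 − p) / (n × p)), and it ignores finite population corrections and nonresponse. The function name required_sample_size is hypothetical.

```python
import math

def required_sample_size(p, cv, deff=1.0):
    """Approximate number of respondents needed so that an estimated
    proportion p has a coefficient of variation no larger than cv.

    Assumption (not from the report): Var(p_hat) = deff * p * (1 - p) / n.
    """
    n = deff * (1.0 - p) / (p * cv ** 2)
    # Subtract a tiny tolerance so floating-point noise does not push an
    # exact integer answer up to the next integer.
    return math.ceil(n - 1e-9)

# Grid taken from the specifications quoted in the text.
prevalence_rates = [0.01, 0.05, 0.10, 0.15, 0.20]
design_effects = [1.0, 1.5, 3.0, 6.0]
cv_targets = [0.30, 0.20, 0.10]

for cv in cv_targets:
    print(f"Target CV = {cv:.0%}")
    print("  prevalence " + "".join(f"deff={d:<8}" for d in design_effects))
    for p in prevalence_rates:
        sizes = [required_sample_size(p, cv, d) for d in design_effects]
        print(f"  {p:<10.2f} " + "".join(f"{n:<13,d}" for n in sizes))
```

Under these assumptions, a prevalence of 0.10 with a design effect of 1.0 and a 30 percent CV needs about 100 respondents, while a prevalence of 0.01 with a design effect of 6.0 and a 10 percent CV needs about 59,400. The spread between those two corners shows why the specifications were varied across budget levels.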

The generic prevalence rates are a useful model for an examination of the feasibility of producing minority subgroup statistics, and we will focus on these specifications later in this report. However, as stated earlier, no single set of specifications is likely to meet all conceivable analytic needs.
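
The CPS and NHANES III specifications listed above can also be read as rough numeric targets. The following sketch is a set of textbook approximations under stated assumptions, not a reconstruction of how either survey was actually designed. In particular, it treats "differences of at least 10 percent" as 10 percentage points, uses a two-sided normal test, and leaves the baseline prevalences and design effect as inputs because the text does not give them. All function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (Python 3.8+)

def max_se_for_detectable_change(change, confidence=0.95):
    """Largest standard error of an estimated change at which a change of
    the given size is still significant in a two-sided test."""
    z = STD_NORMAL.inv_cdf(0.5 + confidence / 2.0)
    return change / z

def rse_of_proportion(p, n, deff=1.0):
    """Relative standard error of an estimated proportion, assuming
    Var(p_hat) = deff * p * (1 - p) / n (an assumption, not from the text)."""
    return sqrt(deff * (1.0 - p) / (n * p))

def n_per_class_for_difference(p1, p2, alpha=0.05, beta=0.10, deff=1.0):
    """Approximate sample size per class needed to detect p1 versus p2 with
    two-sided type I error alpha and type II error beta."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_beta = STD_NORMAL.inv_cdf(1.0 - beta)
    variance_sum = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return ceil(deff * (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)

# CPS: a 0.2-point month-to-month change, significant at 95 percent.
print(round(max_se_for_detectable_change(0.2), 3), "percentage points")

# NHANES III: RSE of a 10 percent prevalence with 560 persons in a class.
for deff in (1.0, 1.5, 3.0):
    print(deff, round(rse_of_proportion(0.10, 560, deff), 3))

# NHANES III: class size to separate 10 percent from 20 percent prevalence
# (illustrative baseline values chosen here, not given in the text).
print(n_per_class_for_difference(0.10, 0.20))
```

With these approximations, the CPS requirement corresponds to a standard error of roughly 0.10 percentage points on the month-to-month change, and a class of 560 persons gives an RSE of about 13 percent for a 10 percent prevalence when the design effect is 1.0. The required class size for detecting differences depends heavily on the assumed baseline prevalences and design effect, which is consistent with the text's remark that classes with higher design effects needed larger samples.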