Extending the Utility of Federal Data Bases. Standards for Precision


Sections 2.1 and 2.2 discussed ways of focusing on the precision levels required for various analytic uses. Since most U.S. Government surveys cover a broad array of data items, no single standard of precision is likely to satisfy all potential uses of the data, and some compromises are necessary. It is particularly difficult to create standards of precision for a group of unrelated surveys whose specific analyses have yet to be developed. Under these circumstances, it seems sensible to use the standards for "generic" prevalence rates that were established for the study of the feasibility of producing state data from the NHIS. However, we reiterate our caveat that these standards may not satisfy all requirements; if new and critically important needs for statistical information on the subpopulations arise, the standards should be reviewed.

The precision levels examined for NHIS state-level generic estimates were described in Section 2.2. We repeat them below:

  • Five "generic" prevalence rates are considered: 0.01, 0.05, 0.10, 0.15, and 0.20.
  • Three levels of precision are specified for each of these rates: coefficients of variation (CV) of 30 percent, 20 percent, and 10 percent.

To make these prevalence rates less abstract, we show some examples reported in recent U.S. Government-sponsored surveys:

Statistic                                                                         Rate
Percent of U.S. population age 15 and over with earnings under $10,000 in 1996    24.9% [1]
U.S. poverty rate in 1998                                                         12.7% [2]
Percent of U.S. population without health insurance coverage during all of 1998   16.3% [3]
Percent of persons of Hispanic origin without health insurance in 1998            35.3% [3]
Cocaine use by adults employed full time                                           0.7% [4]
Cocaine use by adults employed part time                                           0.9% [4]
Cocaine use by unemployed adults                                                   2.4% [4]

[1] Source: March 1997 CPS; U.S. Census Bureau Report P60, No. 206.
[2] Source: March 1999 CPS; U.S. Census Bureau Report P60, No. 207.
[3] Source: March 1999 CPS; U.S. Census Bureau Report P60, No. 208.
[4] Source: National Household Survey on Drug Abuse.

It may be useful to convert the generic coefficients of variation to standard errors and confidence intervals for a clearer view of the effects of sampling error on the statistics. The standard error is the CV multiplied by the prevalence rate; the 68 percent interval is the rate plus or minus one standard error, and the 95 percent interval is the rate plus or minus two standard errors. The results are shown below.

Prevalence rate   Standard error   68% confidence interval   95% confidence interval
30% CV
  .01             .003             .007-.013                 .004-.016
  .05             .015             .035-.065                 .020-.080
  .10             .030             .070-.130                 .040-.160
  .15             .045             .105-.195                 .060-.240
  .20             .060             .140-.260                 .080-.320
20% CV
  .01             .002             .008-.012                 .006-.014
  .05             .010             .040-.060                 .030-.070
  .10             .020             .080-.120                 .060-.140
  .15             .030             .120-.180                 .090-.210
  .20             .040             .160-.240                 .120-.280
10% CV
  .01             .001             .009-.011                 .008-.012
  .05             .005             .045-.055                 .040-.060
  .10             .010             .090-.110                 .080-.120
  .15             .015             .135-.165                 .120-.180
  .20             .020             .180-.220                 .160-.240
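
The conversion behind the table can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original study: the function name is hypothetical, and it assumes the intervals are simply the rate plus or minus one and two standard errors, which is what the tabulated values imply.

```python
def interval_from_cv(rate, cv, z):
    """Return (standard_error, lower, upper) for a prevalence rate.

    The standard error is the CV times the rate; the interval is the
    rate plus or minus z standard errors (an assumed normal approximation).
    """
    se = cv * rate
    return se, rate - z * se, rate + z * se


RATES = (0.01, 0.05, 0.10, 0.15, 0.20)  # generic prevalence rates
CVS = (0.30, 0.20, 0.10)                # target coefficients of variation

for cv in CVS:
    print(f"{cv:.0%} CV")
    for rate in RATES:
        se, lo1, hi1 = interval_from_cv(rate, cv, z=1)  # one standard error
        _, lo2, hi2 = interval_from_cv(rate, cv, z=2)   # two standard errors
        print(f"  {rate:.2f}  {se:.3f}  {lo1:.3f}-{hi1:.3f}  {lo2:.3f}-{hi2:.3f}")
```

Because the standard error scales with the rate, a fixed CV gives intervals whose width is proportional to the rate itself, so the same CV implies a much wider absolute interval at a rate of .20 than at .01.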

Section 2.2 also notes the types of variables that are likely to have design effects beyond the average values used in this report. We have left them out of our discussion because we use a single average design effect (or, in a few cases, two) for each survey covered in this report.
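
The role of an average design effect can be illustrated with the usual textbook relationship between CV, prevalence rate, and sample size. The sketch below is not taken from this report. It assumes simple random sampling within the subpopulation, ignores the finite population correction, and treats the design effect as a single multiplier on the sample size. The function name and the design effect of 1.5 are hypothetical.

```python
def required_sample_size(rate, cv, design_effect=1.0):
    """Approximate sample size needed to estimate `rate` with the target CV.

    Under simple random sampling, CV = sqrt((1 - rate) / (rate * n)),
    so n = (1 - rate) / (rate * cv**2). A complex design is assumed to
    inflate that figure by the design effect.
    """
    if not 0 < rate < 1 or cv <= 0 or design_effect <= 0:
        raise ValueError("rate must be in (0, 1); cv and design_effect must be positive")
    return design_effect * (1 - rate) / (rate * cv ** 2)


# Illustrative values only: the five generic rates at a 30 percent CV,
# with a made-up average design effect of 1.5.
for rate in (0.01, 0.05, 0.10, 0.15, 0.20):
    n = required_sample_size(rate, cv=0.30, design_effect=1.5)
    print(f"rate {rate:.2f}: about {n:,.0f} persons")
```

Under these assumptions, a variable whose design effect is well above the survey average would need a proportionally larger sample to reach the same CV.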

Our examination assumes the prevalence rates are applied to the total of all persons in the subpopulation. In practice, analyses are frequently desired for subsets, e.g., adults or children, each sex separately, families rather than persons, persons below the poverty level, etc. Examining all possible uses of data would lead to such a wide variety of possibilities that no clear-cut decision could be made, and it seems sensible to restrict the alternatives. Basically, if subset analysis is considered of crucial importance for a survey, the sample size implied by each precision level can be thought of as applying to the subset, and the implications for the total sample for the survey can be calculated. For example, if a 30 percent CV requires a sample of 200 persons, then a sample of 400 persons is necessary if the same CV is desired for males and females separately.
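
The subset arithmetic in the example above can be sketched as follows. This is an illustration under stated assumptions rather than a method from the report: it assumes the sample falls into subsets in proportion to their population shares, so the smallest subset determines the total. The function name and the share values are hypothetical.

```python
import math


def total_sample_for_subsets(n_per_subset, subset_shares):
    """Total sample needed so that every subset reaches n_per_subset.

    subset_shares holds each subset's share of the subpopulation
    (for example 0.5 for males and 0.5 for females). The sample is
    assumed to split in proportion to those shares.
    """
    if not subset_shares or any(not 0 < s <= 1 for s in subset_shares):
        raise ValueError("each share must be in (0, 1]")
    return math.ceil(n_per_subset / min(subset_shares))


# The example from the text: 200 persons per sex, sexes of equal size.
print(total_sample_for_subsets(200, [0.5, 0.5]))  # 400
```

The doubling in the text holds only when the two subsets are of roughly equal size. Under the same assumption, a subset that makes up a quarter of the subpopulation would require four times the single-group sample.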

Some examples of subsets and their associated prevalence rates are shown below.

Subset and statistic                                                                   Rate
Males 15 years and over: percent with income under $10,000                             17.2% [1]
Hispanic males 15 years and over: percent with income under $10,000                    21.4% [1]
Males, 15-24 years of age: percent with income under $10,000                           40.8% [1]
Hispanic low-income persons (under 125% of poverty): percent without health insurance  44.0% [2]
Hispanic children under 18 years of age: percent without health insurance              30.0% [2]
Persons 65 years and over: percent without health insurance                             1.1% [2]
Hispanic males, 65 years and over: percent with income below 50 percent of poverty      4.7% [3]

[1] Source: U.S. Census Bureau Report P60, No. 206.
[2] Source: U.S. Census Bureau Report P60, No. 208.
[3] Source: U.S. Census Bureau Report P60, No. 207.