Extending the Utility of Federal Data Bases. Design Effects


The simple formula for the variance of a sample mean given in most elementary statistics textbooks is $\sigma_{\bar{x}}^2 = (1-f)\,\sigma^2/n$. For a proportion or a prevalence rate, this formula is equivalent to $\sigma_p^2 = (1-f)\,p(1-p)/n$.

In these formulas, f is the sampling rate, n is the sample size, sigma squared is the population variance of the characteristic being estimated, and p is the proportion that is estimated. These formulas apply to the simplest type of situation, that is, use of simple random sampling with all members of the population sampled at the same rate.
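As a minimal sketch of the two formulas above, the following Python functions compute the simple-random-sampling variance of a mean and of a proportion. The function names and the numeric inputs (p = 0.2, n = 1,000, f = 0.01) are illustrative assumptions, not values taken from any of the surveys discussed here.

```python
def srs_variance_mean(sigma2: float, n: int, f: float = 0.0) -> float:
    """Variance of a sample mean under simple random sampling: (1 - f) * sigma2 / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    return (1.0 - f) * sigma2 / n


def srs_variance_proportion(p: float, n: int, f: float = 0.0) -> float:
    """Variance of a sample proportion: the same formula with sigma2 = p * (1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return srs_variance_mean(p * (1.0 - p), n, f)


# Hypothetical inputs chosen only for illustration.
var_p = srs_variance_proportion(0.2, 1000, 0.01)
se_p = var_p ** 0.5  # standard error is the square root of the variance
print(f"variance = {var_p:.6e}, standard error = {se_p:.4f}")
```

When the sampling rate f is small, as it is in most national surveys, the factor (1 - f) is close to 1 and has little effect on the result.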

In practice, it is rare for population surveys to use simple random sampling. Where interviewing is done face to face (as distinct from telephone or mail data collection), some form of clustering is almost always used to reduce the cost of interviewer travel. The clustering frequently results from several stages of sample selection, e.g., counties, groups of neighboring households, and members of sample households. Even when telephone or mail is used (e.g., as planned for the dominant data collection methods for the census long forms and the ACS), persons within the sample households constitute clusters. The extent to which characteristics of persons within these levels of clustering tend to be correlated influences the size of the sampling variances. In most cases, clustering increases the sampling variances above what would result from simple random sampling with the same sample size. Variances will also be increased if the sampling rates vary among members of the population. This can come about if some groups are oversampled or undersampled. It can also result from the fairly common practice of selecting a household sample, then choosing one member at random for a more detailed interview. Persons in large households then have smaller probabilities of selection than persons in smaller households. To compensate for such features of the sample design, statisticians apply devices that tend to reduce variances, principally stratification and sophisticated weighting methods. However, the features that tend to increase variances usually dominate.
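The two variance-inflating features described above, clustering and unequal sampling rates, are often sized with textbook approximations attributed to Kish. These approximations are not taken from this report; they are offered here only as a rough sketch. The intraclass correlation `rho`, the cluster size, and the weights below are hypothetical values.

```python
from typing import Sequence


def deff_clustering(avg_cluster_size: float, rho: float) -> float:
    """Approximate variance inflation from clustering: 1 + (m - 1) * rho.

    m is the average number of sample persons per cluster and rho is the
    intraclass correlation of the characteristic within clusters.
    """
    return 1.0 + (avg_cluster_size - 1.0) * rho


def deff_unequal_weights(weights: Sequence[float]) -> float:
    """Approximate variance inflation from unequal weights: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    if n == 0 or total <= 0:
        raise ValueError("weights must be non-empty with a positive sum")
    return n * sum(w * w for w in weights) / (total * total)


# Hypothetical case 1: three sample persons per household and a
# characteristic that is almost perfectly shared within households.
print(deff_clustering(3, 0.9))

# Hypothetical case 2: one adult chosen at random per household, so each
# respondent's weight is proportional to the number of adults in the household.
print(deff_unequal_weights([1, 1, 2, 2, 3, 4]))
```

With equal weights the second function returns 1, and with `rho` equal to 0 the first returns 1, which matches the simple random sampling case.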

The design effect is a measure of the extent to which the interactions of all such features affect the sampling variances. It is defined as the factor by which the variance of an estimate is changed through departure from simple random sampling. It is generally denoted by d, so that the variance of a mean becomes $\sigma_{\bar{x}}^2 = d\,(1-f)\,\sigma^2/n$. As indicated above, d is usually, though not necessarily, greater than 1. The value of n/d is frequently referred to as the effective sample size, since replacing n by n/d in the expression for $\sigma_{\bar{x}}^2$ permits one to use the formulas for simple random sampling.
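The relationship between d and the effective sample size can be checked numerically. The sketch below uses hypothetical values (n = 6,000, d = 3, p = 0.1) and illustrative function names. It shows that applying d to the variance gives the same standard error as the simple random sampling formula with n replaced by n/d.

```python
import math


def effective_sample_size(n: float, d: float) -> float:
    """Effective sample size n / d for a design effect d."""
    if d <= 0:
        raise ValueError("d must be positive")
    return n / d


def se_proportion(p: float, n: float, d: float = 1.0, f: float = 0.0) -> float:
    """Standard error of a proportion: sqrt(d * (1 - f) * p * (1 - p) / n)."""
    return math.sqrt(d * (1.0 - f) * p * (1.0 - p) / n)


n, d, p = 6000, 3.0, 0.1  # hypothetical nominal sample size, design effect, proportion
n_eff = effective_sample_size(n, d)

se_design = se_proportion(p, n, d=d)    # design effect applied to the variance
se_srs_equiv = se_proportion(p, n_eff)  # SRS formula with n replaced by n / d

print(f"effective sample size = {n_eff:.0f}")
print(f"design-based SE = {se_design:.5f}, SRS SE at n/d = {se_srs_equiv:.5f}")
```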

Design effects differ greatly from one survey to another, since there are important differences among sample designs. They can also vary among different items measured within a survey, and sometimes among specific population groups. A few examples of such variations are described below:

  • The CPS provides estimates of the distribution of employees by major industry, as well as measures of the total number of employed persons. For most major industry groups, such as retail trade and manufacturing, the effect of clustering is rather modest. However, the design effect for agricultural employment is quite large because farming activities are heavily concentrated in some counties and practically absent in others. Furthermore, within counties with extensive agricultural operations, persons living in clusters of households in open country will tend to work in agricultural operations, whereas those in towns are more likely to work in support services. The design effect for agricultural employment is therefore likely to be much larger than for most other industry groups. (The CPS employs a special weighting procedure, referred to as first-stage weighting, to reduce the design effects for rural and farm statistics, but they are still greater than for most other items.)
  • The NHIS oversamples blacks and Hispanics in order to improve the reliability of statistics on these two minorities. However, to keep the cost of the oversampling within reasonable limits, the oversampling is restricted to the blacks and Hispanics who live in concentrated minority areas. The variation in sampling rates between minorities who live in concentrated areas and those in more integrated areas contributes to the design effects for all statistics on these minorities, and also increases the design effects for data on the total population. Clustering, particularly of sample persons within households, also adds to the design effect for some items but not for others. For example, in most households, either all household members have health insurance or none are insured. Obtaining this information for all household members provides only slightly better reliability than if only one person in each household were in the sample. The design effect for this statistic is thus very high. However, estimates of health insurance coverage for children or for women are affected to a lesser degree, because the number of adult women or children in a given household is generally much smaller than the total household size. Similarly, estimates of persons with specific chronic diseases or with recent hospitalization episodes will have fairly low design effects, since there is hardly any clustering. Design effects among items and subpopulations in the NHIS range from 1 to 6, so an average design effect can only be considered a simplified generalization.
  • Both NHANES III and the current NHANES oversample Mexican-Americans and blacks in order to permit separate analyses of these two race/ethnic groups. The goal of both NHANES III and the current NHANES was to produce reasonably reliable statistics for a set of 52 age-sex-race/ethnicity groups: 14 each for Mexican-Americans and blacks, and 24 for whites and all other persons. This implied starting with a sample designed to provide approximately equal sample sizes in the 52 groups. Superimposed on this was an oversampling of Mexican-Americans in geographic areas containing high concentrations of Mexican-Americans. The combination of these two features of the sample resulted in sampling rates for Mexican-Americans that varied over a range of 7.5 to 1. Asian and Pacific Islanders were sampled at the same rate as white persons, and their sampling rates varied over a range of 20 to 1. The range among all age-sex-race/ethnicity groups was over 120 to 1. This diversity in sampling rates contributed significantly to the design effects for NHANES. As a result, the effective sample size is much lower than the nominal sample size, which was already very small for the APIs.

    In spite of the diversity in sampling rates, the NHANES sample sizes provide data with fairly good precision for Mexican-Americans in each of the 14 age-sex domains designated by NCHS for separate analysis. On the other hand, the sample size for even the total API population and for American Indians or Alaska Natives is quite low, and it is trivial for individual age-sex groups of these subpopulations.

As the illustrations above indicate, most surveys are subject to a wide array of design effects. If there are a few key statistics in a survey whose importance dominates the analyses and uses of the data (as in the case of unemployment for the CPS), then it is useful to concentrate on these statistics in assessing the reliability of the survey estimates. Otherwise, it is sensible to use an average design effect, about midway between the upper and lower levels that are likely to occur. We will follow this latter practice in assessing the ability of the various data sets to meet the data needs for the different subpopulations of the Hispanic and API populations and for American Indians or Alaska Natives.
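To illustrate how an average design effect might be used in such an assessment, the sketch below takes the midpoint of a range of design effects and converts a target standard error for a proportion into a required nominal sample size. The range of 1 to 6 is the NHIS range quoted earlier. The proportion, the target standard error, and the function names are hypothetical, and the finite population correction is ignored (f is assumed to be near 0).

```python
import math


def midpoint_design_effect(low: float, high: float) -> float:
    """Average design effect taken as the midpoint of the likely range."""
    return (low + high) / 2.0


def required_nominal_n(p: float, target_se: float, d: float) -> int:
    """Nominal sample size needed so that sqrt(d * p * (1 - p) / n) <= target_se.

    Assumes the sampling rate f is negligible. The small tolerance guards
    against floating-point noise pushing an exact answer up by one.
    """
    if target_se <= 0:
        raise ValueError("target_se must be positive")
    exact = d * p * (1.0 - p) / (target_se ** 2)
    return math.ceil(exact - 1e-9)


d_avg = midpoint_design_effect(1.0, 6.0)  # range quoted for NHIS items
n_needed = required_nominal_n(p=0.1, target_se=0.01, d=d_avg)
print(f"average design effect = {d_avg}, nominal n needed = {n_needed}")
```

Dividing a subpopulation's nominal sample size by the same average design effect gives its effective sample size, which can then be compared with the size needed under simple random sampling.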