Sample surveys often are designed to draw inferences about finite populations by measuring a subset of the population. The classical inferential capabilities of the survey rest on probability sampling from a frame covering all members of the population. A probability sample assigns known, nonzero chances of selection to every member of the population. Typically, large amounts of data are collected from each sample member. From these variables, hundreds or thousands of different statistics might be computed, each of which is of interest to the researcher only if it accurately describes the corresponding population attribute. Some of these statistics describe the population from which the sample was drawn; others stem from using the data to test causal hypotheses about processes measured by the survey variables (e.g., how the length of time receiving welfare payments affects salary levels in subsequent employment).

One example statistic is the sample mean as an estimator of the population mean. This is best described using some statistical notation, so that our meaning is exact. Let one question in the survey be called *Y*, and let the answer to that question for the *i*th member of the population be designated \(Y_i\). Then we can describe the population mean by

$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i \tag{1}$$

where *N* is the number of units in the target population. The estimator of the population mean is often

$$\bar{y}_r = \frac{\sum_{i=1}^{r} w_i\, y_i}{\sum_{i=1}^{r} w_i} \tag{2}$$

where *r* is the number of respondents in the sample and \(w_i\) is the reciprocal of the probability of selection of the *i*th respondent. (For readers accustomed to equal probability samples, as in a simple random sample, \(w_i\) is the same for all cases in the sample, and the computation above is equivalent to \(\frac{1}{r}\sum_{i=1}^{r} y_i\).)

One problem with the sample mean as calculated here is that it does not contain any information from the nonrespondents in the sample. However, all the desirable inferential properties of probability sample statistics apply to statistics computed on the *entire* sample. Let's assume that in addition to the *r* respondents to the survey, there are *m* (for missing) nonrespondents. Then the total sample size is *n = r + m*. The computation above misses the information on the *m* nonrespondent cases.
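Equation (2) can be sketched in Python as below. The function name and the three respondents' values and selection probabilities are hypothetical; the sketch only illustrates that the *r* respondents are the sole input to the calculation, and that equal weights reduce it to the simple mean.

```python
def weighted_respondent_mean(y, w):
    """Weighted mean of respondent values y using base weights w, where each
    w_i is the reciprocal of that respondent's probability of selection."""
    if not y or len(y) != len(w):
        raise ValueError("y and w must be non-empty and of equal length")
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


# Hypothetical respondent data: reported values and selection probabilities.
y_resp = [180.0, 210.0, 240.0]
pi_resp = [0.01, 0.02, 0.02]
w_resp = [1.0 / p for p in pi_resp]  # w_i = 1 / pi_i

print(weighted_respondent_mean(y_resp, w_resp))               # unequal probabilities
print(weighted_respondent_mean(y_resp, [1.0] * len(y_resp)))  # equal weights: plain mean
```

Nothing about the *m* nonrespondents appears in either call, which is exactly the gap the following discussion examines.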

How does this affect our estimation of the population mean? Let's first make a simplifying assumption: everyone in the target population is, permanently and forevermore, either a respondent or a nonrespondent. The entire target population can thus be written as *N = R + M*, where the capital letters denote numbers in the total population.

Assume that at the time of sample selection we do not know which stratum each person occupies. Then in drawing our sample of size *n*, we will likely select some respondents and some nonrespondents. Together they always total *n*, but the numbers of respondents and nonrespondents in any one sample will vary. We know that, in expectation, the fraction of *sample* cases that are respondents equals the fraction of *population* cases that lie in the respondent stratum, but there will be sampling variability about that number. That is, *E(r) = fR*, where *f* is the sampling fraction used to draw the sample from the population. Similarly, *E(m) = fM*.
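A small simulation can illustrate *E(r) = fR*. This is a sketch that assumes simple random sampling without replacement from a hypothetical population of 6,000 respondents and 4,000 nonrespondents; the function name and all sizes are illustrative.

```python
import random


def average_respondent_count(R, M, n, reps=2000, seed=1):
    """Draw `reps` simple random samples of size n (without replacement) from a
    population of R fixed respondents and M fixed nonrespondents, and return the
    average number of respondents r per sample."""
    rng = random.Random(seed)
    population = [1] * R + [0] * M  # 1 = respondent stratum, 0 = nonrespondent stratum
    total = 0
    for _ in range(reps):
        total += sum(rng.sample(population, n))  # r for this sample
    return total / reps


R, M, n = 6000, 4000, 500
f = n / (R + M)  # sampling fraction
print(average_respondent_count(R, M, n), f * R)  # simulated average of r vs. fR
```

Individual samples scatter around *fR* = 300; only their average settles near it, which is the sampling variability the paragraph describes.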

For each possible sample we could draw, given the sample design, we could express the relationship between the full sample mean, \(\bar{y}_n\), and the respondent mean, \(\bar{y}_r\), in the following way, where \(\bar{y}_m\) is the mean of the nonrespondents:

$$\bar{y}_n = \frac{r}{n}\,\bar{y}_r + \frac{m}{n}\,\bar{y}_m \tag{3}$$

which, with a little manipulation, becomes

$$\bar{y}_r - \bar{y}_n = \frac{m}{n}\left(\bar{y}_r - \bar{y}_m\right) \tag{4}$$

RESPONDENT MEAN - TOTAL SAMPLE MEAN = (NONRESPONSE RATE) * (DIFFERENCE BETWEEN RESPONDENT AND NONRESPONDENT MEANS)

This shows that the deviation of the respondent mean from the full sample mean is a function of the nonresponse rate (*m/n*) and the difference between the respondent and nonrespondent means.
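The identity in equation (4) can be checked numerically for a single realized sample. The sketch below uses hypothetical values and equal weights; in a real survey the nonrespondent values would be unobserved, which is why the right-hand side cannot usually be computed.

```python
def mean(values):
    return sum(values) / len(values)


def respondent_deviation(y_resp, y_nonresp):
    """Return both sides of equation (4) for one realized sample:
    (ybar_r - ybar_n) and (m / n) * (ybar_r - ybar_m)."""
    r, m = len(y_resp), len(y_nonresp)
    n = r + m
    ybar_r = mean(y_resp)
    ybar_m = mean(y_nonresp)
    ybar_n = mean(y_resp + y_nonresp)  # full-sample mean (equal weights assumed)
    return ybar_r - ybar_n, (m / n) * (ybar_r - ybar_m)


# Hypothetical sample: four respondents and two nonrespondents.
lhs, rhs = respondent_deviation([150.0, 200.0, 250.0, 200.0], [300.0, 500.0])
print(lhs, rhs)
```

Both sides equal about -66.67 here: a nonresponse rate of 2/6 times a respondent-nonrespondent difference of -200.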

Under this simple expression, what is the expected value of the respondent mean over all samples that could be drawn given the same sample design? The answer to this question determines the nature of the *bias* in the respondent mean, where bias is the difference between the expected value of a statistic (over all possible samples given a specific design) and the same statistic computed on the target population. For equal probability samples of fixed size, the bias of the respondent mean is approximately

$$\operatorname{Bias}(\bar{y}_r) \approx \frac{M}{N}\left(\bar{Y}_R - \bar{Y}_M\right) \tag{5}$$

BIAS(RESPONDENT MEAN) = (NONRESPONSE RATE IN POPULATION) * (DIFFERENCE IN RESPONDENT AND NONRESPONDENT POPULATION MEANS)

where the capital letters denote the population equivalents to the sample values. This shows that the larger the stratum of nonrespondents, the higher the bias of the respondent mean, other things being equal. Similarly, the more distinctive the nonrespondents are from the respondents, the larger the bias of the respondent mean.
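Equation (5) can be compared against a Monte Carlo estimate of the bias. The sketch below assumes simple random sampling of fixed size and a hypothetical population echoing the fourth case discussed below (600 respondents averaging 201 and 400 nonrespondents averaging 501); the function names and the spread of values within each group are illustrative assumptions.

```python
import random
import statistics


def formula_bias(resp_values, nonresp_values):
    """Equation (5): (M / N) * (Ybar_R - Ybar_M)."""
    R, M = len(resp_values), len(nonresp_values)
    return (M / (R + M)) * (statistics.fmean(resp_values) - statistics.fmean(nonresp_values))


def simulated_bias(resp_values, nonresp_values, n, reps=4000, seed=7):
    """Average respondent mean over repeated simple random samples of size n,
    minus the full population mean."""
    rng = random.Random(seed)
    # Each unit carries its y value and its fixed response status.
    population = [(y, True) for y in resp_values] + [(y, False) for y in nonresp_values]
    pop_mean = statistics.fmean(y for y, _ in population)
    resp_means = []
    for _ in range(reps):
        sample = rng.sample(population, n)
        observed = [y for y, responds in sample if responds]
        if observed:  # a sample with no respondents yields no respondent mean
            resp_means.append(statistics.fmean(observed))
    return statistics.fmean(resp_means) - pop_mean


resp = [191.0, 211.0] * 300     # hypothetical respondents, mean 201
nonresp = [401.0, 601.0] * 200  # hypothetical nonrespondents, mean 501
print(formula_bias(resp, nonresp), simulated_bias(resp, nonresp, n=100))
```

Both numbers come out near -120: the respondent mean centers on the respondent-stratum mean of 201, while the population mean is 321.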

These two quantities, the nonresponse rate and the differences between respondents and nonrespondents on the variables of interest, are key issues for surveys of the welfare population.

Figures 1-1a through 1-1d show four alternative frequency distributions for respondents and nonrespondents on a hypothetical variable, *y*, measured on all cases in some target population. The area under the curves is proportional to the size of the two groups, respondents and nonrespondents. These four figures correspond to the four rows in Table 1-1 that show response rates, means of respondents and nonrespondents, bias, and percentage bias for each of the four cases.

TABLE 1-1. Response rates, means, and nonresponse bias for the four cases in Figures 1-1a through 1-1d

Response Rate | Difference in Means | Response Rate (%) | Respondent Mean | Nonrespondent Mean | Total Sample Mean | Bias | Bias (%) | Required Sample Size of Nonrespondents |
---|---|---|---|---|---|---|---|---|
High | Small | 95 | $201 | $228 | $202 | -$1.35 | -0.7 | 20,408 |
High | Large | 95 | $201 | $501 | $216 | -$15.00 | -6.9 | 210 |
Low | Small | 60 | $201 | $228 | $212 | -$10.80 | -5.1 | 304 |
Low | Large | 60 | $201 | $501 | $321 | -$120.00 | -37.4 | 7 |
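The total-mean, bias, and bias-percentage columns of Table 1-1 follow from equations (3) and (4). The sketch below assumes, as the discussion of the fourth case indicates, that the percentage bias is expressed relative to the total sample mean; the function name is hypothetical.

```python
def nonresponse_summary(response_rate, resp_mean, nonresp_mean):
    """Return (total sample mean, bias of respondent mean, bias as % of total mean)."""
    nonresponse_rate = 1.0 - response_rate
    total_mean = response_rate * resp_mean + nonresponse_rate * nonresp_mean
    bias = nonresponse_rate * (resp_mean - nonresp_mean)
    bias_pct = 100.0 * bias / total_mean
    return total_mean, bias, bias_pct


cases = [
    ("High/Small", 0.95, 201.0, 228.0),
    ("High/Large", 0.95, 201.0, 501.0),
    ("Low/Small", 0.60, 201.0, 228.0),
    ("Low/Large", 0.60, 201.0, 501.0),
]
for label, rr, yr, ym in cases:
    total, bias, pct = nonresponse_summary(rr, yr, ym)
    print(f"{label:<11} total={total:7.2f} bias={bias:8.2f} bias%={pct:6.1f}")
```

The percentages round to -0.7, -6.9, -5.1, and -37.4, and the totals round to the $202, $216, $212, and $321 shown in the table.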

The first case reflects a high response rate survey and one in which the nonrespondents have a distribution of *y* values quite similar to that of the respondents. This is the lowest bias case; both factors in the nonresponse bias are small. For example, assume the response rate is 95 percent, the respondent mean for reported expenditures on clothing for a quarter is $201.00, and the mean for nonrespondents is $228.00. Then the nonresponse error is .05($201.00 - $228.00) = -$1.35.

FIGURE 1-1a. High response rate, nonrespondents similar to respondents.

SOURCE: Groves and Couper (1998).

NOTE: y = outcome variable of interest.

The second case, like the first, is a low nonresponse survey, but now the nonrespondents tend to have much higher *y* values than the respondents. This means that the difference term, \((\bar{y}_r - \bar{y}_m)\), is a large negative number, meaning the respondent mean underestimates the full population mean. However, the size of the bias is small because of the low nonresponse rate. Using the same example as above, with a nonrespondent mean now of $501.00, the bias is .05($201.00 - $501.00) = -$15.00.

FIGURE 1-1b. High response rate, nonrespondents different from respondents.

SOURCE: Groves and Couper (1998).

NOTE: y = outcome variable of interest.

The third case shows a very high nonresponse rate (the area under the respondent distribution is about 50 percent greater than that under the nonrespondent distribution, implying a nonresponse rate of 40 percent). However, as in the first graph, the *y* values of the nonrespondents are similar to those of the respondents. Hence, the respondent mean again has low bias due to nonresponse. With the same example as earlier, the bias is .40($201.00 - $228.00) = -$10.80.

FIGURE 1-1c. Low response rate, nonrespondents similar to respondents.

SOURCE: Groves and Couper (1998).

NOTE: y = outcome variable of interest.

The fourth case is the most perverse, exhibiting a large group of nonrespondents who generally have much higher values on *y* than the respondents. In this case, *m/n* is large (judging by the area under the nonrespondent curve) and \((\bar{y}_r - \bar{y}_m)\) is large in absolute terms. This is the case of large nonresponse bias. Using the previous example, the bias is .40($201.00 - $501.00) = -$120.00, a relative bias of 37 percent compared to the total sample mean!

FIGURE 1-1d. Low response rate, nonrespondents different from respondents.

SOURCE: Groves and Couper (1998).

NOTE: y = outcome variable of interest.

These four very different situations also have implications for studies of nonrespondents. Let's imagine we wish to mount a special study of nonrespondents in order to test whether the respondent mean is biased. The last column of Table 1-1 shows the sample size of nonrespondents required to obtain the same stability for a bias ratio estimate (assuming simple random sampling and the desire to estimate a binomial mean statistic with a population value of .50). The table shows that such a nonresponse study can be quite small (*n* = 7) and still be useful for detecting nonresponse bias in a low-response-rate survey with large differences between respondents and nonrespondents (the fourth row of the table). However, the sample size required to obtain the same precision for such a nonresponse bias test in the high-response-rate case is very large (*n* = 20,408, in the first row). Unfortunately, before a study is fielded, little information is usually available on the size of the likely nonresponse bias.
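The paragraph does not spell out the sample-size rule. One reading that is consistent with the 20,408 and 7 figures is sketched below: it assumes the nonrespondent study estimates a proportion of .50 and needs a standard error equal to the absolute bias implied by the table's percentage bias. Both the rule and the function name are assumptions, not the authors' stated method.

```python
def required_nonrespondent_n(relative_bias, p=0.5):
    """Hypothetical rule: the sample size n at which the standard error of a
    binomial proportion, sqrt(p * (1 - p) / n), equals the absolute bias
    p * |relative_bias|."""
    if relative_bias == 0:
        raise ValueError("a zero bias cannot be detected at any finite n")
    target_se = p * abs(relative_bias)
    return p * (1.0 - p) / target_se ** 2


# Relative biases from the first and fourth rows of Table 1-1, as proportions.
for rel_bias in (-0.007, -0.374):
    print(rel_bias, round(required_nonrespondent_n(rel_bias)))
```

With *p* = .50 this rule reduces to *n* = 1/(relative bias)², which makes plain why small biases demand very large nonrespondent samples.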
