Implementing nonresponse adjustment procedures requires the specification of appropriate weighting classes or cells. Survey responses generally are correlated with certain characteristics of the sample units, and it would be desirable to form classes based on these characteristics. Often, little is known about the nonrespondents. Relevant information about each sampled unit sometimes can be obtained through data retrieval efforts to collect limited data about the nonrespondents or by interviewer observation (if applicable). The availability of this information would enhance the effectiveness of the nonresponse adjustment.
Data used to form classes for nonresponse adjustments must be available for both respondents and nonrespondents. In state low-income surveys, the administrative files used to select the sample are good sources of information for forming weighting classes. In a recent survey, we contacted a number of states to inquire about the availability and the quality of their administrative data, including the following variables:
- Marital status
- Number of children
- Earned income
- Welfare income
- Housing subsidy
- Length of time on welfare
- Metropolitan/nonmetropolitan status
- County code
- Zip code
Thirteen states completed our questionnaire. All states reported having data on age, gender, race/ethnicity, number of children, and length of time on welfare. Most states also have data on earned income, welfare income, employment, county code, zip code, and marital status. About 50 to 60 percent of states reported having data on education, housing subsidies, metropolitan/nonmetropolitan status, and urbanicity.
The 13 responding states also assessed the quality of the administrative data that they maintain. We observed that the quality of data on demographic variables was quite high, with less than 1 percent missing values. Among the socioeconomic variables, only two have high-quality data: "welfare income" and "length of time on welfare," where length of time on welfare is measured for the most recent episode. Data on employment and earned income, if applicable, were obtained by matching with quarterly wage records. The only geographic variables of high quality are county and zip codes.
We encourage state welfare program administrators to look for other potential data sources that could be used as auxiliary variables for nonresponse and/or noncoverage adjustments, such as wages and employment data sources. The above variables are usually good candidates for use in nonresponse adjustment. However, missing data on items used for nonresponse adjustment can present problems for postsurvey adjustments. If a substantial amount of data is missing for an item on the sampling frame, that variable is probably not appropriate for nonresponse adjustment.
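The screening of frame variables by their share of missing values can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the record layout, the variable names, and the 5 percent cutoff (`max_missing`) are all assumptions made for the example, and the text does not define what counts as a "substantial" amount of missing data.

```python
def missing_rates(frame_records, variables):
    """Return the share of frame records with a missing (None) value per variable."""
    n = len(frame_records)
    return {
        var: sum(1 for rec in frame_records if rec.get(var) is None) / n
        for var in variables
    }


def usable_variables(frame_records, variables, max_missing=0.05):
    """Keep variables whose missing rate does not exceed max_missing.

    max_missing is a hypothetical cutoff chosen for illustration only.
    """
    rates = missing_rates(frame_records, variables)
    return [var for var in variables if rates[var] <= max_missing]


# Hypothetical extract from an administrative sampling frame.
frame = [
    {"county_code": "001", "welfare_income": 420, "education": None},
    {"county_code": "003", "welfare_income": 310, "education": "HS"},
    {"county_code": "001", "welfare_income": 505, "education": None},
    {"county_code": "007", "welfare_income": 280, "education": None},
]
candidates = ["county_code", "welfare_income", "education"]

rates = missing_rates(frame, candidates)
keep = usable_variables(frame, candidates, max_missing=0.05)
print(rates)
print(keep)
```

In this made-up extract, `education` is missing for three of four records and would be dropped from the candidate list, while the two complete variables are retained.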
The variables used to form weighting classes should be effective in distinguishing between subgroups with different response rates. They are most useful when survey responses are roughly similar for respondents and nonrespondents within a class. If this implicit assumption holds, the estimates are effectively unbiased. In establishing the nonresponse adjustment classes, the following should be kept in mind:
- The variables used in nonresponse adjustment should be available for both respondents and nonrespondents;
- Response rates should be different among the nonresponse adjustment classes;
- Survey responses are expected to be different among the classes; and
- The adjustment classes should respect a balance between bias and variance (refer to the section entitled "Balancing Bias and Variance When Adjusting for Nonresponse" for a discussion of this tradeoff when creating adjusted sampling weights).
As mentioned earlier, knowledge of the likely behavior of persons in various demographic and socioeconomic classes can be used to construct weighting classes. A preliminary analysis of response rates in these classes can refine the classification further.
Returning to the FIS example provided earlier, assume that nonresponse evaluation research has identified the gender and race (white/nonwhite) of the head of family as the best predictors of nonresponse. Then, the sample is divided into four classes, as shown in Table 5-4. Note that mean income and the nonresponse rate are both quite variable across the four classes. This suggests that the adjustments have the potential to reduce the nonresponse bias.
[Table 5-4. Columns: Adjustment Class; Head of Family's Gender and Race; Mean Income ($). NOTE: FIS = Family Income Survey.]
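A weighting-class adjustment of the kind described above can be sketched as follows, assuming the standard form in which the adjustment factor for a class is the sum of base weights over all sampled units in the class divided by the sum of base weights over its respondents. The base weights and response counts below are hypothetical and are not the Table 5-4 values; only the four gender-by-race classes come from the text.

```python
from collections import defaultdict


def class_adjustment_factors(sample):
    """Compute one nonresponse adjustment factor per weighting class.

    factor = (sum of base weights, all sampled units in the class)
             / (sum of base weights, respondents in the class)
    """
    base_total = defaultdict(float)
    resp_total = defaultdict(float)
    for unit in sample:
        base_total[unit["cls"]] += unit["weight"]
        if unit["responded"]:
            resp_total[unit["cls"]] += unit["weight"]
    factors = {}
    for cls, total in base_total.items():
        if resp_total[cls] == 0:
            # A class with no respondents cannot be adjusted on its own.
            raise ValueError(f"class {cls!r} has no respondents; collapse it")
        factors[cls] = total / resp_total[cls]
    return factors


def adjusted_weights(sample, factors):
    """Respondents carry the inflated weight; nonrespondents get zero."""
    return [
        u["weight"] * factors[u["cls"]] if u["responded"] else 0.0
        for u in sample
    ]


def make_units(cls, weight, n_resp, n_nonresp):
    """Helper that builds hypothetical sample units for one class."""
    return (
        [{"cls": cls, "weight": weight, "responded": True}] * n_resp
        + [{"cls": cls, "weight": weight, "responded": False}] * n_nonresp
    )


# Hypothetical sample: four classes defined by head of family's gender and race.
sample = (
    make_units(("male", "white"), 10.0, 3, 1)
    + make_units(("male", "nonwhite"), 20.0, 2, 2)
    + make_units(("female", "white"), 10.0, 4, 0)
    + make_units(("female", "nonwhite"), 10.0, 4, 1)
)

factors = class_adjustment_factors(sample)
weights = adjusted_weights(sample, factors)
print(factors)
print(sum(weights), sum(u["weight"] for u in sample))
```

Within each class the adjusted respondent weights sum to the class's original base-weight total, so the overall weighted total is preserved while respondents in low-response classes are weighted up more heavily.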
More sophisticated methods also are available. We discuss two commonly used procedures (referred to as modeling response propensity) for defining weighting classes using data on auxiliary variables. The first method involves classification or segmentation based on a categorical search algorithm. The second method is based on logistic regression modeling. Software is available to perform the computations required for both procedures.
The first class of methods divides a population into two or more distinct groups based on categories of the "best" predictor of a dependent variable. Here the dependent variable is categorical with two categories: respondents and nonrespondents. The predictor variable most significantly associated with response status (i.e., the one with the smallest p-value) is used to split the sample into groups. The algorithm then splits each of these groups into smaller subgroups based on other predictor variables. This splitting process continues until no more statistically significant predictors can be found, or until some other stopping rule is met (e.g., there are too few observations for further splitting). The result is a tree-like structure that suggests which predictor variables may be important. It is a highly efficient statistical technique for segmentation, or tree growing, with many different versions currently available, as described in Breiman et al. (1993).
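The splitting process described above can be sketched in a simplified form. This is an illustration under stated assumptions, not the algorithm of any particular software package: all predictors are assumed to be binary (coded 0/1) so that every test is a 2x2 Pearson chi-square with 1 degree of freedom, and the significance level `alpha`, the minimum cell size `min_size`, and the variable names `urban` and `female` are hypothetical.

```python
import math


def chi_square_2x2(records, predictor, outcome="responded"):
    """Pearson chi-square statistic and p-value (1 df) for a binary predictor."""
    counts = [[0, 0], [0, 0]]  # counts[predictor value][outcome value]
    for rec in records:
        counts[int(rec[predictor])][int(rec[outcome])] += 1
    n = len(records)
    row = [sum(counts[0]), sum(counts[1])]
    col = [counts[0][0] + counts[1][0], counts[0][1] + counts[1][1]]
    stat = 0.0
    for a in (0, 1):
        for b in (0, 1):
            expected = row[a] * col[b] / n
            if expected == 0:
                return 0.0, 1.0  # degenerate table: no evidence of association
            stat += (counts[a][b] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 df.
    return stat, math.erfc(math.sqrt(stat / 2.0))


def grow(records, predictors, alpha=0.05, min_size=20):
    """Split on the most significant predictor, then recurse on each subgroup."""
    leaf = {
        "n": len(records),
        "response_rate": sum(r["responded"] for r in records) / len(records),
    }
    best = None
    for p in predictors:
        _, p_value = chi_square_2x2(records, p)
        if p_value < alpha and (best is None or p_value < best[1]):
            best = (p, p_value)
    if best is None:
        return leaf  # stopping rule: no statistically significant predictor
    var = best[0]
    groups = {v: [r for r in records if r[var] == v] for v in (0, 1)}
    if min(len(g) for g in groups.values()) < min_size:
        return leaf  # stopping rule: too few observations to split further
    rest = [p for p in predictors if p != var]
    return {
        "split": var,
        "children": {v: grow(g, rest, alpha, min_size) for v, g in groups.items()},
    }


# Hypothetical sample: response depends on 'urban' but not on 'female'.
sample = []
for i in range(200):
    urban, female, k = i % 2, (i // 2) % 2, (i // 4) % 5
    responded = int(k < 2) if urban else int(k != 0)
    sample.append({"urban": urban, "female": female, "responded": responded})

tree = grow(sample, ["urban", "female"])
print(tree)
```

The leaves of the resulting tree are the candidate weighting classes. In this constructed sample the tree splits once on `urban` and then stops, because `female` is unrelated to response within either subgroup.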
The second approach models the response status of the sampled units using predictor variables that are known for both respondents and nonrespondents from the sampling frame. Most commonly, the prediction is based on a logistic or probit regression model that uses auxiliary variables, such as demographic, socioeconomic, and geographic variables, to predict the probability of response. For more information on logistic response propensity modeling, refer to Little and Rubin (1987), Brick and Kalton (1996), and Iannacchione et al. (1991).
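A logistic response propensity model can be sketched as follows. This is a minimal illustration rather than a production implementation: the model is fit by plain batch gradient ascent instead of the routines found in statistical packages, the two binary predictors (`urban`, `female`) and the data are hypothetical, and the propensity cut points used to form classes are assumptions made for the example.

```python
import bisect
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, n_iter=1000):
    """Fit intercept and coefficients by gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)  # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict(beta, xi)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def predict(beta, xi):
    """Estimated response propensity for one unit."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))


def propensity_class(p, cuts):
    """Assign a weighting class from a propensity using ascending cut points."""
    return bisect.bisect_right(cuts, p)


# Hypothetical frame data: columns are [urban, female]; y is response status.
X, y = [], []
for i in range(200):
    urban, female, k = i % 2, (i // 2) % 2, (i // 4) % 5
    X.append([urban, female])
    y.append(int(k < 2) if urban else int(k != 0))

beta = fit_logistic(X, y)
p_rural = predict(beta, [0, 0])
p_urban = predict(beta, [1, 1])

cuts = [0.5, 0.7]  # hypothetical cut points defining three propensity classes
classes = [propensity_class(predict(beta, xi), cuts) for xi in X]
print(round(p_rural, 3), round(p_urban, 3), sorted(set(classes)))
```

Units with similar estimated propensities fall into the same class, and a weighting-class adjustment can then be applied within each class. In practice the cut points are often chosen from quantiles of the estimated propensities rather than fixed in advance.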
"01.pdf" (pdf, 472.92Kb)
"02.pdf" (pdf, 395.41Kb)
"03.pdf" (pdf, 379.04Kb)
"04.pdf" (pdf, 381.73Kb)
"05.pdf" (pdf, 393.7Kb)
"06.pdf" (pdf, 415.3Kb)
"07.pdf" (pdf, 375.49Kb)
"08.pdf" (pdf, 475.21Kb)
"09.pdf" (pdf, 425.17Kb)
"10.pdf" (pdf, 424.33Kb)
"11.pdf" (pdf, 392.39Kb)
"12.pdf" (pdf, 386.39Kb)
"13.pdf" (pdf, 449.86Kb)