When deciding whether to omit observations from the sample because of incomplete or clearly incorrect administrative or survey data, it is important to distinguish between (1) background information and (2) outcomes data.
Missing or Invalid Background Information. Background information may be missing or invalid for certain cases because of omitted or incorrect values in administrative records, or because of survey nonresponse, item nonresponse, or invalid responses to baseline surveys.
Certain background information is essential for an observation to be included in the analysis sample. Welfare reform evaluations generally estimate impacts by comparing experimental and control cases within recipient and applicant samples, so every observation must have an original experimental/control status variable and an applicant/recipient variable. In addition, constructing certain outcome variables (such as employment and earnings) usually requires a valid SSN for matching with UI wage records.
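The inclusion rule described above can be sketched as a small screening function. This is only an illustration: the field names (research_group, case_type, ssn) and the nine-digit SSN format check are assumptions made for the example, not the layout of any actual evaluation file, and a format check alone does not establish that an SSN is valid for matching.

```python
import re

# Hypothetical codes and field names, chosen only for this illustration.
VALID_GROUPS = {"experimental", "control"}
VALID_CASE_TYPES = {"applicant", "recipient"}
SSN_PATTERN = re.compile(r"\d{9}")  # simplified format check (assumption)


def classify_case(record):
    """Return (in_sample, can_match_ui) for one case record (a dict).

    in_sample:    both essential status variables are present and valid.
    can_match_ui: the case is in the sample and its SSN passes the format
                  check, so UI-based outcomes could be constructed for it.
    """
    in_sample = (
        record.get("research_group") in VALID_GROUPS
        and record.get("case_type") in VALID_CASE_TYPES
    )
    ssn = (record.get("ssn") or "").replace("-", "")
    can_match_ui = in_sample and SSN_PATTERN.fullmatch(ssn) is not None
    return in_sample, can_match_ui


cases = [
    {"research_group": "control", "case_type": "applicant", "ssn": "123-45-6789"},
    {"research_group": "experimental", "case_type": "recipient", "ssn": None},
    {"research_group": None, "case_type": "recipient", "ssn": "123456789"},
]
for case in cases:
    print(classify_case(case))
```

Keeping the two flags separate reflects the distinction in the text: a case without a usable SSN can remain in the analysis sample for outcomes that do not depend on UI wage records.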
In other instances, observations may be included in impact analyses even if background information is incomplete. For instance, information on the demographic characteristics of a household is valuable for constructing descriptive statistics and for increasing the precision of impact estimates, but it is not essential for estimating impacts. Regression-adjusted means can still be calculated without introducing bias, either by imputing the missing background information or by setting it equal to a default value and adding indicator variables for the missing values. Excluding a large number of cases with missing background information risks making the analysis sample less representative of the entire research sample, since particular types of recipient or applicant cases might be less likely to provide valid background information.
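The default-value-plus-indicator approach can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the data are simulated, the variable names (treated, age, earnings) and the true impact of 50 are invented for the example, the default value is set to 0, and experimental status is assigned at random so that it is independent of both age and its missingness. The ols helper is a plain normal-equations solver written for the example, not a substitute for a statistical package.

```python
import random


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


rng = random.Random(0)
rows, outcomes = [], []
for _ in range(4000):
    treated = 1 if rng.random() < 0.5 else 0      # random assignment
    age = rng.gauss(30, 8)
    earnings = 100 + 50 * treated + 3 * age + rng.gauss(0, 20)
    # Younger cases are more likely to have missing age (simulated pattern).
    age_missing = 1 if rng.random() < (0.4 if age < 30 else 0.1) else 0
    age_filled = 0.0 if age_missing else age      # default value for missing
    rows.append([1.0, float(treated), age_filled, float(age_missing)])
    outcomes.append(earnings)

intercept, impact, age_coef, missing_coef = ols(rows, outcomes)
print(f"regression-adjusted impact estimate: {impact:.1f}")
```

All 4,000 simulated cases stay in the regression, including those with missing age, and the coefficient on treated should land close to the simulated impact of 50.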
Missing or Invalid Outcomes Data. Outcome information may be missing or invalid for certain cases because of omitted or incorrect values in administrative records, or because of survey nonresponse, item nonresponse, or invalid responses to client surveys.
The presence of outcome information is usually essential for obtaining impact estimates. Imputing missing values of nonessential background variables is unlikely to bias impact estimates. Imputing values of the outcome variables themselves is more questionable, however, since it assumes that the relationship between background information and outcomes is the same for cases with missing information as for cases with nonmissing information.
If observations with missing outcomes data are excluded from impact analyses, impact estimates may be biased if the incidence of missing outcomes data differs for experimental and control cases, or if observations with missing outcomes data differ from other observations in some systematic way that is correlated with the outcome variables. In these situations, a sample selection procedure may be possible, provided that at least some background information is available for the cases with missing outcome information and that a background variable can be identified that is correlated with the absence of outcomes data but not with the outcomes themselves. Even if such a variable can be identified (which is not certain), the benefit of correcting for possible sample selection bias must still be weighed against the loss of precision in impact estimates that such corrections entail.
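The first source of bias named above, a difference in the incidence of missing outcomes between experimental and control cases, can be checked directly. The following is a minimal sketch using a standard two-proportion z-test; the function name and the case counts are hypothetical. It does not address the second source of bias (systematic differences between cases with and without outcomes data), and it is not a sample selection correction.

```python
from math import sqrt
from statistics import NormalDist


def missing_rate_test(n_missing_exp, n_exp, n_missing_ctl, n_ctl):
    """Compare missing-outcome rates for experimental and control cases.

    Returns (difference in rates, z statistic, two-sided p-value) from a
    pooled two-proportion z-test.
    """
    p_exp = n_missing_exp / n_exp
    p_ctl = n_missing_ctl / n_ctl
    pooled = (n_missing_exp + n_missing_ctl) / (n_exp + n_ctl)
    se = sqrt(pooled * (1 - pooled) * (1 / n_exp + 1 / n_ctl))
    z = (p_exp - p_ctl) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_exp - p_ctl, z, p_value


# Hypothetical counts: 150 of 1,000 experimental cases and 100 of 1,000
# control cases lack outcomes data.
diff, z, p_value = missing_rate_test(150, 1000, 100, 1000)
print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value would suggest that missingness is related to experimental status, which is the situation in which simply excluding cases is most likely to bias impact estimates.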