Approaches to Evaluating Welfare Reform: Lessons from Five State Demonstrations

B. ALTERNATIVE DESIGNS FOR IMPACT EVALUATIONS


With only a few exceptions, the terms and conditions for welfare reform demonstrations in the 1990s have required evaluations based on an experimental design. (The most notable exception, the evaluation of Wisconsin's WNW demonstration, is discussed later in this chapter and elsewhere in this report.) To assess the advantages and limitations of an experimental design, it is helpful to identify the key features of this design and of several nonexperimental designs:

  • Experimental Design. Within selected research sites in which the reform program and the pre-reform program are operating side by side, target cases are randomly assigned to experimental status (the reform program) or control status (the pre-reform program). For the welfare reform demonstration evaluations, the target cases are ongoing welfare recipients and new applicants for assistance. Random assignment ensures that the experimental and control cases are alike, on average, in all respects except for the welfare program rules that they face. Thus, differences in average outcomes for the two groups can be attributed to the reform program. This type of design is said to have a high level of internal validity.
  • Self-Selected Comparison Group Design. This design typically is used to compare two versions of a program operating side by side, when target cases are permitted to choose which program to apply for or participate in (for example, two types of training programs). It may also be used to examine the effects of a program (versus nonparticipation) when it is deemed morally or practically impossible to limit who participates. A multivariate statistical model can be used, in principle, to control for differences in characteristics between cases that select the program of interest and cases that select the alternative program, thus isolating impacts due to differences between the two programs. In practice, such a model has two important limitations: (1) the impact estimates may be sensitive to the exact specification of a statistical model (typically, little guidance is available regarding certain aspects of model specification); and (2) the statistical model may do a poor job of controlling for differences in difficult-to-measure characteristics of cases or individuals, such as self-esteem and ambition, that affect the program they choose, and this limitation (known as selection bias) may bias estimates of impacts.
  • Quasi-Experimental Design. A quasi-experimental design entails the selection of one or more groups of program applicants or participants to receive the reform program (the demonstration groups) and other groups to receive the pre-reform program (the comparison groups). The demonstration and comparison groups are separated in space or time but are matched on the basis of aggregate characteristics that are believed to influence the outcomes of interest. Despite the best attempts to match demonstration and comparison groups, important differences often exist, and these may be a source of bias in impact estimates. Statistical models have the potential to control for such differences, if the differences can be measured at the individual level, but they have the same limitations in a quasi-experimental design as in a self-selected comparison group design. (The advantage of the quasi-experimental approach is that individual-level differences should be smaller.) However, the key disadvantage of the quasi-experimental design is that any site-level (or time-period-specific) differences that affect outcomes may be confounded with the effect of the program. (Examples could include differences in economic climate or program administration.) The problem in this instance is not that the differences are difficult to observe, but that they only vary across sites (or time periods), and the number of sites (periods) is generally too small to allow all of these factors to be controlled for. Additional information on quasi-experimental designs is provided in Section B.3.
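
The contrast between random assignment and self-selection can be illustrated with a small simulation. The sketch below is not drawn from any of the demonstrations discussed in this report; the outcome scale, the true impact of 100, the unmeasured "ambition" trait, and the function name simulate are all hypothetical assumptions chosen for illustration. It compares the simple difference in mean outcomes under the two designs when an unmeasured trait affects both program choice and the outcome.

```python
import random
import statistics

# Hypothetical true program effect on an outcome such as quarterly earnings.
TRUE_IMPACT = 100.0


def simulate(n_cases, self_selected, seed=0):
    """Return the difference in mean outcomes (program minus comparison group).

    All parameters are illustrative assumptions, not estimates from any
    actual welfare reform demonstration.
    """
    rng = random.Random(seed)
    program, comparison = [], []
    for _ in range(n_cases):
        # Unmeasured trait (e.g., ambition) that raises outcomes on its own.
        ambition = rng.gauss(0.0, 1.0)
        if self_selected:
            # More ambitious cases are more likely to choose the program.
            in_program = ambition + rng.gauss(0.0, 1.0) > 0.0
        else:
            # Random assignment: a coin flip unrelated to ambition.
            in_program = rng.random() < 0.5
        # Outcome in the absence of the program.
        outcome = 1000.0 + 200.0 * ambition + rng.gauss(0.0, 100.0)
        if in_program:
            program.append(outcome + TRUE_IMPACT)
        else:
            comparison.append(outcome)
    return statistics.mean(program) - statistics.mean(comparison)


experimental_estimate = simulate(20000, self_selected=False)
self_selected_estimate = simulate(20000, self_selected=True)

print(f"true impact:              {TRUE_IMPACT:.1f}")
print(f"experimental estimate:    {experimental_estimate:.1f}")
print(f"self-selected estimate:   {self_selected_estimate:.1f}")
```

Under these assumptions, the experimental estimate should fall close to the true impact, while the self-selected comparison substantially overstates it, because the cases that choose the program differ on the unmeasured trait. A statistical model could remove this bias only if the trait were measured and correctly specified, which is the limitation described in the list above.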

The first major application of an experimental design in social welfare policy research was to evaluate the negative income tax experiments of the late 1960s and early 1970s (Burtless and Hausman 1978; and Keeley et al. 1978). Since that time, there have been many social welfare policy evaluations based on experimental designs (Greenberg and Shroder 1991). The number and diversity of these evaluations have been increasing in recent years. Using data on several of these evaluations, methodological studies were conducted to determine whether nonexperimental evaluation methods could yield impact estimates similar in sign and magnitude to those generated by experimental methods (LaLonde 1986; Fraker and Maynard 1987; and Heckman and Hotz 1989). The interpretation of the findings from these studies remains controversial (Heckman and Smith 1995). The most common conclusion, however, is that nonexperimental estimators frequently yield results that differ from those of an experimental evaluation, and are therefore biased. Furthermore, the nonexperimental results are sensitive to minor changes in model specification. Thus, experimental estimators are preferred (Burtless 1995; and Friedlander and Robins 1995). DHHS shares this conclusion, as shown by the strong preference it exhibited for experimental evaluations of the welfare reform waiver demonstrations. In special circumstances, however, it approved alternative designs for evaluations of waiver demonstrations.

Despite the methodological strength of an experimental design, the difficulty of implementing such a design sometimes may limit its usefulness. In addition, there may be nontechnical reasons for preferring an alternative design (such as considerations of cost or fairness). The next two subsections consider the advantages and limitations of an experimental design, with particular emphasis on the needs of the impact analysis component of an evaluation. The third and final subsection defines various permutations of a quasi-experimental design and discusses when such a design might be desirable.