We recommend that states that want to estimate impacts for separate components of a welfare reform package consider evaluation designs with multiple experimental groups. Such designs (Minnesota's four-group MFIP design is an example) can be more informative to policymakers than the standard two-group experimental design. The major disadvantages of multigroup designs are that they require a larger research sample to achieve the same precision standards as two-group designs and that a state must administer three or more programs simultaneously in the research sites. To reduce the burdens of maintaining a four-group design, states may want to consider adopting a three-group experimental design, defining two experimental groups--a full experimental group subject to all of the welfare reform provisions and a partial experimental group subject to a subset of the welfare reform provisions--in addition to a control group. The policy changes from which the partial experimental group would be exempt would depend on the interests of the state but might include components of the proposed welfare reform package that are especially controversial or untested.
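The sample-size cost of adding groups can be illustrated with a rough calculation. The sketch below assumes equal-sized groups, a common outcome standard deviation, and impacts estimated as simple differences in group means; the function names and the numeric values are hypothetical and are not drawn from any particular evaluation. Under these assumptions, the standard error of a contrast between any two groups depends only on the number of cases per group, so the total sample needed for a given precision grows in proportion to the number of groups.

```python
import math


def pairwise_standard_error(total_sample, n_groups, sigma=1.0):
    """Standard error of a difference in means between any two groups,
    assuming the total sample is split equally across n_groups and the
    outcome has the same standard deviation (sigma) in every group."""
    per_group = total_sample / n_groups
    return sigma * math.sqrt(2.0 / per_group)


def required_total_sample(target_se, n_groups, sigma=1.0):
    """Smallest total sample (equal allocation assumed) for which each
    pairwise contrast has a standard error no larger than target_se."""
    per_group = 2.0 * sigma ** 2 / target_se ** 2
    return math.ceil(n_groups * per_group)


if __name__ == "__main__":
    # Hypothetical precision target, expressed in standard-deviation units.
    target = 0.1
    for groups in (2, 3, 4):
        print(groups, "groups:", required_total_sample(target, groups), "cases")
```

Under these simplifying assumptions, a three-group design needs about one and a half times the sample of a two-group design, and a four-group design about twice the sample, to hold the precision of each pairwise contrast constant. Unequal allocation across groups or regression adjustment would change the exact figures.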
When states introduce a new welfare reform package after an evaluation of an earlier initiative has begun, we recommend that a second research sample be created, if possible; this would preserve the integrity of the research sample used to study the first initiative. The second sample would consist of recipient and applicant cases that are randomly assigned either to the earlier package only or to the combination of policies contained in both packages. Creating a second research sample would require state officials to administer welfare under three different regimes, but it would make it much easier to distinguish the impacts of the first and second welfare reform packages, for both recipients and applicants, in both the short and the long term.
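One way the random assignment of a second research sample might be carried out is sketched below. The sketch assumes that a list of case identifiers is available up front and that assignment is balanced separately within recipient and applicant cases; both are illustrative assumptions, as are the function name and the group labels. In practice, applicants would more likely be assigned one at a time as they apply.

```python
import random
from collections import defaultdict


def assign_second_sample(cases, seed=0):
    """Randomly split cases between two groups within each case type.

    cases: iterable of (case_id, case_type) pairs, where case_type is a
           label such as "recipient" or "applicant" (hypothetical labels).
    Returns a dict mapping case_id to "first_package_only" or
    "both_packages".
    """
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    by_type = defaultdict(list)
    for case_id, case_type in cases:
        by_type[case_type].append(case_id)

    assignment = {}
    for case_type in sorted(by_type):
        ids = by_type[case_type]
        rng.shuffle(ids)  # random order within this case type
        half = len(ids) // 2
        for position, case_id in enumerate(ids):
            if position < half:
                assignment[case_id] = "first_package_only"
            else:
                assignment[case_id] = "both_packages"
    return assignment
```

Splitting within each case type keeps the two groups balanced for recipients and for applicants, which supports separate impact estimates for each. If a case type contains an odd number of cases, the extra case goes to the combined-package group in this sketch.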
If a design with more than two groups is not feasible, we recommend that evaluators not attempt to estimate impacts for specific welfare reform provisions within the overall package. Our investigation of welfare reform waiver evaluations found no evidence that separate impacts for different welfare reform provisions can be distinguished reliably without a design that includes multiple experimental groups. Instead, we recommend that evaluators confine their analysis of separate welfare reform provisions to qualitative inferences obtained on theoretical grounds or through a process study that includes interviews with program staff and welfare recipients.