Approaches to Evaluating Welfare Reform: Lessons from Five State Demonstrations. 2. Limitations of an Experimental Design


The advantages of the experimental design discussed earlier are compelling. When the design is implemented carefully, most policy researchers see these advantages as eclipsing the limitations discussed next. However, there may be particular applications in which one or more of these limitations looms large--perhaps because of a strong policy need for information on a specific type of outcome that an experimental design is not well suited to provide.

An experimental design can be costly and challenging to implement. Program staff members sometimes are reluctant to implement random assignment; substantial training may be necessary to convince them that it is worth doing and doing right. Alternatively, it may be necessary to contract out certain aspects of random assignment. Either approach can be expensive. Program staff members also must be trained to operate the reform and pre-reform programs side by side in the research sites. Both random assignment and the operation of two programs simultaneously require additional managerial resources.

Two limitations are associated with the challenge of successfully implementing an experimental design. First, because an experimental design often is difficult and costly to implement, state administrators generally select only a subset of counties (or other administrative units) to implement random assignment. They may be inclined to choose only those sites that they believe will be successful both in implementing random assignment and in operating the reform and pre-reform programs concurrently. Selection of any small group of sites--particularly those more likely to be successful--means that the research sample of experimental and control cases is unlikely to be representative of the statewide welfare caseload (the broader population of interest). Consequently, findings from experimental evaluations frequently lack external validity, meaning that users of the research cannot generalize from the findings for the research sample to the full (state) population. With alternative designs that are easier to implement, state-level administrators may be more willing to select research counties randomly or to allow all counties to be research counties. Either approach may yield findings with a high degree of external validity.

A second limitation associated with the difficulty of implementing an experimental design is that it may be difficult to maintain pure versions of the reform and pre-reform programs for the experimental and control groups. Participants in the pre-reform program may receive elements of the reform program, or vice versa. For example, program staff could have difficulty keeping the rules of the two programs separate, or participants in one program could be exposed to advertising or news accounts of the other program and mistakenly assume that the rules governing the other program apply to them. Any such mixing of elements from the two programs would tend to bias impact estimates toward showing no impact of the reform program. In addition, cases in the experimental and control groups could be exposed to the other program if they migrate to a nonresearch site that is operating the other program or if they split into two cases or merge with a case that has a different research status.
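
The direction of this bias can be illustrated with simple arithmetic. The sketch below is not drawn from any of the state evaluations. It assumes, purely for illustration, that the reform has the same impact on every case that actually receives it; the function name, parameter names, and crossover shares are all hypothetical.

```python
def diluted_impact(true_impact, share_exp_under_old_rules, share_ctrl_under_reform):
    """Expected experimental-minus-control difference when some cases cross over.

    Assumption (not from the report): every case that effectively receives the
    reform program experiences the same impact, `true_impact`.
    """
    # Average effect actually felt by the experimental group.
    experimental_shift = true_impact * (1.0 - share_exp_under_old_rules)
    # Average effect that leaks into the control group.
    control_shift = true_impact * share_ctrl_under_reform
    return experimental_shift - control_shift


# Hypothetical numbers: a true impact of 100 (for example, dollars of quarterly
# earnings), 10 percent of experimental cases handled under pre-reform rules,
# and 15 percent of control cases handled under reform rules.
estimate = diluted_impact(100.0, 0.10, 0.15)
print(round(estimate, 2))
```

Under these assumed shares, the measured difference is about three-quarters of the true impact. Whenever the two crossover shares are positive and sum to less than one, the estimate shrinks toward zero, which is the sense in which mixing biases results toward showing no impact.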

Unless specifically designed to do so, an experimental design does not provide a strong basis for estimating the impacts of individual components or sets of components within a package of reforms.(4) To allow estimation of component impacts, a design must include random assignment of cases to multiple experimental groups. The number of such groups increases as the number of program components with impacts to be estimated increases. The number of different programs that must be operated also increases. Few states are willing to take on such an administrative burden. It can be done, however, as shown by the MFIP demonstration, in which a four-group experimental design is being used to estimate the overall impacts of the demonstration as well as the separate impacts of two distinct sets of reforms.
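
To give a sense of how quickly the administrative burden can grow, the sketch below enumerates the groups required under one possible layout, a full factorial design in which every on/off combination of components gets its own randomly assigned group. This layout is an assumption made for illustration; it is not a description of the MFIP design, and the component names are hypothetical.

```python
from itertools import product


def factorial_groups(components):
    """List every on/off combination of the given reform components.

    Assumption (not from the report): a full factorial layout, so k components
    require 2**k randomly assigned groups, each with its own program rules.
    """
    return [
        dict(zip(components, flags))
        for flags in product([False, True], repeat=len(components))
    ]


# Hypothetical component names, used only to show how the group count grows.
for names in (["incentives", "services"], ["incentives", "services", "time_limit"]):
    groups = factorial_groups(names)
    print(len(names), "components ->", len(groups), "groups")
```

Each additional component doubles the number of groups under this layout, and each group is a distinct set of program rules that staff would have to administer side by side. Designs that test only selected combinations need fewer groups but support fewer component-level comparisons.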

Some welfare reforms may be designed to discourage families from applying for welfare or from entering welfare if they are eligible; others may actually encourage applications (for example, among two-parent families). An experimental design will not support the estimation of such entry effects because they occur prior to application and thus random assignment. Furthermore, although an experimental design will still give unbiased impacts for those who apply for welfare after welfare reform has been implemented, substantial entry effects may imply that these estimates are not applicable to the population that would have applied under the old program. A nonexperimental study of entry effects that examines application behavior over time is vulnerable to differences between reform and pre-reform groups that are not related to the demonstration; however, no practical experimental alternatives are available.(5)

Similarly, if an intervention is designed to have substantial community effects (that is, to change the culture and mores of an entire community), it may be necessary to implement the new program on a saturation basis in selected sites, and this precludes the use of an experimental design. The federal government approved the use of a quasi-experimental design to evaluate Wisconsin's WNW demonstration, largely because this demonstration was designed to have substantial community effects. There was also concern that the program had been designed to reduce caseloads by discouraging entry into cash assistance. The following subsection provides additional information on quasi-experimental designs and the application of such a design in the context of WNW.