Approaches to Evaluating Welfare Reform: Lessons from Five State Demonstrations. 3. Analysis and Recommendations


We recommend that states, in developing their preliminary evaluation sample designs, specify the precision standard that estimates of the key outcomes must meet (rather than minimum sample sizes) and the key outcome measures to which this standard applies. Designs should also justify the magnitude of the impact they expect to measure and the assumed variance of the outcome measure, both of which inevitably vary with the nature of the intervention. A reasonable precision standard would be the ability to detect a plausible impact on all applicants or all recipients at a 5 percent significance level with 80 percent power, using a one-tailed test. We do not generally recommend pooling the applicant and recipient samples, for reasons discussed in the next section. In addition, we recommend allowing reductions in sample size to reflect the increased precision from regression adjustment, particularly if plans for collecting baseline data (discussed further in Chapter V) are also included in the design.(16)
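
To illustrate how such a precision standard translates into a sample size, the following is a minimal sketch using the standard normal-approximation formula for comparing two group means. It is not a method prescribed in this report. The function name, the equal split between experimental and control groups, and the example values (a 5 percentage point impact on an outcome with a standard deviation of 0.5, and a regression R-squared of 0.2) are all assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(impact, sd, alpha=0.05, power=0.80,
                         r_squared=0.0, one_tailed=True):
    """Approximate sample size per group needed to detect `impact`.

    Assumes (hypothetically) two equal-sized groups, a common outcome
    standard deviation `sd`, and a normal approximation. `r_squared` is
    the share of outcome variance explained by baseline covariates;
    regression adjustment scales the variance by (1 - r_squared).
    """
    std_normal = NormalDist()
    tail = alpha if one_tailed else alpha / 2
    z_alpha = std_normal.inv_cdf(1 - tail)   # critical value of the test
    z_power = std_normal.inv_cdf(power)      # quantile for the desired power
    adjusted_variance = sd ** 2 * (1 - r_squared)
    n = 2 * adjusted_variance * (z_alpha + z_power) ** 2 / impact ** 2
    return ceil(n)


# Illustrative values only: 5 percentage point impact, sd of 0.5.
n_unadjusted = required_n_per_group(impact=0.05, sd=0.5)
n_adjusted = required_n_per_group(impact=0.05, sd=0.5, r_squared=0.2)
print(n_unadjusted, n_adjusted)
```

Under these assumptions, the required sample falls in proportion to (1 - R-squared), which is why a design that includes baseline data collection can justify a smaller sample for the same precision standard.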

The study's research questions should determine the key outcome on which the sample size is to be based. It may be appropriate, however, for DHHS to recommend "default" assumptions (based on a review of the existing literature) concerning the magnitude of impacts, standard errors, and regression reductions in standard errors for the most common outcome variables. States then could elect to use these outcome measures and associated assumptions, or they could propose others, but they should state and justify their assumptions.

The power of the sample design to detect impacts should be addressed for key outcomes, both for the full sample and for key subgroups (particularly subgroups that are explicitly stratified in the design). States also may wish to establish an explicit precision standard for subgroups. States should consider the design effects that result from any oversampling of subgroups; federal officials could also suggest default assumptions about the likely magnitude of such effects, based on a review of previous studies.
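
One common way to gauge the design effect from oversampling is Kish's approximation for unequal weighting, sketched below. This is an illustrative technique, not one specified in this report. It assumes the outcome variance is similar across strata, and the function name and the example strata (a subgroup that is 10 percent of the caseload but half of the sample) are hypothetical.

```python
def kish_design_effect(sample_sizes, population_shares):
    """Approximate design effect from unequal weighting (Kish).

    sample_sizes: number of sampled cases in each stratum.
    population_shares: each stratum's share of the full caseload.
    Assumes (hypothetically) similar outcome variance in every stratum.
    """
    n = sum(sample_sizes)
    # Weight for each case = population share / sample share of its stratum.
    weights = [share / (n_h / n)
               for n_h, share in zip(sample_sizes, population_shares)]
    sum_w = sum(n_h * w for n_h, w in zip(sample_sizes, weights))
    sum_w_sq = sum(n_h * w ** 2 for n_h, w in zip(sample_sizes, weights))
    return n * sum_w_sq / sum_w ** 2


# Illustrative values only: a 10 percent subgroup sampled as half the sample.
deff = kish_design_effect(sample_sizes=[500, 500],
                          population_shares=[0.9, 0.1])
effective_n = 1000 / deff  # full-sample precision is that of a smaller sample
print(round(deff, 2), round(effective_n))
```

Under this approximation, a proportionally allocated sample has a design effect of 1, and heavier oversampling raises the design effect and lowers the effective sample size available for full-sample estimates.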