How Effective Are Different Welfare-to-Work Approaches? Five-Year Adult and Child Impacts for Eleven Programs. Analysis Issues


Subgroups were organized around three barriers to employment: high school education (high school graduates compared with nongraduates), recent work experience (those who had worked in the year prior to random assignment compared with those who had not), and welfare history (those who had received welfare for at least two years prior to random assignment compared with those who had not). Results by high school credential were presented in Chapters 4-6. This chapter presents results for the other two barriers. In addition, the three individual barriers were used to define three mutually exclusive subgroups based on relative disadvantage. The "most disadvantaged" were sample members who faced all three barriers: they did not have a high school diploma or GED prior to random assignment, did not work in the year prior to random assignment, and had been on welfare for two years or more prior to random assignment. "Moderately disadvantaged" sample members faced only one or two of the three barriers, while the "least disadvantaged" faced none. Finally, this chapter presents results for racial and ethnic groups.
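The three-way disadvantage grouping described above can be sketched in a few lines of Python. This is only an illustration of the stated definitions, not the study's actual code; the function name and the three boolean field names are hypothetical.

```python
def disadvantage_level(has_diploma_or_ged: bool,
                       worked_in_prior_year: bool,
                       on_welfare_two_plus_years: bool) -> str:
    """Assign a sample member to one of three mutually exclusive subgroups.

    All three inputs are hypothetical field names describing a sample
    member's status just before random assignment.
    """
    # Count how many of the three barriers to employment apply.
    barriers = sum([
        not has_diploma_or_ged,       # no high school diploma or GED
        not worked_in_prior_year,     # no work in the prior year
        on_welfare_two_plus_years,    # two or more years on welfare
    ])
    if barriers == 3:
        return "most disadvantaged"
    if barriers == 0:
        return "least disadvantaged"
    return "moderately disadvantaged"  # one or two barriers
```

Because every sample member faces zero, one, two, or three barriers, each person falls into exactly one group, which is what makes the subgroups mutually exclusive.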

Subgroups were identified using information collected just before individuals were randomly assigned. Because these groups were defined by pre-existing characteristics observed at study enrollment, control and program group members in the subgroup should be comparable at the time of random assignment, and any systematic differences that emerge between the two groups can be reliably attributed to the programs being studied.

The chapter presents results for three outcomes: (1) earnings, (2) cash assistance, and (3) combined income from earnings, cash assistance, Food Stamps, and the federal Earned Income Tax Credit (EITC) net of payroll taxes. The three outcomes represent three different perspectives. Many policymakers want to encourage welfare recipients to work; for them, the "best" program may be the one that increases earnings the most. Other policymakers may be primarily interested in reducing spending on welfare; for them, the best program may be the one that reduces cash assistance the most. Welfare recipients, and policymakers concerned about child and family poverty, may care most about recipients' total income; for them, the best program may be the one that increases income the most.

For each outcome, the chapter focuses on cumulative dollar amounts over a five-year follow-up period. Although use of program services by control group members might have reduced the effects of some programs in years 4 and 5, five years of follow-up are used for two reasons. First, the program effects over five years are generally similar to their effects over three years. Second, an earlier report presented a detailed analysis of subgroup impacts over three years.(1)

In analyzing subgroups, several types of comparisons are made, with each comparison answering a different question. The first question is whether there is evidence that the welfare-to-work programs taken as a whole, without regard to the approach they used or where they operated, tended to affect a particular subgroup. For example, with welfare time limits, administrators probably want to make sure that long-term welfare recipients are able to find work and leave welfare; if the programs were not particularly effective at benefiting long-term recipients, policymakers might want to target them for more resources or devise different and better services.

A second question is whether the programs tended to have larger effects for one subgroup than another. Again, the answer can help policymakers decide how to use limited resources or whether new services should be developed. Suppose, for example, that long-term recipients were generally affected by the programs being studied, but less than short-term recipients were. This might suggest that more effort or different services should be considered for long-term recipients to increase the effectiveness of welfare-to-work services.

A third question is whether one program approach benefited a subgroup more than another approach or whether it benefited one subgroup more than another subgroup. Because there are fewer programs of each type, however, statements about the effects of particular program models might be more speculative. For example, Portland is the only NEWWS program that was employment-focused with varied first activities. Although its impacts on earnings were by far the largest (and this chapter shows that the effects were also the largest for most subgroups), it cannot be determined whether this is a consequence of Portland's approach, the way sample members were chosen, the Portland economy (or other local factors), or unobserved differences between welfare recipients in Portland and in the other sites.

Because a subgroup contains fewer people than the full sample, it is more difficult to say with confidence that an individual program had an effect for a subgroup than for the full sample, and more difficult to say whether estimated effects are larger for one group than another because of the program rather than by chance. However, the pattern of impacts across programs can provide statistical evidence that the programs taken as a whole had a particular effect, even if no individual program had a statistically significant effect. For example, suppose the question is whether welfare-to-work programs have a larger impact on earnings for long-term welfare recipients or for short-term welfare recipients. If 9 or more of the 11 programs had a larger impact for long-term recipients than for short-term recipients, say, the hypothesis that the impacts are the same for the two groups can be rejected at the 10 percent significance level, even if no single program had a statistically significantly different effect for long-term recipients than for short-term recipients. Likewise, if 10 or more programs have an impact in the same direction, the hypothesis of no difference can be rejected at the 5 percent significance level, and if all 11 programs have an impact in the same direction, it can be rejected at the 1 percent significance level.(2)
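The thresholds of 9, 10, and 11 programs can be checked with a simple sign test. The sketch below assumes a two-sided test in which, under the hypothesis of no difference, each of the 11 programs is independently equally likely to show a larger impact for either subgroup. This may not be the exact calculation behind the report's figures, and the function name is hypothetical, but it yields p-values that fall under the same 10, 5, and 1 percent cutoffs.

```python
from math import comb


def sign_test_p_value(n_programs: int, n_same_direction: int) -> float:
    """Two-sided sign-test p-value (illustrative assumption, see lead-in).

    Probability that at least `n_same_direction` of `n_programs` point the
    same way, in either direction, when each direction has probability 0.5.
    """
    # Use the larger side so the tail is always the extreme one.
    k = max(n_same_direction, n_programs - n_same_direction)
    # Number of equally likely outcomes with k or more in one direction.
    tail = sum(comb(n_programs, j) for j in range(k, n_programs + 1))
    # Double for the two-sided test; cap at 1 for the balanced case.
    return min(1.0, 2 * tail / 2 ** n_programs)


for k in (9, 10, 11):
    print(k, "of 11 programs:", sign_test_p_value(11, k))
```

Under these assumptions the p-values are roughly 0.065, 0.012, and 0.001, which fall below the 10, 5, and 1 percent levels respectively.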

The most rigorous means of examining the effects of program models is to compare the effects of the three LFA programs with their HCD counterparts. The chapter consequently devotes a section to this comparison. With only three programs of each type, however, it can be difficult to draw firm conclusions about the relative benefits of the two approaches by subgroup. As mentioned above, it is harder to find statistically significant effects for a subgroup than for the full sample, and it is unlikely that the impacts for two subgroups will be statistically significantly different in any specific site. Moreover, three sites are too few to use only the pattern of results to draw conclusions about the relative effectiveness of the two approaches unless the differences in impacts between the two approaches are large. If, for example, the LFA and HCD approaches are equally effective, then the chance that the LFA program would have a larger impact than its HCD counterpart in all three sites would be 12.5 percent (0.5 × 0.5 × 0.5), which is greater than the usual thresholds for drawing conclusions based on statistical significance. However, it is extremely unlikely that all three LFA programs would increase earnings significantly more than their HCD counterparts simply by chance, and statistically significant differences in all three sites would be enough to draw solid conclusions based on the statistical evidence.
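The 12.5 percent figure above follows from treating each site as an independent coin flip when the two approaches are equally effective. The short sketch below works through that arithmetic and, as an illustrative extension not found in the report, asks how many paired sites would be needed before a unanimous pattern alone falls under a given threshold. It assumes a one-directional comparison (LFA larger in every site); both function names are hypothetical.

```python
def chance_all_sites_agree(n_sites: int) -> float:
    """Chance that one approach beats its counterpart in every site,
    assuming each site is an independent 50/50 outcome."""
    return 0.5 ** n_sites


def sites_needed(alpha: float) -> int:
    """Smallest number of paired sites for which a unanimous pattern
    has a chance below `alpha` under the equal-effectiveness assumption."""
    n = 1
    while chance_all_sites_agree(n) >= alpha:
        n += 1
    return n


print(chance_all_sites_agree(3))  # three LFA/HCD site pairs
print(sites_needed(0.10), sites_needed(0.05))
```

With three sites the chance is 0.125; under these assumptions, at least four paired sites would be needed to fall below 10 percent and five to fall below 5 percent.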