How Effective Are Different Welfare-to-Work Approaches? Five-Year Adult and Child Impacts for Eleven Programs

Calculating Impacts


As discussed above, control group outcomes in this evaluation represent outcomes expected in the absence of a welfare-to-work program. Program-control differences show the effect, or impact, of each program. In the sites that conducted side-by-side evaluations of alternative program approaches, differences between the outcomes for each program group represent the relative effects of each program.

Although random assignment minimizes the likelihood of the research groups' differing systematically at the outset, there can be small differences in their average characteristics at random assignment. To control for these differences, the outcomes for each research group were regression-adjusted using ordinary least squares in all the analyses presented in the chapters that follow.
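The regression adjustment described above can be sketched as follows. This is a minimal illustration with invented numbers, not the evaluation's actual model or data: the outcome is regressed on a program-group indicator plus a baseline covariate measured at random assignment, and the coefficient on the indicator is the regression-adjusted impact estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated sample: a baseline covariate (e.g., prior earnings)
# that, by chance, differs slightly between the research groups.
n = 2000
program = rng.integers(0, 2, n)                        # 1 = program, 0 = control
prior_earnings = rng.normal(10, 3, n) + 0.2 * program  # small chance imbalance
true_impact = 1.5
earnings = 5 + 0.8 * prior_earnings + true_impact * program + rng.normal(0, 2, n)

# Unadjusted difference in mean outcomes picks up the baseline imbalance.
unadjusted = earnings[program == 1].mean() - earnings[program == 0].mean()

# OLS of the outcome on an intercept, the program indicator, and the baseline
# covariate; the indicator's coefficient is the adjusted impact estimate.
X = np.column_stack([np.ones(n), program, prior_earnings])
beta, *_ = np.linalg.lstsq(X, earnings, rcond=None)
adjusted_impact = beta[1]
```

Controlling for baseline covariates also tends to tighten the standard error of the impact estimate, which is a second reason evaluations routinely regression-adjust experimental comparisons.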

In this report, a difference between the program and control groups with respect to a particular outcome is considered statistically significant if the result of a statistical test indicates that there is less than a 10 percent probability that the difference occurred by chance (that is, when the p-value, or level of significance, of the difference is under .10). Impacts are generally reported only if they are statistically significant. This rule is intended to keep researchers from inferring an impact where none exists.(24)

Many analysts have noted that the greater the number of analyses conducted (regardless of the outcomes or domains studied), the greater the likelihood of chance findings and, thus, one needs to take the number of outcomes examined into account. However, some argue that this is relevant only when outcomes are not theoretically independent from each other. There are stringent statistical tests of multiple dependent variables that automatically adjust for the (limited) number of theoretically related outcomes (that is, multivariate analysis of variance, or MANOVA), as well as post-hoc corrections to p-values that can be applied to results from multiple individual analyses of "similar" outcomes (for example, the Bonferroni correction).
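The Bonferroni correction mentioned above can be illustrated in a few lines. The p-values here are invented for illustration: with k related outcomes tested at an overall error rate of .10, each p-value is multiplied by k (and capped at 1) before being compared against .10, which is equivalent to comparing each raw p-value against .10 / k.

```python
# Hypothetical illustration of a Bonferroni correction across k outcomes.
alpha = 0.10
p_values = [0.004, 0.03, 0.20, 0.08]   # made-up p-values for 4 related outcomes
k = len(p_values)

# Multiply each p-value by the number of tests, capping at 1.
adjusted = [min(p * k, 1.0) for p in p_values]
significant = [p <= alpha for p in adjusted]
# Only the first outcome (adjusted p = .016) survives the correction.
```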

For this report, we have not attempted to adjust for the number of outcomes, such as employment and AFDC receipt, that are examined, because many of these outcomes are so highly statistically significant that they would pass the most stringent statistical correction for the number of outcomes measured. By contrast, because we are less certain whether the nontargeted (child and family) outcomes examined in Chapters 9, 11, and 12 are theoretically independent from one another (and, thus, whether we may in part be capitalizing on chance by examining multiple measures of the same or similar underlying constructs), we calculate and report the number of findings that would be expected by chance if the outcomes were independent from one another. We calculated the proportion of statistically significant impacts across all family outcome measures (in Chapter 9) and by selected categories of these measures, across all child outcome measures (in each of Chapters 11 and 12), and across all relevant programs. Specifically, given that the experiment-wise Type I error rate was set at .10, any one result will emerge as significant 10 percent of the time owing to chance alone. The number of significant outcomes expected by chance was calculated and noted in drawing any conclusions about the effects of these welfare-to-work programs.
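The expected-by-chance calculation described above is simple arithmetic; the counts below are invented for illustration and are not the report's actual tallies.

```python
# Hypothetical illustration: with the Type I error rate set at .10, about
# 10 percent of independent tests are expected to be significant by chance.
alpha = 0.10
n_tests = 120          # e.g., number of outcome measures x number of programs
n_significant = 25     # observed count of significant impacts (invented)

expected_by_chance = alpha * n_tests       # 12 expected by chance alone
excess = n_significant - expected_by_chance
```

When the observed count of significant impacts exceeds the expected count by a comfortable margin, the pattern of findings is less likely to be an artifact of multiple testing.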

Some might argue that a more stringent standard is needed, requiring that the number of significant impacts within each program must exceed chance levels or that the number of significant impacts within each domain of child development must exceed chance levels. Because there is a lack of consensus on this issue among statisticians, and given that a goal of the analyses on family and child outcomes was to provide a thorough examination of program impacts, we did not adhere to a more stringent standard.

All impact estimates are based on the entire research sample, including program group members who did not participate in program activities (it is likely that nearly all "nonparticipants" in the program group encountered the program messages and participation mandates, which may have affected their decision to look for work or to leave welfare). Because all sample members are included in the analyses, the impacts must be interpreted as being the results of the welfare-to-work programs as a whole, not only of participation in specific program services. By the same principle, calculations of average earnings and welfare payments, which form the basis of many of the impact estimates, include sample members who were not employed (that is, earned $0) or did not receive welfare (that is, received $0 in welfare). To the extent that a program turns nonearners into earners or encourages welfare recipients to leave welfare, excluding these $0 values from the program and control group averages would lead to seriously biased underestimates of program impacts. For example, previous research has shown that some welfare-to-work programs dramatically increased the proportion of people who have earnings without affecting the average earnings of those who work. These programs led to a relatively large impact on earnings when all sample members were included in the calculation. However, omitting people with $0 earnings from the analysis would have suggested that these programs had no impact on earnings.(25)
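A small worked example, with invented earnings figures, shows the bias described above: a program that moves some people from $0 earnings into jobs, without raising earnings among workers, shows a sizable impact when everyone is included but no impact at all when the $0 values are dropped.

```python
# Hypothetical quarterly earnings for 10 people per group (invented numbers):
# 40% of the control group and 70% of the program group work, all at $800.
control     = [0, 0, 0, 0, 0, 0, 800, 800, 800, 800]
program_grp = [0, 0, 0, 800, 800, 800, 800, 800, 800, 800]

def mean(xs):
    return sum(xs) / len(xs)

# Including $0 values, the impact reflects the employment gain: 560 - 320 = 240.
impact_all = mean(program_grp) - mean(control)

# Excluding $0 values, both groups average $800, wrongly suggesting no impact.
impact_workers = (mean([x for x in program_grp if x > 0])
                  - mean([x for x in control if x > 0]))
```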

Some analyses in this report focus on subgroups of the full impact sample. In one such set of analyses, presented primarily in Chapter 7, each site sample is broken down by various background characteristics (such as previous work history) measured at the time of random assignment. The impacts found for these subgroups can be confidently attributed to the programs under study because they are based on characteristics measured before anyone entered the program and because program and control group members are similar in other respects; that is, the only difference between the program and control subgroups with a particular background characteristic is exposure to the program. In the language of evaluations, such impacts are experimental. However, because they are based on smaller samples of people, the impact estimates for subgroups are less likely to be statistically significant than those for the full sample. Other analyses in the report compare outcomes such as average hourly wages for program and control group members who shared characteristics (such as being employed) acquired after random assignment. These nonexperimental comparisons should be interpreted with caution because the research groups may differ with respect to measured or unmeasured background characteristics that affect employment and, in turn, hourly wages. The report presents findings from such nonexperimental comparisons to explore underlying trends in the experimental impact estimates.

Because data were unavailable, the results for sample members in Atlanta who were randomly assigned during the last six months of the random assignment period are not presented in this report. Similarly, because welfare and Food Stamp data were not available for years 4 and 5 for sample members in Oklahoma City, the year 4, year 5, and cumulative impacts on welfare receipt, Food Stamp receipt, and combined income are not reported for Oklahoma City.

As discussed in Chapter 1, changes in the labor market and the environments in which the programs operated during the follow-up period could have affected program impacts. In particular, the economic expansion that began in the mid-1990s created a strong demand for entry-level jobs nationwide. However, because program and control group members in each site experienced these changes, it is difficult to predict which group was most affected by them. On the one hand, program impacts on employment and earnings may diminish as more control group members find employment. On the other hand, during an economic expansion welfare-to-work programs may help people advance more quickly to higher-paying or more stable employment, resulting in increasing impacts over time.

In addition, most of the programs in the evaluation became more employment-focused over time. As a result, in the last two years of the follow-up period, people assigned to education-focused programs who remained on welfare received services and messages similar to those to which people in the employment-focused programs were exposed throughout the follow-up period. However, people in the education-focused programs were probably little affected by this evolution in program approach because, even under the original program model, many of them would have received job search assistance if they had not found employment after completing education and training activities.(26)