A commonly accepted method of evaluating the quality of the match is to examine whether or not the comparison and treatment groups differ in their observable characteristics.(73) We therefore perform a series of t-tests that compare the characteristics of the two treatment groups to the characteristics of each of their potential comparison groups. Table B.2 reports the t-statistics derived from comparing the mean of each characteristic of the treatment group with that of the comparison groups for both the full and the low-income samples.
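|Characteristic||Comparison Group 2b: Not Employed to Not Employed, Full Sample||Comparison Group 2b: Not Employed to Not Employed, At-Risk||Comparison Group 1b: Employed to Not Employed, Full Sample||Comparison Group 1b: Employed to Not Employed, At-Risk||Comparison Group 2a: Not Employed to Employed, Full Sample||Comparison Group 2a: Not Employed to Employed, At-Risk||Comparison Group 1a: Employed to Employed, Full Sample||Comparison Group 1a: Employed to Employed, At-Risk|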
|Married and female||-0.26||-0.07||0.57||-0.04||1.89||0.14||0.54||0.11|
|Change in marital status||-0.86||-1.18||1.49||0.28||-0.31||-1||0.49||0.49|
|Number of children||-1.51||-1.16||-2.67*||-1.55||-0.8||-0.15||-0.14||0.21|
|Decrease in number of children||-0.64||0.07||-1.37||-1.2||-1.23||-0.78||-0.85||-0.81|
|Child under one||0.12||0.13||-1.22||-0.19||0.48||0.84||-0.06||0.11|
|Child under three||-0.66||-0.42||-1.53||-0.12||0.44||1.05||-0.35||0.25|
|Child under five||-1.13||-0.77||-1.83||-0.35||0.32||1.02||-0.6||0.1|
|Number of adults||0.65||-0.64||-1.52||-0.92||-1.69||-1.85||-0.42||-0.15|
|Increase in number of adults||-0.12||-0.43||-3.55*||-1.93||1.09||0.33||-0.07||-0.32|
|Decrease in number of adults||0.49||0.41||-1.01||-0.5||-4.54*||-4.27*||0.12||0.36|
|100 to 200% of poverty||0.01||0.42||0.75||3.67*||-1.05||-0.24||0.39||-0.35|
|200% of poverty||0.79||2.47*||1.64||-0.77|
|Short term work history||2.66*||2.50*||2.87*||1.42||-2.22*||-3.01*||0.15||1.2|
|Percent of last 10 years working||-2.12*||1.79||5.46*||2.85*||-0.7||-1.06||-0.21||-0.6|
|Percent of time in welfare||-1.13||-1.03||1.34||1.75|
|Duration of current job||1.57||1.05||-2.94*||-1.35|
|Duration between jobs||-2.48*||-1.08||-0.34||-0.41|
|Low wage occupation||-2.41*||-1.44||-0.54||-0.35|
|Low wage industry||-2.48*||-0.01||-0.51||-0.07|
|Number of Temp Workers||738||425||648||234||738||425||648||234|
Source: SIPP 1990-1993 panels, calculations by the Urban Institute.
Note: At risk is defined as below 200% of the family poverty level in the month prior to the reference month. The comparison group mean is the average of the means within each of the five quintiles.
* The t-statistic is significant at the 0.05 level.
An analysis of this table reveals that the matching procedure generally worked well in grouping like individuals based on demographic characteristics. There is little significant difference between either set of treatment and comparison groups on the basis of age, sex, race, and education. There is also little difference between the two groups in terms of household structure--marital status, number of children--or changes in the household structure. An exception is in matching temporary workers who were previously employed to those who moved to non-work. For that comparison, many demographic characteristics show significant differences between the comparison and treatment groups.
The only set of characteristics on which the matching procedure consistently performed poorly was the work history variables, particularly the measures of long- and short-term work history and unemployment duration. This suggests that the models we use fail to capture the full process by which individuals select into each group, and hence that our estimates are likely to be biased to the degree that this failure occurs. This is not surprising; it would be difficult to argue that individuals take temporary jobs without work history factors affecting that choice. Constructing more detailed work histories might well allow us to control for the differences we observe, but this is not possible with the current SIPP dataset.(74) These results do, however, reinforce our earlier suspicion that datasets that cannot control for such work history measures (such as the CPS) would not be appropriate for use in such an analysis.
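As a rough illustration of the balance check reported in Table B.2, the sketch below computes a t-statistic for a single characteristic, taking the comparison group mean as the equal-weight average of the five within-quintile means, as described in the note to the table. The function names and the standard error formula (which treats the quintiles as independent strata and uses unpooled variances) are assumptions made for illustration only; the report does not state how its standard errors were computed.

```python
from math import sqrt
from statistics import mean, variance


def stratified_mean_and_var(values_by_quintile):
    """Equal-weight average of within-quintile means and its sampling variance.

    values_by_quintile: one list of observations per quintile (hypothetical
    input layout). Each list needs at least two observations.
    """
    k = len(values_by_quintile)
    avg = sum(mean(v) for v in values_by_quintile) / k
    # Assumption: quintiles are independent strata, so the variance of the
    # average of k stratum means is the sum of their variances over k squared.
    var = sum(variance(v) / len(v) for v in values_by_quintile) / k ** 2
    return avg, var


def balance_t(treated, comparison_by_quintile):
    """t-statistic for treatment mean minus quintile-averaged comparison mean."""
    t_mean = mean(treated)
    t_var = variance(treated) / len(treated)
    c_mean, c_var = stratified_mean_and_var(comparison_by_quintile)
    return (t_mean - c_mean) / sqrt(t_var + c_var)
```

Under these assumptions, an absolute value above roughly 1.96 would correspond to the starred entries in the table when samples are large.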
62. We define at risk as 150 percent of the federal poverty level rather than 200 percent (our definition of at risk for the SIPP analysis, described below). The lower cutoff is used here because the sample size is adequate for analysis and the lower cutoff provides a sample at greater risk of receipt than is the case with the higher cutoff.
63. However, because the CPS is an address-based household survey, the actual number of matched cases is lower, due primarily to individuals changing residences from month to month.
64. Income is based on family income from the month prior to either the start of temporary work or a randomly chosen month (for members of the comparison group), multiplied by 12 to get an annual equivalent. This annualized income is then compared to the federal poverty level.
66. This is particularly important for our analysis of persons at risk of welfare, for whom nonemployment may be at least as likely an alternative to temporary work as nontemporary work.
67. Our inability to describe the impact of temporary work for welfare recipients results from our relatively small number of temporary agency workers. With a large enough sample of temporary workers, we could sub-sample cases to obtain a distribution of propensity scores similar to that of all welfare recipients. Then we would be more comfortable claiming that we had estimated the effect of temporary work on the full sample of nontemporary workers.
68. We include in our logit analysis of temporary work observations that are missing data a year later in an attempt to include as many cases as possible in predicting who is likely to be employed in temporary work. These cases are excluded from the matching procedure and from the analysis of the effects of temporary work because they lack the outcome information from a year later.
69. To make the sample sizes manageable (and to ensure that they reflect the distribution of survey months), we include data for only one month chosen at random per household in the comparison group. The month is chosen from all months that occur at least 12 months before the end of the panel to ensure a sufficient follow-up period.
70. Separate analyses by previous employment status are also expected to make the experiences of those categorized as temporary workers more homogeneous within a grouping.
71. The quintiles procedure was suggested by Rosenbaum and Rubin (1984). A second approach would be to choose, for each temporary agency person, the comparison group person with the most similar propensity score. Both approaches will lead to similar distributions of propensity scores for the two comparison groups and the treatment group of temporary workers. For relatively rare transitions, such as those from employment to nonemployment, the reweighting approach is more feasible for our analysis, since it requires fewer observations than a one-to-one match.
72. As of June 2001, standard errors have not yet been adjusted for non-independence of the cases using Stata's cluster option.
73. While it is possible, and even likely, that the groups differ in their unobservable characteristics--and that this may systematically bias the evaluation of the impact--this problem is endemic to evaluation studies (see, e.g., Heckman et al., 2000), and currently unsolved.
74. A variant of this that was suggested by Rosenbaum and Rubin (1984) is to include in the model interaction terms that capture the variation across sample groups in the effects of work history. For example, work history variables may have different effects on the likelihood of temporary work for older women with no children than for young men. To date, experimentation with such interactions--such as separate models for low- and high-income cases or for men and women--has not yielded an appreciable improvement in the quality of our match.
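The two matching approaches contrasted in note 71 can be sketched as follows. This is a minimal illustration, not the procedure used for the estimates in this report: the function names are hypothetical, and it assumes that the quintile cut points are taken from the treatment group's propensity scores and that each occupied quintile of the comparison group receives equal total weight.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def quintile_cutpoints(treated_scores):
    """Four cut points splitting the treated propensity scores into fifths."""
    return quantiles(treated_scores, n=5)


def assign_quintile(score, cuts):
    """Quintile index (0 to 4) for one propensity score."""
    return bisect_right(cuts, score)


def equal_quintile_weights(comparison_quintiles):
    """Weights that give every occupied quintile the same total weight."""
    counts = Counter(comparison_quintiles)
    k = len(counts)
    return [1.0 / (k * counts[q]) for q in comparison_quintiles]


def nearest_match(treated_score, comparison_scores):
    """Index of the comparison case with the closest propensity score."""
    return min(
        range(len(comparison_scores)),
        key=lambda i: abs(comparison_scores[i] - treated_score),
    )
```

The reweighting functions use every comparison case that falls in a quintile, whereas nearest_match keeps only one comparison case per treated case, which is why the one-to-one approach is more demanding for rare transitions.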