Impact on Young Children and Their Families Two Years After Enrollment: Methods: How Did We Study Impacts on Children? Strategy of Analysis


The data analyses that we will present in the following chapters follow a progression across:

  • descriptive analyses
  • analyses of aggregate impacts
  • examination of impacts for subgroups, and
  • explanatory analyses.

Below we briefly describe the aim and approach for each of these types of data analyses.

I. Descriptive Analyses

The goal of descriptive analyses is to provide a portrayal of the families and children in the sample apart from any effects of JOBS. Descriptive data on sample characteristics (presented in Chapter 4) are based on the information collected from respondents prior to random assignment.(10) For these analyses, as for all analyses in this report, we present findings separately for the Atlanta, Grand Rapids, and Riverside research sites. However, since we are relying on data collected before respondents were randomly assigned for these particular descriptive analyses, we combine the data across the three research groups (labor force attachment, human capital development, and control groups), and present summary figures. Thus, for example, we present the percentage of mothers with differing levels of educational attainment in each site, and we summarize the mean ages of mothers and children in each site, using baseline data.

Descriptive data on the developmental status of the children (presented in Chapter 5) are based on the child outcome measures described above that were collected as part of the two-year follow-up. Because the intent of providing descriptive data on the children is to portray their development apart from the effects of the program (in order to provide a context for interpreting subsequent findings on child impacts), we restrict our focus here to children in the control groups, the group in each site unaffected by exposure to JOBS. In presenting this descriptive portrayal of the developmental status of the children apart from JOBS, we will sometimes draw upon "benchmark data," or data for the same child outcome measures collected in other samples. For example, the measure of child behavior problems used at the two-year follow-up, the Behavior Problems Index, was also used in a national survey, the National Longitudinal Survey of Youth-Child Supplement. Behavior Problems Index findings for children of the same ages from the National Longitudinal Survey of Youth-Child Supplement can help us get a sense of whether the children in the control group of our sample have more or less frequent behavior problems, compared to children in a national sample.

II. Examination of Aggregate Impacts at Each Site

Having given a descriptive portrayal of the families in the sample and of the developmental status of the children apart from JOBS, we turn to an examination of program impacts on the children's developmental outcomes. A program impact reflects the average difference between families in an experimental group and families in the control group on a given outcome measure. Our examination of program impacts will contrast each experimental group (labor force attachment, human capital development) with the control group separately. We will carry out these contrasts separately within each site.

In examining program impacts, we first consider aggregate impacts. An aggregate impact reflects the difference, in a given site, between the average score on a particular measure for all of the families in one of the program groups, and all of the families in the control group. That is, in examining aggregate impacts we are asking whether, for a particular measure, there is a program impact for a research group as a whole, in a given site. In section III below, we describe analyses aimed at assessing whether program impacts occur in specified subgroups, in addition to or rather than for a research group as a whole, in each site.

We include in these analyses all of the families assigned at random assignment to the research groups of interest. Thus, for example, we consider all of the families assigned to the human capital development group whether or not they actually participated in basic education, job training, or employment activities, and contrast this group with all of the families assigned to the control group.(11) These group contrasts thus reflect, on the average, experiences of families in the different research groups in light of whether they were assigned to a JOBS program group, rather than according to their actual participation in program components.

All analyses of aggregate impacts will be reported separately by site and by program approach.(12) When the examination of impacts involves a continuous dependent variable (for example, children's scores on the assessment of cognitive school readiness), we have carried out ordinary least squares multiple regression. In these analyses, we examined each child outcome measure separately as a dependent variable, and included an experimental comparison "dummy" variable (i.e., either labor force attachment vs. control, or human capital development vs. control) as an independent variable to test program impacts. In each of these analyses we used a common set of covariates to improve the precision of the impact estimate by controlling for variation on background characteristics.(13) These covariates were chosen in communication with researchers at the Manpower Demonstration Research Corporation, so as to coordinate the present analyses of child outcomes with the analyses of economic outcomes at the two-year follow-up point being carried out with the larger NEWWS sample.

Where the examination of aggregate impacts involved a dichotomous child outcome variable (for example, in examining whether or not the focal child had any academic problems) rather than a continuous measure, the analysis was carried out using logistic regression. Again, each experimental group was contrasted with the control group separately; analyses employed the common set of covariates; each child outcome was examined in a separate analysis; and all analyses were carried out separately by site and program approach.

When we report that JOBS had a statistically significant impact on a child outcome, this indicates that the mean difference on a continuous outcome variable (for example, on the assessment of child cognitive development), or a difference in the proportion of children receiving a rating of one on a dichotomous variable (for example, the proportion of children with one or more academic problems), is unlikely to have arisen simply by chance. We will follow the convention of reporting an effect as statistically significant when data analyses indicate that there was a smaller than 10 percent probability that the finding could have arisen by chance, that is, reflected random variation in individuals' scores.

Tables reporting on aggregate impacts will note with a "+" superscript those effects that have less than a 10 percent probability of having arisen by chance. One asterisk will indicate effects that have less than a 5 percent probability of having arisen by chance, two asterisks will indicate a less than 1 percent probability, and three asterisks a less than one-tenth percent probability.(14)
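The table notation just described amounts to a simple mapping from probability values to markers, which can be written out as follows (an illustrative helper, not part of the original report):

```python
def significance_marker(p):
    """Map a p-value to the notation used in the impact tables:
    '***' for p < .001, '**' for p < .01, '*' for p < .05,
    '+' for p < .10, and no marker otherwise."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.10:
        return "+"
    return ""
```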

As noted above, this report will also examine impacts on children from the point of view of whether they are of sufficient magnitude for policy makers to consider when developing policy. Such a "policy relevant" impact was defined, statistically, as one in which the effect size was at least one-third of a standard deviation on a given measure.(15),(16) While the "harm hypothesis" directs us to identify unfavorable, and especially "policy relevant," program impacts on children, we acknowledge that policy relevant program impacts may occur in a positive as well as negative direction.
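The policy-relevance criterion can be expressed as a small helper (illustrative only, with hypothetical argument names): an impact qualifies when its effect size, the program-control difference divided by the control group's standard deviation, is at least one-third in absolute value, in either direction.

```python
def is_policy_relevant(program_mean, control_mean, control_sd,
                       threshold=1 / 3):
    """Return the effect size in control-group standard deviation
    units, and whether it meets the one-third SD policy-relevance
    criterion (favorable or unfavorable)."""
    effect_size = (program_mean - control_mean) / control_sd
    return effect_size, abs(effect_size) >= threshold
```

Because the standard deviation is taken from each site's control group (see note 15), the same one-third criterion corresponds to different absolute point differences across sites.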

In all discussions of impacts, the patterning of results (according to developmental domain, site, and/or program approach) will also be taken into account. Thus, while we present all statistically significant program impacts on children, we concentrate our discussion on impacts that show a distinct pattern, as well as impacts that are of sufficient magnitude to meet the criterion for policy relevance.

III. Examination of Subgroup Impacts at Each Site

As noted in Chapter 1, an important possibility is that JOBS will affect subgroups of families differently. In order to examine this possibility, we will go beyond the consideration of aggregate impacts to consideration of impacts for specified subgroups. Subgroups are delineated according to characteristics of the families at baseline and are categorized into "lower-risk" and "higher-risk" based on these variables. In an effort to minimize the number of subgroups examined and maximize the clarity of findings, information from ten baseline variables was drawn upon in creating higher and lower-risk subgroups of four different types. We refer to each approach to defining higher and lower-risk subgroups as a "risk composite" because each is based on multiple rather than individual baseline variables:

  • information on the number of children in the family and on the age difference between the focal and next oldest or next youngest child was used to create a "sibling constellation risk" composite;
  • items relating to depression and locus of control form a "maternal psychological well-being risk" composite;
  • information on mothers' educational attainment (i.e., at least a high school degree or GED), literacy, and numeracy was used to create an "educational risk" composite; and
  • information relating to practical barriers to employment (e.g., health problems, lack of transportation or child care), duration of welfare, and employment history was used to create a "work risk" composite.

Thus, for example, we ask whether JOBS programs had effects on children in families in which the mother was at higher and lower risk in terms of indicators of psychological distress at baseline; in which the mother was at higher and lower educational risk; in which the mother was at higher and lower work risk; and in families with more or more closely spaced children and with fewer or less closely spaced children. For each of the composite risk measures, families were categorized in a mutually exclusive way, as at either higher or lower risk. Families could, however, be categorized as at higher risk on more than one of the risk composites.

The particular baseline variables that formed the basis of the composite risk measures were chosen from the far longer list of available baseline variables on two grounds:

  • We hypothesized that mothers varying on these particular baseline variables might respond differentially to JOBS, which could in turn have implications for child impacts; and
  • These baseline variables have been documented to be important to the development of children.

Thus, for example, mothers showing few or many indicators of psychological distress at baseline might well differ in their ability to mobilize to respond to the requirements of JOBS. At the same time, there is ample evidence to indicate that maternal psychological distress is an important predictor of children's developmental outcomes (Downey and Coyne, 1990).

In addition to creating these composite risk measures, a summary index of cumulative risk was created, reflecting the number of composite risk factors at baseline for which a family was at higher risk. The risk summary score could range from 0 to 4, with a point assigned when:

  • the family was in the higher-risk category on the composite for sibling configuration risk (the child was in a family with three or more children at baseline, or in a family with two children born less than two years apart);
  • the family was in the higher-risk category on the composite for maternal psychological well-being risk (the mother received a score of at least two out of seven on a set of baseline indicators of depression and feelings of a lack of control over one's life);
  • the family was in the higher-risk category on the composite for work risk (the mother had received welfare for five or more years, or had no history of having worked full time for the same employer for six months or more, or reported at least four of seven barriers to employment);
  • the family was in the higher-risk category on the composite for educational risk (the mother had less than a high school diploma or GED; or scored in the lower levels of literacy, or scored in the lower levels of numeracy).

Families experiencing none or one of these baseline composite risks were considered to be at lower cumulative risk, while families with two to four of these composite risks were considered to be at higher cumulative risk.
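The cumulative risk index described above reduces to counting the composite risk indicators for which a family falls in the higher-risk category; a minimal sketch (illustrative, with hypothetical argument names):

```python
def cumulative_risk(sibling_risk, psych_risk, work_risk, edu_risk):
    """Sum the four baseline composite risk indicators (each
    True/False) and classify the family: 0-1 composites present =
    lower cumulative risk; 2-4 composites present = higher
    cumulative risk."""
    score = sum([sibling_risk, psych_risk, work_risk, edu_risk])
    category = "higher" if score >= 2 else "lower"
    return score, category
```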

In addition to the creation of higher and lower-risk subgroups in terms of sibling configuration, educational risk, work risk, maternal psychological well-being risk, and cumulative risk, we examined three further approaches to delineating risk on a more exploratory basis: age of child, maternal attitudes about working, and maternal attitudes toward school. While these constructs are theoretically important, there is less empirical evidence to suggest that they provide meaningful bases for identifying risk within the present sample. These variables allow us to distinguish among families in which the mother had more or fewer reservations about working (with more reservations about working hypothesized to reflect higher risk); families in which the mother had more or less positive attitudes toward school (with less positive attitudes about school hypothesized to reflect higher risk); and families in which the focal child was the median age or younger at baseline, or older than the median age at baseline (with younger child age hypothesized to reflect higher risk). As will be seen in Chapter 7, analyses of child outcome measures for control group families supported the use of only one of these more exploratory bases for grouping families as a risk measure: "attitudes toward work" risk. For this measure, but not the other more exploratory measures, children's scores in the three sites' control groups consistently went in a direction indicating less favorable development in the group hypothesized to be at greater risk.

Table A-1 (in Appendix A) provides the definition and sample sizes for each of the baseline subgroups at each of the research sites.

The examination of subgroup impacts focuses on effects within a particular higher or lower-risk subgroup. For example, we consider impacts on child outcome measures in the subgroup of families at higher educational risk. Within this baseline subgroup, we ask whether families in one of the experimental groups have mean or proportion scores on child outcome measures that differ significantly from the scores of families in the control group. We then ask the same question for the subgroup of families at lower educational risk. In the same way, we ask whether there is evidence of significant program impacts within the higher and lower-risk subgroups in terms of work risk, maternal psychological well-being risk, sibling configuration risk, cumulative risk, and the more exploratory approaches to delineating risk (especially "attitudes toward work" risk).

Apart from the delineation of a particular subsample as the focus of each subgroup impact analysis, we follow the same strategy here as was noted for aggregate impacts. For example, we use the same set of covariates in all analyses; we carry out ordinary least squares multiple regression or logistic regression in keeping with the nature of the outcome variable examined; and we report findings separately by site and program approach.

IV. Explanatory Analyses

Having identified the child outcomes for which there are significant aggregate impacts and impacts for specified subgroups, the focus of analysis will shift to the question of what underlies the program impact findings for children. In a modest set of non-experimental analyses, we will examine the pathways through which particular JOBS programs appear to have affected children using mediation analyses (Baron and Kenny, 1986). The first step requires identifying the child impacts that we wish to examine. We do not attempt to explain all significant program impacts on children; rather, for these mediational analyses, at least one aggregate impact in each developmental domain (i.e., cognitive development and academic achievement; behavioral and emotional adjustment; physical health and safety) was selected that generally illustrates the pattern of results for that domain. In order to conclude that a program impact on a targeted or non-targeted outcome helps to explain statistically, or "mediates," the same program's impact on a given child outcome, three conditions must hold (see Baron and Kenny, 1986): (1) the adult outcome must, itself, be affected by the JOBS program being considered; (2) the adult outcome must predict the child outcome (with the JOBS program dummy also in the model); and (3) with this adult outcome variable in the model, the previous impact of a JOBS program on the given child outcome must be smaller than without this variable in the model.

We should emphasize that, while we draw conclusions regarding the degree to which adult impacts appear to have led to impacts on children, the adult impacts we examine as possible mediators of program impacts on children were measured concurrently with children's outcomes; that is, both adult and child outcomes were measured at the two-year follow-up. Thus, any causal conclusions regarding the pathways through which children were affected by their mother's assignment to a JOBS welfare-to-work program must be made cautiously. Information from the five-year follow-up will allow us to examine the chronological nature of program impacts. This subsequent wave of data, combined with more rigorous statistical techniques that allow the direct testing of alternative hypotheses regarding pathways of program impacts, will improve our ability to identify the ways in which children were affected by JOBS welfare-to-work programs.


1.  Researchers in the behavioral sciences often rely on Cohen's (1988) characterization of effect sizes (in standard deviation units) of .20 as "small," .50 as "medium," and .80 as "large."

2.  Sixty-four cases were dropped because the focal child was not the respondent's biological or adoptive child. (There is only one adoptive child in the Child Outcomes Study sample.)

3.  Two children were too old, and one child was too young, to be focal children; child ages must have been incorrectly reported at baseline and their families should not have been selected for the Child Outcomes Study in the first place.

4.  A total of 69 families - all in Riverside - were dropped from the sample because they had moved 100 or more miles away from Riverside County.

5.  A total of 67 mothers reported living away from the focal child for at least three months at the time of the two-year follow-up.

6.  Despite the fact that not all families eligible and randomly assigned at baseline are contained in the sample for the present report, the "fidelity" of random assignment was maintained -- that is, there is no systematic difference between the experimental and control groups on baseline characteristics -- with one exception. In Riverside, among those identified as "in need" of basic education, those assigned to the labor force attachment program differed from those assigned to the control group on a few background characteristics. However, neither group can be considered uniformly more advantaged or less advantaged since, on some characteristics (e.g., prior employment), the control group appeared more advantaged, whereas on other characteristics (e.g., maternal psychological well-being), the LFA group appeared more advantaged. Moreover, these differences were controlled statistically in all impact analyses by including the variables on which these groups differed as covariates.

7.  A special "synthesis" report (Hamilton, with Freedman and McGroder, 2000) draws together the findings relating to any child in the family from the present Child Outcomes Study report, and these "any child in the family" items from the full NEWWS sample.

8.  Internal consistency reliability indicates the extent to which the individual items that make up a scale, all of which should reflect the same hypothetical underlying construct, are interrelated or "hang together" statistically. The measure used to reflect internal consistency reliability, Cronbach's alpha, has a possible range of 0 to 1.0, with higher scores indicating better internal consistency reliability.

9.  See the National Health Interview Survey, the National Health and Nutrition Examination Survey, the Rand Health Insurance Experiment, the Medical Outcomes Study, and the Child Health Questionnaire (Krause and Jay, 1994; Landgraf, Abetz, and Ware, 1996).

10.  Missing baseline data occurred on selected items from the Private Opinion Survey (POS), which measured clients' attitudes toward welfare, their psychological well-being, and the barriers to employment they faced. Because these baseline variables were important for impact analyses -- both as covariates and, in subgroup impact analyses, in defining baseline subgroups -- we imputed values where data were missing. In addition to relying on information regarding site in imputing these data, we selected other POS attitudinal variables to use as the basis for imputation, after examining which particular POS variables were most highly correlated with the variables for which we were imputing scores. Specifically, imputation was done based on data regarding site, JOBS office at random assignment, number of baseline risks, and high school degree status. The descriptive portrayal of families in Chapter 4 does not rely on imputed data.

11.  As we have noted, however, in the Riverside site, members of the human capital development group were contrasted only with members of the control group considered, at baseline, to be in need of basic education (i.e., those without a high school diploma or GED, who demonstrated lower levels of literacy, and/or were not proficient in English at baseline; Hamilton et al., 1997).

12.  Impact analyses were weighted to adjust for cohort differences in the assignment of clients to a treatment stream or to the control group (to preserve the experimental design), as well as to allow generalizations to populations from which the evaluation sample was drawn, namely, the county's AFDC-eligible population. Additional factors entering into the weighting were the number of JOBS offices (Riverside had more than one), high school/GED status, and cohort differences in Atlanta. Weights were decided upon in collaboration with researchers at MDRC, to assure common analytic approaches in the Child Outcomes Study and the NEWWS.

13.  Model covariates included were: marital status, number of children, race, mother's age, average AFDC benefit per month, number of months received AFDC in prior year, focal child's age and gender, high school diploma or GED, literacy, numeracy, time on welfare, work history, depressive symptoms, locus of control, sources of support, family barriers, and number of baseline risks.

14.  All tests of program impacts were "two tailed." That is, we did not begin with a hypothesis about direction of effects (for example, that scores for children in the human capital development group would be better than those in the control group), but rather considered the possibility of effects in either a positive or negative direction.

15.  Standard deviations were calculated separately for each site's control group(s), yielding a criterion for policy-relevance that is identical in a relative sense (i.e., .33 of a standard deviation) but that varies in an absolute sense, depending on the distribution of the measure in the particular site's control group.

16.  For example, on the Bracken Basic Concept Scale/School Readiness Composite, which ranges from 0 to 61, a difference as small as 3.6 points (in Atlanta), 3.8 points (in Grand Rapids), 4.2 points (for the impact of Riverside's LFA program), and 4.3 points (for the impact of Riverside's HCD program) -- representing about four school readiness concepts relating to colors, letters, numbers and counting, comparisons, and shapes -- would be considered policy relevant. As another example, regarding the proportion of focal children in "very good" or "excellent" health, a difference of at least 13.5 percentage points (in Atlanta), 13.1 percentage points (in Grand Rapids), 13.2 percentage points (for Riverside's LFA program), and 13.4 percentage points (for Riverside's HCD program) is considered policy-relevant. In fact, for dichotomous outcomes, one-third of a standard deviation actually represents a relatively large impact in absolute terms.