Chapter 1 addressed the question of why it is important to focus on the effects of JOBS programs on children. We turn now to the question of how child impacts were studied in the NEWWS Child Outcomes Study and, more specifically, in the two-year follow-up wave of the study.
We begin by providing the context for the two-year follow-up: an overview of the design of the National Evaluation of Welfare-to-Work Strategies (NEWWS) and of the NEWWS Child Outcomes Study embedded within it. We then describe the procedures and measures specific to the two-year follow-up survey of the Child Outcomes Study, noting especially how children's developmental outcomes were assessed. We conclude with an overview of our strategy for analyzing the child outcome data.
The focus here is on design, procedures, measures, and analysis strategy. Chapters 3 and 4 complement the present chapter. Chapter 3 describes the three study sites, including their demographic context and how the various welfare-to-work program strategies under the auspices of the JOBS Program were implemented at each site; Chapter 4 provides a detailed description of the study sample at each site.
Key Questions Addressed in Chapter 2
- What is the design of the NEWWS within which the Child Outcomes Study is embedded?
- What is the design of the NEWWS Child Outcomes Study?
- What are the procedures of the two-year follow-up survey in the NEWWS Child Outcomes Study, the source of much of the data for the present examination of child outcomes?
- How were child outcomes measured in the two-year follow-up survey?
- What is our strategy of analysis for examining the data on child outcome measures?
Design of the National Evaluation of Welfare-to-Work Strategies
As noted in Chapter 1, the Family Support Act of 1988 recommended an experimental evaluation of the impacts of the JOBS Program, that is, an evaluation in which families are randomly assigned to experimental and control groups. In such a design, it can be assumed (and documented) that families in the different research groups did not differ systematically in their background characteristics prior to random assignment. After random assignment, families in each site are exposed to the same broad context (for example, the same job market and local economy), apart from the experiences associated with the research group to which they were assigned. Because assignment is made by chance rather than on the basis of family background, the groups do not differ systematically at the outset; and because the families live in the same economic and social context within each site, statistically significant differences between the experimental and control groups detected at follow-up can be attributed to the differing experiences associated with assignment to those groups.
The National Evaluation of Welfare-to-Work Strategies, currently being conducted by the Manpower Demonstration Research Corporation (MDRC), follows such an experimental research design. The purpose of the NEWWS is to assess the impacts of various types of welfare-to-work strategies under the auspices of the JOBS Program on adult human capital and economic outcomes, including program effects on:
- employment stability;
- total family income (including earnings and benefits);
- receipt of welfare and other government benefits;
- government expenditures;
- maternal educational attainment; and
- maternal reading and math literacy skills.
Findings regarding adult human capital and economic impacts in three of the seven sites in the evaluation have been reported previously (Hamilton, Brock, Farrell, Friedlander, and Harknett, 1997), and findings on economic impacts in the full set of eleven programs in the seven sites are being released in parallel with the present report (Freedman, Friedlander, Hamilton, Rock, Mitchell, Nudelman, Schweder, and Storto, 2000). In addition to the study of program impacts, the NEWWS includes components examining the implementation of the JOBS Program in the different sites (Hamilton and Brock, 1994; Hamilton et al., 1997) and a cost-benefit analysis (see Hamilton et al., 1997).
In the three NEWWS sites in which the Child Outcomes Study is embedded, there are two experimental groups, each reflecting a different program approach within JOBS. The labor force attachment (LFA) approach emphasizes a quick transition into the labor force through job search activities, while the human capital development (HCD) approach emphasizes enhancing welfare recipients' skills through education and training, as a means to obtaining employment at higher wages and with better prospects for advancement. As noted in Chapter 1, these approaches represent differing views on how best to foster economic self-sufficiency among welfare recipients (Hamilton et al., 1997). The labor force attachment approach assumes that participating in the workplace is the best way to learn work behaviors and skills. The human capital development approach assumes that building "human capital" (skills related to employment) is an important step to take prior to employment, one that will help assure higher earnings and greater job stability. These three study sites thus use a "planned variation" research design, in which the outcomes of contrasting program approaches can be compared within each site. More detailed descriptions of the program approaches can be found in the publications reporting on the NEWWS thus far (Freedman et al., 2000; Hamilton and Brock, 1994; Freedman and Friedlander, 1995; Hamilton et al., 1997) and, for the particular sites included in the NEWWS Child Outcomes Study, in Chapter 3.
The NEWWS is being carried out in seven sites across the country. In three of the seven sites (Atlanta, Georgia; Grand Rapids, Michigan; and Riverside, California), families in the evaluation were randomly assigned to one of three research groups: one of the two program groups (the human capital development group or the labor force attachment group) or a control group. In a fourth site (Columbus, Ohio), two different forms of case management were contrasted with a control group: integrated case management for income maintenance and JOBS participation, and case management that addressed these functions separately. In three further sites (Detroit, Michigan; Oklahoma City and surrounding counties in Oklahoma; and Portland, Oregon), families were randomly assigned to one of only two groups: the site's pre-existing welfare-to-work program or the control group.
Once the income maintenance caseworker had determined that a welfare recipient or applicant was not exempt from legislatively mandated JOBS participation, the recipient was notified to report to a JOBS program orientation. In the Child Outcomes Study sites (Atlanta, Grand Rapids, and Riverside), random assignment occurred at the orientation. At that point, recipients were given a presentation about the evaluation, their basic reading and math skills were assessed, and they were asked to provide background information. Those who met the criteria for inclusion in the evaluation (listed below) were then randomly assigned to a research group. As described below, the random assignment process in Riverside included a further step that was not used in the other two sites.
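The logic of the three-group design can be illustrated with a minimal Python sketch. It is not the evaluation's actual assignment procedure: the group labels, the equal assignment probabilities, and the simulated baseline measure are all hypothetical. Because each family's group depends only on a random draw, the mean of a baseline characteristic should be similar across groups.

```python
import random
from statistics import mean

# Hypothetical labels for the three research groups in Atlanta, Grand Rapids, and Riverside.
GROUPS = ("LFA", "HCD", "control")


def randomly_assign(family_ids, seed=0):
    """Assign each family to a research group with equal probability (an assumption).

    The draw ignores every family characteristic, so the groups are
    comparable in expectation on all baseline traits, measured or not.
    """
    rng = random.Random(seed)
    return {fid: rng.choice(GROUPS) for fid in family_ids}


def baseline_means(assignment, baseline):
    """Mean of one baseline characteristic within each research group."""
    return {
        group: mean(baseline[fid] for fid, g in assignment.items() if g == group)
        for group in GROUPS
    }


if __name__ == "__main__":
    rng = random.Random(1)
    # Simulated baseline measure (e.g., years of schooling) for 3,000 hypothetical families.
    baseline = {fid: rng.gauss(11.0, 1.5) for fid in range(3000)}
    assignment = randomly_assign(baseline, seed=2)
    for group, m in baseline_means(assignment, baseline).items():
        print(f"{group}: {m:.2f}")  # the three group means should be close
```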
Families were considered eligible for inclusion in the NEWWS when they met the following criteria:
- They had applied for or were receiving Aid to Families with Dependent Children (AFDC) at the time of enrolling in the evaluation.
- They were not exempt from participation in the JOBS program, meaning that the recipient was not ill or incapacitated, caring for a household member who was ill or incapacitated, pregnant past the first trimester, or living in an area where program services were unavailable, and did not have a child younger than age three (or age one at state option, an option taken by three states with sites in the NEWWS, but affecting only one of the study sites in the Child Outcomes Study: Grand Rapids).
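The eligibility criteria above amount to a simple decision rule. The Python sketch below encodes them under stated assumptions: the field names are hypothetical, each exemption condition is reduced to a yes/no flag, and the state option is represented by a minimum child age of 1 instead of 3.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    """Hypothetical intake record; each field mirrors one criterion listed above."""
    applied_for_or_receiving_afdc: bool
    ill_or_incapacitated: bool
    caring_for_ill_household_member: bool
    pregnant_past_first_trimester: bool
    services_available_in_area: bool
    youngest_child_age_years: float


def is_exempt(a: Applicant, min_child_age: float = 3.0) -> bool:
    """True if any JOBS exemption applies; min_child_age=1.0 models the state option."""
    return (
        a.ill_or_incapacitated
        or a.caring_for_ill_household_member
        or a.pregnant_past_first_trimester
        or not a.services_available_in_area
        or a.youngest_child_age_years < min_child_age
    )


def eligible_for_newws(a: Applicant, min_child_age: float = 3.0) -> bool:
    """Eligible if applying for or receiving AFDC and not exempt from JOBS."""
    return a.applied_for_or_receiving_afdc and not is_exempt(a, min_child_age)


if __name__ == "__main__":
    mother = Applicant(True, False, False, False, True, youngest_child_age_years=2.0)
    print(eligible_for_newws(mother))                     # False: youngest child under 3
    print(eligible_for_newws(mother, min_child_age=1.0))  # True under the state option
```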
It is important to note a variation on the random assignment process that occurred only at the Riverside site and was necessitated by state-level program regulations. At this site, regulations required that a distinction be made between those deemed "in need of basic education" and those deemed "not in need of basic education." Individuals were considered in need of basic education if they met any one of the following conditions: they (1) did not have a high school diploma or General Educational Development (GED) degree, (2) had a low score (214 or below) on either the reading or the math component of the assessment (the GAIN Appraisal test), or (3) required remediation in English. In this additional step, unique to Riverside, the two groups of individuals went through separate random assignment processes. Those considered in need of basic education were randomly assigned to one of the three research groups, whereas those considered not in need of basic education could be randomly assigned only to the labor force attachment or control groups (see Hamilton et al., 1997).
As a result, when contrasts of research groups are carried out in the Riverside site, those in the human capital development group are compared to control group members who are likewise considered in need of basic education, whereas members of the labor force attachment group (who could be in need or not in need) are compared to all control group members. The use of a subset of the control group in the Riverside site for comparisons with the human capital development group is apparent in the program impact tables in Chapters 6, 7, and 9. The fact that all families in the human capital development group in Riverside were in need of basic education, whereas this was not the case for the human capital development groups in the other two sites, should be kept in mind when looking across the three sites at the findings for the human capital development program.
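The Riverside procedure and the resulting comparison samples can be summarized in a short Python sketch. It assumes hypothetical record fields (`has_diploma_or_ged`, `reading_score`, `math_score`, `needs_english_remediation`, `group`, `in_need`) and equal assignment probabilities within each allowed set of groups; only the 214-point cutoff and the group restrictions come from the text.

```python
import random

CUTOFF = 214  # GAIN Appraisal score at or below which a recipient is "in need"


def in_need_of_basic_education(has_diploma_or_ged, reading_score, math_score,
                               needs_english_remediation):
    """Any one of the three Riverside conditions is sufficient."""
    return (
        not has_diploma_or_ged
        or reading_score <= CUTOFF
        or math_score <= CUTOFF
        or needs_english_remediation
    )


def assign_riverside(in_need, rng):
    """Those not in need cannot be assigned to HCD (equal probabilities assumed)."""
    allowed = ("LFA", "HCD", "control") if in_need else ("LFA", "control")
    return rng.choice(allowed)


def comparison_samples(families):
    """Pair each program group with the control families it is compared to."""
    controls = [f for f in families if f["group"] == "control"]
    return {
        # HCD is compared only with controls who were also in need.
        "HCD": ([f for f in families if f["group"] == "HCD"],
                [f for f in controls if f["in_need"]]),
        # LFA is compared with all controls.
        "LFA": ([f for f in families if f["group"] == "LFA"], controls),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    print(in_need_of_basic_education(True, 230, 214, False))  # True: math at cutoff
    print(assign_riverside(False, rng) in ("LFA", "control"))  # True
```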
For mothers assigned to either the labor force attachment group or the human capital development group, participation in the activities of the JOBS Program was mandatory. That is, the mother was required to participate in JOBS program activities, or she faced the possibility of sanctioning (reduction in welfare benefits). Mothers in the control group, while eligible for Aid to Families with Dependent Children benefits, were not required to participate in any JOBS activities. Control group members were, however, free to seek out education and training programs in their communities of their own volition, and were guaranteed child care while participating in such approved activities, as required by the Family Support Act provisions.
It is important to note that the experimental evaluation of the JOBS program does not focus on the effects of participating in the JOBS program per se. Rather, it assesses the impact of assignment to a JOBS experimental group, and thus of exposure to program messages, services, and the mandate to participate. The evaluation carefully documents how many mothers in each experimental group participated in JOBS program activities, and also considers the implications of participation for the major outcomes. However, when each experimental group is contrasted with the control group in the analyses of program impacts, the experimental group includes everyone who was assigned to it, whether or not they actually participated in the program. That is, the experimental groups include individuals who did not participate in any JOBS activities despite the mandate to participate (and who might have been sanctioned as a result), as well as those who did participate.
For all sample members in the NEWWS, background information and attitudinal data are available from information collected just prior to random assignment (standard client characteristics and the Private Opinion Survey). Recipients were also given an assessment of reading and math skills immediately prior to random assignment. Because these data were collected before random assignment, they provide "baseline data" about the families, that is, data unaffected by assignment to a research group. In addition, administrative data are available for each family in the sample from county and state Aid to Families with Dependent Children records and from state unemployment insurance records.
While baseline and administrative data are available for all sample members in the NEWWS, a subset of the full evaluation sample is also participating in two follow-up survey waves: one conducted approximately two years after random assignment and another five years after random assignment. The client survey sample is a stratified random sample of the full evaluation sample; that is, survey participants were randomly selected, with certain subgroups systematically oversampled to permit analyses of those subgroups. Only respondents who spoke English or Spanish, and thus could be interviewed in one of these languages, were included in the client survey sample. Analyses of the client survey data are weighted to permit generalization to the full population of individuals eligible for the NEWWS at each site. The two- and five-year follow-up surveys provide maternal report measures on such issues as participation in, and perceptions of, educational and training activities; educational attainment; employment; earnings; receipt of benefits; and use of child care while the mother is employed.
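Because impacts are estimated for everyone assigned to a group, the basic impact estimate is an intent-to-treat contrast of group means. The minimal Python sketch below illustrates this with hypothetical records; the published analyses may also adjust for baseline characteristics, which this sketch omits.

```python
from statistics import mean


def itt_impact(records, program_group, control_group="control"):
    """Difference in mean outcomes between everyone assigned to program_group
    and everyone assigned to control_group.

    The 'participated' flag is deliberately ignored: non-participants
    (including any who were sanctioned) stay in their assigned group.
    """
    program = [r["outcome"] for r in records if r["group"] == program_group]
    control = [r["outcome"] for r in records if r["group"] == control_group]
    return mean(program) - mean(control)


if __name__ == "__main__":
    records = [
        {"group": "LFA", "participated": True, "outcome": 10.0},
        {"group": "LFA", "participated": False, "outcome": 6.0},
        {"group": "control", "participated": False, "outcome": 7.0},
        {"group": "control", "participated": False, "outcome": 5.0},
    ]
    print(itt_impact(records, "LFA"))  # 2.0
```

Dropping the non-participant from the LFA group would raise the estimate to 3.0, which would no longer be a comparison of randomly formed groups.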
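Weighting of this kind is often done by giving each respondent a design weight equal to the inverse of the respondent's probability of selection, so that oversampled subgroups do not dominate the estimates. The Python sketch below illustrates the idea with made-up selection probabilities and outcomes; the actual NEWWS weights may include further adjustments (for example, for nonresponse) that are not shown.

```python
def design_weight(selection_probability):
    """Inverse-probability weight: each respondent 'stands in' for 1/p eligible people."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


def weighted_mean(values, weights):
    """Weighted mean of survey values."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


if __name__ == "__main__":
    # Hypothetical: subgroup A oversampled (p = 0.50), subgroup B not (p = 0.25).
    values = [10.0, 10.0, 2.0, 2.0]
    probs = [0.50, 0.50, 0.25, 0.25]
    weights = [design_weight(p) for p in probs]
    print(sum(values) / len(values))                 # unweighted: 6.0
    print(round(weighted_mean(values, weights), 3))  # weighted: 4.667
```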
Design of the NEWWS Child Outcomes Study
The JOBS program departed from earlier welfare-to-work programs in that it mandated the participation of parents with children as young as age three (or younger at state option). Previous welfare-to-work programs were often voluntary, and had focused their attention on mothers with school-age children. In the context of the JOBS Program, preschool-age children were expected to be particularly likely to experience changes in their daily routines and child care situations. The NEWWS Child Outcomes Study, being carried out by Child Trends under subcontract to the Manpower Demonstration Research Corporation, was launched as a special substudy within the larger NEWWS, in order to study whether and how the development of preschool-age children was affected over time when their mothers were assigned to a JOBS program.
As noted in Chapter 1, there are reasonable bases for hypothesizing quite divergent program impacts for children (ranging from negative impacts, to neutral impacts, to positive impacts, or to impacts only for specified subgroups). Thus, the NEWWS Child Outcomes Study does not begin with a specific hypothesis about the direction of effects on children, but rather seeks to document the full range of impacts, both in the aggregate and for specified subgroups. At the same time, our examination of child impacts gives priority to assessing whether the JOBS Program had unfavorable impacts on children (the "harm hypothesis"). For policy makers, two important criteria for assessing the JOBS Program are whether it had positive effects on family economic self-sufficiency and whether, at a minimum, it did not harm children.
In the present study, we report all program impacts on children that are statistically significant. Such impacts are reliable in the sense that they are very unlikely to have occurred by chance alone, and they therefore warrant continued monitoring. In the Child Outcomes Study, we will especially want to monitor whether the measures on which statistically significant impacts were found at the two-year point continue to show differences at the final follow-up (five years after the families enrolled in the evaluation), and whether such differences grow in magnitude.
We also report whether a statistically significant result meets a further criterion: "policy relevance." At the start of the study and as it proceeded, researchers and policy makers met to grapple with the question of the point at which child impact findings should be taken into account in policy deliberations. It was decided that statistically significant findings of a particular magnitude should be considered relevant to policy discussions: specifically, statistically significant child impacts of at least a third of a standard deviation.
This threshold sets aside impact findings that, while statistically reliable and worth monitoring over time, are so small that they may at this point have limited importance for children's development. At the same time, the threshold for policy relevance does not require that an impact be large in magnitude(1) in order to meet the criterion. By setting the threshold in this way, we can be reasonably confident that we are being inclusive in identifying instances of possible harm (as well as possible beneficial effects on children), without focusing on effects so small as to be of limited importance for children's development.
In presenting results, we go beyond consideration of significant and policy relevant effects to discuss the patterning of findings. We also identify those impacts for which effect sizes substantially exceeded the threshold for policy relevance, in that effect sizes were .50 or larger. The strongest evidence on which to base conclusions about impacts on children is a consistent patterning of impact results, particularly when impacts meet or exceed the criterion for policy relevance. A patterning of results, for example, might show consistently favorable impacts for families in a particular site, or a particular program approach. A patterning of results might also pertain to a type of child outcome, with findings in one aspect of development (such as health) consistently affected favorably (or unfavorably) across programs.
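The decision rules in the preceding paragraphs can be expressed as a small classification function. This Python sketch assumes that the effect size is the impact divided by a standard deviation of the outcome (the text does not specify which standard deviation; the control group's is a common choice) and uses a 0.05 significance level purely for illustration, since this section does not state the level used.

```python
POLICY_THRESHOLD = 1 / 3   # "a third of a standard deviation"
LARGE_THRESHOLD = 0.50     # effect sizes that substantially exceed the threshold


def effect_size(impact, standard_deviation):
    """Impact expressed in standard-deviation units."""
    return impact / standard_deviation


def classify_impact(impact, standard_deviation, p_value, alpha=0.05):
    """Label an impact using the report's criteria; alpha is an illustrative assumption."""
    if p_value >= alpha:
        return "not significant"
    size = abs(effect_size(impact, standard_deviation))  # harm and benefit treated alike
    if size >= LARGE_THRESHOLD:
        return "policy relevant, large"
    if size >= POLICY_THRESHOLD:
        return "policy relevant"
    return "significant, monitor"


if __name__ == "__main__":
    for impact in (2.0, 4.0, 6.0):
        print(impact, classify_impact(impact, standard_deviation=10.0, p_value=0.01))
```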
The NEWWS Child Outcomes Study is being carried out in three of the seven sites of the full evaluation. These sites -- Atlanta (Fulton County), Georgia; Grand Rapids (Kent County), Michigan; and Riverside (Riverside County), California -- were chosen because each conducted at least one round of random assignment at the JOBS office and involved a contrast of all three research groups (labor force attachment, human capital development, and control groups), and because together they permit an examination of the JOBS Program as implemented in different regions of the country (with differing populations and differing economic and social contexts). Chapter 3 discusses the site characteristics and describes how the JOBS Program was implemented in each of the three sites of the NEWWS Child Outcomes Study.
The NEWWS Child Outcomes Study is "embedded" within the larger NEWWS; that is, each of the families in the Child Outcomes Study completed the procedures of the full evaluation, including the component of the full evaluation that involved collection of survey data. Thus, we have baseline data, administrative data, and the two-year follow-up survey data from the full evaluation for these families, and we will eventually have five-year follow-up surveys as well.
For the families participating in the NEWWS Child Outcomes Study, the two- and five-year follow-up surveys are more extensive than for other families in the survey sample, including extra sections focusing on the development of the child and on aspects of family life and child care that may be important to child outcomes. In addition, at the time of the five-year follow-up, families in the NEWWS Child Outcomes Study are asked for permission to contact the focal child's primary teacher, so that the teacher can be asked to complete a mailed questionnaire concerning the child's academic progress and behavior in school (the "Children's School Progress Survey").
In order to be eligible for inclusion in the NEWWS Child Outcomes Study, families participating in the NEWWS in the Atlanta, Grand Rapids, and Riverside evaluation sites had to meet these additional criteria:
- Each family had to have a child aged between about 3 and 5 years at the time of enrollment, who served as the focal child, that is, the child focused upon in the evaluation. If the family had more than one child in this age range, one was randomly selected to be the focal child.
- Single fathers were excluded.
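Selecting the focal child is a small random-selection step. The Python sketch below assumes a hypothetical list of child ages in years and interprets "about 3 and 5" as ages from 3.0 up to, but not including, 6.0; the study's exact age window is not given here.

```python
import random


def select_focal_child(child_ages, rng, min_age=3.0, max_age=6.0):
    """Return the index of the focal child, or None if no child is in the age window.

    The window [min_age, max_age) is an assumed reading of "between about 3 and 5".
    When several children qualify, one is chosen at random with equal probability.
    """
    eligible = [i for i, age in enumerate(child_ages) if min_age <= age < max_age]
    if not eligible:
        return None  # family would not be eligible for the Child Outcomes Study
    return rng.choice(eligible)


if __name__ == "__main__":
    rng = random.Random(42)
    print(select_focal_child([1.5, 4.0, 7.2], rng))  # 1: only the 4-year-old qualifies
    print(select_focal_child([3.5, 5.1], rng))       # 0 or 1, chosen at random
```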
In all, 5,905 families were identified as eligible for inclusion in the NEWWS Child Outcomes Study. Of these, 3,670 families were selected to be interviewed for the two-year follow-up. Overall, a total of 3,194 (or 87 percent) of selected families completed the two-year follow-up survey, with response rates ranging from 80 percent (in Riverside) to 91 percent (in Atlanta and Grand Rapids).
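The sample counts in this paragraph can be checked with simple arithmetic. This Python snippet recomputes the overall response rate and, as an added illustration, the share of eligible families selected for the interview; site-level counts are not given here, so the site response rates are not recomputed.

```python
def percent(part, whole):
    """Part as a percentage of whole."""
    return 100.0 * part / whole


ELIGIBLE = 5905   # families eligible for the Child Outcomes Study
SELECTED = 3670   # families selected for the two-year interview
COMPLETED = 3194  # families completing the two-year follow-up survey

if __name__ == "__main__":
    print(f"selected: {percent(SELECTED, ELIGIBLE):.1f}% of eligible families")  # 62.2%
    print(f"response rate: {percent(COMPLETED, SELECTED):.0f}%")                 # 87%
```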
Four further criteria were established in order for families to be included in the analyses of the two-year follow-up data for the Child Outcomes Study:
- The focal child had to be the biological or adoptive child of the mother participating in the evaluation.(2)
- In order to examine developmental outcomes for children within a particular age range, we excluded families from the sample in which the focal child was beyond 99 months (i.e., eight years, three months) of age at the time of the two-year follow-up.(3)
- Families were not included in the two-year follow-up study if the interviewer would have had to travel 100 miles or more to complete the interview in the family's home. Instead, telephone (rather than in-home) interviews were conducted with families who had moved 100 miles or more away, when the families could be located for such interviews. However, these telephone interviews included only the "core" sections of the interview and not the sections specific to the Child Outcomes Study. Because these additional sections (including the assessment of the child's cognitive development) are essential to the examination of child impacts, the Child Outcomes Study analysis sample was restricted to families with in-home interviews that included the Child Outcomes Study modules.(4)
- Families were not included in the two-year follow-up Child Outcomes Study data analyses if the interview data indicated that the mother and child had not seen each other for the last three months or more. The mother was the source of some of the child outcome measures and also reported on activities that she engaged in with the child (for example, reading with the child). A separation of three months or more would limit the basis on which the mother could make her assessments of child behavior and report on her involvement in activities with the child.(5)
A total of 176 respondents to the two-year follow-up survey (5.5 percent) were dropped from the Child Outcomes Study analysis sample. Thus, the sample for the present analyses of the NEWWS Child Outcomes Study includes a total of 3,018 families. Of these families, 1,422 are from the Atlanta site of the evaluation, 646 are from the Grand Rapids site, and 950 are from the Riverside site. Chapter 4 describes the characteristics of these families at the time they entered the NEWWS.(6)
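The four analysis-sample criteria can be applied as a single filter, and the resulting counts reconciled. In this Python sketch the record fields are hypothetical; the thresholds (99 months, three months of separation) and the counts come from the text.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    """Hypothetical two-year follow-up record for one family."""
    biological_or_adoptive_child: bool
    child_age_months_at_followup: int
    in_home_interview_with_cos_modules: bool
    months_mother_child_apart: int  # recent months in which mother and child had not seen each other


def in_analysis_sample(r: Respondent) -> bool:
    """Apply the four Child Outcomes Study analysis criteria."""
    return (
        r.biological_or_adoptive_child
        and r.child_age_months_at_followup <= 99
        and r.in_home_interview_with_cos_modules
        and r.months_mother_child_apart < 3
    )


RESPONDENTS, DROPPED = 3194, 176
SITE_COUNTS = {"Atlanta": 1422, "Grand Rapids": 646, "Riverside": 950}

if __name__ == "__main__":
    analysis_n = RESPONDENTS - DROPPED
    assert analysis_n == sum(SITE_COUNTS.values()) == 3018
    print(f"dropped: {100 * DROPPED / RESPONDENTS:.1f}%")  # 5.5%
    print(in_analysis_sample(Respondent(True, 100, True, 0)))  # False: child over 99 months
```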
For the present report, which focuses on child outcomes at the time of the two-year follow-up, we will rely upon data from the following sources:
- Baseline data: the information on standard client characteristics (collected by welfare staff during routine intake interviews with clients prior to random assignment, including information on such issues as AFDC history, educational background, marital status); the Private Opinion Survey (a client-completed survey of opinions, attitudes, and psychological well-being); and direct assessments of maternal reading and math literacy.
- Administrative data: automated county and state administrative records provide data on earnings, employment, and welfare receipt.
- Two-year follow-up survey: Participants in the NEWWS Child Outcomes Study received both the "core" interview given to all those receiving the two-year follow-up client survey in the full evaluation sample and a 20-minute interview specific to the Child Outcomes Study.
Core component. The core interview provides us with maternal report measures of participation in education and training programs, educational attainment, employment, earnings, benefits, and child care use while the mother was employed. Some measures in the core interview also concern the well-being of all of the children in the family. Respondents who were determined at baseline to be in need of education, and who were in the human capital development group or the control group, completed math and literacy tests as part of the two-year follow-up as well.
Child Outcomes Study component. The component of the interview specific to the Child Outcomes Study is the source of maternal report measures of the focal child's health and social development, as well as of a direct assessment of the focal child's cognitive school readiness. The specific child outcome measures are described below. This component of the survey also provides maternal report and interviewer rating measures of the home environment and of the mother-child relationship, and maternal report measures of the focal child's child care participation, the mother's psychological well-being, household composition, and the receipt of child support and the child's contact with the father.
While the present report focuses on outcomes two years after random assignment in all three of the sites included in the NEWWS Child Outcomes Study, we note that two special studies have been conducted that involve a subset of the Child Outcomes Study sample specifically in the Atlanta site. In order to provide a descriptive portrayal of families with young children close to the start of the evaluation, 790 families in the Atlanta site participated in an additional wave of data collection called the Descriptive Survey, on average three months after baseline. A descriptive account of these families and of the children's development shortly after the start of the evaluation is presented in a report entitled How Well Are They Faring? AFDC Families with Preschool-Aged Children in Atlanta at the Outset of the JOBS Evaluation (Moore, Zaslow, Coiro, Miller, and Magenheim, 1995).
A second special study, the JOBS Observational Study, is also being conducted in the Atlanta site among a subsample of families from the Descriptive Survey. This study is supported by a consortium of private and public funders including the Foundation for Child Development, the William T. Grant Foundation, the George Gund Foundation, an anonymous funder, and (for pretest work only) the U.S. Department of Health and Human Services. The study seeks to provide detailed and sensitive measures of mother-child interaction at two points in time: soon after baseline (4-6 months), and several years after baseline, when longer-term effects of the program can be assumed to have occurred (4 ½ years after baseline). The goals of this study are to ask whether the JOBS program affects mother-child interaction during the early months of the program and at a point in time when longer-term adaptations to the program have been made; to examine the role of parenting behavior in shaping any impacts of JOBS on children's development; and to assess the contributions of different approaches to measuring parenting behavior (observational; interview-based) in the context of an evaluation study.
Procedures of the Two-Year Follow-Up Survey
All families participating in the survey waves of the NEWWS, including the families in the Child Outcomes Study, were told at the time of random assignment that they would be contacted for follow-up interviews. Families were then sent letters when it was time to contact them for the two-year follow-up survey. Interviewers set up appointments for the follow-up survey either by contacting respondents by phone or by going directly to the respondent's home (for example, because the respondent did not have a telephone, or because the interviewer had difficulty reaching the respondent by phone). Interviews were conducted in the respondents' homes. The in-home survey, including the core as well as the component specific to the Child Outcomes Study, lasted about one and a half hours (ranging from half an hour to four and a half hours). During the visit:
- mothers responded to a series of interviewer questions and also completed some questions in a self-administered questionnaire format;
- mothers who had been considered in need of education at the time of random assignment, and who were in the human capital development or control groups of the evaluation, were given an assessment of reading and math literacy;
- children were given a direct assessment of cognitive school readiness, the Bracken Basic Concept Scale/School Readiness Composite; and
- interviewers completed a series of ratings about the home environment, about mother-child interactions, and about the circumstances of the interview (e.g., whether the interview had been interrupted).
Respondents were given a $20 incentive for their participation, and the focal children in the study were also given a small gift (a travel Etch-a-Sketch).
In fielding the study, efforts were made to recruit interviewers from ethnic/racial backgrounds similar to those of the respondents in each site, and all interviewers were female. Bilingual interviewers (Spanish-English) were available in the Grand Rapids and Riverside sites, and all survey instruments were translated into Spanish (with checks on the translation carried out via oral back-translation). If a respondent in one of these sites indicated a preference for the interview to be conducted in Spanish, this was done. The child assessment, the Bracken Basic Concept Scale/School Readiness Composite, was also available in a Spanish version. An adult respondent who did not speak English was not asked to complete the literacy and math tests, because these involved assessing these skills as they would be used in an English-speaking context.
All interviewers participated in a three-day training meeting in which they received instruction in administering the survey modules, as well as intensive training in administering the child cognitive assessment and the adult literacy assessment and in completing the ratings of the home environment and mother-child interaction. Training for the ratings of mother-child interaction used a training videotape, and training for ratings of the home environment used photographs of home settings as exemplars.
Data quality was monitored intensively during the first months of fielding through detailed review of completed surveys (including assessments and ratings) in each of the three sites by staff from Child Trends, Manpower Demonstration Research Corporation, and the fielding organization, Response Analysis Corporation. Interviewers were contacted directly regarding any problems in their administration of the follow-up. After the initial months of fielding, ongoing quality control involved review of key questionnaire items and sections on every interview. If information was missing or inconsistent with data provided elsewhere in the interview, staff from Response Analysis Corporation re-contacted the respondent directly or had the original interviewer re-contact the respondent. Discussions occurred periodically between the fielding organization and the research staff at Child Trends regarding any particular child-related questions or issues that emerged during fielding. Response Analysis Corporation also verified the completion of 20-40 percent of each interviewer's sessions.
Child Outcome Measures Included in the Two-Year Follow-Up
The two-year follow-up included measures of children's development in three broad aspects, or "domains," of development:
- cognitive development and academic achievement;
- emotional and behavioral adjustment, including both behavior problems and positive social behavior; and
- physical health and safety.
We note that some measures of development are available only for the focal child in each family, while others pertain to any child in the family. Below, we briefly describe the measures used to assess development in each of these domains. A number of the measures pertaining to any child in the family come from the core instrument, and thus were asked of all families in the full NEWWS sample participating in the survey component.(7) Other measures central to our analysis of child outcomes are described in detail in other chapters. In particular, Chapter 8 lists and describes the measures we use in assessing how child impacts come about (i.e., asking whether variables such as total family income, maternal depression, and participation in child care changed in response to JOBS, and whether changes in these variables help explain impacts on child outcomes).
I. Measures of Cognitive Development and Academic Achievement
- The Bracken Basic Concept Scale/School Readiness Composite (BBCS/SRC) (focal child only): This is a direct assessment of school readiness administered to the focal child during the two-year follow-up survey. While the full Bracken Basic Concept Scale consists of 11 subtests, in the Child Outcomes Study we use only the 5 subtests comprising the School Readiness Composite (for which administration averaged about 15 minutes in the present study). These 5 subtests assess the child's knowledge of colors, letters, numbers/counting, comparisons, and shapes. Previous research provides evidence of reliability as well as validity for the Bracken Basic Concept Scale. The New Chance Evaluation, a study examining the development of the young children of teenage mothers on welfare, found the School Readiness Composite to be highly correlated with the full Bracken scale, and also reported significant correlations between the School Readiness Composite and teacher reports of the children's academic progress (Polit, 1996). Within the NEWWS Child Outcomes Study sample, internal consistency reliability(8) for the School Readiness Composite was high (.97). When describing the cognitive school readiness of focal children in the control groups of the Child Outcomes Study and comparing these children to the national standardization sample (see Chapter 5), we will report on BBCS/SRC standardized scores. When presenting program impacts, we will report on focal children's raw BBCS/SRC scores, indicating the number of concepts (out of a possible 61) that the child answered correctly. We used raw scores in impact analyses in order to describe experimental-control group differences in meaningful units (i.e., the number of concepts answered correctly). 
Because raw scores are not adjusted for age, and older children tend to answer more concepts correctly than younger children, the focal child's age was controlled in all impact analyses involving raw BBCS/SRC scores.
In this report, we present mean BBCS/SRC scores, as well as the proportions of children scoring at the "low" and "high" ends of the score distribution. Specifically, we determined that the bottom quartile of the national standardization sample had an age-standardized score of 7.8 or lower; children in the Child Outcomes Study sample with age-standardized scores of 7.8 or lower were therefore categorized as scoring "low" on the BBCS/SRC. In parallel fashion, we determined that the top quartile of the national standardization sample had an age-standardized score of 12.2 or higher; children in the Child Outcomes Study sample with age-standardized scores of at least 12.2 were therefore categorized as scoring "high" on the BBCS/SRC.
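The low/high classification just described amounts to a pair of threshold rules on the age-standardized score. The sketch below is illustrative only: the cutoffs 7.8 and 12.2 are the ones quoted above, while the function names, the toy scores, and the "middle" label for children falling between the cutoffs are hypothetical.

```python
# Cutoffs quoted in the text, expressed as age-standardized BBCS/SRC scores:
# bottom quartile of the national standardization sample at 7.8 or lower,
# top quartile at 12.2 or higher.
LOW_CUTOFF = 7.8
HIGH_CUTOFF = 12.2


def classify_bbcs(standard_score: float) -> str:
    """Label an age-standardized BBCS/SRC score as 'low', 'high', or 'middle'.

    'middle' is a hypothetical label for scores between the two cutoffs.
    """
    if standard_score <= LOW_CUTOFF:
        return "low"
    if standard_score >= HIGH_CUTOFF:
        return "high"
    return "middle"


def share_in_category(scores, category: str) -> float:
    """Proportion of children whose score falls in the given category."""
    labels = [classify_bbcs(s) for s in scores]
    return labels.count(category) / len(labels)


scores = [6.0, 7.8, 10.0, 12.2, 14.0]  # illustrative values, not study data
print(share_in_category(scores, "low"), share_in_category(scores, "high"))
```

Note that both cutoffs are inclusive ("or lower", "at least"), which is why the comparisons use `<=` and `>=`.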
- Academic Problems in School Rating (focal child/any child in the family): This is a dichotomous (yes/no) measure, calculated separately for the focal child and for any child in the family. A "yes" indicates the presence of one or both of the following academic problems in school: (1) the focal child/any of the children in the family has repeated a grade for any reason; (2) the focal child/any of the children in the family goes to a special class or special school or gets special help in school for learning problems. A score of zero indicates that neither problem has occurred, while a score of one indicates that at least one has occurred.
In previous research, it has proven useful to have both measures of children's cognitive development obtained via direct assessment, and measures of children's academic progress/difficulties. For example, evaluations of early childhood interventions sometimes show enduring impacts on measures like those included here (retention in grade and use of special services for help with learning problems) even in instances where initial positive program impacts on assessments of cognitive development have not been shown to endure over time (Barnett, 1995; Entwistle, 1995). Because of previous evidence that programs can have differential impacts on assessments of cognitive development and measures of school progress, both kinds of measures are included here.
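The Academic Problems in School Rating combines two yes/no items with a logical "or," and is then computed once for the focal child and once across all children in the family. The following is a minimal sketch under the assumption that each child's answers are stored in a per-child record; the `ChildReport` layout and function names are hypothetical, not the study's data format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ChildReport:
    """Hypothetical per-child record built from the mother's survey answers."""
    is_focal: bool
    repeated_grade: bool
    special_help_for_learning: bool


def academic_problem(child: ChildReport) -> int:
    """1 if the child repeated a grade and/or gets special help for learning problems, else 0."""
    return int(child.repeated_grade or child.special_help_for_learning)


def academic_problem_flags(children: List[ChildReport]) -> Dict[str, int]:
    """Compute the rating separately for the focal child and for any child in the family."""
    focal = next(c for c in children if c.is_focal)
    return {
        "focal_child": academic_problem(focal),
        "any_child": int(any(academic_problem(c) for c in children)),
    }


family = [
    ChildReport(is_focal=True, repeated_grade=False, special_help_for_learning=False),
    ChildReport(is_focal=False, repeated_grade=True, special_help_for_learning=False),
]
print(academic_problem_flags(family))  # {'focal_child': 0, 'any_child': 1}
```

The same "or" pattern would apply to the other dichotomous focal-child/any-child measures in this chapter, such as the Emotional Problems Rating.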
II. Measures of Behavioral and Emotional Adjustment
- Behavior Problems Index (focal child only): In the Behavior Problems Index the mother is asked to indicate whether statements are not true, sometimes true, or often true about her child. These describe such behaviors as: the child is high strung, tense or nervous; the child cheats or tells lies; the child has trouble getting along with other children. Previous work with the Behavior Problems Index indicates high internal consistency reliability. Further, this measure discriminates between children who have and have not received clinical treatment (Zill, 1985).
In the present analyses, we use a total score and subscale scores for externalizing behavior problems (such as bullying, lying, and cheating) and internalizing behavior problems (such as feeling unhappy, sad, or depressed, and feelings of worthlessness). The choice of these particular subscales was made in light of factor analyses with the Behavior Problems Index items in the present sample, and also in light of previous work by Peterson and Zill (1986). In the present sample, internal consistency reliability coefficients were .89 for the total scale; .61 for the externalizing subscale; and .64 for the internalizing subscale.
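Coefficient alpha, the internal consistency statistic named explicitly for the PCBS/SCS below, can be computed from a respondents-by-items matrix of responses. The sketch below assumes the standard formula; the item coding and the toy data are hypothetical and are not drawn from the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Coefficient alpha for a respondents-by-items matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("coefficient alpha needs at least two items")
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: three respondents answering two items.
print(round(cronbach_alpha([[0, 0], [1, 2], [2, 1]]), 3))  # 0.667
```

Alpha rises as items covary more strongly relative to their individual variances; items that move in perfect lockstep give an alpha of 1.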
In this report, we present means on these three measures, as well as the proportions of children scoring at the "low" and "high" ends of their distributions, based on a national sample of five- to seven-year-old children. Specifically, we determined that the 25th percentile of total behavior problem scores for five- to seven-year-old children in the full National Longitudinal Survey of Youth-Child Supplement (NLSY79-CS) sample was 0.18, and we categorized Child Outcomes Study children as having "infrequent" total behavior problems if they scored 0.18 or lower on the BPI. Similarly, we determined that the 75th percentile of total behavior problem scores for five- to seven-year-old children in the full NLSY79-CS sample was 0.54, and we categorized Child Outcomes Study children as having "frequent" total behavior problems if they scored 0.54 or higher.
With respect to externalizing behavior problems, we determined that the 25th percentile of scores for five- to seven-year-old children in the full NLSY79-CS sample was 0.20, and we categorized Child Outcomes Study children as having "infrequent" externalizing behavior problems if they scored this low or lower. Similarly, we determined that the 75th percentile of externalizing behavior problem scores for five- to seven-year-old children in the full NLSY79-CS sample was 0.60, and we categorized Child Outcomes Study children as having "frequent" externalizing behavior problems if they scored 0.60 or higher.
Finally, for internalizing behavior problems, we determined that 55 percent of five- to seven-year-old children in the full NLSY79-CS sample were reported to have no internalizing behavior problems; accordingly, we categorized Child Outcomes Study children as having "infrequent" internalizing behavior problems if they had none. We determined that the 75th percentile of internalizing behavior problem scores for five- to seven-year-old children in the full NLSY79-CS sample was 0.20, and we categorized Child Outcomes Study children as having "frequent" internalizing behavior problems if they scored 0.20 or higher.
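The three sets of BPI cutoffs above can be collected into one lookup table. This sketch assumes BPI scores are non-negative, so that "no internalizing behavior problems" corresponds to a score of exactly 0; the table layout, the function name, and the "middle" label for scores between the cutoffs are hypothetical.

```python
# Cutoffs quoted in the text (NLSY79-CS, five- to seven-year-olds).
# Each entry: (upper bound for "infrequent", lower bound for "frequent").
BPI_CUTOFFS = {
    "total": (0.18, 0.54),
    "externalizing": (0.20, 0.60),
    # For internalizing problems, the low end is defined as having none at all
    # (ASSUMPTION: scores are non-negative, so "none" means a score of 0).
    "internalizing": (0.0, 0.20),
}


def classify_bpi(scale: str, score: float) -> str:
    """Return 'infrequent', 'frequent', or the hypothetical label 'middle'."""
    low, high = BPI_CUTOFFS[scale]
    if score <= low:
        return "infrequent"
    if score >= high:
        return "frequent"
    return "middle"


print(classify_bpi("total", 0.30), classify_bpi("internalizing", 0.0))  # middle infrequent
```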
- Positive Child Behavior Scale, Social Competence Subscale (PCBS/SCS) (focal child only): The Positive Child Behavior Scale was used in the present evaluation, as in other evaluation studies, to ensure that program effects on positive social behaviors (and not only effects on problem behaviors) could be assessed. The scale was developed by Polit for the New Chance Evaluation (Polit, 1996), using items modified from existing scales to be appropriate for a sample of disadvantaged mothers. As with the Behavior Problems Index, the mother is asked to indicate whether behaviors are not true, somewhat true, or often true of her child. Seven items from the Social Competence Subscale developed by Polit were included in the Child Outcomes Study. Examples of the behavioral descriptions in these items are: the child is helpful and cooperative; shows concern for other people's feelings; is admired and well-liked by other children. Internal consistency reliability of this seven-item version of the subscale within the present sample is strong (coefficient alpha of .77). In addition, as would be expected, scores on this subscale are significantly negatively correlated with the total score on the Behavior Problems Index (r=-.29, p<.001), though the magnitude of the correlation is moderate rather than high, suggesting that the two measures are related but not highly overlapping.
In this report, we present means on this measure, as well as the proportions of children scoring at the "low" and "high" ends of the PCBS/SCS distribution, based on estimated scores for a national sample of five- to seven-year-old children. Because national data on the PCBS/SCS are not available, we calculated z-scores on the Behavior Problems Index, identified the corresponding raw scores on the PCBS/SCS, and categorized Child Outcomes Study children as "low" or "high" in terms of positive behaviors if their mean scores fell at or below the low cutoff or at or above the high cutoff. Specifically, Child Outcomes Study children with a mean PCBS/SCS score of 1.28 or lower were categorized as having "infrequent" positive behaviors, and children with a mean PCBS/SCS score of 1.85 or higher were categorized as having "frequent" positive behaviors.
- Emotional Problems Rating (focal child/any child in family): This is a dichotomous (yes/no) measure. A "yes" indicates the presence of any one or more of the following, according to maternal report: (1) the focal child/any child in the family is currently getting help for any emotional, mental or behavioral problem; (2) since random assignment in the evaluation, the mother feels or someone has suggested that the focal child/any child in the family needs help for any emotional, mental or behavioral problem; (3) the focal child/any child in the family goes to a special class or special school or gets special help for behavioral or emotional problems. A score of 1 indicates that the focal child/any child has had at least one of these problems, while a score of 0 indicates that none of these is true. As an indication of convergent validity for this measure, the Emotional Problems Rating for the focal child was significantly and positively correlated with the total score on the Behavior Problems Index (r=+.34, p<.001).
- Suspended/Expelled from School (focal child/any child in family): This measure is based on a single maternal response item, recorded separately for the focal child and regarding all the children in the family: "Have any of your children ever been suspended, excluded or expelled from school? Was that [the focal child] or another child?" A score of 0 indicates that the focal child/any child in the family has not been suspended, excluded or expelled from school, while a score of 1 indicates that this has occurred. As an indication of convergent validity for this measure, for focal children there was a significant positive correlation between Suspended/Expelled and the Emotional Problems Rating (r=+.15, p<.001).
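The convergent-validity figures above are ordinary Pearson correlations in which one or both variables are 0/1 ratings; when one variable is dichotomous the Pearson formula is the point-biserial correlation, and no special formula is needed. A minimal sketch with made-up data (significance testing omitted; variable names hypothetical):

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: Emotional Problems Rating (0/1) and BPI total score.
emotional_problems = [0, 0, 0, 1, 1]
bpi_total = [0.10, 0.30, 0.20, 0.50, 0.40]
print(round(pearson_r(emotional_problems, bpi_total), 2))
```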
III. Measures of Child Health and Safety
- Child Health Rating (focal child only): The mother provided a rating of her child's overall health in response to the single interview question: "Would you say that your child's health in general is excellent, very good, good, fair or poor?" This measure has been widely used.(9) Validation work indicates that this health rating primarily reflects physical health problems (Krause and Jay, 1994).
In the present report, we analyze this measure in two ways: (1) we compute mean scores on the full 1-5 rating; and (2) we dichotomize the measure, distinguishing focal children rated in excellent or very good health (assigned a score of 1) from those rated in good, fair, or poor health (assigned a score of 0). In analyzing data from the National Health Interview Survey, Coiro and Zill (1994) found it useful to consider the percentage of children given a "favorable" health rating by their mothers (rated as in excellent health and with no limiting condition). In that national dataset, for example, the percentage of children given a favorable health rating differed by family income, parental education, metropolitan vs. rural residence, and race/ethnicity, and children from poor families were much less likely to receive a favorable rating. In keeping with these findings, we consider not only mean scores on the mothers' child health ratings, but also the proportion of children receiving favorable ratings, defined here as ratings of either "excellent" or "very good" health.
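The two ways of analyzing the health rating can be sketched side by side. The text does not state which end of the 1-5 scale is "excellent," so the numeric coding below (5 = excellent, 1 = poor) is an assumption made only for illustration; the dichotomization follows the text exactly.

```python
# ASSUMED numeric coding for the 1-5 mean (the text does not state the direction):
# 5 = excellent, 4 = very good, 3 = good, 2 = fair, 1 = poor.
HEALTH_CODES = {"excellent": 5, "very good": 4, "good": 3, "fair": 2, "poor": 1}
FAVORABLE = {"excellent", "very good"}  # scored 1 in the dichotomized measure


def summarize_health(ratings):
    """Mean 1-5 rating and proportion of children with a favorable rating."""
    codes = [HEALTH_CODES[r] for r in ratings]
    favorable = [1 if r in FAVORABLE else 0 for r in ratings]
    return {
        "mean_rating": sum(codes) / len(codes),
        "share_favorable": sum(favorable) / len(favorable),
    }


print(summarize_health(["excellent", "very good", "good", "poor"]))
# {'mean_rating': 3.25, 'share_favorable': 0.5}
```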
- Accident or Injury (focal child/any child in family). This measure was based on a single maternal report item, answered separately for the focal child and for all children in the family: "Since [the date of the respondent's random assignment in the evaluation], has the focal child/any of your children had an accident, injury, or poisoning requiring a visit to a hospital emergency room or clinic?" A score of 0 indicates that the focal child/any child in the family did not have such an accident or injury, while a score of 1 indicates that such an accident or injury did occur since random assignment. There is a low but significant negative correlation between the occurrence of an accident and/or injury for the focal child, and the focal child's overall health rating (r=-.06, p<.001), such that children who had had an accident and/or injury had lower overall health ratings.
Strategy of Analysis
The data analyses that we will present in the following chapters follow a progression across:
- descriptive analyses;
- analyses of aggregate impacts;
- examination of impacts for subgroups; and
- explanatory analyses.
Below we briefly describe the aim and approach for each of these types of data analyses.
I. Descriptive Analyses
The goal of descriptive analyses is to portray the families and children in the sample apart from any effects of JOBS. Descriptive data on sample characteristics (presented in Chapter 4) are based on information collected from respondents prior to random assignment.(10) For these analyses, as for all analyses in this report, we present findings separately for the Atlanta, Grand Rapids, and Riverside research sites. However, because these particular descriptive analyses rely on data collected before random assignment, we combine the data across the three research groups (labor force attachment, human capital development, and control groups) and present summary figures. For example, using baseline data, we present the percentage of mothers with differing levels of educational attainment in each site, and we summarize the mean ages of mothers and children in each site.
Descriptive data on the developmental status of the children (presented in Chapter 5) are based on the child outcome measures described above, collected as part of the two-year follow-up. Because the intent of these descriptive data is to portray the children's development apart from the effects of the program (in order to provide a context for interpreting subsequent findings on child impacts), we restrict our focus to children in the control groups, the groups in each site that were not exposed to JOBS. In presenting this portrayal, we will sometimes draw upon "benchmark data," that is, data on the same child outcome measures collected in other samples. For example, the measure of child behavior problems used at the two-year follow-up, the Behavior Problems Index, was also used in a national survey, the National Longitudinal Survey of Youth-Child Supplement. Behavior Problems Index findings for children of the same ages in that survey can help us gauge whether the children in the control groups of our sample have more or less frequent behavior problems than children in a national sample.
II. Examination of Aggregate Impacts at Each Site
Having given a descriptive portrayal of the families in the sample and of the developmental status of the children apart from JOBS, we turn to an examination of program impacts on the children's developmental outcomes. A program impact reflects the average difference between families in an experimental group and families in the control group on a given outcome measure. Our examination of program impacts will contrast each experimental group (labor force attachment, human capital development) with the control group separately. We will carry out these contrasts separately within each site.
In examining program impacts, we first consider aggregate impacts. An aggregate impact reflects the difference, in a given site, between the average score on a particular measure for all of the families in one of the program groups, and all of the families in the control group. That is, in examining aggregate impacts we are asking whether, for a particular measure, there is a program impact for a research group as a whole, in a given site. In section III below, we describe analyses aimed at assessing whether program impacts occur in specified subgroups, in addition to or rather than for a research group as a whole, in each site.
We include in these analyses all families randomly assigned to the research groups of interest. For example, we consider all families assigned to the human capital development group, whether or not they actually participated in basic education, job training, or employment activities, and contrast this group with all families assigned to the control group.(11) These group contrasts thus reflect, on average, the experiences of families according to whether they were assigned to a JOBS program group, rather than according to their actual participation in program components.
All analyses of aggregate impacts will be reported separately by site and by program approach.(12) When the examination of impacts involves a continuous dependent variable (for example, children's scores on the assessment of cognitive school readiness), we have carried out ordinary least squares multiple regression. In these analyses, we examined each child outcome measure separately as a dependent variable, and included an experimental comparison "dummy" variable (i.e., either labor force attachment vs. control, or human capital development vs. control) as an independent variable to test program impacts. In each of these analyses we used a common set of covariates to improve the precision of the impact estimate by controlling for variation on background characteristics.(13) These covariates were chosen in communication with researchers at the Manpower Demonstration Research Corporation, so as to coordinate the present analyses of child outcomes with the analyses of economic outcomes at the two-year follow-up point being carried out with the larger NEWWS sample.
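To make the regression-adjusted impact estimate concrete, here is a minimal pure-Python sketch of ordinary least squares with a 0/1 program dummy plus covariates, solved through the normal equations. It is an illustration under stated assumptions, not the study's code: the data, the function names, and the use of child age as the only covariate are hypothetical (the study's covariate set is the one referenced in footnote (13)), and the standard errors and p-values that a statistical package would report are omitted.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y: Sequence[float], columns: Sequence[Sequence[float]]) -> List[float]:
    """OLS coefficients [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def impact_estimate(outcome, program_dummy, covariates) -> float:
    """Regression-adjusted impact: the coefficient on the 0/1 program-group dummy."""
    return ols(outcome, [program_dummy] + list(covariates))[1]


# Illustrative data: program dummy (1 = program group, 0 = control) and child age.
program = [0, 0, 0, 1, 1, 1]
child_age = [5, 6, 7, 5, 7, 8]
raw_score = [2 + 3 * p + 0.5 * a for p, a in zip(program, child_age)]
print(round(impact_estimate(raw_score, program, [child_age]), 6))  # 3.0
```

Because the toy outcome was built with an exact program effect of 3, the regression recovers it even though the two groups differ in average age; this is the role the covariates play in the impact models.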
Where the examination of aggregate impacts involved a dichotomous child outcome variable (for example, whether or not the focal child had any academic problems) rather than a continuous measure, the analysis was carried out using logistic regression. Again, each experimental group was contrasted with the control group separately; analyses employed the common set of covariates; each child outcome was examined in a separate analysis; and all analyses were carried out separately by site and program approach.
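For a dichotomous outcome, the program coefficient from a logistic regression is a log odds ratio. The sketch below fits a logistic model by Newton-Raphson in plain Python; it is a hypothetical illustration (toy data, no covariates, no standard errors), not the estimation routine used in the study. With only the program dummy in the model, the fitted coefficient equals the log of the program-control odds ratio, which makes the result easy to check by hand.

```python
from math import exp, log
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def logistic_fit(y: Sequence[int], columns: Sequence[Sequence[float]], iterations: int = 25) -> List[float]:
    """Logistic regression coefficients [intercept, b1, ...] by Newton-Raphson."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        # Predicted probabilities under the current coefficients.
        p = [1.0 / (1.0 + exp(-sum(b * x for b, x in zip(beta, r)))) for r in rows]
        gradient = [sum(r[i] * (yi - pi) for r, yi, pi in zip(rows, y, p)) for i in range(k)]
        hessian = [[sum(r[i] * r[j] * pi * (1.0 - pi) for r, pi in zip(rows, p))
                    for j in range(k)] for i in range(k)]
        step = solve(hessian, gradient)  # (X'WX)^-1 X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Illustrative 0/1 outcome (e.g., any academic problem): 2 of 8 control children,
# 4 of 8 program-group children.
program = [0] * 8 + [1] * 8
problem = [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 0, 0, 0, 0]
intercept, program_coef = logistic_fit(problem, [program])
print(round(program_coef, 4), round(log(3), 4))
```

Here the control-group odds are 1/3 and the program-group odds are 1, so the odds ratio is 3 and the coefficient should equal log 3.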
When we report that JOBS had a statistically significant impact on a child outcome, this indicates that the mean difference on a continuous outcome variable (for example, on the assessment of child cognitive development), or a difference in the proportion of children receiving a rating of one on a dichotomous variable (for example, the proportion of children with one or more academic problems), is unlikely to have arisen simply by chance. We will follow the convention of reporting an effect as statistically significant when data analyses indicate that there was a smaller than 10 percent probability that the finding could have arisen by chance, that is, reflected random variation in individuals' scores.
Tables reporting on aggregate impacts will mark with a "+" superscript those effects that have less than a 10 percent probability of having arisen by chance. One asterisk will indicate effects with less than a 5 percent probability of having arisen by chance, two asterisks a less than 1 percent probability, and three asterisks a less than 0.1 percent probability.(14)
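The table-marking convention maps a p-value to a symbol, checking the strictest threshold first. A minimal sketch (the function name is hypothetical):

```python
def significance_marker(p_value: float) -> str:
    """Table marker for an impact estimate, following the conventions described above."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    if p_value < 0.10:
        return "+"
    return ""  # not statistically significant at the 10 percent level


for p in (0.0004, 0.004, 0.03, 0.07, 0.25):
    print(p, repr(significance_marker(p)))
```

Because all thresholds are strict ("less than"), a p-value of exactly .05 receives a "+" rather than an asterisk.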
As noted above, this report will also examine impacts on children from the point of view of whether they are of sufficient magnitude for policy makers to consider when developing policy. Such a "policy relevant" impact was defined, statistically, as one in which the effect size was at least one-third of a standard deviation on a given measure.(15),(16) While the "harm hypothesis" directs us to identify unfavorable, and especially "policy relevant," program impacts on children, we acknowledge that policy relevant program impacts may occur in a positive as well as negative direction.
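The policy-relevance criterion compares an impact with a standard deviation of the outcome. Footnotes (15) and (16), which specify the calculation, are not reproduced in this section, so the sketch below ASSUMES the control-group standard deviation as the denominator; the function names and data are hypothetical. The absolute value reflects the point that policy-relevant impacts may be favorable or unfavorable.

```python
from statistics import mean, stdev

POLICY_RELEVANT = 1 / 3  # effect size threshold stated in the text


def effect_size(program_scores, control_scores) -> float:
    """Impact in standard deviation units (ASSUMPTION: control-group SD as denominator)."""
    return (mean(program_scores) - mean(control_scores)) / stdev(control_scores)


def is_policy_relevant(program_scores, control_scores) -> bool:
    """Flag impacts of at least one-third of a standard deviation, in either direction."""
    return abs(effect_size(program_scores, control_scores)) >= POLICY_RELEVANT


control = [1, 2, 3, 4, 5]
print(round(effect_size([2, 3, 4, 5, 6], control), 3), is_policy_relevant([2, 3, 4, 5, 6], control))
# 0.632 True
```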
In all discussions of impacts, the patterning of results (according to developmental domain, site, and/or program approach) will also be taken into account. Thus, while we present all statistically significant program impacts on children, we concentrate our discussion on impacts that show a distinct pattern, as well as impacts that are of sufficient magnitude to meet the criterion for policy relevance.
III. Examination of Subgroup Impacts at Each Site
As noted in Chapter 1, an important possibility is that JOBS will affect subgroups of families differently. In order to examine this possibility, we will go beyond the consideration of aggregate impacts to consideration of impacts for specified subgroups. Subgroups are delineated according to characteristics of the families at baseline and are categorized into "lower-risk" and "higher-risk" based on these variables. In an effort to minimize the number of subgroups examined and maximize the clarity of findings, information from ten baseline variables was drawn upon in creating higher and lower-risk subgroups of four different types. We refer to each approach to defining higher and lower-risk subgroups as a "risk composite" because each is based on multiple rather than individual baseline variables:
- information on the number of children in the family and on the age difference between the focal child and the next oldest or next youngest child was used to create a "sibling configuration risk" composite;
- items relating to depression and locus of control form a "maternal psychological well-being risk" composite;
- information on mothers' educational attainment (i.e., at least a high school degree or GED), literacy, and numeracy was used to create an "educational risk" composite; and
- information relating to practical barriers to employment (e.g., health problems, lack of transportation or child care), duration of welfare, and employment history was used to create a "work risk" composite.
Thus, for example, we ask whether JOBS programs had effects on children in families in which the mother was at higher or lower risk in terms of indicators of psychological distress at baseline; in which the mother was at higher or lower educational risk; in which the mother was at higher or lower work risk; and in families with more or more closely spaced children versus fewer or less closely spaced children. For each composite risk measure, families were categorized in a mutually exclusive way, as at either higher or lower risk. Families could be categorized as at higher risk on more than one of the risk composites.
The particular baseline variables that formed the basis of the composite risk measures were chosen from the far longer list of available baseline variables on two grounds:
- We hypothesized that mothers varying on these particular baseline variables might respond differentially to JOBS, which could in turn have implications for child impacts; and
- These baseline variables have been documented to be important to the development of children.
Thus, for example, mothers showing few or many indicators of psychological distress at baseline might well differ in their ability to mobilize to respond to the requirements of JOBS. At the same time, there is ample evidence to indicate that maternal psychological distress is an important predictor of children's developmental outcomes (Downey and Coyne, 1990).
In addition to creating these composite risk measures, a summary index of cumulative risk was created, reflecting the number of composite risk factors at baseline for which a family was at higher risk. The risk summary score could range from 0 to 4, with a point assigned when:
- the family was in the higher-risk category on the composite for sibling configuration risk (the child was in a family with three or more children at baseline, or in a family with two children born less than two years apart);
- the family was in the higher-risk category on the composite for maternal psychological well-being risk (the mother received a score of at least two out of seven on a set of baseline indicators of depression and feelings of a lack of control over one's life);
- the family was in the higher-risk category on the composite for work risk (the mother had received welfare for five or more years, or had no history of having worked full time for the same employer for six months or more, or reported at least four of seven barriers to employment);
- the family was in the higher-risk category on the composite for educational risk (the mother had less than a high school diploma or GED; or scored in the lower levels of literacy, or scored in the lower levels of numeracy).
Families experiencing none or one of these baseline composite risks were considered to be at lower cumulative risk, while families with two to four of these composite risks were considered to be at higher cumulative risk.
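The composite risk rules and the cumulative risk summary above can be written out as boolean logic. In this sketch the `Baseline` field names are hypothetical, and because the section does not give the thresholds for the "lower levels" of literacy and numeracy, those are represented as pre-computed yes/no flags.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Baseline:
    """Hypothetical baseline record; field names are illustrative only."""
    n_children: int
    two_children_under_two_years_apart: bool
    distress_indicators: int            # out of 7 depression / locus-of-control items
    years_on_welfare: float
    held_full_time_job_6_months: bool   # same employer, six months or more
    employment_barriers: int            # out of 7
    has_diploma_or_ged: bool
    low_literacy: bool                  # "lower levels" of literacy (threshold not given here)
    low_numeracy: bool                  # "lower levels" of numeracy (threshold not given here)


def risk_composites(b: Baseline) -> Dict[str, bool]:
    """True means the family falls in the higher-risk category of that composite."""
    return {
        "sibling_configuration": b.n_children >= 3
        or (b.n_children == 2 and b.two_children_under_two_years_apart),
        "psychological_well_being": b.distress_indicators >= 2,
        "work": b.years_on_welfare >= 5
        or not b.held_full_time_job_6_months
        or b.employment_barriers >= 4,
        "education": not b.has_diploma_or_ged or b.low_literacy or b.low_numeracy,
    }


def cumulative_risk(b: Baseline) -> Tuple[int, str]:
    """Risk summary score (0-4) and the lower/higher cumulative-risk grouping."""
    score = sum(risk_composites(b).values())
    return score, ("higher" if score >= 2 else "lower")


family = Baseline(n_children=3, two_children_under_two_years_apart=False,
                  distress_indicators=1, years_on_welfare=6, held_full_time_job_6_months=True,
                  employment_barriers=0, has_diploma_or_ged=True,
                  low_literacy=False, low_numeracy=False)
print(cumulative_risk(family))  # (2, 'higher')
```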
In addition to creating higher and lower-risk subgroups in terms of sibling configuration, educational risk, work risk, maternal psychological well-being risk, and cumulative risk, we examined three further approaches to delineating risk on a more exploratory basis: age of child, maternal attitudes about working, and maternal attitudes toward school. Although these constructs are theoretically important, there is less empirical evidence that they provide meaningful bases for identifying risk within the present sample. These variables allow us to distinguish between families in which the mother had more and fewer reservations about working (with more reservations hypothesized to reflect higher risk); families in which the mother had more and less positive attitudes toward school (with less positive attitudes hypothesized to reflect higher risk); and families in which the focal child was the median age or younger at baseline versus older than the median age (with younger child age hypothesized to reflect higher risk). As will be seen in Chapter 7, analyses of child outcome measures for control group families supported the use of only one of these more exploratory bases for grouping families as a risk measure: "attitudes toward work" risk. For this measure, but not the other exploratory measures, children's scores in the three sites' control groups consistently went in a direction indicating less favorable development in the group hypothesized to be at greater risk.
Table A-1 (in Appendix A) provides the definition and sample sizes for each of the baseline subgroups at each of the research sites.
The examination of subgroup impacts focuses on effects within a particular higher or lower-risk subgroup. For example, we consider impacts on child outcome measures in the subgroup of families at higher educational risk. Within this baseline subgroup, we ask whether families in one of the experimental groups have mean or proportion scores on child outcome measures that differ significantly from the scores of families in the control group. We then ask the same question for the subgroup of families at lower educational risk. In the same way, we ask whether there is evidence of significant program impacts within the higher and lower-risk subgroups in terms of work risk, maternal psychological well-being risk, sibling configuration risk, cumulative risk, and the more exploratory approaches to delineating risk (especially "attitudes toward work" risk).
Apart from restricting each subgroup impact analysis to the relevant subsample, we follow the same strategy here as for aggregate impacts: we use the same set of covariates in all analyses; we carry out ordinary least squares multiple regression or logistic regression, depending on the nature of the outcome variable examined; and we report findings separately by site and program approach.
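The core of a subgroup impact analysis is simply repeating the program-control contrast within each level of a baseline subgroup. The sketch below deliberately simplifies by computing unadjusted differences in means rather than covariate-adjusted regression estimates; the record layout, key names, and data are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def subgroup_impacts(records: List[dict], subgroup_key: str) -> Dict[str, float]:
    """Unadjusted program-control difference in mean outcome within each subgroup level.

    Each record is a hypothetical dict with keys 'group' ('program' or 'control'),
    'outcome', and the subgroup indicator named by subgroup_key.
    """
    impacts = {}
    for level in sorted({r[subgroup_key] for r in records}):
        subset = [r for r in records if r[subgroup_key] == level]
        program = [r["outcome"] for r in subset if r["group"] == "program"]
        control = [r["outcome"] for r in subset if r["group"] == "control"]
        impacts[level] = mean(program) - mean(control)
    return impacts


records = [
    {"group": "program", "outcome": 10, "education_risk": "higher"},
    {"group": "program", "outcome": 12, "education_risk": "higher"},
    {"group": "control", "outcome": 14, "education_risk": "higher"},
    {"group": "control", "outcome": 14, "education_risk": "higher"},
    {"group": "program", "outcome": 20, "education_risk": "lower"},
    {"group": "program", "outcome": 22, "education_risk": "lower"},
    {"group": "control", "outcome": 20, "education_risk": "lower"},
    {"group": "control", "outcome": 20, "education_risk": "lower"},
]
print(subgroup_impacts(records, "education_risk"))
```

In the toy data the program lowers the outcome in the higher-risk subgroup and raises it in the lower-risk subgroup, the kind of pattern an aggregate impact alone would mask.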
IV. Explanatory Analyses
Having identified the child outcomes for which there are significant aggregate impacts and impacts for specified subgroups, the analysis shifts to the question of what underlies the program impacts on children. In a modest set of non-experimental analyses, we examine the pathways through which particular JOBS programs appear to have affected children, using mediation analyses (Baron and Kenny, 1986). The first step is identifying the child impacts to be examined. We do not attempt to explain all significant program impacts on children; rather, for these mediational analyses, we selected at least one aggregate impact in each developmental domain (i.e., cognitive development and academic achievement; behavioral and emotional adjustment; physical health and safety) that generally illustrates the pattern of results for that domain. In order to conclude that a program impact on a targeted or non-targeted adult outcome helps to explain statistically, or "mediates," the same program's impact on a given child outcome, three conditions must hold (see Baron and Kenny, 1986): (1) the adult outcome must itself be affected by the JOBS program being considered; (2) the adult outcome must predict the child outcome (with the JOBS program dummy also in the model); and (3) with this adult outcome variable in the model, the impact of the JOBS program on the given child outcome must be smaller than without this variable in the model.
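The three Baron and Kenny conditions correspond to three regressions. The sketch below is a hypothetical illustration using plain OLS with no covariates and made-up data ("income" stands in for any adult outcome); it computes the point estimates only. Conditions (1) and (2) additionally require significance tests on the paths labeled `a` and `b`, which are omitted here. For OLS fitted to the same sample with the same covariates, the total impact decomposes exactly as `c_total = c_direct + a * b`, which the example checks.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y: Sequence[float], columns: Sequence[Sequence[float]]) -> List[float]:
    """OLS coefficients [intercept, b1, ...] from the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def mediation_paths(program, mediator, outcome) -> dict:
    """Point estimates for the three Baron-Kenny regressions (no significance tests)."""
    c_total = ols(outcome, [program])[1]                 # program -> child outcome
    a = ols(mediator, [program])[1]                      # condition (1): program -> adult outcome
    _, c_direct, b = ols(outcome, [program, mediator])   # condition (2): b; condition (3): c_direct
    return {"a": a, "b": b, "c_total": c_total, "c_direct": c_direct,
            "impact_smaller_with_mediator": abs(c_direct) < abs(c_total)}


program = [0, 0, 0, 0, 1, 1, 1, 1]
income = [1, 2, 2, 3, 3, 4, 4, 5]        # hypothetical adult outcome
child_score = [2, 3, 5, 4, 6, 7, 6, 9]   # hypothetical child outcome
paths = mediation_paths(program, income, child_score)
print({k: (round(v, 3) if isinstance(v, float) else v) for k, v in paths.items()})
```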
We should emphasize that, although we draw conclusions regarding the degree to which adult impacts appear to have led to impacts on children, the adult impacts we examine as possible mediators of program impacts on children were measured concurrently with children's outcomes; that is, both adult and child outcomes were measured at the two-year follow-up. Thus, any causal conclusions regarding the pathways through which children were affected by their mothers' assignment to a JOBS welfare-to-work program must be drawn cautiously. Information from the five-year follow-up will allow us to examine the temporal sequence of program impacts. This subsequent wave of data, combined with more rigorous statistical techniques that allow direct testing of alternative hypotheses regarding the pathways of program impacts, will improve our ability to identify the ways in which children were affected by JOBS welfare-to-work programs.
1. Researchers in the behavioral sciences often rely on Cohen's (1988) characterization of effect sizes (in standard deviation units) of .20 as "small," .50 as "medium," and .80 as "large."
2. Sixty-four cases were dropped because the focal child was not the respondent's biological or adoptive child. (There is only one adoptive child in the Child Outcomes Study sample.)
3. Two children were too old, and one child was too young, to be focal children; these children's ages must have been reported incorrectly at baseline, and their families should not have been selected for the Child Outcomes Study in the first place.
4. A total of 69 families, all in Riverside, were dropped from the sample because they had moved 100 or more miles away from Riverside County.
5. A total of 67 mothers reported living away from the focal child for at least three months at the time of the two-year follow-up.
6. Although not all families who were eligible and randomly assigned at baseline are included in the sample for the present report, the "fidelity" of random assignment was maintained (that is, there is no systematic difference between the experimental and control groups on baseline characteristics), with one exception. In Riverside, among those identified as "in need" of basic education, those assigned to the labor force attachment (LFA) program differed from those assigned to the control group on a few background characteristics. However, neither group can be considered uniformly more or less advantaged: on some characteristics (e.g., prior employment), the control group appeared more advantaged, whereas on others (e.g., maternal psychological well-being), the LFA group appeared more advantaged. Moreover, these differences were controlled statistically in all impact analyses by including the variables on which the groups differed as covariates.
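One common way to check the fidelity of random assignment is to compare experimental and control groups on each baseline characteristic. The standard-library Python sketch below computes a standardized mean difference (experimental minus control, divided by the pooled standard deviation) for each variable; the data layout and function name are hypothetical, and the note does not say which balance test the study itself used.

```python
from math import sqrt
from statistics import mean, variance

def standardized_differences(baseline, treat):
    """Standardized mean difference (experimental minus control) per baseline variable.

    baseline : dict mapping variable name -> list of values, one per family
    treat    : list of 1 (experimental) / 0 (control), aligned with the values
    Assumes each group has at least two families and the variable is not constant.
    """
    out = {}
    for name, values in baseline.items():
        exp = [v for v, t in zip(values, treat) if t == 1]
        ctl = [v for v, t in zip(values, treat) if t == 0]
        pooled_sd = sqrt((variance(exp) + variance(ctl)) / 2)
        out[name] = (mean(exp) - mean(ctl)) / pooled_sd
    return out
```

Variables with noticeably large differences, like the handful found between Riverside's LFA and control groups, are natural candidates for inclusion as covariates.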
7. A special "synthesis" report (Hamilton, with Freedman and McGroder, 2000) draws together the findings relating to any child in the family from the present Child Outcomes Study report with the findings on these "any child in the family" items from the full NEWWS sample.
8. Internal consistency reliability indicates the extent to which the individual items that make up a scale, all of which should reflect the same hypothetical underlying construct, are interrelated, or "hang together," statistically. The measure used to reflect internal consistency reliability, Cronbach's alpha, typically ranges from 0 to 1.0, with higher scores indicating better internal consistency reliability.
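Cronbach's alpha for a scale of k items is k/(k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scale score. A minimal standard-library Python sketch follows; the function name and the item-by-respondent data layout are assumptions for illustration. In practice alpha usually lies between 0 and 1.0, but it can fall below 0 when items are negatively correlated.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    items : list of k lists, one per item, each holding the same n respondents' scores
    """
    k = len(items)
    # Each respondent's total score across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    summed_item_variance = sum(pvariance(scores) for scores in items)
    # Population variances are used throughout; the n vs. n - 1 choice cancels in the ratio.
    return k / (k - 1) * (1 - summed_item_variance / pvariance(totals))
```

For example, three items that every respondent answers identically give an alpha of 1.0, the upper end of the usual range.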
9. See the National Health Interview Survey, the National Health and Nutrition Examination Survey, the Rand Health Insurance Experiment, the Medical Outcomes Study, and the Child Health Questionnaire (Krause and Jay, 1994; Landgraf, Abetz, and Ware, 1996).
10. Missing baseline data occurred on selected items from the Private Opinion Survey (POS), which measured clients' attitudes toward welfare, their psychological well-being, and the barriers to employment they faced. Because these baseline variables were important for impact analyses (both as covariates and, in subgroup impact analyses, in defining baseline subgroups), we imputed values where data were missing. In addition to relying on information regarding site in imputing these data, we selected other POS attitudinal variables to use as the basis for imputation, after examining which particular POS variables were most highly correlated with the variables for which we were imputing scores. Specifically, imputation was done based on data regarding site, JOBS office at random assignment, number of baseline risks, and high school degree status. The descriptive portrayal of families in Chapter 4 does not rely on imputed data.
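The note does not say which imputation method was used. As an illustration only, the standard-library Python sketch below assumes the simplest version: a missing item is replaced by the mean of non-missing responses among families who share the same values on the imputation variables (for example, site, JOBS office, number of baseline risks, and high school degree status). The function and field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def impute_cell_means(records, target, cell_keys):
    """Fill missing values of `target` with the mean of its cell (assumed method).

    records   : list of dicts, one per family; a missing value is stored as None
    target    : name of the baseline item being imputed (hypothetical)
    cell_keys : variables that define the imputation cells
    Cells with no observed value leave the item missing (None).
    """
    observed = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            observed[tuple(rec[k] for k in cell_keys)].append(rec[target])
    cell_means = {cell: mean(vals) for cell, vals in observed.items()}

    filled = []
    for rec in records:
        row = dict(rec)  # copy so the raw data stay untouched
        if row[target] is None:
            row[target] = cell_means.get(tuple(row[k] for k in cell_keys))
        filled.append(row)
    return filled
```

Keeping the raw records unchanged mirrors the note's point that descriptive tables can be produced from unimputed data while impact models use the filled-in copy.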
11. As we have noted, however, in the Riverside site, members of the human capital development group were contrasted only with members of the control group who were considered, at baseline, to be in need of basic education (i.e., those who lacked a high school diploma or GED, demonstrated lower levels of literacy, and/or were not proficient in English; Hamilton et al., 1997).
12. Impact analyses were weighted to adjust for cohort differences in the assignment of clients to a treatment stream or to the control group (to preserve the experimental design), as well as to allow generalization to the populations from which the evaluation sample was drawn, namely, each county's AFDC-eligible population. Additional factors entering into the weighting were the number of JOBS offices (Riverside had more than one), high school/GED status, and cohort differences in Atlanta. Weights were determined in collaboration with researchers at MDRC to ensure common analytic approaches in the Child Outcomes Study and the NEWWS.
13. The model covariates were: marital status, number of children, race, mother's age, average AFDC benefit per month, number of months AFDC was received in the prior year, focal child's age and gender, high school diploma or GED, literacy, numeracy, time on welfare, work history, depressive symptoms, locus of control, sources of support, family barriers, and number of baseline risks.
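A weighted, covariate-adjusted impact estimate of the kind described in notes 12 and 13 can be sketched as weighted least squares: the outcome is regressed on an intercept, a program dummy, and the baseline covariates, with each family's squared residual multiplied by its analysis weight. The NumPy sketch below takes the weights as given (how they were derived from cohort, JOBS office, and diploma/GED status is not reproduced), and the function name is hypothetical.

```python
import numpy as np

def weighted_impact(y, treat, covariates, weights):
    """Weighted least squares estimate of a program impact.

    Multiplying each row by sqrt(weight) and running OLS minimizes
    sum(w_i * residual_i**2). Returns the coefficient on the program dummy.
    """
    y = np.asarray(y, dtype=float)
    root_w = np.sqrt(np.asarray(weights, dtype=float))
    X = np.column_stack([np.ones(len(y)),
                         np.asarray(treat, dtype=float),
                         np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)
    return beta[1]
```

With all weights equal to 1 this reduces to ordinary least squares, which makes it easy to see how much the weighting changes a given estimate.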
14. All tests of program impacts were "two-tailed." That is, we did not begin with a hypothesis about the direction of effects (for example, that scores for children in the human capital development group would be better than those in the control group), but instead allowed for effects in either a positive or a negative direction.
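To make the two-tailed logic concrete, the standard-library Python sketch below converts an impact estimate and its standard error into a two-tailed p-value. It assumes a normal approximation, which is close to the t distribution for large samples; the function name is hypothetical.

```python
from math import erfc, sqrt

def two_tailed_p(estimate, standard_error):
    """Two-tailed p-value for an impact estimate (normal approximation)."""
    z = abs(estimate / standard_error)
    # P(|Z| >= z) = erfc(z / sqrt(2)) counts deviations in either direction.
    return erfc(z / sqrt(2))
```

An estimate 1.96 standard errors from zero, in either direction, gives p of about 0.05; a one-tailed test would report half that value.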
15. Standard deviations were calculated separately for each site's control group(s), yielding a criterion for policy relevance that is identical in a relative sense (i.e., .33 of a standard deviation) but varies in an absolute sense, depending on the distribution of the measure in the particular site's control group.
16. For example, on the Bracken Basic Concept Scale/School Readiness Composite, which ranges from 0 to 61, a difference as small as 3.6 points (in Atlanta), 3.8 points (in Grand Rapids), 4.2 points (for the impact of Riverside's LFA program), or 4.3 points (for the impact of Riverside's HCD program) would be considered policy relevant; such a difference represents about four school readiness concepts relating to colors, letters, numbers and counting, comparisons, and shapes. As another example, for the proportion of focal children in "very good" or "excellent" health, a difference of at least 13.5 percentage points (in Atlanta), 13.1 percentage points (in Grand Rapids), 13.2 percentage points (for Riverside's LFA program), or 13.4 percentage points (for Riverside's HCD program) is considered policy relevant. For dichotomous outcomes, in fact, one-third of a standard deviation represents a relatively large impact in absolute terms.
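The criterion in notes 15 and 16 amounts to multiplying a control-group standard deviation by .33. The standard-library Python sketch below covers both the continuous case and the dichotomous case, where the standard deviation of a proportion p is sqrt(p(1 - p)); the function names are hypothetical and the inputs shown are illustrative, not study data.

```python
from math import sqrt
from statistics import stdev

def policy_threshold(control_scores, fraction=0.33):
    """Smallest difference counted as policy relevant: a fraction of the control-group SD."""
    return fraction * stdev(control_scores)

def policy_threshold_binary(control_proportion, fraction=0.33):
    """Same criterion for a yes/no outcome, whose SD is sqrt(p * (1 - p))."""
    p = control_proportion
    return fraction * sqrt(p * (1 - p))
```

For instance, a hypothetical control-group proportion of 0.80 gives an SD of sqrt(0.8 x 0.2) = 0.40 and a threshold of 0.132, about 13 percentage points, in line with the values in note 16. Because sqrt(p(1 - p)) never exceeds 0.5, the threshold for a yes/no outcome can never exceed 16.5 percentage points.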