In addition to the specific suggestions for future research at the end of each section above, several general comments apply across this corpus of research. From a substantive point of view, more focused attention should be paid to the needs of particular subgroups of children, especially English language learners (ELLs). As noted above, although most studies did a commendable job of including a diverse population of children, impacts on subgroups were seldom examined. In part, this may be an issue of research design: effectiveness studies are designed to estimate an overall mean and are often not powered to detect subgroup differences. Future research, however, should focus on the specific needs of these children, who make up a growing share of the population in early care and education settings.
Another substantive issue for future research is determining the active ingredients of the interventions for which positive effects were found. Because most of the PCER interventions, for example, incorporated multiple components, it was not possible to identify which component produced a positive effect when one was found. More fine-grained research could disentangle the effects of the various components and move the field forward in identifying the most critical ingredients of interventions. In addition, the NELP, a much more extensive review, should be able to provide further insight into this question.
From a methodological perspective, it was quite remarkable that there were more than a dozen randomized controlled trial (RCT) studies of early childhood interventions to review. On the one hand, the national push for more rigorous research has certainly increased the number of RCTs that have been implemented and, in this way, has improved the rigor of the research available. On the other hand, effectiveness studies using RCT designs have their own limitations. In terms of statistical power, for example, detecting the effects on children that we would expect across one school year requires fairly large samples. Although RCTs require fewer units of randomization than, say, regression discontinuity designs, detecting small effects with nested designs (observations within children within teachers, for example) still typically requires on the order of 60 randomized units. Because randomization often occurs at the center level to avoid contamination across teachers within the same center, this can be quite a challenge for most researchers. One way to reduce the required sample size is to randomize at the child level, but that alternative is not always practically feasible.
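To make the power constraint concrete, the following is a minimal sketch of the standard two-sample normal approximation, inflated by the design effect for center-level randomization. The function name, the intraclass correlation (ICC) of .15, and the cluster size of 15 children per center are illustrative assumptions, not figures from the studies reviewed.

```python
import math
from statistics import NormalDist

def clusters_per_arm(delta, icc, cluster_size, alpha=0.05, power=0.80):
    """Approximate number of centers needed per arm in a cluster-randomized
    trial, using the two-sample normal approximation inflated by the design
    effect 1 + (m - 1) * ICC. Illustrative sketch, not a full power analysis.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_simple = 2 * z**2 / delta**2           # children per arm if randomized individually
    deff = 1 + (cluster_size - 1) * icc      # variance inflation from nesting
    return math.ceil(n_simple * deff / cluster_size)

# Small effect (d = .20), assumed ICC of .15, 15 children per center
print(clusters_per_arm(0.20, 0.15, 15))      # → 82 centers per arm
```

Even under these modest assumptions, a small effect demands well over a hundred centers in total, which illustrates why center-level randomization strains most research budgets.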
In addition, there is a trade-off between internal and external validity. Although the strength of RCTs is their high internal validity, they can suffer from low external validity. In early care and education settings especially, researchers are often limited to building study samples from those who agree to participate in response to recruitment efforts, so generalizability can be quite limited, making the findings less policy relevant.
The meaningfulness of detected effects is another methodological issue that arises from these studies. In general, effect sizes were reported as Cohen's d, with Cohen's guidelines for small (.20), medium (.50), and large (.80) effects applied. Unless the author also reports the range of the assessment and the expected growth across a school year, however, it is difficult to judge the substantive meaning of a .20 versus a .50 versus a .80 effect size. What does this mean in real-world terms? What is a meaningful effect size? How does that vary by assessment or domain? Without diminishing the advances the field has made in reporting effect sizes, it would be helpful to also translate Cohen's d into assessment-relevant terms, such as months of growth.
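The suggested translation is simple arithmetic once the assessment's standard deviation and typical annual gain are known. The sketch below assumes gains accrue linearly over a nine-month school year; the function name and the example values (SD of 15 points, typical gain of 10 points per year) are hypothetical, not taken from any assessment in the studies reviewed.

```python
def d_to_months(d, sd, annual_gain, school_year_months=9):
    """Translate Cohen's d into months of growth on an assessment,
    assuming gains accrue linearly across the school year."""
    points = d * sd                            # impact in raw score points
    per_month = annual_gain / school_year_months
    return points / per_month

# Hypothetical assessment: SD = 15 points, typical gain = 10 points/year
print(round(d_to_months(0.20, 15, 10), 1))     # → 2.7 months
```

On this hypothetical assessment, a "small" d of .20 corresponds to nearly three months of growth, which reads very differently from the bare effect size.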
Finally, the PCER and ISRC initiatives have certainly made huge strides in providing examples of how to evaluate programs and practices in real-world settings. Lessons learned from these initiatives will make an important contribution to the field, both substantively and methodologically: they could capture the wealth of knowledge implementers gained from conducting these studies, suggest hypotheses for effects (or their absence) on child outcomes, and provide direction for the future rigorous studies that are certain to follow.
From a policy perspective, none of the studies reviewed addressed the issue of cost. In line with the suggestion above regarding cost-benefit analysis of achieving positive child outcomes, research on the cost of implementing these interventions would be useful for policy makers and educators.