Requirements for Evaluation. Successful and useful evaluations require that a number of conditions be met:
- The program should have a well-articulated set of objectives and a well-specified set of activities to achieve them. There should be a "theory of change" that is logical and believable, such that it is credible that the intended objectives will be met. This is a particularly difficult requirement for programs in child welfare, since the problems dealt with are serious and not easily overcome or managed. A corollary of this requirement is that it should be likely that the evaluation will have positive findings, that is, that the program will prove successful.5
- The program should be generalizable. It should be plausible that it could be implemented in other locations with differing contexts, including differing legal and regulatory frameworks. It should be possible to implement the program elsewhere at a scale such that it will make a difference.
- There should be the possibility of implementing a rigorous evaluation, providing the best possible foundation for policy and practice decisions.
- The program should have some promise of cost-effectiveness. Without this requirement it is unlikely that it would be adopted elsewhere.
- The objectives should be measurable: they should be observable within a reasonable period of time after the completion of the intervention, and there should be observable indications of their achievement.
- It is necessary that the program be large enough to provide enough subjects for adequate statistical power to detect important effects. This means that it should be reasonable to expect that a sufficient number of cases will enter the program in a reasonable period of time so that an adequate sample size can be accumulated.6 It should be remembered that referrals often drop when an experiment is undertaken.
- If a randomized experiment is to be implemented, it is desirable that the program not have saturation coverage of the eligible cases, so that some rationing of services is already required and the experiment will not result in fewer cases receiving services than before.
- The organizations involved in the evaluation should have adequate administrative data systems that will provide information on important outcomes. Adequate accounting systems should be available to provide information for cost analyses.
- Program managers and relevant agency executives must be supportive of the evaluation and be willing to exert influence to ensure that staff cooperate with evaluation requirements that could be somewhat demanding. Staff sabotage of an evaluation is an ever-present possibility, so there should be the capacity to implement mechanisms that will minimize it.
- Front-line workers should get regular feedback about the evaluation and data to support their continued participation.
- It should be possible to assess the extent to which a program is implemented as intended, that is, to determine the degree of program fidelity.
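The statistical power requirement in the list above can be made concrete with a standard normal-approximation formula for comparing two proportions. The sketch below is illustrative only: the outcome rates (for example, the share of children reunified within the follow-up period), the significance level, and the power target are hypothetical assumptions rather than figures from any program, and the function name `cases_per_group` is our own. Actual sample sizes depend on the expected size of effects of the particular program.

```python
import math
from statistics import NormalDist


def cases_per_group(p_control, p_program, alpha=0.05, power=0.80):
    """Approximate number of cases needed in each group to detect a
    difference between two outcome proportions with a two-sided test.

    Uses the usual normal approximation for two independent proportions.
    All inputs are assumptions supplied by the evaluator.
    """
    if p_control == p_program:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p_control * (1 - p_control) + p_program * (1 - p_program)
    effect = p_program - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical rates: 50% in the comparison group versus 60% or 55% with the program.
n_large_effect = cases_per_group(0.50, 0.60)
n_small_effect = cases_per_group(0.50, 0.55)
```

Halving the expected difference roughly quadruples the number of cases required, which is why a program with a small caseload, or one whose referrals drop during an experiment, may be unable to support an adequately powered evaluation.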
Evaluations should provide solid evidence for decisions about the particular program under review and decisions to implement a similar program elsewhere. This requires that evaluations be as rigorous as possible, ideally experiments in which cases are randomly assigned either to a group receiving the service being tested or to a group receiving a different service or no service at all. In the absence of a randomized experiment, it should be possible to implement a convincing quasi-experiment, involving similar groups receiving contrasting services. Besides giving the best possible information on effects of programs, rigorous evaluations maximize the possibility of other learning, such as what can go wrong in program implementation. In any event, the evaluation should be prospective: clients should be followed from the time they enter the program or a contrasting service, through the program, and for a reasonable follow-up period, probably not less than one year. Given the length of time required for some actions, such as termination of parental rights, a follow-up period longer than one year would be useful.
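As a minimal sketch of the random assignment described above, the following splits a list of cases into a program group and a comparison group of nearly equal size. The case identifiers, group labels, and the function name `assign_cases` are hypothetical. In practice cases enter over time, so assignment would more likely be made at intake from a pre-generated, concealed sequence; this example assumes the full list of cases is known in advance.

```python
import random


def assign_cases(case_ids, seed=None):
    """Randomly split cases into program and comparison groups of
    (nearly) equal size. Returns a dict mapping case id to group name."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible and auditable
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        case_id: ("program" if i < half else "comparison")
        for i, case_id in enumerate(shuffled)
    }


# Hypothetical case identifiers, for illustration only.
case_ids = [f"case-{n:03d}" for n in range(1, 21)]
assignments = assign_cases(case_ids, seed=42)
```

Recording the seed and the resulting allocation allows the assignment to be checked later, which helps guard against staff overriding it.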
While a credible contemporaneous comparison group is essential for a useful evaluation, the set of outcomes examined might depend on available resources for the research. At a minimum, outcomes should include an accounting of the status of children at various points in time, that is, whether in foster care, at home, adopted or freed for adoption, in independent living arrangements, etc. Ideally, the well-being of children and the functioning of the families in which they are living should also be assessed periodically. If children are not living in their original homes but reunification is planned, the functioning of the homes to which they are to be returned should also be examined. Sources of data should include administrative data on living arrangements, reports of maltreatment, administrative and judicial actions altering the child's status, and services provided. Again, ideally, periodic interviews should be undertaken with caretakers, workers involved with cases (both in the public agency and in other organizations that are involved), and the child. Data might also be obtained from others in a position to know about the child's well-being, such as teachers. Evaluations should also consider cost issues, ideally conducting a cost-effectiveness or cost-benefit analysis.
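One common way to summarize a cost-effectiveness analysis is the incremental cost per additional positive outcome, comparing the program group with the comparison group. The sketch below assumes a single outcome measure (the proportion of cases reunified) and average per-case costs drawn from agency accounting data. All figures and the function name `incremental_cost_per_outcome` are hypothetical.

```python
def incremental_cost_per_outcome(cost_program, cost_comparison,
                                 rate_program, rate_comparison):
    """Extra cost per additional positive outcome attributable to the program.

    Costs are average costs per case; rates are proportions of cases
    achieving the outcome in each group.
    """
    extra_effect = rate_program - rate_comparison
    if extra_effect <= 0:
        raise ValueError("program shows no gain in outcomes over the comparison group")
    return (cost_program - cost_comparison) / extra_effect


# Hypothetical inputs: $9,000 versus $6,000 per case, 60% versus 50% reunified.
cost_per_extra_reunification = incremental_cost_per_outcome(9000, 6000, 0.60, 0.50)
```

A full analysis would also account for downstream costs and savings, such as foster care days avoided, and for uncertainty in both the cost and outcome estimates.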
We note a particular problem in the evaluation of reunification programs: as indicated above, it is the business of virtually the entire body of foster care practice to work toward reunification in the first place and, if that does not work out, toward alternative permanency arrangements. Thus, many jurisdictions do not have identifiable programs for reunification and permanency, rather seeing all of their "out-of-home" work as directed toward those ends.7 Furthermore, there is a lack of well-known, well-articulated models of reunification practice that have been implemented at large scale, and no single program model has captured the attention of the field as a whole.
5. Negative findings are useful, but more is learned and better guidance for decisions is available when results are positive. An exception is the situation in which the program has been widely adopted or is widely touted. In this case, it is desirable to undertake a rigorous evaluation in order to verify or disprove accepted wisdom.
6. We cannot specify the required sample sizes more precisely without further exploration of the expected size of the effects of particular programs.
7. This is in contrast to the situation in regard to family preservation a decade ago. Although the ideals of family preservation had pervaded the system, many jurisdictions had (and still have) identifiable family preservation programs, some of them operating at quite large scale.