9.3 Possible Alternative Explanations of the Findings


Positive findings of experimental evaluations provide evidence for the validity of a theory of intervention and confirm the effectiveness of a particular implementation of that theory. Null findings are more ambiguous: they do not necessarily disprove an intervention theory and may not even be evidence of ineffective implementation. One cannot be sure whether the results are due to problematic program conceptions, inadequate program implementation, unique contextual problems, or flawed evaluation procedures. The findings of this study will be questioned, as have those of previous studies, for various supposed methodological and implementation shortcomings. We consider here some of the factors that might have affected the findings, beginning with problems in the implementation of the evaluation.

Violations of experimental assignment. In all three states, there were violations of experimental group assignment, that is, families assigned to the control group who were nonetheless given family preservation services. This was particularly a problem in New Jersey, where 14% of the control group families received family preservation. The dictates of rigorous analysis required that we retain these cases in the control group, an intention-to-treat approach (we also conducted "secondary" analyses in which we dropped these cases, and there were few differences between our primary and secondary analyses). Violation cases could significantly affect the findings. For example, they could represent cases that would have experienced placement in the absence of the service. To the extent this was the case, the placement rate in the control group would be underestimated. This could affect the conclusions about both the effective targeting rate and experimental-control group differences in placement.

We attempted to examine the extent to which violations might have affected the results in New Jersey (there were too few violations in Kentucky and Tennessee to have significant effects). Even if all of the violation cases had experienced placement early on, the proportion of families in the control group experiencing placement would not have reached levels that one would consider close to adequate targeting. A sensitivity analysis in which all violations are assumed to be placed early suggests that under this extreme assumption there would have been differences in placement rates favoring the family preservation group early on, but these differences dissipate over time.(11) Hence, at the very least, violations could not affect the conclusion that family preservation does not appear to prevent long placements of a year or more.
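To make the logic of this sensitivity analysis concrete, the sketch below recomputes a control group placement rate under the extreme assumption that every violation case would have been placed in the first month had it not received the service. All counts here are hypothetical and chosen only for illustration; the actual analysis used case-level placement data from the three states.

```python
# Illustrative sketch of the sensitivity analysis described above.
# All counts are hypothetical; the actual analysis used case-level
# placement data from the three states.

def placement_rate(placed, total):
    """Proportion of a group experiencing placement by a follow-up point."""
    return placed / total

# Hypothetical control group: 500 families, of which 70 (14%, as in
# New Jersey) were violation cases that received family preservation,
# and 65 other families had an observed placement by six months.
control_total = 500
violation_cases = 70
observed_placed_6mo = 65

# Primary analysis: violations retained in the control group and
# counted as placed only if a placement was actually observed.
observed = placement_rate(observed_placed_6mo, control_total)

# Extreme assumption: every violation case would have been placed in
# the first month had it not received family preservation services.
worst_case = placement_rate(observed_placed_6mo + violation_cases,
                            control_total)

print(f"Observed control placement rate at six months: {observed:.0%}")
print(f"Worst-case rate, all violations placed early:  {worst_case:.0%}")
```

The worst-case figure is an upper bound on how much the violations could have depressed the observed control group placement rate; in the study, even that bound fell short of what one would consider adequate targeting.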

Inclusion of minimal service cases in the analysis. Some families in the experimental group did not receive family preservation services or received only small amounts of service. These cases were included in the primary analysis, and it might be argued that this reduced the apparent effects of the service and that we should have eliminated these cases to produce a fair estimate of effects. We did drop them from our "secondary" analysis and found few differences compared to the primary analysis. In addition, it should be observed that programs will always have minimal service cases: cases in which the family cannot be found, declines service, or otherwise refuses to cooperate. Retaining them in the analysis is appropriate in determining the average effects of the service over a group of cases thought to need the service, as the sketch below illustrates. Theoretically, one might be able to reduce the size of the minimal service group through better targeting, but in practice it is likely to be difficult to identify a substantial proportion of these cases prior to referral.
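To illustrate the distinction concretely, the sketch below contrasts a primary analysis that retains all assigned cases with a secondary analysis that drops minimal service cases. The records, field layout, and two-hour cutoff are all hypothetical, introduced only for illustration; they are not the study's actual data or definitions.

```python
# Illustrative sketch of the primary vs. "secondary" analysis.
# Records and the service-hours cutoff are hypothetical assumptions,
# not the study's actual data or definitions.

cases = [
    # (group, hours_of_service, placed_within_year)
    ("experimental", 40.0, False),
    ("experimental",  0.0, True),   # family could not be found
    ("experimental",  1.5, True),   # declined service after intake
    ("experimental", 35.0, False),
    ("experimental", 28.0, True),
]

MINIMAL_SERVICE_HOURS = 2.0  # hypothetical cutoff for "minimal service"

def placement_rate(group):
    """Proportion of cases in the group with a placement."""
    return sum(1 for _, _, placed in group if placed) / len(group)

# Primary analysis: every assigned experimental case is retained,
# regardless of how much service the family actually received.
primary = cases

# Secondary analysis: minimal service cases are dropped.
secondary = [c for c in cases if c[1] >= MINIMAL_SERVICE_HOURS]

print(f"Primary analysis placement rate:   {placement_rate(primary):.0%}")
print(f"Secondary analysis placement rate: {placement_rate(secondary):.0%}")
```

The primary estimate answers the policy-relevant question, the average effect over all cases referred to the program, while the secondary estimate describes only the families that actually engaged with the service.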

The "John Henry" effect. The John Henry effect is reputed to be present in some experimental evaluations. This is the situation in which workers in control group cases exert special efforts on behalf of families, providing them far more service than would have been provided in normal circumstances (so the control group is not a "regular service" group). There are a couple of possible reasons this might occur. A worker might be unhappy with the experiment in general and with the assignment of this particular case to the control group in particular, and exert special effort in response. Alternatively, workers might feel the families assigned to the control group really need the experimental service, the prevention of placement is very important, so efforts are made to emulate Homebuilders. (This may be a special case of experimental leakage.)

In Kentucky and New Jersey, there is no evidence in the data on services to suggest this happened: families in the experimental group did receive much more service than the control group. It is possible that the control group received more than "regular services"; we cannot determine that from our data. So it is possible that there is a threshold of services that has placement prevention effects and that this threshold was reached by the control group. If this were the case, it would indicate that the desired results can be obtained without intensive family preservation services.

In Tennessee, there is some evidence that families in the control group may have received as much, or perhaps more, service than the experimental group. This is seen in a specific set of questions asked of the caretakers about services received, but it is not confirmed by other evidence regarding services provided to the two groups. Nonetheless, we cannot be as confident in Tennessee that experimental group families received much more service than the control group. Since the outcomes of the two groups were similar, this could again be taken as an indication that the results could be obtained without the family preservation services we studied.

Effects of the experiment on the nature of the referred group. It is possible that instituting the experiment caused a change in the character of cases referred to the program. In particular, agencies and workers were required to refer more cases in order to fill the control group as well as the experimental group. This meant dipping further into the pool of cases, perhaps taking "less severe" cases, those with less risk of placement. Anticipating this problem, we endeavored to select sites for the experiment in which demand considerably exceeded supply; however, we cannot be sure that we succeeded in this regard. It is also possible that workers referred different cases because of the chance that a case would be assigned to the control group and not receive family preservation services. Or they may have changed referral practices to sabotage the research.

We cannot be sure that these factors were not present in referrals of families to the experiment, but we have no evidence that they were a strong influence. Operating against such dynamics were the desires of workers to provide significant services to families.

The program implementation was flawed. The family preservation programs in Kentucky, New Jersey, and Tennessee claimed adherence to the Homebuilders model of service. However, it is possible that the implementation did not adequately follow that model, with the result that this evaluation was not a fair test of the model. We attempted to measure certain aspects of model adherence and found some variation from the prescribed ideal. One cannot expect any implementation of a model to adhere totally to it; adaptations must be made to local conditions, to the character of individual cases, and to the styles of individual workers. Models of social service do not provide for the same response in all cases, nor can they be used to prescribe exactly what should be done in each case. Even for the best specified model, judgment abounds in its application, such that there might be legitimate disagreements as to whether it was applied in a particular case. In fact, one might hope that a model would be "robust" to at least small violations, having benefit even when it is not applied in an ideal way.

In the end, it is a matter of judgment as to whether the model was adequately adhered to in these three states. The fact that we have three states with similar findings, and with similar degrees of adherence to the model, is again relevant. Was the model violated in all three states? Possibly, but that would then suggest the difficulty, perhaps the unlikelihood, of adequately implementing it elsewhere.

Contextual factors caused the model to fail. It is possible that a variety of contextual factors caused the outcomes that we observed. There are a multitude of possible such factors: the political and economic climate, the climate in the agencies, administrative barriers, approaches of judges, competence of workers, availability of other services, and so on. These influences would weigh on both the experimental and control groups, presumably in equivalent ways, but they could prevent any new approach from having effects different from usual treatment. While we cannot exclude such factors as explanations for our results, the fact that we have three states with similar results is again relevant: multiple sites make it less likely that the same contextual factors explain the findings. Furthermore, social programs must operate in less-than-ideal contexts; to be effective, their conceptualizations must take these circumstances into account.

One set of contextual factors may have prevented positive effects of family preservation services: broad social problems of poverty, racism, inadequate housing, inadequate education, and substance abuse. Perhaps it is unrealistic to expect a short-term program to solve such serious problems.

The program conceptualization is flawed. It is always possible that findings such as ours are the result of a flawed program design. Obviously, this is the interpretation that is most difficult for program advocates to contemplate. But it is possible that the intervention activities of family preservation programs, even if carried out in an ideal way, are inadequate to achieve their goals. We note here one specific aspect of these programs that is often criticized and blamed for perceived failures: their brevity. It is often suggested that a program only four weeks in length, however intense, cannot be expected to have significant effects on very serious individual and family problems that are often of long duration and therefore require much longer interventions. Going even further, it is possible that the available intervention technology is simply inadequate in the face of the problems it is expected to solve.

(11) Under the assumption that all violations would have been placed in the first month, 27% of the control group would have been placed in the first six months, compared to 19% of the experimental group. At one year, the proportions would have been 29% in the control group and 28% in the experimental group.