Methodological Issues in the Evaluation of the National Long Term Care Demonstration. D. The Validity of Pooling Observations


In selecting a regression model to estimate channeling impacts, a key issue was “pooling”--i.e., whether channeling impacts for each model could be accurately estimated by a single parameter in a single regression equation estimated on the full sample, or whether segments of the sample were so different from each other that a single equation or parameter would not accurately or adequately reflect the real relationships and would produce distorted impact estimates. Three pooling issues were examined:

  • Can valid estimates of channeling impacts at the model level be obtained by treating observations from any site implementing the model as if they were all from the same site, or must separate impact estimates be obtained for each site and then explicitly averaged to obtain model impacts?

  • Can a single regression equation be used to estimate channeling impacts at the model level, or are separate equations necessary for each site and/or treatment group in order to obtain valid estimates?

  • Can valid estimates of impacts at the site level be obtained from a single regression equation, or are separate equations necessary for each site?

The regression model specified in Chapter III is based on the assumption that the above types of pooling are appropriate. That is, a single equation was estimated using all observations, with impacts for each model represented by a single parameter. The advantage of pooling is that if the restrictions on regression estimates implied by pooling are true, much more precise estimates (i.e., estimates with smaller variances) can be obtained because only one estimate is being made for each model rather than one for each site. The possible disadvantage of pooling is that if the implied restrictions are not true, pooling observations could produce biased and misleading estimates of the model or site level impacts. The analysis described below was conducted to determine whether the smaller variances produced by pooling observations could be obtained without distorting estimates of channeling impacts.

1. Were Separate Impact Estimates for Each Site Necessary to Accurately Estimate Model Impacts?

The type of pooling of greatest concern for this evaluation was whether a single parameter would be sufficient to estimate the effects of a channeling model or whether impacts were so different across sites that separate impact estimates were required for each site. In the latter case, model impacts would be obtained by computing a weighted average of the estimated impacts for the five sites implementing the model.26
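The weighted average mentioned above can be sketched as follows. This is a minimal illustration, not the evaluation's actual procedure: the function name `model_impact`, the choice of site sample sizes as weights, and all numbers are assumptions made for the example.

```python
def model_impact(site_impacts, site_weights):
    """Weighted average of site-level impact estimates.

    site_impacts: estimated impact for each site implementing the model.
    site_weights: one nonnegative weight per site (assumed here to be
                  site sample sizes; the report's weights may differ).
    """
    if len(site_impacts) != len(site_weights):
        raise ValueError("need exactly one weight per site impact")
    total_weight = sum(site_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(b * w for b, w in zip(site_impacts, site_weights))
    return weighted_sum / total_weight


# Hypothetical impacts (e.g., change in hospital days) for five sites.
illustrative_impacts = [-2.0, -1.0, 0.5, -3.0, -1.5]
illustrative_sizes = [300, 250, 200, 150, 100]
print(model_impact(illustrative_impacts, illustrative_sizes))
```

With equal weights the function reduces to a simple mean of the five site estimates.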

The restriction implicit in using a single parameter is that impacts are the same in all sites implementing a given model. This restriction was tested by estimating an unrestricted version of equation 1--an equation with 10 site*treatment interaction terms in place of the two binary treatment status variables--and testing whether the coefficients on the 5 site*treatment terms involving a given channeling model were equal to each other.
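The restricted and unrestricted specifications described above can be written out as a sketch. The notation is hypothetical, since equation 1 is not reproduced in this section: $T_{mi}$ is the binary treatment indicator for channeling model $m$, $S_{msi}$ indicates that sample member $i$ is in site $s$ of model $m$, and $X_i$ collects the remaining explanatory variables.

```latex
% Pooled (restricted) form: one impact parameter per channeling model
y_i = \alpha + \beta_1 T_{1i} + \beta_2 T_{2i} + X_i'\gamma + \varepsilon_i

% Unrestricted form: 10 site*treatment terms replace T_{1i} and T_{2i}
y_i = \alpha + \sum_{m=1}^{2} \sum_{s=1}^{5} \delta_{ms}\, S_{msi} T_{mi}
      + X_i'\gamma + \varepsilon_i

% Hypothesis tested separately for each model m (4 equality restrictions)
H_0:\; \delta_{m1} = \delta_{m2} = \delta_{m3} = \delta_{m4} = \delta_{m5}
```

Under $H_0$ the five site-specific coefficients for model $m$ collapse to the single parameter $\beta_m$ of the pooled form.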

This test was conducted for a set of 14 key outcome variables at 6, 12, and 18 months, including hospital and nursing home use (whether admitted, number of days), receipt of case management,27 receipt of formal and informal care (whether received, hours received),28 sample member well-being (number of unmet needs, number of impairments in activities of daily living, degree of global life satisfaction), and sample members' living arrangement (in community, hospital, nursing home, or deceased). Of the 82 tests (41 for each model), the hypothesis that impacts were equal across sites was rejected in eight cases. The eight cases comprised whether case management was received, for both the 6- and 12-month measures in both models, and four scattered outcome measures at 18 months (for which the sample sizes were smaller by half). This is a relatively small proportion of the tests, and the fact that results were strongest for case management outcomes made it less troubling, since those impacts were large and statistically significant in all sites. Even more compelling was the finding that, even for those eight outcomes, impacts at the model level computed from the equation yielding separate site impact estimates tended to differ little from model impacts computed from the equation without site-specific impacts. Thus, even if channeling impacts differed across sites, model level impact estimates were not distorted by the implicit assumption to the contrary in the pooled specification (equation 1). The smaller standard errors led us to prefer the pooled specification.

2. Were Separate Equations for Each Site and/or Treatment Group Necessary to Estimate Channeling Model Impacts?

Estimating a single equation on all of the observations combined implicitly constrains the estimated relationship between client characteristics and outcomes to be the same in all sites. However, if this assumption were not true, the estimated impacts at the model level from the pooled data could be distorted. The test described in the previous section addressed only the issue of whether separate impacts for each site were required, and was based on the assumption that a single equation for all sites was appropriate. If the assumption were incorrect, the results from the above tests could be erroneous as well.

To test the implicit constraints implied by pooling observations from all of the sites, separate equations were estimated for each site and the sum of squared residuals from these regressions was compared to the sum of squared residuals from the single equation. The F-tests constructed from these two sums for each outcome variable showed that in 10 of the 41 instances, the constraints on the regression coefficients implied by pooling were rejected. However, since our concern was only with whether estimates of channeling impacts were distorted by estimating a single equation rather than separate equations, we used the site-specific equations to construct an estimate of channeling impacts at the model level29 and then compared this estimate to the impact obtained from a single equation, for each of the key outcome measures. For each of the 82 comparisons, the difference between the two alternative estimates was slight. Thus, despite the greater-than-chance incidence of formal rejection of the constraints on regression coefficients implied by pooling, the primary estimates of interest for the evaluation (channeling model impacts) were unaffected by estimation of a single equation rather than site-specific equations.
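The comparison of residual sums of squares described above is the standard F-test for pooling (often called a Chow-type test). The sketch below shows the arithmetic only. The function name `pooling_f_statistic`, its parameters, and every number in the example are assumptions for illustration; the report does not give the coefficient counts or sums of squares it used.

```python
def pooling_f_statistic(ssr_pooled, ssr_by_site, n_obs, k_pooled, k_per_site):
    """F statistic for the restrictions implied by pooling.

    ssr_pooled:  sum of squared residuals from the single pooled equation.
    ssr_by_site: sums of squared residuals from the separate site equations.
    n_obs:       total number of observations across all sites.
    k_pooled:    number of coefficients in the pooled equation (assumed).
    k_per_site:  number of coefficients in each site equation (assumed equal).

    Returns (F, numerator df, denominator df).
    """
    n_sites = len(ssr_by_site)
    ssr_unrestricted = sum(ssr_by_site)
    k_unrestricted = n_sites * k_per_site
    n_restrictions = k_unrestricted - k_pooled   # restrictions imposed by pooling
    df_resid = n_obs - k_unrestricted            # residual df, unrestricted model
    if n_restrictions <= 0 or df_resid <= 0:
        raise ValueError("degrees of freedom must be positive")
    numerator = (ssr_pooled - ssr_unrestricted) / n_restrictions
    denominator = ssr_unrestricted / df_resid
    return numerator / denominator, n_restrictions, df_resid


# Purely illustrative inputs: five sites, ten coefficients per equation.
f_stat, df_num, df_den = pooling_f_statistic(
    ssr_pooled=120.0,
    ssr_by_site=[20.0, 25.0, 15.0, 22.0, 18.0],
    n_obs=1050,
    k_pooled=10,
    k_per_site=10,
)
print(f_stat, df_num, df_den)
```

The statistic is compared with the critical value of the F distribution with the returned degrees of freedom. A rejection says only that some coefficients differ across sites, which is why the text goes on to check whether the impact estimates themselves change.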

We also tested another set of restrictions that are implicit in the use of a single equation: that the relationships between outcomes and sample member characteristics were the same for treatments and controls. As always, the concern was with whether these implicit constraints, if not appropriate, would lead to different estimates of channeling impacts. Statistical tests of these restrictions rejected them for only 3 of the 41 outcomes examined. Again, even for the 3 outcomes for which the constraints on the coefficients on explanatory variables were formally rejected, the impact estimates obtained from the separate equations were very similar to those obtained from the single equation.

Based on the above findings, we concluded that use of a single equation provided the best estimates of channeling impacts at the model level. The single equation yielded very similar impact estimates with considerably (up to 20 percent) smaller standard errors, thereby reducing the probability of erroneous inferences of the types discussed in Chapter III.

3. Can Valid Estimates of Site-Specific Impacts Be Obtained from a Single Equation?

Despite the widespread findings that impacts at the model level did not seem to be distorted by pooling, there was still some concern that the site-specific impact estimates to be computed (see Applebaum et al., 1986) might be distorted if they were obtained from a single equation (with site*treatment interaction terms) rather than from separate equations for each site. A comparison of the two alternative sets of 530 impact estimates30 showed that 438 were not significantly different from zero whether the single or multiple equation variant was used. Of the remaining 92 estimates, 65 were statistically significant under both procedures, and in all but 2 of these cases the two estimates were quite similar in magnitude. There were 19 cases in which the single equation estimate was statistically significant but the separate equation impact estimate was not. In over half of these cases, however, the estimates were quite close in magnitude, and the insignificant estimate had a t-value very close to the critical value. The reduction in standard errors achieved by pooling was the primary reason for these differences in significance. Finally, there were 8 instances in which the separate equations produced statistically significant impact estimates at the site level, but the single equation did not. In most of these cases the two estimates differed substantially in size as well as significance.
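The four-way classification used in this comparison can be sketched as below. The counts are those reported in the text. The function name `classify`, the category labels, and the two-sided critical value of 1.96 are assumptions for illustration, since the report does not state the critical value in this section.

```python
def classify(t_single, t_separate, critical=1.96):
    """Classify one site-level impact by significance under each approach.

    t_single:   t-value from the single (pooled) equation.
    t_separate: t-value from the separate site equation.
    critical:   two-sided critical value (1.96 is an assumed default).
    """
    sig_single = abs(t_single) >= critical
    sig_separate = abs(t_separate) >= critical
    if sig_single and sig_separate:
        return "both significant"
    if sig_single:
        return "single equation only"
    if sig_separate:
        return "separate equations only"
    return "neither significant"


# Counts reported in the text for the 530 site-level impact estimates.
REPORTED_COUNTS = {
    "neither significant": 438,
    "both significant": 65,
    "single equation only": 19,
    "separate equations only": 8,
}

# The four categories account for every estimate.
assert sum(REPORTED_COUNTS.values()) == 530

# Hypothetical t-values: close to the critical value under one approach only.
print(classify(2.10, 1.90))
```

The "single equation only" category is where smaller pooled standard errors can push a similar point estimate across the significance threshold.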

We concluded that estimates of impacts at the site level obtained from a single regression equation would only rarely yield different conclusions about channeling impacts than would the estimates obtained from the unpooled model. Furthermore, even when the two estimates differ, the pooled estimate may well be preferred because its standard errors would be smaller.
