# Methodological Issues in the Evaluation of the National Long Term Care Demonstration. B. Testing Strategy

The regression procedure described above provides estimates of treatment/control differences in outcomes, controlling for any initial differences between the two groups. These are our best estimates of channeling impacts. However, even if channeling had no impact, the treatment and control groups could have somewhat different outcomes strictly by chance. Hence, we relied on statistical tests to determine whether the estimated differences were large enough that they were unlikely to have occurred by chance.

Three types of tests were used in the analysis: t-tests on the estimates of aB and aF, the estimated treatment/control differences, to determine whether they were significantly different from zero; F-tests to test whether the estimated impacts in the basic and financial control models differed from each other by more than might have been expected to occur by chance; and multivariate F-tests to test whether estimates of channeling impacts on sets of related outcome measures were equal to zero. Each of these tests is described below.

1. Tests of Whether Channeling Impacts Existed

The widely used t-test simply tests whether an estimated regression coefficient differs from zero by more than might reasonably be expected to occur because of sample variation. In our application, the regression coefficients aB and aF estimate the treatment/control differences. If the true effect of channeling on some outcome is zero, the estimates of aB and aF should be relatively small. The test enables us to determine, with some known probability of error, whether channeling had some impact on the outcome examined.
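As a minimal sketch of the calculation behind such a test, the following Python snippet forms a t-statistic from an impact estimate and its standard error and converts it to a two-tailed p-value. The numbers and function names are hypothetical and are not taken from the evaluation, and the p-value relies on a large-sample normal approximation to the t distribution rather than on exact t critical values.

```python
from statistics import NormalDist


def t_statistic(estimate: float, std_error: float) -> float:
    """Ratio of an estimated coefficient to its standard error."""
    return estimate / std_error


def two_tailed_p(t: float) -> float:
    """Two-tailed p-value, using the normal approximation (an assumption
    that is reasonable only for large samples)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Hypothetical impact estimate (e.g., treatment/control difference in
# nursing home days) and its standard error; illustrative values only.
estimate = -4.2
std_error = 1.9

t = t_statistic(estimate, std_error)
p = two_tailed_p(t)
significant = p < 0.05

print(f"t = {t:.2f}, two-tailed p = {p:.3f}, significant at .05: {significant}")
```

With these illustrative values the estimate is a little more than two standard errors from zero, so the hypothesis of no impact would be rejected at the .05 level.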

Two criteria must be specified by the researcher in conducting t-tests: whether one-tailed or two-tailed tests are to be used and the significance level of the test. The choice of two-tailed or one-tailed tests depends on whether channeling is expected to affect the level of some outcome in a particular direction, or whether the impact could be in either direction. For most outcomes examined, the intention was that channeling would have a particular directional effect (e.g., reduction in nursing home use). However, for the vast majority of the outcomes, there were plausible reasons why the impact could be in the opposite direction, and for some important outcomes there was a high degree of uncertainty about the direction of channeling impacts to expect (e.g., informal caregiving and costs). Since we would clearly not ignore estimates of the "wrong" sign that were large and would have been statistically significant under a two-tailed test, the two-tailed test is the appropriate test to use.

To avoid the appearance of arbitrariness in the selection of tests, and to avoid confusing readers as to which type of test was being employed in any given table (particularly in reports covering multiple outcomes), two-tailed tests were used throughout the analysis, even for those few outcomes where the only plausible hypothesis about channeling's impact is unidirectional. The use of two-tailed t-tests also should result in greater consistency between these tests and the multivariate F-tests (described below), which are, by definition, two-tailed.

The use of two-tailed tests did on occasion result in the inference that channeling's impact on a particular outcome in some time period was not significantly different from zero when a one-tailed test would have led to a different conclusion. In such cases, supporting evidence from other time periods and related outcome measures was used to obtain the correct inference about whether channeling appeared to have impacts on the behavior under examination. The magnitude of the estimate also was considered in drawing these inferences. The interpretation of coefficients and test statistics is described in more detail in Section C below.
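The difference between the two kinds of test can be illustrated with a short Python sketch. The t-statistics below are hypothetical, the one-tailed alternative is assumed to be a reduction in the outcome (a negative impact), and the p-values again use a large-sample normal approximation.

```python
from statistics import NormalDist

ALPHA = 0.05  # significance level used throughout the analysis


def one_tailed_p_lower(t: float) -> float:
    """P-value when the only alternative considered is a negative impact."""
    return NormalDist().cdf(t)


def two_tailed_p(t: float) -> float:
    """P-value when an impact in either direction is considered."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Hypothetical t-statistics, for illustration only.
cases = {
    "expected sign, moderate size": -1.80,
    "'wrong' sign, large": 2.50,
}

# For each case: (significant one-tailed?, significant two-tailed?)
results = {
    label: (one_tailed_p_lower(t) < ALPHA, two_tailed_p(t) < ALPHA)
    for label, t in cases.items()
}

for label, (one_sig, two_sig) in results.items():
    print(f"{label}: one-tailed significant={one_sig}, two-tailed significant={two_sig}")
```

The first case is the situation described above, where a two-tailed test fails to reject but a one-tailed test would have. The second shows why one-tailed tests were not adopted: a large estimate of the unexpected sign would never be flagged.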

The significance level at which to conduct the t-tests was the other testing decision. To make it relatively unlikely that chance differences between the two groups would be interpreted as channeling impacts, we followed customary conventions of statistical testing, conducting the t-tests at the .05 (5 percent) significance level. For an estimate judged statistically significant, this means that, given the sample size and observed sample variation, there was only a small probability (5 percent or less) that a treatment/control difference of the magnitude estimated would have occurred by chance if channeling had no effect, and that such differences are therefore likely to be due to the effects of channeling. Tables in final reports on channeling impacts containing estimated impacts also indicated which estimates would still have been statistically significant had the test been conducted at the .01 level, implying an even smaller likelihood that the observed difference was due to chance sample variation.

Although we believe these decisions about one-tailed versus two-tailed tests and significance levels are the most appropriate, throughout the final technical reports on channeling impacts we provide the t-statistics along with the estimates. Readers can therefore determine for themselves whether and how inferences would change if alternative choices had been made.

2. Tests of Equivalence of Impacts Between Basic and Financial Control Models

In addition to determining whether the basic and financial control models affected specific outcome measures, we were also interested in knowing whether the models differed from each other in the size of the impact. It was hypothesized that the greater resources and flexibility of funding available under the financial control model would result in larger impacts for this group. However, differences between the environments into which the two models were introduced could also produce differences in the size of impacts achieved by the alternative models.22

Simple F-tests of the equivalence of aB and aF (from the regression equation) provided the tests of this hypothesis. The tests were conducted at the .05 level, consistent with the significance level selected for the t-tests. To reduce the likelihood of inconsistencies in the test results (such as the estimate for one model being significantly different from zero and the other not, but an F-test indicating no significant difference between the two models), the F-tests were conducted in two stages. We first tested whether aB and aF were both equal to zero using a joint F-test. If that hypothesis could not be rejected, no further test of equivalence was necessary. If the test did indicate rejection of the hypothesis that both were equal to zero, we then tested whether they were equal to each other.
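The two-stage logic can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure as actually programmed for the evaluation: it uses Wald statistics with large-sample chi-square p-values in place of exact F-tests, and the estimates, variances, covariance, and function names are all hypothetical.

```python
import math
from statistics import NormalDist


def joint_zero_wald(a_b, a_f, var_b, var_f, cov_bf):
    """Wald statistic for H0: aB = aF = 0, i.e. b' V^{-1} b for a 2x2 V."""
    det = var_b * var_f - cov_bf ** 2
    return (a_b ** 2 * var_f - 2 * a_b * a_f * cov_bf + a_f ** 2 * var_b) / det


def equality_wald(a_b, a_f, var_b, var_f, cov_bf):
    """Wald statistic for H0: aB = aF."""
    return (a_b - a_f) ** 2 / (var_b + var_f - 2 * cov_bf)


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 df."""
    return math.exp(-x / 2.0)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return 2.0 * (1.0 - NormalDist().cdf(math.sqrt(x)))


def two_stage_test(a_b, a_f, var_b, var_f, cov_bf, alpha=0.05):
    """Stage 1: are both impacts zero? Stage 2 (only if stage 1 rejects):
    are the two impacts equal to each other?"""
    joint_p = chi2_sf_2df(joint_zero_wald(a_b, a_f, var_b, var_f, cov_bf))
    if joint_p >= alpha:
        return {"joint_p": joint_p, "equal_p": None,
                "conclusion": "cannot reject that both impacts are zero"}
    equal_p = chi2_sf_1df(equality_wald(a_b, a_f, var_b, var_f, cov_bf))
    conclusion = ("impacts differ between models" if equal_p < alpha
                  else "impacts do not differ significantly between models")
    return {"joint_p": joint_p, "equal_p": equal_p, "conclusion": conclusion}


# Hypothetical estimates: basic model -1.0, financial control model -5.0,
# each with variance 4.0 and zero covariance (illustrative values only).
result = two_stage_test(a_b=-1.0, a_f=-5.0, var_b=4.0, var_f=4.0, cov_bf=0.0)
print(result)

# A second hypothetical case in which the first stage does not reject.
null_result = two_stage_test(a_b=0.5, a_f=-1.0, var_b=4.0, var_f=4.0, cov_bf=0.0)
print(null_result)
```

In the first case the joint hypothesis is rejected, so the equality test is carried out; in the second the procedure stops after the first stage.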

3. Multivariate Tests of Whether Channeling Impacts Existed

The individual tests described above were conducted at a significance level that made it relatively unlikely that, for any particular outcome measure, chance differences between treatment and control groups would be interpreted incorrectly as channeling impacts. However, because so many outcomes were examined (each for 2 models and 3 time periods), the probability that such errors would occur in at least a few instances was very high. To lessen the probability of making such errors, multivariate tests were employed that simultaneously tested the hypothesis that channeling impacts on a set of related outcome measures were jointly equal to zero. For example, estimates of channeling impacts on nursing home days, the probability of being admitted to a nursing home, and nursing home expenditures were tested jointly to determine whether any were significantly different from zero. The advantage of this type of test can be seen in an example: if only one of the 6 impact estimates (3 for each model) were significantly different from zero using the individual t-tests, and the other impact estimates were all small and far from being statistically significant, it is unlikely that channeling really influenced nursing home use. The multivariate test in such cases would typically indicate (depending on the size and significance of the estimates) that we could not reject the hypothesis that channeling's impact on the set of nursing home outcomes was zero.
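The arithmetic behind this concern can be sketched briefly in Python. The calculation assumes that the individual tests are independent, which is a simplification (tests on related outcomes are generally correlated), and the test counts are illustrative rather than the actual number of tests in the evaluation.

```python
def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent tests rejects at level
    alpha when every true impact is zero."""
    return 1.0 - (1.0 - alpha) ** n_tests


# One outcome tested for 2 models and 3 time periods gives 6 tests;
# ten such outcomes (a hypothetical count) would give 60 tests.
for n in (1, 6, 60):
    print(n, round(prob_at_least_one_false_positive(n), 3))
```

Under the independence assumption, the chance of at least one spurious "significant" estimate is about 26 percent for 6 tests and about 95 percent for 60 tests, even though each individual test has only a 5 percent error rate.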

Tests that impacts on the set of outcomes being considered jointly were all zero were conducted for the basic model, the financial control model, and for both models together. We also used multivariate tests to determine whether impacts on given sets of outcomes in the basic model were equal to those in the financial control model. In each case the tests were conducted on related outcome measures, such as alternative measures of well-being or informal care, for a given time period. Because the tests require that the same observations be used in all of the equations for which the coefficients are being tested jointly, outcomes in different time periods were tested separately.
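The text does not give the form of the test statistic. One standard form for a joint test of this kind, offered here only as an illustration with notation that is our assumption, is the Wald statistic and its F-type scaling:

```latex
W = \hat{a}^{\prime} \, \hat{V}^{-1} \, \hat{a}, \qquad F = \frac{W}{q}
```

Here \hat{a} denotes the q-element vector of estimated channeling impacts on the set of related outcomes and \hat{V} its estimated covariance matrix, including the covariances across outcome equations. Under the hypothesis that all q impacts are zero, W is approximately chi-square distributed with q degrees of freedom in large samples. Estimating the cross-equation covariances requires residuals for the same individuals in every equation, which is consistent with the requirement that the same observations be used in all of the equations being tested jointly.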

The lower likelihood of erroneously concluding that channeling affected outcomes when the treatment/control differences were actually due to chance makes the use of multivariate tests attractive. Furthermore, it suggests that they should be used hierarchically, that is, that t-tests should only be examined if the multivariate tests indicate that not all channeling impacts in a given substantive area are zero. In this instance t-tests would indicate which of the outcomes channeling did appear to affect. However, strict adherence to test results in this fashion would increase the probability of making the opposite type of error: concluding that channeling had no impacts when in fact it did. The method of assessing and interpreting the many estimates and test statistics produced in the analysis, described in the section below, was designed to strike a balance between these two types of errors.
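For concreteness, the strict hierarchical rule discussed above could be sketched in Python as follows. As the text notes, the analysis did not adhere strictly to such a rule, so this is an illustration of the rule itself and not of the method actually used; the function name, outcome labels, and p-values are hypothetical.

```python
def hierarchical_screen(multivariate_p, individual_p, alpha=0.05):
    """Return the outcomes flagged as affected under a strict hierarchy:
    individual tests are examined only if the joint test rejects."""
    if multivariate_p >= alpha:
        # Joint hypothesis of no impact on the set is not rejected,
        # so no individual estimate is flagged.
        return []
    return [name for name, p in individual_p.items() if p < alpha]


# Hypothetical p-values for a set of related nursing home outcomes.
individual = {
    "nursing home days": 0.03,
    "probability of admission": 0.40,
    "nursing home expenditures": 0.55,
}

flagged_when_joint_fails = hierarchical_screen(0.30, individual)
flagged_when_joint_rejects = hierarchical_screen(0.01, individual)

print(flagged_when_joint_fails)
print(flagged_when_joint_rejects)
```

The first call shows how the strict rule discards an isolated significant t-test when the joint test does not reject. This protects against chance findings but, as noted above, raises the risk of missing a real impact.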
