Under certain statistical assumptions, the regression procedure described in Chapter III will provide unbiased estimates of channeling impacts. The assumption on which unbiasedness depends is that the disturbance term representing the unobserved factors affecting outcomes be uncorrelated with the screen/baseline control variables and treatment status. This condition is not definitely verifiable, but the fact that sample members were randomly assigned to treatment and control groups makes it unlikely that the disturbance term is correlated with treatment status; hence, estimates of channeling impacts obtained by regression are expected to be unbiased.
Unbiasedness is not the only desirable property of the estimates, however. When outcome variables are not normally distributed, regression estimates lose some of their other desirable properties and may exhibit other characteristics that are undesirable. Two types of channeling outcome variables that had non-normal distributions were those that were binary or truncated at zero, and those that were skewed (i.e., that had extremely large values for a small number of observations). Analyses were conducted to determine whether the regression estimates of impacts on these two types of outcomes were distorted or less reliable in some way than alternative estimates.
1. The Validity of Regression Estimates of Channeling Impacts for Binary and Truncated Dependent Variables
Estimates that are unbiased are known to be accurate on average; however, we also want impact estimates that in any particular instance are unlikely to deviate greatly from true impacts. The smaller the variance of the estimates, the narrower the confidence intervals around the estimates and the lower the probability of failing to detect important channeling impacts. However, the requirement for regression estimates to have minimum variance--homoscedasticity of the disturbance terms--will not be met for many of the dependent variables examined in the channeling evaluation because they are binary (e.g., whether admitted to a nursing home) or bounded at zero (e.g., number of days spent in the hospital). Furthermore, if the disturbance term is not homoscedastic, the test statistics calculated by the regression program will not be strictly correct. Finally, the predicted value for some observations may be less than zero when regression is used for binary or bounded dependent variables, which is obviously inappropriate. (Predicted values may also be greater than one, which is equally inappropriate for binary variables.)
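The heteroscedasticity noted above for binary outcomes follows from a standard result, sketched here as an illustration rather than taken from the report. Write p(X) for the probability that Y = 1 given the explanatory variables (notation assumed for this sketch), so that the regression disturbance is e = Y - p(X):

```latex
\begin{aligned}
e &= \begin{cases}
  1 - p(X) & \text{with probability } p(X) \\
  -p(X)    & \text{with probability } 1 - p(X)
\end{cases} \\[4pt]
\operatorname{Var}(e \mid X)
  &= p(X)\,[1 - p(X)]^{2} + [1 - p(X)]\,p(X)^{2}
   = p(X)\,[1 - p(X)]
\end{aligned}
```

Because this variance changes with p(X), and hence with X, the disturbance cannot be homoscedastic unless the probability is the same for every sample member.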
For cases such as these, econometric procedures have been developed to provide estimates with desirable properties (under certain assumptions). Probit and logit models are the estimation procedures most widely used for binary dependent variables, and Tobit analysis is used by economists for bounded variables. (See Maddala, 1983, for a discussion of these procedures, their statistical properties, and the assumptions on which they are based.) In practice, however, these more complex and expensive estimation procedures typically provide estimates of the effects of explanatory variables on dependent variables that closely resemble in size and significance the estimated effects obtained from least squares regression. This result has been demonstrated in several previous applied studies (Corson et al., 1985; Grossman et al., 1986; Hollister et al., 1985; and others) as well as in the recent econometric literature (Greene, 1981, 1983). Furthermore, all of the statistical properties of the probit and Tobit estimators, including unbiasedness, depend on the assumption that the disturbance term is normally distributed, a condition not required by regression.
The much greater ease with which statistical tests can be performed with least squares regression and the much lower computational cost compared to probit, logit, and Tobit (which require iterative maximum likelihood estimation) led us to strongly prefer least squares as an estimation strategy. However, to ensure that computational ease and cost savings were not achieved at the cost of seriously distorted impact estimates or test statistics, we compared estimates of channeling impacts obtained from regression to estimates obtained from the more complex procedures, using key outcome variables that were binary or truncated at zero.^{32}
Comparison of the probit model estimates to least squares estimates for binary dependent variables.^{33} The probit model is based on the assumptions that individuals will take a given action (e.g., enter a nursing home) when a certain unobserved threshold is reached, that this threshold is determined by observed and unobserved factors, and that the threshold differs across individuals. Consider, for example, the decision to enter a nursing home. The probit model for this outcome is written as:
$$
Y^{*} = a_{0} + a_{B}T_{B} + a_{F}T_{F} + a_{S}S + a_{X}X - e
$$

$$
Y = \begin{cases} 1 & \text{if } Y^{*} > 0 \\ 0 & \text{if } Y^{*} \le 0 \end{cases}
$$
where Y* is the unobserved indicator of the propensity to enter a nursing home, which depends on the set of variables specified as explanatory variables in the standard regression equation given in Chapter III. The disturbance term e is the unobserved individual-specific threshold, for example, the individual's unwillingness to enter nursing homes.^{34} Sample members whose unmet need for services is so great that it outweighs their distaste for nursing homes are assumed to enter such institutions (given the availability of beds). The observed binary dependent variable (Y) is equal to 1 for those who enter nursing homes and 0 for those who do not. The parameters of this probit model (the a_{i}’s) are estimated by maximum likelihood, i.e., by choosing the values that maximize the product of predicted probabilities of entering a nursing home (for actual entrants) or not entering (for nonentrants). Predicted probabilities from this model will always be between zero and one, and if the assumed model is correct, the resulting estimates have the minimum variance possible. The estimated impacts of channeling are obtained by computing the predicted probability of entering for a treatment group member, with all of the other characteristics X set at the sample mean, and subtracting the predicted probability for controls computed at the same values of X.
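The impact calculation described above can be sketched as follows. This is a minimal illustration, not the evaluation's code: it assumes the disturbance e is standard normal (the usual probit normalization), so that the probability of entry is the normal CDF of the index, and every coefficient and mean in it is hypothetical.

```python
from math import erf, sqrt


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def probit_impact(a0, a_treat, a_x, x_mean):
    """Treatment-minus-control difference in predicted probability.

    The control variables are held at their sample means for both
    groups, so only the treatment coefficient shifts the index.
    """
    index_control = a0 + sum(a * x for a, x in zip(a_x, x_mean))
    index_treat = index_control + a_treat
    return norm_cdf(index_treat) - norm_cdf(index_control)


# Hypothetical coefficients and means, NOT estimates from the evaluation.
impact = probit_impact(a0=-1.2, a_treat=-0.10, a_x=[0.5, 0.02], x_mean=[0.4, 10.0])
print(f"Estimated impact: {100 * impact:.2f} percentage points")
```

Because the normal CDF is nonlinear, the same treatment coefficient implies a different change in probability at different values of X, which is why the evaluation point (here the sample means) has to be stated.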
TABLE IV.2. Impact Estimates from Least Squares Regression and from Probit for Selected Binary Outcome Measures (In percentage points; t-statistics in parentheses)

| Outcome | Basic Model: Regression | Basic Model: Probit^{a} | Financial Control: Regression | Financial Control: Probit^{a} | Sample Size |
|---|---|---|---|---|---|
| Whether Received Any Formal Care - 6 months | 6.96** (3.49) | 7.35** (3.49) | 16.31** (8.09) | 17.23** (8.12) | 4,974 |
| Whether Had Any Visiting Informal Caregiver - 6 months | -2.33 (-1.22) | -2.34 (-1.18) | -2.57 (-1.33) | -2.77 (-1.33) | 4,899 |
| Whether Received Any Informal Care - 6 months | -2.97 (-1.50) | -3.12 (-1.44) | -2.64 (-1.32) | -2.92 (-1.38) | 4,899 |
| Whether Received Comprehensive Case Management - months 1-6 | 51.17** (26.33) | 52.67** (26.44) | 56.34** (28.93) | 58.35** (29.36) | 3,955 |
| Whether Admitted to Hospital - months 1-6 | -2.80 (-1.44) | -2.93 (-1.47) | 2.04 (1.04) | 2.12 (1.07) | 5,554 |
| Whether Admitted to Hospital - months 7-12 | -0.36 (-0.20) | -0.43 (-0.23) | 0.37 (0.20) | 0.48 (0.26) | 5,554 |
| Whether Admitted to Nursing Home - months 1-6 | -0.52 (-0.37) | -0.20 (-0.15) | -0.37 (-0.27) | -0.16 (-0.12) | 4,593 |
| Whether Admitted to Nursing Home - months 7-12 | -2.23 (-1.88) | -2.22 (-1.93) | 0.29 (0.25) | 0.40 (0.36) | 4,752 |

NOTE: Regression estimates and sample sizes do not in all cases correspond exactly with those presented in final channeling reports, because some changes may have taken place between the time that this analysis was conducted and the final analyses were completed.
** Significantly different from zero at the .01 level (2-tailed test).
The least squares and probit estimates of channeling impacts on a set of key binary outcome variables are compared in Table IV.2. The impact estimates and t-statistics were very similar for all six of the variables examined, for both models. For no outcome was there a change in the statistical significance when probit was used. Even estimates that were statistically insignificant exhibited only small changes in magnitude.
Comparison of Tobit estimates to least squares regression estimates. When the dependent variable is truncated at zero but not binary, such as nursing home expenditures or days, regression estimates lose some of their desirable properties. The Tobit procedure, which is closely related to the probit procedure, was designed to overcome these weaknesses. A Tobit model of the number of days spent in nursing homes, for example, would be written as:
$$
Y^{*} = a_{0} + a_{B}T_{B} + a_{F}T_{F} + a_{S}S + a_{X}X - e
$$

$$
Y = \begin{cases} Y^{*} & \text{if } Y^{*} > 0 \\ 0 & \text{if } Y^{*} \le 0 \end{cases}
$$
where observed nursing home days (Y) is equal to the expression given for Y* for individuals whose need for nursing home care outweighs their unobserved unwillingness to enter nursing homes (e), and equal to zero for others. Again, maximum likelihood methods are used to estimate the coefficients and the standard error of e. The effects of channeling are estimated by computing the expected value of the outcome Y for treatments and for controls, both at the point of means of the other explanatory variables, and taking the difference. (See Moffitt and McDonald, 1980, for the correct expression for obtaining predicted outcomes from Tobit models.)
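The expected-value calculation for the Tobit model can be sketched as follows. This is an illustration under the standard Tobit assumption that e is normal with mean zero and standard deviation sigma, in which case the expected outcome is Phi(mu/sigma) * mu + sigma * phi(mu/sigma), where mu is the index a_0 + ... + a_X X. The function names and all numeric inputs are hypothetical.

```python
from math import erf, exp, pi, sqrt


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def norm_pdf(z: float) -> float:
    """Standard normal density function."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def tobit_expected(mu: float, sigma: float) -> float:
    """E[Y] when Y = max(0, Y*) and Y* is Normal(mu, sigma^2)."""
    z = mu / sigma
    return norm_cdf(z) * mu + sigma * norm_pdf(z)


def tobit_impact(index_control: float, a_treat: float, sigma: float) -> float:
    """Difference in expected outcome, treatment minus control,
    with both indexes evaluated at the same values of the controls."""
    return tobit_expected(index_control + a_treat, sigma) - tobit_expected(index_control, sigma)


# Hypothetical values, NOT estimates from the evaluation.
impact = tobit_impact(index_control=-20.0, a_treat=-5.0, sigma=30.0)
print(f"Estimated impact on expected days: {impact:.2f}")
```

The impact on the observed outcome is smaller in absolute value than the treatment coefficient itself, because the coefficient shifts the latent variable Y* while many sample members remain at zero.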
The regression and Tobit estimates of channeling impacts on a set of key outcome variables that are bounded at zero are contained in Table IV.3. For most of the 24 comparisons, the differences between the two alternative estimates were quite small (though somewhat greater than the differences observed between probit and regression). However, in 3 instances, the differences were fairly large and resulted in a change in the statistical significance of the impact estimates: hours of formal care at 6 and 12 months in the basic model and nursing home expenditures at 6 months in the basic model. The impact of channeling on formal care in the basic model went from essentially zero using the regression model to nearly 1 hour per week at 6 months (about 15 percent of the control group mean) using the Tobit model, with the latter being statistically significant at the .05 level. The same change in statistical significance occurred at 12 months for this outcome in the basic model, although the two estimates were not that different in magnitude. The effect on nursing home expenditures went in the opposite direction. The regression estimate was a reduction of 165 dollars (about 25 percent of the control group mean), which dropped to 47 dollars when Tobit was used.
TABLE IV.3. Impact Estimates from Least Squares Regression and from Tobit for Selected Truncated Outcome Measures (t-statistics in parentheses)

| Outcome | Statistic | Basic Model: Regression | Basic Model: Tobit^{a} | Financial Control: Regression | Financial Control: Tobit^{a} | Sample Size |
|---|---|---|---|---|---|---|
| Hours of Formal Care, 6 Months | impact | 0.14 (0.22) | 0.92* (2.00) | 5.35** (8.15) | 5.09** (9.99) | 4,974 |
| | control mean^{b} | 6.4 | 6.2 | 4.8 | 6.3 | |
| Hours of Formal Care, 12 Months | impact | 1.14 (1.78) | 1.46** (3.38) | 3.58** (5.56) | 3.39** (6.62) | 5,040 |
| | control mean | 5.2 | 4.8 | 4.5 | 6.0 | |
| Hours of Informal Care, 6 Months | impact | -0.98 (-1.29) | -0.74 (-1.42) | -0.31 (-0.41) | -0.59 (-1.02) | 4,899 |
| | control mean | 6.02 | 6.27 | 6.31 | 7.08 | |
| Hours of Informal Care, 12 Months | impact | -0.03 (-0.04) | 0.08 (0.21) | 0.07 (0.12) | -0.29 (-0.62) | 4,998 |
| | control mean | 3.69 | 3.96 | 4.56 | 5.15 | |
| Hospital Days, 6 Months | impact | -0.35 (-0.41) | -0.59 (-0.83) | -0.71 (-0.83) | -0.00 (-0.01) | 5,554 |
| | control mean | 11.5 | 12.8 | 16.2 | 14.3 | |
| Hospital Days, 12 Months | impact | -0.18 (-0.25) | -0.20 (-0.33) | -0.56 (-0.75) | -0.20 (-0.33) | 5,554 |
| | control mean | 7.0 | 8.1 | 9.0 | 8.6 | |
| Nursing Home Days, 6 Months | impact | -2.36 (-1.93) | -0.59 (-0.67) | -1.14 (-0.94) | -0.27 (-0.33) | 4,593 |
| | control mean | 12.2 | 6.4 | 9.6 | 5.6 | |
| Nursing Home Days, 12 Months | impact | -1.19 (-0.63) | -2.56 (-1.59) | -2.19 (-1.15) | -0.02 (-0.02) | 4,752 |
| | control mean | 16.3 | 12.8 | 16.7 | 10.1 | |
| Hospital Expenditures, 6 Months | impact | -119 (-0.45) | -206 (-0.94) | -68 (-0.25) | 89 (0.36) | 5,554 |
| | control mean | 3,412 | 3,869 | 4,899 | 4,643 | |
| Hospital Expenditures, 12 Months | impact | 59 (0.29) | -11 (-0.06) | -161 (-0.79) | -63 (-0.34) | 5,554 |
| | control mean | 2,015 | 2,307 | 2,706 | 2,641 | |
| Nursing Home Expenditures, 6 Months | impact | -165* (-2.15) | -47 (-0.92) | -8 (-0.11) | 6 (0.12) | 4,593 |
| | control mean | 666 | 369 | 560 | 332 | |
| Nursing Home Expenditures, 12 Months | impact | -58 (-0.56) | -120 (-1.42) | -103 (-0.99) | 1 (0.01) | 4,752 |
| | control mean | 819 | 657 | 894 | 546 | |

NOTE: Regression estimates and sample sizes do not in all cases correspond exactly with those presented in final channeling reports, because some changes may have taken place between the time that this analysis was conducted and the final analyses were completed.
* Significantly different from zero at the .05 level.
Despite these differences, it was not clear that the Tobit procedure produced better estimates than regression even for these two outcomes. The predicted nursing home expenditures for controls were far below the actual mean, suggesting that Tobit may not have provided reliable estimates. Furthermore, for both of the variables for which least squares and Tobit produced substantially different estimates, there was evidence that the Tobit estimates reflected the probability of any use of these services more strongly than the extent of use. Both of these problems were due to outliers, cases with extremely large values of the outcome variable, which affect Tobit estimates somewhat differently than least squares estimates. Although less sensitivity to outliers would be a desirable feature, the distorting effects of outliers on Tobit estimates may be even greater than their effects on least squares estimates, especially if there are treatment/control differences in the number of outliers. These potential problems, combined with the greater expense and difficulty of hypothesis testing with the Tobit model, again led us to prefer least squares regression as the estimation procedure, and to analyze the effects of outliers on these estimates directly.
2. The Effects of Outliers on Regression Estimates of Channeling Impacts
The effects of outliers (i.e., extremely large values of the outcome variable that are not simply data errors) on estimates of population means and regression coefficients are well-known, but there is much less documentation about what should be done when confronted by such problems. A common "solution", discarding the outliers, may distort estimates of program impacts more than leaving them in, since one of the effects of the program may be to reduce extreme use of or expenditures on services. This effect would be totally missed if outliers are discarded. However, it may be the case that differences between the two groups in the very small proportion of outliers could arise strictly by chance and affect the estimated treatment/control difference so greatly that it no longer provides a reliable estimate of channeling impacts.
Duan et al. (1983) cite examples of how even unbiased estimates can yield very misleading inferences about program impacts when the outcome variable is zero for a substantial fraction of the sample but has extremely large values for a small fraction of the remaining cases. They then propose an alternative estimator for such situations. This procedure seemed potentially appropriate for the channeling evaluation, since several of the key outcome variables exhibit these characteristics, especially hospital and nursing home days and expenses.
The procedure advocated by Duan et al. is to break such service use variables (measured either in physical units or expenditures) into two separate variables: whether the service is used at all and, for those who use it, the amount used. The expected value of use is the product of the probability of use and the expected amount of use given that some occurred. Thus, a probit model is estimated first for whether any use occurred, as a function of treatment status and other explanatory variables. Then, using only observations that had some service use, a regression model is estimated to predict the amount of use (again dependent on treatment status and control variables), with the amount expressed in logarithmic form to reduce the influence of outliers on the estimates. These two equations are then used to obtain predicted probabilities of use and amounts of use by service users for treatments and for controls with the same characteristics. These estimates in turn are used to compute overall expected use for the treatment and control groups and the difference between them.
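The final step of the 2-part calculation, combining the two components into an overall impact, can be sketched as follows. This is an illustration only: it takes the predicted probabilities and the predicted amounts for users as given, in levels, and does not show the probit and log regression fits or the conversion of log-scale predictions back to levels. The function names are hypothetical; the inputs are the component values reported in Table IV.4 for two 6-month basic model outcomes.

```python
def two_part_expected(p_use: float, amount_given_use: float) -> float:
    """Expected use = probability of any use x expected amount among users."""
    return p_use * amount_given_use


def two_part_impact(p_control: float, p_impact: float,
                    q_control: float, q_impact: float) -> float:
    """Treatment-minus-control difference in overall expected use.

    p_* are the probability-of-use component (control level and impact);
    q_* are the amount-among-users component (control level and impact).
    """
    treatment = two_part_expected(p_control + p_impact, q_control + q_impact)
    control = two_part_expected(p_control, q_control)
    return treatment - control


# Components from Table IV.4, 6-month outcomes, basic model.
hours_impact = two_part_impact(0.400, 0.074, 16.24, 2.82)       # hours of formal care
nh_days_impact = two_part_impact(0.113, -0.004, 81.30, -19.2)   # nursing home days

print(f"Hours of formal care: {hours_impact:.2f}")
print(f"Nursing home days:    {nh_days_impact:.2f}")
```

With these rounded components the products come to about 2.54 hours and -2.42 days, close to the 2-part estimates of 2.50 and -2.42 shown in Table IV.4; the small gap for hours is consistent with rounding in the published components.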
This procedure was used on a set of key hospital and nursing home outcome variables with skewed distributions. Table IV.4 compares the 2-part, least squares, and Tobit estimates of channeling impacts. The 2-part method yielded estimates that differed somewhat from the regression estimates, but not by enough to change the inference about whether channeling affected hospital and nursing home outcomes. The 2-part estimates were also generally closer to the least squares estimates than to the Tobit estimates, especially for the outcomes exhibiting the largest discrepancy between least squares and Tobit.
These results suggested that the more cumbersome 2-part method was not necessary, at least for hospital and nursing home outcomes, where outliers were most likely to occur. However, the results from the Tobit analysis suggested that estimates of channeling impacts on hours of formal care received at 6 months were also affected by outliers. To investigate this, the 2-part method was used for this outcome variable as well. In the financial control model, estimated impacts from least squares and the 2-part method were both large and statistically significant. In the basic model, however, the estimated impact from regression was small (0.14 hours) and not statistically significant, but the 2-part estimate was much larger (2.5 hours), and the impacts on both the probability of receiving care and the amount of care received by service recipients were statistically significant.
The nonsignificant effect on hours was unexpected because other estimates indicated that the basic model led to an increased proportion of sample members receiving any services. Thus, to have no effect on hours, channeling would have had to decrease the average amount of services received by those who would have received some services even in channeling's absence. Further examination of the data showed that the small regression estimate of treatment/control differences was heavily influenced by the receipt of continuous (24 hours per day) formal care by 7 control group members (representing 20 percent of total use by the 1,000 controls in the sample) but only 2 treatment group members. Use of the 2-part method dampened the effect of these outliers on the estimated treatment/control difference, and completely reversed the inference about channeling's effects on the average amount of care received by recipients. The estimate in column 7 of Table IV.4 indicates that treatment group recipients received significantly more hours of care (2.8) than recipients in the control group.
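The leverage of these few cases can be checked with simple arithmetic. The sketch below assumes that hours are measured per week (as in the discussion of Table IV.3), so that continuous care is 168 hours, and it uses the rounded figures from the text (7 outliers, 1,000 controls) together with the basic model control mean of 6.4 hours from Table IV.3. The function name is hypothetical.

```python
HOURS_PER_WEEK_CONTINUOUS = 24 * 7  # assumes a weekly measure: 168 hours


def outlier_contribution_to_mean(n_outliers: int, hours_each: float, group_size: int) -> float:
    """Hours added to the group mean by the outlier cases alone."""
    return n_outliers * hours_each / group_size


# Rounded figures from the text: 7 control outliers among about 1,000 controls.
control_contribution = outlier_contribution_to_mean(7, HOURS_PER_WEEK_CONTINUOUS, 1000)

# Share of total control group use, using the 6.4-hour control mean from Table IV.3.
control_share = control_contribution / 6.4

print(f"Outliers add {control_contribution:.2f} hours per week to the control mean")
print(f"Approximate share of total control group use: {control_share:.0%}")
```

Under these assumptions the 7 cases add roughly 1.2 hours per week to the control group mean, about a fifth of total use and far larger than the 0.14-hour regression estimate, which shows how a chance imbalance in a handful of cases can dominate the treatment/control difference.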
TABLE IV.4. Comparison of Least Squares, Tobit, and 2-Part Estimates of Channeling Impacts for Skewed Outcome Variables

The first three data columns are the alternative estimates of impacts; the last four data columns before the sample size are the components of the 2-part method estimate.

| Outcome | Tobit | Least Squares | 2-Part Method^{s} | Control Group Mean | Probability of Use: Impact | Probability of Use: Control Mean | Quantity for Users: Impact | Quantity for Users: Control Mean | Sample Size |
|---|---|---|---|---|---|---|---|---|---|
| 6 Month Outcomes | | | | | | | | | |
| Hospital Days, Basic | -0.59 | -0.35 | -0.74 | 11.5 | -0.024 | 0.539 | -0.4 | 22.19 | 5,554 |
| Hospital Days, Financial Control | 0.00 | -0.71 | -0.77 | 16.2 | 0.018 | 0.546 | -2.3 | 29.03 | |
| Hospital Expenditures, Basic | -206 | -119 | -227 | $3,412 | -0.024 | 0.539 | -131 | 6,632 | 5,554 |
| Hospital Expenditures, Financial Control | 89 | -68 | -178 | $4,889 | 0.018 | 0.546 | -596 | 8,813 | |
| Nursing Home Days, Basic | -0.59 | -2.36 | -2.42 | 12.2 | -0.004 | 0.113 | -19.2* | 81.30 | 4,593 |
| Nursing Home Days, Financial Control | -0.27 | -1.14 | -0.08 | 9.6 | 0.001 | 0.107 | -1.4 | 68.37 | |
| Nursing Home Expenditures, Basic | -47 | -165* | -131 | $666 | -0.004 | 0.113 | -1,035 | 4,521 | 4,593 |
| Nursing Home Expenditures, Financial Control | 6 | -8 | -30 | $560 | -0.001 | 0.107 | -320 | 4,158 | |
| Hours of Formal Care, Basic | 0.92* | 0.14 | 2.50* | 6.50 | 0.074** | 0.400 | 2.82* | 16.24 | 4,974 |
| Hours of Formal Care, Financial Control | 5.09** | 5.35** | 8.41** | 5.02 | 0.172** | 0.474 | 10.20** | 10.60 | |
| 12 Month Outcomes | | | | | | | | | |
| Hospital Days, Basic | -0.20 | -0.18 | 0.40 | 7.0 | -0.005 | 0.339 | 1.5 | 21.06 | 5,554 |
| Hospital Days, Financial Control | -0.20 | -0.56 | -0.44 | 9.0 | -0.0003 | 0.350 | -1.2 | 25.17 | |
| Hospital Expenditures, Basic | -11 | 59 | 139 | $2,015 | -0.005 | 0.339 | 506 | 6,079 | 5,554 |
| Hospital Expenditures, Financial Control | -63 | -161 | -132 | $2,706 | -0.0003 | 0.350 | -370 | 7,597 | |
| Nursing Home Days, Basic | -2.56 | -1.19 | -0.78 | 16.3 | -0.025 | 0.129 | 19.3 | 111.41 | 4,752 |
| Nursing Home Days, Financial Control | -0.02 | -2.19 | -2.43 | 16.7 | 0.004 | 0.103 | -27.5 | 128.66 | |
| Nursing Home Expenditures, Basic | -120 | -58 | -4 | $819 | -0.025 | 0.129 | 1,345 | 5,757 | 4,752 |
| Nursing Home Expenditures, Financial Control | 1 | -103 | -124 | $894 | 0.004 | 0.103 | -1,420 | 6,910 | |

* Significantly different from zero at the .05 level (2-tailed test).
Given the similarity of the 2-part estimates to the ordinary least squares regression estimates for nursing home and hospital days and expenditures, the final reports on these outcomes relied upon the ordinary regression results. This was done because the standard errors of impacts from the 2-part method are more cumbersome to calculate, and multivariate tests would be especially difficult to conduct. Even for hours of formal care, we chose in the final reports to rely on least squares estimates (computed both with and without the outliers), despite the fact that the 2-part method did yield estimates that were less sensitive to outliers than the ordinary least squares estimates. The reason for this decision was that if channeling did in fact reduce the service use of a small number of cases who would otherwise have used large amounts of services, the savings from such effects could be very substantial. The two-part method may understate the importance of such cases.
The 2-part method therefore may never give the most appropriate estimates. If important channeling effects occur for outliers, the two-part method may mask them. On the other hand, if treatment/control differences in outliers were due strictly to chance, the optimal approach is to drop them, rather than to just reduce their influence. Thus, throughout the evaluation, least squares regression was used to estimate channeling impacts. As shown in Table IV.4, this yields the same inferences about impacts on hospital and nursing home outcomes as the 2-part method. For formal care at 6 months, impacts were estimated in the final report with outliers included and then with them excluded. Evidence was presented indicating which estimates provided the most accurate indication of channeling impacts. (See Corson et al., 1986 for further discussion of those results.) No other outcome measures appeared to have skewed distributions; hence, no other analyses of the effects of outliers were conducted.