Regression analysis was used throughout the evaluation of channeling impacts to eliminate potential bias in impact estimates that could arise due to treatment/control differences on observed characteristics at baseline, to control for the different distribution of treatment and control groups across sites, and to provide more efficient impact estimates than a simple treatment/control group comparison of means would yield. The regression model used to estimate channeling's impacts can be described as follows. Let Y_{1} be the outcome of interest, such as number of hospital days or number of nursing home days. Define T_{B} = 1 if the sample member belongs to the treatment group in the basic case management model, and T_{B} = 0 if he or she does not. Similarly, define T_{F} = 1 if the sample member belongs to the treatment group in the financial control model, and T_{F} = 0 otherwise. Finally, define a set of auxiliary control variables (X_{1}) such as site, sex, race, income, and impairment in functioning. These variables are included in the outcome equation to control for preexisting differences between sample members on characteristics that affect the value of the outcome. The model is then
(1)  Y_{1}  =  a_{B}T_{B} + a_{F}T_{F} + X_{1}b_{1} + u_{1}

=  Zb + u_{1}
where a_{B} and a_{F} are the impacts of channeling on the outcome, Y_{1}, for the basic and financial control models, respectively; b_{1} is a vector of coefficients on the auxiliary variables; and u_{1} is the disturbance term capturing all of the unobserved factors which influence Y_{1}. To facilitate the exposition below, this equation is rewritten in terms of Z and b, where Z is a vector that contains variables T_{B}, T_{F}, and X_{1}, and b is the true, unobserved value of the regression parameters (a_{B}, a_{F}, and b_{1}) in equation (1). In the absence of sample attrition, if the random assignment to treatment or control groups was performed correctly and the usual assumptions of least-squares regression are satisfied,^{16} then regression estimates of a_{B} and a_{F} are unbiased estimates of the impacts of channeling on outcome Y_{1} for the basic case management and the financial control models, respectively.
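As an illustration of equation (1), the following minimal sketch fits the model by least squares on simulated data with no attrition. Everything in it is an assumption made for the example: the sample design (half of the sample in basic-model sites, half in financial-control sites), the single auxiliary control `impairment`, and the "true" parameter values. It is not the evaluation's data or code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical design: each sample member lives in either a basic-model site
# or a financial-control-model site and is randomly assigned to treatment.
basic_site = rng.integers(0, 2, n)      # 1 = basic case management site
treated = rng.integers(0, 2, n)         # 1 = treatment group
T_B = treated * basic_site              # treatment dummy, basic model
T_F = treated * (1 - basic_site)        # treatment dummy, financial control model
impairment = rng.normal(size=n)         # one auxiliary control variable in X_1

# Assumed "true" parameters, chosen only for this illustration.
a_B, a_F = -1.0, -2.0
Y1 = (10.0 + a_B * T_B + a_F * T_F + 0.5 * basic_site
      + 1.5 * impairment + rng.normal(size=n))

# Z stacks the treatment dummies and X_1 (intercept, site type, impairment).
Z = np.column_stack([T_B, T_F, np.ones(n), basic_site, impairment])
b_hat = np.linalg.lstsq(Z, Y1, rcond=None)[0]
a_B_hat, a_F_hat = b_hat[0], b_hat[1]

print(f"estimated a_B = {a_B_hat:.3f} (assumed true value {a_B})")
print(f"estimated a_F = {a_F_hat:.3f} (assumed true value {a_F})")
```

With random assignment and the full sample available, both estimates should land close to the assumed values, which is the unbiasedness property described above.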
As noted above, however, we are not able to estimate this model on the full sample because of attrition. To the extent that the included auxiliary control variables (X_{1}) account fully for the effect of any differences between responders and nonresponders on the outcome variable (Y_{1}), the estimated coefficients in equation (1), including a_{B} and a_{F}, remain unbiased. However, if there are unmeasured characteristics that affect both the probability of attrition and the outcome of interest, the estimated coefficients in equation (1) will in general be biased.
The following exposition describes the mechanism by which this bias occurs. Suppose that the attrition process can be described by the equations:
(2)  Y_{2}*  =  X_{2}b_{2} + u_{2}, and

(3)  Y_{2}  =  1 if Y_{2}* > 0 (in the analysis sample)

=  0 if Y_{2}* ≤ 0 (lost from sample due to attrition).
The dependent variable in equation (2) is an unobserved continuous variable, Y_{2}*, representing the sample member's propensity to respond to the interviews (or otherwise be included in a particular analysis sample). Each sample member has his or her own tendency to cooperate with the research and refuses when the perceived effort to respond exceeds a tolerance. This tendency to respond is not observed directly, but individuals with values exceeding a constant (assumed, without loss of generality, to be zero) are observed to respond (Y_{2} = 1), while those with values less than or equal to zero are nonresponders (Y_{2} = 0). Propensity to respond is assumed to be a function of observable characteristics, X_{2} (which includes treatment status and may include other variables also included in X_{1}), as well as unobservable characteristics and circumstances, represented by the disturbance term u_{2}, assumed to follow a standard normal distribution.^{17}
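The attrition process in equations (2) and (3) can be sketched in a few lines. Because u_{2} is assumed to be standard normal, the probability of responding is the standard normal distribution function evaluated at X_{2}b_{2}. The coefficient values and the helper names `response_probability` and `responds` are hypothetical, introduced only for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # u_2 is assumed to be standard normal


def response_probability(x2, b2):
    """P(Y_2 = 1) = P(u_2 > -X_2 b_2) = F(X_2 b_2) under standard normality."""
    index = sum(x * b for x, b in zip(x2, b2))
    return std_normal.cdf(index)


def responds(x2, b2, u2):
    """Equation (3): in the analysis sample only when the latent Y_2* exceeds zero."""
    y2_star = sum(x * b for x, b in zip(x2, b2)) + u2
    return 1 if y2_star > 0 else 0


# Hypothetical coefficients: intercept, treatment status, baseline impairment.
b2 = [0.2, 0.5, -0.3]
p_control = response_probability([1, 0, 1.0], b2)
p_treated = response_probability([1, 1, 1.0], b2)
print(f"P(respond | control) = {p_control:.3f}")
print(f"P(respond | treated) = {p_treated:.3f}")
```

In this sketch a positive coefficient on treatment status raises the response probability, and the same person can be a responder or a nonresponder depending on the unobserved draw u_{2}.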
Bias arises in the estimates of a_{B} and a_{F} if the unobserved factors affecting attrition (u_{2}) are correlated with the unobserved factors (u_{1}) that affect the outcome measure (Y_{1}). This can be seen by examining the general expression for the vector of regression coefficients for equation (1), which we will refer to as b~:
(4)  b~  =  (Z'Z)^{-1}Z'Y_{1}

=  b + (Z'Z)^{-1}Z'u_{1}.
Without sample attrition, the expected value of the estimated regression coefficients is the true value of the parameters (b), because the last term in the expression above has an expected value of zero. With attrition, however, the expected value of b~, given that it is estimated only on observations in the analysis sample, is:
(5)  E(b~ | in the analysis sample)

=  b + E[(Z'Z)^{-1}Z'u_{1} | Y_{2} = 1]
=  b + E[(Z'Z)^{-1}Z'u_{1} | Y_{2}* > 0]
=  b + E[(Z'Z)^{-1}Z'u_{1} | u_{2} > -X_{2}b_{2}]
=  b + (Z'Z)^{-1}Z'E[u_{1} | u_{2} > -X_{2}b_{2}].
If u_{1} and u_{2} are correlated (i.e., if there are unobserved factors that affect both Y_{1} and the probability of attrition), the expected value of the final expression in square brackets will not be zero, and therefore the expected value of the regression estimates of the parameters of equation (1), including the expected value of the estimates of a_{B} and a_{F}, will not be equal to the true values of these parameters. Thus, the estimates are biased by sample attrition, and the size and direction of the bias are unknown.^{18}
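A small simulation can make this mechanism concrete. The sketch below is a simplified illustration, not the evaluation's model: it uses a single treatment dummy `T` in place of T_{B} and T_{F}, no auxiliary controls, and assumed values for the true impact, the correlation between u_{1} and u_{2}, and the attrition coefficients. It compares the least squares estimate on the full sample with the estimate on responders only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
a_true = -1.0   # assumed true impact (for example, fewer hospital days)
rho = -0.6      # assumed correlation between u_1 and u_2

T = rng.integers(0, 2, n).astype(float)
u2 = rng.normal(size=n)
# Construct u_1 with unit variance and correlation rho with u_2.
u1 = rho * u2 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)

Y1 = 5.0 + a_true * T + u1
# Attrition process: treatment status raises the propensity to respond.
in_sample = (0.0 + 1.0 * T + u2) > 0


def ols(Z, y):
    """Least squares coefficients (Z'Z)^{-1} Z'y."""
    return np.linalg.lstsq(Z, y, rcond=None)[0]


Z = np.column_stack([T, np.ones(n)])
a_full = ols(Z, Y1)[0]                            # no attrition
a_resp = ols(Z[in_sample], Y1[in_sample])[0]      # responders only

print(f"full sample estimate:     {a_full:.3f}")
print(f"responders-only estimate: {a_resp:.3f}")
print(f"approximate bias:         {a_resp - a_true:.3f}")
```

Under these assumed values the full-sample estimate should sit near the true impact, while the responders-only estimate should be shifted upward. Setting `rho = 0` in the sketch should remove the shift, since the unobserved factors in the two equations are then unrelated.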
The nature of this bias and a procedure for correcting it were expounded by Heckman (1976, 1979). Heckman showed that the bias due to sample attrition is analogous to the bias due to omitting an important explanatory variable. That is, we have
(6)  E(Y_{1} | Y_{2}* > 0)  =  Zb + E(u_{1} | Y_{2}* > 0)

=  Zb + E(u_{1} | u_{2} > -X_{2}b_{2}).
As noted above, one of the assumptions of least squares regression is that the expected value of u_{1} is zero, so estimates of b will be unbiased. However, when sample attrition exists, the regression can be estimated only on those sample members with complete data, so unbiasedness of the resulting estimates requires that the expected value of u_{1}, conditional upon the sample members' availability for analysis, be equal to zero. If u_{1} and u_{2} are correlated, however, this conditional expectation of u_{1} is not zero but is a function of u_{2} and X_{2}. In this case, if Y_{1} is regressed on Z, and there is correlation between the variables in Z and those in X_{2}, regression estimates of b will be biased because an "omitted" term (the nonzero conditional expected value of u_{1}) is correlated with the regressors Z. The estimated coefficients on the variables in Z, including those on treatment status, will reflect not only the effect of Z on Y_{1}, but also the relationship between Z and the conditional expectation of u_{1}.
In this evaluation attrition could lead to bias in estimates of channeling impacts because those conditions that lead to bias may well be present. For example, suppose that the sample members who are the most impaired at followup are least likely to respond and also likely to have systematically higher (or lower) values of Y_{1} (e.g., hospital days). Since the auxiliary control variables measured at screen or baseline do not fully reflect impairment levels at the time of followup, u_{1} and u_{2} will be correlated. Furthermore, many of the variables Z and X_{2} that affect the outcome and the likelihood of attrition, respectively, are likely to be the same or to be highly correlated (e.g., both the outcome and likelihood of attrition may be affected by treatment/control status). Thus, there is a strong possibility that the two conditions that together produce biased estimates of regression parameters may be present and, therefore, that estimates of channeling impacts will be biased by attrition.
Fortunately, with an additional assumption, a statistical correction for attrition bias is possible. Heckman showed that although the second term on the right-hand side of equation (6) is unobserved, the term has a relatively simple form if u_{1} and u_{2} are assumed to have a bivariate normal distribution, and this term can be estimated. Heckman shows that
(7)  E(u_{1} | u_{2} > -X_{2}b_{2})  =  (σ_{12} / σ_{2}) [f(X_{2}b_{2} / σ_{2}) / F(X_{2}b_{2} / σ_{2})]

=  (σ_{12} / σ_{2})M
where σ_{12} is the covariance of u_{1} and u_{2}, σ_{2} is the standard deviation of u_{2}, b_{2} is the vector of coefficients from the attrition equation, f(X_{2}b_{2} / σ_{2}) is the standard normal density function evaluated at X_{2}b_{2} / σ_{2}, and F(X_{2}b_{2} / σ_{2}) is the standard normal distribution function evaluated at the same point. If the parameters b_{2} of the attrition equation were known, the term M could be constructed for each sample member and used as an additional variable in the regression model. Inclusion of this variable in this regression eliminates it from the error term and therefore eliminates the correlation between Z and the error term in equation (6), thereby eliminating the (asymptotic) attrition bias in estimates of b. The regression coefficient obtained on this M term is an estimate of σ_{12} / σ_{2}, the (normalized) covariance between u_{1} and u_{2}.
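The correction term M is straightforward to compute once the index X_{2}b_{2} is in hand. The sketch below defines a hypothetical helper, `correction_term`, and then checks equation (7) by simulation under assumed values (unit variances, so that σ_{12} equals the correlation, and an index of 0.5). It illustrates the formula only and is not the report's code.

```python
import random
from statistics import NormalDist

_std = NormalDist()


def correction_term(index, sigma2=1.0):
    """M = f(X_2 b_2 / sigma_2) / F(X_2 b_2 / sigma_2), as in equation (7)."""
    z = index / sigma2
    return _std.pdf(z) / _std.cdf(z)


# M falls as the response index rises: likely responders need little correction.
for idx in (-1.0, 0.0, 1.0, 2.0):
    print(f"index = {idx:+.1f}  M = {correction_term(idx):.4f}")

# Monte Carlo check of E(u_1 | u_2 > -X_2 b_2) = (sigma_12 / sigma_2) M,
# with assumed values sigma_12 = -0.6, sigma_2 = 1, and X_2 b_2 = 0.5.
random.seed(0)
sigma12, index = -0.6, 0.5
kept = []
for _ in range(200_000):
    u2 = random.gauss(0.0, 1.0)
    u1 = sigma12 * u2 + (1.0 - sigma12**2) ** 0.5 * random.gauss(0.0, 1.0)
    if u2 > -index:          # sample member is a responder
        kept.append(u1)

empirical = sum(kept) / len(kept)
predicted = sigma12 * correction_term(index)
print(f"simulated mean of u_1 among responders: {empirical:.4f}")
print(f"value implied by equation (7):          {predicted:.4f}")
```

The two printed values should agree closely, which is the sense in which the conditional expectation of u_{1} acts like an omitted regressor with a known functional form.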
The parameters b_{2} are not known, but can be readily estimated. Thus, the procedure developed by Heckman and used in this report to eliminate attrition bias can be described as follows:

Using all observations (both responders and nonresponders), estimate the parameters of the attrition model given in equations (2) and (3) using maximum likelihood probit.^{19}

From the estimated probit coefficients (b_{2}) and the data on X_{2}, form the correction term (M) for the observations which have valid data for the outcome regression (this excludes those lost due to attrition) and estimate equation (8) by least squares:^{20}
(8)  Y_{1}  =  a_{B}T_{B} + a_{F}T_{F} + X_{1}b_{1} + cM + u_{1}*,

where this equation is simply equation (1) with the nonzero conditional expectation (cM) of the old disturbance term plus a new disturbance term (u_{1}*) substituted for the old disturbance term (u_{1}). The statistical significance of c, the coefficient on M, is an indication of whether there are unobserved factors affecting both attrition and Y_{1}, a necessary condition for the estimates of a_{B} and a_{F} to be biased.
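The two steps can be sketched end to end on simulated data. This is a minimal illustration under stated assumptions, not the report's implementation: it uses one treatment dummy `T` instead of T_{B} and T_{F}, one baseline control `x`, and a hypothetical variable `w` that affects response but not the outcome (included so that M is not nearly collinear with the outcome regressors). The probit is fitted with a hand-written Fisher scoring routine, `fit_probit`, in place of a statistical package, and all parameter values are invented.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(z):
    """Standard normal distribution function F."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density function f."""
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)


def fit_probit(X, y, iterations=25):
    """Maximum likelihood probit estimated by Fisher scoring."""
    b = np.zeros(X.shape[1])
    for _ in range(iterations):
        idx = X @ b
        F = np.clip(norm_cdf(idx), 1e-10, 1.0 - 1e-10)
        f = norm_pdf(idx)
        score = X.T @ (f * (y - F) / (F * (1.0 - F)))
        info = (X * (f**2 / (F * (1.0 - F)))[:, None]).T @ X
        step = np.linalg.solve(info, score)
        b = b + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return b


# Simulated data under assumed parameter values.
rng = np.random.default_rng(2)
n = 40_000
a_true, rho = -1.0, -0.6
T = rng.integers(0, 2, n).astype(float)
x = rng.normal(size=n)   # baseline control, in both X_1 and X_2
w = rng.normal(size=n)   # hypothetical variable affecting response only
u2 = rng.normal(size=n)
u1 = rho * u2 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
Y1 = 2.0 + a_true * T + 0.5 * x + u1
Y2 = ((0.8 * T + 0.5 * x + 1.0 * w + u2) > 0).astype(float)

# Step 1: probit on all observations (responders and nonresponders).
X2 = np.column_stack([np.ones(n), T, x, w])
b2_hat = fit_probit(X2, Y2)

# Step 2: form M for responders and add it to the outcome regression.
keep = Y2 == 1
idx = X2[keep] @ b2_hat
M = norm_pdf(idx) / norm_cdf(idx)
Z = np.column_stack([T[keep], np.ones(int(keep.sum())), x[keep]])
uncorrected = np.linalg.lstsq(Z, Y1[keep], rcond=None)[0]
corrected = np.linalg.lstsq(np.column_stack([Z, M]), Y1[keep], rcond=None)[0]
a_uncorrected, a_corrected, c_hat = uncorrected[0], corrected[0], corrected[-1]

print(f"uncorrected impact estimate: {a_uncorrected:.3f}")
print(f"corrected impact estimate:   {a_corrected:.3f}")
print(f"estimated c:                 {c_hat:.3f}")
```

Note that the least squares standard errors from the second step are not valid as they stand, because M is itself estimated and the new disturbance term is heteroskedastic. The sketch reports point estimates only.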
In the discussion of results in the next section, we assess the extent of attrition bias in estimates of channeling impacts in two ways: first, by examining the estimate of c, to determine whether the condition necessary for bias is met, and if so, the size and sign of the correlation between the two disturbance terms; and second, by comparing the regression estimates of a_{B} and a_{F} obtained when potential attrition bias is not controlled for (i.e., from estimating equation (1)) to the impact estimates obtained when this potential bias is controlled for (by estimating equation (8)). In interpreting these results, it is useful to bear in mind the determinants of the bias in a particular coefficient. Inserting the expression in equation (7) into equation (5), the bias in the uncorrected estimates of a_{B}, a_{F}, and b_{1} is shown to be
(9)  bias ≡ E(b~) - b  =  (σ_{12} / σ_{2}) (Z'Z)^{-1}Z'M

=  (σ_{12} / σ_{2}) P_{Z,M},
where the term P_{Z,M} is a vector of auxiliary regression coefficients obtained from regressing the constructed M term on the other variables (Z's) in equation (8).^{21} Thus, the bias in the regression coefficient on any particular explanatory variable (e.g., a_{B}, the coefficient on T_{B} in the outcome equation) is equal to the covariance between u_{1} and u_{2} (normalized by σ_{2}), multiplied by the coefficient on this same variable (e.g., T_{B}) from a second, auxiliary regression of the constructed M variable on all of the Z variables.
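The sample analogue of equation (9) is the familiar omitted-variable identity: the coefficients from the regression without M differ from those with M by exactly the estimated coefficient on M times the auxiliary regression coefficients. The sketch below checks this numerically. The data are simulated directly from an equation of the form (8) with assumed values (c = -0.6, a single treatment dummy `T`, one control `x`, and a hypothetical response-only variable `w`), and M is built from an assumed, known attrition index rather than an estimated one.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
T = rng.integers(0, 2, n).astype(float)
x = rng.normal(size=n)
w = rng.normal(size=n)   # hypothetical variable that enters only the attrition index
Z = np.column_stack([T, np.ones(n), x])

# Constructed term M = f(index) / F(index) at an assumed attrition index.
index = 0.2 + 0.8 * T + 0.5 * x + 1.0 * w
cdf = 0.5 * (1.0 + np.vectorize(math.erf)(index / math.sqrt(2.0)))
pdf = np.exp(-0.5 * index**2) / math.sqrt(2.0 * math.pi)
M = pdf / cdf

# Outcome generated with an assumed coefficient c = -0.6 on M.
Y1 = 2.0 - 1.0 * T + 0.5 * x - 0.6 * M + rng.normal(size=n)


def ols(A, y):
    """Least squares coefficients (A'A)^{-1} A'y."""
    return np.linalg.lstsq(A, y, rcond=None)[0]


b_without_M = ols(Z, Y1)                        # equation (1): M omitted
b_with_M = ols(np.column_stack([Z, M]), Y1)     # equation (8): M included
c_hat = b_with_M[-1]
P_ZM = ols(Z, M)                                # auxiliary regression of M on Z

bias_from_formula = c_hat * P_ZM
bias_observed = b_without_M - b_with_M[:-1]
print("auxiliary coefficient on T:", round(float(P_ZM[0]), 4))
print("bias implied by formula:   ", np.round(bias_from_formula, 4))
print("observed difference:       ", np.round(bias_observed, 4))
```

Because treatment status raises the response index in this sketch, the auxiliary coefficient on `T` is negative, so a negative c translates into a positive bias in the uncorrected impact estimate.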
The usefulness of this expression is best demonstrated by elaborating on our previous example. Suppose that we are interested in estimating the impacts of channeling on the number of hospital days (using a followup sample). Also, suppose that those who are most impaired at the time of the followup are less likely to be available for analysis than are less impaired sample members and that the effects of this impairment on hospital days are imperfectly controlled for with the baseline control variables. Since the most impaired individuals are most likely to be in a hospital and least likely to be in the analysis sample, the covariance between u_{1} and u_{2} (σ_{12}) will be negative. Furthermore, since treatment group members are more likely to be available for analysis than control group members, it can be shown that the auxiliary regression coefficient of treatment status contained in P_{Z,M} is expected to be negative.^{22} Thus, we would expect the attrition bias in the estimate of a_{B} to be positive. That is, the estimated impact will be a larger number than it should be. Thus, we could find an estimated impact of zero when in fact the impact was negative, implying a reduction in hospital days due to channeling. This analytic assessment of the direction of bias is consistent with the heuristic argument that the sample members most likely to be lost to analysis are control group members with relatively large numbers of hospital days, and if these cases were appropriately represented in the analysis sample, the treatment/control difference in expected hospital days would have been a larger negative number. Based on this reasoning, the following reference table can be used to draw inferences about the expected direction of the bias (if any) due to attrition in estimates of channeling impacts:
| Expected Relationship (σ_{12}) Between Outcome (Y_{1}) and the Likelihood that Sample Member is Available For Analysis | Expected Bias^{1} in Estimated Impacts | Interpretation |
|---|---|---|
| 0 | 0 | Impact estimates unbiased |
| + | - | Impacts understated if channeling is predicted to increase Y_{1} (a_{B}, a_{F} positive); impacts overstated if channeling is predicted to decrease Y_{1} (a_{B}, a_{F} negative)^{2} |
| - | + | Impacts overstated if channeling is predicted to increase Y_{1} (a_{B}, a_{F} positive); impacts understated if channeling is predicted to decrease Y_{1} (a_{B}, a_{F} negative)^{2} |

Using this table for our example, we expect σ_{12} to be negative because those who are most impaired are likely to have more hospital days, but are less likely to respond. Thus, the expected bias in the impact estimate is positive, and since channeling is predicted to reduce hospital days, the estimated reduction in hospital days will be understated if attrition bias is not corrected for.
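The logic of the reference table can also be written as a small lookup, which may help when applying it to several outcomes. The function name `expected_bias` and the coding of signs as -1, 0, and +1 are conventions invented for this sketch. It relies on the same premise as the table: treatment group members are more likely to be available for analysis, so the auxiliary coefficient on treatment status in P_{Z,M} is negative.

```python
def expected_bias(sigma12_sign, predicted_impact_sign):
    """Return (sign of expected bias, interpretation) following the reference table.

    sigma12_sign: -1, 0, or +1, the expected sign of the covariance between
        u_1 and u_2.
    predicted_impact_sign: -1 or +1, the predicted direction of the channeling
        impact on Y_1.
    Assumes the auxiliary coefficient on treatment status is negative.
    """
    if sigma12_sign not in (-1, 0, 1) or predicted_impact_sign not in (-1, 1):
        raise ValueError("signs must be coded as -1, 0, or +1")
    if sigma12_sign == 0:
        return 0, "unbiased"
    # bias = (sigma_12 / sigma_2) * (negative auxiliary coefficient)
    bias_sign = -sigma12_sign
    if bias_sign == predicted_impact_sign:
        return bias_sign, "overstated"
    return bias_sign, "understated"


# Hospital days example: sigma_12 negative, channeling predicted to reduce days.
print(expected_bias(-1, -1))
```

For the hospital days example this returns a positive bias and an understated impact, matching the conclusion drawn from the table.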