Methodological Issues in the Evaluation of the National Long Term Care Demonstration. A. The Regression Model


Regression, or equivalently, analysis of covariance, offers three advantages over simple differences in means as a way of estimating program impacts. First, although the two experimental groups should have very similar average characteristics initially, there may in fact be differences between them, either by chance or because of different patterns of sample attrition for treatment and control groups. If these differences are fully reflected in the observed initial characteristics of the sample, the regression model can control for such differences between the groups. Second, the ratio of treatments to controls differs across sites, ranging from 1:1 to 2:1. If sample members differ across sites, the treatment/control differences in mean outcomes will reflect not only effects of channeling but the different distributions as well. Again, regression will control for these differences.12 Finally, to the extent that outcomes are related to baseline or screen characteristics, regression can explain some of the variation between individuals, leading to more precise estimates of channeling impacts than are obtained from differences in means.13
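The second advantage can be illustrated with a small arithmetic sketch. The two sites, their sample sizes, and their outcome means below are invented for illustration and are not taken from the demonstration; the true impact is set to zero in both sites, so any nonzero difference in pooled means arises solely from the differing treatment/control ratios.

```python
# Hypothetical two-site illustration: channeling has no true impact here.
# Each entry: (n_treatment, n_control, mean_treatment, mean_control).
SITES = {
    "site_with_1_to_1_ratio": (100, 100, 10.0, 10.0),
    "site_with_2_to_1_ratio": (200, 100, 20.0, 20.0),
}

def pooled_mean(n_index, mean_index):
    """Sample-size-weighted mean outcome for one group, pooled over sites."""
    total = sum(s[n_index] * s[mean_index] for s in SITES.values())
    count = sum(s[n_index] for s in SITES.values())
    return total / count

# Simple difference in means, ignoring site.
raw_difference = pooled_mean(0, 2) - pooled_mean(1, 3)

# Treatment/control differences within each site, which is the comparison
# that site indicators in a regression effectively rely on.
within_site = {name: s[2] - s[3] for name, s in SITES.items()}

print(f"raw difference in means: {raw_difference:.3f}")
print(f"within-site differences: {within_site}")
```

In this constructed case the pooled difference in means is about 1.67 even though the difference within each site is zero, because the treatment group is weighted more heavily toward the site with the higher outcome level.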

The regression model used was

 (1)   Y = a_0 + a_B T_B + a_F T_F + a_S S + a_X X + e,

where Y is the outcome variable that is hypothesized to be affected by channeling; T_B and T_F are binary variables equal to one for treatment group members in the basic (B) and financial control (F) sites, respectively; S is a set of binary site variables; X is a set of explanatory variables taken from the screen or baseline interviews; e is a disturbance term; and the a's are coefficients to be estimated. Under this model the coefficients a_B and a_F measure the treatment/control differences in mean outcomes, controlling for any differences that exist between the two groups on baseline explanatory variables. Hence, a_B and a_F are our estimates of channeling impacts.14
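To make equation (1) concrete, the following is a minimal sketch and not the evaluation's estimation program. It assumes that T_B and T_F flag treatment group members in basic and financial control sites, respectively. The four site names, the true impacts, the sample size, and the single covariate x are all hypothetical. Ordinary least squares is computed from the normal equations with a small hand-written solver so that the example needs only the standard library.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) a = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(0)
# Hypothetical sites: (name, model, site effect on the outcome).
SITES = [("basic_1", "B", 0.0), ("basic_2", "B", 1.0),
         ("fin_1", "F", -0.5), ("fin_2", "F", 2.0)]
TRUE_A_B, TRUE_A_F = 2.0, -1.5  # invented "true" impacts

rows, y = [], []
for _ in range(4000):
    idx = rng.randrange(len(SITES))
    _, model, site_effect = SITES[idx]
    treated = rng.random() < 0.5
    x = rng.gauss(0.0, 1.0)  # one baseline covariate standing in for X
    t_b = 1.0 if treated and model == "B" else 0.0
    t_f = 1.0 if treated and model == "F" else 0.0
    # One site is omitted as the reference category to avoid collinearity.
    site_dummies = [1.0 if idx == j else 0.0 for j in range(1, len(SITES))]
    rows.append([1.0, t_b, t_f] + site_dummies + [x])
    noise = rng.gauss(0.0, 1.0)
    y.append(1.0 + TRUE_A_B * t_b + TRUE_A_F * t_f + site_effect + 0.5 * x + noise)

coef = ols(rows, y)
a_b_hat, a_f_hat = coef[1], coef[2]
print(f"estimated a_B = {a_b_hat:.2f}, estimated a_F = {a_f_hat:.2f}")
```

With simulated data of this size the estimates should land close to the invented true impacts; in practice a statistical package would be used and would also report standard errors.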

The same regression model was used to estimate the impacts of channeling on all outcome measures examined in the evaluation. Although it may seem unlikely that the factors which affect well-being (for example) are exactly the same as the factors which affect nursing home use and other outcomes, there exists a strong justification for this approach. The outcome variables are highly interrelated and depend on each other as well as on many of the same exogenous variables. However, the interrelationship of the outcome variables is very complex, and trying to model it could lead to biased estimates, since some of the explanatory variables would be endogenous (i.e., correlated with the disturbance term e in the regression).15 Furthermore, we are interested first and foremost in the total effect of channeling on outcome variables, and are not particularly interested in how much of the impact on well-being (in our example) was indirectly due to channeling's effect on nursing home use and how much was due directly to the case management services provided by channeling. Therefore, we estimate the "reduced form" equation for the outcome variables. In the reduced form all explanatory variables must be exogenous (baseline/screen) variables and any exogenous variable that affects any of the interrelated outcome variables of interest is included. Thus, the explanatory variables in the reduced form include any baseline or screen variables that directly or indirectly affect the particular outcome variable being examined.16

The advantage of this approach is that we need not make arbitrary exclusions of explanatory variables from some outcome equations but not others. Including in the regression explanatory variables that do not really affect a given outcome variable, directly or indirectly, does not bias the estimates of channeling impact, and given the large number of observations available, has no discernible effect on the standard errors of the estimates. On the other hand, excluding from the regression equation explanatory variables that do affect the outcome of interest can lead to biased estimates. Thus, the reduced form approach is more likely to yield unbiased estimates of channeling impacts than would specifications in which the set of control variables assumed to affect a particular outcome variable is arbitrarily restricted. This approach has the added benefit of providing consistency across the many analyses of channeling impacts that were conducted by different individuals at different points in time, and associated economies in estimation through standardizing estimation programs.

The explanatory variables that were used in the regression model fell into six categories:

  • Sample member's absolute level of need for assistance due to physical or mental disabilities
  • The availability of informal caregivers to provide this assistance
  • The amount of formal care received by sample members at baseline
  • The sample member's ability to pay for additional services or nursing home care
  • The availability of nursing home beds and other area-specific factors
  • The sample member's outlook on life and demographic characteristics.

These six categories of characteristics were represented in the regression model by variables obtained from the baseline and screen interviews. Need for assistance was reflected by sample members' impairment on activities of daily living (ADL) tasks (eating, dressing, toileting, mobility, bathing), continence, whether they had a recent change in health condition, whether they were cognitively impaired (i.e., whether they had behavioral problems or were disoriented), the number of unmet needs for assistance that they had and expected to continue for 6 months or more, the number of physician visits in the two months prior to baseline, and whether the individual was referred to channeling by a hospital or nursing home or home health agency.17 We also included variables indicating whether the sample member completed the baseline without help from a proxy, required some help from a proxy, or required a proxy to complete the entire baseline.

The availability of informal care was captured by two variables: sample members' living arrangement and the number of hours of care they were receiving from visiting informal caregivers during a typical week at the time of the baseline. Living arrangement was defined by whether sample members lived alone but were receiving informal care at baseline, lived alone without such care, lived with one of their children, or lived with someone but not with their child.

The receipt of formal care is also represented by two variables: whether such care was received from visiting caregivers, and the number of hours of in-home care received from visiting formal caregivers during a "typical" week at the time of the baseline.

Sample members' outcomes also depended on the availability of hospital and nursing home beds and on other area characteristics such as the availability of formal services and case management, population density, what services the state Medicaid program covered, and any other city, state or regional differences that could affect outcomes. Since the area characteristics faced were the same for all sample members residing in a given site, binary variables indicating in which site the sample member resided were sufficient to capture the effects of any such differences across sites.18 Hence, we included 9 binary site variables in the regression.19
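The site coding can be sketched as follows. Site names and excluded (reference) categories are taken from Table III.1; reading the 9 binary site variables as eight site indicators plus one financial control model indicator is an assumption based on which categories the table shows in parentheses, and the function and variable names are hypothetical.

```python
# Site names and reference categories follow Table III.1. Treating the
# "9 binary site variables" as 8 site indicators + 1 model indicator is
# an assumption, as are all identifiers used here.
BASIC_SITES = ["Baltimore", "Eastern Kentucky", "Houston",
               "Middlesex County", "Southern Maine"]
FINANCIAL_SITES = ["Cleveland", "Greater Lynn", "Miami",
                   "Philadelphia", "Rensselaer County"]
REFERENCE_SITES = {"Southern Maine", "Rensselaer County"}

def site_indicators(site):
    """Return the binary site variables for a sample member's site."""
    all_sites = BASIC_SITES + FINANCIAL_SITES
    if site not in all_sites:
        raise ValueError(f"unknown site: {site}")
    # Model indicator: 1 for financial control sites, 0 for basic sites.
    indicators = {"financial_control": int(site in FINANCIAL_SITES)}
    # One indicator per site, omitting one reference site within each model.
    for name in all_sites:
        if name not in REFERENCE_SITES:
            indicators[name] = int(name == site)
    return indicators

print(site_indicators("Miami"))
```

Under this coding, a member in Southern Maine has all nine indicators equal to zero, and a member in Rensselaer County has only the financial control indicator set, so each of the ten sites receives its own distinct pattern.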

In addition to the amount of services received or available at baseline, another important factor affecting outcomes was the ability to pay for additional services, either in the community or in an institution. To capture these effects we included variables for whether sample members were eligible for Medicaid at baseline or would be eligible within a short period of time after entering a nursing home, based on their current income and assets.20 Whether sample members were homeowners was also included as a measure of their wealth.

The attitudes of elderly individuals are also important in explaining outcomes; hence, we included the baseline measure of sample members' overall satisfaction with life. Variables indicating whether the sample members had already applied for admission to a nursing home or were in a nursing home at the screen were included, because they indicate individuals' predisposition towards institutionalization. Also included was a binary variable indicating whether the sample member had lost a close friend or relative to death within the two months prior to baseline, since major losses are felt by many to have serious effects on elderly individuals' health.

Gender, age, and ethnic background are demographic variables included in virtually every study of the impaired elderly. There may be differences between elderly men and women in ability to care for themselves, in the difficulty caregivers face in caring for them, and in the likelihood that they will have a surviving spouse to care for them. Age is included because individuals' health deteriorates with age. Furthermore, the older a sample member is, the older his or her children and friends are likely to be and the less able to provide informal care. Ethnicity was included to capture any cultural differences in the intergenerational dependency, informal support systems, or attitudes toward nursing homes of the aged.

TABLE III.1. Mean Values of Explanatory Variables Used in Regression Model

Variable   Mean

Need for Assistance
   ADL Impairment (S)
      Extremely Severe   0.233
      Severe   0.348
      Moderate Impairment   0.223
      Mild or No Impairment   0.196
   Incontinence (S)
      Incontinent   0.472
      Needs Help with Colostomy Bag or Other Device   0.102
      (Continent)   0.426
   Cognitive Impairment (S)
      Severe   0.153
      (Moderate Impairment)   0.318
      Mild or No Impairment   0.471
      Missing Data   0.058
   Unmet Needs (S)
      High Unmet Needs   0.303
      (Moderate Unmet Needs)   0.340
      Low Unmet Needs   0.302
      Missing Data   0.054
   Whether Experienced Recent Change in Health (S)   0.818
   Whether Death of Close Friend or Relative Other Than Spouse (S)
      Death of Close Person   0.244
      (No Death)   0.406
      Missing Data   0.350
   Referral Source (S)
      Hospital or Nursing Home   0.297
      Home Health Agency   0.173
      (Other)   0.531
   Number of Physician Visits in Previous Two Months (B)   1.7
   Whether Waitlisted or Applied to Nursing Home, or in Nursing Home at Screen (B)   0.097
   Type of Respondent at Baseline
      Self Respondent   0.417
      (Mixed Proxy/Self Respondent)   0.298
      All Proxy Respondent   0.285

Receipt of Formal Services
   Whether Received Formal In-Home Care (B)   0.600
   Hours of Formal In-Home Care Received per Week (B)   7.3

Model
   (Basic)   0.488
   Financial Control   0.512

Availability of Informal Care
   Living Arrangement (B)
      Lives Alone, No Informal Support   0.073
      Lives Alone, Informal Support   0.282
      Lives with Child   0.251
      (Lives with Someone Other than Child)   0.377
      Missing Information   0.016
   Hours of Care Received per Week from Visiting Informal Caregiver (B)   12.0

Demographic Characteristics and Attitudes
   Whether Male (B)   0.285
   Age (B)   79.6
   Ethnicity (S)
      Black   0.223
      Hispanic   0.037
      (White or Other)   0.740
   Whether Currently Married (B)   0.318
   Overall Satisfaction with Life (B)
      Completely   0.117
      (Somewhat)   0.248
      Not Very   0.288
      Missing Data   0.348

Ability to Pay for Care
   Whether Home Owner (B)   0.421
   Medicaid Coverage (B)
      Currently Eligible   0.226
      Eligible Within 3 Months   0.304
      (Not Eligible in 3 Months)   0.401
      Missing Information   0.069

Site
   Basic
      Baltimore   0.108
      Eastern Kentucky   0.079
      Houston   0.111
      Middlesex County   0.112
      (Southern Maine)   0.079
   Financial Control
      Cleveland   0.093
      Greater Lynn   0.096
      Miami   0.118
      Philadelphia   0.140
      (Rensselaer County)   0.065

NOTE: Means were computed for the Medicare sample, the largest analysis sample (N = 5,554) employing these standard control variables (see text for description of this sample). Letters in parentheses following variable names indicate whether data used were from the baseline (B) or screen (S) interviews. For variables represented by a set of binary indicators (e.g., ADL), one of the categories must be excluded from the regression to avoid perfect collinearity. Parentheses indicate which category was excluded, although this choice has no bearing on the estimates of treatment/control differences.

The means of the variables included in the model are given in Table III.1. All variables were obtained from the screen or baseline interviews, as designated in the table. Most of the variables are binary and self-explanatory. However, a few require some explanation. Impairment on activities of daily living (ADL) was defined according to sample members' most serious impairment, using the following hierarchy: eating, transfer or toileting, dressing, bathing. Thus, sample members impaired on eating were classified as extremely severe, those whose most serious impairment was transfer or toileting were severely impaired, those whose most serious impairment was dressing were moderately impaired, and others were classified as mildly impaired. Cognitive impairment was defined by whether sample members at screen exhibited behavioral problems or disorientation that required constant supervision (severe cognitive impairment), had behavioral problems that did not require daily supervision (moderate impairment), or had only mild or occasional problems with disorientation (mild or no cognitive impairment). Unmet needs was simply a count of the number of areas (0 to 5) in which the sample member needed more help and expected this need to continue for six months or more. "Change in health status" is a binary variable indicating whether the sample member reported experiencing the onset or worsening of any of several health conditions or illnesses.
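The ADL hierarchy described above can be expressed as a short classification rule. This is an illustrative sketch only: the function name, the task labels, and the category strings are hypothetical, and the actual coding from the screen interview may have differed in detail.

```python
def adl_category(impaired_tasks):
    """Classify ADL impairment by the most serious impaired task.

    Hierarchy (most to least serious): eating, transfer or toileting,
    dressing, bathing. Task labels here are hypothetical.
    """
    tasks = set(impaired_tasks)
    if "eating" in tasks:
        return "extremely severe"
    if tasks & {"transfer", "toileting"}:
        return "severe"
    if "dressing" in tasks:
        return "moderate"
    # Impaired only on bathing, or on none of the listed tasks.
    return "mild or none"

# A member impaired on dressing and toileting is classified by toileting,
# the more serious of the two under the hierarchy.
print(adl_category(["dressing", "toileting"]))
```

Because the rule checks tasks in order of seriousness, less serious impairments do not change the category once a more serious one is present.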

Observations lacking data on one or more of the control variables were retained in the analysis by imputing values for missing variables. Data on some of the control variables were available from both the screen and baseline interviews; if data from the primary source for these variables were missing, values were imputed from the other source. Sample means were imputed in instances where no data were present on the desired variable from either the screen or the baseline, provided that less than 3 percent of the sample required imputation on that variable.21 If more than 3 percent of the sample were missing data on a particular variable, zero values were imputed and a separate binary variable was created, indicating for which observations the data were missing on the control variable. This missing data indicator was included in the regression equation to capture differences in outcomes between those with and without available data on a particular control variable.
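The imputation rule can be sketched as a single function. This is a minimal illustration under stated assumptions, not the evaluation's actual program: the function and argument names are hypothetical, missing values are represented by None, and a variable with exactly 3 percent missing (a case the text does not address) is handled here with zeros plus an indicator.

```python
def impute(primary, secondary, threshold=0.03):
    """Impute one control variable; returns (values, missing_indicator).

    primary, secondary: equal-length lists from the two interviews, with
    None marking missing data. missing_indicator is None unless the
    zero-plus-indicator rule was applied.
    """
    # Step 1: fall back to the other interview where the primary is missing.
    filled = [p if p is not None else s for p, s in zip(primary, secondary)]
    missing = [v is None for v in filled]
    share_missing = sum(missing) / len(filled)
    if share_missing == 0:
        return filled, None
    if share_missing < threshold:
        # Step 2a: few cases missing, so impute the sample mean.
        observed = [v for v in filled if v is not None]
        mean = sum(observed) / len(observed)
        return [mean if v is None else v for v in filled], None
    # Step 2b: otherwise impute zero and flag the affected observations.
    # (Assumption: exactly 3 percent missing also takes this branch.)
    values = [0.0 if v is None else v for v in filled]
    return values, [int(m) for m in missing]

values, flag = impute([1.0, None, 3.0, None], [None, 2.0, None, None])
print(values, flag)
```

In the usage line, the second observation is filled from the secondary source, while the fourth is missing in both sources; because a quarter of the sample is then missing, it receives a zero and a missing data indicator of one.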
