Methodological Issues in the Evaluation of the National Long Term Care Demonstration. B. The Comparability of Baseline Data for Treatment and Control Groups


Another aspect of the evaluation design which could have raised questions about the accuracy of the estimates of channeling impacts was that the baseline data were collected by different types of interviewers for the treatment and control groups. The combination of several factors--conflicts between research needs and good case management practices, data collection costs, and the desire to minimize the burden on sample members--led to the decision that baseline data would be collected by channeling staff for members of the treatment group, and by research interviewers for the control group. For a variety of reasons, this difference in data collection could result in differences between the two groups on observed data for some characteristics, when in fact no real differences exist between the two groups on these baseline characteristics. Estimates of channeling impacts that are obtained from regression models which use these baseline data as explanatory variables could then be distorted, because these artificial differences between the two groups are treated as real pre-treatment differences that must be accounted for (netted out) by the regression.

Brown and Mossel (1984) conducted an extensive analysis to determine whether the baseline data for treatments and controls were comparable and, if not, what needed to be done to ensure that regression estimates of channeling impacts would not be biased by such differences. Reasons why baseline data may differ for the two groups were identified, including:

  • True differences at randomization due to chance
  • True differences due to different patterns of attrition between randomization and baseline
  • Spurious differences due to differences in the length of time between randomization and baseline for the two groups
  • Spurious differences due to incentives of clients or their proxy respondents to overreport needs and impairments to channeling staff (who used the baseline to prepare a care plan for the client), and to underreport ability to pay for needed services
  • Spurious differences due to differences between research interviewers and channeling staff in how questions were asked (including clarifications and probing), and how answers were recorded
  • Treatment-induced differences due to anticipated or actual effects of channeling on the treatment group prior to baseline (and known lack of assistance from channeling for the control group)
  • Spurious differences due to the differential usage of proxy respondents

As indicated in the previous section, comparison of treatment and control groups on screen variables for the full sample indicated virtually no differences outside the range of normal chance variation. Comparison of the screen characteristics of treatments and controls for baseline respondents indicated that attrition at baseline had led to very few differences between the remaining treatment and control groups. A model of baseline attrition confirmed that only for a few screen variables was the relationship between sample member characteristics and the probability of response significantly different between treatment and control groups.

Despite the overwhelming evidence, based on screen characteristics, that there were essentially no true treatment/control differences at randomization due to chance, and only minor differences due to differential attrition, Brown and Mossel (1984) found a substantial number of large and statistically significant differences between the two groups on baseline variables, including some of the same variables for which no differences were found on the screen. Although real differences between the two groups (either due to differential attrition, or to pre-existing differences not detected by screen measures) could not be ruled out entirely, they concluded that differential measurement was largely responsible for the observed baseline differences between treatments and controls. This conclusion was based on several pieces of evidence:

  • The finding that very few screen variables exhibited statistically significant differences between treatments and controls among baseline respondents
  • The finding that few screen variables exhibited a significantly different impact on the probability of baseline response for treatments than for controls
  • The many statistically significant and occasionally very large treatment/control differences found on baseline variables, including some for which no difference was found on the screen version of the same variable
  • The general correspondence of results with a priori expectations about which variables were likely to be affected by noncomparable measurement and the direction of the treatment/control differences
  • The timing and proxy use differences that were known to exist at baseline and which were obviously responsible for the observed differences on some of the baseline variables and probably responsible for the differences on some others
  • The general correspondence of treatment/control differences at baseline with baseline-reinterview differences observed for a subsample of treatment group members who were given a second baseline by research interviewers

Brown and Mossel then showed how regression estimates of channeling impacts would be affected by the use of noncomparable data items as explanatory variables in the regression. The expressions for bias induced by noncomparable data suggested two types of tests of baseline variables to determine whether the baseline differences were so large that it was unlikely that they represented true treatment/control differences and therefore might cause significant bias in estimates of channeling impacts, or small enough that they may well be due to chance and were unlikely to affect impact estimates. The two tests--one for baseline variables for which comparable measures were available on the screen, and one for variables that had no such screen counterparts--made use of all of the available information.
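The direction of this kind of bias can be illustrated in the simplest possible case. The derivation below is a sketch under stated assumptions, not a reproduction of the expressions in Brown and Mossel: it assumes a single control variable x, a measurement shift delta that is constant and applies only to the treatment group, and random assignment. All symbols are introduced here for illustration.

```latex
% True outcome model: T_i = 1 for treatment, 0 for control; \tau is the true impact.
y_i = \alpha + \tau T_i + \beta x_i + \varepsilon_i

% Noncomparable measurement: the recorded value is shifted by \delta for treatments only.
\tilde{x}_i = x_i + \delta T_i

% Substituting x_i = \tilde{x}_i - \delta T_i into the outcome model:
y_i = \alpha + (\tau - \beta\delta)\, T_i + \beta \tilde{x}_i + \varepsilon_i

% A regression of y on T and \tilde{x} therefore recovers \tau - \beta\delta, not \tau.
% Example: \tau = 1,\ \beta = 2,\ \delta = 0.5 \;\Rightarrow\; \tau - \beta\delta = 0.
```

Under these assumptions the estimated impact is off by the product of the control variable's coefficient and the size of the artificial treatment/control difference, so a variable matters for bias only if it is both noncomparably measured and related to the outcome.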

For baseline variables with screen counterparts, the test was for whether the treatment/control differences at baseline were significantly different from the treatment/control differences in the screen version of the variable for the same individuals. For those variables for which the hypothesis of no differential was rejected, the baseline version of the variable was considered noncomparable, and only the screen version was used in future analyses. Variables for which no significant differential was found were considered to be comparably measured at baseline and therefore the baseline version could be included as a control variable in later analyses. The conclusions based on this procedure were then compared to the results obtained from the reinterview sample, which were based on comparison of baseline and reinterview responses on these same questions. The two sets of results were found to be broadly consistent in terms of which variables appeared to be noncomparable, and the direction of the differences.
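The logic of this first test can be sketched in a few lines of code. The example below is an illustration, not the procedure actually used by Brown and Mossel: the source does not give the test statistic, so a simple two-sample statistic on individual (baseline minus screen) differences is assumed, and the function name, the simulated data, and the 0.5 overreporting shift are all hypothetical.

```python
import math
import random
import statistics


def differential_z(baseline_t, screen_t, baseline_c, screen_c):
    """Compare the treatment/control gap at baseline with the gap at screen.

    Each pair of lists holds baseline and screen values for the same
    individuals. Returns (difference in gaps, approximate z statistic).
    """
    # Baseline-minus-screen change for each individual
    d_t = [b - s for b, s in zip(baseline_t, screen_t)]
    d_c = [b - s for b, s in zip(baseline_c, screen_c)]
    # (T - C at baseline) - (T - C at screen) equals mean(d_t) - mean(d_c)
    diff = statistics.mean(d_t) - statistics.mean(d_c)
    se = math.sqrt(statistics.variance(d_t) / len(d_t)
                   + statistics.variance(d_c) / len(d_c))
    return diff, diff / se


# Simulated illustration (hypothetical numbers): an impairment score that
# treatment-group respondents overreport by 0.5 at baseline only.
random.seed(0)
true_t = [random.gauss(3.0, 1.0) for _ in range(500)]
true_c = [random.gauss(3.0, 1.0) for _ in range(500)]
screen_t = [x + random.gauss(0, 1) for x in true_t]
screen_c = [x + random.gauss(0, 1) for x in true_c]
baseline_t = [x + 0.5 + random.gauss(0, 1) for x in true_t]
baseline_c = [x + random.gauss(0, 1) for x in true_c]

diff, z = differential_z(baseline_t, screen_t, baseline_c, screen_c)
# Under the decision rule described above, a large |z| would mark the
# baseline version as noncomparable, so the screen version would be used.
print(f"differential = {diff:.3f}, z = {z:.2f}, noncomparable = {abs(z) > 1.96}")
```

Working with individual differences uses the fact that both measures come from the same people, so true person-to-person variation in the characteristic cancels out of the comparison.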

For baseline variables that had no screen counterpart, the procedure used was to regress these variables on treatment status, site, and the variables selected from the group with screen counterparts, and test whether the coefficients on the two treatment status variables (for basic and financial control models) were significantly different from zero. This was a test of whether there were treatment/control differences in these baseline variables beyond what could be explained by the small observed differences at screen in a set of other variables. Variables for which this hypothesis was rejected were then considered noncomparable, under the assumption (based on the evidence cited above) that any such remaining differences were more likely to be due to noncomparable data rather than real differences. Again, the results obtained were found to be broadly consistent with the reinterview sample comparisons of baseline and reinterview responses.
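A minimal sketch of this second test follows. It is an illustration under stated assumptions rather than the original analysis: the data are simulated, the four-site layout (two basic-model sites, two financial-control sites), the single screen variable, and the 0.8 measurement shift are hypothetical, and the two treatment coefficients are tested with separate t-ratios, whereas the source does not say whether separate or joint tests were used. The small `ols` helper is written out in plain Python so the example is self-contained.

```python
import math
import random


def ols(X, y):
    """Ordinary least squares via the normal equations.

    Returns (coefficients, standard errors) for design matrix X (list of rows).
    """
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination on the augmented matrix [X'X | I]
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(xtx)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return beta, se


# Simulated baseline variable with no screen counterpart (hypothetical data).
random.seed(1)
rows, y = [], []
for i in range(800):
    site = i % 4                                      # four hypothetical sites
    treat = random.random() < 0.5                     # random assignment
    basic = 1.0 if treat and site < 2 else 0.0        # sites 0-1: basic model
    financial = 1.0 if treat and site >= 2 else 0.0   # sites 2-3: financial control
    screen_adl = random.gauss(3.0, 1.0)               # a comparable screen variable
    # Assumed measurement shift of 0.8 for treatment-group respondents only
    value = (1.0 + 0.5 * screen_adl + 0.2 * site + 0.8 * treat
             + random.gauss(0, 1))
    rows.append([1.0, basic, financial,
                 float(site == 1), float(site == 2), float(site == 3),
                 screen_adl])
    y.append(value)

beta, se = ols(rows, y)
t_basic, t_financial = beta[1] / se[1], beta[2] / se[2]
# Variables with a significant treatment coefficient would be treated as
# noncomparable and dropped from the set of control variables.
flag = abs(t_basic) > 1.96 or abs(t_financial) > 1.96
print(f"t(basic) = {t_basic:.2f}, t(financial) = {t_financial:.2f}, "
      f"noncomparable = {flag}")
```

Note that the test cannot by itself distinguish a measurement artifact from a real difference; as the text explains, the interpretation rests on the separate evidence that true treatment/control differences were small.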

The two sets of tests yielded the following conclusions regarding the comparability of the baseline variables that were used as control variables in a preliminary analysis of channeling impacts at 6 months (Kemper et al., 1985) and were then being considered for use in the final analyses:

Comparable Baseline Variables                 Noncomparable Baseline Variables
Age (*)                                       Ethnicity (*)
Sex (*)                                       Income (*)
Insurance (*)                                 ADL (*)
Living arrangement (*)                        Unmet needs (*)
Nursing home waiting list (*)                 Attitude toward nursing home
Home ownership                                Medical conditions
Stressful events                              Self-rating of health
Hours of informal care received (per week)    Restricted days last 2 months
Hours of formal care received (per week)      Hospital days last 2 months
Number of physician visits                    Nursing home days last 2 months
Global life satisfaction

(*) indicates that a screen version of the variable exists

Only those baseline variables found to be comparable were included as control variables in the final channeling analyses. For noncomparable baseline variables with screen counterparts, the screen version was used as a control variable in its place. The other noncomparable baseline variables were excluded from the set of control variables, with the exception of hospital and nursing home days, which were replaced with information from the screen on whether the sample member was in a hospital or nursing home at screen or referred to channeling by hospital or nursing home staff.

The exclusion of the noncomparable variables is not likely to have caused serious problems for the analysis. Estimates of channeling impacts obtained from regressions with control variables drawn only from the screen were found to be different for some outcome measures from those obtained from regressions using the (comparable and noncomparable) baseline control variables, as expected, but the standard errors of these impact estimates were virtually unaffected by this difference in regressors. Thus, the argument that increased precision would be obtained if the more complete baseline data were used as control variables was not borne out in this case. It is the case, however, that any attrition-induced differences between treatments and controls on excluded characteristics were not controlled for in estimating channeling impacts. The evidence in Brown and Mossel suggests that real differences between the two groups are likely to be considerably smaller than the observed differences in the data. Thus, failure to control for such real differences, if they exist, is likely to have caused less bias than attempting to account for them by using control variables that were not comparably measured for the two groups. However, the inability to examine impacts for subgroups defined on potentially important but noncomparable variables such as SPMSQ, medical conditions, attitude toward nursing home, and IADL weakens that analysis.
