The CATMOD procedure, however, cannot account for the complex sampling design of the 1984 NHIS-SOA. When sampling design effects are ignored, variance estimates and significance levels may be inaccurate. Experience shows that, when such complexity is ignored, the variances of the regression coefficients are generally likely to be underestimated by roughly 5-20%.
To gauge the magnitude of the sample design effects in this analysis, results from the SAS procedure PROC LOGIST were compared with results from the PROC RTILOGIT procedure (Shah et al. 1984), a SAS-supported logistic regression package developed specifically to account for complex sample designs when calculating variances and significance levels. Because RTILOGIT can accommodate only a two-level dependent variable, a model for ADL dependent versus not dependent was fit for comparative purposes. To calculate the design effects, the variances produced by PROC RTILOGIT were divided by the variances produced by PROC LOGIST. The results showed that the design effects were relatively small (i.e., less than 1.2) for all the parameters of interest (i.e., age, sex, race, and age-squared). Because dividing the chi-square statistics produced by CATMOD by the design effects would not change the clear significance of the parameters in our model, use of the CATMOD procedure posed no problem; that is, the slightly larger variance estimates likely to be produced by complex sample methods such as RTILOGIT would not alter the results or conclusions.
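The design effect calculation described above can be illustrated with a minimal Python sketch. The function names and all numeric values below are hypothetical and chosen only for illustration; they are not results from this analysis, and the original computations were carried out in SAS.

```python
def design_effects(var_design, var_srs):
    """Ratio of the design-based variance to the variance that ignores
    the sample design, computed separately for each parameter."""
    return {name: var_design[name] / var_srs[name] for name in var_srs}


def adjust_chi_square(chi_square, deff):
    """Deflate a chi-square statistic by dividing it by the design effect."""
    return chi_square / deff


# Hypothetical variances of the coefficients (illustrative values only).
var_srs = {"age": 0.0100, "sex": 0.0400, "race": 0.0900, "age_squared": 0.0001}
var_design = {"age": 0.0110, "sex": 0.0460, "race": 0.0990, "age_squared": 0.00011}

deff = design_effects(var_design, var_srs)

# Hypothetical unadjusted chi-square statistic for the sex parameter.
adjusted_sex_chi_square = adjust_chi_square(50.0, deff["sex"])
```

With design effects below 1.2, an adjusted chi-square is at most about 17% smaller than the unadjusted one, which is why clearly significant parameters remain significant after adjustment.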
For model testing, the database was randomly divided in half within each primary sampling unit. In the first half of the database, candidate models were fit for the dependent variable. Once model development was completed, the goodness-of-fit of the model was validated in the other half of the database by three methods. First, the model was run in the other half of the data set, and the goodness-of-fit of the model was evaluated with the chi-square statistics associated with the individual parameters and with the lack-of-fit statistic. As the parameter estimates remained significant (p<.001) and the lack-of-fit statistic remained nonsignificant (p>.25), the structure of the model appeared to fit the data quite well.
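The random division within primary sampling units might be sketched as follows. This is only one way to carry out such a split, under stated assumptions: the record layout, the `psu` key, the seed, and the function name are all hypothetical, and the original split was not necessarily performed this way.

```python
import random
from collections import defaultdict


def split_within_psu(records, psu_key="psu", seed=0):
    """Randomly assign half of the records in each primary sampling unit
    to a development set and the remainder to a validation set."""
    rng = random.Random(seed)
    by_psu = defaultdict(list)
    for rec in records:
        by_psu[rec[psu_key]].append(rec)

    development, validation = [], []
    for psu in sorted(by_psu):
        members = list(by_psu[psu])
        rng.shuffle(members)
        cut = len(members) // 2  # an odd PSU puts its extra record in validation
        development.extend(members[:cut])
        validation.extend(members[cut:])
    return development, validation


# Hypothetical records: 40 individuals spread evenly over 4 PSUs.
records = [{"id": i, "psu": i % 4} for i in range(40)]
development, validation = split_within_psu(records)
```

Splitting within each primary sampling unit, rather than across the whole file, keeps both halves balanced with respect to the sample design.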
Second, the model was run on the entire sample to test the fit of the estimated coefficients. This was done by including an indicator variable representing the half of the data set from which each observation came, as well as all of its pairwise interactions. As the parameter estimates for the indicator and each of its interactions were nonsignificant (p>.25) in an overall test, the goodness-of-fit of the model was supported.
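Constructing the indicator and its interaction terms for such a pooled model can be sketched in Python as below. The covariate list (age, sex, race, age-squared), the 0/1 coding, and the column names are assumptions made for illustration; the sketch only builds the additional model terms and does not fit the model or perform the overall test.

```python
def add_half_interactions(row, half,
                          covariates=("age", "sex", "race", "age_squared")):
    """Return a copy of one observation with a half-sample indicator
    (0 = development half, 1 = validation half) and the product of that
    indicator with each covariate."""
    out = dict(row)
    out["half"] = half
    for name in covariates:
        out[f"half_x_{name}"] = half * row[name]
    return out


# Hypothetical observation with numerically coded covariates.
person = {"age": 70, "sex": 1, "race": 0, "age_squared": 4900}
from_development = add_half_interactions(person, half=0)
from_validation = add_half_interactions(person, half=1)
```

If the coefficients on `half` and on every `half_x_*` term are jointly indistinguishable from zero, the two halves are described by the same coefficients.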
Third, the goodness-of-fit of the model was evaluated by comparing the model-predicted dependency rates with their observed counterparts in the other half of the data set. To do so, the candidate models were used to determine the predicted probability of dependence for each individual in the other half of the database. The difference between an individual's observed value and the predicted value gives a residual for that individual. The closeness to zero of the average residual within various subgroups of individuals (e.g., males, females, different age groups) and the lack of correlation between the residuals and characteristics of individuals are indicative of goodness-of-fit. In almost all cases (28 out of 30), the t-statistic indicated that the mean value of the residuals for the subgroup was not significantly (p>.05) different from 0. In addition, Pearson correlations were evaluated between the residuals and each of the explanatory variables, and their low values supported the fit of the model.
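The residual checks described above can be sketched in Python using only the standard library. All data values, variable names, and subgroup labels below are hypothetical, and the sketch uses an ordinary one-sample t-statistic and an unweighted Pearson correlation; whether the original analysis applied weights or other adjustments is not stated.

```python
import math
from statistics import mean, stdev


def one_sample_t(values):
    """t-statistic for testing whether the mean of the values is zero."""
    return mean(values) / (stdev(values) / math.sqrt(len(values)))


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical validation-half data: observed dependence (0/1), the
# model-predicted probability of dependence, and two characteristics.
observed = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [0.7, 0.2, 0.3, 0.6, 0.1, 0.8, 0.4, 0.3]
sex = ["M", "F", "M", "F", "M", "F", "M", "F"]
age = [82, 66, 71, 79, 65, 88, 74, 69]

residuals = [o - p for o, p in zip(observed, predicted)]

# Mean residual and t-statistic within each subgroup.
t_by_sex = {
    group: one_sample_t([r for r, s in zip(residuals, sex) if s == group])
    for group in sorted(set(sex))
}

# Correlation of the residuals with an explanatory variable.
r_age = pearson_r(residuals, age)
```

A t-statistic near zero in every subgroup and a small correlation with each explanatory variable are the patterns that would support the fit of the model.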