
Statistical Package

The dependent variable has three levels, which required a statistical procedure that could accommodate them. Because the variable is theoretically ordered, an ordered method seemed the logical choice. Ordered logistic regression, a method commonly used in such situations and one that corresponds to a proportional odds model, was therefore evaluated. However, the structure such a model imposes on the data was found to be inappropriate. This was learned by estimating two component logistic equations: ADL or IADL dependent versus no dependency; and ADL dependent versus IADL or no dependency. While the parameter estimates for race, age, and age-squared were similar across the two component models and thereby compatible with the proportional odds model, the parameter estimates for sex contradicted it, differing by almost 19-fold. Thus, the proportional odds model imposed by ordered logistic regression was considered inappropriate for modeling our dependent variable.
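The two component equations described above amount to two binary recodings of the same three-level outcome. The sketch below illustrates that recoding only; the level labels and the function name are hypothetical and are not taken from the report or its data files.

```python
# Hypothetical labels for the three ordered levels of the dependent variable,
# listed from least to most dependent. These names are assumptions.
LEVELS = ("none", "IADL", "ADL")


def component_outcomes(level):
    """Return the two binary outcomes used in the component logistic equations.

    First value:  1 if ADL or IADL dependent, 0 if no dependency.
    Second value: 1 if ADL dependent, 0 if IADL dependent or no dependency.
    """
    rank = LEVELS.index(level)
    any_dependency = int(rank >= 1)
    adl_dependency = int(rank >= 2)
    return any_dependency, adl_dependency


for level in LEVELS:
    print(level, component_outcomes(level))
```

Under a proportional odds model, a covariate's coefficient would be roughly the same in logistic regressions fit to each of these two binary outcomes, which is the property the sex coefficient failed to show.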
Instead, a multicategory extension of logistic regression, which provides a more general structure, was used. The loglinear model was fit using PROC CATMOD, a SAS-supported procedure designed for categorical data modeling. For loglinear model analysis, CATMOD uses maximum likelihood estimation. Given the three-category dependent variable, two sets of parameter estimates were produced: one for the log of the ratio of not dependent to ADL dependent, and one for the log of the ratio of IADL dependent to ADL dependent. Working with these two equations simultaneously yielded a formula for each category of the dependent variable: (1) not dependent; (2) IADL dependent; and (3) ADL dependent. (See Appendix A.)
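As an illustration of how two log-ratio equations determine three category probabilities, the sketch below applies the standard generalized logit algebra with ADL dependent as the reference category. It is not a reproduction of the formulas in Appendix A; the function name, argument names, and example values are hypothetical.

```python
import math


def category_probabilities(log_ratio_none, log_ratio_iadl):
    """Convert two log ratios (each relative to ADL dependent) to probabilities.

    log_ratio_none = log(P(not dependent) / P(ADL dependent))
    log_ratio_iadl = log(P(IADL dependent) / P(ADL dependent))
    The reference category (ADL dependent) has a log ratio of 0 by definition.
    """
    # Subtract the largest exponent before exponentiating, for numerical stability.
    shift = max(log_ratio_none, log_ratio_iadl, 0.0)
    e_none = math.exp(log_ratio_none - shift)
    e_iadl = math.exp(log_ratio_iadl - shift)
    e_adl = math.exp(0.0 - shift)
    total = e_none + e_iadl + e_adl
    return {
        "not_dependent": e_none / total,
        "iadl_dependent": e_iadl / total,
        "adl_dependent": e_adl / total,
    }


# Hypothetical log ratios for one covariate pattern (not values from the report).
probs = category_probabilities(2.0, 0.5)
print(probs)
```

In practice each log ratio would be the linear predictor from the corresponding set of parameter estimates, evaluated at a given age, sex, and race.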


Design Effects

The CATMOD procedure, however, cannot be used with a statistical package that accounts for the complex sampling design of the 1984 NHISSOA. When sampling design effects are ignored, variance estimates and significance levels may be inaccurate. Experience shows that, in general, the variances of the regression coefficients are likely to be underestimated on the order of 5-20% when such complexity is not taken into account.
To gauge the magnitude of the sample design effects in this analysis, results from the SAS procedure PROC LOGIST were compared with results from PROC RTILOGIT (Shah et al. 1984), a SAS-supported logistic regression package developed specifically to account for complex sample designs when calculating variances and significance levels. Because RTILOGIT can handle only a two-level dependent variable, a model for ADL dependent versus not dependent was fit for comparative purposes. To calculate the design effects, the variances produced by PROC RTILOGIT were divided by the variances produced by PROC LOGIST. The results showed that design effects were relatively small (i.e., less than 1.2) for all the parameters of interest (i.e., age, sex, race, and age-squared). Dividing the chi-square statistics produced by CATMOD by the design effects would not change the clear significance of the parameters in our model, so the use of the CATMOD procedure did not pose a problem; i.e., the slightly larger variance estimates likely to be produced by complex sample methods such as RTILOGIT would not alter the results or conclusions.
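The design effect calculation and the chi-square adjustment described above are simple ratios, sketched below. The function names and the numeric values are hypothetical illustrations, not estimates from the report.

```python
def design_effect(variance_complex, variance_simple):
    """Ratio of the design-based variance to the variance that ignores the design.

    variance_complex: variance from a method that accounts for the sample design
                      (the role played by RTILOGIT in the text).
    variance_simple:  variance from a method that assumes simple random sampling
                      (the role played by LOGIST in the text).
    """
    if variance_simple <= 0:
        raise ValueError("variance_simple must be positive")
    return variance_complex / variance_simple


def adjusted_chi_square(chi_square, deff):
    """Deflate a chi-square statistic by dividing it by the design effect."""
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return chi_square / deff


# Hypothetical numbers: a coefficient variance of 0.012 under the complex-design
# method against 0.010 under simple random sampling gives a design effect near 1.2.
deff = design_effect(0.012, 0.010)
print(deff, adjusted_chi_square(30.0, deff))
```

With a design effect below 1.2, a chi-square statistic shrinks by less than about 17 percent, which is why clearly significant parameters remain significant after adjustment.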
For model testing, the database was randomly divided in half within each primary sampling unit. Candidate models for the dependent variable were fit in the first half of the database. Once model development was completed, the goodness-of-fit of the model was validated in the other half of the database by three methods. First, the model was run in the other half of the data set, and its goodness-of-fit was evaluated with the chi-square statistics associated with the individual parameters and with the lack-of-fit statistic. As the parameter estimates remained significant (p<.001) and the lack-of-fit statistic remained nonsignificant (p>.25), the structure of the model appeared to fit the data quite well.
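The random division in half within each primary sampling unit, described above, can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout (dictionaries with a `psu` key), the function name, and the handling of odd-sized units are hypothetical and are not documented in the report.

```python
import random
from collections import defaultdict


def split_within_psu(records, psu_key="psu", seed=0):
    """Randomly split records into two halves separately within each PSU.

    Returns (development_half, validation_half). When a PSU has an odd number
    of records, the extra record is assigned to one half at random (an
    assumption; the report does not say how such cases were handled).
    """
    rng = random.Random(seed)
    by_psu = defaultdict(list)
    for record in records:
        by_psu[record[psu_key]].append(record)

    development, validation = [], []
    for psu in sorted(by_psu):
        members = list(by_psu[psu])
        rng.shuffle(members)
        # For even sizes this is exactly half; for odd sizes it rounds up or down at random.
        cut = (len(members) + rng.randint(0, 1)) // 2
        development.extend(members[:cut])
        validation.extend(members[cut:])
    return development, validation


# Hypothetical data: three PSUs with ten respondents each.
sample = [{"id": i, "psu": i % 3} for i in range(30)]
dev_half, val_half = split_within_psu(sample)
print(len(dev_half), len(val_half))
```

Splitting within each unit, instead of across the whole file, keeps both halves balanced with respect to the sample design.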
Second, the model was run on the entire sample to test the stability of the estimated coefficients across the two halves. This was done by including an indicator variable representing the half of the data set from which each observation came, as well as all of its pairwise interactions with the other variables in the model. As the parameter estimates for the indicator and each of its interactions were nonsignificant (p>.25) in an overall test, the goodness-of-fit of the model was supported.
Third, the goodness-of-fit of the model was evaluated by comparing the model-predicted dependency rates with their observed counterparts in the other half of the data set. To do so, the candidate models were used to compute the predicted probability of dependence for each individual in the other half of the database. The difference between this predicted value and the individual's true value gives a residual for that individual. Average residuals close to zero for various subgroups of individuals (e.g., males, females, different age groups) and a lack of correlation between the residuals and the characteristics of individuals are indicative of goodness-of-fit. In almost all cases (28 out of 30), the t-statistic indicated that the mean value of the residuals for the subgroup was not significantly (p>.05) different from 0. In addition, Pearson correlations were evaluated for the residuals and each of the explanatory variables, and their low values supported the fit of the model.
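The subgroup residual check described above can be sketched as a one-sample t-statistic on the mean residual. The report does not specify the exact form of the test or the sign convention for the residual, so the observed-minus-predicted convention, the one-sample form, the function name, and the example values below are all assumptions.

```python
import math
from statistics import mean, stdev


def mean_residual_t(observed, predicted):
    """Return (mean residual, t-statistic) for one subgroup.

    observed:  0/1 indicators of dependence for the individuals in the subgroup.
    predicted: model-predicted probabilities of dependence for the same individuals.
    The residual is taken as observed minus predicted (an assumed convention).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    if n < 2:
        raise ValueError("at least two observations are needed")
    avg = mean(residuals)
    sd = stdev(residuals)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("residuals have zero variance; t-statistic is undefined")
    return avg, avg / (sd / math.sqrt(n))


# Hypothetical subgroup of four individuals, each with predicted probability 0.5.
avg, t = mean_residual_t([1, 1, 1, 0], [0.5, 0.5, 0.5, 0.5])
print(avg, t)
```

A t-statistic small in absolute value for a subgroup indicates that the model neither over-predicts nor under-predicts dependence for that subgroup on average.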
