Policy Research for Front of Package Nutrition Labeling: Developing and Testing a Summary System Algorithm

4.1.4 Regression Models


To determine how well algorithm scores predict overall dietary quality, we calculated a composite algorithm score for each of the 16,587 individuals in NHANES 2005-2008, based on the foods each individual consumed on a single survey day (1-day intake). An individual's composite score is the sum of the algorithm scores for every food consumed on the survey day, divided by the number of 100 kcal units or RACC servings in that day's intake. A detailed example of the calculation for one individual is provided in Appendix C.

HEI scores, which measure diet quality against the 2005 DGA, were calculated using the methods described by Guenther et al. (2007). Details of the HEI calculation are presented in Appendix D.

HEI scores were then regressed on composite algorithm scores. The beta-coefficient for the algorithm score and the R² of each model were generated with the SUDAAN (Version 10) procedure PROC REGRESS, which accounts for the complex sampling design of NHANES by using information supplied in the NHANES data set: day 1 dietary weights, strata, and primary sampling units (PSUs). Covariates included age (as a continuous variable), gender, and ethnicity (in five categories: Mexican American, other Hispanic, non-Hispanic white, non-Hispanic black, and other race).

We examined the R² of each model, i.e., the coefficient of determination, which measures the proportion of the variation in HEI scores explained by the model. Because the NHANES 2005-2008 sample is large, even small differences in R² between two models could be statistically significant. It was not feasible to test the differences in R² between the various models statistically; however, based on the previous experience of Nutrition Impact, LLC in testing different algorithms for NRFI, we consider a difference in R² of 5 percentage points (0.05) to be meaningful.
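The composite score is a simple ratio: summed food scores over an intake denominator. The Python sketch below illustrates that arithmetic for one survey day. The record layout (`FoodIntake`) and the function name are hypothetical, and the sketch assumes each food's algorithm score has already been assigned; Appendix C, not this sketch, documents the actual calculation, including whether per-food scores are scaled by the amount consumed.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FoodIntake:
    """One food reported on the survey day (hypothetical record layout)."""
    name: str
    algorithm_score: float  # score assigned to this food by the summary algorithm (assumed given)
    kcal: float             # energy from this food as consumed
    racc_servings: float    # amount consumed, expressed in RACC servings


def composite_score(foods: Sequence[FoodIntake], basis: str = "kcal") -> float:
    """Sum of per-food algorithm scores divided by the number of 100 kcal
    units (basis="kcal") or RACC servings (basis="racc") in the day's intake."""
    total_score = sum(f.algorithm_score for f in foods)
    if basis == "kcal":
        denominator = sum(f.kcal for f in foods) / 100.0  # number of 100 kcal units
    elif basis == "racc":
        denominator = sum(f.racc_servings for f in foods)  # number of RACC servings
    else:
        raise ValueError("basis must be 'kcal' or 'racc'")
    if denominator <= 0:
        raise ValueError("intake has no energy or servings to divide by")
    return total_score / denominator
```

For a hypothetical day of three foods with scores 12, -3, and 6 totaling 500 kcal (5 units of 100 kcal), the composite score is 15 / 5 = 3.0; if the same foods total 3 RACC servings, it is 15 / 3 = 5.0.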
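The regression step can be sketched as survey-weighted least squares. To our understanding of design-based regression, the point estimates SUDAAN reports (the beta-coefficient and R²) match least squares weighted by the day 1 dietary weights, while strata and PSUs affect only the standard errors, which this pure-Python sketch does not compute (SUDAAN uses Taylor-series linearization for that). Non-Hispanic white as the reference ethnicity, 0/1 gender coding, the weighted form of R², and all function names are assumptions for illustration. The last function applies the 5-percentage-point rule for comparing R² between models.

```python
# Hypothetical sketch: weighted least squares for HEI ~ composite score + covariates.
# Strata and PSUs are NOT used here; they would only affect standard errors.

ETHNICITY_LEVELS = [
    "Mexican American", "Other Hispanic", "Non-Hispanic white",
    "Non-Hispanic black", "Other race",
]
REFERENCE_ETHNICITY = "Non-Hispanic white"  # assumed reference category


def design_row(score, age, female, ethnicity):
    """Intercept, composite score, age (continuous), gender (1 = female,
    assumed coding), and dummies for the four non-reference ethnicities."""
    if ethnicity not in ETHNICITY_LEVELS:
        raise ValueError(f"unknown ethnicity: {ethnicity}")
    dummies = [1.0 if ethnicity == level else 0.0
               for level in ETHNICITY_LEVELS if level != REFERENCE_ETHNICITY]
    return [1.0, float(score), float(age), 1.0 if female else 0.0] + dummies


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: check for collinear or empty covariates")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_regression(rows, y, w):
    """Return (coefficients, weighted R²) from least squares with weights w."""
    p = len(rows[0])
    # Normal equations: (X' W X) beta = X' W y
    xtwx = [[sum(wi * r[i] * r[j] for r, wi in zip(rows, w)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(wi * r[i] * yi for r, yi, wi in zip(rows, y, w)) for i in range(p)]
    beta = solve(xtwx, xtwy)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)  # weighted mean HEI
    ss_res = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, fitted))
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    return beta, 1.0 - ss_res / ss_tot


def r2_difference_is_meaningful(r2_a, r2_b, threshold=0.05):
    """Rule of thumb from the text: a difference of at least 5 percentage points.
    Rounding absorbs floating-point error (e.g., 0.45 - 0.40)."""
    return round(abs(r2_a - r2_b), 10) >= threshold
```

With real data, `rows` would hold one design row per participant and `w` their day 1 dietary weights; `beta[1]` is then the coefficient for the composite algorithm score. Comparing two algorithms reduces to fitting both models and passing their R² values to `r2_difference_is_meaningful`.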