Policy Research for Front of Package Nutrition Labeling: Developing and Testing a Summary System Algorithm

4.4.2 Maximum R-Square Approach for Calculating Nutrient Density Scores

05/01/2011

After running the analysis of the proposed algorithms for RTI, Nutrition Impact, LLC developed a new approach to decide which nutrients to include in a nutrient density index. The method examines the various nutrient combinations in models that use individual nutrient intake values as independent variables rather than a composite score. Nutrient intake values were capped at 100% of the daily value. HEI was the dependent variable, and the other covariates were age, gender, and ethnicity as defined previously. The MAXimum R2 (MAXR) option in some SAS procedures allows for examining every possible combination of variables of interest. However, a MAXR option is not available in the SurveyReg procedure necessary to analyze NHANES data, so a custom macro was developed to assess every possible combination of the 17 nutrients or components included in our testing. This method identifies the one-variable model producing the highest R2, the best two-variable model, and so on. The MAXR approach differs from stepwise regression in that it evaluates the possible switching of the order of variables entered into the model, which can affect the model results. These analyses resulted in evaluation of 131,072 (2^17) regression models for nutrients or components expressed per 100 kcal and the same number of models with the nutrients or components expressed on a per RACC basis. Adjusted R2 values were used to compare the various models because R2 never decreases as variables are added to a model.
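The all-subsets search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SAS macro used in the report: it fits unweighted ordinary least squares rather than survey-weighted SurveyReg models, it omits the age, gender, and ethnicity covariates, and it runs on synthetic data with four hypothetical nutrient columns. The function names (ols_r2, adjusted_r2, best_subsets), the nutrient names, and the simulated coefficients are all invented for the example.

```python
from itertools import combinations
import random


def ols_r2(X, y):
    """R-square of an unweighted least-squares fit of y on X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Build the augmented normal equations [X'X | X'y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gaussian elimination with partial pivoting (assumes X'X is nonsingular).
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for j in range(col, k + 1):
                M[r][j] -= f * M[col][j]
    # Back substitution for the coefficient vector b.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (M[i][k] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
                 for r, yi in zip(rows, y))
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n, p):
    """Adjusted R-square for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def best_subsets(X, y, names):
    """For each model size, keep the subset of predictors with the highest R-square."""
    n, p = len(y), len(names)
    best = {}
    for size in range(1, p + 1):
        for subset in combinations(range(p), size):
            cols = [[row[j] for j in subset] for row in X]
            r2 = ols_r2(cols, y)
            if size not in best or r2 > best[size][1]:
                chosen = tuple(names[j] for j in subset)
                best[size] = (chosen, r2, adjusted_r2(r2, n, size))
    return best


# Synthetic data (assumption): intakes as % of daily value, capped at 100.
rng = random.Random(0)
names = ["fiber", "protein", "saturated_fat", "iron"]
X, y = [], []
for _ in range(200):
    row = [min(rng.uniform(0, 120), 100.0) for _ in names]
    X.append(row)
    # Hypothetical outcome driven by fiber (positive) and saturated fat (negative).
    y.append(50 + 0.3 * row[0] - 0.2 * row[2] + rng.gauss(0, 1))

best = best_subsets(X, y, names)
for size, (chosen, r2, adj) in best.items():
    print(size, chosen, round(r2, 4), round(adj, 4))
```

Because every subset of each size is fitted, the best model of a given size does not depend on the order in which variables are considered, which is the property the text contrasts with stepwise regression.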

The MAXR analyses were conducted with some modifications. We conducted the analyses with three sets of nutrients/components: (1) with whole grains (17 nutrients/components), (2) without whole grains (16 nutrients/components), and (3) with total sugars in place of added sugars (17 nutrients/components). We proceeded with further testing of algorithms without whole grains and with added sugars after consultation with the ASPE/FDA team. Tables 4-3 and 4-4 present the beta coefficients and p-values, respectively, from regressions yielding the best models containing 1 through 16 nutrients or food components as a percentage of recommended intake levels per 100 kcal. The third columns of Tables 4-3 and 4-4 show the adjusted R2 for the models. Tables 4-5 and 4-6 present the regression results for models for nutrients or components as a percentage of recommended intake levels per RACC. (Results of the analyses with whole grains and with total sugars are presented in Appendix G.)


Table 4-3. Beta Coefficients from Regression Models Using the Maximum R2 Option Using Nutrient or Component Values as a Percentage of Recommended Intake Levels per 100 kcal [a]
Columns (left to right): Number of Variables; R-Square; Adjusted R-Square; Protein; Fiber; Vitamin E; Vitamin D; Calcium; Iron; Potassium; Unsaturated Fat; Magnesium; Vitamin A; Vitamin C; Folic Acid; Vitamin B12; Saturated Fat; Sodium; Added Sugar
[a] Nutrient intakes and HEI scores are from 1-day dietary intakes for 16,587 participants in NHANES 2005-2008.
- Means no data.
0 0.0417 0.0413 - - - - - - - - - - - - - - - -
1 0.3657 0.3655 - 5.3217 - - - - - - - - - - - - - -
2 0.4346 0.4343 - 4.6677 - - - - - - - - - - - −1.8039 - -
3 0.5212 0.5209 - 3.5910 - - - - - - - - - - - −2.3942 - −0.8846
4 0.5526 0.5524 - 3.5525 - 1.2382 - - - - - - - - - −2.4271 - −0.8191
5 0.5819 0.5816 1.4179 3.5979 - - - - - - - - - - - −2.3481 −1.3151 −0.7920
6 0.6081 0.6078 1.5858 3.7412 - - - - - 1.7797 - - - - - −2.7342 −1.4101 −0.6454
7 0.6330 0.6328 1.3746 3.5073 - - 1.1420 - - 2.4082 - - - - - −3.0957 −1.3673 −0.5568
8 0.6476 0.6473 1.3963 3.1259 - - 1.0006 - - 2.5145 - - 0.3651 - - −2.9478 −1.3350 −0.5164
9 0.6519 0.6516 1.2879 3.1872 - 0.5950 0.6705 - - 2.4946 - - 0.3655 - - −2.8976 −1.2675 −0.5204
10 0.6536 0.6533 1.3078 2.8759 - - 0.8162 - 0.8945 2.6864 - - 0.3159 0.4951 - −2.8498 −1.3595 −0.4647
11 0.6556 0.6552 1.2476 2.9657 - 0.4258 0.6125 - 0.7275 2.6423 - - 0.3255 0.4127 - −2.8308 −1.3072 −0.4768
12 0.6565 0.6561 1.2723 3.0880 - 0.3824 0.7285 - 0.9552 2.6759 −0.5357 - 0.3102 0.4037 - −2.9089 −1.3266 −0.4890
13 0.6569 0.6566 1.3278 3.0675 - 0.4769 0.7201 - 0.9772 2.6714 −0.5278 - 0.3115 0.4800 −0.1722 −2.8825 −1.3307 −0.4807
14 0.6574 0.6570 1.3088 3.1120 - 0.3569 0.6839 −0.3275 0.9391 2.6825 −0.4921 0.2453 0.3034 0.5766 - −2.9330 −1.3146 −0.4879
15 0.6577 0.6572 1.3442 3.0821 - 0.4281 0.6805 −0.2609 0.9569 2.6762 −0.5003 0.2509 0.3033 0.5957 −0.1392 −2.9116 −1.3212 −0.4833
16 0.6579 0.6574 1.3505 3.0722 0.2420 0.4419 0.6843 −0.2546 0.9741 2.5964 −0.5626 0.2185 0.2926 0.5862 −0.1544 −2.8918 −1.3188 −0.4838



Table 4-4. P-Values from Regression Models Using the Maximum R2 Option Using Nutrient or Component Values as a Percentage of Recommended Intake Levels per 100 kcal [a]
Columns (left to right): Number of Variables; R-Square; Adjusted R-Square; Protein; Fiber; Vitamin E; Vitamin D; Calcium; Iron; Potassium; Unsaturated Fat; Magnesium; Vitamin A; Vitamin C; Folic Acid; Vitamin B12; Saturated Fat; Sodium; Added Sugar
[a] Nutrient intakes and HEI scores are from 1-day dietary intakes for 16,587 participants in NHANES 2005-2008.
- Means no data.
0 0.0417 0.0413 - - - - - - - - - - - - - - - -
1 0.3657 0.3655 - 0.0000 - - - - - - - - - - - - - -
2 0.4346 0.4343 - 0.0000 - - - - - - - - - - - 0.0000 - -
3 0.5212 0.5209 - 0.0000 - - - - - - - - - - - 0.0000 - 0.0000
4 0.5526 0.5524 - 0.0000 - 0.0000 - - - - - - - - - 0.0000 - 0.0000
5 0.5819 0.5816 0.0000 0.0000 - - - - - - - - - - - 0.0000 0.0000 0.0000
6 0.6081 0.6078 0.0000 0.0000 - - - - - 0.0000 - - - - - 0.0000 0.0000 0.0000
7 0.6330 0.6328 0.0000 0.0000 - - 0.0000 - - 0.0000 - - - - - 0.0000 0.0000 0.0000
8 0.6476 0.6473 0.0000 0.0000 - - 0.0000 - - 0.0000 - - 0.0000 - - 0.0000 0.0000 0.0000
9 0.6519 0.6516 0.0000 0.0000 - 0.0000 0.0000 - - 0.0000 - - 0.0000 - - 0.0000 0.0000 0.0000
10 0.6536 0.6533 0.0000 0.0000 - - 0.0000 - 0.0002 0.0000 - - 0.0000 0.0000 - 0.0000 0.0000 0.0000
11 0.6556 0.6552 0.0000 0.0000 - 0.0000 0.0000 - 0.0007 0.0000 - - 0.0000 0.0000 - 0.0000 0.0000 0.0000
12 0.6565 0.6561 0.0000 0.0000 - 0.0000 0.0000 - 0.0000 0.0000 0.0187 - 0.0000 0.0000 - 0.0000 0.0000 0.0000
13 0.6569 0.6566 0.0000 0.0000 - 0.0000 0.0000 - 0.0000 0.0000 0.0188 - 0.0000 0.0000 0.0023 0.0000 0.0000 0.0000
14 0.6574 0.6570 0.0000 0.0000 - 0.0000 0.0000 0.0006 0.0000 0.0000 0.0305 0.0372 0.0000 0.0000 - 0.0000 0.0000 0.0000
15 0.6577 0.6572 0.0000 0.0000 - 0.0000 0.0000 0.0127 0.0000 0.0000 0.0258 0.0356 0.0000 0.0000 0.0255 0.0000 0.0000 0.0000
16 0.6579 0.6574 0.0000 0.0000 0.3175 0.0000 0.0000 0.0156 0.0000 0.0000 0.0139 0.0845 0.0000 0.0000 0.0150 0.0000 0.0000 0.0000



Table 4-5. Beta Coefficients from Regression Models Using the Maximum R2 Option Using Nutrient or Component Values as a Percentage of Recommended Intake Levels per RACC [a]
Columns (left to right): Number of Variables; R-Square; Adjusted R-Square; Protein; Fiber; Vitamin E; Vitamin D; Calcium; Iron; Potassium; Unsaturated Fat; Magnesium; Vitamin A; Vitamin C; Folic Acid; Vitamin B12; Saturated Fat; Sodium; Added Sugar
[a] Nutrient intakes and HEI scores are from 1-day dietary intakes for 16,587 participants in NHANES 2005-2008.
- Means no data.
0 0.0417 0.0413 - - - - - - - - - - - - - - - -
1 0.2074 0.2071 - 3.9647 - - - - - - - - - - - - - -
2 0.4036 0.4033 - 4.7469 - - - - - - - - - - - −2.0714 - -
3 0.4733 0.4730 - 4.5968 - - - - - - - - - - - −1.7867 - −0.6530
4 0.5218 0.5216 - 3.1268 - - - - 3.1408 - - - - - - −2.1353 - −0.5858
5 0.5416 0.5413 - 2.8474 - - - - 3.1095 1.3402 - - - - - −2.7629 - −0.6115
6 0.5655 0.5652 - 3.1201 - - - - 3.3032 1.7655 - - - - - −2.5132 −1.0466 −0.6171
7 0.5883 0.5880 1.4790 3.6687 - - - - - 1.6051 - - 0.4734 - - −2.5592 −1.5007 −0.6410
8 0.6042 0.6039 1.1864 3.4481 - - 0.9462 - - 1.9831 - - 0.3892 - - −2.9859 −1.4494 −0.6371
9 0.6066 0.6063 1.0272 3.1973 - - 0.8442 - 0.9134 1.9644 - - 0.3236 - - −2.9811 −1.4161 −0.6241
10 0.6080 0.6076 1.0243 3.1658 - - 0.7168 - 0.8498 1.9767 - 0.4500 0.3133 - - −2.9757 −1.4145 −0.6260
11 0.6084 0.6080 1.0624 3.2439 - - 0.7307 −0.2021 0.8251 1.9845 - 0.5244 0.3162 - - −2.9937 −1.3871 −0.6177
12 0.6093 0.6088 1.0874 3.2450 - - 0.6794 −0.4727 0.9240 2.0045 - 0.5220 0.3139 0.3325 - −2.9897 −1.4063 −0.6167
13 0.6095 0.6091 1.0799 3.3072 - 0.1796 0.5987 −0.4655 0.8039 1.9986 - 0.4698 0.3248 0.3088 - −2.9727 −1.3880 −0.6172
14 0.6097 0.6092 1.0814 3.2919 0.2105 0.1926 0.5910 −0.4675 0.7920 1.9330 - 0.4339 0.3196 0.3009 - −2.9493 −1.3830 −0.6155
15 0.6099 0.6094 1.1092 3.3693 0.2896 0.1878 0.6276 −0.4352 0.9035 1.9384 −0.3446 0.4369 0.3132 0.2870 - −2.9663 −1.4041 −0.6181
16 0.6099 0.6094 1.1021 3.3763 0.2832 0.1729 0.6297 −0.4505 0.8976 1.9441 −0.3368 0.4352 0.3138 0.2855 0.0291 −2.9710 −1.4014 −0.6182



Table 4-6. P-Values from Regression Models Using the Maximum R2 Option Using Nutrient or Component Values as a Percentage of Recommended Intake Levels per RACC [a]
Columns (left to right): Number of Variables; R-Square; Adjusted R-Square; Protein; Fiber; Vitamin E; Vitamin D; Calcium; Iron; Potassium; Unsaturated Fat; Magnesium; Vitamin A; Vitamin C; Folic Acid; Vitamin B12; Saturated Fat; Sodium; Added Sugar
[a] Nutrient intakes and HEI scores are from 1-day dietary intakes for 16,587 participants in NHANES 2005-2008.
- Means no data.
0 0.0417 0.0413 - - - - - - - - - - - - - - - -
1 0.2074 0.2071 - 0.0000 - - - - - - - - - - - - - -
2 0.4036 0.4033 - 0.0000 - - - - - - - - - - - 0.0000 - -
3 0.4733 0.4730 - 0.0000 - - - - - - - - - - - 0.0000 - 0.0000
4 0.5218 0.5216 - 0.0000 - - - - 0.0000 - - - - - - 0.0000 - 0.0000
5 0.5416 0.5413 - 0.0000 - - - - 0.0000 0.0000 - - - - - 0.0000 - 0.0000
6 0.5655 0.5652 - 0.0000 - - - - 0.0000 0.0000 - - - - - 0.0000 0.0000 0.0000
7 0.5883 0.5880 0.0000 0.0000 - - - - - 0.0000 - - 0.0000 - - 0.0000 0.0000 0.0000
8 0.6042 0.6039 0.0000 0.0000 - - 0.0000 - - 0.0000 - - 0.0000 - - 0.0000 0.0000 0.0000
9 0.6066 0.6063 0.0000 0.0000 - - 0.0000 - 0.0000 0.0000 - - 0.0000 - - 0.0000 0.0000 0.0000
10 0.6080 0.6076 0.0000 0.0000 - - 0.0000 - 0.0001 0.0000 - 0.0209 0.0000 - - 0.0000 0.0000 0.0000
11 0.6084 0.6080 0.0000 0.0000 - - 0.0000 0.0375 0.0001 0.0000 - 0.0048 0.0000 - - 0.0000 0.0000 0.0000
12 0.6093 0.6088 0.0000 0.0000 - - 0.0000 0.0013 0.0000 0.0000 - 0.0043 0.0000 0.0017 - 0.0000 0.0000 0.0000
13 0.6095 0.6091 0.0000 0.0000 - 0.0422 0.0000 0.0010 0.0003 0.0000 - 0.0111 0.0000 0.0021 - 0.0000 0.0000 0.0000
14 0.6097 0.6092 0.0000 0.0000 0.4645 0.0211 0.0000 0.0012 0.0003 0.0000 - 0.0115 0.0000 0.0040 - 0.0000 0.0000 0.0000
15 0.6099 0.6094 0.0000 0.0000 0.2161 0.0177 0.0000 0.0002 0.0000 0.0000 0.4285 0.0095 0.0000 0.0023 - 0.0000 0.0000 0.0000
16 0.6099 0.6094 0.0000 0.0000 0.2222 0.0296 0.0000 0.0002 0.0000 0.0000 0.4399 0.0099 0.0000 0.0027 0.6419 0.0000 0.0000 0.0000

The best one-nutrient model contained fiber, explaining 36.6% of the variance in HEI scores on a per 100 kcal basis and 20.7% on a per RACC basis. The highest R2 two-nutrient model included fiber and saturated fat, and the highest R2 three-nutrient model included fiber, saturated fat, and added sugars. Adjusted R2 increased consistently up to about eight nutrient/component variables, at which point the models explained approximately 65% of the variation in HEI on a per 100 kcal basis and 60% on a per RACC basis, which is more than a 40% improvement from our original baseline algorithm presented in Table 4-1. These eight nutrients/components were protein, fiber, calcium, unsaturated fat, vitamin C, saturated fat, sodium, and added sugars, on both the per 100 kcal and per RACC bases. Supplementary analyses were conducted with the addition of whole grains and the replacement of added sugars with total sugars. The results of these supplementary analyses are reported in Appendix G.
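The flattening of adjusted R2 at about eight variables can be checked directly from the third column of Tables 4-3 and 4-5. The sketch below computes the gain in adjusted R2 from each model size to the next. The 0.01 cutoff used to call a gain "small" is an assumption chosen for illustration, not a threshold stated in the report, and the function names are hypothetical.

```python
# Adjusted R-square of the best model of each size (0 through 16 nutrients),
# transcribed from the third column of Table 4-3 (per 100 kcal) and Table 4-5 (per RACC).
ADJ_R2_PER_100_KCAL = [0.0413, 0.3655, 0.4343, 0.5209, 0.5524, 0.5816, 0.6078,
                       0.6328, 0.6473, 0.6516, 0.6533, 0.6552, 0.6561, 0.6566,
                       0.6570, 0.6572, 0.6574]
ADJ_R2_PER_RACC = [0.0413, 0.2071, 0.4033, 0.4730, 0.5216, 0.5413, 0.5652,
                   0.5880, 0.6039, 0.6063, 0.6076, 0.6080, 0.6088, 0.6091,
                   0.6092, 0.6094, 0.6094]


def gains(adj):
    """Gain in adjusted R-square when moving from size k-1 to size k (k = 1..16)."""
    return [round(b - a, 4) for a, b in zip(adj, adj[1:])]


def last_size_with_gain(adj, threshold=0.01):
    """Largest model size whose gain over the previous size is at least threshold."""
    return max(k for k, g in enumerate(gains(adj), start=1) if g >= threshold)


for label, adj in [("per 100 kcal", ADJ_R2_PER_100_KCAL), ("per RACC", ADJ_R2_PER_RACC)]:
    print(label, gains(adj), last_size_with_gain(adj))
```

Under this assumed cutoff, both unit bases stop showing gains of at least 0.01 after the eight-variable model, which is consistent with the eight nutrients/components named in the text.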

The results of these analyses (Tables 4-3 through 4-6) could be used to help identify nutrients to include in a nutrient density algorithm. The beta coefficients indicate the relative importance of nutrients in predicting the HEI and could potentially be considered as weighting factors. However, these data should be interpreted with several factors in mind.
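To make the idea of beta coefficients as weighting factors concrete, the sketch below applies the eight-variable coefficients from Table 4-3 (per 100 kcal) to a made-up food. It is an illustration only, not the final algorithm: the function name weighted_score, the example food values, and the choice to cap every input at 100% (mirroring the cap applied to intakes in the regressions) are assumptions. The regression intercept and covariates are left out, so the result is a relative score rather than a predicted HEI.

```python
# Beta coefficients of the best eight-variable model, Table 4-3 (per 100 kcal).
WEIGHTS_8_PER_100_KCAL = {
    "protein": 1.3963,
    "fiber": 3.1259,
    "calcium": 1.0006,
    "unsaturated_fat": 2.5145,
    "vitamin_c": 0.3651,
    "saturated_fat": -2.9478,
    "sodium": -1.3350,
    "added_sugar": -0.5164,
}


def weighted_score(pct_per_100_kcal, weights=WEIGHTS_8_PER_100_KCAL, cap=100.0):
    """Weighted sum of nutrient values (% of recommended intake per 100 kcal).

    Values above `cap` are truncated (assumption); nutrients missing from the
    input are treated as zero.
    """
    return sum(w * min(pct_per_100_kcal.get(name, 0.0), cap)
               for name, w in weights.items())


# Hypothetical food: values are % of recommended intake per 100 kcal.
example_food = {"protein": 10, "fiber": 20, "vitamin_c": 150,
                "saturated_fat": 5, "sodium": 10}
print(round(weighted_score(example_food), 3))
```

Positive weights raise the score and negative weights lower it, so a food's standing depends on the balance of nutrients it carries, which is the caveat the following paragraphs develop.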

First, the calculation method of the HEI score will influence the strength of the associations with nutrients. Second, foods are complex and contain a combination of positive and negative nutrients. For example, the beta coefficients in Table 4-3 for magnesium and vitamin B12 are negative, which may be due to their association with high-meat diets that are high in saturated fat and thus are associated with lower HEI scores. Saturated fat is weighted heavily in the negative direction in the HEI scoring, because it is included both as a nutrient and in the "calories from solid fat" component. Fortified breakfast cereals are also a primary source of vitamin B12, and the associated added sugar may also be driving the negative coefficient for vitamin B12.

Nevertheless, these nutrients do not seem to be contributing more information to the overall explanation of HEI, because the R2 of the models with magnesium and vitamin B12 (13- to 16-nutrient/component models) is not much higher than the eight- to nine-nutrient/component models.

The MAXR approach does allow further insight into the nutrients selected in the proposed RTI algorithms described in Sections 3 and 4. As shown in Table 4-3, vitamin E was retained only in the 16-nutrient model, and iron remained only in models with 14 to 16 nutrients. Interestingly, vitamin D remained in seven of the models with nine or more nutrients despite a high correlation with calcium, which was retained in all models with seven or more nutrients. Vitamin D was also retained in the four-nutrient model, whereas calcium was not. Unsaturated fat seems to be important; it was included in all models with six or more nutrients. Of the positive nutrients tested individually in the RTI algorithm (vitamin A, vitamin C, vitamin B12, folic acid, and magnesium), vitamin C was the only one included in the eight- and nine-nutrient models. Magnesium, vitamin A, and vitamin B12 were retained in models with more variables but did not add appreciably to the explanation of variance in HEI scores.

The results of the MAXR analyses show the best regression models for 1 through 16 (Tables 4-3 through 4-6) and 1 through 17 (Tables G-1 through G-4, G-7, and G-8) nutrients or components. Additional analyses were performed to examine the distributions of R2 of models and the properties of "next-best" models (i.e., those with the next-highest R2 after the best model). These analyses were conducted with the 1 through 17 nutrient or food component models that included whole grains. These results are presented in Appendix G. In brief, there was a wide distribution of R2 among the 24,310 possible eight-nutrient or food component models, with models on a per 100 kcal basis having a minimum R2 of 0.21 and an interquartile range of 0.44 to 0.53. We also examined the top 10 eight-nutrient or food component models to examine differences in R2, nutrients, and beta coefficients (Tables G-5 and G-6). The R2 values for the top models were high and close to the maximum R2 values. All of the top 10 models included fiber, unsaturated fat, saturated fat, sodium, and added sugar. All but one of the models included protein. Whole grains were not retained in the top two models but were present in 4 or 5 of the top 10 models for both unit bases. Vitamin D replaced calcium in the second highest models for both unit bases. Some of the top 10 models had potassium, one had vitamin A, and one had vitamin B12, but none included vitamin E, iron, magnesium, or folic acid. The fact that the R2 values for the top 10 models were extremely close suggests that there are a number of possible eight-nutrient or food component algorithms that would be similar in predicting dietary quality based on the HEI.
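The model counts quoted in this section follow from basic combinatorics, as the short check below shows. It uses only the Python standard library; the variable names are illustrative.

```python
from math import comb

N_CANDIDATES = 17  # nutrients or components in the set that includes whole grains

# Number of distinct eight-variable models that can be drawn from 17 candidates.
eight_variable_models = comb(N_CANDIDATES, 8)

# Number of models of every size, from 0 (covariates only) through 17.
models_by_size = {k: comb(N_CANDIDATES, k) for k in range(N_CANDIDATES + 1)}
total_models = sum(models_by_size.values())

print(eight_variable_models)  # 24310
print(total_models)           # 131072, i.e. 2 ** 17
```

The total of 131,072 counts the empty subset as well, which corresponds to the covariates-only model shown in the zero-variable row of the tables.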

After discussions with the ASPE/FDA team, it was decided to use information from the MAXR analyses to develop final models and conduct further analyses. RTI recommended the eight-nutrient/component model, because the higher-term models did not significantly improve R2. The ASPE/FDA team requested models scored using both unit bases, per 100 kcal and per RACC.