Policy Research for Front of Package Nutrition Labeling: Developing and Testing a Summary System Algorithm

2.5 Validation of Summary Systems to Rank Foods in Nutritional Quality

05/01/2011

The optimal method of assessing the validity of a summary system that profiles the nutritional quality of foods would be to test for criterion validity, that is, to compare the new method with a gold standard. However, no gold standard for profiling the nutritional quality of foods exists; thus, researchers must opt for other methods of validation. Construct validity assesses how well a measure relates to its theoretical concept; in this case, how well the summary system relates to other measures of the nutritional quality of foods.

Approaches to assessing the construct validity of summary systems have focused either on nutritional ratings of the foods themselves or on overall dietary quality. In the first approach, foods are rated using the summary system, and the results are compared with other measures of the nutritional quality of foods, such as expert ratings or authoritative recommendations. In the second approach, foods are rated using the summary system, and aggregate results for diets are compared with other measures of overall diet quality, such as the Healthy Eating Index (HEI).

Convergent and discriminant validity are two types of construct validity that have been used in testing FOP systems:

  • Convergent validity tests that constructs that are expected to be related are, in fact, related.
  • Discriminant validity tests that constructs that should not be related are not related.

The application of convergent or discriminant validity tests to a summary system will differ, depending on whether the summary system is a scoring or a threshold system and whether the validation is focused on foods themselves or on overall diet quality.

Comparing food rankings by the nutrient profiling system with the food rankings by expert opinion is relatively easy and inexpensive to conduct, but the relative importance of this method to validate the accuracy of a nutrient profiling system was deemed to be low-medium (Townsend, 2010). The subjective nature of expert opinion is a disadvantage of using this method to validate systems. Rankings by professionals could be biased by the nutrient information provided to them, as well as by the food descriptions (Scarborough, Rayner, Stockley, & Black, 2007).

In one study assessing the characteristics of expert food ratings, 850 nutrition professionals from the British Dietetic Association and the (British) Nutrition Society were asked to rank 120 foods by nutritional quality (Scarborough, Rayner et al., 2007). The experts were given the values of 10 nutrients for each food. The average rankings and standard deviations for each food were calculated and grouped into food categories based on the UK food guide "The Balance of Good Health" (BGH). The average expert rankings accorded with the guidance in the food guide; that is, the highest average ranks were attained by foods in the fruit and vegetable group, and the lowest average ranks were attained by foods in the "foods high in fat, foods high in sugar" group. The composite foods showed the highest variance in ranks by nutrition professionals, reflecting difficulty in categorizing these foods. Most of the variation in scores was explained by the values for fat, total sugars, sodium, and nonstarch polysaccharides that the nutritionists were given. The nutritionists were also influenced by food descriptions; for example, "wholemeal fruit crumble" was ranked slightly higher than "apple, stewed with sugar" even though the crumble contained more sugar, fat, and saturated fat per 100 g than the stewed fruit.

The expert food scores from this study (Scarborough, Rayner et al., 2007) were also compared with the categorization of foods as "healthy" and "less healthy" by the Ofcom/WXYfm model, a threshold system (Scarborough, Boxer et al., 2007). Researchers found a strong relationship between quintiles of the expert food scores and the model's categorization of foods (χ² = 64.8). They also compared the rankings of foods by the experts with the rankings by the model and found that 11 of the 120 foods differed by 40 or more positions in rank.
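A statistic of this kind is presumably Pearson's chi-square for the association between score quintiles and the model's two categories. As a minimal sketch of how such a statistic is computed, the following Python function applies the standard formula to a contingency table. The counts are invented for illustration and are not the study's data, and the function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table.

    `table` is a list of rows; each row lists the observed counts per column.
    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts (not from the study): rows are expert-score quintiles,
# lowest to highest; columns are (less healthy, healthy) per a threshold model.
example_table = [[22, 2], [19, 5], [12, 12], [5, 19], [2, 22]]
statistic, dof = chi_square_statistic(example_table)
print(f"chi-square = {statistic:.1f}, degrees of freedom = {dof}")
```

A large statistic relative to its degrees of freedom indicates that the two classifications are associated, but it does not by itself show how closely individual food rankings agree, which is why the rank comparison is reported separately.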

Expert rankings of foods were used during the development of the Overall Nutrition Quality Index (ONQI), the algorithm underlying the NuVal system, a scoring system. Members of the scientific expert panel were asked to rank approximately 1,000 foods, and correlation analyses were used to compare the expert ranks with the rankings produced by the ONQI algorithm (Katz et al., 2009). Any apparent anomalies were examined, and the ONQI algorithm was adjusted as needed. The final version of the ONQI algorithm was highly correlated with the pooled expert panel ranking of 21 diverse foods (Spearman rank correlation coefficient 0.92, p < 0.001).
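The Spearman coefficient cited here is the Pearson correlation computed on ranks rather than raw values. The sketch below, in plain Python, shows one common way to compute it, assigning tied values the average of their ranks. The food scores are hypothetical placeholders, not ONQI or expert panel data, and the helper names are illustrative only.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for six foods (illustrative only).
algorithm_scores = [91, 23, 64, 39, 82, 11]
expert_scores = [8.5, 3.0, 7.0, 2.5, 9.0, 1.0]
rho = spearman(algorithm_scores, expert_scores)
print(f"Spearman rho = {rho:.2f}")
```

Because only the ordering matters, the coefficient is unaffected by the different scales on which an algorithm and an expert panel might score foods.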

Arambepola et al. (2007) tested for convergent validity of the Ofcom/WXYfm model, a threshold system. The researchers used the model to categorize foods consumed by adults in the British National Diet and Nutrition Survey as healthy or less healthy and compared the categorized foods to food group recommendations in the BGH, the authoritative UK food guide. The model classified as healthy 97% of fruit and vegetables and 72% of bread, other cereals, and potatoes as classified by the BGH. In addition, 95% of fatty and sugary foods as classified by the BGH were classified as less healthy by the model. The κ-value for the level of agreement between the model and BGH in categorizing these foods was 0.69, considered good agreement.
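The κ statistic (Cohen's kappa) corrects the raw proportion of agreement for the agreement expected by chance. A minimal Python sketch follows, assuming each food carries one label from the model and one from the reference classification. The labels below are invented for illustration and do not reproduce the study's data; `cohen_kappa` is a hypothetical helper name.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two classifications of the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both label lists must be non-empty and equal in length.")
    n = len(labels_a)

    # Observed proportion of items on which the two classifications agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each classification's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    all_labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in all_labels) / (n * n)

    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (observed - expected) / (1 - expected)


# Hypothetical labels for eight foods (illustrative only).
model_labels = ["healthy", "healthy", "less", "less", "healthy", "less", "less", "healthy"]
guide_labels = ["healthy", "healthy", "less", "healthy", "healthy", "less", "less", "less"]
print(f"kappa = {cohen_kappa(model_labels, guide_labels):.2f}")
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is the scale on which the reported 0.69 is judged.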

In the UK, indicator foods derived from a healthy eating index were used in another validation of the Ofcom/WXYfm model. First, indicator foods were identified using national dietary surveys by categorizing the population into quintiles of the healthy eating index and identifying foods that were eaten in statistically different amounts by individuals in the first and fifth quintiles of healthy eating index scores (Volatier et al., 2007). Then, the validation tested the ability of the model, a threshold system, to correctly classify the indicator foods (Quinio et al., 2007). The model classified as healthy 73.7% of the indicator foods that the reference index classified as healthy.

Some FOP or nutrient profiling systems have been tested for convergent validity by examining the relation between the healthiness of diets as measured by algorithm scores of the foods consumed and the healthiness of diets as measured by an overall diet quality score (Fulgoni et al., 2009; Katz et al., 2010).

Fulgoni et al. (2009) assessed convergent validity of a nutrient density index, a scoring system, by calculating the mean nutrient density scores of foods consumed by participants in the National Health and Nutrition Examination Survey (NHANES) 1999-2002. The calculated scores were regressed against HEI scores, an indicator of diet quality, in the NHANES sample. For a nutrient density index that included nine nutrients to encourage and three nutrients to limit, the regression model explained 45.3% of the variation in HEI scores in the NHANES sample (Fulgoni et al., 2009). This validation also examined scores of foods themselves within food categories. Whole grain products scored higher than non-whole-grain products, fruits with less added sugar scored higher than those with more added sugar, and 100% fruit products scored higher than soft drink choices. This agreement between food scores and dietary recommendations indicated that an across-the-board index can be useful for ranking foods within food categories.

Katz et al. (2010) also validated a scoring system, the ONQI algorithm, by calculating mean scores of foods consumed in a national survey, using dietary intake data from NHANES 2003-2006. The calculated ONQI scores were regressed against HEI scores in models adjusted for age, race, and gender. The beta coefficients for ONQI scores to predict the HEI scores were significantly different from zero (p < 0.001), and the model explained 29% of the variation in HEI scores in the NHANES sample (Katz et al., 2010).
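The share of variation explained in regressions such as these is the coefficient of determination, R². The following Python sketch fits a one-predictor least-squares line and reports R². It is a simplification: it omits the covariate adjustment (age, race, gender) used by Katz et al. and any survey weighting, and the paired scores are hypothetical values rather than NHANES data.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared). Assumes x is not constant
    and y is not constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical per-person values (illustrative only):
# mean algorithm score of foods consumed, and that person's diet quality score.
mean_food_scores = [18.0, 22.5, 25.0, 27.5, 31.0, 36.0]
diet_quality_scores = [41.0, 55.0, 49.0, 62.0, 58.0, 71.0]
slope, intercept, r2 = fit_simple_regression(mean_food_scores, diet_quality_scores)
print(f"slope = {slope:.2f}, R^2 = {r2:.2f}")
```

An R² of 0.29, for example, would mean that 29% of the variance in the diet quality score is accounted for by the fitted model.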

Relatively few researchers have used discriminant validity with respect to FOP systems or nutrient profiling. The Ofcom/WXYfm model was used to define healthy and less healthy foods consumed by adults in the National Diet and Nutrition Survey (Arambepola et al., 2007). An overall score of dietary quality was calculated using the Diet Quality Index (DQI). The validation compared the energy intake from "less healthy" foods, as defined by the model, between two groups of adults in the survey: those with the least healthy diets and those with the most healthy diets according to the DQI. As predicted, the group with the least healthy diets according to the DQI had a higher energy intake from less healthy foods (by about a factor of two) than the group with the healthiest diets.
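The group comparison described above can be sketched in a few lines of Python. The data layout is an assumption made for illustration: each person is a list of (energy in kcal, flagged less healthy) pairs, and the groups are taken as already defined by a diet quality score. The numbers are invented and the function names are hypothetical.

```python
from statistics import mean


def less_healthy_energy(food_records):
    """Total energy (kcal) one person obtained from foods flagged less healthy."""
    return sum(kcal for kcal, is_less_healthy in food_records if is_less_healthy)


def mean_energy_ratio(least_healthy_diets, most_healthy_diets):
    """Ratio of mean less-healthy-food energy: least healthy vs. most healthy group."""
    low_quality_mean = mean([less_healthy_energy(p) for p in least_healthy_diets])
    high_quality_mean = mean([less_healthy_energy(p) for p in most_healthy_diets])
    return low_quality_mean / high_quality_mean


# Hypothetical daily records (illustrative only): (kcal, flagged less healthy).
least_healthy_group = [
    [(500, True), (300, False), (400, True)],
    [(700, True), (600, False)],
]
most_healthy_group = [
    [(200, True), (900, False), (150, True)],
    [(450, True), (1100, False)],
]
ratio = mean_energy_ratio(least_healthy_group, most_healthy_group)
print(f"ratio of mean energy from less healthy foods = {ratio:.1f}")
```

A ratio well above 1 is the pattern the validation predicts, since people whose overall diets score poorly should draw more of their energy from foods the model flags as less healthy.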

For the ONQI algorithm, mean scores of foods in a 7-day meal plan from the Dietary Approaches to Stop Hypertension (DASH) diet, representing a "healthy diet," were compared with the mean scores of foods in the typical American diet using data from NHANES 2003-2006 (Katz et al., 2009; Katz et al., 2010). The aggregate ONQI score for the 7-day DASH meal plan at the 2,300 mg sodium level was 46 (95% CI, 40-53) and for the typical American diet in the NHANES cohort was 26.5 (95% CI, 26.2-26.7). As expected, the aggregate score for the healthy diet (DASH) was significantly higher than the score for the typical diet (NHANES) (p < 0.05).

The ONQI algorithm was also validated against chronic disease outcomes, including cardiovascular disease (CVD), cancer, and diabetes, by scoring diets from over 100,000 participants in Harvard's Nurses' Health Study and the Health Professionals Follow-Up Study (Chiuve, Sampson, & Willett, 2011). This validation approach could be considered a type of criterion validity as defined by Townsend (2010). Dietary data were collected from food frequency questionnaires that were administered to subjects at baseline. Each food was scored by the ONQI algorithm, and the average ONQI score for the diet consumed by each participant was computed. The ONQI score was inversely associated with risk of total chronic disease, CVD, diabetes, and all-cause mortality, but not cancer, in both cohorts. The multivariate relative risk of chronic disease, comparing the highest with the lowest quintile of ONQI scores, was 0.91 (95% CI: 0.87-0.95) in women and 0.88 (95% CI: 0.83-0.93) in men. A limitation of the study is that dietary information was collected at only one point in time.