Understanding the High Prevalence of Low-Prevalence Chronic Disease Combinations: Databases and Methods for Research. Data Aggregation and Grouping Systems


Grouping systems, such as AHRQ’s Clinical Classification System and CMS’s Hierarchical Condition Categories, organize and aggregate diagnosis codes into disease categories. These systems serve a variety of purposes (e.g., research, risk-adjustment) and vary significantly in which clinical conditions are considered, the number of diagnosis codes included in each disease group, and the number of groups (see ICD-9 Comparison Excel File). Regardless of their original intent or grouping methodology, however, many different types of grouping systems have been used to conduct MCC research, raising concerns about interpreting research results and comparing findings across MCC studies.

The decision to use a specific grouping system for MCC research should be informed by four key considerations: 1) the function, purpose and original intent of the grouper, 2) the behavior change that is desired by using the grouper to produce actionable information, 3) the end-users and their data needs (e.g., data granularity), and 4) the research question. Researchers should not assume that a grouping system designed by and for one stakeholder group for one purpose is appropriate for another purpose. In fact, none of the currently available groupers is meant to serve multiple purposes (e.g., clinical decision support and risk-adjustment). Grouping systems are carefully designed and statistically calibrated to serve a specific aim. Using a grouping system for a different aim than intended can lead to meaningless results and misguided interpretation. MCC research that aggregates diagnosis codes should use grouping systems that are well documented, produce useful information for end-users (e.g., fine granularity for clinical decision support), and provide information that is meaningful, actionable and promotes provider behavior change (e.g., to reduce cost or improve care for specific groups). Grouping systems should align with the research questions at hand; research questions should ultimately drive MCC research designs (Wallace & Salive, 2013).

In choosing which grouping system to use for MCC research, stakeholder agendas matter. Each stakeholder group needs different types of information at varying levels of granularity. For example, those interested in clinical decision support need a finer level of diagnostic information than risk-adjusters. Similarly, healthcare economists may need more detailed data than public health interventionists. Thus, it is important to consider the degree of coding granularity needed by each stakeholder. Understanding which stakeholder aims can be supported at specific levels of diagnostic granularity may be a beneficial area for investment for MCC researchers.

To determine which clinical classification systems exist and have been used for MCC or disease complexity research, a comprehensive review of grouping systems was conducted. Grouping systems were identified through the literature review as well as input from the Co-Project Officers, TAG and key informants. Full descriptions of each classification system and the methodological issues to consider when using the grouper can be found in Appendix C. A condensed version of the results is shown in Exhibit 10 below.

Exhibit 10: Summary of Diagnostic Grouping Systems

| Grouping System | Sponsor | Level of Diagnosis Aggregation | Number of ICD-9 Codes Included |
| --- | --- | --- | --- |
| Adjusted Clinical Groups Case-mix System (ACG) | Johns Hopkins University | 102 discrete categories | Proprietary |
| Aggregated Diagnosis Groups (ADG) | Johns Hopkins University | 32 discrete categories | Proprietary |
| All Patient Refined Diagnosis Related Groups (APR-DRG) | 3M Health Information Systems | 314 base categories and 1,256 subclasses | Proprietary |
| Chronic Conditions Data Warehouse Algorithm | Centers for Medicare & Medicaid Services | 27 chronic condition categories | 581 |
| Chronic Illness Disability Payment System (CDPS) | University of California, San Diego/Medicaid Programs | 96 categories of diagnoses that correspond to body systems and specific diagnoses | 11,603 |
| Clinical Classification System (CCS) | Agency for Healthcare Research & Quality | 285 mutually exclusive categories | 14,567 |
| Clinical Risk Groups (CRG) | 3M Health Information Systems | 272 clinically-based categories and 1,080 subclasses | Proprietary |
| Diagnosis Related Group (DRG) | Centers for Medicare & Medicaid Services | 538 categories | Not Specified |
| Dyani Diagnosis Grouper | Axiomedics Research, Inc. | 200-300 categories depending on the criteria being examined | Proprietary |
| Hierarchical Condition Categories (HCC) | Centers for Medicare & Medicaid Services | 70 CMS-HCC categories | 2,916 |
| International Shortlist for Hospital Morbidity Tabulation (ISHMT) | World Health Organization | 130 categories | Not Specified |
| Major Diagnostic Categories | Health Level Seven International | 25 categories | Not Specified |
| Medicare Severity Diagnosis Related Grouper (MS-DRG) | 3M Health Information Systems | 745 categories | Proprietary |
| Thomson Medstat Medical Episode Grouper | Thomson Medstat Inc. | 550 disease conditions | Proprietary |

Legend: Sponsor: agency, organization or company that maintains the grouping system; Level of Diagnosis Aggregation: the number of chronic condition categories included in the grouping system; Number of ICD-9 Codes Included: grouping systems that are proprietary do not make ICD-9 codes available for public review.

We reviewed fourteen grouping systems, which were found to serve a variety of purposes ranging from risk adjustment to comparing morbidity across hospitals internationally. The grouping methodologies of the systems are remarkably different and vary in level of complexity. For example, diagnosis aggregation ranged from 25 categories for the Major Diagnostic Categories to 272 clinically-based groups with 1,080 subclasses for 3M’s Clinical Risk Groups. The difference has a dramatic consequence for the number of disease combinations that can be explored by researchers, because the number of combinations (without replacement) scales according to the following formula: C(n, k) = n! / (k! (n - k)!) (Ammann 2011). In this formula “C” is the number of disease combinations, “n” is the number of disease groups in the grouping system, “k” is the number of disease groups included in each combination, and “!” stands for factorial. Applying the formula to the Chronic Illness Disability Payment System (CDPS) for two-way disease combinations, with n = 20, would result in the following calculation: C(20, 2) = 20! / (2! × 18!) = 190; that is, 190 disease combinations could be studied. Using the same formula, but with three-way and four-way combinations, the CDPS model would provide 1,140 and 4,845 disease combinations respectively.
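The worked example above can be checked with a few lines of code. The following is a minimal sketch in Python; the function name `combinations_count` and the variable `n_groups` are illustrative, and n = 20 is simply the value used in the calculation above.

```python
from math import comb, factorial


def combinations_count(n: int, k: int) -> int:
    """Number of k-way combinations of n disease groups: n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


n_groups = 20  # value of n used in the worked example in the text
counts = {k: combinations_count(n_groups, k) for k in (2, 3, 4)}

# Cross-check the explicit formula against the standard-library implementation.
assert all(counts[k] == comb(n_groups, k) for k in counts)

for k, c in counts.items():
    print(f"{k}-way combinations of {n_groups} groups: {c:,}")
```

Writing the formula out with factorials mirrors the text; in practice `math.comb` (Python 3.8 and later) gives the same result directly.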

As shown in Exhibit 11 (logarithmic scale), the number of disease combinations available for analysis increases rapidly as the number of chronic condition categories and the number of diseases included in each combination increase. Thus, grouping systems with more chronic condition categories (greater “n”) will generate more chronic disease combinations (“C”) for analysis, especially when the number of diseases allowed in each combination (“k”) is not truncated at an arbitrary level (e.g., calculating dyads and triads and then truncating at four or more diseases).
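To illustrate how quickly the counts grow, the sketch below applies the same combination formula to the category counts that Exhibit 10 lists for four of the grouping systems. It is a rough illustration only: it assumes every category can be combined freely with every other, which may not hold for hierarchical groupers, and the names `CATEGORY_COUNTS` and `combination_table` are hypothetical.

```python
from math import comb, log10

# Category counts as listed in Exhibit 10 for a selection of grouping systems.
CATEGORY_COUNTS = {
    "Major Diagnostic Categories": 25,
    "Chronic Conditions Data Warehouse Algorithm": 27,
    "Hierarchical Condition Categories (HCC)": 70,
    "Clinical Classification System (CCS)": 285,
}


def combination_table(category_counts, ks=(2, 3, 4)):
    """Map each grouping system to its number of k-way combinations for each k."""
    return {
        name: {k: comb(n, k) for k in ks}
        for name, n in category_counts.items()
    }


table = combination_table(CATEGORY_COUNTS)
for name, row in table.items():
    cells = ", ".join(
        f"k={k}: {c:,} (log10 = {log10(c):.1f})" for k, c in row.items()
    )
    print(f"{name}: {cells}")
```

Reporting the base-10 logarithm alongside each count corresponds to reading the values off a logarithmic axis.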

The number of diagnosis codes included in each grouping system could not be evaluated across all systems because the information is proprietary for privately owned grouping systems. The lack of transparency represents a methodological limitation and a source of bias for researchers, as they cannot know which diagnoses were included in analyses and therefore cannot assess the level of complexity captured by the grouping system. Despite their differences, the majority of groupers have been used in some form of multimorbidity research to date. For example, Sorace and colleagues used the HCC model to study complexity in Medicare patients, Salisbury and colleagues used the Johns Hopkins ACG system to study general practice patients, and Steinman and colleagues used the CCS to study VA patients (see Exhibit 5 in Section 6). When interpreting published MCC literature as well as designing future MCC research, the methodological differences between grouping systems should be reviewed and considered. For example, grouping systems that provide the finest level of diagnostic information and the greatest number of chronic condition categories, such as AHRQ’s CCS, would be most appropriate for research on less prevalent chronic disease combinations.

Exhibit 11: Possible Number of Chronic Disease Combinations by Diagnosis Grouping System

It is also important to note that many MCC researchers have designed and employed their own groupers or modified an existing grouper, which affects the methodological quality of results. Decisions to include, exclude or aggregate diagnoses often are not reported in authors’ methodology sections. Authors may state that the decisions were guided by physician consensus or technical expert panels, but do not list the specific diagnosis codes that were included or excluded. The impact of grouping algorithms on other analysis steps, and how they may affect the interpretation of results, is also missing from studies. For example, authors do not discuss how costs are allocated to disease categories after eliminating certain diagnosis codes from analyses, nor the percentage and types of patients that are excluded from a study.
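One way to make such decisions visible is to publish the grouper itself alongside the results. The sketch below is a hypothetical illustration in Python, not a validated grouper: the category names, the choice of ICD-9-CM codes, the exclusion reason, and the sample patients are all assumptions made for the example. It records which codes are included and excluded, and reports the share of patients left without any category.

```python
from collections import Counter

# Hypothetical custom grouper: category name -> ICD-9-CM codes included.
GROUPER = {
    "diabetes": {"250.00", "250.02"},
    "hypertension": {"401.1", "401.9"},
    "heart_failure": {"428.0"},
}
# Codes deliberately excluded, with the reason stated so readers can audit it.
EXCLUDED = {"780.79": "nonspecific symptom code"}

CODE_TO_CATEGORY = {
    code: category for category, codes in GROUPER.items() for code in codes
}


def summarize(patients):
    """Return category counts, share of ungrouped patients, and undocumented codes."""
    counts = Counter()
    ungrouped = 0
    undocumented = set()  # codes neither included nor explicitly excluded
    for codes in patients.values():
        categories = set()
        for code in codes:
            if code in CODE_TO_CATEGORY:
                categories.add(CODE_TO_CATEGORY[code])
            elif code not in EXCLUDED:
                undocumented.add(code)
        counts.update(categories)
        if not categories:
            ungrouped += 1
    share = ungrouped / len(patients) if patients else 0.0
    return dict(counts), share, undocumented


# Illustrative patient records (patient id -> diagnosis codes).
patients = {
    "p1": ["250.00", "401.9"],
    "p2": ["428.0", "780.79"],
    "p3": ["780.79"],
    "p4": ["V70.0"],
}
counts, share_ungrouped, undocumented = summarize(patients)
print(counts, f"{share_ungrouped:.0%} of patients ungrouped", undocumented)
```

Publishing the equivalents of `GROUPER`, `EXCLUDED`, and the ungrouped share would let readers see how a category was built and how many patients fell outside it.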

Consequently, researchers are creating unique diagnostic categories that may be fundamentally different from one another, making it difficult to interpret how one researcher’s disease category for “cancer” compares to another’s. If researchers used publicly available, well documented grouping systems such as AHRQ’s CCS (i.e., standardization), the challenges of interpreting results across studies would be minimized. However, it is not practical, and may not make clinical sense, to use only publicly available grouping systems. For example, some diagnosis codes may warrant exclusion from analyses because they are ambiguous (physician consensus does not yet exist on the diagnostic criteria for a particular condition). In addition, grouping systems will become obsolete over time as new coding systems are adopted (e.g., ICD-10) and new, more robust groupers are developed. Regardless of the future of grouping systems in MCC research, providing researchers and readers with the ability to understand how disease categories are constructed across studies will help make methodologies more transparent and results more interpretable.
