Understanding the High Prevalence of Low-Prevalence Chronic Disease Combinations: Databases and Methods for Research. Defining Diagnosis of Chronic Condition


There are two main sources of information about patients’ chronic conditions: 1) surveys that collect self-reported disease status, and 2) claims and clinical systems that contain diagnosis codes (e.g., International Classification of Diseases, Ninth Revision [ICD-9]; ICD-10; Systematized Nomenclature of Medicine Clinical Terms [SNOMED CT]). Other sources of information, such as pharmaceutical prescription or laboratory data, can also be used to identify patients’ chronic conditions, but these additional modalities are not discussed in depth in this paper.

MCC research has been conducted using both of the primary sources of diagnostic information noted above. For example, Schoenberg and colleagues analyzed Health and Retirement Study (HRS) data to understand the relationship between chronic disease constellations and out-of-pocket medical expenditures. In that study, patients were classified using the eight self-reported chronic conditions collected by the HRS (Schoenberg et al., 2008). Similarly, Bae and Rosenthal used 177 ICD-9 codes derived from self-reported chronic conditions in the Medical Expenditure Panel Survey to study MCC and quality of care (Bae & Rosenthal, 2008). In contrast, Sorace et al. used approximately 3,000 ICD-9 codes derived from the HCC model to study the complexity of disease combinations in the Medicare population (Sorace et al., 2011).

Self-reported and claims-based information each have strengths and weaknesses for identifying chronic conditions (see Exhibit 9). Claims-based diagnosis codes allow researchers to study a large number of chronic conditions at a very fine level of granularity and to understand the full range of patients’ diagnoses, including which specific diagnoses are present (e.g., primary malignant neoplasm of the lung or carcinoma in situ of the lung vs. simply lung cancer). This sensitivity to specific diagnoses is critically important in enabling the study of less prevalent or rare chronic disease combinations. Claims are usually provider-generated and based on a differential diagnosis and supporting clinical documentation, which avoids the potential error associated with patient self-reported information and other survey-related biases, such as recall and selection concerns. However, there are systematic limitations associated with ICD-9 codes, such as misspecification, unbundling, and upcoding by providers and coders (O’Malley et al., 2005). There is also a tendency for providers and billers to under-report diagnoses that lack a payment incentive, such as mental health conditions. These issues can lead to inaccurate estimates of chronic disease prevalence and misleading results. Diagnosis coding using ICD-9 and ICD-10 codes has also been shown to misestimate the prevalence of certain conditions.

Exhibit 9: Strengths and Weaknesses of Self-Reported versus Claims-Based Chronic Conditions

| Coding Type | Strengths | Limitations |
| --- | --- | --- |
| Self-Report | Easy to collect, used to identify prevalent conditions, patient-derived. | Subject to recall, sampling and selection bias. Few diagnoses studied and at a coarse level of granularity. Limited number of patients surveyed/studied. |
| ICD-9 | A large number of diagnoses are considered at a fine level of granularity. Commonly used in the United States. Used in large administrative databases; large sample size. | There are a number of well documented limitations, such as over and underestimation of certain diseases, as well as inaccuracies due to malicious coding behavior. |
| ICD-10 | Associated with improved coding accuracy. Greater number of diagnoses considered and at a more granular level. Used in large administrative databases; large sample size. | Not in widespread use in the United States and won’t be for a number of years. Limited research available on coding inaccuracies and other shortcomings. |
| SNOMED CT | Greatest number of diagnosis codes considered at the finest level of granularity. | Limited research available on coding inaccuracies and other shortcomings. Potentially too granular for use in certain healthcare settings. |

Underestimation is a concern when a significant proportion of the population may not have a claim during the study period; overestimation may occur for conditions that lead to higher payment rates if they are reported as being present. Woo et al. found that obesity identified by discharge ICD-9 codes underestimated the true prevalence of obesity in an inpatient pediatrics population (Woo et al., 2009), while Kern et al. found that ICD-9-CM codes failed to identify the majority of veteran patients with comorbid chronic kidney disease (Kern et al., 2005). ICD-10 codes have also been shown to overestimate the prevalence of certain diagnoses, such as post-traumatic stress disorder (Rosner & Powell, 2009). However, recent evidence suggests that the introduction and use of ICD-10 coding may be associated with improved accuracy of co-morbidity coding for the majority of clinical conditions (Januel et al., 2011). It is unclear whether the improvement is due to the ICD-10 coding system itself or changes in coder and physician behavior.

Self-reported diagnoses from surveys, or those that are mapped from surveys to ICD-9 or ICD-10 codes, provide a much smaller number of chronic conditions for analysis, at a very coarse level of detail. Surveys typically do not capture the full breadth of a patient’s chronic conditions or the specific types of chronic conditions (e.g., a specific type of cancer). For example, the HRS only allows researchers to investigate eight chronic conditions (hypertension, diabetes, cancer, chronic lung disease, heart conditions, arthritis, stroke and psychiatric/emotional problems), and it does not allow them to drill down to the specific types of conditions a patient has (e.g., what type of cancer?). Thus, the use of surveys limits the ability to understand the true complexity of the chronic disease combinations a patient is experiencing, as well as the occurrence of less prevalent chronic conditions. In addition, self-reported diagnoses can be limited by survey-related biases, such as recall, ascertainment and selection bias. For example, individuals who avoid or do not have access to healthcare may not be evaluated for potential chronic conditions of interest. Although evidence suggests that self-reported chronic conditions may be reasonably valid (Martin et al., 2000), self-reported diagnoses are not provider-generated, may be subject to recall error by patients, and may not be captured in a sufficiently structured and systematic manner for analysis. Biases in self-reported diagnoses may be reduced through survey question structure; many surveys ask patients, “Has the doctor told you…?”. Overall, self-reported conditions can lead to non-uniform and inaccurate diagnosis categories and to errors when mapping self-reported information to ICD-9 or ICD-10 codes.
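The loss of detail described above can be illustrated with a small sketch. The patients, their code sets, and the roll-up table below are hypothetical and are not drawn from the HRS or any study cited here; the sketch only shows how distinct diagnosis combinations collapse when fine-grained codes are replaced by survey-style categories.

```python
from collections import Counter

# Hypothetical roll-up from ICD-9 codes to coarse, survey-style categories.
COARSE_CATEGORY = {
    "162.9": "cancer",         # malignant neoplasm of bronchus and lung, unspecified
    "231.2": "cancer",         # carcinoma in situ of bronchus and lung
    "250.00": "diabetes",      # type II diabetes without complication
    "401.9": "hypertension",   # essential hypertension, unspecified
}

# Hypothetical patients, each with a set of ICD-9 diagnosis codes.
patients = {
    "P1": {"162.9", "250.00"},
    "P2": {"231.2", "250.00"},
    "P3": {"162.9", "401.9"},
    "P4": {"231.2", "250.00"},
}


def combination_counts(patient_codes, rollup=None):
    """Count how many patients share each exact combination of conditions.

    If `rollup` is given, each code is first replaced by its coarse category.
    """
    counts = Counter()
    for codes in patient_codes.values():
        if rollup is not None:
            codes = {rollup[code] for code in codes}
        counts[frozenset(codes)] += 1
    return counts


fine = combination_counts(patients)
coarse = combination_counts(patients, COARSE_CATEGORY)

print(len(fine), "distinct combinations at the ICD-9 level")
print(len(coarse), "distinct combinations at the survey-category level")
```

In this toy data set, the two lung diagnoses become indistinguishable after roll-up, so combinations that differ only in the type of cancer are merged into one.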

In addition to the considerations described above, it is also important to note that validity of the presence of chronic conditions and reliability of reporting/detecting chronic conditions are two key issues that challenge MCC research. Researchers have attempted to improve validity by examining diagnoses across care settings and determining if patients have two or more claims reporting a specific diagnosis code over a given period of time to confirm disease occurrence. However, validity and reliability will remain a challenge given the vastness and complexity of many of the large databases and systems used to collect and analyze diagnostic information.
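A minimal sketch of the "two or more claims" confirmation rule follows. The claim layout (patient ID, diagnosis code, service date), the 365-day window, and the requirement that the claims fall on distinct service dates are assumptions made for illustration; studies differ in how they define each of these.

```python
from collections import defaultdict
from datetime import date, timedelta


def confirmed_patients(claims, code, window_days=365, min_claims=2):
    """Return patient IDs with at least `min_claims` claims for `code`
    on distinct service dates that fall within `window_days` of each other.

    `claims` is an iterable of (patient_id, diagnosis_code, service_date).
    """
    dates_by_patient = defaultdict(set)
    for patient_id, dx_code, service_date in claims:
        if dx_code == code:
            dates_by_patient[patient_id].add(service_date)

    window = timedelta(days=window_days)
    confirmed = set()
    for patient_id, dates in dates_by_patient.items():
        ordered = sorted(dates)
        # Compare each date with the date (min_claims - 1) positions later.
        for i in range(len(ordered) - min_claims + 1):
            if ordered[i + min_claims - 1] - ordered[i] <= window:
                confirmed.add(patient_id)
                break
    return confirmed


# Hypothetical claims for illustration only.
claims = [
    ("A", "250.00", date(2010, 1, 5)),
    ("A", "250.00", date(2010, 6, 20)),   # second claim within a year: confirmed
    ("B", "250.00", date(2010, 3, 1)),    # single claim: not confirmed
    ("C", "250.00", date(2009, 1, 1)),
    ("C", "250.00", date(2010, 12, 1)),   # second claim falls outside the window
    ("D", "585.9", date(2010, 2, 2)),
    ("D", "585.9", date(2010, 2, 2)),     # duplicate date counts once
]

print(confirmed_patients(claims, "250.00"))
```

Widening the window or lowering `min_claims` trades validity for coverage: more patients are counted as having the condition, but a single miscoded claim is more likely to be accepted.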

It is important to recognize that diagnosis coding in the United States is moving away from ICD-9 codes and towards larger, more detailed coding schemes, such as ICD-10 and SNOMED CT. In fact, on January 16, 2009, the Department of Health and Human Services published a final rule specifying an anticipated ICD-10 implementation date of October 1, 2013 (although this may be delayed). The World Health Organization (WHO) has already begun work on developing ICD-11. It is inevitable that diagnosis coding will continue to become more refined over time, providing researchers with the ability to study disease complexity at a level of detail not currently possible. Although “new” coding schemes will improve our ability to identify specific diagnoses of individuals with MCC, they will also have limitations.

The transition from ICD-9 to ICD-10, as well as to other future coding schemes, will present challenges to researchers. During coding transition periods, back-coding ICD-10 codes to ICD-9 and forward-coding ICD-9 codes to ICD-10 will be necessary for longitudinal analyses and comparative investigations. ICD-9 based indexes and measures, such as the Charlson Comorbidity Index and AHRQ’s Patient Safety Indicators, will also need to be translated to ICD-10 to support their continued use. There may be a “lag time” associated with re-specifying these tools, which researchers will need to be aware of. Additionally, there will most likely be a “testing” period after new coding systems are implemented, as researchers will need to explore the nuances and limitations of new systems prior to conducting analyses (Iezzoni, 2010). Researchers may also need to observe a data “black out” period as clinicians learn, perfect and then settle into new coding behaviors associated with the transition to ICD-10 (Januel et al., 2011). This “black out” period may also be needed by individual health systems and providers. The transition from ICD-9 to ICD-10 in the United States will be neither smooth nor uniform: health systems and providers will “go live” with ICD-10 at various points in time and with different levels of success.
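The forward-coding and back-coding steps can be sketched as simple crosswalk lookups. The four-entry table below is a tiny illustrative excerpt and an assumption of this sketch; any real analysis should load and verify a complete, official crosswalk. The function names are likewise hypothetical.

```python
from collections import defaultdict

# Illustrative forward crosswalk (ICD-9 -> candidate ICD-10 codes).
# Assumed entries for demonstration; verify against an official mapping.
ICD9_TO_ICD10 = {
    "401.0": ["I10"],     # malignant essential hypertension
    "401.1": ["I10"],     # benign essential hypertension
    "401.9": ["I10"],     # unspecified essential hypertension
    "250.00": ["E11.9"],  # type II diabetes without complication
}


def invert(crosswalk):
    """Build the reverse crosswalk, keeping every candidate source code."""
    backward = defaultdict(list)
    for source, targets in crosswalk.items():
        for target in targets:
            backward[target].append(source)
    return {code: sorted(sources) for code, sources in backward.items()}


def translate(codes, crosswalk):
    """Return (mapped, unmapped): candidates per code, and codes with no entry."""
    mapped, unmapped = {}, []
    for code in codes:
        if code in crosswalk:
            mapped[code] = crosswalk[code]
        else:
            unmapped.append(code)
    return mapped, unmapped


ICD10_TO_ICD9 = invert(ICD9_TO_ICD10)

# Forward-coding is unambiguous here; "XXX.XX" stands in for an unmapped code.
print(translate(["250.00", "XXX.XX"], ICD9_TO_ICD10))

# Back-coding is not: one ICD-10 code has three ICD-9 candidates.
print(ICD10_TO_ICD9["I10"])
```

Because several ICD-9 codes can map to one ICD-10 code, a round trip does not necessarily recover the original code. Longitudinal analyses that span the transition therefore need an explicit rule for choosing among candidates, and a way to report codes that could not be mapped.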

Despite the challenges, more refined coding systems will greatly enhance our ability to conduct research on less prevalent combinations of MCC. New coding systems will provide a very detailed level of diagnostic information.
