Understanding the High Prevalence of Low-Prevalence Chronic Disease Combinations: Databases and Methods for Research. 7. Data Systems and Datasets Review (Study Question #3)


What data systems and datasets exist that can be analyzed to improve HHS’s understanding of, and approaches to addressing, numerous less prevalent combinations of chronic conditions? To answer this question, we conducted a comprehensive review of data systems and datasets identified through the literature review and through input from the Co-Project Officers, the TAG and key informants.

Overall, 17 data sources were reviewed and specific criteria were used to evaluate the appropriateness of each data source for use in less prevalent MCC research. The full data systems and datasets review is contained in Appendix B. A small set of excerpted data are shown in Exhibit 13.

Exhibit 13: Excerpt of Data Systems and Datasets Review

Data Type | Description | Less Prevalent MCC Research Considerations
Medicare Claims
  • CMS Chronic Conditions Warehouse, MedPAR, raw Medicare claims data.
  • Nationally representative, but only for Medicare population.
  • Large sample size.
  • Longitudinal. Diagnoses can be aggregated at various levels.
  • Potential concerns for claims accuracy and sampling algorithms.
  • Appropriateness for Less Prevalent MCC Research: Strong
  • Provides ability to study less prevalent MCC due to sample size, longitudinal design and diagnosis coding granularity.
HCUP Data
  • NIS, KID, and NEDS.
  • Nationally representative, all-payer data source.
  • Large sample size, NIS represents 20% of United States hospitals as specified.
  • Larger versions of the NIS are also available that represent >90% of hospitals.
  • Longitudinal. Diagnoses can be aggregated at various levels.
  • Appropriateness for Less Prevalent MCC Research: Strong
  • Provides ability to study less prevalent MCC due to sample size, longitudinal design and diagnosis coding granularity.
  • Not all states report the same number of diagnoses for each patient.
  • Not all states capture unique patient identifiers; not all patients can be tracked across hospitalizations to identify all chronic conditions.
Medicaid Data
  • MAX
  • Nationally representative, but only for Medicaid population.
  • Longitudinal.
  • Large sample size.
  • Appropriateness for Less Prevalent MCC Research: Strong
  • Provides ability to study less prevalent MCC due to sample size, longitudinal design and diagnosis coding granularity.
Survey/ Questionnaire Data
  • Nationally representative, but contain unique limitations.
  • Small sample size.
  • Either cross-sectional or longitudinal.
  • Limited number of diagnoses studied; limited granularity of data.
  • Appropriateness for Less Prevalent MCC Research: Weak
  • Limited number of conditions investigated, reduced granularity, small sample size, often cross-sectional and focused on common conditions only.
Other Data Sources
  • VA, IHS, disease registries, state all-payer claims registries.
  • Non-nationally representative.
  • Longitudinal.
  • Appropriateness for Less Prevalent MCC Research: Moderate
  • Not appropriate for nationally focused research.

As discussed above, research on less prevalent combinations of MCC can, in general, be most appropriately conducted using Medicare claims, Medicaid claims or HCUP data. These data sources are nationally representative, are longitudinal (so they capture the accumulation of diagnoses over time) and contain diagnostic codes at a fine level of detail. Other healthcare claims-based datasets, such as state all-payer claims registries or Veterans Affairs data, are also good sources, although they would not produce nationally representative results and may be generalizable only to the specific populations included.
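To illustrate why longitudinal data with granular diagnosis coding matter, the following is a minimal sketch of how diagnoses might be accumulated per patient across claims and then counted as combinations. The record layout, patient identifiers and condition labels are hypothetical and do not reflect the actual structure of Medicare, Medicaid or HCUP files.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical claim records: (patient_id, service_year, condition).
# Real claims carry diagnosis codes rather than plain-text labels.
claims = [
    ("P1", 2008, "diabetes"),
    ("P1", 2009, "heart failure"),
    ("P1", 2010, "diabetes"),
    ("P2", 2008, "diabetes"),
    ("P2", 2010, "heart failure"),
    ("P2", 2010, "osteoporosis"),
    ("P3", 2009, "diabetes"),
]


def condition_sets(records):
    """Accumulate each patient's distinct conditions across all claims."""
    by_patient = defaultdict(set)
    for patient_id, _year, condition in records:
        by_patient[patient_id].add(condition)
    return dict(by_patient)


def combination_counts(by_patient, size=2):
    """Count how many patients have each combination of `size` conditions."""
    counts = Counter()
    for conditions in by_patient.values():
        # Sorting gives every combination one canonical ordering.
        for combo in combinations(sorted(conditions), size):
            counts[combo] += 1
    return counts


patients = condition_sets(claims)
pair_counts = combination_counts(patients, size=2)
```

In this toy example, a patient whose conditions appear in different years is still counted once for each pair, which is the kind of accumulation a single cross-sectional record would miss.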

Survey or questionnaire data, while useful for certain types of MCC research, are limited because they include a small, select list of chronic conditions that typically excludes less prevalent conditions. Furthermore, diagnosis information from these data sources is often at a coarse level of detail that inhibits the ability to study specific chronic conditions. For example, the Behavioral Risk Factor Surveillance System asks respondents about 15 conditions, including myocardial infarction, coronary heart disease, skin cancer, other cancer, chronic obstructive pulmonary disease, depression, kidney disease, vision impairment, diabetes, and HIV/AIDS (CDC, 2011). Not only is the list not comprehensive, it also does not capture information on the specific type of condition (for example, which specific mental illness or cancer the person has). Respondents may also fail to report all of their chronic conditions when interviewed or surveyed, perhaps because they are reluctant to divulge a specific condition. To ensure completeness, major national surveys were included in our review.

To enhance data richness and the ability to understand drivers of healthcare cost in addition to diagnostic information, researchers are able to link or match many of the datasets contained in our review. For example, Health and Retirement Study data can be linked with Medicare claims to better articulate the relationship between patient medical history, financial status, age, diagnoses and healthcare costs (ResDAC, 2013). Although data linking may improve data quality and robustness for specific variables, most linked datasets will not be advantageous for research on low-prevalence MCC due to small sample sizes and limited diagnostic information. However, these types of linked datasets may be an important source of information for future study on more prevalent chronic disease combinations.

Linking claims to other Medicare data sources is one way to overcome limitations related to small sample sizes. For Medicare beneficiaries who are nursing home residents, linking Medicare claims data to the Minimum Data Set (MDS) assessment tool is possible. The MDS is part of the federally mandated assessment of all residents in Medicare or Medicaid certified nursing homes and contains items that measure physical, psychological and psychosocial functioning. Linking the MDS to claims data would permit in-depth analysis of how MCC patterns differ based on patient characteristics and would also support analysis of the relationship between MCC and patient outcomes, at least for Medicare beneficiaries in nursing homes. Linking claims data to the Outcome and Assessment Information Set (OASIS) would allow similar exploration for patients receiving home health services.
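As a rough illustration of record linkage, the sketch below joins claims-derived condition sets to assessment records on a shared identifier and reports the share of assessment records that matched. The field names (`beneficiary_id`, `functional_score`), the identifiers and the values are all hypothetical; they are not drawn from the MDS, OASIS or Medicare claims layouts, and actual linkage depends on the identifiers available in each file.

```python
# Hypothetical claims-derived condition sets, keyed by a beneficiary identifier.
claims_conditions = {
    "B001": {"diabetes", "heart failure"},
    "B002": {"chronic kidney disease", "depression"},
    "B003": {"diabetes"},
}

# Hypothetical assessment records, one row per assessed person.
assessments = [
    {"beneficiary_id": "B001", "functional_score": 12},
    {"beneficiary_id": "B002", "functional_score": 20},
    {"beneficiary_id": "B004", "functional_score": 7},
]


def link_records(conditions_by_id, assessment_rows):
    """Inner-join assessment rows to claims-derived conditions by identifier."""
    linked = []
    for row in assessment_rows:
        conditions = conditions_by_id.get(row["beneficiary_id"])
        if conditions is None:
            continue  # No matching claims record; left out of the linked file.
        linked.append({**row, "conditions": sorted(conditions)})
    return linked


linked = link_records(claims_conditions, assessments)

# Share of assessment records that found a claims match.
match_rate = len(linked) / len(assessments)
```

Tracking the match rate alongside the linked file is one simple way to see how much sample is lost at the linkage step, which bears directly on whether a linked dataset remains large enough for low-prevalence combinations.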

Additional data sources that are appropriate for research on less prevalent combinations of MCC may become available in the future. New data sources may include electronic health record-based registries, large employer databases, managed care patient registries, practice-based network data, and other data sharing and collection initiatives. Descriptions of these other potential data sources were not included in our review.

