Understanding Disparities in Persons with Multiple Chronic Conditions: Research Approaches and Datasets. 7. National Datasets and Data Systems Review


To determine which data systems and datasets can be analyzed to improve our understanding of disparities among persons with MCC, the Project Team revisited the data systems and datasets that were reviewed for the first White Paper funded by this project (Rezaee, 2013). Appendix E provides: 1) a description of each database, 2) diagnostic variables, 3) cost, utilization, and clinical information, and 4) the strengths, limitations, and feasibility of the database for MCC research. We conducted a supplemental review of each database to assess its appropriateness for MCC disparities research; the results are shown below in Exhibit 7.

Almost all of the data sources included information on patient age, gender, and race/ethnicity. The availability of other disparity-related variables varied substantially by dataset, however. For example, the Medical Expenditure Panel Survey (MEPS) collects information on patient disability status, family income, family size and employment status, in addition to age, gender, and race/ethnicity, while the National Health Interview Survey (NHIS) collects information on sexual orientation, availability of paid sick leave and length of time at current residence.

Data Source | Demographic and Socioeconomic Variables Included | Considerations for MCC Disparities Research
Agency for Healthcare Research and Quality
Consumer Assessment of Healthcare Providers and Systems (CAHPS) | Age, Gender, Educational Attainment, Hispanic or Latino, Race/Ethnicity, Language, and Health Literacy.
  • Self-reported information; not ascribed by interviewer.
  • Sampling and data collection procedures vary by CAHPS survey type and individual users.
  • Younger patients and patients other than non-Hispanic whites have the highest survey nonresponse rates. Individual question nonresponse rates have been found to increase with patient age (Elliot et al., 2005).
Healthcare Cost & Utilization Project - Kids’ Inpatient Database (KID) | Age, Gender, Race/Ethnicity, Place of Residence, and Median Household Income.
  • Information derived from inpatient claims; data collection methods vary depending on local hospital and state procedures.
  • Sampling frame is limited to pediatric discharges from community, non-rehabilitation hospitals in participating HCUP partner states.
  • Some hospitals and HCUP State Partners do not supply certain patient demographic information; for example, race is missing on 15% of discharges for the 2009 KID.
Healthcare Cost & Utilization Project - Nationwide Emergency Department Sample (NEDS) | Age, Gender, Urban-Rural Designation, Expected Payment Sources, and Zip Code.
  • NEDS is built from institutional ED discharge data for a 20% stratified sample of U.S. hospital-based EDs that participate in the program.
  • Information derived from inpatient claims; data collection methods vary depending on local hospital and state procedures.
  • Available patient demographic information can vary by state, such as race/ethnicity, geographic location and primary payer data.
Healthcare Cost & Utilization Project - Nationwide Inpatient Sample (NIS) | Age, Gender, Race/Ethnicity, Zip Code, Expected Primary and Secondary Payment Sources, and Place of Residence.
  • Information derived from inpatient claims; data collection methods vary depending on local hospital and state procedures.
  • Some hospitals and HCUP State Partners do not supply certain patient demographic information; race is missing on 10% of discharges for the 2011 NIS. A 20% sample of the NIS is available, containing information from states/hospitals with known high quality demographic reporting. Some states have begun to collect patient language.
Medical Expenditure Panel Survey (MEPS) | Age, Gender, Race/Ethnicity, Insurance Status, Marital Status, Disability Status, Family Income as Percent of Poverty Line, Employment Status, Total Income, Geographic Location, and Size of Family.
  • Self-reported information; not ascribed by interviewer.
  • Sample sizes are often too small to report information by patient subgroup.
  • MEPS identifies all five OMB race/ethnicity categories (White, American Indian or Alaska Native, Asian, Black or African American, and Native Hawaiian or other Pacific Islander), and a multiple race category for those who identify with more than one race (SHADAC, 2009).
  • Does not provide information on immigrant groups, but includes additional detail on Hispanic origin so that Hispanic subgroups can be disaggregated.
Centers for Disease Control and Prevention
Behavioral Risk Factor Surveillance System (BRFSS) | Age, Gender, Race/Ethnicity, Hispanic vs. Latino, Military Status, Insurance Status/Type, Educational Attainment, Disability Status, Income, Household Size, Employment Status, Household Income, Zip Code, and Own vs. Rent Home Status.
  • Self-reported information; not ascribed by interviewer.
  • The BRFSS provides several race variables, allowing researchers to choose one race category with multiple races or a recode that allocates multiple race individuals to a race category based on self-identified preferred race; does not identify place of birth or immigrant group (SHADAC, 2009).
  • State age, gender and race data are compared to census data on a monthly basis to ensure data accuracy and catch potential coding mistakes; considered to be more valid and reliable compared to other household surveys (Mokdad, 2009).
National Ambulatory Medical Care Survey | Age, Gender, Race/Ethnicity, and Place of Residence.
  • Data for a systematic random sample of visits are recorded by the physician or office staff on an encounter form.
  • Provides ability to study nationally representative populations over the age of 18, by gender, and by three racial/ethnic categories: 1) White, 2) Black, and 3) Other.
  • Subject to non-sampling errors, including reporting and processing error, and biases due to nonresponse and incomplete data. In 2010, race data were missing for 24.9% of visits and ethnicity data for 23.3% of visits (CDC, 2012).
National Health Interview Survey (NHIS) | Age, Gender, Sexual Orientation, Employment Status, Type of Employment, Employment-related Activities, Size of Business, Paid by Hour or Salaried, Paid Sick Leave, Multiple Job Held Status, and Time at Current Residence.
  • Self-reported information; not ascribed by interviewer.
  • The NHIS provides several race variables, allowing researchers to choose one race category with a residual multiple race category or a recode that allocates multiple race individuals to a race category based on self-identified preferred race; it is the only public use data set with expanded race variables for Asian subgroups (SHADAC, 2009).
  • Distinguishes U.S.-born individuals from those born in 10 broad global regions, including a residual foreign-born category (SHADAC, 2009).
National Health and Nutrition Examination Survey (NHANES) | Age, Gender, Race/Ethnicity (including subgroups), Language, Educational Attainment, Marital Status, Health Insurance Status, Veteran Status, Occupation, Employment Status, and Income.
  • Self-reported information; not ascribed by interviewer.
  • Low-income persons, adolescents 12-19 years of age, persons 60 years of age and over, African Americans, and persons of Mexican origin are purposely oversampled. The sample is not designed to provide nationally representative estimates for the population of U.S. Hispanics; the survey is not geographically representative. Able to distinguish Mexican from other Hispanic and non-Hispanic individuals (SHADAC, 2009).
  • For most estimates by race and ethnicity, 3 years of NHANES data are needed to obtain an adequate sample size. Because of sample size constraints, many reported NHANES results are still limited to whites, blacks, and Mexican Americans (Anderson et al., 2004).
Centers for Medicare & Medicaid Services
Medicare Claims | Age, Gender, Race/Ethnicity, Geographic Location (including mailing zip code), Dual Eligibility Status, and Medicare Enrollment Dates.
  • Often based on administrative observation or a clinical employee’s observation.
  • Race/ethnicity codes for White and Black Medicare beneficiaries are fairly accurate, but the codes for the other categories are much less so. The Hispanic race/ethnicity codes capture one-third of beneficiaries who identify as being of Hispanic/Latino origin (Waldo, 2005).
  • Race/ethnicity misclassification is most prevalent for Asians and American Indians/Alaskan Natives; most misclassified minority beneficiaries are coded as white (McBean, 2004).
Medicaid Claims | Age, Gender, Race/Ethnicity, Marital Status, Insurance Type, Dual Eligibility Status, Geographic Location, and Enrollment Dates.
  • CMS does not provide instructions to state programs on how race/ethnicity information should be collected and coded. As a result, some states may rely on the observations of eligibility workers, while others use self-reported data from applicants (Kronick et al., 2007).
  • Significant amount of missing demographic information; in 2003 race and Hispanic ethnicity data were listed as “unknown” for more than 20% of Medicaid individuals in New York, Rhode Island and Vermont (McAlpine et al., 2007).
CMS Chronic Condition Warehouse | Age, Gender, Race/Ethnicity, Insurance Type, Dual Eligibility Status, Preferred Language, Marital Status, Zip Code, and Primary Payment Source.
  • In addition to Medicare claims race/ethnicity coding, the warehouse contains the Research Triangle Institute (RTI) Race Code. This code provides enhanced race/ethnicity designation based on an algorithm that analyzes a beneficiary’s first and last name (CMS, 2013).
CMS Medicare Provider Analysis and Review (MedPAR) File | Age, Gender, Race/Ethnicity, and Geographic Location.
  • Information obtained from inpatient hospital and Skilled Nursing Facility final records.
  • Race information is present for nearly all MedPAR discharges (Barrett et al., 2010).
  • Race/ethnicity categories prior to July 1994 included: White, Black, Other and Unknown; 1994 to present, race/ethnicity categories include Asian or Pacific Islander, Hispanic, Black (not of Hispanic origin), American Indian or Alaskan Native, White (not of Hispanic Origin), Other or Unknown.
Medicare Health Outcomes Survey | Gender, Age, Race/Ethnicity, Educational Attainment, Marital Status, Annual Household Income, English Language Skills, Household Size, and Place of Residence.
  • Self-reported information; not ascribed by interviewer.
  • Subject to small sample sizes for patient groups, resulting in the need for data aggregation.
  • Provides ability to study Hispanic/Spanish subgroups (e.g., Cuban, Puerto Rican) and an extended number of race/ethnicity categories (e.g., Korean, Samoan, Japanese).
HMO Research Network | Age, Gender, Race/Ethnicity, Insurance Type, Hispanic vs. non-Hispanic, Educational Attainment, Employment Status, Geographic Location, and Income.
  • Health plans employ a variety of different strategies to collect demographic information on their enrollees; both indirect and direct methods are utilized. A significant percentage of health plans do not collect disparity-related demographic data at this time (AHIP-RWJF, 2006).
  • Electronic abstraction of race from progress notes in electronic medical records is possible, but subject to limitations (e.g., spelling, abbreviations) (Roblin et al., 2010).
National Institute on Aging | Age, Gender, Race/Ethnicity, Insurance Type, Hispanic vs. non-Hispanic, Educational Attainment, Employment Status, Income, Assets, and Housing Status. Non-Hispanic Blacks are oversampled.
  • Self-reported information.
  • Small sample size (N=8,000).
  • Allows assessment of functioning, the ability to perform valued activities of daily living, and environmental or social adaptations made to allow independent, safe living.
State All Payer Claims Databases (general) | Age, Gender, Race/Ethnicity, Insurance Type, Marital Status, and Geographic Location.
  • Demographic and SES information collected for patients differs by state; non-standardized collection procedures.
Health and Retirement Study | Age, Gender, Race/Ethnicity, Educational Attainment, Disability Status, Language, Marital Status, Occupation, Employment Status, and Income.
  • Self-reported information; not ascribed by interviewer.
  • Uses a national area probability sample of U.S. households with supplemental oversamples of Blacks, Hispanics and residents of the state of Florida. Complete data on longitudinal socioeconomic experiences for specific metrics (Hayward).
Health insurer databases | Race/Ethnicity.
  • Kaiser and Aetna are moving towards self-reported race/ethnicity, with Aetna achieving about 30% reporting and Kaiser achieving about 60-70% reporting due to greater integration with providers.
  • Data may be available only to researchers within each plan, and results may be applicable only within the plan.
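
When planning an analysis, one way to use Exhibit 7 is to encode the variable lists and screen for the data sources that carry every variable a study needs. The Python sketch below does this for four of the sources in the exhibit. The names EXHIBIT_7_SUBSET and sources_with are hypothetical and introduced only for illustration, and the variable lists are transcribed from the exhibit as shown here, not from the datasets' own codebooks, so they should be checked against current documentation before use.

```python
# Illustrative subset of Exhibit 7. Variable names are copied from the
# exhibit text; they are labels, not actual column names in any dataset.
EXHIBIT_7_SUBSET = {
    "MEPS": {
        "Age", "Gender", "Race/Ethnicity", "Insurance Status",
        "Marital Status", "Disability Status",
        "Family Income as Percent of Poverty Line", "Employment Status",
        "Total Income", "Geographic Location", "Size of Family",
    },
    "NHIS": {
        "Age", "Gender", "Sexual Orientation", "Employment Status",
        "Type of Employment", "Employment-related Activities",
        "Size of Business", "Paid by Hour or Salaried", "Paid Sick Leave",
        "Multiple Job Held Status", "Time at Current Residence",
    },
    "NHANES": {
        "Age", "Gender", "Race/Ethnicity", "Language",
        "Educational Attainment", "Marital Status",
        "Health Insurance Status", "Veteran Status", "Occupation",
        "Employment Status", "Income",
    },
    "Medicare Claims": {
        "Age", "Gender", "Race/Ethnicity", "Geographic Location",
        "Dual Eligibility Status", "Medicare Enrollment Dates",
    },
}


def sources_with(required, catalog=EXHIBIT_7_SUBSET):
    """Return, in sorted order, the sources listing every required variable."""
    required = set(required)
    return sorted(
        name for name, variables in catalog.items() if required <= variables
    )


if __name__ == "__main__":
    # Which of the encoded sources list both race/ethnicity and employment?
    print(sources_with({"Race/Ethnicity", "Employment Status"}))
```

A screen like this only confirms that a variable is listed for a source. It does not reflect the data quality considerations in the exhibit, such as missing race data or small subgroup samples, which still need to be reviewed for each candidate source.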

