Developing Quality Measures for Medicaid Beneficiaries with Schizophrenia: Final Report. 6. Pilot-Test Measures to Assess Usability, Validity, and Reliability


To assess the usability and scientific acceptability of the measures, we examined the distribution, content and convergent validity, and test-retest reliability of the candidate measures using MAX data from 2007 and 2008. Use of MAX data permits real-world assessment of measure usability for state Medicaid officials. At the same time, operationalization of quality measures in Medicaid claims data provides an opportunity to retrospectively assess measure validity by correlating measure performance with outcomes such as schizophrenia-related hospitalization and emergency department (ED) use. The MAX data are standardized eligibility and claims files for each state that include person-level data on every beneficiary enrolled in Medicaid during the calendar year. The MAX files are created from claims data that each state submits to the Centers for Medicare and Medicaid Services (CMS).

Defining the Population

Diagnosis of schizophrenia was inferred by either a single primary inpatient diagnosis or two outpatient primary diagnoses of schizophrenia.1, 2 In response to comments from Medicaid medical directors, we modified and tested some measures to include persons with serious mental illness (SMI) defined by a single primary inpatient diagnosis or two outpatient primary diagnoses of either schizophrenia or bipolar disorder.

In addition, we required that enrollees have 10 months of Medicaid eligibility, non-dual status, and qualification for Medicaid on the basis of a disability, which resulted in 1,019,123 Medicaid recipients who met our inclusion criteria.3
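As an illustration only, the inclusion rules above could be expressed as the following sketch. The record layouts, field names, and condition labels are hypothetical and are not the MAX file format; "10 months" is read here as at least 10 months, and any mix of qualifying outpatient diagnoses is counted toward the two required for SMI.

```python
from dataclasses import dataclass

# Hypothetical condition labels; real claims carry diagnosis codes instead.
SCHIZOPHRENIA = {"schizophrenia"}
SMI = {"schizophrenia", "bipolar"}


@dataclass
class Enrollee:
    months_eligible: int    # months of Medicaid eligibility in the year
    dual_eligible: bool     # also enrolled in Medicare
    disability_basis: bool  # qualified for Medicaid on the basis of a disability


@dataclass
class Claim:
    setting: str            # "inpatient" or "outpatient"
    primary_dx: str         # simplified primary diagnosis label


def meets_inclusion_criteria(e: Enrollee) -> bool:
    # Assumption: "10 months" is interpreted as at least 10 months.
    return e.months_eligible >= 10 and not e.dual_eligible and e.disability_basis


def has_qualifying_diagnosis(claims, conditions) -> bool:
    # One primary inpatient diagnosis OR two primary outpatient diagnoses.
    inpatient = sum(
        1 for c in claims if c.setting == "inpatient" and c.primary_dx in conditions
    )
    outpatient = sum(
        1 for c in claims if c.setting == "outpatient" and c.primary_dx in conditions
    )
    return inpatient >= 1 or outpatient >= 2
```

Passing `SCHIZOPHRENIA` or `SMI` as `conditions` switches between the two population definitions without changing the rule itself.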

Overall, 9.7 percent of Medicaid recipients in our dataset had schizophrenia and 12.8 percent had SMI (bipolar disorder and/or schizophrenia) in 2007. Both of these populations were demographically diverse (Appendix Table E.2). About 17 percent of enrollees with schizophrenia were diagnosed with diabetes.

Pilot-Test Methodology: Usability, Validity, and Reliability

Pilot-testing the measures using MAX data took several forms. First, we evaluated measure importance (gaps in quality) and scientific acceptability (meaningful differences in performance) by assessing the distributional properties of each measure. This was accomplished by tabulating the minimum, maximum, median, mean, and interquartile range (IQR) for each measure at the state level. The IQR is demarcated by the values at the 25th and 75th percentiles of a distribution. Generally speaking, measures with a broader IQR are preferable to measures with a narrowly distributed IQR or those with an IQR at the very low or very high end of the distribution. For example, a measure with a narrow IQR may not be sufficiently sensitive to detect differences in quality. Measures with an IQR of at least 10 percentage points were considered to have the strongest evidence of usability for quality measurement purposes.
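The distributional check described above can be sketched as follows. This is a minimal illustration, not the report's actual code: the function names are hypothetical, state rates are assumed to be expressed in percent, and the "inclusive" quartile method is one of several reasonable conventions.

```python
from statistics import mean, median, quantiles

IQR_THRESHOLD = 10.0  # percentage points, as stated in the text


def summarize_measure(state_rates):
    """Summarize one measure's state-level rates (in percent)."""
    # quantiles(..., n=4) returns the 25th, 50th, and 75th percentiles.
    q1, _, q3 = quantiles(state_rates, n=4, method="inclusive")
    return {
        "min": min(state_rates),
        "max": max(state_rates),
        "median": median(state_rates),
        "mean": mean(state_rates),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


def has_strong_usability_evidence(state_rates):
    """True when the IQR spans at least 10 percentage points."""
    return summarize_measure(state_rates)["iqr"] >= IQR_THRESHOLD
```

A measure whose state rates cluster within a few points of each other would fail this check even if its mean performance were low, which is the sensitivity concern raised above.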

Validity and reliability are important characteristics of measure scientific acceptability. Construct validity was evaluated by examining enrollee outcomes with results displayed by quartile of state-level performance for each measure. We compared rates of schizophrenia-related hospitalization and ED utilization for beneficiaries in the highest and lowest performing quartiles for each quality measure. The difference between the outcomes among enrollees in the best and worst quartiles of state performance for each measure was tested using a one-way analysis of variance; an F-test significance level of p<0.01 was used to determine statistically different outcomes. For a given measure, construct validity was inferred when rates for adverse events among enrollees in high performing states were significantly better (i.e., lower) than the rates of adverse events among enrollees in low performing states.
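A rough sketch of the quartile comparison follows. It assumes a simple rank-based split of states and computes only the one-way ANOVA F statistic and its degrees of freedom; the p-value against the 0.01 threshold would come from the F distribution in a statistics package and is not computed here. All function names are hypothetical.

```python
from statistics import mean


def best_and_worst_quartile_states(state_scores, higher_is_better=True):
    """Split states into the best and worst performing quartiles by rank."""
    ranked = sorted(state_scores, key=state_scores.get, reverse=higher_is_better)
    size = max(1, len(ranked) // 4)  # assumption: quartile size rounds down
    return ranked[:size], ranked[-size:]


def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

In this setup, the two groups passed to `one_way_anova_f` would be the adverse-event outcomes for enrollees in the best-quartile states and in the worst-quartile states.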

Convergent validity was examined through between-measure correlation coefficients. For example, we hypothesized that adherence to antipsychotics, as measured by a high antipsychotic medication possession ratio, would be negatively correlated with measures of mental health ED use and positively correlated with measures of 30-day outpatient follow-up after a mental health related discharge. We identify measures with a Pearson correlation of at least 0.15 with two or more measures.
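The screening rule can be illustrated with the sketch below. It assumes each measure is a list of state-level rates in the same state order, and it applies the 0.15 cutoff to the absolute value of the correlation, since some hypothesized associations are negative; that reading is an assumption, as are the function names.

```python
from math import sqrt

MIN_CORRELATION = 0.15  # cutoff from the text
MIN_PARTNERS = 2        # "two or more measures"


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def convergent_measures(measures):
    """Return names of measures correlated with at least two other measures."""
    flagged = set()
    for name, values in measures.items():
        partners = sum(
            1
            for other, other_values in measures.items()
            if other != name
            and abs(pearson(values, other_values)) >= MIN_CORRELATION
        )
        if partners >= MIN_PARTNERS:
            flagged.add(name)
    return flagged
```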

We assessed measure reliability using state-level test-retest correlations between 2007 and 2008 MAX data.4 We identify measures with a year-to-year correlation of >0.30. We also examined the stability of relative performance quartiles between 2007 and 2008, with the expectation that at the state level, performance measures should not exhibit any discernible pattern of performance instability over time. In other words, measure stability would be demonstrated if a state in the top quartile of performance for a given measure in 2007 showed similar relative performance in 2008. Results from the pilot and field-testing efforts are summarized in the next section.
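As a hedged sketch of the reliability checks, the code below computes a year-to-year Pearson correlation across states and the share of states that stay in the same performance quartile. The dictionaries keyed by state, the rank-based quartile assignment, and the function names are illustrative assumptions rather than the report's method.

```python
from math import sqrt

MIN_RETEST_CORRELATION = 0.30  # year-to-year threshold from the text


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def quartile_of(scores):
    """Map each state to a quartile (1 = lowest scores, 4 = highest) by rank."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {state: i * 4 // n + 1 for i, state in enumerate(ranked)}


def retest_reliability(scores_year1, scores_year2):
    """Return (year-to-year correlation, share of states in the same quartile)."""
    # Only states reporting the measure in both years are compared.
    states = sorted(set(scores_year1) & set(scores_year2))
    r = pearson(
        [scores_year1[s] for s in states], [scores_year2[s] for s in states]
    )
    q1 = quartile_of({s: scores_year1[s] for s in states})
    q2 = quartile_of({s: scores_year2[s] for s in states})
    same = sum(1 for s in states if q1[s] == q2[s]) / len(states)
    return r, same
```

A measure would pass the correlation screen when the first returned value exceeds `MIN_RETEST_CORRELATION`; the second value gives a simple summary of quartile stability.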
