Developing Quality Measures for Medicaid Beneficiaries with Schizophrenia: Final Report. 4. Pilot-Testing


The objective of pilot-testing was to determine the scientific acceptability of the measures based on NQF criteria. Table III.5 summarizes the evidence found for each measure through our pilot-testing activities using our 22-state MAX dataset (2007) and our 16-state MAX dataset (2008). Cells containing an 'X' indicate that a measure met the predetermined criteria, summarized in Chapter II, which we used to assess differences in performance across states, validity, or reliability. An empty cell indicates that a measure did not meet the criterion in the corresponding column; however, as we discuss in the paragraphs that follow, this does not indicate that a measure is without merit or should not be considered useful. In general, as we describe below in further detail, caution is warranted in interpreting our pilot-testing findings, because testing results based on Medicaid claims should not be used as the sole criterion for judging the merit of the measures.

TABLE III.5. Summary of Pilot-Testing Results: Evidence of Measure Usability, Validity, and Reliability
| Measure | Detection of Meaningful Differences (a) | Validity: Construct (b) | Validity: Convergent (c) | Reliability: Test-Retest (d) | Reliability: Stability (e) |
|---|---|---|---|---|---|
| Use of Antipsychotic Medication |   | X |   |   |   |
| Antipsychotic Possession Ratio (>80%) |   |   |   | X |   |
| Diabetes Screening (SMI) (f) | X | X | X | X | X |
| Diabetes Monitoring | X | X | X | X | X |
| Cardiovascular Health Screening (SMI) (f) |   | X |   | X |   |
| Cardiovascular Health Monitoring | X | X |   | X | X |
| Cervical Cancer Screening |   |   |   | X | X |
| ED Utilization for Mental Health Conditions |   | N/A |   | X |   |
| Follow-up after Mental Health Hospital Discharge (7-day) | X | X |   |   | X |
| Follow-up after Mental Health Hospital Discharge (30-day) | X | X | X |   | X |

  a. Dispersion indicated by an IQR of at least 10 percentage points (Appendix Table E.13).
  b. Construct validity indicated by significant performance differences between top and bottom quartile of measure performance for either schizophrenia-related hospitalization or ED utilization (Appendix Table E.14).
  c. Convergent validity indicated by Pearson r>0.15 in hypothesized direction with at least 2 other measures (Appendix Table E.15).
  d. Reliability indicated by state-level test-retest correlation (2007-2008) Pearson r>0.30 (Appendix Table E.16).
  e. Stability indicated by no more than 1 performance quartile change for any state between 2007 and 2008. For some measures, states had denominators <100 in 2008; these measure/state combinations were excluded from this analysis.
  f. Measure calculated among enrollees with schizophrenia or bipolar disorder.
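
To make the dispersion and correlation thresholds in the table notes concrete, the following is a minimal Python sketch, not the report's actual analysis code. The function names, the "inclusive" percentile method, and the sample rates are all assumptions chosen for illustration; the report does not state how percentiles were interpolated.

```python
import math
from statistics import quantiles


def iqr_points(state_rates):
    """Interquartile range of state-level rates, in percentage points.

    Assumption: quartiles use the 'inclusive' interpolation method.
    """
    q1, _, q3 = quantiles(state_rates, n=4, method="inclusive")
    return q3 - q1


def meets_dispersion(state_rates, threshold=10.0):
    """Dispersion criterion: IQR of at least `threshold` percentage points."""
    return iqr_points(state_rates) >= threshold


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def meets_test_retest(rates_2007, rates_2008, threshold=0.30):
    """Test-retest criterion: state-level Pearson r above `threshold`."""
    return pearson_r(rates_2007, rates_2008) > threshold


# Hypothetical state-level rates (percent), not taken from the report.
narrow = [90, 92, 93, 94, 95, 96, 97]   # performance clustered near the top
wide = [40, 50, 55, 60, 65, 70, 80]     # performance spread across states
```

With these made-up inputs, `meets_dispersion(narrow)` is false and `meets_dispersion(wide)` is true. The same `pearson_r` helper could be applied with the 0.15 threshold used for convergent validity, although that analysis was run on recipient-level rather than state-level data.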
  1. Five of the ten proposed measures demonstrated significant variability in state-level performance. A key indicator of a quality measure's utility is its ability to capture a wide range of performance. Appendix Table E.13 lists each measure and its distribution across the 22-state dataset. Table III.5 identifies the four measures with an IQR of at least 10 percentage points and those where the lower and upper bounds of the IQR did not encompass the tails of performance (either low or high), indicating measures with the greatest utility for quality measurement purposes.

    The measure "Use of Antipsychotic Medication" had the most restricted performance range (an IQR of 3 percentage points). For example, a state performing at the lower end of the IQR (the 25th percentile) reported that 92 percent of recipients received an antipsychotic, while a state at the top end of the IQR (the 75th percentile) reported that 95 percent of recipients received an antipsychotic. We therefore believe that this measure has limited value from a quality improvement perspective: the performance range is restricted and already near the top, which limits the potential for improvement. However, because antipsychotic use is a fundamental issue for this population and the measure was widely endorsed by our consultants (the TAG and stakeholder groups), "Use of Antipsychotic Medication" has considerable utility as a monitoring measure.

  2. Seven of the ten proposed measures demonstrated evidence of validity. We assessed validity using two approaches. To assess construct validity, we examined the association between measure performance and outcomes (schizophrenia-related hospitalization and ED visits). For each measure, we compared the percentage of people who were hospitalized or visited the ED for schizophrenia between the worst-performing and best-performing quartiles of state performance. For example, we found that enrollees in states with the highest rates of antipsychotic use had significantly lower rates of hospitalization for schizophrenia than enrollees in states with the lowest rates of antipsychotic use (Appendix Table E.14). Seven measures demonstrated evidence of construct validity.

    Convergent validity was determined through examination of recipient-level measure correlations (Appendix Table E.15). We considered measures with a correlation coefficient of 0.15 or greater with at least two other measures to demonstrate evidence of convergent validity. Three of the ten measures met this criterion.

    Although some of these results are encouraging, some important limitations of these measures warrant consideration. Our measures of schizophrenia-related hospitalization and schizophrenia-related ED visits assess adverse outcomes at one extreme of care and thus do not reflect the full spectrum of care. Further, measures that assess preventive care processes were not anticipated to have a significant effect on schizophrenia-related hospitalization or ED use, so the relationship we observed warrants further investigation.

  3. Nine of the ten measures demonstrated evidence of reliability. Reliability was assessed through correlation of state-level 2007 and 2008 performance. Seven of the ten measures demonstrated a 2007-2008 correlation of 0.30 or higher at the state level (Appendix Table E.16). In addition, we compared each state's performance quartile in 2007 with its performance quartile in 2008 to understand the stability of each measure. We defined stability as no more than a one-quartile performance difference between 2007 and 2008; six measures met this criterion (Table III.5). Only "Use of Antipsychotic Medication" met neither criterion: its state-level year-to-year correlation was weak (r=0.25), and it showed a large performance difference (a three-quartile change) between 2007 and 2008, although this difference was observed in a single, small state.

    In summary, we began with a list of 23 measure concepts to assess the care provided to Medicaid enrollees with schizophrenia, and arrived at a final list of ten measures for submission to NQF. These measures fall into three domains: pharmacological measures, physical health measures, and cross-cutting measures. Current evidence and limitations of claims data prevented us from developing robust measures of psychosocial treatments. Appendix F details the numerator, denominator, and exclusions for each of the ten proposed measures.
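
The quartile-stability check described in item 3 can be illustrated with a short Python sketch. This is a hypothetical illustration under stated assumptions, not the report's implementation: the function names, the "inclusive" quartile cut points, the choice to compute quartiles after applying the denominator exclusion, and the sample data are all assumed.

```python
from statistics import quantiles


def assign_quartiles(rates):
    """Map each state to a performance quartile (1 = lowest, 4 = highest).

    Assumption: cut points come from the 'inclusive' quantile method.
    """
    cuts = quantiles(rates.values(), n=4, method="inclusive")
    return {state: 1 + sum(rate > c for c in cuts)
            for state, rate in rates.items()}


def max_quartile_shift(rates_y1, rates_y2, denoms_y2, min_denom=100):
    """Largest quartile change for any state between two years.

    States whose second-year denominator is below `min_denom` are dropped
    before quartiles are assigned (an assumed ordering of the two steps).
    """
    eligible = [s for s in rates_y1
                if s in rates_y2 and denoms_y2.get(s, 0) >= min_denom]
    q_y1 = assign_quartiles({s: rates_y1[s] for s in eligible})
    q_y2 = assign_quartiles({s: rates_y2[s] for s in eligible})
    return max(abs(q_y1[s] - q_y2[s]) for s in eligible)


def is_stable(rates_y1, rates_y2, denoms_y2, max_shift=1):
    """Stability criterion: no state moves more than `max_shift` quartiles."""
    return max_quartile_shift(rates_y1, rates_y2, denoms_y2) <= max_shift


# Hypothetical data: state A jumps from the bottom to the top of the range.
rates_2007 = {"A": 10, "B": 20, "C": 30, "D": 40,
              "E": 50, "F": 60, "G": 70, "H": 80}
rates_2008 = dict(rates_2007, A=90)
large_denoms = {s: 500 for s in rates_2007}
small_state_a = dict(large_denoms, A=50)   # A falls below the 100 cutoff
```

In this made-up example, `is_stable(rates_2007, rates_2008, large_denoms)` is false because state A moves three quartiles, while passing `small_state_a` excludes that state and the remaining states are stable. This shows how a single small state can drive the stability result for a measure.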
