Development of a Quality Measure for Adults with Post-Traumatic Stress Disorder

EXECUTIVE SUMMARY

05/01/2019

Purpose

In September 2011, the U.S. Department of Health and Human Services Office of the Assistant Secretary for Planning and Evaluation, with support from the HHS National Institute of Mental Health, contracted with Mathematica Policy Research and the National Committee for Quality Assurance (NCQA) to develop quality measures for treatment of adults with PTSD. This 3.5-year project began by reviewing existing research evidence and measures and gathering input from a technical advisory group to identify and prioritize opportunities for new measures. We then specified and pre-tested a survey measure of the delivery of evidence-based psychotherapy for adults.

To develop the survey items, we sought input from a technical panel of experts in psychotherapeutic treatments for adults with PTSD and reviewed clinical manuals to produce a list of common evidence-based psychotherapeutic elements of PTSD treatment. We converted the elements into three parallel sets of survey items to be completed by three different respondent groups: clinicians, clinical supervisors, and clients. Developing three versions of the measure provides an opportunity to begin to assess which type(s) of rater yields the most credible and reliable measure. We revised the survey items based on input from groups of clinicians and clients. The clinician survey is presented in Appendix E.

To gather initial information about the measure's importance, feasibility, usability, and scientific acceptability in accordance with National Quality Forum endorsement standards, we gathered quantitative and qualitative data from six behavioral health organizations that provide outpatient services to adults with PTSD. Our quantitative testing involved fitting statistical models to identify the measure's underlying theoretical constructs and determine the necessity of each individual survey item. We examined the reliability of the measure using different psychometric tests depending on the type of reliability examined (inter-rater agreement or internal consistency). We also conducted a preliminary assessment of the measure's sensitivity and specificity to determine the extent to which we could identify high-performing and low-performing clinicians, using scores we created based on performance at the 50th and 75th percentiles in the delivery of evidence-based psychotherapy. Finally, we conducted focus groups with a range of stakeholders and gathered information from site coordinators to obtain input on the measure's importance and face validity and to understand whether it could yield findings that could be used to inform quality improvement efforts. We also sought stakeholders' perspectives on practical barriers to implementing the measure.

Measure Testing Results

For each clinician, three therapy sessions for three different clients were sampled from the clinician's current caseload of adults with PTSD. The clinician, the clinician's supervisor, and the clients completed the survey following each sampled therapy session. We received 96 clinician, 97 supervisor, and 78 client surveys. Response rates were 98 percent, 99 percent, and 80 percent for clinicians, supervisors, and clients, respectively. The majority of clinicians and supervisors completed the survey on the web, whereas the majority of clients completed the survey on paper. On average, respondents completed the web survey in 8-10 minutes. In focus group discussions, most stakeholders felt the measure was too long and recommended shortening it.

We identified five similar underlying constructs in the measure that fit the data well in the clinician, supervisor, and client samples: (1) structuring and conducting the therapy session; (2) psychoeducation and therapeutic techniques; (3) therapeutic alliance; (4) assessment; and (5) homework. Some items correlated with more than one construct and other items had low correlations with the constructs. Taken together, the results suggest that the survey items assess constructs related to the delivery of psychotherapy for PTSD, but that some of the items may be unnecessary or require refinement. Although many stakeholders agreed the measure captures elements of psychotherapy, some stakeholders felt it focused too strongly on cognitive behavioral approaches when other psychotherapies are also delivered to adults with PTSD.

Across the reliability tests conducted, the measure demonstrated fair to good reliability. On average, we observed the highest reliability across all constructs in the supervisor sample, followed by the clinician and client samples. Inter-rater agreement was highest between supervisors and clinicians; agreement between supervisors and clients was comparable to agreement between clinicians and clients. The reliability results suggest some items may need revision, particularly among the items that make up the "assessment" construct.
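As an illustration of what an internal consistency check can look like, the sketch below computes Cronbach's alpha for the items in one construct. This summary does not name the specific reliability statistics used in the project, so the choice of Cronbach's alpha, the function name, and the example scores are all assumptions made for illustration only.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one construct (illustrative; not the project's documented method).

    responses: list of surveys, each a list of k item scores for the construct.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two surveys")
    # Sample variance of each item across surveys.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of the summed construct score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical item scores for four surveys on a three-item construct.
example = [
    [4.0, 5.0, 4.0],
    [2.0, 3.0, 2.0],
    [5.0, 4.0, 5.0],
    [1.0, 2.0, 1.0],
]
alpha = cronbach_alpha(example)
```

Values closer to 1 indicate that the items in a construct move together. An item whose removal raises alpha would be one candidate for the kind of revision described above.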

To begin to understand the measure's validity, we calculated its sensitivity and specificity for each of the five constructs by comparing clinician and client scores to the supervisor scores, which, for the purposes of these analyses, we treated as a gold standard. We examined the implications for the measure's sensitivity and specificity using two thresholds, the 50th and 75th percentiles, to determine high and low delivery of evidence-based psychotherapy. At the 50th percentile, sensitivity ranged from 0.50 to 0.79 and specificity ranged from 0.49 to 0.78. At the 75th percentile, sensitivity ranged from 0.22 to 0.57 and specificity ranged from 0.75 to 0.85. Based on these preliminary findings, the 50th percentile threshold appears to better discriminate high and low performance. Although the supervisor survey served as the gold standard in these analyses, stakeholders uniformly declined to endorse it because of the changes in process and the resources that would be required to routinely collect data from supervisors for quality improvement purposes. Some stakeholders noted a preference for the client survey, whereas others indicated a preference for either the client or clinician survey.
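The sensitivity and specificity comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's analysis code: the helper names, the interpolation rule for the percentile, the choice to classify each rater group against its own percentile cut point, and the example scores are all hypothetical.

```python
def percentile(values, pct):
    """Linear-interpolation percentile, pct in 0..100 (one of several common definitions)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def classify_high(scores, pct):
    """Flag each score as 'high' if it is at or above the pct-th percentile of its own group."""
    cut = percentile(scores, pct)
    return [score >= cut for score in scores]


def sensitivity_specificity(reference_high, rater_high):
    """Compare a rater's high/low flags with the reference (gold standard) flags."""
    pairs = list(zip(reference_high, rater_high))
    tp = sum(ref and rat for ref, rat in pairs)
    fn = sum(ref and not rat for ref, rat in pairs)
    tn = sum(not ref and not rat for ref, rat in pairs)
    fp = sum(not ref and rat for ref, rat in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical construct scores for the same eight sessions.
supervisor_scores = [3.0, 4.5, 2.0, 4.0, 3.5, 1.5, 4.8, 2.5]  # treated as the reference
clinician_scores = [3.2, 4.6, 3.8, 3.9, 2.9, 2.0, 4.7, 2.4]

sens_50, spec_50 = sensitivity_specificity(
    classify_high(supervisor_scores, 50),
    classify_high(clinician_scores, 50),
)
```

Raising the threshold from the 50th to the 75th percentile leaves fewer sessions in the "high" group, which is one reason sensitivity can fall while specificity rises, as in the ranges reported above.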

Conclusions and Next Steps

The development of a measure of the delivery of evidence-based psychotherapy has the potential to improve the quality of care for adults with PTSD. We made promising strides in creating the foundation of such a measure; however, a significant amount of additional work is needed to develop a final measure that can be used for accountability purposes. Below, we provide overarching conclusions and recommended next steps.

  • Additional input. Although there was support for use of the measure in training and education, support for using it for accountability purposes was limited. Additional input from a larger group of stakeholders regarding the measure's use for internal quality improvement and the circumstances under which it would be useful would inform the next stages of measure development.

  • Further revisions. Our analyses suggest that the survey assesses important underlying constructs associated with the delivery of evidence-based treatment for PTSD and that many survey items produce significant agreement across the three raters. The analyses also suggest that several items need refinement. For example, items with low inter-rater agreement and/or low internal consistency may be candidates for deletion. Items with significant cross-loadings and moderate agreement may need revision. The surveys should be revised further, with additional cognitive testing and stakeholder input on the refinements.

  • Further investigation of feasibility. Several stakeholders expressed concern regarding the measure's feasibility. Refinements to the survey items may result in a shorter measure that takes less time to complete, which should improve the feasibility of using it. In addition, it would be useful to have additional information from a larger group of stakeholders regarding topics such as preferred survey mode (including mobile technology applications), the infrastructure available to support the measure, and approaches to automating aspects of site coordination.

  • Further development of the measure for broader application. The factor analysis results identified therapeutic constructs that are likely relevant in the delivery of psychotherapy for conditions other than PTSD. The measure could be refined and further tested to create modules that broadly apply to the delivery of psychotherapy.

  • Examination of inter-rater reliability and factor structure using revised items and a larger sample. Once the survey items have been refined, additional work will be needed to test whether the refinements improve inter-rater agreement and the factor structure. The goal of our current project was to pre-test this instrument. A pilot test with a larger sample that offers increased diversity in sites, clinicians, and clients would increase the external validity of the measure.

  • Examination of other scoring methods. Our current thresholds for high and low delivery of evidence-based psychotherapy yielded promising preliminary results for sensitivity and specificity, particularly at the 50th percentile. After item refinement, these scoring methods should be verified and compared to other possible methods of scoring. For example, contextual scoring may be beneficial, as it would allow clinicians flexibility in deviating from a treatment plan for appropriate reasons (such as a case where a clinician did not use an expected set of therapeutic elements because he or she had to help a client manage suicidal ideation).

  • Additional validity testing. Additional psychometrics are needed to validate this measure. The use of an external, independent rater (not associated with the site) to serve as the preferred gold standard is important. To assess the measure's predictive validity, information on patient outcomes (for example, symptom improvement, quality of life, and functioning) is critical.

The measure developed under this project has the potential to address significant gaps in quality of PTSD care. Additional work is needed to further prepare it for implementation on a larger-scale basis and to better understand the groups and situations where the measure will be most useful.