An Environmental Scan of Pay for Performance in the Hospital Setting: Final Report. CMS–Premier Hospital Quality Incentive Demonstration


Four studies have analyzed the effects of the PHQID, a three-year CMS-sponsored demonstration project initiated in 2003. The PHQID program allowed for voluntary enrollment (i.e., hospital self-selection into the study) and only included hospitals using the Premier Perspectives data system—two factors that may hinder the ability to generalize the experience of the demonstration hospitals to non-demonstration hospitals to the extent that participants differ in important ways from non-participants. It should also be noted that at the start of the Quality Incentive Demonstration period, CMS had already begun implementing its RHQDAPU P4R program, whose set of measures overlapped substantially with that of the PHQID. The PHQID program includes 34 measures of which 22 overlap with RHQDAPU measures in the areas of AMI, pneumonia, CHF, and surgical infection prevention.

The PHQID demonstration includes 262 hospitals across 38 states. Hospitals were paid an annual bonus based on their composite performance scores in five clinical areas: AMI, Coronary Artery Bypass Graft (CABG) surgery, Community Acquired Pneumonia (CAP), CHF, and hip and knee replacement surgery. The bonus dollars represented new money. Hospitals that did not achieve a minimum level of performance in the third year of the program (defined by the lowest two deciles of performance in the first year of the program) were assessed a financial penalty.

Premier, Inc., 2006: Premier published its own report describing the PHQID and the observed quality improvements from the first year of the incentive program’s implementation. Premier reported that between the first and fourth quarters of the first year of the program (October 2003 to September 2004), significant gains were made across the measures in the study, with an average 6.6 percentage point improvement across the five clinical areas. Within each of the five clinical composites, AMI performance increased from 87.4 percent to 90.8 percent, CABG surgery performance improved from 84.9 to 89.7 percent, CAP improved from 69.3 percent to 79.1 percent, CHF increased from 64.6 percent to 74.2 percent, and hip/knee replacement improved from 84.5 percent to 90.1 percent. 
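The per-area gains implied by these figures can be checked with a short calculation. The sketch below is illustrative only: it uses the first- and fourth-quarter composite scores quoted above and assumes a simple unweighted mean across the five clinical areas, which may not be how Premier computed its average. The variable and function names are hypothetical. Under that assumption the mean gain comes to about 6.6 percentage points, consistent with the reported figure.

```python
# Composite scores (percent) reported by Premier for the first and fourth
# quarters of program year one; the values are copied from the text above.
composite_scores = {
    "AMI": (87.4, 90.8),
    "CABG": (84.9, 89.7),
    "CAP": (69.3, 79.1),
    "CHF": (64.6, 74.2),
    "Hip/knee replacement": (84.5, 90.1),
}


def percentage_point_gains(scores):
    """Return the Q4-minus-Q1 gain for each clinical area, rounded to 0.1."""
    return {area: round(q4 - q1, 1) for area, (q1, q4) in scores.items()}


def unweighted_mean(values):
    """Simple average; ASSUMES every clinical area counts equally."""
    values = list(values)
    return sum(values) / len(values)


gains = percentage_point_gains(composite_scores)
average_gain = unweighted_mean(gains.values())

for area, gain in gains.items():
    print(f"{area}: +{gain:.1f} percentage points")
print(f"Unweighted mean gain: {average_gain:.2f} percentage points")
```

The largest gains appear in the two areas with the lowest starting scores (CAP and CHF), a pattern that recurs in the controlled studies discussed below.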

Although these results are positive, it is difficult to draw conclusions from this study about the effect of the PHQID program. Because the study had no control group, it cannot show whether non-participants were achieving similar gains in performance. As documented by Williams et al. (2005), there has been a strong trend across the country toward improvement in many of the same measures used as a basis for incentives in the PHQID. Disentangling the impact of the CMS-Premier demonstration from concurrent Joint Commission and CMS quality improvement efforts (i.e., RHQDAPU and the 7th Scope of Work) requires that there be a set of comparison hospitals with similar characteristics but no exposure to the PHQID. Selection bias is another issue to contend with in explaining the observed outcomes, since Premier hospitals that chose to participate in the PHQID had higher baseline quality scores than did Premier hospitals that chose not to. Thus, improvements in performance may stem partly from characteristics of the hospitals that participated rather than from the incentive program itself.

Grossbart, 2006: This study examined the impact of the PHQID but focused solely on a subset of hospitals participating in the Premier system. The study followed the performance of ten hospitals in the Catholic Healthcare Partners system: four that chose to participate in the PHQID and six that chose not to participate and were used as controls. The analysis was limited to a subset of 17 of the 34 measures used in the PHQID initiative (for three clinical conditions: AMI, CAP, and CHF) that were collected by both intervention and control groups of hospitals as part of reporting for the Joint Commission ORYX Core Measures program.

All 10 hospitals showed significant improvement across the measures. Those participating in the PHQID had a statistically significantly greater increase in performance than did the non-participants. Across 17 measures, PHQID hospitals improved their scores by 9.3 percentage points, versus 6.7 percentage points for non-participating hospitals. Although the researchers matched hospitals on a number of key characteristics, one important limitation of this study is that they did not match them on baseline performance. The findings are confounded by the fact that the participating hospitals started at a higher level of quality than the non-participants did (80.4 percent versus 78.9 percent).

Much of the observed difference between the two sets of hospitals was driven by greater improvement in CHF care (19.2 percentage points for PHQID hospitals versus 10.9 percentage points for non-participants). Across the 17 measures examined, the two measures with substantial differences in improvement between PHQID and non-participating hospitals were (1) discharge instructions for patients with CHF (40.1 percentage points improvement for PHQID hospitals versus 14.6 for non-participants), and (2) pneumococcal vaccine delivery for patients admitted with pneumonia (31.6 percentage points improvement for PHQID hospitals versus 22.1 for non-participants). These two measures likely drive a substantial fraction of the overall observed differences in improvement between participating and non-participating hospitals.

The PHQID P4P intervention did not occur in isolation; it was conducted in an environment in which several national quality improvement efforts already in play were focusing on the same measures, particularly the HQA measures. These efforts included the CMS RHQDAPU program, the Joint Commission’s quality improvement initiatives, and the CMS 7th Scope of Work. Across the subset of ten HQA measures, the study found no statistically significant difference in the amount of improvement: 5.4 percentage points for PHQID hospitals, and 5.1 percentage points for non-participating hospitals. This very modest and statistically nonsignificant difference raises questions about the added value of P4P incentives above and beyond other quality measurement and feedback efforts, particularly the RHQDAPU P4R intervention, which appears to have driven improvements in performance nationally (Lindenauer et al., 2007). Similar levels of improvement were observed among all hospitals nationally, both those exposed to P4P and those exposed to public reporting, measurement, and feedback interventions.
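To make the Grossbart comparisons easier to scan, the sketch below computes the raw gap in improvement (participants minus non-participants) for each result quoted above. It is a simple unadjusted subtraction of the reported percentage-point gains, not the statistical test used in the study, and the dictionary labels and function name are hypothetical.

```python
# Percentage-point improvements reported by Grossbart (2006), as
# (PHQID participants, non-participants). Values are copied from the text.
improvements = {
    "All 17 measures": (9.3, 6.7),
    "CHF care": (19.2, 10.9),
    "CHF discharge instructions": (40.1, 14.6),
    "Pneumococcal vaccination": (31.6, 22.1),
    "10 HQA measures": (5.4, 5.1),
}

# Baseline quality scores (percent) for participants and non-participants.
baseline = (80.4, 78.9)


def differential(participant_value, control_value):
    """Unadjusted gap between the two groups, rounded to 0.1 points."""
    return round(participant_value - control_value, 1)


differentials = {
    label: differential(phqid, control)
    for label, (phqid, control) in improvements.items()
}
baseline_gap = differential(*baseline)

for label, gap in differentials.items():
    print(f"{label}: PHQID gained {gap:+.1f} points more than controls")
print(f"Baseline gap (not controlled for in the study): {baseline_gap:+.1f}")
```

Laid out this way, the gap on the ten HQA measures (0.3 points) is small next to the gaps on the two measures singled out above (25.5 and 9.5 points), which is the contrast the text draws. Because participants also started 1.5 points higher at baseline, none of these raw gaps can be read as a clean estimate of the incentive's effect.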

The author described why only some Catholic Healthcare Partners hospitals chose to participate in PHQID. With the exception of those with the highest volume, hospitals saw the costs of participation, particularly for the extra staff required for the additional data collection, as being too high; and most hospital CEOs believed there was little to be gained by participation. Those that chose to participate thought the experience would provide them with a market advantage and a head start given the growing numbers of P4P programs in the market.

It is unknown from this study whether the ten Catholic Healthcare Partners hospitals making up the set are similar to or different from other hospitals nationally in ways that are important. To the extent that these hospitals differ in important ways from other hospitals, the results may not be more broadly generalizable. Another unknown is how Catholic Healthcare Partners hospitals and the system in which they operate may differ from other hospitals nationally, such as in the amount and type of systems and quality resource support that were provided. The six hospitals serving as the control group were selected because of “similar levels of service,” and the hospitals were shown to be similar in terms of availability of an open heart program and average number of beds, discharges, and case-mix index. A more rigorous method of selecting controls would have been to match each intervention hospital to a control on these characteristics as well as on baseline performance.

Lindenauer et al., 2007: This study provides the most comprehensive evaluation of the impact of the PHQID that has been published to date. The paper describes changes in performance on 10 measures that occurred over a two-year period, between the fourth quarter of 2003 and the third quarter of 2005. The study examined 207 PHQID hospitals and 406 control hospitals that were submitting performance data as part of the RHQDAPU program. Hospitals in this study were matched on bed size, teaching status, region (Northeast, Midwest, South, or West), location (urban or rural), and ownership status (for-profit or not-for-profit).

On an overall composite measure constructed from the 10 measures, PHQID hospitals experienced greater improvement than the control hospitals did (9.6 percentage point improvement versus 5.2 percentage points). This difference was seen consistently for each of the three clinical conditions (AMI, CAP, and CHF), for most individual measures, and on an appropriate care measure. The greatest amount of improvement was seen among hospitals with the lowest baseline performance.

The authors conducted a number of sensitivity analyses to assess whether this differential response stemmed from a volunteer bias, meaning that Premier Perspectives hospitals that volunteered to select into the PHQID program were inherently different from Premier Perspectives hospitals that did not volunteer. The researchers found that after controlling for baseline performance and volume of patients, the difference in improvement decreased from 4.3 percentage points to 2.9 percentage points, but the improvement was still statistically significantly higher in PHQID hospitals. When all hospitals eligible to participate in the PHQID program were compared to all other hospitals nationally (i.e., those exposed to RHQDAPU), the performance differential remained, but the gap was smaller (the difference in absolute performance point improvement was 2.1 points). Overall, this article provides the strongest evidence that the PHQID is improving performance beyond what is accomplished by public reporting of performance for some of the 10 measures, albeit modestly, once the hospitals’ baseline performance and characteristics are controlled for. Because this study describes the impact of the P4P intervention on top of the measurement and public reporting intervention, we do not know how the impact of the P4P intervention would have differed absent public reporting.

Glickman et al., 2007: This study examined the impact of the PHQID on hospitals voluntarily participating in the national quality improvement initiative Can Rapid Risk Stratification of Unstable Angina Patients Suppress Adverse Outcomes with Early Implementation of the American College of Cardiology/American Heart Association (ACC/AHA) Guidelines (CRUSADE). Hospitals participating in CRUSADE received performance feedback, including comparisons with other CRUSADE hospitals and national standards, as well as a variety of educational interventions. Trends in the cardiac care of patients with non-ST-segment elevation AMI from July 2003 to June 2006 were compared for 54 CRUSADE hospitals participating in PHQID and 446 CRUSADE hospitals not participating in PHQID (i.e., controls). In addition to the AMI measures included in PHQID, the comparison also used eight AMI process measures not included in the demonstration. The study sought to determine whether participation in the P4P intervention gave an additional boost to performance improvement above that from the CRUSADE intervention.

Both PHQID and control hospitals improved performance on PHQID measures and the other AMI measures over the period examined. There were no statistically significant differences between improvement in the PHQID and control groups on the composite measure for either PHQID (7.2 percentage points and 5.6 percentage points, respectively) or other AMI measures (13.6 percentage points and 8.1 percentage points, respectively). PHQID hospitals had significantly greater improvement on three individual measures: two that were included in PHQID (aspirin prescribed at discharge, p = .04; smoking cessation counseling for active or recent smokers, p = .05) and one that was not included in the demonstration (lipid-lowering agent prescribed at discharge, p = .02). There were no statistically significant differences in improvements in inpatient mortality between the two groups. In both groups, hospitals with lower levels of performance at the start of the observation period demonstrated greater improvements in performance than did higher-performing hospitals.

The authors concluded that P4P leads to only very small improvements in performance beyond what can be accomplished through engagement in quality improvement initiatives. Like the Lindenauer et al. (2007) article, the Glickman et al. article demonstrates the importance of using control hospitals and controlling for baseline performance in any analysis of the impact of hospital P4P. This study’s limitations are its focus on only one of the clinical areas included in PHQID and its narrow focus on patients with non-ST-segment elevation myocardial infarction. In addition, since the hospitals included in the study voluntarily participated in CRUSADE, it is not known whether hospitals would demonstrate the same level of performance improvement if participation were not voluntary.
