An Environmental Scan of Pay for Performance in the Hospital Setting: Final Report. Hospital Experiences with Premier PHQID


Among Premier hospitals that were voluntarily participating in PHQID, we found broad agreement that their decision to participate reflected a desire to “get in at the start to hopefully shape it” and a recognition that “P4P is coming, and it is a way to gain experience.” Some of the Premier hospitals that were eligible to participate but had declined indicated that they were shadowing the PHQID project by collecting the same data and investing in quality improvement activities. They felt that it was important for them to do so to be prepared when P4P became a reality for all hospitals. Interestingly, among the subset of PHQID hospitals with which we spoke, many stated that the possibility of financial incentive was a negligible factor in their decision to participate in the demonstration. 

While P4P and P4R Are Leading to Behavior Change Among Hospitals, the ROI Is Unclear. PHQID participants stated that the P4P demonstration is driving improvements in the care they provide but that it has required them to allocate significant staff and resources to meet program requirements. This sentiment was echoed by hospitals in the RHQDAPU program. Hospitals felt that incentive payments (actual or potential) did not offset the costs they were incurring to participate. Among the hospitals in the RHQDAPU program, a number noted that the cost of participation exceeded the 0.4 percent update they could receive for reporting, although they noted this might change when CMS increased the update factor tied to public reporting to 2 percent. One hospital commented that “you’ve got to make it worth people’s time to do these things.” Several hospitals expressed the importance of having CMS help hospitals see the link between doing better on the quality measures and a positive ROI, such as reductions in costs, lengths of stay, and readmissions.

The PHQID Incentive Payment Structure Creates Cliff Effects and Penalizes Hospitals That Perform Well. The Premier demonstration payment structure provided financial rewards only to hospitals that performed in the top two deciles of performance, based on a relative comparison of performance among hospital participants in each year of the program. Across the board, hospital participants expressed dislike for the design of the incentive structure. They noted it created a cliff effect (all or nothing payment) by rewarding hospitals at or above the 80th percentile performance and not rewarding any hospital that fell below this cut point—even when there was no statistical difference in their performance. Hospitals felt they were being penalized unfairly under a relative scoring method when most hospitals were scoring at or close to 100 percent—which occurred for several of the performance indicators that had effectively topped out. One hospital cited, as an example, that for aspirin at arrival, the top four decile groups had effectively achieved 100 percent compliance with the performance measure, yet only the top two deciles were paid incentive dollars. Several hospitals questioned the value of having hospitals expend substantial resources chasing the top tail of the performance distribution when performance scores were so tightly clustered to the top right end of the distribution, expressing a belief that the relative benefit to patients was small and that it effectively was causing hospitals to divert resources that could be deployed to lower-performing areas that were not incentivized.

Over time, as providers make improvements, the compression of performance scores toward the top end of the performance distribution (i.e., the ceiling effect) will present challenges to P4P program sponsors that seek to differentiate providers on a relative performance basis. Common remarks by hospitals included: “All should get the bonus if they achieve top levels of performance,” and “Rewarding the top two deciles is meaningless when the scores are so compressed at the top end.” Other hospital comments reflected frustration with the relative performance incentive structure, for example: “Every time we do better the bar gets higher” (the hospital noted that it was effectively 100 percent on some measures and got no incentive dollars); “Funding [is] only for [the] top 20 percent of hospitals, so 80 percent are spending dollars to improve and getting nothing in return.”

Another reason why hospitals expressed a dislike for using a relative incentive structure is that this approach creates uncertainty about what level of performance is required to win. One hospital said, “The performance bar is constantly shifting up, and it is an unknown to hospitals.” Only at the close of the year, after the hospitals are arrayed in the rank order of their performance, does a hospital know what level of performance was required to hit the 80th percentile of performance to win. Hospitals and their professional associations expressed a strong preference for using an absolute performance threshold as the basis for determining whether a hospital would receive an incentive payment. The absolute threshold was viewed as a preferred approach to structuring an incentive payment because it is “predictable,” “allows a hospital to know in advance what performance target [it] would need to hit,” and “allows all who meet the threshold to secure the bonus.” 
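The contrast hospitals drew between a relative cut point and an absolute threshold can be illustrated with a minimal sketch. The ten scores, the hospital labels, and the 99.0 percent threshold below are hypothetical values chosen for illustration only; they are not data from the demonstration, and the ranking rule is an assumption about how a top-two-decile cutoff might be applied.

```python
# Hypothetical compliance scores (percent) for ten hospitals on one
# topped-out measure. These values are illustrative, not PHQID data.
scores = {
    "A": 100.0, "B": 100.0, "C": 99.8, "D": 99.7, "E": 99.5,
    "F": 99.1, "G": 98.4, "H": 97.0, "I": 93.5, "J": 88.0,
}

def relative_winners(scores, top_fraction=0.2):
    """Hospitals ranked in the top fraction of participants.

    The cut point is only known after every hospital has been ranked.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_winners = int(len(ranked) * top_fraction)
    return set(ranked[:n_winners])

def absolute_winners(scores, threshold):
    """Hospitals meeting a threshold announced in advance (assumed value)."""
    return {name for name, score in scores.items() if score >= threshold}

relative = relative_winners(scores)        # top two deciles
absolute = absolute_winners(scores, 99.0)  # hypothetical fixed target

print("Relative (top 20%):", sorted(relative))
print("Absolute (>= 99.0):", sorted(absolute))
```

Under these assumed scores, hospital C misses the relative payout by 0.2 percentage points even though its performance is nearly identical to that of the winners, while a fixed threshold would let every hospital that meets the target know in advance that it qualifies.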

Hospitals also expressed support for establishing a lower threshold in order to be able to qualify for an incentive. It was noted that this threshold should “increase as more institutions met the minimum bar.” Our discussions found lukewarm support among individual hospitals for paying for improvement: “Hospitals should meet a minimum standard of excellence to be allowed to care for patients, so you don’t want to pay for improvement that occurs below this threshold.” Hospital associations, however, strongly supported paying on the basis of improvement.

At This Stage, It Is Unclear Whether PHQID Is Causing Unintended Consequences. While most hospitals stated they did not believe the focus on a limited set of performance measures has led to unintended consequences, such as ignoring other clinical areas, they did say that limited staff and financial resources had caused them to focus heavily on what was being measured and rewarded—providing support to those who claim financial incentives promote teaching to the test. Most hospitals said they either did not know whether negative consequences were occurring or were not specifically tracking them. One hospital remarked, “If anything, PHQID has increased activity and focus, and other quality improvement investments are being made, such as EHRs, CPOE, and use of intensivists, which will drive improvements across the board, not just on those things being incentivized.”

Hospital associations commented that they were aware of one unintended consequence associated with the “antibiotic timing” measure for pneumonia (i.e., percentage of pneumonia patients who have received the first dose of antibiotics within four hours after hospital arrival), which is a measure for PHQID and RHQDAPU. In an effort to do well on this measure, some hospitals may have been over-prescribing antibiotics to patients who did not have pneumonia, giving them the antibiotic within the four-hour window before a diagnosis of pneumonia could be confirmed. There is concern that the overuse of antibiotics will increase resistance to the drug in the future. As a result, this measure has been pulled from the measure set and is being respecified. Hospitals, while unable to cite specific examples, expressed concern that the relative incentive structure could lead to such unintended consequences as gaming of the data or hospitals chasing the very top end of the performance distribution by increasing a performance rate from 98 percent compliance to 100 percent with little to no clinical benefit, just to secure the incentive dollars. Several hospitals stated that because hospital margins are very thin, hospitals will chase the dollars.

The Reporting Burden Is Significant. Hospitals emphasized that the reporting burden for hospitals to comply with PHQID and/or RHQDAPU is significant given that data collection is still largely a manual exercise requiring chart abstraction. This was found to be true even in larger institutions having more information technology (IT) resources. EHRs and CPOE are not yet designed to provide data to populate measures such as those in PHQID, RHQDAPU, or other nationally endorsed measurement sets. Most EHRs capture relevant information in text fields; so even when EHRs are available, a text search must be done to determine if an event occurred. Hospitals universally felt that the data collection burden should be an important selection criterion for P4R and P4P programs. There was also consensus on the need to align measures and measure specifications to minimize data collection and reporting burdens—although it was also noted that the problem was less about alignment of specifications and more about getting the various stakeholders to align on what they want to hold providers accountable for. However, it is important to note that even though CMS allowed sampling of patient records to minimize the hospital reporting burden, many large hospitals reported that they did not use the sampling method, citing a need to have 100 percent of the cases to do their quarterly quality improvement work with doctors. These hospitals stated that the small number of sampled cases showed results that were too variable and did not provide a reliable source of information to give to doctors.

The Problem of Small Numbers Exists. The problem of only a small number of patients meeting the measure criteria was also raised, primarily by small hospitals, including rural hospitals and CAHs. Estimates of performance based on a small number of events (i.e., patients who receive appropriate processes of care) are not stable and vary substantially from period to period, making the task of separating out the “signal” (true performance) from the “noise” (random variation) a challenging one. Hospitals with small numbers of patients cited challenges in interpreting and using results that showed large variation from period to period. Among the smaller hospitals, there was agreement that “we should only be measured on what we actually do.” Smaller hospitals thought that CMS should work to construct measures that more readily apply to the care they provide, such as transfers. When asked whether hospitals would support the use of composites to help with the small-numbers problem, there was no strong signal of support. However, this response may have stemmed from a lack of understanding about how the composites might be constructed. There was, in contrast, strong support for risk adjustment to ensure comparability across hospitals.
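The instability of estimates based on few patients can be sketched with the standard error of a proportion. The true compliance rate of 90 percent and the patient counts below are hypothetical, and the normal-approximation interval is an illustrative assumption rather than a method described in this report; it is known to be rough when counts are small or rates are near 100 percent.

```python
import math

def standard_error(rate, n_patients):
    """Standard error of an observed proportion for n eligible patients."""
    return math.sqrt(rate * (1.0 - rate) / n_patients)

def approx_interval_half_width(rate, n_patients, z=1.96):
    """Half-width of an approximate 95 percent interval (normal approximation)."""
    return z * standard_error(rate, n_patients)

true_rate = 0.90  # hypothetical underlying performance

# Hypothetical eligible-patient counts for a small and a large hospital.
for n in (10, 400):
    half_width = approx_interval_half_width(true_rate, n)
    print(f"n={n}: observed rate could plausibly vary by about "
          f"+/- {100 * half_width:.1f} percentage points")
```

With the same assumed underlying performance, the measured rate for the hospital with 10 eligible patients would be expected to swing far more from period to period than the rate for the hospital with 400, which is the signal-versus-noise difficulty the smaller hospitals described.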

Measures of Outpatient Hospital Services Are Not Being Used at This Stage. None of the hospitals or hospital associations with which we spoke reported measures of outpatient hospital services being included in any P4P or P4R program to which they had been exposed, although several of the hospitals exposed to the private-sector P4P program noted that its sponsor was beginning to discuss with hospitals how such measures might be developed. There was general agreement that services—visits, procedures, and tests—provided in the outpatient hospital setting represented a substantial portion of care for which there currently is no accountability. Hospitals noted that outpatient hospital services have been a huge revenue growth area, and some reported seeing “much utilization that seems questionable.” While hospitals recognized that a large amount of care is delivered in this setting, they cited many challenges with developing performance measures and holding hospitals accountable given that data are less standardized on the outpatient side, and the mix of services delivered in this setting varies substantially across institutions.

Support for Having a Robust Data Validation Process Is Strong. Hospitals universally agreed that data validation is a critical feature of P4P programs. Hospitals were concerned about possible gaming, especially if there is “too much money on the table and people start panicking,” and believed that an audit function was needed to guard against this behavior. An attestation-type approach to data validation, such as the process the Leapfrog Group uses, was not viewed as sufficiently rigorous for situations in which money is tied to performance. Hospitals expressed frustration with the substantial lag in the current validation processes—minimally six to nine months for PHQID, and 12 months before RHQDAPU results are posted on Hospital Compare—which slows down the process for getting feedback for CQI and public reporting. Hospitals stated a need for more-frequent updates—within three months of data submission—with comparisons to peers/benchmarks for use in quality improvement activities.

Transparency of Performance Results Is Viewed as a Positive. Hospitals indicated that they thought public reporting of performance on the hospital measures was good and that it has forced their doctors to pay attention and get engaged. One hospital noted that “an external force doing measurement and reporting is our key lever (other than relational) with doctors to get them to change their behavior.” Another noted that “it says someone is watching.” Only a few hospitals said that “reporting hasn’t been a factor in driving behavior changes.” Most hospitals stated that public reporting of their results compared with those of their peers has garnered the attention of their hospital boards and stimulated investment in quality improvement, noting that “no one wants to be at the bottom of the list.” Hospitals preferred that if the RHQDAPU program evolved into a P4P program, a pilot or dry-run period of data collection occur prior to public reporting and payouts. 

Although hospital leadership and physicians are internally paying attention to the comparative results, hospitals seemed to be unsure about whether consumers really use the information. Many hospitals thought that the CMS Hospital Compare website should be simplified to make it easier for consumers to use. There was no consensus among hospitals about what would be the appropriate comparison group of hospitals or whether one is even needed for public reporting of results. One hospital stated: “The consuming public needs to know if a hospital will provide adequate care, so the focus should be on whether the hospital hits a threshold target [rather than] comparing one hospital to another.” Another hospital thought that regional comparisons would be helpful to consumers “who won’t be traveling to other states for care.”

Hospitals Are Encountering Certain Challenges. Many hospitals stated that it was difficult to get physicians to change their behavior regarding actions called for in the performance measures and that they felt as though they were serving as a go-between for CMS and the physician. Hospitals thought they had little leverage to affect physician behavior other than having good relationships. The current prohibition on gain sharing precludes hospitals from structuring provider financial incentives within their organizations, thus hindering their ability to motivate physicians to engage in the P4R and P4P programs (“A slow process until MD incentives are also aligned.” “Physician and hospital P4P programs shouldn’t be separate”).

Having to work with and win over doctors was a common theme in our discussions with hospitals (“Doctors don’t like hospitals telling them what to do.” “Doctors don’t like to practice cookbook medicine”).

Some hospitals reported that in response to the challenges of engaging physicians, they had developed solutions to force behavior change, such as creating admission and discharge forms that prompt doctors for information and/or to do required things, creating standing clinical protocols, and structuring clinical treatment paths differently. Hospitals appeared to be developing unique interventions rather than implementing a one-size-fits-all approach to driving improvements in care. It was noted that making P4P and quality improvement work requires a lot of coordination across departments. 

Hospitals also noted that involvement in these programs requires a lot of staff resources for data collection and validation and quality improvement. Several remarked that to succeed in these programs, a hospital needs infrastructure and multidisciplinary teams, two things not available in smaller community hospitals and hospitals in rural areas, where there are no dedicated staff to perform these functions and “the CEO is often wearing several hats within the organization.”

On the subject of data submissions and the validation process, hospitals expressed broad appreciation for the important “assistance” role that Premier played as a “go-to” entity. The feeling was that Premier provided an important support function related to a hospital’s ability to comply with the program requirements.

Hospitals cited struggles faced because of ongoing changes in the evidence without corresponding changes in what hospitals are held accountable for. They reported that their physicians had made changes in practice consistent with new evidence, even though the hospitals were still required to comply with measure specifications that reflected out-of-date evidence. Hospitals urged that P4R and P4P program sponsors work to address, in a timely manner, changes in the evidence and what hospitals are held accountable for.
