As of June 2007, there were only nine studies on the impact of hospital P4P programs, one of which was not peer reviewed. All of these studies evaluated programs that targeted the inpatient setting, and none examined P4P interventions in the hospital outpatient setting. Among the studies examining changes in performance, each reported improvements over time in at least some of the hospital performance measures or condition-specific composites it included; however, it is difficult to disentangle the P4P effect from the effect of other quality improvement efforts that were occurring simultaneously. Improvements in hospital performance have been observed in response to feedback reports (Williams et al., 2005) and public reporting with a financial incentive for submitting data (Grossbart, 2006; Lindenauer et al., 2007).
Two of the studies with control groups saw very modest improvements in performance associated with P4P compared with what was accomplished with public reporting (Grossbart, 2006; Lindenauer et al., 2007), while a third saw improvements in a few performance areas associated with P4P compared with what was seen for control hospitals participating in voluntary quality improvement activities (Glickman et al., 2007). It has been argued, however, that to accomplish sustained quality improvement, interventions should be multifaceted and focus on different levels of the health care system (Grol et al., 2002; Grol and Grimshaw, 2003). This implies that to be most effective, P4P should be partnered with other activities, such as public reporting and internal quality improvement activities, that also encourage quality improvement for the same clinical area.
There is less evidence of the effect of P4P on patient outcomes. Berthiaume et al. (2006) found improvements in complication rates for obstetrical and surgical patients in an uncontrolled study but did not report whether those improvements were statistically significant. Glickman et al. (2007) did not find significant differences in inpatient mortality improvement for AMI between PHQID and control hospitals. None of the studies evaluating PHQID separately analyzed the other patient outcome measures included in the program (for coronary bypass surgery and hip and knee replacement surgery), so it is not clear whether improvements occurred in these measures.
Most of the published studies have significant methodological limitations. Six of the nine had no controls, which are critical for providing evidence of a link between P4P and performance improvements. This is particularly important given the documented temporal trend toward increasing performance on many hospital quality metrics. When all activities are focused on the same clinical conditions, it is challenging to disentangle the effects of the increasing use of financial incentives from the effects of greater use of quality improvement initiatives at the local and national levels and of the increasing use of public reporting. One of the studies that used a control group included only six control hospitals, and it is unclear whether the controls used were appropriate.
Beyond the specific limitations of the nine studies, another important issue is whether the experience of these programs would reflect the experience of wholesale national implementation of a hospital P4P program by Medicare. The programs studied were geographically confined and took place in the context of established relationships between the individual hospitals and the program sponsors. Medicare is the largest payer of inpatient care in the nation, accounting for 30.4 percent of third-party payments for hospital expenditures (CMS, 2007b). Given the importance of this revenue source for hospitals, it is possible that the level of engagement by hospitals in a national P4P program would be higher than that experienced in the programs in Michigan and Hawaii. In both Hawaii and Michigan, however, the incentive program was administered by the dominant commercial payer in the state. Another issue to consider when interpreting the impact of these smaller P4P programs and demonstrations is that they generally focus on a small set of process measures covering a handful of diagnoses. It is unknown what the impact on quality performance more broadly might be if Medicare were to adopt a more comprehensive set of measures.