Editor’s note: In addition to Steffie Woolhandler and Dan Ariely (photos and linked bios above), this post is authored by David Himmelstein, a professor at the CUNY School of Public Health at Hunter College.

Paying for performance (P4P) has strong intuitive appeal.  Common sense and rigorous studies tell us that paying more for, say, angioplasties or immunizations yields more of them.  So paying doctors and hospitals for better care, not just more of it, seems like a no-brainer.  Yet while Medicare and many private insurers are charging ahead with P4P, researchers have been unable to show that it benefits patients.

Findings from the new field of behavioral economics may explain these negative results.  They challenge the traditional economic view that monetary reward is either the only motivator or is simply additive to intrinsic motivators such as purpose or altruism.  Studies have shown that monetary rewards can undermine motivation and worsen performance on cognitively complex and intrinsically rewarding work, suggesting that P4P may backfire.

The Research Record On P4P

Researchers have failed to demonstrate that financial incentives can improve patient outcomes, and not for lack of trying.  Reviews of early, mostly small P4P studies found virtually no evidence of global quality improvement; mixed evidence on improvement on incentivized process-based measures; and occasional unintended harms.

Two Cochrane reviews appearing in 2011 reached similarly agnostic conclusions.  One overview found that “financial incentives may be effective in changing health care professional practice”, but unearthed “no evidence that financial incentives can improve patient outcomes.”  Another, focused on primary care, found “insufficient evidence to support or not support the use of financial incentives.”

The latest findings are no more reassuring.  In Britain’s massive P4P initiative in primary care, after early apparent success, improvement plateaued for incentivized performance measures and quality deteriorated for non-incentivized measures like continuity of care.  Although doctors reported meeting virtually all P4P hypertension targets (including surrogate outcome measures that were incentivized), neither population blood pressures nor hypertension complications fell.

The major U.S. P4P experiment also yielded a null result.  In Medicare’s Premier Hospital Quality Incentive Demonstration, the 200 participating hospitals’ process-of-care quality indicators improved more rapidly than control hospitals’ over the first two years, according to one oft-cited study.  But differences between P4P and control hospitals had evaporated by five years and patient outcomes didn’t improve at all.  Incentives specially targeted to low-performing hospitals were also ineffective.

No one has undertaken a large-scale randomized controlled trial (RCT) that might definitively determine the effect of P4P in a health care setting.  However, researchers have completed two trials of the impact of financial incentives on professional performance in education, a milieu with similarities to health care.  A $75 million RCT, involving over 200 high-needs New York City schools employing more than 20,000 teachers, offered incentives of up to $3,000 per teacher based on students’ test scores, graduation and attendance rates, and the results of learning environment surveys.  Notably, most schools opted to pool bonuses among all teachers at the site, the type of institution-level incentive that some P4P proponents advocate.  Yet, “. . . incentives . . . did not increase student achievement in any meaningful way.  If anything, student achievement declined.”  And bigger teacher bonuses yielded no better results.  In a Tennessee RCT, offering middle school mathematics teachers P4P bonuses of up to $15,000 failed to raise standardized test scores.

Of course the absence of proof of P4P’s effectiveness does not prove that it’s ineffective.  But, as with other clinical innovations, the mounting number of null studies should breed skepticism.  Moreover, evidence suggests that ubiquitous gaming of quality measurement (e.g. by upcoding diagnoses) may uncouple reward from actual performance; even good people (including doctors and hospital leaders) cheat a little bit when they stand to gain from it, while deceiving themselves into believing they’re honest.

Nonetheless, enthusiasm for P4P remains strong.  Medicare is moving ahead with P4P programs for hospitals, ACOs, HMOs and physicians, and major private insurers are following suit.  Even skeptical scholars have focused mostly on technical specification problems, e.g. identifying better performance yardsticks and the right mix of incentives, or fine-tuning risk adjustment.  Few have countenanced the possibility that P4P may simply not work in health care.

The Science Of Performance And Reward

The quality improvement literature has pinpointed many causes of quality breaches in medical care: fatigue; poorly designed workflow and care systems; undue commercial influence; knowledge gaps; memory lapses; reliance on inappropriate heuristics; poor interpersonal skills and insufficient teamwork, to name just a few.  But “not trying” is rarely cited.  Yet P4P implicitly blames lack of motivation for poor quality care.

But even when motivation is the problem, money isn’t always the solution.  Findings from the new field of behavioral economics indicate that performance bonuses often backfire, particularly for cognitively challenging work.

Traditionally, economists have viewed extrinsic (i.e. monetary) reward as either the only motivator (Figure 1a), or as simply additive to intrinsic motivators such as purpose, altruism, mastery, or autonomy (Figure 1b).  According to this view, higher pay induces better performance. (Figures appear at the end of this post.)

But this simple model of reward-induced performance ignores the complexity of human drive, particularly the role of intrinsic motivation — the desire to perform an activity for its own inherent rewards.  Offering your dinner party host a $10 reward for cooking a wonderful meal isn’t likely to motivate future invitations.

Experimental data show that financial incentives often “crowd out” intrinsic motivation.  For instance, college students will spontaneously play with interesting puzzles, but once they’re paid to solve them they lose interest in playing for free.

Among frequent (presumably highly motivated) blood donors, an incentive payment (about $55 in today’s dollars) decreased donations in an RCT.  In contrast, payments increased donations among those who hadn’t donated for years.  A Swiss study of volunteer work reached a similar conclusion; unpaid volunteers worked, on average, four hours more monthly than those offered a small payment.

Financial incentives also had untoward consequences in an RCT in Israeli day care centers.  In centers that imposed fines on parents for picking up children late, tardiness increased, and remained high even after the fines were eliminated.  Fines had transformed promptness from a moral duty to a market transaction governed by price.

Moreover, RCTs have shown that upping the rewards may not overcome motivational crowd-out.  In an experiment carried out among MIT students (at semester’s end, when many were cash-strapped) those offered up to $300 for solving mathematical puzzles performed much worse than students offered only $30.  (In contrast, the highly incentivized students did better on simple tasks requiring only manual effort.)  Huge incentives offered to rural villagers in India — equivalent to about half of their annual money income — worsened performance on complex memory and puzzle-solving tasks.  High stakes incentives may be distracting, interfering with cognitive focus and creativity.

A meta-analysis summarizing 128 studies indicates that such findings are representative of a consistent body of research.  The conclusions that emerge from the extensive literature on motivational crowd-out include:

  • Tangible rewards — particularly monetary ones — undermine motivation for tasks that are intrinsically interesting or rewarding, an effect that is quite large.
  • Symbolic rewards (e.g. praise or flowers) do not crowd out intrinsic motivation, and may augment it.
  • The negative effects of monetary rewards are strongest for complex cognitive tasks.
  • Crowding-out effects tend to reduce reciprocity and augment selfish behaviors.
  • Crowding-out may spread (both to other tasks and to co-workers), decreasing intrinsic motivation for work not directly incentivized by the monetary rewards.
  • Crowding-out is strongest when external rewards are large; perceived as controlling; contingent on very specific task performance; or associated with surveillance, deadlines or threats.

Although none of these studies analyzed physician or hospital performance, most conditions shown to weaken intrinsic motivation are integral to medical P4P.

Finally, as indicated graphically in Figure 1c, motivational crowd-out works in the opposite direction to the standard supply curve, where performance rises with price.   The net effect of financial rewards depends on the relative size of the price effect and the crowding-out effect.   When crowding-out is modest, the classic economic model underlying P4P holds; you get what you pay for.  However, if intrinsic motivation is high and crowding-out is strong, payment may worsen performance.
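The tug-of-war between the price effect and the crowding-out effect can be made concrete with a toy calculation.  The sketch below is purely illustrative: the linear functional form, the function name `performance`, and all parameter values are assumptions made for this example, not estimates from any of the studies discussed here.

```python
def performance(pay, intrinsic, price_effect, crowd_out):
    """Toy linear model of performance under a financial incentive.

    All names and the linear form are illustrative assumptions.
    pay          -- size of the financial incentive
    intrinsic    -- performance driven by intrinsic motivation at zero pay
    price_effect -- performance gained per unit of pay (the supply-curve effect)
    crowd_out    -- share of intrinsic motivation lost per unit of pay
    """
    # Payment erodes intrinsic motivation, which cannot fall below zero.
    remaining_intrinsic = max(0.0, intrinsic * (1 - crowd_out * pay))
    # Net performance is what is left of intrinsic drive plus the price effect.
    return remaining_intrinsic + price_effect * pay


baseline = performance(pay=0, intrinsic=10, price_effect=2, crowd_out=0.5)
modest = performance(pay=1, intrinsic=10, price_effect=2, crowd_out=0.05)
strong = performance(pay=1, intrinsic=10, price_effect=2, crowd_out=0.5)

# With modest crowding-out, paying raises performance above the unpaid baseline;
# with strong crowding-out, the same payment leaves performance below it.
print(baseline, modest, strong)
```

In this toy model, as long as some intrinsic motivation remains, payment lowers performance whenever `crowd_out * intrinsic` exceeds `price_effect`, which is why the risk is greatest where intrinsic motivation is high to begin with.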

Contract Theory And P4P

Until recently doctors’ and hospitals’ payment contracts specified only the general parameters of the exchange (e.g. spend 30 minutes with the patient, or provide a day of ICU care for a heart attack patient).  Most details and unexpected contingencies were covered by social and professional norms.

In contrast to these so-called “incomplete contracts”, P4P strives to cement the deal with an airtight agreement specifying all deliverables in advance – a more “complete contract”.  Yet when it comes to contractual detail, more may not be better.

The optimal specificity of contracts has interested economists at least since Ronald Coase’s 1937 paper on the nature of firms, work that was recognized with a Nobel economics prize in 1991 and laid the foundation for Oliver Hart’s pioneering work on incomplete contracts.

Coase and Hart noted the exorbitant administrative and legal costs of spelling out and enforcing complete contracts.  (Indeed, they posited that these transactional inefficiencies drive entrepreneurs to form firms rather than outsourcing all tasks.)  In medicine, the increasing specificity of contracts — a trend that predates P4P — has coincided with a sharp rise in administrative costs.

Costly administration is not the only downside of complete contracts.  If something is omitted from an exquisitely detailed agreement there’s no presumption of default to goodwill; it’s happy hunting season.  When one of us (DA) asked the Dean of Duke’s Law School about its honor code, he replied that it amounted to little more than “don’t do anything dishonorable”.  Lists of rules (“don’t raise chickens in your dorm room; don’t smoke hashish”) implicitly permit everything else.

Moreover, highly prescriptive contracts have a behavioral downside.  Because professionals may (correctly) perceive detailed contracts as controlling, such contracts tend to worsen motivational crowd-out.  When specifying every detail and contingency isn’t possible, as is clearly the case in medicine, it may be better to rely on professional and social norms.


None can doubt health care’s grave quality deficits and cost excesses.  As a remedy, P4P suggests manipulating greed, a fuel that’s powered exponential growth in productivity in the overall economy.  But Adam Smith, who first recognized greed’s awesome power, was also a moral philosopher who believed that commodity production required a parallel public service economy driven by social duty.

Sadly, greed has caused many of the worst abuses within the current system.  Injecting different monetary incentives into health care can certainly change it, but not necessarily in the ways that policy makers would plan, much less hope for.

Figures: