Will Pay For Performance Backfire? Insights From Behavioral Economics

October 11th, 2012

Editor’s note: In addition to Steffie Woolhandler and Dan Ariely (photos and linked bios above), this post is authored by David Himmelstein, a professor at the CUNY School of Public Health at Hunter College.

Paying for performance (P4P) has strong intuitive appeal.  Common sense and rigorous studies tell us that paying more for, say, angioplasties or immunizations yields more of them.  So paying doctors and hospitals for better care, not just more of it, seems like a no-brainer.  Yet while Medicare and many private insurers are charging ahead with P4P, researchers have been unable to show that it benefits patients.

Findings from the new field of behavioral economics may explain these negative results.  They challenge the traditional economic view that monetary reward is either the only motivator or is simply additive to intrinsic motivators such as purpose or altruism.  Studies have shown that monetary rewards can undermine motivation and worsen performance on cognitively complex and intrinsically rewarding work, suggesting that P4P may backfire.

The Research Record On P4P

Researchers have failed to demonstrate that financial incentives can improve patient outcomes, and not for lack of trying.  Reviews of early, mostly small P4P studies found virtually no evidence of global quality improvement; mixed evidence on improvement on incentivized process-based measures; and occasional unintended harms.

Two Cochrane reviews appearing in 2011 reached similarly agnostic conclusions.  One overview found that “financial incentives may be effective in changing health care professional practice”, but unearthed “no evidence that financial incentives can improve patient outcomes.”  Another, focused on primary care, found “insufficient evidence to support or not support the use of financial incentives.”

The latest findings are no more reassuring.  In Britain’s massive P4P initiative in primary care, after early apparent success, improvement plateaued for incentivized performance measures and quality deteriorated for non-incentivized measures like continuity of care.  Although doctors reported meeting virtually all P4P hypertension targets (including surrogate outcome measures that were incentivized), neither population blood pressures nor hypertension complications fell.

The major U.S. P4P experiment also yielded a null result.  In Medicare’s Premier Hospital Quality Incentive Demonstration, the 200 participating hospitals’ process-of-care quality indicators improved more rapidly than control hospitals’ over the first two years, according to one oft-cited study.  But differences between P4P and control hospitals had evaporated by five years and patient outcomes didn’t improve at all.  Incentives specially targeted to low-performing hospitals were also ineffective.

No one has undertaken a large-scale randomized controlled trial (RCT) that might definitively determine the effect of P4P in a health care setting.  However, researchers have completed two trials of the impact of financial incentives on professional performance in education, a milieu with similarities to health care.  A $75 million RCT, involving over 200 high-needs New York City schools employing more than 20,000 teachers, offered incentives of up to $3,000 per teacher based on students’ test scores, graduation and attendance rates, and the results of learning environment surveys.  Notably, most schools opted to pool bonuses among all teachers at the site, the type of institution-level incentive that some P4P proponents advocate.  Yet, “. . . incentives . . . did not increase student achievement in any meaningful way.  If anything, student achievement declined.”  And bigger teacher bonuses yielded no better results.  In a Tennessee RCT, offering middle school mathematics teachers P4P bonuses of up to $15,000 failed to raise standardized test scores.

Of course the absence of proof of P4P’s effectiveness does not prove that it’s ineffective.  But, as with other clinical innovations, the mounting number of null studies should breed skepticism.  Moreover, evidence suggests that ubiquitous gaming of quality measurement (e.g. by upcoding diagnoses) may uncouple reward from actual performance; even good people (including doctors and hospital leaders) cheat a little bit when they stand to gain from it, while deceiving themselves into believing they’re honest.

Nonetheless, enthusiasm for P4P remains strong.  Medicare is moving ahead with P4P programs for hospitals, ACOs, HMOs and physicians, and major private insurers are following suit.  Even skeptical scholars have focused mostly on technical specification problems, e.g. identifying better performance yardsticks and the right mix of incentives, or fine-tuning risk adjustment.  Few have countenanced the possibility that P4P may simply not work in health care.

The Science Of Performance And Reward

The quality improvement literature has pinpointed many causes of quality breaches in medical care: fatigue; poorly designed workflow and care systems; undue commercial influence; knowledge gaps; memory lapses; reliance on inappropriate heuristics; poor interpersonal skills and insufficient teamwork, to name just a few.  But “not trying” is rarely cited.  Yet P4P implicitly blames lack of motivation for poor quality care.

But even when motivation is the problem, money isn’t always the solution.  Findings from the new field of behavioral economics indicate that performance bonuses often backfire, particularly for cognitively challenging work.

Traditionally, economists have viewed extrinsic (i.e. monetary) reward as either the only motivator (Figure 1a), or as simply additive to intrinsic motivators such as purpose, altruism, mastery, or autonomy (Figure 1b).  According to this view, higher pay induces better performance. (Figures appear at the end of this post.)

But this simple model of reward-induced performance ignores the complexity of human drive, particularly the role of intrinsic motivation — the desire to perform an activity for its own inherent rewards.  Offering your dinner party host a $10 reward for cooking a wonderful meal isn’t likely to motivate future invitations.

Experimental data document that financial incentives often “crowd out” intrinsic motivation.  For instance, college students will spontaneously play with interesting puzzles, but once they’re paid to solve them they lose interest in playing for free.

Among frequent (presumably highly motivated) blood donors, an incentive payment (about $55 in today’s dollars) decreased donations in an RCT.  In contrast, payments increased donations among those who hadn’t donated for years.  A Swiss study of volunteer work reached a similar conclusion; unpaid volunteers worked, on average, four hours more monthly than those offered a small payment.

Financial incentives also had untoward consequences in an RCT in Israeli day care centers.  In centers that imposed fines on parents for picking up children late, tardiness increased, and remained high even after the fines were eliminated.  Fines had transformed promptness from a moral duty to a market transaction governed by price.

Moreover, RCTs have shown that upping the rewards may not overcome motivational crowd-out.  In an experiment carried out among MIT students (at semester’s end, when many were cash-strapped) those offered up to $300 for solving mathematical puzzles performed much worse than students offered only $30.  (In contrast, the highly incentivized students did better on simple tasks requiring only manual effort.)  Huge incentives offered to rural villagers in India — equivalent to about half of their annual money income — worsened performance on complex memory and puzzle-solving tasks.  High stakes incentives may be distracting, interfering with cognitive focus and creativity.

A meta-analysis summarizing 128 studies indicates that such findings are representative of a consistent body of research.  The conclusions that emerge from the extensive literature on motivational crowd-out include:

  • Tangible rewards — particularly monetary ones — undermine motivation for tasks that are intrinsically interesting or rewarding, an effect that is quite large.
  • Symbolic rewards (e.g. praise or flowers) do not crowd out intrinsic motivation, and may augment it.
  • The negative effects of monetary rewards are strongest for complex cognitive tasks.
  • Crowding-out effects tend to reduce reciprocity and augment selfish behaviors.
  • Crowding-out may spread (both to other tasks and to co-workers), decreasing intrinsic motivation for work not directly incentivized by the monetary rewards.
  • Crowding-out is strongest when external rewards are large; perceived as controlling; contingent on very specific task performance; or associated with surveillance, deadlines or threats.

Although none of these studies analyzed physician or hospital performance, most conditions shown to weaken intrinsic motivation are integral to medical P4P.

Finally, as indicated graphically in Figure 1c, motivational crowd-out works in the opposite direction to the standard supply curve, where performance rises with price.   The net effect of financial rewards depends on the relative size of the price effect and the crowding-out effect.   When crowding-out is modest, the classic economic model underlying P4P holds; you get what you pay for.  However, if intrinsic motivation is high and crowding-out is strong, payment may worsen performance.
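The tug-of-war between the price effect and the crowding-out effect can be sketched numerically. What follows is only a toy illustration, not the model behind Figure 1c: the linear price effect, the exponentially decaying intrinsic term, and every name and value (`net_performance`, `intrinsic`, `price_effect`, `crowd_out`) are assumptions chosen to show how the net effect of payment can change sign.

```python
import math


def net_performance(payment, intrinsic, price_effect, crowd_out):
    """Toy model of performance under a monetary incentive.

    All functional forms here are illustrative assumptions:
    - intrinsic motivation decays exponentially as payment rises
      (the crowding-out effect);
    - the price effect adds performance linearly with payment
      (the standard supply-curve effect).
    """
    remaining_intrinsic = intrinsic * math.exp(-crowd_out * payment)
    return remaining_intrinsic + price_effect * payment


# Hypothetical values: the same worker and payment, differing only in
# how strongly payment crowds out intrinsic motivation.
baseline = net_performance(payment=0, intrinsic=10, price_effect=1, crowd_out=0.5)
modest = net_performance(payment=5, intrinsic=10, price_effect=1, crowd_out=0.01)
strong = net_performance(payment=5, intrinsic=10, price_effect=1, crowd_out=0.5)

print(f"no payment:           {baseline:.2f}")
print(f"modest crowding-out:  {modest:.2f}")
print(f"strong crowding-out:  {strong:.2f}")
```

With these made-up parameters, payment raises performance above the unpaid baseline when crowding-out is modest and lowers it when crowding-out is strong, mirroring the two cases described above. The numbers carry no empirical weight.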

Contract Theory And P4P

Until recently doctors’ and hospitals’ payment contracts specified only the general parameters of the exchange (e.g. spend 30 minutes with the patient, or provide a day of ICU care for a heart attack patient).  Most details and unexpected contingencies were covered by social and professional norms.

In contrast to these so-called “incomplete contracts”, P4P strives to cement the deal with an airtight agreement specifying all deliverables in advance – a more “complete contract”.  Yet when it comes to contractual detail, more may not be better.

The optimal specificity of contracts has interested economists at least since Ronald Coase’s 1937 paper on the nature of firms, work that was recognized with a Nobel economics prize in 1991 and laid the foundation for Oliver Hart’s pioneering work on incomplete contracts.

Coase and Hart noted the exorbitant administrative and legal costs of spelling out and enforcing complete contracts.  (Indeed, they posited that these transactional inefficiencies drive entrepreneurs to form firms rather than outsourcing all tasks.)  In medicine, the increasing specificity of contracts — a trend that predates P4P — has coincided with a sharp rise in administrative costs.

Costly administration is not the only downside of complete contracts.  If something is omitted from an exquisitely detailed agreement there’s no presumption of default to goodwill; it’s happy hunting season.  When one of us (DA) asked the Dean of Duke’s Law School about its honor code, he replied that it amounted to little more than “don’t do anything dishonorable”.  Lists of rules (“don’t raise chickens in your dorm room; don’t smoke hashish”) implicitly permit everything else.

Moreover, highly prescriptive contracts have a behavioral downside.  Because professionals may (correctly) perceive detailed contracts as controlling, such contracts tend to worsen motivational crowd-out.  When specifying every detail and contingency isn’t possible, as is clearly the case in medicine, it may be better to rely on professional and social norms.


None can doubt health care’s grave quality deficits and cost excesses.  As remedy, P4P suggests manipulating greed, a fuel that’s powered exponential growth in productivity in the overall economy.  But Adam Smith, who first recognized greed’s awesome power, was also a moral philosopher who believed that commodity production required a parallel public service economy driven by social duty.

Sadly, greed has caused many of the worst abuses within the current system.  Injecting different monetary incentives into health care can certainly change it, but not necessarily in the ways that policy makers would plan, much less hope for.

Figures 1a–1c (images not included).

4 Responses to “Will Pay For Performance Backfire? Insights From Behavioral Economics”

  1. Randy Holland Says:

    Fabulous article. May I offer some commentary?

    I’ll limit my P4P research observations to one regarding the RCT on teachers. I’m just an MBA, but I got my MBA at 40 as a business owner, which I like to think diminishes my “over-entitled” footprint on the world. Even if I’m wrong about that, I would still say that teachers did not enter their career to be paid for performance and it’s likely that they were loath to engage with it when it was asked of them. All in all, an unfortunate pairing of incentive to archetype.

    Your bits on motivational crowd-out and contract theory are where I’d like to chime in. Again, really eloquent observations, and I think I have, at least implicitly, addressed many of these rational objections. My model seeks to reward compliance rather than a medical outcome (an excellent example of an incomplete (and favorable) contract), explicitly because, as I state in my plan, asking an individual to “sign on” to an outcome requires an irrational amount of medical knowledge and makes the process complex and overly taxing. Further, I dramatically reduce the “size” of the reward to an amount that begins with merely getting one’s investment back (the first 8-week reward is an approximate return of the money initially invested) so one cannot participate without having skin in the game. Further, paying HCS an economically rational amount for an outcome when it alone is at risk for the outcome rather than the beneficiary is a game changer. The moral hazard of high rewards is diminished greatly, and the fact that the payment going to HCS is larger than the payment going to the beneficiary should not enter into it unless HCS adds no value. All I am eager to find out is whether my assertions can survive a few trials. Make no mistake, my model is different, and the arguments asserted in the article address the differences rather effectively.

  2. David Himmelstein Says:

    Gwanstadt offers a rosy view of P4P based on outcomes. Would that it were as simple as he or she suggests. For one thing, the most important outcomes – death and disability – often occur many years, even decades after the doctor/patient interaction. P4P can’t possibly operate on that time frame.

    Moreover, outcomes are affected by myriad social and biological factors that are outside the doctor’s control. Doctors who care for poor, minority and non-compliant patients look bad on P4P outcome measures, regardless of their skill. Finding the needle of performance amidst the haystack of other outcome determinants is well beyond current, or foreseeable, risk adjustment methods.

    Risk adjustment of outcomes is a daunting task under the best of circumstances. But when providers have incentives to upcode diagnoses and play other games that will cast their performance in the most flattering light, risk adjustment schemes produce nonsense. A physician who performs echocardiograms on all of her asymptomatic octogenarian patients could label many – even most – of them with the diagnosis of “congestive heart failure”. While this diagnosis would do no good for the patients, it would make the doctor’s panel of patients look very sick, and hence her outcomes look very good indeed. Similarly, incentives to keep patients’ systolic blood pressure below 140 have triggered memos in two practices that we know of instructing staff to round down rather than up when recording blood pressures that are near the line, generating a surge in patients whose blood pressure is “well controlled” at 139.

    In sum, P4P based on outcomes won’t improve quality. What it will do is penalize providers caring for vulnerable patients, grossly distort quality data and distract physicians from the arduous work needed for real quality improvement.

    Our experience as clinicians conflicts with Gwanstadt’s view that patients are voracious consumers of care whose appetites must be curbed by making them bear out-of-pocket costs. Other than the rare case of hypochondriasis, people generally view a trip to the doctor as an inconvenience (or worse), hardly akin to a lobster dinner as Gwanstadt suggests. Most people want the amount of care that optimizes their health; no more, no less.

    David Himmelstein, M.D.
    Steffie Woolhandler, M.D., M.P.H.

  3. cgreen23 Says:

    Very good article on intrinsic vs extrinsic motivation. At the same time, re applying it to healthcare, I found Gwanstadt’s commentary very persuasive.

    This is a good dialogue – anything from Woolhandler and Ariely by way of response to Gwanstadt?

  4. gwanstadt Says:

    Woolhandler and Ariely nicely explain one important shortcoming of P4P: when coupled with the inspection model of quality, the friction costs are too great.

    They also correctly note that injecting financial incentives into the current system is unlikely to make much difference, and do a nice job of documenting this with the data we now have.

    However, their conclusion that financial incentives are not likely to work in medicine due to moral hazards, or some other vague concepts of morality or ethics, is not substantiated, and is likely wrong. Globally, many high value health care systems exist!

    It is not the fundamental nature of healthcare as a business that is at the heart of our American problem. The source of our trouble is simple: our healthcare financial incentives are being driven by the insurance industry. That industry is behaving rationally by maintaining low administrative costs, which led to a simple bargain between the insurer and provider, namely, “Tell us what you did and we will pay you the going rate”. This is easy to administer.

    We are currently getting what we have contracted for: a lot of procedures and encounters, but not enough health. The data show that grafting P4P process measures of quality and cost onto this flawed bargain is not going to do much.

    Lack of data does not mean lack of effect. We do not have data about how P4P might perform if incorporated into a more rational bargain for society. The needed experiment: apply P4P to OUTCOMES. That is the combination that will loose the power of the market to provide good health at low cost, and we must get on with it. A system based on aligned patient and provider financial incentive for high health status at low overall cost will dramatically change behaviors of both and the value proposition. The digital transformation of medicine makes such a system quite affordable if outcome measures are included in the design.

    An additional twist: The insurance mechanism adds value by protecting us against infrequent catastrophic events, but also adds an administrative burden and encourages overuse; if we had grocery insurance, everyone would want lobster. We have already realized this in medicine and do not cover cosmetic surgery. So, insurance for the lowest value quarter or third of healthcare should probably be proscribed, since it has been shown to add little to population health. This will allow unhampered market forces to determine how many choose a year on a respirator, etc.

    The big winner will be the high value preventive services and healthy behaviors, which are sadly underutilized in the US. These must be incentivized.

    When financial success in medicine means creating healthy patient populations, provider behaviors will change, most doctors will be happier, most patients will be happier, and our nation will be much more competitive, for three reasons: 1) a lower health care “tax” on our goods and services, 2) the increased productive capacity of a healthy population, and 3) the increased consumption capacity of a healthy population.
