Virtually unheard of thirty years ago, workplace wellness is now embedded in large self-insured companies. These firms pay their workers an average of $460/year to participate in worksite wellness programs. Further, wellness is deeply enough ingrained in the public policy consciousness to have earned a prominent place in the Affordable Care Act, which allows large employers to tie a significant percentage of health spending to employee health behavior and provides direct subsidies for small businesses to undertake these workplace wellness programs.

Yet the implausible, disproven, and often mathematically impossible claims of success underlying the “get well quick” programs promoted by the wellness industry raise many questions about the wisdom of these decisions and policies.  In this post, we lay out the evidence demonstrating that the industry consistently mis-measures and overstates the direct healthcare cost savings.  We suggest several strategies to prevent this and to re-allocate wellness dollars from “get well quick” schemes to the much more challenging, but ultimately more rewarding, task of truly creating a culture of wellness, a workplace that can attract and retain healthier, presumably more productive, people than competitors do.  There is no guarantee that strategy would work and no easy way to implement it, but clearly the easy approach isn’t working.

Creating a culture of wellness begins by establishing standards, practices, policies, and procedures that inculcate a particular set of wellness values in employees.  Employers advised by some of the leading benefits consultants and wellness vendors in the US routinely fail at this crucial step.  Consider one Fortune 1000 company, for example, that annually renews its long-running wellness program, while it simultaneously surrounds employees with vending machines filled with chips and soda and serves up pizza daily in its commissaries.  Ironically, this company’s workforce has recently declined by 5 percent while aggregate medical care spending rose by an equivalent amount during the same period, producing an 11 percent increase in per capita spending.  Yet, the wellness program — foisted upon leadership by benefits consultants — proceeds unchallenged by human resources (HR) executives.
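
The per capita figure in this example follows from simple arithmetic. A minimal sketch, assuming "an equivalent amount" means aggregate spending rose by the same 5 percent by which headcount fell (the variable names are ours, for illustration only):

```python
# Per capita spending change when headcount falls while total spending rises.
headcount_change = -0.05  # workforce declined by 5 percent (from the text)
spending_change = 0.05    # assumed: aggregate spending rose by 5 percent

# Per capita spending = total spending / headcount, so the changes compound.
per_capita_change = (1 + spending_change) / (1 + headcount_change) - 1

print(f"Per capita spending change: {per_capita_change:.1%}")
```

Under these assumptions the result is about 10.5 percent, consistent with the roughly 11 percent increase cited above.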

Another Fortune 1000 company recently held open health insurance enrollment for the next benefits year.  In a conference call with employees, an HR executive told them that their answers to health risk appraisal (HRA) questions would not matter; the mere fact of completing the HRA would produce a discounted premium, whether or not the answers (including entry of blood pressure and cholesterol measurements) were truthful.

The most pointed and instructive example of a high-impact wellness culture is actually the military.  There are very few obese active duty members in the four major services, and everyone gets required immunizations.  As a consequence, the military has a much lower incidence rate of wellness-sensitive events among active duty personnel than private sector employers do, even after adjusting for age.  The military’s medical spending inflation dilemma stems from civilian employees, dependents, and retirees, who account for 83 percent of covered persons and, unsurprisingly, are not subject to the same physical standards as active duty personnel.

While we do not believe that employers need to adopt military-style standards, we do propose that it is illogical to expect sustainable reductions in medical care spending if corporate leaders treat their environments, personnel policies, practices, and procedures with the insouciance of people who believe that they can just wish something into existence.

The Wrong Track

It is crucial that we move in the direction of culture-driven and evidence-based wellness programs, since the current wave of wellness programs is taking us wildly off course by promising substantial short-term reductions in health spending. This is clear whether one looks at the peer-reviewed literature, outcomes measurement generally, marketing claims, or the “face validity” of the broader impact of wellness on population health status.

Peer-reviewed Literature

In the peer-reviewed literature, studies consistently claim savings in medical claims far more substantial than those of any other category of health benefit initiative (an average of 24.5 percent according to one meta-review of 62 studies).  The resulting returns on investment (ROIs) of 3.27:1 according to another meta-analysis, or roughly $340 per person, are themselves also far higher than those of any other voluntary managed care initiative.

These results are completely inconsistent with marketplace behavior, meaning both can’t be right.  Why, if medical expenses fall so much, don’t insurers, for whom medical expenses are front and center, routinely offer and incentivize wellness for their fully insured populations?  Savings of that magnitude would dramatically multiply their profits on these fully insured members.  Yet it appears that none of America’s roughly 1,000 private health plans routinely incentivizes wellness programs for fully insured members with its own cash and/or premium discounts.  Nor do they routinely require their fully insured members to take HRAs or biometric screens, which underlie most corporate wellness programs.

As is often the case when there is a gap between marketplace behavior and pundits’ views of what should be the case, the marketplace is likely right and the gap can be explained.  One reason for this gap is that, unlike insurers who are immersed in this field and accustomed to hearing implausible outcomes claims, their employer counterparts who champion wellness programs typically rose through the corporate ranks in the non-quantitative HR field.   This creates an asymmetry of information, because both justifying a wellness program and defining its ROI are data-driven processes.  (Wikipedia contains extensive, hyperlinked, information on why information asymmetry will distort a marketplace.)

Further, these HR professionals are often advised on quantitative matters by benefits consultants for whom new program procurement, implementation, and evaluation are profit centers, and who themselves also lack training in outcomes measurement analytics. (Of the 176 people who have earned certification in Critical Outcomes Report Analysis, only 2 list their occupation as benefits consultant.)  Thus, when the contracted consultants evaluate the wellness programs that they themselves advocated, it is far easier to “find” that the client was correct, even when there is overwhelming evidence to the contrary.  A finding that a client made a significant mistake could lead to the loss of the client and/or the sponsor’s loss of a job. This conflict of interest creates an insurmountable moral hazard.

To help resolve this dichotomy and understand why the market is “right,” consider the above-mentioned meta-reviews and analyses.  As appropriately noted by the authors of both meta-evaluations, the studies proclaiming wellness program success are subject to more than the usual set of limitations, because by definition there is no “intent to treat” control.   Almost invariably, all voluntary participants are in one group and those who didn’t want to participate comprise the control, or else the control is a passive matched control on paper, with no indication of whether the control group consists largely of motivated or unmotivated people.  In both cases, only motivated participants are included in the study group.   This limitation is especially problematic in wellness, because unlike a drug trial, motivation is by far the most important factor in success in wellness, which is basically an incentivized self-help program.

Marketing claims made by vendors

This critical flaw in the analytic framework produces the anomaly that virtually every desirable wellness outcome is found in voluntary participants only.  This aberration permeates not only the peer-reviewed literature, but also the claims made generally by wellness vendors and their corporate customers.  Some vendors even show results that specifically require two years of repeat participation (example available from authors), all but ensuring that only the volunteers most dedicated to improvement will persevere long enough to be counted.

While common sense would dictate that putting all the motivated people in one group would skew results in a program genre largely dependent on people’s motivation, it is also fortuitously possible to measure what happens to medical claims when participants and non-participants are placed in separate cohorts.  A keynote presentation at the 2011 Care Continuum Alliance (CCA) (available from authors) did exactly that.  Employees were separated in 2004 into two groups whose claims costs were almost identical.  In 2005, the non-participant group’s annual claims cost grew 9 percentage points faster than the participant cohort’s.  This could not have been a “program effect,” because the program was not initiated until 2006.

A second classic mistake in outcomes measurement is to focus only on the downward risk migration of high-risk (or high- and medium-risk) people.  Most of the studies cited in the meta-analyses did that, and some websites even tout it. (Example available from authors.)  That this is simply claiming credit for regression to the mean should be self-evident.  If not, Dee Edington, the acknowledged leader in the field of risk migration, has clearly documented it.  Consider this hypothetical:  suppose smoking is the only risk factor in an organization, and that everyone smokes.  Further suppose that everyone also quits periodically, meaning that the average person smokes half the time.  Annual measurements, done as the industry does them, meaning on high-risk people only (in this case, current smokers), will therefore show 100 percent declines in smoking every year, even as the percentage of smokers remains unchanged.
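
The smoking hypothetical can be reproduced in a few lines. This is a minimal sketch under one loudly stated assumption of ours: "quits periodically" is modeled as each person smoking in alternating years, with half the population on each schedule. The function names are illustrative, not an industry method.

```python
def smokes(person: int, year: int) -> bool:
    """Assumed pattern: each person smokes every other year (half the time)."""
    return (person + year) % 2 == 0


def prevalence(people, year: int) -> float:
    """Share of the whole population smoking in a given year."""
    return sum(smokes(p, year) for p in people) / len(people)


def high_risk_only_decline(people, year: int) -> float:
    """Follow only this year's smokers into next year, ignoring everyone else."""
    cohort = [p for p in people if smokes(p, year)]
    still_smoking = sum(smokes(p, year + 1) for p in cohort)
    return 1 - still_smoking / len(cohort)


people = range(1000)
for year in range(3):
    print(
        f"year {year}: population prevalence {prevalence(people, year):.0%}, "
        f"decline among tracked smokers {high_risk_only_decline(people, year):.0%}"
    )
```

In this toy population the tracked high-risk cohort shows a complete decline every year while population-wide prevalence never moves, because the people who resume smoking are never counted as an offset.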

Despite its obvious invalidity, this methodology is almost universal among wellness vendor marketing claims.   In at least one case it isn’t even a mistake:  One health plan’s program guarantees high-to-low risk migration, which its brochure illustrates using artwork that no knowledgeable observer could consider accidental.  Even though low-risk people constitute two-thirds of the population and high-risk people one-eighth, the bar chart shows all three segments (medium risk comprising the remainder) to be of equal size.  The health plan also shades the low-risk segment in a light color, so that the untrained eye focuses on the high-risk segment’s guaranteed decline and doesn’t pick up that upward risk migration of two-thirds of the population is not counted as an offset to the downward migration of the one-eighth. (Bar graph with carrier’s name removed available from authors.)

While one vendor acknowledges that it doesn’t save money in the short term through wellness (noting that others who purport to do so are simply wrong or dishonest) and some websites are silent on the ROI issue, many other websites show ROIs whose invalidity is indisputable.  One vendor claims $350/year in savings even when risk factors are not reduced (citation available from authors), a result possible only if one measures participants against non-participants, as shown at the aforementioned CCA presentation.   Another vendor claims near-term ROIs of 4.8:1 for screenings and 14.3:1 for HRAs (citation available from authors), even though screenings initially generate higher costs as people seek additional care, and HRAs are completed anonymously, making it impossible to calculate savings at all, let alone at that level of precision.

Two more vendors claim total savings of 17 percent to 22 percent 8 to 12 months after implementation.  (One company showed a 17.3 percent relative improvement in total costs after one year for participants, while another reports a case study showing 22 percent savings for the year for a program not implemented until May.  Both citations are available from authors.)  At least three companies have simply announced savings in excess of the mathematical limitation of 100 percent, as does the author of the meta-review himself.  (All three vendors removed the claims from their websites after the impossibility of reducing a number by more than 100 percent was brought to their attention.  Screen shots are available from the authors.)

A vendor proposing to establish a wellness program for an elite intellectual services consultancy promised to produce medical cost savings in excess of $5 million in a year.  The vendor never asked to see any demographic or claims information for the employees.  The promised savings are nearly equal to the company’s entire annual medical spend.

A large health system launched its wellness program with the proclamation that the inflation rate for its aggregate medical care spending would slow by one-third within one year. Many factors can slow the growth of aggregate medical care spending, but the most important ones are completely disconnected from wellness.  These include: a reduction in the number of employees; cost-shifting that inhibits (both appropriate and inappropriate) care; and the retirement, death, or transition to disability income, of older, sicker, and costlier employees.

Finally, one vendor almost explicitly acknowledges that its methodology is worthless.  Its White Paper says it “targets those individuals who have high health confidence and the highest motivation to change their health situation.”   This vendor then “provides incentives to participate.”  It then compares this cohort with non-participants.  In other words, they do precisely what a health services researcher would never do: They (1) find motivated volunteers, (2) bribe them to participate, and then (3) compare them to people so unmotivated they couldn’t even be paid to play along.  They admit:  “This approach is used because there is a need to compare the program participants to something [emphasis theirs] in order to judge whether there have been improvements.” (Citation available from authors.)  Thus, they believe that they must offer an obviously invalid ROI analysis rather than none at all. This is presumably because their customers demand to know: “What’s my ROI?”

Even the iconic Safeway story of achieving a zero medical cost trend through wellness (the inspiration for the wellness provisions in the Affordable Care Act) turns out to be made up:  Safeway’s zero trend predated its wellness initiative by several years.  Further, after Safeway did institute wellness, its trend increased sharply.  The Safeway success story is further undercut because only 11,000 of the company’s roughly 200,000 employees participated once the program did roll out.

Outcomes Measurement

One could argue perhaps that this commentary cherry-picks the literature, presentations, and vendor websites, and that generally valid wellness program results abound.  We’ve already established that no company has ever controlled for motivation by separating volunteers into two groups to determine whether health-risk-sensitive medical events declined significantly more relative to other events in the motivated cohort allowed to participate in the program.  Hence, perhaps a less exacting test is needed.  Perhaps there has been a situation in which the participation proportion was high enough (and the total “n” was high enough) that when the company tracked spending on health-risk-sensitive medical events against all other spending population-wide, it found favorable separation.  This would “control” for the participation bias by including everybody and having the “control” be the targeted events vs. the non-targeted events.

This type of “plausibility test” is commonplace in disease management, especially among insurers, who recognize the validity of the approach.  The 10th Annual Report (2012) on the Disease Management and Wellness Industries offers an extensive list of insurers using such methodologies.  The Care Continuum Alliance Outcomes Guidelines describe the test as well.
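
The comparison behind such a plausibility test can be sketched as follows. This is an illustration only: the function names are ours, and the event rates are made-up placeholders, not data from any program or publication.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from the baseline period to the program period."""
    return (after - before) / before


def plausibility_gap(
    targeted_before: float,
    targeted_after: float,
    other_before: float,
    other_after: float,
) -> float:
    """Change in targeted events minus change in all other events.

    A negative gap means the targeted (health-risk-sensitive) events fell
    faster, or rose more slowly, than everything else population-wide.
    """
    return relative_change(targeted_before, targeted_after) - relative_change(
        other_before, other_after
    )


# Made-up event rates per 1,000 members, for illustration only.
gap = plausibility_gap(
    targeted_before=10.0, targeted_after=9.0,
    other_before=100.0, other_after=95.0,
)
print(f"Targeted events vs. all other events: {gap:+.1%}")
```

Because every member is counted, whether or not they volunteered, the non-targeted events serve as the comparison and participation bias drops out.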

However, not once, in any peer-reviewed journal, lay publication, or vendor website, has a study attempted to determine whether a population’s health-risk-sensitive medical events outperformed non-risk-sensitive medical events over time.  This is largely because no one has ever even made a list of health-risk-sensitive medical events analogous to the list of 16 prevention-sensitive medical conditions published by the Agency for Healthcare Research and Quality (AHRQ) to determine the effectiveness of ambulatory care (Agency for Healthcare Research and Quality, Prevention Quality Indicators Technical Specifications, Version 4.4, March 2012), or the list of ICD-9 “plausibility indicators” used to determine the effectiveness of programs for the five common chronic conditions that comprise most disease management.  (Lewis A, Why Nobody Believes the Numbers, John Wiley & Sons, 2012, p. 48)

Consider the import of this research gap:  this industry sells $6 billion worth of services to American corporations and governments based on the premise that some medical events caused by high-risk behaviors will be avoided, and hence claims dollars saved, if people are incentivized to use the industry’s programs.  Yet this list of medical events does not exist, let alone has it been used to determine whether these events declined.

Let us assume that such a list existed.  How much in annual claims would it cover?  In other words, what is the theoretical maximum beneficial impact on claims of a wellness program that eliminates every health-risk-sensitive event?  To answer, we will generously assume no offsetting increase in medical care expenditures caused by people seeking more care after HRAs and biometric screens, no cost of incentives to visit primary care physicians (Lewis A, Why Nobody Believes the Numbers, p. 132), and no costs for the screens themselves.

Since no such list exists, we can only speculate about the scope of impact.  However, according to the Healthcare Cost and Utilization Project (HCUP) database, neither the AHRQ prevention-sensitive indicators nor the disease management “plausibility indicators” account for more than 8 percent of primary-coded inpatient and emergency room (ER) events.  Because inpatient/ER events account for about half of all spending, that translates to only about 4 percent of claims in the commercially insured population.

Therefore, for a company to save the 24.5 percent cited in one of the meta-analyses as the “average” savings, the risk-sensitive medical event list would have to be six times the magnitude of either of those established lists.  Further, in contrast to the small percentage inpatient reductions achieved over the years through ambulatory care-intensive projects or the controversial reductions consistently cited for disease management and related activities, the wellness program would have to achieve an unprecedented 100 percent reduction in these services.
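
The ceiling implied by these figures can be checked directly. A minimal sketch using only the percentages quoted above; the variable names are ours, and the calculation assumes, as the text does, that a comparable list would cover a similar share of events.

```python
# Figures quoted in the text.
share_of_inpatient_er_events = 0.08    # upper bound for either established list
inpatient_er_share_of_spending = 0.50  # inpatient/ER is about half of all spending
claimed_average_savings = 0.245        # "average" savings from the meta-review

# Maximum share of total claims such a list could touch, even if every
# listed event were eliminated at no offsetting cost.
savings_ceiling = share_of_inpatient_er_events * inpatient_er_share_of_spending

# How many times larger the list would need to be to support the claim.
required_multiple = claimed_average_savings / savings_ceiling

print(f"Ceiling on savings: {savings_ceiling:.0%} of claims")
print(f"Required list size: about {required_multiple:.1f}x an established list")
```

The multiple comes out slightly above six, and that holds only if the program eliminates every single event on the list.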

Face Validity Questions

In addition to the specific issues raised by the studies in the meta-analyses and by wellness outcomes measurement and marketing in general, there is a “face validity” gap in macro wellness outcomes that advocates never address:

  • Why, if so many companies offer wellness and achieve such tremendous results, are we the fattest country on earth (and getting fatter), and why does our smoking rate decline at a slower pace now than twenty years ago when employer-sponsored smoking cessation programs were much rarer?
  • Why has no wellness company been able to find an independent validator who will vouch for company-wide healthcare cost savings and stand behind that validation?
  • Why does health care inflation continue to significantly exceed the rate of inflation in the broader economy, reflecting a persistent deterioration in population health status?

It is reasonable to conclude that the marketplace-vs.-pundits dichotomy posed at the beginning is indeed due to asymmetric information and/or moral hazard: the marketplace of 1,000 health insurers is right.  The “evidence” for the impact of wellness on health care expenses, whether journal articles, industry practices, or face validity, fails to pass even cursory reasonability tests.

Getting Back On Track  

The implications of this re-examination are significant.  At the federal level, the taxpayer-financed wellness incentives should be removed from the Affordable Care Act.   The Federal Employee Plan, which is poised to spend many millions on an employee wellness vendor, should reconsider that decision.

Private employers can also improve their chances of avoiding this fate by following these three suggestions. First, HR departments need to reconfigure their benefits consulting relationships, since with few exceptions the consultants have not provided critical thinking about wellness (and other “value-added” programs) on a par with that of insurers, who have universally shunned these programs for their fully insured members. (Most insurers will happily sell them to self-insured employers even so, because they clearly understand that “invalid” is not the same as “unprofitable,” especially when you do not bear risk for the outcomes and the customer’s consultants are demanding the service.)

Second, it is also time for employers to change the roles and responsibilities in health benefits administration. Human resources is fundamentally a process-oriented department being burdened with analytic responsibilities that its executives simply aren’t trained to handle.

Third, start the reconsideration of wellness by looking at the big picture.  The signals your firm sends have greater impact when they relate cogently to a corporate culture that has openly embraced wellness as an organizational value and not just as a fill-in-the-blank HRA accompanied by a large financial incentive.

We agree that workplace wellness is a useful construct, providing morale and productivity benefits.  However, where large financial incentives are offered in the hopes that health expenses will decline, measurement of health expense reductions is a critical responsibility that is almost invariably lacking in today’s wellness marketplace.  If employers continue to rush to buy workplace wellness programs, they will soon find themselves doing what the health care system itself has done for so long, to its great detriment: consecrating standard practices without clear evidence drawn from sound analytics.  This will result in more money spent on services of uncertain value that produce invalid outcomes, and misallocate resources away from more valuable endeavors and discussions.