We thank Health Affairs for the opportunity to respond to Professor Ron Goetzel’s comments on our recent Health Affairs article, “Wellness Incentives in the Workplace:  Cost Savings through Cost Shifting to Unhealthy Workers.”  In many respects our article was quite limited in scope.  We started by noting that companies are increasingly adopting wellness programs based on the idea that, with the help of financial incentives, employees will improve their health and employers will save money.  We set out to explore the assumptions underlying this idea and found scarce high-quality evidence on the subject of workplace wellness incentives.  What evidence we found offers, at best, limited support to justify these assumptions.

We hope that others will conduct the much-needed research on whether financial incentives can change behavior, improve health, and control spending.  We also hope that the body of causal research will be extended to a wider variety of populations, such as the elderly, and to other related issues, such as the effects of incentives on absenteeism, productivity, and long-term health, or to programs that do not involve financial incentives or do not claim to control costs.  These other effects and other types of programs are worthy of consideration, but we did not consider them in our article.

The Ethics Of Incentive Programs

Unfortunately, Professor Goetzel misunderstood the scope of our research.  Contrary to Professor Goetzel’s assertions, we did not suggest that people should avoid making choices consistent with their health (and other goals), or that employers should not act in the best interest of employees, employees’ health, or society at large.  Still, we are skeptical that wellness programs of the sort encouraged by the Affordable Care Act (ACA) create, as Professor Goetzel suggests, a “win-win-win” situation: good for all workers, good for society, and consistent with the financial interests of employers, firms, and their owners.  Rather remarkably, despite the potential for workplace wellness to become a large “social experiment,” we found scant reliable evidence that implementing wellness programs can easily save costs through health improvements without being discriminatory.

Although we did not directly address the myriad ethical issues related to workplace wellness programs, our study’s conclusions pointed to ethical challenges raised by the programs advanced by the ACA.  Professor Goetzel suggests that our conclusions are unethical because they question the validity of incentives in driving population health.  However, if the evidence we reviewed — the best, most carefully constructed evidence available — is correct, we worry that companies are making money off programs not by improving health but by “cost shifting.”

We think it is unethical for the most vulnerable employees — those from lower socioeconomic strata with the most health risks — to bear greater costs that, in effect, subsidize their colleagues with fewer health risks.  For many reasons, such as concerns about the appropriate role of the employer or the fiduciary duties of doctors, some observers would challenge the ethical basis for workplace wellness even if the programs led to improved health among workers of lower socioeconomic status.  For example, it is more difficult for some people than for others to maintain an ideal weight because of genetic or other immutable reasons.  We leave it to others to consider whether wellness programs can be ethical given these kinds of concerns.

We need not delve deeply into these ethical questions, however, to identify some troubling issues regarding workplace wellness.  We are concerned that, if the evidence we surveyed is correct, workers with identified health risks (who tend to be of relatively low socioeconomic status) are subsidizing workers without identified health risks (who tend to be of relatively high socioeconomic status) without justification.  In our review of the research on wellness, we found quite mixed evidence that people of working age with identified health risks, such as those targeted in wellness programs, even spent more on health care than others, or that financial incentives led to improved health for the most commonly targeted health conditions.  Consequently, if companies are reaping even the lowest publicized returns on investment for these programs, those returns may well be generated on the backs of employees with health risks, in exchange for little or no benefit to them.

For working Americans, this kind of cost shifting would represent a big financial hit.  Starting in January 2014, the Affordable Care Act allows employers to vary premiums, based on participation or achievement, by up to 30 percent of the total cost of an employee’s health insurance coverage, including both the part the employer pays and the part the employee pays.  The average cost of an employer health plan for a family is just under $16,000, which translates into about $4,800 at risk.  With median annual income of about $50,500, that $4,800 amounts to almost ten percent of median income.  Proposed regulations allow up to 50 percent to be put at risk for programs aimed at reducing tobacco use.  Given all the strains on working families, such as rising insurance premiums, rising worker contributions to those premiums, and falling real wages, one might reasonably worry that these programs put workers at grave financial risk.
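The dollar figures above follow from simple arithmetic.  The short Python sketch below reproduces them, assuming a family premium of exactly $16,000 (the text says “just under”) and a median income of $50,500; the function name `incentive_at_risk` is our own illustration, and the 50 percent line simply applies the proposed tobacco ceiling to the same premium.

```python
def incentive_at_risk(total_premium: float, max_share: float) -> float:
    """Largest dollar amount of a premium that can hinge on wellness criteria."""
    return total_premium * max_share


FAMILY_PREMIUM = 16_000  # assumption: rounds up the "just under $16,000" figure
MEDIAN_INCOME = 50_500   # approximate median annual income cited in the text

for label, share in [("general wellness (30%)", 0.30), ("tobacco (50%)", 0.50)]:
    at_risk = incentive_at_risk(FAMILY_PREMIUM, share)
    pct_of_income = 100 * at_risk / MEDIAN_INCOME
    print(f"{label}: ${at_risk:,.0f} at risk, {pct_of_income:.1f}% of median income")

# general wellness (30%): $4,800 at risk, 9.5% of median income
# tobacco (50%): $8,000 at risk, 15.8% of median income
```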

In fact, the ethical status of these programs may be even worse than the regressive financial transfers would suggest: wellness programs may jeopardize the health of participants.  Many people agree with Professor Goetzel, who wrote in his response to our article, “I believe that common, modifiable health risks are damaging to people’s health.  I cannot imagine an argument against that statement.”  Like Professor Goetzel’s statement, workplace wellness programs rest on faith that controlling modifiable health risks through recommendations that everyone knows to be true will help employees.  But sometimes the conventional wisdom is wrong.

Consider just two of the most recent controversies:

1) Sodium.  The American Heart Association and public health officials have long advocated diets that limit sodium intake to 1,500 mg per day.  However, a new report commissioned by the CDC, based on evidence available since 2003, found no benefit to lowering intake below 2,300 mg of sodium and substantial evidence of potential harm (heart attack and increased hazard of death) for some of the most salient subpopulations.  In one randomized controlled trial the report reviewed, patients were randomized to “consume either 2,760 or 1,840 milligrams of sodium a day, but otherwise to consume the same diet.  Those consuming the lower level of sodium had more than three times the number of hospital readmissions — 30 as compared with 9 in the higher-salt group — and more than twice as many deaths — 15 as compared with 6 in the higher-salt group patients.”

2) Glucose.  Conventional health wisdom also holds that glucose-lowering medication is an appropriate treatment for patients with Type 2 diabetes.  But on more than one occasion, rigorous randomized controlled trials have had to be stopped prematurely because patients under stricter glycemic control died more quickly and suffered more severe cardiovascular conditions than those under less strict control.  Although such evidence has been available for years (see our original article), only in their 2012 Position Statement did the European Association for the Study of Diabetes (EASD) and the American Diabetes Association (ADA) tacitly acknowledge a problem, citing “mounting concerns about their potential adverse effects and new uncertainties regarding the benefits of intensive glycemic control.”
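The “more than three times” and “more than twice” comparisons in the sodium trial quoted above can be checked directly from the event counts.  In this small Python sketch the helper `count_ratio` is our own illustration; because the quoted passage gives counts but not the size of each arm, the ratios compare raw counts and equal ratios of rates only if the two arms were the same size.

```python
# Event counts as quoted above: (lower-sodium arm, higher-sodium arm).
# Arm sizes are not given, so these are ratios of counts, not of rates.
EVENTS = {
    "hospital readmissions": (30, 9),
    "deaths": (15, 6),
}


def count_ratio(low_arm: int, high_arm: int) -> float:
    """Events in the lower-sodium arm divided by events in the higher-sodium arm."""
    return low_arm / high_arm


for outcome, (low, high) in EVENTS.items():
    print(f"{outcome}: {low} vs {high} -> ratio {count_ratio(low, high):.2f}")

# hospital readmissions: 30 vs 9 -> ratio 3.33
# deaths: 15 vs 6 -> ratio 2.50
```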

As some diabetes experts have noted, “The most entrenched conflict of interest in medicine is a disinclination to reverse a previous opinion.  This apart, diabetes specialists and the professional societies that represent them might feel impelled to defend a specialty under threat in a turf war with cardiologists and general practitioners.”  While the American Diabetes Association has continued its general advocacy of glycemic control, it recently stressed that, “ultimately, Type 2 diabetes is a disease that is heterogeneous in both pathogenesis and clinical manifestations…”

One lesson from these two examples is that patients need to be treated as individuals.  When employers implement healthy-eating initiatives or risk-lowering suggestions, the typical treatment may not be the right treatment for everyone, particularly those with the targeted risk factors.

Ultimately, we do not take a stand on whether workplace wellness programs are ethical.  But we do believe that the concerns they raise are serious enough to merit careful attention and evidence-based decision-making.  We do not think Professor Goetzel should be “troubled by Horwitz et al.’s argument challenging worksite programs or prevention programs in general,” or label our article “radical.”  After all, our paper merely reports what we found in a survey of over 2,000 published articles.  We stand by our recommendation that caution is necessary before embracing, as a panacea for health spending, programs that may not save money as intended, that change the nature of the employee-employer relationship, and that may peddle unproven, one-size-fits-all medical care.

Reliable Evidence

Perhaps our biggest disagreement with Professor Goetzel concerns which research offers reliable evidence for causal claims.  We believe that, given the financial and health risks we outline above, the burden of producing reliable evidence should rest with those who advocate widespread adoption of wellness programs based on financial incentives, and we believe that burden has not been met.

In our review, we tried to be careful and inclusive in identifying quality evidence.  But readers need not trust us.  First, we published a lengthy online appendix describing our procedures so that other scholars can make their own judgments.  Second, there are several widely accepted definitions of quality evidence that clarify what kind of evidence can be trusted in this sphere.  Consider, for example, the “levels of evidence” produced by the Oxford Centre for Evidence-based Medicine.  As is common across most evidence hierarchies, a systematic review of a large number of randomized controlled trials is the most trustworthy evidence, and “Expert opinion without explicit critical appraisal, or based on physiology, bench research or ‘first principles’” is the least reliable.  Or, if you prefer, consider a similar system by the US Preventive Services Task Force (USPSTF).  By either set of standards, virtually every individual study cited by Professor Goetzel in his reply employs a research design of lower quality than the designs of the studies we cited.

Professor Goetzel does not address our evidence but instead cites a single counterexample, the Diabetes Prevention Program (DPP), which we excluded from our discussion because it did not address one of the risk factors we identified as commonly included in workplace wellness, focusing instead on medical management of a particular disease.  Nonetheless, although a more detailed discussion of this trial is warranted, we are happy to acknowledge that some interventions like that in the DPP can work.  It would be distressing if they did not.  But such interventions are quite labor-intensive and unlikely to save money if adopted wholesale; indeed, the DPP required “16 one to one sessions delivered by case managers to achieve target weight reduction and exercise levels.  Although lifestyle interventions produce successful results in research settings, they are difficult to replicate even in well funded healthcare systems.”

Moreover, even if we could somehow make interventions like the DPP costless to the firm, the DPP itself shows why they would not work in a typical wellness program.  After ten years, the effect of the DPP on cardiovascular risk factors was zero: the cardiovascular risk factors were the same in the lifestyle intervention, metformin, and placebo groups.  An employee who adopted the DPP regimen would therefore not have shown improvement in the usual biomarkers used to measure “success.”

In addition, we hope readers will keep in mind a summary from an extensive study by medical experts at the National Academy of Sciences, which was far less sanguine than Professor Goetzel about the difficulties individuals face when attempting to meet a biometric target for weight:

An obese individual faces a continuous, lifelong struggle with no expectation that the struggle required will diminish with time. For most people, even a brief abatement in effort will be met with a significant setback in control. Studies in controlled settings show that individuals who complete weight-loss programs lose approximately 10 percent of their body weight, but gain two-thirds of it back within 1 year and almost all of it back within 5 years.
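The arithmetic implied by the passage above is sobering for any weight-based biometric target.  This minimal Python sketch applies the quoted approximations (about 10 percent of body weight lost, about two-thirds of that regained within a year); the helper name `net_loss_fraction` is our own, and we do not model the five-year figure because the passage gives only “almost all.”

```python
def net_loss_fraction(initial_loss: float, regained_share: float) -> float:
    """Fraction of starting weight still lost after part of the loss is regained."""
    return initial_loss * (1.0 - regained_share)


# Approximations from the quoted summary: ~10% lost, ~two-thirds regained in 1 year.
one_year = net_loss_fraction(0.10, 2 / 3)
print(f"net loss after 1 year: {one_year:.1%} of starting weight")
# net loss after 1 year: 3.3% of starting weight
```

For a hypothetical employee starting at 100 kg, that is a net loss of roughly 3.3 kg one year after completing a program.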

“Smoking Is Not Good For You” And The Problem Of Cause v. Correlation

Professor Goetzel warns readers that our method of analysis would lead them to “the nonsensical position of still questioning whether cigarette smoking is harmful,” and suggests that we recommend “hold[ing] ourselves hostage to randomized controlled trials.”  But this simply is not true.  First, there is randomized controlled trial evidence consistent with the view that smoking causes lung cancer.  Second, there are ways other than randomized controlled trials to learn about causes, not mere correlations, that are far more reliable than most of the evidence Professor Goetzel cites.  Natural experiments and studies based on observational data can be quite reliable.  Not only do we rely on them in our study, but we regularly teach these methods and publish econometric studies that use them.  In fact, one of us (DiNardo) has spent a large part of his professional career teaching and writing specifically about how to learn about causes in the absence of a randomized controlled trial.  See here for a recent, albeit technical, example.

Moreover, smoking and lung cancer offer a virtually sui generis example in which a purely epidemiological approach has established a causal link between behavior and health.  The eminent statistician David Freedman, discussing the near-total failure of epidemiological approaches to establish which actions are true “causes” and which merely reflect “spurious” or “non-causal correlations,” describes the causal effect of smoking on lung cancer as “one of the great triumphs of the epidemiologic method.”  Nothing remotely similar can be said for other causes of ill health, such as obesity.

We believe part of the confusion over our claims and Professor Goetzel’s objections rests on terminology.  In statistics and epidemiology, the terms “health risk factors” and “modifiable health factors” typically have a narrower meaning than in informal discourse, and they are the subject of much confusion.  A “health risk factor” is anything that is correlated with some aspect of health.  It is not necessarily a causal factor.  Risk factors that can be changed by deliberate policies are called “modifiable risk factors.”  (Age, too, is a “risk factor for an increased chance of death,” but as no one has yet learned how to stop time, it is not a “modifiable risk factor.”  Some refer to this type of risk factor as a “risk marker” instead.)

A modifiable risk factor may or may not cause the outcome of interest.  The practice of wearing eyeglasses with corrective lenses is no doubt a risk factor for poor unaided eyesight.  Although the practice is “modifiable” and hence a “modifiable risk factor,” it is not a causal factor; hence (and fortunately), we have not seen widespread calls for people to stop wearing eyeglasses!
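The eyeglasses example can be made concrete with a toy simulation.  Everything in the Python sketch below is hypothetical: the outcome model `unaided_eyesight`, the selection rule (people with poor unaided eyesight usually wear glasses), and all numbers are our illustrative assumptions, not data.  By construction, glasses have no effect on unaided eyesight, yet wearers and non-wearers differ sharply; only the comparison that changes glasses while holding the same people fixed reveals the true (zero) causal effect.

```python
import random
from statistics import mean


def unaided_eyesight(latent: float, wears_glasses: bool) -> float:
    # Hypothetical outcome model: glasses correct vision but do not change
    # unaided eyesight, so wears_glasses is deliberately ignored.
    return latent


def simulate(n: int = 10_000, seed: int = 0) -> tuple[float, float]:
    rng = random.Random(seed)
    latents = [rng.gauss(0.0, 1.0) for _ in range(n)]  # higher = better eyesight
    # Hypothetical selection: poor unaided eyesight makes wearing glasses likely.
    glasses = [rng.random() < (0.9 if z < -0.5 else 0.1) for z in latents]
    observed = [unaided_eyesight(z, g) for z, g in zip(latents, glasses)]

    # Correlational comparison: wearers versus non-wearers (a "risk factor").
    gap = mean(o for o, g in zip(observed, glasses) if g) - mean(
        o for o, g in zip(observed, glasses) if not g
    )
    # Causal comparison: the same people, all with versus all without glasses.
    effect = mean(unaided_eyesight(z, True) for z in latents) - mean(
        unaided_eyesight(z, False) for z in latents
    )
    return gap, effect


gap, effect = simulate()
print(f"correlational gap (wearers minus non-wearers): {gap:.2f}")
print(f"causal effect of wearing glasses: {effect:.2f}")
```

Taking everyone’s glasses away would leave unaided eyesight exactly where it was, even though glasses are strongly “associated” with poor eyesight in the simulated population.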

The distinction is generally neither so frivolous nor so easy to detect, and it can have significant consequences.  A classic case involved the late-19th-century debate over what causes epidemics such as cholera.  When the city of Hamburg faced a cholera epidemic, it turned to the famous Bavarian hygienist and chemist Max Josef von Pettenkofer, who argued that cholera was caused by miasma, a mysterious emanation created by rotting organic matter.  The theory drew support from the well-known fact that living in a marshy area was a (modifiable) risk factor for malaria.  At his direction, Hamburg went about digging up the carcasses of recently buried dead animals (such as pigs), hoping to stop the formation of miasma and hence slow down the epidemic.  This program was costly and failed miserably.

In London, on the other hand, the physician John Snow, observing that a household’s source of water appeared to be a modifiable risk factor for cholera, conducted a study to assess whether the risk factor was a causal factor.  In contrast to Pettenkofer’s approach, Snow’s study was not merely an epidemiological analysis of the sort frequently cited by Professor Goetzel.  Employing a great deal of “shoe leather,” Snow collected information on cholera deaths by household and on which firm supplied each household’s water.  Snow took advantage of a “natural experiment” to assess whether cholera epidemics were caused by waterborne factors.  Owing to specific, seemingly minor peculiarities in the history of private water provision, there was little if any association between income and the firm providing the water.  A rich family living next door to a poor family might have had the same water supplier, but might not.

As it turned out, the water supplied by some firms was contaminated with cholera-causing bacteria, while the water from other suppliers was not.  It was as if “nature” had randomly assigned contaminated water to some families and not to others.  Snow’s was not an “epidemiological” study.  What made his evidence persuasive (and why it was ultimately proven right) was not the mere existence of a modifiable risk factor.  Rather, careful study of the minor nuances of water supply in London and hard work allowed Snow to exploit the fact that “nature” had provided him something akin to a randomized controlled trial of a specific policy or treatment.
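Why the as-if random assignment of suppliers mattered can be shown with another toy simulation, again with invented parameters rather than Snow’s data.  The hypothetical function `death_rate_gap` compares cholera death rates between households served by a contaminated and a clean supplier, once when the supplier is unrelated to income (as in London) and once when contaminated water mostly reaches poorer households, who in this model also have a higher baseline risk.

```python
import random


def death_rate_gap(n: int, supplier_depends_on_income: bool, seed: int = 1) -> float:
    """Cholera death rate with contaminated water minus rate with clean water.

    Hypothetical model (illustrative numbers, not Snow's data): poorer
    households have a higher baseline risk, and the contaminated supplier
    adds a fixed 5-percentage-point risk to every household it serves.
    """
    rng = random.Random(seed)
    deaths = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        poor = rng.random() < 0.5
        if supplier_depends_on_income:
            # Confounded case: contaminated water mostly reaches poor households.
            contaminated = rng.random() < (0.8 if poor else 0.2)
        else:
            # Snow's case: supplier unrelated to income, as if by coin flip.
            contaminated = rng.random() < 0.5
        risk = (0.04 if poor else 0.01) + (0.05 if contaminated else 0.0)
        counts[contaminated] += 1
        deaths[contaminated] += rng.random() < risk  # bool counts as 0 or 1
    return deaths[True] / counts[True] - deaths[False] / counts[False]


random_gap = death_rate_gap(200_000, supplier_depends_on_income=False)
linked_gap = death_rate_gap(200_000, supplier_depends_on_income=True)
print(f"as-if random supplier:  {random_gap:.3f}")
print(f"income-linked supplier: {linked_gap:.3f}")
```

Under these assumptions, the first comparison should land close to the built-in 5-point effect, while the second also absorbs the income difference (about 6.8 points in expectation), overstating the effect of the water itself.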

This case, like the claims of some workplace wellness advocates, highlights the risk of focusing on “modifiable risk factors” and evading discussion of specific treatments.  Hunger strikes and bariatric surgery can both produce weight loss, but the effect of the weight loss on health and health costs might be expected to be different.  So what “treatment” should employers impose for the overweight or the obese?  The long historical record of unsuccessful “diets” (as described for example in Gina Kolata, Rethinking Thin: The New Science of Weight Loss – and the Myths and Realities of Dieting, especially chapters 1 and 2) and “multifactorial approaches” to health like the Multiple Risk Factor Intervention Trial (MRFIT) suggest that current exhortations to individuals to lose weight may do little to improve health, and in some cases may actually make matters worse.  (See J. Eric Oliver, Fat Politics: The Real Story Behind America’s Epidemic, especially pages 10ff.)

Fortunately for humanity, Snow refused to be content with “overwhelming” evidence that proximity to rotting organic matter was a “modifiable risk factor” for cholera.  One can only hope that advocates of wellness programs will follow his path and not the path of von Pettenkofer.  Maybe Professor Goetzel’s conclusions are right.  But let’s hope that advocates of workplace wellness programs based on financial incentives provide more solid evidence for their claims soon.  We need it.