The Patient Protection and Affordable Care Act (ACA) is designed to increase the number of U.S. citizens with access to health insurance. Beyond expanding access to health insurance, and hopefully to health care, for millions of the currently uninsured and achieving some degree of overall cost containment, the ultimate success of the new legislation will depend upon the quality of the health care delivered.

What will we Americans get in return for significantly changing the health care system and inevitably redistributing wealth to provide access to care to the currently uninsured?  In other words, along with more health care, will Americans get better health care?

Health care quality can be defined as net positive outcome (benefit – harm) per unit cost.  This is also termed value.  Some health care practices are of questionable or at least unproven value. Comparative effectiveness research (CER) is a potentially useful method for determining the value of health care.
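The definition above can be written out as a small calculation. This is only an illustrative sketch: the function name, the choice of units (quality-adjusted life years for benefit and harm, dollars for cost) and all of the numbers are assumptions made for the example, not figures from any study.

```python
def value(benefit: float, harm: float, cost: float) -> float:
    """Net positive outcome (benefit - harm) per unit cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - harm) / cost


# Hypothetical therapies: benefit and harm in quality-adjusted life years,
# cost in dollars. None of these numbers come from real data.
therapy_a = value(benefit=2.0, harm=0.5, cost=30_000)
therapy_b = value(benefit=1.2, harm=0.2, cost=10_000)

# The higher-value option under this definition.
better_value = "A" if therapy_a > therapy_b else "B"
```

In this made-up case therapy A produces the larger net benefit, but therapy B delivers more net benefit per dollar, which is the distinction the definition is meant to capture.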

In its critical report published on June 30, 2009, the Institute of Medicine defines CER as:

…the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition or to improve the delivery of care. The purpose of CER is to assist consumers, clinicians, purchasers and policy makers to make informed decisions that will improve health care at both the individual and population levels.

CER quantifies the benefits and risks of a diagnostic test, therapeutic manipulation, or means of care delivery. By including cost data in the CER analysis (cost-effectiveness), the monetary value of effective health care can be estimated. The cost of different therapies can be compared with their associated benefit or harm to assist physicians, patients and payers in decision making.

During the health care reform debate, opposition to including cost effectiveness analysis in CER was voiced, out of fear that it is a first step toward health care rationing.   But we already ration care, based not on any analysis of costs, benefits or harms, but rather on the ability of the patient (or his insurer) to pay for the care.  Besides, conducting CER without including cost-effectiveness analysis is neither logical, nor, in itself, cost-effective.

The Model for CER Utility: The Randomized Clinical Trial (RCT)

There is a well-established model of effective CER, and its results often do lead to changes in practice or reimbursement. This is the randomized clinical trial or RCT. RCTs are a subset of CER, but differ from more common CER in important ways. While RCTs do involve comparative research, they measure efficacy under ideal, well-controlled clinical conditions instead of effectiveness in a “real world” environment. RCTs usually ask a clinical question (e.g., is experimental therapy B superior to or safer than currently approved therapy A?) in a defined population of “human subjects” (consisting of patients or well volunteers) who fulfill strict eligibility criteria. Often, only those human subjects who are free of co-morbid conditions (e.g., diabetes or heart disease, for eligibility for cancer trials) or other potentially confounding characteristics (e.g., extreme youth or advanced age) are eligible to participate in an RCT.

Although CER is more relevant to everyday clinical practice than RCTs are, it may need to adopt some aspects of the RCT if its results are to be accepted as grounds for changing physicians’ behavior, patients’ acceptance or third-party reimbursement.

All research with living people or their tissues must be approved by an institutional review board (IRB) of at least five professionals and laypeople before any patient is accrued to the trial. Among the issues an IRB considers when deciding whether or not to approve a proposed research plan (called a protocol) are: (1) what are the aims of the research? (2) how will successful attainment of those aims be determined? and (3) are the potential benefits of the successful attainment of the aims worth the potential risks incurred by the participating human subjects?

Most CER entails very little new or additional risk to human subjects (i.e., patients or normal volunteers), as it often consists of retrospective analyses of previously collected clinical data or of observational studies. However, the aim of CER, like that of an RCT, is to change health care for the better. In the case of an RCT comparing treatment A to treatment B, there is a predetermined level of superiority one treatment arm must reach compared to the other for the trial to be considered positive. It is called “statistical significance”. It is usually a difference between the outcomes of A vs. B that would have less than a one in twenty chance of occurring if A and B in fact had equivalent efficacy. This is the often cited statistical p-value of < 0.05.
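To make the p < 0.05 threshold concrete, here is a minimal sketch of one common way to compare response rates in two trial arms, a two-sided two-proportion z-test. The choice of test, the function name and the patient counts are all assumptions for illustration; a real trial would fix its test and sample size in the protocol before enrollment.

```python
from math import erfc, sqrt


def two_proportion_p_value(success_a, n_a, success_b, n_b):
    """Two-sided p-value for the hypothesis that arms A and B respond equally.

    Uses the normal approximation with a pooled response rate, which is
    reasonable only when each arm has a fair number of responders and
    non-responders.
    """
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # every subject had the same outcome; no evidence of a difference
    z = (success_b / n_b - success_a / n_a) / std_err
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))


# Hypothetical trial: 40 of 100 respond on therapy A, 55 of 100 on therapy B.
p_value = two_proportion_p_value(40, 100, 55, 100)
trial_is_positive = p_value < 0.05  # threshold fixed before the trial began
```

With these invented counts the p-value falls below 0.05, so the trial would be called positive. Whether a 15-point difference in response rate is worth acting on is a separate question that the test cannot answer.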

Following the demonstration of statistical significance by an RCT or two, there remains the acceptance of the RCT finding as being of “clinical significance”. Do the regulatory agencies and medical community at large judge the finding as actionable due to greater therapeutic activity, greater safety, lower cost or greater convenience? Unfortunately, these latter parameters are rarely examined in as formal a fashion as was the original identification of statistical significance of a new therapy.

Biostatistical methods can be applied to CER as they are to RCTs, when the research is planned and before it is initiated. More specifically, Bayesian adaptive methods are particularly well suited to CER. With Bayesian methods, each human subject’s research results can influence and alter the subsequent trial design. This may serve CER better than traditional statistical methods, which establish a rigid trial design at a study’s onset that cannot be affected by what the research team learns as the trial progresses.
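One simple way to picture how each subject’s result can feed back into the design is a Beta-Bernoulli update with response-adaptive allocation. The sketch below is an assumption-laden toy, not a description of any actual trial: the uniform prior, the outcome lists, the function names and the rule of assigning future subjects in proportion to the probability that B is better are all chosen for illustration.

```python
import random


def update(belief, responded):
    """Fold one subject's outcome into a Beta(alpha, beta) belief about an arm."""
    alpha, beta = belief
    return (alpha + 1, beta) if responded else (alpha, beta + 1)


def prob_b_beats_a(belief_a, belief_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(response rate of B > response rate of A)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    wins = 0
    for _ in range(draws):
        if rng.betavariate(*belief_b) > rng.betavariate(*belief_a):
            wins += 1
    return wins / draws


# Hypothetical outcomes (True = responded), in order of enrollment.
outcomes_a = [True, False, False, True, False, False, False, True, False, False]
outcomes_b = [True, True, False, True, True, False, True, True, False, True]

belief_a = belief_b = (1, 1)  # uniform prior: no initial preference for either arm
for result_a, result_b in zip(outcomes_a, outcomes_b):
    belief_a = update(belief_a, result_a)
    belief_b = update(belief_b, result_b)

# Adaptive step: the share of upcoming subjects to assign to arm B.
share_to_b = prob_b_beats_a(belief_a, belief_b)
```

After these twenty invented subjects, most of the posterior weight favors B, so a design of this kind would steer most new subjects toward B. A fixed design would keep a 50/50 split until the planned analysis.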

Bayesian analysis would allow critical CER to be done using the data from many institutions.  For example, CER examining the best way to treat a common malignancy could use data from hospitals that characterize patients by varying staging criteria (by different surgeons and pathologists), differences in criteria for accepting patients (e.g., ability to pay, eligibility for an open clinical trial), or the likely presence or absence of co-morbid conditions.  Bayesian designs can account for these differences more naturally than traditional statistical methods.  Bayesian designs do not eliminate bias, but accept bias as reality and try to correct for it.

Ineffective CER

In November of 2009 the U.S. Preventive Services Task Force (USPSTF) released recommendations that women between the ages of 40 and 49 with no increased risk factors for breast cancer do not need routine mammographic screening evaluations, and that women over 50 could reduce the frequency of their mammographic screening from annual mammograms to biennial exams. A firestorm of protest over the USPSTF report ensued. Assuming the USPSTF used excellent judgment and excellent CER data, why was this CER finding so controversial?

In a sense, physicians and their patients trusted their own anecdotal observations of individuals with breast cancer more than the collected CER data. CER suggests what is likely to happen to a group of patients, with retrospective subset analysis possibly suggesting characteristics of a smaller and unique group of people most likely to benefit from or be harmed by a given treatment. For the most part, doctors don’t treat groups of patients. They treat one patient at a time. Many women between the ages of 40 and 49 know of a young breast cancer patient who they believe has or might have benefited from early mammography. When each woman goes to her doctor, she wants the test that she believes could save her life, and by the way, she wants her insurance company to pay for the test, preferably with no co-pay. She wants this test regardless of how much it may cost the health care system to find one cancer among this age group (one cancer death would be prevented for every 1900 or so mammograms done in the 40 to 49 age group, according to the USPSTF). In addition, a large number of false positive mammograms would be avoided in the 40 to 49 age group using the new guidelines, reducing the costs in psychological stress as well as the dollars that would be expended proving that the false “positive” mammograms do not represent a malignancy.

Clearly, in this case, the CER (or the communication of its results) was not good enough to change physicians’ behavior, insurers’ reimbursements or patients’ minds.  Maybe that’s because, unlike the case of an IRB-approved clinical trial, the level of significance the CER had to reach to indicate an advisable alteration of doctor, insurer, and patient behavior was not determined before the CER began.  As a result, doctor, insurer, and patient made their own decisions about the power of the study and its relevance to individual patient care delivery.

Furthermore, the findings were not reduced to a readily comprehensible form upon which opinions could be based. Namely: if we do 1904 mammograms in women 40 to 49, we will prevent the death of one woman from breast cancer. If all eligible women (about 22 million) were screened, the national cost would be about $3,000,000,000 for the mammograms alone, and about 12,000 deaths would be prevented. (This does not include the cost of the follow-up procedures to identify which positive mammograms were false positives, or the potential cost of over-treatment of women with limited or no cancer at all.) Now, let’s discuss whether preventing those deaths is worth the expense.
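The arithmetic behind those figures can be laid out in a few lines. The three inputs are the numbers quoted above; the variable names and the simplification of one mammogram per eligible woman are assumptions made for this sketch, and follow-up and over-treatment costs are left out, as they are in the text.

```python
# Figures quoted in the text.
MAMMOGRAMS_PER_DEATH_PREVENTED = 1904
ELIGIBLE_WOMEN = 22_000_000
TOTAL_SCREENING_COST = 3_000_000_000  # dollars, mammograms alone

# Assumption for this sketch: each eligible woman receives one mammogram.
mammograms = ELIGIBLE_WOMEN

deaths_prevented = mammograms / MAMMOGRAMS_PER_DEATH_PREVENTED
cost_per_mammogram = TOTAL_SCREENING_COST / mammograms
cost_per_death_prevented = TOTAL_SCREENING_COST / deaths_prevented

print(f"Deaths prevented:         {deaths_prevented:,.0f}")
print(f"Cost per mammogram:       ${cost_per_mammogram:,.2f}")
print(f"Cost per death prevented: ${cost_per_death_prevented:,.0f}")
```

Under these inputs the sketch gives roughly 11,600 deaths prevented (consistent with the rounded figure of about 12,000), an implied cost of about $136 per mammogram, and about $260,000 in screening cost per death prevented. That last number is the one a public discussion would have to weigh.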

What would it take to have that discussion in the U.S.?

Towards More Effective CER

There are some steps that could be taken to improve the value of CER. First, CER cannot be an afterthought or a hoped-for secondary benefit of old research data and medical records.  Medical science moves too quickly for us to depend upon such data for future decision-making.

Second, as  Garber and Meltzer point out, the “value of information” (VOI) is critical to deciding what CER to do.  “VOI is the difference between the value of the outcome of CER given the decision one would make in the absence of additional information and the value of the outcome of the decision that would be made as additional information became available as a result of research”.
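A toy calculation may help make the VOI definition concrete. The sketch below computes the expected value of perfect information, a simplified stand-in that assumes the research would reveal with certainty which treatment is better; real CER yields imperfect information, so this is an upper bound rather than Garber and Meltzer’s full method. The payoffs, probabilities and function names are all hypothetical.

```python
def expected_value(action, payoffs, probs):
    """Expected payoff of committing to one action before the truth is known."""
    return sum(probs[state] * payoffs[action][state] for state in probs)


def value_of_information(payoffs, probs):
    """Value of deciding after learning the true state minus deciding now."""
    decide_now = max(expected_value(action, payoffs, probs) for action in payoffs)
    decide_informed = sum(
        probs[state] * max(payoffs[action][state] for action in payoffs)
        for state in probs
    )
    return decide_informed - decide_now


# Hypothetical beliefs and payoffs (arbitrary units of net health benefit).
probs = {"A is better": 0.6, "B is better": 0.4}
payoffs = {
    "use A": {"A is better": 100, "B is better": 40},
    "use B": {"A is better": 50, "B is better": 90},
}
voi = value_of_information(payoffs, probs)

# If one option wins in every state, the decision cannot change, so VOI is zero.
dominated = {
    "use A": {"A is better": 100, "B is better": 90},
    "use B": {"A is better": 50, "B is better": 40},
}
voi_when_nothing_changes = value_of_information(dominated, probs)
```

In the first case the research is worth up to 20 units, so a study costing more than that would not pay for itself. In the second case no result could alter the decision, and the study has no value under this definition no matter how well it is run.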

In the USPSTF breast cancer CER described above, the VOI was low because the system of health care deliverers, consumers and advocates was uninterested in the result. A recent trial showing that a series of spiral CT screening exams in heavy smokers decreased the participants’ mortality rate from lung cancer may also be without VOI. In this case, the cost of screening all smokers in the U.S. who might benefit is prohibitive (also measured in the billions of dollars, although a full cost analysis of this trial is still pending). If the finding proves correct but the cost excessive, and if this could have been known before the trial started, then doing the trial at all was unwise and expensive.

In short, using Garber and Meltzer’s terminology, in both of these trials there was a low probability that medical practice was going to change.  If that is true, why bother doing the trial?  By contrast, if significant benefit in lives saved or costs not expended could have been identified by the research and accepted by the medical community, the study could have significant VOI.  But we should be able to calculate the needed VOI of any CER before the study starts.

Third, using the VOI concept allows the calculation of the relative benefit generated by any CER study vs. the loss of not doing it.  The opportunity cost of not doing basic research, clinical research, or even other CER that is competing for NIH or AHRQ  funds must be taken into account as the nation’s health care priorities are assessed and only limited resources are available for CER.

Fourth, we need to set specific thresholds that CER must attain that would warrant its results altering clinical decision making.  What if breast cancer treatment A is more effective than treatment B and will cost three times more than B and save 10,000 lives?  Should we find out if A is more effective than B?  Or, if test A is more accurate than test B in diagnosing Alzheimer’s Disease, but that information does not lead to a useful intervention or alteration in the disorder’s natural history, should we do this research?

CER is not like basic research or even like most traditional clinical trials.  Knowledge alone is not enough.  CER must have practical applications to the practice of medicine as applied to an individual patient, or indicate definitive improvements in resource allocation that will lead to better individual patient care.

Within the ACA, a Patient-Centered Outcomes Research Institute (PCORI) has been established to oversee the application of federal expenditures for CER, establish CER priorities, and hopefully guide CER endeavors to assist providers, patients and payers alike in making critical health care decisions.

PCORI, above all else, would best serve the discipline of CER by establishing parameters that measure significant improvements in outcomes, reductions in costs, or increases in value, and by defining the degree to which a study must show such a benefit to be considered positive and thus alter current behavior by caregivers, patients and payers. And, of course, it must ask: will the health system bear the cost of acquiring and implementing the CER findings, and from whose current resources will this money come?

If we cannot be sure a study is positive or are unwilling to pay for the consequences of the full implementation of its positive results, why spend the money to even start the research?