Collaborative Filtering: An Interim Approach To Identifying Clinical Doppelgängers

June 17th, 2013
by Eric Caplan and Norman Rosenthal

“The real challenge of human biology, beyond the task of finding out how genes orchestrate the construction and maintenance of the miraculous mechanism of our bodies, will lie ahead as we seek to explain how our minds have come to organize thoughts sufficiently well to investigate our own existence.”

The initial enthusiasm following the mapping of the human genome has given way to a more circumspect outlook.  With the exception of a small number of promising interventions, advances in genomic science have yet to yield a critical mass of therapeutic breakthroughs – thus forestalling the birth of the era of precision medicine (PM).

While a comprehensive genomic understanding of disease and concomitant molecular-based patient taxonomy would doubtless hasten the arrival of PM, a significantly less costly alternative offers a promising interim approach.  A methodology known as collaborative filtering (CF), which has already achieved widespread use in advertising and marketing, has the potential to offer powerful insights not only to advertisers and others desiring to influence purchasing behavior but also to physicians, allied health care professionals, patients, and their families by offering personalized advice and recommendations regarding health and disease.

CF relies directly on aggregated subject/user behavior to reveal complex and unexpected patterns that would otherwise be difficult to capture using known data attributes.  Recommendations generated from analyses of these patterns have demonstrated significantly greater reliability than those using more traditional demographic categories.  The core idea behind applying CF to clinical decision-making is to make decisions about a patient based on historical data derived from multiple “similar” patients presenting multiple “similar” cases.  As Victor Strecher explains, “collaborative filtering in the health area could match the coping strategies, medical decisions, and preferences of similar others with specific needs and interests of the user.”

The Lay Of The Land

The practice of medicine is in the midst of significant transformation.  Much of this transformation has been chronicled in the scientific and mainstream press.  The gradual movement away from common cures for common ailments to more targeted approaches for specific diseases is likely to increase in vigor over the next several decades.  While part of this shift is attributable to recent advances in genomics and proteomics, no less significant has been a host of factors that have little to do with science or medicine.  Among these, one of the more prominent is the emergence of a new consumerism driven by rapidly evolving social media and data-mining capabilities.

Understanding the molecular basis of disease is considered a critical prerequisite for the discovery, development, and delivery of patient-specific therapies.  Missing from this framework, however, is an appreciation of the need to resolve the equally complex behavioral challenge of ensuring medication adherence.  Poor adherence to medication regimens remains common and contributes to substantial worsening of disease, death, and increased health care costs.  Absent significant improvement in adherence rates, patient-specific therapies have little likelihood of achieving their full potential.  To address this challenge, we need new methods and new approaches.

As major Internet players like Google, Facebook, Amazon, Netflix, Microsoft, and others hone their capacity to harness the petabytes of data they’ve amassed over the past decade or more, traditional taxonomies of patients and consumers will be supplanted by more analytically rigorous and empirically validated predictors of individual and group behavior as well as disease and response to treatment.  The implications of this shift, not only for the practice and delivery of medical services but also for pharmaceutical adherence and health outcomes, are likely to be profound.  Moreover, the potential rewards and benefits to first movers in this arena are likely to be substantial – provided, of course, they know what questions to ask and what analyses to perform.

The Opportunity

Internet-fueled social media sites have turned consumer profiling on its head.  Whereas race, gender, geographic, ethnic, age, and even income-related categories have traditionally been employed not only to predict but also to influence consumer behavior, advertisers and consumer-goods companies are increasingly turning to that behavior itself to target their wares.  As the Wall Street Journal reports, “Data-gathering firms and technology companies are aggressively matching people’s TV-viewing behavior with other personal data – in some cases, prescription-drug records obtained from insurers – and using it to help advertisers buy ads targeted to shows watched by certain kinds of people.”  At the core of this matching enterprise is collaborative filtering.

Whereas the impact of collaborative filtering on marketing and advertising has been profound, its application to the health care arena has been sluggish at best.  Indeed, a recent review suggests that little work has been done in this field.  The slow adoption of collaborative filtering by health researchers is attributable to a number of factors.  Among these, two of the more significant are the heightened efforts to promote the adoption of electronic medical records (EMRs) and the increasing focus of bioinformatics on harnessing the petabytes of genomic data that continue to grow at an exponential rate.  Taken together, these two critical health information technology (HIT) initiatives have crowded out the resources required to apply collaborative filtering techniques to enhancing clinical decision-making and improving patient outcomes.

Data mining and health care research.  Data mining entails the use of specific algorithms to extract interesting (non-trivial, implicit, previously unknown and potentially useful) patterns from data.  The practice is hardly new when it comes to health care and pharmaceutical research.  Both industry and regulatory authorities have long relied upon sophisticated statistical methods when analyzing data generated from clinical trials to determine the safety and efficacy of drugs and biologics.  What is new is the size and scope of the practice.  Whereas a “large” phase III clinical trial might include thousands of subjects, private insurance claims databases may contain upwards of several million records; the Medicare claims database includes a significantly larger number.

The sheer volume of data contained in these repositories, when combined with increasingly sophisticated bioinformatics applications, has the potential to yield significant benefits – not only to clinical practice but also to containing mounting health care costs.  A number of challenges remain, however.  The vast majority of existing health data repositories were not designed to facilitate either clinical or comparative effectiveness research.  At best, insurance claims databases, safety surveillance databases, disease registries, and EMRs remain imperfect surrogates for systems explicitly designed to facilitate health care and clinical research.

Moreover, the operational complexities of dealing with such enormous databases are prodigious, as are the ethical, legal and social issues (ELSI) germane to this type of research.  That these vast repositories contain enormous amounts of data to be mined cannot be denied.  But how best to translate the resulting findings to the realm of clinical practice remains an open question.

Translation challenge.  The question of translation has become even more pronounced with the emerging genomic paradigm.  Just a decade ago, Craig Venter boasted: “Genomics is moving so fast it is possible to think that in perhaps fifteen years you will be able to walk into a doctor’s office and have your own genome interpreted. It could be stored in a smart card” that could be kept in your wallet.  Venter was hardly alone in his enthusiasm. “Advances in human genome research,” Geoffrey Ginsburg declared, “are opening the door to a new paradigm for practicing medicine that promises to transform healthcare … The molecular equivalent of the Pap smear, mammogram or blood-pressure measurement will define more precisely the predilection for disease development.”

Added NIH Director Francis Collins, “Our increased understanding of the interactions between the entire genome and nongenomic factors that result in health and disease is paving the way for an era of ‘genomic medicine,’ in which new diagnostic and therapeutic approaches to common multifactorial conditions are emerging.”  The underlying hypothesis behind this line of research is that once we catalogue all disease-related mutations, we will be able to predict the susceptibility of each individual to future diseases using various molecular biomarkers, ushering us into an era of predictive medicine.

What these and other early genomic enthusiasts held in common was an abiding faith that in a relatively short time – years not decades – the prevailing disease nosology would be supplanted by a more precise molecular-based taxonomy.  Shortly thereafter, therapies that had previously been directed at disease symptomatology would be redirected at their root cause.

A decade later, the initial promise that greeted the mapping of the human genome has given way to a more circumspect outlook. Identifying the molecular basis of disease has proved considerably more complex. As former head of the Office of Clinical Pharmacology at the FDA Lawrence Lesko explains: “Personalized medicine is a paradigm that exists more in conceptual terms than in reality, with only a few marketed drug–test companion products and not very many actual clinical practices set up to personalize medicine in the way that supporters have intended.”

Even if it were possible to store one’s entire genome on a smart card or similar device as Venter had earlier speculated, Kathy Hudson declares: “Try to present that flash drive to your personal physician, and you’re most likely to be greeted with a blank stare. We understand very little of what this sequence means for health and disease.   The vast majority of genomic data remain medically inactionable.”

Venter himself recently acknowledged his “[surprise] at how slowly science has progressed in interpreting the genome” and now estimates “it will be at least another ‘10 to 20 years’ before it becomes cost-effective for patients to have their genomes sequenced.”  As Eric Topol, one of the leading advocates of genomic science, concedes: “It’s not immediately clear what the clinical importance of any of this information is.”

The bacteriology precedent.  In many respects, the failure of genomics to yield significant clinical interventions as rapidly as its initial enthusiasts had anticipated should come as little surprise.  A similar scenario transpired more than a century ago.  Translating the new science of bacteriology into viable clinical therapies required years of painstaking research.

Prior to Robert Koch’s 1882 discovery of the tubercle bacillus, physicians regarded fever itself as a distinct disease entity.  The proliferation of recently discovered microbial agents gave rise to a radical re-conception of prevailing nosology.  As a result, many long-established clinical entities were cast aside.  In their place emerged a host of new, biologically validated explanations of causation.  What many who recall this revolutionary period in the history of medicine neglect to consider, however, is that more than half a century passed between Koch’s discovery of the bacillus and the advent of an effective antibacterial therapy, penicillin.  Even following Alexander Fleming’s serendipitous discovery of penicillin in 1928, another seventeen years elapsed before the drug could be mass-produced and distributed.

Excepting a few early victories, translating genomic science and molecular diagnoses into effective molecular therapies is likely to prove equally daunting.  As Garrison and Austin explain, “Progress toward personalized medicine in the five years following the sequencing of the human genome has been slower than many expected … the initial sequencing of the human genome, and the number of new pharmacogenetics applications – that is, those whose drug response varies across individuals because of genetic differences – can be counted on two hands.”

“Our current translation landscape in genomic medicine,” adds Muin Khoury, “has major gaps.”  As New York Times science reporter Nicholas Wade recently reported, “The primary goal of the $3 billion Human Genome Project – to ferret out the genetic roots of common diseases like cancer and Alzheimer’s and then generate treatments – remains largely elusive. Indeed, after 10 years of effort, geneticists are almost back to square one in knowing where to look for the roots of common disease.”

Despite tremendous excitement about the potential value of molecular biomarkers such as SNPs and microarray expression profiles as genetic disease signatures on which to base improved diagnosis, therapy, and prevention, this potential has largely gone unfulfilled. While an increasing number of candidate biomarkers are being identified, development of these biomarkers into diagnostic tests with clinical utility has proceeded at a slow pace.  As NCI Director Harold Varmus has noted:  “Genomics is a way to do science, not medicine.”

E unibus pluram.  Just as it had in the wake of the bacteriological revolution, once again, the number of distinct disease entities is likely to proliferate.  As American Society of Clinical Oncology President George Sledge recently explained, “Cancer is like cable television. Thirty years ago you had three channels. Now you have 500.”  The same holds for a host of other currently conceived clinical disease entities.  Topol explains: “Someday there may be at least 10 subtypes of diabetes mellitus based on specific individual biological variation.”

The clinical implications of a more expansive and precise molecular-based disease nosology remain uncertain.  While few doubt that genomic analysis of diseases with homogeneous clinical phenotypes will eventually unveil distinct molecular entities that require different treatment strategies for optimal outcomes, most concede that we are a long way from that day.  As Lesko states, the question is “where we are on the continuum between traditional medicine and personalized medicine, and what is the likely trajectory for the future development of personalized medicine?”

The Trajectory For The Future Development Of Precision Medicine

In part, the answer to Lesko’s question depends on the definition of personalized medicine.  If we embrace Ginsburg’s definition – i.e., “Personalized medicine is rooted in the hypothesis that diseases are heterogeneous, from their causes to rates of progression to their response to drugs. Each person’s disease might be unique and therefore that person needs to be treated as an individual” – we run the risk of embracing both diagnostic and therapeutic nihilism.  A more practical, albeit less theoretically compelling, definition is offered by the President’s Council of Advisors on Science and Technology:

Personalized medicine refers to the tailoring of medical treatment to the individual characteristics of each patient. It does not literally mean the creation of drugs or medical devices that are unique to a patient but rather the ability to classify individuals into subpopulations that differ in their susceptibility to a particular disease or their response to a specific treatment. Preventive or therapeutic interventions can then be concentrated on those who will benefit, sparing expense and side effects for those who will not.

At the core of this definition is “the ability to classify individuals into subpopulations that differ in their susceptibility to a particular disease or their response to a specific treatment.”  In practice, what this has typically implied is consideration of race-, ethnicity-, gender-, and class-based attributes.  Given past inequities in both medical research and clinical practice, this more inclusive approach represents a welcome corrective to a considerably more restrictive taxonomy comprising largely white men of European origin.  But at best, a phenotype-based taxonomy is only a surrogate for a more scientifically robust patient stratification schema.

While myriad efforts are underway to address this challenge, as noted above, a molecular-based disease taxonomy and concomitant patient stratification approach may be decades in the making.  What’s needed is an interim approach – one that avails itself of existing disease taxonomies while simultaneously offering a more clinically pragmatic and therapeutically relevant approach to identifying subpopulations.  As Ralph Snyderman explains, “To address our current dilemma, we need to create and validate fundamentally new models of prospective health care that determine the risk for individuals to develop specific diseases, detect the disease’s earliest onset, and prevent or intervene early enough to provide maximum benefit.”  And while genomics would be extraordinarily beneficial, we don’t necessarily need it to do this.

Collaborative Filtering And Clinical Decision Making

Increasingly sophisticated data mining applications offer cost-effective, scientifically robust alternatives to identifying clinically meaningful subpopulations absent consideration of genotypic attributes.  Two recent examples suggest how CF can be used to enhance clinical decision-making and improve risk stratification among sub-populations.

Collaborative Assessment and Recommendation Engine (CARE).  A team of computer science and medical researchers from Notre Dame, Harvard Medical School, and Northeastern University has developed an intriguing model for practical disease prediction that employs a set of collaborative-filtering-based algorithms.  Their model, CARE, uses collaborative filtering to predict patients’ greatest disease risks based on their own medical history and those of similar patients.  As Darcy Davis and coauthors explain:

The technique is based on the intuitive assumption that people will enjoy the same items as their similar peers, or more specifically, having some common preferences is a strong predictor of additional common preferences. Predictions are based on datasets consisting of many user profiles, each containing information about the individual user’s preferences. This has made a significant impact on marketing strategies. We draw an analogy between marketing and medical prediction. Each user is a patient whose profile is a vector of diagnosed diseases. Using collaborative filtering, we can generate predictions on other diseases based on a set of other similar patients.

While admittedly speculative, what CARE suggests is that CF can provide significant advancement toward personalized medicine.
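
To make the analogy concrete, here is a minimal sketch of the general idea in Python.  It is not the CARE algorithm itself, only a simple illustration under stated assumptions: each patient is represented as a set of diagnosed diseases, similarity between patients is measured with the Jaccard index (our choice, not necessarily the authors’), and the function names and toy records are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets of diagnoses (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_disease_scores(patient, history):
    """Score diseases the patient has not (yet) been diagnosed with.

    Each score is the similarity-weighted share of comparable patients
    who carry that diagnosis, so it always lies between 0 and 1.
    """
    weighted = defaultdict(float)
    total_similarity = 0.0
    for other in history:
        sim = jaccard(patient, other)
        if sim == 0.0:
            continue  # patients with nothing in common contribute nothing
        total_similarity += sim
        for disease in other - patient:
            weighted[disease] += sim
    if total_similarity == 0.0:
        return {}
    return {d: w / total_similarity for d, w in weighted.items()}


# Toy, made-up records (hypothetical labels, not real patient data).
history = [
    {"hypertension", "type2_diabetes", "retinopathy"},
    {"hypertension", "type2_diabetes", "neuropathy"},
    {"asthma", "eczema"},
    {"hypertension", "retinopathy"},
]
new_patient = {"hypertension", "type2_diabetes"}

scores = predict_disease_scores(new_patient, history)
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for disease, score in ranked:
    print(f"{disease}: {score:.2f}")
```

In this toy example the new patient shares diagnoses with three of the four historical patients, so only the conditions those three carry are scored; the unrelated asthma and eczema record is ignored.  A real system would need far larger datasets, careful validation, and a more considered similarity measure.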

Collaborative Filtering and Risk Stratification.  CARE is hardly the only example showing how CF may contribute to improving clinical decision-making.  Two University of Michigan computer scientists, Shahzaib Hassan and Zeeshan Syed, recently demonstrated how CF could achieve higher predictive accuracy for both sudden cardiac death and recurrent myocardial infarction than popular classification approaches relying on logistic regression and support vector machines.  Working with a database of 4500 patients admitted with acute coronary syndrome (ACS), they used CF to compare new patients with historical records, relating known patient characteristics and outcomes to future events of interest.  In doing so, Hassan and Syed were better able to match patients to effective treatment regimens than existing risk stratification models that rely on conventional medical knowledge.
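
The notion of comparing a new patient with historical records can be sketched in a few lines of Python.  This is an illustration only and should not be read as Hassan and Syed’s method: it assumes each patient is described by a handful of yes/no characteristics, uses cosine similarity and the three nearest neighbors (both our assumptions), and runs on invented records with hypothetical feature names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def estimate_risk(new_features, records, k=3):
    """Similarity-weighted event rate among the k most similar past patients.

    records is a list of (features, outcome) pairs, where outcome is 1
    if the adverse event occurred during follow-up and 0 otherwise.
    """
    scored = [(cosine(new_features, f), outcome) for f, outcome in records]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in nearest)
    if total == 0.0:
        return None  # no comparable patients on record
    return sum(sim * outcome for sim, outcome in nearest) / total


# Hypothetical features: [diabetic, prior_mi, smoker, over_65]
records = [
    ([1, 1, 0, 1], 1),
    ([1, 1, 1, 1], 1),
    ([0, 0, 1, 0], 0),
    ([1, 0, 0, 1], 0),
    ([0, 0, 0, 1], 0),
]

high_risk = estimate_risk([1, 1, 0, 1], records)
low_risk = estimate_risk([0, 0, 1, 0], records)
print(f"patient A estimated risk: {high_risk:.2f}")
print(f"patient B estimated risk: {low_risk:.2f}")
```

The estimate is simply the share of similar past patients who experienced the event, weighted by how similar each one is.  Unlike a regression model, nothing is fitted in advance; the historical records themselves do the work, which is the property that makes the approach attractive as an interim tool.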

Summing Up

The precision medicine train is approaching the station.  But its arrival is not in a form that many of its early adherents would recognize.  Rather than following the path of genomic science and novel molecular based taxonomies, precision medicine is likely to make its initial forays into clinical medicine via an alternative route — one that avails itself of increasingly sophisticated algorithms and data mining capabilities that rely on subject behaviors and preferences rather than genes and molecules.

