Editor’s note: For more on the topic of big data, check out the July issue of Health Affairs.

Health care research is on the cusp of an era of “Big Data” — one that promises to transform the way in which we understand and practice medicine.

The Big Data paradigm has developed from two different points of origin. First, significant efforts to digitize and synthesize existing data sources (e.g., electronic health records) have been driven by policy and practice economics. Second, a wide range of novel ways to capture both clinical and biological data points (e.g., wearable health devices, genomics) has emerged.

The era of Big Data holds great promise for improving our ability to predict which health care interventions are most effective, for which patients, and at what cost.

Data, Data Everywhere: Big Data, Real World Data, and Clinical Trial Data

The terminology used to describe data is often inconsistent and can be confusing. For the purpose of this discussion, we will define Big Data, Real World Data (RWD), and Clinical Trial Data (CTD).

Big Data is a blanket term for any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

RWD is any data that is not captured within the context of a clinical trial and is not explicitly intended for research purposes. RWD can be “Big Data” when it is vast in quantity and combines multiple sources.

In contrast, CTD are often collected for the specific purpose of obtaining regulatory approval for a new medicine, or a new indication for a medicine. Clinical trials are rigorously designed, often focus on a highly specific patient population, take significant time to complete, and can be very expensive.

There are fewer regulations around how RWD analyses are designed, executed, and disseminated, so their methodological rigor can vary widely. RWD analyses also typically cost less and take less time to complete than clinical trials. Both CTD and RWD can be useful in informing clinical decision-making.

Big Data, Real World Data, and Future Drug Development

In the pharmaceutical industry, Big Data is being used to identify new disease pathways by combining genomic data with RWD. Big Data holds the promise of making the drug discovery and development process more efficient.

Big Data, including RWD, can improve the selection of both target molecules and patients for clinical trials. RWD, whether or not technically Big Data, will have an impact on how we monitor the safety and effectiveness of clinical and real-world treatment by providing more rapid feedback that allows providers to individualize treatment.

Real World Data and Health Benefit Design

RWD could also transform health benefit designs by shifting from a one-size-fits-all approach to one that enables the right beneficiaries to receive the right medicine while avoiding unnecessary treatment for those who would benefit less.

In particular, specialty tiers, which cover newer, more expensive medicines and impose high cost sharing on all patients, could be better managed in this way. By influencing the entry of new treatments to the market, RWD will both enhance the efficiency of health care spending and ensure access for the right patients.

Getting to the Era of Real World Data Becoming Big Data

Progress on these endeavors will depend on the availability of integrated real-world data sets to the appropriate stakeholders. Barriers to ready access include the need to maintain confidentiality of patient information and issues of data ownership. The former has gotten much more attention than the latter.

Patient confidentiality. Combining data across sites of care and broadening access has potential value for health research, but it poses a risk to privacy. Even with the protections provided through the Health Insurance Portability and Accountability Act (HIPAA), there is still a risk of re-identification, particularly if data sets are merged with other information such as voter registration records.

Risk may be reduced by removing personal identifiers, aggregating small samples as required through the Safe Harbor and Limited Dataset provisions of HIPAA, and carefully considering what data are available through other public use datasets prior to releasing health care data.
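As a rough illustration of the steps just described, the Python sketch below drops direct identifiers, coarsens ZIP code and age, and flags small groups of records that might still be identifiable. It is a minimal sketch only, not a compliance implementation: the field names, the sample records, and the threshold k are hypothetical, and the full Safe Harbor rules (for example, the handling of sparsely populated ZIP code areas) are not covered.

```python
from collections import Counter

# Hypothetical field names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "ssn", "phone"}


def deidentify(record):
    """Drop direct identifiers and coarsen ZIP code and age."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["zip"] = str(record["zip"])[:3]   # keep only the first three ZIP digits
    out["age"] = min(record["age"], 90)   # pool ages above 89 into one 90+ group
    return out


def small_cells(records, keys=("zip", "age", "sex"), k=5):
    """Return attribute combinations shared by fewer than k records."""
    counts = Counter(tuple(r[key] for key in keys) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Invented sample records, for illustration only.
patients = [
    {"name": "A", "zip": "02139", "age": 34, "sex": "F", "dx": "asthma"},
    {"name": "B", "zip": "02138", "age": 34, "sex": "F", "dx": "diabetes"},
    {"name": "C", "zip": "90210", "age": 95, "sex": "M", "dx": "ALS"},
]

released = [deidentify(p) for p in patients]
risky = small_cells(released, k=2)  # groups that need further aggregation
```

In this toy example, the two Massachusetts records fall into the same group after coarsening, while the third record remains alone in its group and would need further aggregation or suppression before release.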

Despite this risk, everywhere we look — federal and state governments, private industry, health care entities — data is becoming more and more available.

EHR Implementation in Health Care Settings. Patients have unprecedented access to their electronic health data, and that access is only increasing.

The Health Information Technology for Economic and Clinical Health Act (HITECH), part of the American Recovery and Reinvestment Act of 2009 (ARRA), has provided funds to support the implementation of EHRs. It also expands patients’ access to their data and requires providers to demonstrate achievement of standards (e.g., Meaningful Use) to improve the delivery of care, patient empowerment, and information sharing.

The Department of Veterans Affairs’ “Data Liberation.” Meaningful Use under HITECH also promotes patients’ electronic access to their health information and encourages the development of tools to make use of that data. Some have characterized this as “data liberation.”

For example, in 2010, the Departments of Veterans Affairs and Health and Human Services both launched the Blue Button as a tool for Veterans and Medicare beneficiaries to have electronic access to their own clinical data.

By May 2012, more than 500,000 veterans had accessed their data via the Blue Button initiative, and many data holders in the private sector, including payers, providers, and others, have begun to adopt the platform.

Initiatives at the State Level. States are also expanding data access for researchers, payers, and health care providers. Sixteen states have developed or are in the process of developing all-payer claims databases, which will collect and aggregate medical, pharmacy, and dental claims data from plans including Medicaid, Health Insurance Exchange, Medicare, hospitals, and other sources.

States vary in their data access policies and in the level of transparency about the research being conducted. Massachusetts, with a high degree of transparency and access for qualified researchers, grants access to de-identified data for research under a data use agreement. The process allows for public comment and includes representatives from a broad range of public and private stakeholders on the data release committee. Private organizations and academics submitted applications for over 20 studies in 2012-2013.

Lots of Data, But Who Owns It?

The thornier barrier to ready access is that of data ownership. While governments, at least in the U.S., are making efforts to permit patients and researchers greater access to data, a majority of health-related data are aggregated and curated by private companies.

In most cases, charging for access to the data is integral to their business plans. In many cases, the company collecting the data has transformed it in some fashion and therefore considers the data its intellectual property.

Some organizations are willing to license their datasets. Others provide access only through an interface or through a carefully controlled procedure. These access limitations characterize not only private companies but also some government-sponsored Big Data initiatives (e.g., Genomics England, Danish Biobank).

Patients Not Proprietary

Some have argued that the solution to this barrier is for patients to assert their right to control the use of their own data. Indeed, when patients have access to their health data, 91 percent of the time they are willing to share that data to benefit research.

Thus, ready access to data may be facilitated by patient advocacy. This decade has been marked by the rise of the ePatient, empowered by the Internet and easier access to health information: individuals “who are equipped, enabled, empowered, and engaged in their health and health care decisions.”

According to the Pew Internet & American Life Project, 59 percent of U.S. adults are looking online for health information, and 62 percent of adults with two or more conditions are now considered “trackers,” that is, patients monitoring health indicators including symptoms, diet, and activity.

A subset of these patients is keenly interested in sharing. Much as social networks have spurred unprecedented sharing of personal information, patients are sharing health information with one another. A pioneer in this field has been PatientsLikeMe. Because the site allows patients to publicly track health indicators, data can be aggregated and analyzed by individual patients in an effort to determine their best course of treatment.

When an open-label study with 44 patients was published in 2008 suggesting lithium could slow the progression of amyotrophic lateral sclerosis (ALS), hundreds of patients in the PatientsLikeMe community began to use lithium, self-tracking symptoms and progression on the website. PatientsLikeMe was able to use this data to publish findings in a major scientific journal showing that lithium did not slow ALS.

While randomized trials were ongoing to attempt to replicate the original study, PatientsLikeMe was able to publish preliminary findings from data provided by these “citizen scientists” in only nine months and at a very low cost.

Public-Private Consortia Are the Answer

Patient advocacy may be part of the solution, but it is not enough, for two reasons. First, there will be enormous implementation challenges that will require multisector solutions. Second, as long as business interests are not aligned, enthusiasm among data aggregators will be muted at best and progress will be slow.

So what is the solution?

Public-private consortia have already arisen to address key non-competitive challenges; the FDA Sentinel Initiative and IMEDS, for example, aim to make progress on pharmacovigilance methodology. In the same way, a public-private consortium could create a platform for a “public good” data repository to facilitate and accelerate scientific progress, including improving health care efficiency and effectiveness and the discovery and development of new health care technologies.

Incentives must be aligned to promote collaboration, establish open source data standards, and increase critical data capture; this will probably require advocacy from patients as well as regulatory and legislative action. The value of these principles to the promotion of health research is generally recognized.

A white paper by the 21st Century Cures initiative recently stated, “FDA’s review of supplemental applications for new uses or changes to a product are governed by pathways established when computers could not identify trends in statistical or clinical data anywhere close to the degree they can today, let alone what they will be capable of doing tomorrow. Considering these ongoing developments, should we be rethinking the supplemental approval processes and how real world data can be leveraged?”

Indeed, we should. Realizing the promise of Big Data and Real World Data will require some disruptive changes to the health care ecosystem. We will make more rapid progress through collaboration and cooperation.