Big data offers breakthrough possibilities for new research and discoveries, better patient care, and greater efficiency in health and health care, as detailed in the July issue of Health Affairs. As with any new tool or technique, there is a learning curve.
Over the last few years, we, along with our colleagues at Booz Allen, have worked on more than 30 big data projects with federal health agencies and other departments, including the National Institutes of Health (NIH), the Centers for Disease Control and Prevention (CDC), the Food and Drug Administration (FDA), and the Department of Veterans Affairs (VA), as well as private sector health organizations such as hospitals, delivery systems, and pharmaceutical manufacturers.
While many of the lessons learned from these projects may be obvious, such as the need for disciplined project management, we have also seen organizations struggle with unexpected pitfalls and roadblocks that keep them from taking full advantage of big data’s potential.
Based on these experiences, here are some guidelines:
Acquire the “right” data for the project, even if it might be difficult to obtain.
We’ve found that many organizations, eager to get started on a big data project, often quickly gather and use the data that is the easiest to obtain, without considering whether it really goes to the heart of the specific health care problem they’re investigating. While this can speed up a project, the analytic results are likely to have only limited value.
For example, we worked with a federal agency experimenting with big data analytics to identify cases of perceived fraud, waste, or abuse. The program’s analysts focused on data they already had on hand and currently used to direct audit and investigation activity. We encouraged project staff to identify alternative data sources that might reveal important information about compliance history or “hotspots” for illegitimate activity.
We learned that historical case reports and online provider marketing materials were available and were potentially valuable sources of information to aid in fraud detection. However, the project analysts had decided it would take too long to incorporate that information and so had excluded it.
Many organizations – both inside and outside of health care – tend to stick with the data that’s easily accessible and that they’re comfortable with, even if it provides only a partial picture and doesn’t successfully unlock the value big data analytics may offer. But we have found that when organizations develop a “weighted data wish list” and allocate their resources towards acquiring high-impact data sources as well as easy-to-acquire sources, they discover greater returns on their big data investment.
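One way to picture a “weighted data wish list” is as a simple scoring exercise. The sketch below is illustrative only: the `DataSource` class, the 1-to-5 scales, the weights, and the scores assigned to each source are all assumptions made for the example, not figures from the project described above.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    impact: int  # 1 (low) to 5 (high): expected analytic value, as judged by the team
    effort: int  # 1 (easy) to 5 (hard): cost and time needed to acquire the data


def rank_wish_list(sources, impact_weight=2.0, effort_weight=1.0):
    """Return sources sorted by weighted score, highest first.

    The weights are hypothetical; here impact counts twice as much as effort.
    """
    def score(source):
        return impact_weight * source.impact - effort_weight * source.effort

    return sorted(sources, key=score, reverse=True)


# Hypothetical scores for the kinds of sources mentioned in the fraud example.
wish_list = [
    DataSource("audit data already on hand", impact=2, effort=1),
    DataSource("historical case reports", impact=5, effort=4),
    DataSource("online provider marketing materials", impact=4, effort=3),
]

ranked = rank_wish_list(wish_list)
for source in ranked:
    print(source.name)
```

With these assumed scores, the harder-to-obtain sources rank above the data already on hand, which is the point of weighting impact explicitly rather than defaulting to convenience.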
Ensure that initial pilots have wide applicability.
Health organizations will get the most from big data when everyone sees the value and participates. Too often, though, initial analytics projects may be so self-contained that it is hard to see how any of the results might apply elsewhere in the organization.
We ran into this challenge when we helped a federal health agency experiment with big data analytics. The agency’s initial set of pilots focused on specific, computationally complex and storage-intensive challenges, such as reconfiguring a bioinformatics algorithm to run across a large cluster of processors and developing a data-capture approach to access and store data in real time from a laboratory instrument.
While each pilot solved a big data analytics challenge, the resulting capabilities did not provide examples that would be powerful enough to push transformational change across the organization, as the organizational leaders had hoped.
For subsequent pilots, we advised the agency to choose projects that were less technically rigorous but more far-reaching. In one project, the agency piloted an unstructured natural language processing and text search utility across a number of disparate data archives. In another project, we deployed a data platform that could rapidly generate millions of records of synthetic data for algorithm testing.
In each case, organizational decision-makers could more easily see how big data analytics applied to their own work and more clearly understand its potential to transform their organization.
Before using new data, make sure you know its provenance (where it came from) and its lineage (what’s been done to it).
In the excitement of big data, decision-makers and project staff often forget this basic advice. They are in a hurry to start data mining for unknown patterns and anomalies. We’ve seen many cases where such new data wasn’t properly scrutinized, and where supposed patterns and anomalies later turned out to be irrelevant or grossly misleading.
In one such case at a federal health agency, information contained in a data source suggested that there was a significant uptick in the number of less-experienced clinical investigators associated with a set of therapeutic areas. Project staff identified this as an important trend to aid in risk analysis for the agency and prepared to brief senior decision-makers.
However, when the findings were presented first to the administrator for the data source, he suspected that the trends might coincide with the roll-out of new address fields.
As a result of a data-field change, when new address information was added for an investigator, it didn’t append to the original file, but created an entirely new file – making it appear that there were many new investigators, when in fact the number of investigators had slightly decreased over time.
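The effect of that data-field change can be shown with a toy example. This is a minimal sketch, assuming each file carries a stable `investigator_id`; the source does not say whether such an identifier existed, and the records and field names here are invented for illustration.

```python
# Hypothetical records: an address update created a second file for
# investigator "A17" instead of updating the original one.
records = [
    {"record_id": 1, "investigator_id": "A17", "address": "old address"},
    {"record_id": 2, "investigator_id": "B02", "address": "unchanged"},
    {"record_id": 3, "investigator_id": "A17", "address": "new address"},
]

# Counting files overstates the number of investigators.
naive_count = len(records)

# Counting distinct identifiers (assumed to be stable) gives the real number.
distinct_count = len({r["investigator_id"] for r in records})

print(naive_count, distinct_count)
```

A gap between the two counts is a signal to ask the data source’s administrator what changed in the collection process before treating the increase as a real trend.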
This scenario could have been avoided through an investigation and annotation of candidate data sources with provenance and lineage information prior to operational use. With big data analytic techniques, such details can be prospectively or retrospectively annotated to data records, indicating the prevailing process and data standard at the time of collection.
Data miners can then include this annotation as a factor in data mining efforts and predictive models to test whether the data-collection process is significantly affecting the outcome variable of interest.
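The annotation step might look something like the following. This is a rough sketch under stated assumptions: the rollout date, the `data_standard` tag values, the `flagged_new` outcome field, and the sample records are all hypothetical. Comparing rates across groups is a simple stand-in for including the tag as a factor in a full predictive model.

```python
from collections import defaultdict
from datetime import date

# Assumed date on which the new address fields were rolled out (hypothetical).
ROLLOUT = date(2012, 1, 1)


def annotate(record):
    """Attach a lineage tag naming the data standard in force at collection time."""
    standard = "v2_address_fields" if record["collected"] >= ROLLOUT else "v1"
    return {**record, "data_standard": standard}


def rate_by_standard(records, outcome):
    """Share of records where `outcome` is true, grouped by data standard."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        totals[r["data_standard"]] += 1
        hits[r["data_standard"]] += int(r[outcome])
    return {standard: hits[standard] / totals[standard] for standard in totals}


# Invented sample: was each record flagged as a "new" investigator?
raw = [
    {"collected": date(2011, 3, 1), "flagged_new": False},
    {"collected": date(2011, 9, 1), "flagged_new": True},
    {"collected": date(2012, 4, 1), "flagged_new": True},
    {"collected": date(2012, 8, 1), "flagged_new": True},
]

annotated = [annotate(r) for r in raw]
rates = rate_by_standard(annotated, "flagged_new")
print(rates)
```

If the outcome rate shifts sharply at the boundary between data standards, as it does in this invented sample, the collection process itself is a plausible explanation and deserves investigation before the pattern is reported as a finding.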
Don’t start with a solution; introduce a problem and consult with a data scientist.
Unlike conventional analytics platforms, big data platforms can easily allow subject-matter experts direct access to the data, without the need for database administrators or others to serve as intermediaries in making queries. This provides health researchers with an unprecedented ability to explore the data – to pursue promising leads, search for patterns and follow hunches, all in real time. We have found, however, that many organizations don’t take advantage of this capability.
One federal health agency we worked with, for example, invested in big data analytics to enable network analysis of nodes in a supply chain. Instead of giving its subject-matter experts free rein to look for new and unexpected patterns, the agency stayed with the conventional approach, and simply provided canned business-intelligence reports and visualizations to the end-users.
Not surprisingly, the outputs of this approach generated fewer new insights and less value than organizational decision-makers had hoped for. We strongly encouraged the agency to give subject-matter experts direct access to the data so they could develop their own queries and analytics.
Once this was provided, the user community rapidly grew, and there was an associated increase in new capability, training requests, and overall value for the organization.
Health organizations often build a big data platform, but fail to take full advantage of it. They continue to use the small-data approaches they’re accustomed to, or they rush headlong into big data, forgetting best practices in analytics.
It’s important to acquire the right data, aim for initial pilots with wide applicability, gain a clear understanding of where one’s data comes from, and take an approach that starts with a problem, not a solution. Perhaps the hardest task is finding the right balance.