The March issue of Health Affairs demonstrates the potential of health care delivery system innovation to improve value for both patients and clinicians. Technology innovations such as machine learning and artificial intelligence systems are promising breakthroughs to improve diagnostic accuracy, tailor treatments, and even eventually replace work performed by clinicians, especially that of radiologists and pathologists. Machine-learning systems infer patterns, relationships, and rules directly from large volumes of data in ways that can far exceed human cognitive capacities. As the computational underpinning of tools such as e-mail spam filters, product and content recommendations, targeted advertisements, and, more recently, autonomous vehicles, machine learning is already ubiquitous in many economic sectors. Yet, machine-learning applications are still used sparingly today in the delivery of care.

Electronic health record (EHR) systems, and the digitization of health data more broadly, have promised to transform health care to be more intelligent, safe, efficient, and cost-effective. While machine learning can be a key enabler of this promise, most EHR vendors do not provide robust machine learning, natural language processing, cognitive computing, and artificial intelligence solutions to process internally generated or imported health data, which come in a variety of formats (for example, text, images, claims, and genomics). More general limitations of machine learning, such as the difficulty in interpreting results and describing to clinician users how algorithms arrive at particular outcomes, have further hindered adoption in health care.

While we believe machine learning holds great promise, it is far from clear how it will transform health and health care in the short to mid-term. Today, policy makers and industry executives face decisions about when and how to invest in machine learning to optimize organizational effectiveness and efficiency without wasting capital funds on premature or nonvalue-adding technologies.

For the past several years, we have worked on multiple projects to build, test, and deploy machine-learning solutions to improve safety, quality, and patient outcomes, while reducing costs in health care. In our experience, machine-learning solutions demonstrate high potential for ancillary and support services in health care but often fail to deliver compelling impact for frontline clinicians. We share four key lessons about how to generate value today from EHR systems and machine-learning applications.

Use Machine Learning To Eliminate Routine, ‘Mundane,’ But Resource-Intensive Processes

Human intelligence outperforms machine-learning applications in the complex decision making routinely required during the course of care, because machines do not yet possess mature capabilities for perceiving, reasoning, or explaining. Moreover, despite significant progress, even state-of-the-art machine-learning algorithms often cannot deliver the sensitivity, specificity, and precision (that is, positive predictive value) required for clinical decision making. For example, using various machine-learning techniques, we developed statistical models for predicting health care–acquired infections. These models performed well in terms of specificity or sensitivity, but their precision was too low to meet clinician requirements: only one in 15 predicted cases was a true positive.
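
To make the gap between sensitivity, specificity, and precision concrete, the following is a minimal sketch in Python. The counts are hypothetical and chosen only for illustration; they are not data from our infection models. The sketch assumes a rare outcome (80 infections among 10,000 patients) and a model with 90 percent sensitivity and 90 percent specificity.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, and precision from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true cases the model catches
    specificity = tn / (tn + fp)   # share of non-cases the model correctly clears
    precision = tp / (tp + fp)     # share of flagged cases that are real (PPV)
    return sensitivity, specificity, precision


# Hypothetical counts (assumption, not project data):
# 10,000 patients, 80 with an infection, 9,920 without.
tp, fn = 72, 8        # 90% of the 80 infections are flagged
fp, tn = 992, 8928    # 10% of the 9,920 non-infections are flagged in error

sensitivity, specificity, precision = confusion_metrics(tp, fp, tn, fn)
flags_per_true_case = 1 / precision

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"precision={precision:.3f} (about 1 in {flags_per_true_case:.0f})")
```

Under these assumed numbers, a model that looks strong on sensitivity and specificity still flags roughly 15 patients for every true infection, because false positives drawn from the much larger uninfected population swamp the true positives.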

While many machine-learning solutions are not yet mature and sophisticated enough to support complex clinical decisions, machine learning can be effectively deployed today to reduce more routine, time-consuming, and resource-intensive tasks, allowing freed-up personnel to be redeployed to support higher-end work. By draining time, energy, and attention, such tasks can lead to clinician burnout and hinder clinicians’ ability to practice at the top of their expertise when providing care.

For example, when treating patients, reviewing their history is both a routine and pertinent process. However, this task can be very time consuming and cumbersome for clinicians, especially with large amounts of patient data, much of which may be contained in unstructured notes. Consequently, during patient visits, clinicians may rely on only a partial patient history, such as the most recent visit, or resort to having the patient recount his or her history, which can be unreliable. Machine learning in conjunction with natural language processing can be used to go through a patient’s entire medical history in the EHR, instantly looking for hundreds to thousands of different crucial facts.

In collaboration with the MedStar Institute for Innovation, we developed such a system in a MedStar Health hospital emergency department. The average patient visiting the emergency department has around 60 documents in his or her medical history, and each document can take up to a minute to read. With clinicians seeing two patients every hour, it is neither feasible nor practical to comprehensively identify relevant and crucial facts in the patient history for informing care decisions. In such time-constrained settings, clinicians can spend more than half their time with the patient conducting a review of his or her medical history in the EHR and still risk missing relevant facts. In close collaboration with clinicians, we developed a machine-learning system that instantly scans the entire patient history and provides recommendations on what is important based on the patient’s presenting symptoms. Clinicians can then be left to evaluate the outputs and make the best diagnosis, treatment, and care decisions. Over time, these decisions can be captured to “learn” what clinicians find relevant in the course of care to improve accuracy and utility. This approach did not require workflow changes and freed clinicians’ time and energy to focus directly on taking care of patients.

Similarly, the first Food and Drug Administration-approved machine-learning solution focuses on eliminating highly laborious and manual tasks associated with quantifying blood flow in cardiac MRIs. Experienced clinicians today can spend 60-90 minutes physically drawing contours on a desktop to calculate blood flow. The system developed by Arterys uses deep learning algorithms to accurately quantify blood flow in 15 seconds, enabling clinicians to focus on higher-order cognitive tasks in caring for the patient.

Foster Clinician Acceptability Of Machine-Learning Solutions Through Iterative Product Development Processes

Data scientists commonly apply machine learning to find new insights in data or to create predictive algorithms that perform as well as or better than the status quo and benchmarks in the medical literature. However, many data science organizations are unsuccessful in convincing clinicians to use and adopt those algorithms at the frontlines.

The advantage of machine learning is not only in its static outputs but in its ability to efficiently automate or augment human tasks based on what it learns from data. To this end, we experienced higher levels of clinician engagement when machine-learning projects were approached as product development, as opposed to traditional research and analysis projects. Product development involves iterative prototyping and implementation of user-facing tools that integrate front-end interfaces with machine-learning algorithms and back-end data systems. This approach required multidisciplinary teams, which were staffed with clinicians, software engineers, human factors and user-experience experts, and data scientists. We found that project teams focused primarily on what functionalities can be implemented to improve use and adoption of the end-state solution, with less fixation on whether algorithms achieved specific performance thresholds or generated previously unknown insights. For example, when building the previously described recommendation engine for emergency department triaging, rather than needing the algorithm to perform perfectly, clinicians found it more important that the application could achieve an acceptable level of accuracy and include simple mechanisms to capture and correct any erroneous results.

Do Not Rely Exclusively On One Vendor To Provide All Analytics Solutions

EHR vendors are actively enhancing features to provide increasingly attractive solutions for data management and analytics. However, a health care organization that relies on a single EHR vendor’s analytic solutions, as well as its own legacy analytics infrastructure created before the era of big data, may see limited progress.

Leading machine-learning solutions, both general and health-specific, are evolving rapidly and are likely to come from both start-up and established technology companies as well as innovative health systems. Many of these solutions (for example, those from Google, Facebook, and OpenAI) are open-sourced and available to anyone. Furthermore, health-focused technology companies such as Health Catalyst have also open-sourced core machine-learning algorithms specific to health care data and applications to help stimulate adoption. Beyond publishing in the medical literature, leading health systems are also increasingly making their health analytic techniques available through marketplace platforms such as Apervita.

Health systems should adopt an information technology infrastructure based on modular and open architecture principles that make it easy to add or update components and integrate machine-learning solutions as plug-in functions to the EHR. Furthermore, to build, test, and deploy machine-learning algorithms, organizations need internal policies and mechanisms to transfer health data in and out of legacy systems securely. This approach will enable organizations to take advantage of industry innovations, while reducing the risk of machine learning-system obsolescence and avoiding the cost of custom integrations.
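
One way to picture the plug-in idea is the minimal Python sketch below. It is an illustration under stated assumptions, not a description of any vendor's interface: the `PluginRegistry` class, the two toy analytics, and the record fields are all hypothetical names invented for this example.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]            # hypothetical patient record shape
Plugin = Callable[[Record], Any]   # an analytic is just a function of a record


class PluginRegistry:
    """Hypothetical registry in which each analytic is a named, swappable component."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, name: str, plugin: Plugin) -> None:
        # Registering under an existing name replaces the old component.
        self._plugins[name] = plugin

    def run_all(self, record: Record) -> Dict[str, Any]:
        # Run every registered analytic against the same record.
        return {name: plugin(record) for name, plugin in self._plugins.items()}


def note_count(record: Record) -> int:
    """Toy analytic: number of notes in the record."""
    return len(record.get("notes", []))


def allergy_flag(record: Record) -> bool:
    """Toy analytic: does any note mention an allergy? (simple keyword match)"""
    return any("allergy" in note.lower() for note in record.get("notes", []))


registry = PluginRegistry()
registry.register("note_count", note_count)
registry.register("allergy_flag", allergy_flag)

# Hypothetical record used only for illustration.
record = {"patient_id": "demo-1",
          "notes": ["Penicillin allergy noted.", "Follow-up visit."]}
results = registry.run_all(record)
print(results)
```

The design point is that the caller depends only on the registry, so an analytic from a different source can replace an existing one by registering under the same name, without a custom integration for each component.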

Without Better Capture Of Care Endpoints, Machine Learning Will Not Achieve Its Full Potential To Improve Outcomes

Machine learning can be useful in optimizing efforts to reach a particular endpoint or outcome (for example, being pain or symptom-free, achieving a given functional status, avoiding an infection) by taking into consideration historical data trends and variations in care delivery. To accomplish this, the best machine-learning techniques today are extremely data hungry, especially for clinical or patient outcomes. However, EHRs do not typically capture outcomes of care or treatment goals in a standardized format. For example, it is difficult for analysts to identify health care–acquired infections in many EHR systems, because doing so may require complex processing of lab pathology values or use of International Classification of Diseases codes, which can be unreliable.

To fully realize the value of machine learning, health care organizations must better capture standardized outcome data that represent the goals of the treatment processes. Without these standardized endpoints, it is not possible to “train” machine-learning algorithms to sufficiently explain the variability in outcomes that can be translated into better tailoring of diagnostic or treatment processes. For example, in one project, we tried to predict patient pain scores based on variability in patient characteristics and differences in pain medication and dosages. However, the pain score data, like most transactional data, were often captured incompletely and inconsistently across providers and encounters, making it challenging to develop a robust machine-learning algorithm that reliably predicts the outcome.
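
A simple first check on this kind of problem is to measure how often the outcome is recorded at all. The sketch below is a hedged illustration in Python, not the analysis we ran: the field names `provider_id` and `pain_score` and the sample encounters are assumptions made up for the example.

```python
from collections import defaultdict


def pain_score_capture_rate(encounters):
    """Return, per provider, the fraction of encounters with a recorded pain score."""
    totals = defaultdict(int)
    recorded = defaultdict(int)
    for encounter in encounters:
        provider = encounter["provider_id"]
        totals[provider] += 1
        # A missing key and an explicit None both count as "not captured".
        if encounter.get("pain_score") is not None:
            recorded[provider] += 1
    return {provider: recorded[provider] / totals[provider] for provider in totals}


# Hypothetical encounter rows (assumption, not project data).
encounters = [
    {"provider_id": "A", "pain_score": 4},
    {"provider_id": "A", "pain_score": None},
    {"provider_id": "B", "pain_score": 7},
    {"provider_id": "B", "pain_score": 2},
    {"provider_id": "C"},
]

rates = pain_score_capture_rate(encounters)
print(rates)
```

Large differences in capture rate across providers suggest that a model trained on the recorded scores would learn from a skewed subset of encounters.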

Researchers, policy and decision makers in federal agencies, and leaders of medical specialty societies need to intensify efforts to identify and push for the adoption and capture of standardized outcome measures in EHRs. Such outcome measures should be salient to specific clinical conditions and allow comparisons across conditions. For example, measures of pain and functional status can span many clinical conditions, whereas other measures (for example, Apgar scores) might be specific to only one or a few conditions. We found that data in EHRs typically lack the spectrum of patient-oriented outcomes needed to comprehensively measure value to the patient, such as treatment and recovery outcomes (for example, increased or reduced use of services or speed of return to normal functioning) and long-term health status (for example, recurrence of chronic pain, sustained side effects from treatment, reduced reliance on medications, or reported quality of life). Without the capture of standardized outcomes data, machine-learning benefits will be limited in addressing the biggest opportunities for improving value in health care.

The speed of innovations in machine learning will continue to accelerate, and health care will be a key industry experiencing “disruption.” While practical applications in health care may seem rare today, carefully examining and evaluating opportunities to automate tasks and augment decisions with machine learning can quickly yield benefits in everyday care. Furthermore, by taking a practical approach to evaluating and adopting machine learning, health systems can improve patient care today, while preparing for future innovations.