We are living in an age of Big Data. Thousands of terabytes of information have been collected on everything from internet behavior to RNA transcripts, spurred by the successful development of tools capable of analyzing massive amounts of information. Now, medical data—datasets from electronic health records and other sources—are joining the club. How might data analysis of these vast swaths of medical information serve to help patients and healthcare providers?
The Age of Big Medical Data
While the most prominent uses of big data have been in fields like online security engineering and genetic research (genomics, transcriptomics, and other “omics” fields), huge amounts of data have also been generated in the realm of healthcare. Despite the challenges of using this vast resource, with patient privacy concerns at the forefront, “oceans of data” are becoming available as information is harvested from sources as diverse as wearable technology, sensors, and electronic health records (EHRs).
The NIH has prioritized smart use of this data through its Big Data to Knowledge (BD2K) program, which promotes access to data to empower healthcare providers. In Europe, the European Medical Information Framework has compiled health data from the EHRs of millions of Europeans and provides tools for accessing the data as well as workflows for assessment and discovery. Other initiatives from governments, organizations and private companies are cropping up across the globe.
Harnessing Big Data: Machine Learning
Data gathering is only the first requirement of using large-scale datasets for healthcare enhancement. It’s one thing to have a mountain of data; it’s quite another to harness it. How can millions of EHRs, each with multiple data points, be analyzed? One answer is a family of artificial intelligence techniques known as machine learning.
What Machine Learning Is Not
To understand the vast capabilities of machine learning, it is important to first understand what machine learning is not. Algorithms for straightforward tasks, such as evaluating mathematical equations or following hand-built if-then decision trees, have been around since the mid-20th century. These algorithms can do tasks that would be impossible for humans (such as averaging millions of values), but the tasks themselves are designed by humans. In other words, a human programmer tells the computer exactly what to do, and the computer does it.
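For contrast, here is a minimal Python sketch of the kind of conventional program described above. A human decides exactly what to compute (an average) and exactly which if-then rule to apply. The function names and the 140 mmHg threshold are illustrative assumptions, not clinical guidance.

```python
from statistics import fmean

def average_systolic(readings: list[float]) -> float:
    """Average a (possibly very long) list of systolic readings in mmHg.

    The computer does the arithmetic, but a human chose the formula.
    """
    return fmean(readings)

def flag_high_bp(avg_systolic: float, threshold: float = 140.0) -> bool:
    """A hand-written if-then rule; the threshold is an illustrative assumption."""
    return avg_systolic >= threshold

readings = [128.0, 135.0, 151.0, 149.0]
avg = average_systolic(readings)
print(avg, flag_high_bp(avg))  # 140.75 True
```

Every step here was specified in advance by the programmer; nothing was learned from the data.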
Machine Learning Crunches Huge Datasets
Machine learning goes beyond specific human instructions. Consider a problem that we want to solve, but we don’t know exactly how to solve it; we don’t have an equation we can specify.
For example, we would like to know whether any gene expression variations are associated with a specific type of cancer, but we have no idea which genes might be involved. We can’t ask the machine to look for differences in the expression of a specific gene because we don’t know what we are looking for. In a machine learning approach, the expression levels of all genes in a set of patients with that specific cancer would be fed into an algorithm, along with the expression levels of all genes in a population of healthy control subjects who do not have the cancer. The algorithm can consider the entirety of the data in both sets (tens of thousands of genes, many of which we know nothing about) and then compare them. If the datasets are large enough (and machine learning does require very large datasets), the output will be the genes whose expression differs between the two populations, for example genes that are overexpressed in the cancer population compared with the healthy population.
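A deliberately simple Python sketch of the comparison step follows. It is a statistical stand-in rather than a trained model: every gene is scored with Welch's t-statistic, and the genes are ranked by how strongly their expression separates the two groups, without anyone choosing a gene in advance. The gene names and expression values are made up for illustration; a real analysis would use far more samples, dedicated tools, and a correction for testing tens of thousands of genes at once.

```python
from math import sqrt
from statistics import fmean, variance

def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t-statistic; positive when the mean of `a` exceeds the mean of `b`."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (fmean(a) - fmean(b)) / standard_error

def rank_genes(cancer: dict[str, list[float]],
               healthy: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Score every gene measured in both groups and sort by |t|, largest first.

    No gene is chosen in advance; the same comparison is applied to all of them.
    """
    scores = [(gene, welch_t(cancer[gene], healthy[gene]))
              for gene in cancer if gene in healthy]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

# Made-up expression values (arbitrary units) for three hypothetical genes.
cancer = {"GENE_A": [9.1, 8.7, 9.4, 9.0], "GENE_B": [5.0, 5.2, 4.9, 5.1],
          "GENE_C": [2.1, 2.4, 1.9, 2.2]}
healthy = {"GENE_A": [5.1, 4.8, 5.3, 5.0], "GENE_B": [5.1, 4.9, 5.0, 5.2],
           "GENE_C": [4.0, 4.3, 3.8, 4.1]}

for gene, t in rank_genes(cancer, healthy):
    print(f"{gene}: t = {t:+.1f}")
```

Ranking by the absolute value of t keeps both overexpressed (positive t) and underexpressed (negative t) genes near the top of the list.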
Machine Learning Can Provide Insights From Outside Our Knowledge
In this way, important genes, or values such as “successfully responds to Treatment X”, can be identified even if we know nothing about them beforehand. In the above example, there is no need for any understanding of cancer or prior information about the genes, just the information the machine has learned by objectively comparing the two sets of data. The machine can thus learn what even the expert human is blind to: insights that come from outside our base of knowledge and understanding. (In point of fact, many advances in medicine throughout the ages, from penicillin to antidepressants, have come from such “blind insights”, gained through serendipity rather than data analytics. Machine learning allows the same sorts of insights without having to wait for chance or luck.) In the end, what the algorithm provides is an answer not to a specific question, but to a general, exploratory one: “Is there anything that distinguishes these two groups?”
Then, it’s up to a human expert to further examine that distinguishing characteristic to determine its meaning and/or practical utility.
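To make the exploratory question concrete, the sketch below trains a logistic-regression classifier by gradient descent in plain Python. Unlike the hand-specified rules of a conventional program, the decision rule here is learned from labelled examples, and the learned weights then point to the feature that distinguishes the groups, which a human expert would still need to interpret. The toy data, feature layout, learning rate, and epoch count are assumptions made for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights and bias by batch gradient descent.

    No one writes the decision rule: it is learned from labelled examples
    (y = 1 for one group, y = 0 for the other).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]  # d(log-loss)/d(w_j) for this example
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 if the learned model assigns probability >= 0.5 to group 1."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Toy data: two measured features per subject; label 1 = patient group, 0 = control group.
X = [[2.0, 0.3], [1.8, -0.2], [2.2, 0.1], [1.9, -0.4],
     [0.1, 0.2], [-0.2, -0.3], [0.0, 0.4], [0.3, -0.1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_logistic(X, y)
print("learned weights:", [round(v, 2) for v in w])
```

The first weight ends up much larger in magnitude than the second, which is the model's way of answering "feature 0 is what distinguishes these groups"; deciding what that feature means remains a human task.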
Big Data for Diagnostics, Preventative Medicine, and Precision Medicine
In the hands of conscientious medical doctors and other specialists, insights from big data analytics can be an extremely valuable tool in the clinical arena, increasing the accuracy and precision of diagnosis, especially in cancer. This approach also allows the harnessing of large amounts of data for the development of precision medicine—the personalization of diagnostic and treatment options—towards a goal of optimal healthcare for each individual patient. In the arena of infectious diseases, a machine-learning approach can help to identify particularly vulnerable patients, and big data analytics is being increasingly deployed in public health and disease surveillance.
Companies Partnering with Healthcare for User-Friendly Options
The power of artificial intelligence is being honed for healthcare by companies like IBM through its Watson Health system, which has partnered with major players in the healthcare space, including the Mayo Clinic and Memorial Sloan Kettering Cancer Center. Smaller companies like the Leipzig-based ecSeq Bioinformatics are also emerging to provide hands-on support and training in healthcare IT for precision medicine. Such companies perform a vital function, filling the gap between the capabilities of technology and the skills of clinicians, who want an efficient and user-friendly system.
Balance Healthcare IT With Human Expertise
In incorporating data analytics into healthcare, it is important to remember two things. First, healthcare IT should not interfere with a clinician’s ability to provide patient care or increase overall time requirements for clinicians; the point of technology is to improve healthcare, not make it more difficult.
Second, artificial intelligence is not better than human intelligence; it is just different, and it should never be viewed as a substitute for human expertise. Substituting machine learning insights and recommendations for those of human experts is not a safe approach. This applies to all types of healthcare IT: clinical errors can actually increase if healthcare workers over-rely on technology or automatically consider it superior to human expertise or vigilance.
Outputs from machine learning can generate insights, but those insights must be considered judiciously by human healthcare experts or tested scientifically by biomedical researchers before drawing conclusions.
Big Data for Cost Reduction
Big data holds great promise for improving the health of patients. Perhaps even more dramatic, though, is its potential for reducing healthcare costs. For example, consider chronic diseases like heart disease and diabetes. These are responsible for some of the highest costs in healthcare, so reducing expenditures in these areas can go a long way.
Intervention for High-Risk Patients Before a Crisis Occurs
Boston University College of Engineering researchers Paschalidis et al. are working to develop predictive models that use data in EHRs to identify individuals at risk for heart disease and diabetes. They believe these algorithms could improve outcomes and reduce hospital expenditures by targeting high-risk patients for intervention before their condition reaches the critical stage. They are now collaborating with Boston Medical Center in an NSF-funded project to further develop machine-learning algorithms to identify these high-risk patients.
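The general idea of targeting intervention can be sketched as follows, assuming a risk model has already been trained on EHR data: score each patient, then spend a limited outreach capacity on the highest scores. The features, coefficients, and names here are hypothetical placeholders and are not taken from the Boston University work.

```python
import math
from dataclasses import dataclass

@dataclass
class Patient:
    patient_id: str
    age: float
    systolic_bp: float      # mmHg
    hba1c: float            # glycated hemoglobin, %
    prior_admissions: int

# Hypothetical coefficients standing in for a model trained on EHR data.
# They are illustrative only and are not taken from any published study.
WEIGHTS = {"age": 0.03, "systolic_bp": 0.02, "hba1c": 0.4, "prior_admissions": 0.5}
INTERCEPT = -9.0

def risk(p: Patient) -> float:
    """Logistic risk score in (0, 1) from the hypothetical coefficients."""
    z = INTERCEPT + sum(w * getattr(p, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def select_for_outreach(patients: list[Patient], capacity: int) -> list[Patient]:
    """The `capacity` highest-risk patients, highest risk first."""
    return sorted(patients, key=risk, reverse=True)[:capacity]

patients = [
    Patient("A", 45, 120, 5.4, 0),
    Patient("B", 68, 150, 8.1, 2),
    Patient("C", 59, 138, 6.9, 1),
]
for p in select_for_outreach(patients, capacity=2):
    print(p.patient_id, round(risk(p), 2))
```

Ranking by risk and cutting at a fixed capacity reflects the practical constraint that a care team can only reach a limited number of patients before their condition becomes critical.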
Error Prevention Through Workflow Monitoring
Hospitals can also reduce costs substantially through error prevention. Technology such as Vitalacy’s Workflow Monitoring gathers data on every entry into and exit from a patient’s room and combines it with handwashing data. From this, hospitals can predict which areas are at risk for hand-hygiene failure, a simple error with potentially critical consequences in terms of healthcare-associated infections.
Beyond the suffering of the infected patients, such errors can lead to staggering expenditures for the hospital in the form of lawsuits. In addition, individual hand-hygiene failures are now cited by the Joint Commission as a deficiency that will result in a Requirement for Improvement under the Infection Prevention and Control chapter for all accreditation programs.
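A minimal sketch of the simplest, descriptive part of such monitoring is shown below: room-entry events are combined with whether hand hygiene was performed, compliance is computed per hospital unit, and units below a threshold are flagged. The event format and the 80% threshold are assumptions for illustration, not any vendor's data model; a predictive system would build a model on top of data like this.

```python
from collections import defaultdict

# Hypothetical event log: (hospital_unit, room_entry_id, hand_hygiene_performed).
# This schema is an illustrative assumption, not any vendor's data format.
events = [
    ("ICU", 1, True), ("ICU", 2, True), ("ICU", 3, False), ("ICU", 4, True),
    ("Ward 3B", 5, True), ("Ward 3B", 6, False), ("Ward 3B", 7, False),
    ("ER", 8, True), ("ER", 9, True),
]

def compliance_by_unit(events):
    """Fraction of room entries in each unit where hand hygiene was performed."""
    entries = defaultdict(int)
    compliant = defaultdict(int)
    for unit, _entry_id, washed in events:
        entries[unit] += 1
        compliant[unit] += int(washed)
    return {unit: compliant[unit] / entries[unit] for unit in entries}

def units_at_risk(rates, threshold=0.8):
    """Units whose observed compliance falls below the (assumed) threshold."""
    return sorted(unit for unit, rate in rates.items() if rate < threshold)

rates = compliance_by_unit(events)
print(units_at_risk(rates))  # ['ICU', 'Ward 3B']
```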
Monitoring of other types of errors, such as patient identification errors and medication errors, can also prevent harm to patients, saving hospitals substantial sums in the process. Big data analytics are steadily improving at tasks like spotting medication errors. In addition, machine learning algorithms are being developed that can identify adverse drug reactions from unstructured EHR data, supporting research that expands knowledge of such reactions.
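As a toy illustration of learning from unstructured notes, the sketch below assumes a multinomial naive Bayes classifier over bag-of-words tokens, with add-one smoothing. The note snippets and labels are made up; research systems for adverse drug reaction detection train on large annotated corpora and typically use more sophisticated natural language processing.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: str) -> str:
        def log_score(c: str) -> float:
            # Smoothing keeps unseen words from driving a class probability to zero.
            denom = self.totals[c] + len(self.vocab)
            return self.log_priors[c] + sum(
                math.log((self.word_counts[c][w] + 1) / denom) for w in tokenize(doc))
        return max(self.classes, key=log_score)

# Made-up note snippets; "adr" = mentions a suspected adverse drug reaction.
docs = [
    "Developed rash and hives after starting amoxicillin.",
    "Patient reports nausea and dizziness since new medication.",
    "Swelling of lips after penicillin dose; drug stopped.",
    "Routine follow-up, blood pressure well controlled.",
    "Patient doing well, no complaints today.",
    "Annual checkup, vaccinations up to date.",
]
labels = ["adr", "adr", "adr", "none", "none", "none"]

model = NaiveBayes().fit(docs, labels)
print(model.predict("Hives and rash after new medication"))  # adr
```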
A Paradigm Shift: Incorporating Data Analytics to Maximize Health Outcomes
With steady improvement in sensors and wearable technology, we are approaching a time when data analytics can be used routinely to improve patient care. This capability calls for a paradigm shift towards a “next-generation” architecture that incorporates data analytics into patient care for improved outcomes and efficiencies.
For example, it can take 5 to 10 minutes out of a 20-minute appointment for a doctor to go through a list of important lifestyle factors and enter the answers into an EHR. Wearable technology, however, can provide objective data on factors like exercise, smoking, and sleep, which can be more accurate than patient self-reporting. If a nurse were able to plug the data from the wearable device into an algorithm that correlated the values with risk, based on machine learning from large patient datasets, this information could simply go into the chart along with vitals like weight and blood pressure. This would allow the doctor to focus on the factors that require attention and take the time to work with the patient on a plan with which the patient can realistically comply.
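The workflow in this example might look something like the following sketch. It summarizes a few days of wearable readings and writes a one-line chart note listing only the factors that need attention. The fixed cut-offs are hypothetical stand-ins for what a model trained on large patient datasets might supply, and the field names are assumptions.

```python
from dataclasses import dataclass
from statistics import fmean

@dataclass
class WearableDay:
    steps: int
    sleep_hours: float

# Hypothetical cut-offs standing in for values a model trained on large
# patient datasets might supply; illustrative only, not clinical guidance.
CUTOFFS = {"avg_daily_steps": 5000.0, "avg_sleep_hours": 6.0}

def summarize(days: list[WearableDay]) -> dict[str, float]:
    """Average the daily wearable readings."""
    return {
        "avg_daily_steps": fmean(d.steps for d in days),
        "avg_sleep_hours": fmean(d.sleep_hours for d in days),
    }

def chart_entry(days: list[WearableDay]) -> str:
    """One-line chart note that lists only the factors below their cut-off."""
    summary = summarize(days)
    flagged = [name for name, value in summary.items() if value < CUTOFFS[name]]
    values = ", ".join(f"{name}={value:.1f}" for name, value in summary.items())
    return f"Wearable summary: {values}; needs attention: {', '.join(flagged) or 'none'}"

week = [WearableDay(4200, 7.5), WearableDay(3900, 7.0), WearableDay(5100, 8.0)]
print(chart_entry(week))
# Wearable summary: avg_daily_steps=4400.0, avg_sleep_hours=7.5; needs attention: avg_daily_steps
```

Listing only the flagged factors is what would let the doctor skip the routine questions and spend the appointment on the items that need a plan.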
By orienting the healthcare architecture to take advantage of data throughout the healthcare system, data analytics has the potential to deliver a triple win—empowering clinicians, improving patient outcomes and increasing the profitability of hospitals.
Dr. CS Copeland holds a BA in neuropsychology from the University of California at San Diego and a PhD in molecular and cellular biology from Tulane University, specializing in parasitology and virology, with postdoctoral research in molecular entomology and computational genomics.