Even as many doctors struggle to give up their pen and paper charts, some innovators are already shifting healthcare information technology into warp speed. Researchers, health systems, and other stakeholders are analyzing huge amounts of aggregated information—big data—to elucidate patterns that remained hidden under old data models. Blending biostatistics, bioinformatics, computer programming, and operational research, big data is expected to transform the process of clinical decision-making. And of course, much of this data will come from laboratory medicine. The promise of big data is taking these growing data repositories—from lab results to claims codes—and analyzing them with improved computer power to strengthen the evidence base across the healthcare spectrum.

Several factors are driving this new data binge. The Health Information Technology for Economic and Clinical Health (HITECH) Act, Affordable Care Act, and other payment reform initiatives provided stimulus for change in recent years. The government is pushing providers to adopt electronic health records (EHRs), while at the same time tying more reimbursement dollars to data on quality of care. Similarly, the cost of care and competition in the healthcare marketplace have made investors more data-driven.

Finally, as genome sequencing drops in cost and rises in speed, everyone's universe of information is expanding. "The idea of having 1,000 nodes was pretty sexy five to 10 years ago, but it's kind of a commodity in 2014," said Christopher Chute, MD, DrPH, section head of medical informatics at Mayo Clinic in Rochester, Minn. "There's really no limit to the scale of the computational approach when you invoke these big data principles."

Frontline Forays
Provider Organizations Spending Big Money on Big Data

Although the front-runners in big data tend to be large health systems and corporations, the big data paradigm will soon be felt even in smaller organizations. The research firm Frost & Sullivan predicts that adoption of advanced health data analytics solutions in hospitals will reach 50% by 2016. Networks and other computing infrastructure are becoming commodity products that can be offered profitably as cloud-based hosted services. In fact, 45% of hospitals now have clinical data warehouse applications, as executives seek to get the most out of the money hospitals are pouring into their EHRs, according to a 2013 report from HIMSS Analytics.

With all this activity, big data has a somewhat fluid and flexible meaning in healthcare, explained Michael Hogarth, MD, professor of pathology and internal medicine at UC Davis and medical director of Clinical Registries for UC Davis Health System. “It can mean large volumes of data, complex data, or real-time data supporting actionable information. In healthcare today, it is primarily about integration of complex data coming from multiple source systems in order to support improved healthcare delivery—care that is safer, less costly, and patient centered.”

Organizations interested in using big data analytics require both access to computing power and knowledgeable staff to work the data. The first inroads are being made by top-tier and larger healthcare institutions that have invested hundreds of millions of dollars in analytics capabilities.

Kaiser Permanente, the largest managed care organization in the United States, reportedly maintains the largest clinical data repository in the world: an enterprise-wide EHR system covering nearly nine million patients. Today, Kaiser's computer systems and EHRs allow data exchange among all of its medical facilities. An April 2013 report by consulting firm McKinsey & Company estimated that Kaiser Permanente's big data strategy has saved the organization $1 billion through reduced office visits and lab testing.

At the University of Pittsburgh Medical Center, an enterprise data warehouse launched in 2012 houses more than 200 sources of information, including clinical, genomic, proteomic, imaging, and financial data from across its health system and from outside entities such as labs and pharmacies. The first test of the new $100 million analytics system, in 2013, examined integrated clinical and genomic information from 140 breast cancer patients, with follow-up research ongoing. Flagging patients at risk for kidney failure based on subtle changes in lab results is another target.

In New York City, Mount Sinai Hospital is also investing in big data. In April 2013, this venerable institution finished building its own $3 million supercomputer, named Minerva after the Roman goddess of wisdom and medicine, and hired over 100 data scientists. Mount Sinai’s big data team includes laboratory experts in genomics to help doctors make personalized predictions about patients. The aim is to learn to predict risk more precisely, for example, by reclassifying diabetics or identifying patients likely to be readmitted.

“With Minerva, Mount Sinai has the ability to quickly analyze genomic patterns to provide a greater understanding of the causes of disease and how to personalize treatments according to an individual’s genetic composition,” said Dennis Charney, MD, dean of the Icahn School of Medicine at Mount Sinai in a statement. “The supercomputer is able to accomplish real-time visualization of advanced molecular models, promoting drug development and allowing us to test the effects of molecular variations on different receptors in the body.”

In a research project involving primary care practices with an EHR system, Sutter Health in California and Geisinger Health System in Pennsylvania are working with IBM on new data analytics tools, under a $2 million grant from the National Institutes of Health. The initial focus is using data in the EHR to improve early detection of heart failure. The project aims to identify best practices that help health systems integrate big data analytics into primary care for more tailored disease management.

Big data has plenty of opportunities to keep growing. Experts say that healthcare organizations are motivated to improve patient safety by reducing hospital-acquired infections, for example, and to monitor patients beyond the healthcare setting, such as looking for signs of depression after a cancer diagnosis. As more hospitals and physician offices embrace EHRs, burgeoning health information exchanges want to compare how member providers, such as diabetes care clinics, manage large groups of patients. Similarly, public health officials see an opportunity to improve their monitoring of population health.


Challenges for the Lab

As digitization of healthcare information takes hold through EHRs, providers who seek to leverage their growing data warehouses will also have to deal with issues of access—such as patient privacy and intellectual property—as well as standardization and interoperability. This will require attention and new thinking, according to experts, even as some organizations move ahead quickly with big data projects (See Sidebar).

"If data is the new gold, then access to data is going to be key to insights," explained Viktor Mayer-Schönberger, professor of internet governance and regulation at the University of Oxford in England and co-author, with Kenneth Cukier, of Big Data: A Revolution That Will Transform How We Live, Work, and Think. "On the one hand this may lead to a more protective approach, sharing as little data as possible so that others don't get to reap the value of one's data," Mayer-Schönberger said. "On the other hand, many data holders in the future may not be able to see all the potential reuses of the data they have for novel purposes, and thus let others have access to it, perhaps against a license fee."

On July 28, Mayer-Schönberger will deliver a plenary address, "Understanding Big Data and Its Impact on Your Laboratory," at the 2014 AACC Annual Meeting and Clinical Lab Expo in Chicago.

Labs will have an essential role in making sure that data is standardized and interoperable if big data is to fulfill its promises. "The naming of things is quite central to the secondary use of data—quality improvement research, inclusion of patients in clinical trials, and so on," said Chute, who is principal investigator for the Office of the National Coordinator-funded SHARP grant on secondary data use. Secondary use of data for purposes beyond immediate patient care has been a long-standing priority at Mayo Clinic, he noted, and the organization now uses a system of data governance.

Inconsistent naming can thwart healthcare analyses, Chute emphasized. "To the degree possible, your analytic efficiency is going to be vastly improved and clarified, and made more focused and resolute, by invoking two things: compatibility and consistency." For labs, a big part of this name game is using Logical Observation Identifiers Names and Codes (LOINC), a database and universal standard for identifying lab results. Labs must use LOINC in electronic messages to pass the government's EHR meaningful use standards, but most don't yet use these codes internally, Chute noted. Instead of creating idiosyncratic identifiers, labs should use the standard even in the internal coding environment for operations, he stressed.

"Interoperability, in my mind, comes down to these naming issues," Chute explained. "If you're going to bring in a new laboratory information system, are you going to populate it with your old laboratory codes? That would make it interoperable with your own historical data, but not with anybody else's. Do you treat that as an opportunity to migrate the enterprise…to using a national or international standard for laboratory data, which would solve a lot of headaches?"

Navigating a Paradigm Shift

In their book, Mayer-Schönberger and Cukier note that big data analytics can extract transformative insights that have the power to overturn established practices. The findings of big data research and analytics can run up against conventional ways physicians make decisions and interpret information.

Laboratorians may find themselves on the forefront of these shifts, helping clinicians cope with change. "Labs will churn out more data," said Mayer-Schönberger. "They won't necessarily grow larger, but they will produce vastly more data—and this requires more resources to store and analyze the data, plus new expertise in what tools to use to do this. In some sense the natural sciences, more generally, and labs, in particular, will become a bit more like cutting-edge social sciences—with a strong emphasis on the ability to make sense of the data."

Clinical laboratorians can participate and contribute leadership as healthcare harnesses big data. In Hogarth's view, laboratorians should see their big data role as much about data collaboration as about handling large data sets. "This has been part of the laboratory tradition—to exchange data with EHRs—but what we have done in the past revolves almost exclusively around provisioning data for a specific transactional clinical workflow, such as lab ordering or results delivery," Hogarth said. "What I mean by data collaboration—as a philosophy to support integrated multidimensional data—is not something anyone has really done in healthcare, regardless of being in the lab or elsewhere. To support the population-based analytics needed to characterize how we deliver care, and thus work to improve that, will require that those who have source systems can provide data to an aggregated data set in a timely fashion, with high data fidelity, and perhaps to transform some of the data in the process."

Pushing the Envelope in Genomic Medicine

In genomics, the limits of how genomic data can be used are really the limits of bioinformatics. Perhaps no other discipline, in fact, will depend more on rapid advances in computing power to harness the full potential of raw biomedical data. For example, the clinical significance of genomic variants must be worked out before medical care can advance, said Eric Green, MD, PhD, director of the National Human Genome Research Institute.

With sharp declines in the cost of next-generation DNA sequencing technologies, demand for genomically guided care is rising. Laboratorians will help make the massive amount of genome sequencing data on each patient easy for physicians to employ in clinical decision-making. "You only want to know about the relevant genomic variants. You want it to be a quick lookup, maybe with alert values. Already, that's happening in some places for pharmacogenomics," said Green.
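As an illustration of the "quick lookup, maybe with alert values" idea Green describes, the sketch below checks a patient's variants against a small knowledge base and surfaces only the actionable ones. The table is a toy assumption containing two well-established pharmacogenomic associations; a real system would draw on curated resources and far richer clinical context.

```python
# Minimal sketch of a pharmacogenomic "alert value" lookup.
# The table holds two well-established associations; everything else
# about the data model is an assumption made for illustration.

PGX_ALERTS = {
    "CYP2C19*2": "Loss-of-function allele: reduced activation of clopidogrel; "
                 "consider alternative antiplatelet therapy.",
    "TPMT*3A":   "Reduced thiopurine methyltransferase activity: standard "
                 "thiopurine doses risk myelosuppression; consider dose reduction.",
}

def relevant_alerts(patient_variants: list[str]) -> list[tuple[str, str]]:
    """Return only the variants that carry an actionable alert."""
    return [(v, PGX_ALERTS[v]) for v in patient_variants if v in PGX_ALERTS]

# A patient may present with hundreds of variants; only the actionable
# ones should reach the clinician.
variants = ["CYP2C19*2", "MTHFR c.665C>T", "ABCB1 rs1045642"]
for variant, guidance in relevant_alerts(variants):
    print(f"ALERT {variant}: {guidance}")
```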

At leading-edge institutions such as the Mayo Clinic Center for Individualized Medicine, laboratorians and bioinformaticians collaborate as part of teams to integrate DNA information into patient care. "Pathologists are absolutely at the table, as they often are, when it comes to the ability to take lots of data and interpret it," Green noted. "They will be at the table, with others, helping to determine what is relevant and what is not. In particular, anatomic pathologists will be deep in this on cancer, since routine cancer diagnostics will soon have a major genomics component. It already does for some kinds of cancer."

In the case of clinical laboratories, these innovations mean laboratorians need to become more comfortable with big data terms and technologies. "Everybody's threshold of big data is different, but they're going to have to get used to looking at a lot of data about genomes of patients," Green said. "The next generation will have genomic information as a routine part of medical care. Every patient will present with hundreds and hundreds of genomic variants, and healthcare providers will need to know which of those are relevant. Hopefully, we'll have in place systems that streamline that information and give them recommendations: how that changes what medications they should get, how they should be treated for different disorders, and so forth," said Green.

Clinicians must learn to work with big data, too—with help from the lab and others. "We need total experts, but then, we need everyone to have their game up a little bit," said Green. "We need everybody to be facile—not experts, just very comfortable. That's the next-generation physician, next-generation biomedical researcher. Even if they are not a data science expert, they need to have minimum competencies." Midcareer professionals who are in practice right now also need training, said Green: "Since the time they were in school, the world has changed with respect to big data. They haven't gotten these credentials. How do we train individuals who still have 20 to 30 years of their careers left?" The National Institutes of Health (NIH) is already thinking about how to tackle the training necessary for these jobs, Green added. And other healthcare organizations can do likewise.

Labs should get ready, too, for big data experts—quants—to join lab researchers and work with them, counseled Mayer-Schönberger. Quants will play "integral if not key roles in research. They will come from far afield, with little substantive knowledge initially, but still be able to be valuable contributors because of their data skills."

A Federal and Global Push

Despite the huge leaps made during just the past few decades in computing power and biomedical knowledge, big data is just getting started. NIH and other organizations and experts worldwide are serious about moving forward.

For starters, NIH made the world's largest set of data on human genetic variation, produced under the 1000 Genomes Project, freely available on the Amazon Web Services cloud. Cloud-based collaborations such as this let researchers use data at a fraction of the cost an institution would spend to acquire the needed internet bandwidth, data storage, and computing capacity. Another data-sharing collaboration ultimately meant to advance medicine is the development of ClinGen, a unified clinical genomics database. The Centers for Medicare and Medicaid Services, Centers for Disease Control and Prevention, and Food and Drug Administration are also making more data available.
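As a small illustration of how cloud hosting lowers the barrier to this kind of data, the sketch below lists a few objects from the public 1000 Genomes bucket on Amazon S3 using anonymous requests, so no AWS account is needed. It assumes the boto3 library is installed and that the open-data bucket is still published under the name 1000genomes; both should be checked against the current AWS Open Data listing.

```python
# Minimal sketch: browsing the public 1000 Genomes data on Amazon S3
# with anonymous (unsigned) requests, so no AWS credentials are required.
# Bucket name per the AWS Open Data registry; verify before relying on it.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
response = s3.list_objects_v2(Bucket="1000genomes", MaxKeys=5)
for obj in response.get("Contents", []):
    print(f"{obj['Size']:>12}  {obj['Key']}")
```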

In June 2013, a global alliance to enable responsible sharing of genomic and clinical data was announced. Influential sequence-data holders—including NIH, the Wellcome Trust Sanger Institute in Hinxton, England, and BGI-Shenzhen of China—pledged to be members, as did nearly 70 healthcare, research, and patient advocacy organizations in 13 countries. Members are not necessarily agreeing to share their data; rather, the alliance aims to set interoperable standards and policies on ethics, privacy, and technical issues related to aggregating and sharing data.

According to Mayer-Schönberger, laboratory professionals can be champions of the coming improvements to healthcare stemming from big data. "To realize what is in the offing, it's helpful to realize that most of the methods, the processes, the institutions that we currently use to collect data and eventually produce knowledge are artifacts of small data thinking, of the need to squeeze the most value out of the least data, but only for one purpose and then throw it away," Mayer-Schönberger commented. "As the constraints in collecting, storing, and analyzing vastly more data are greatly diminished, we need to rethink everything, including the very foundations of how we organize and conduct research. Because labs have long worked with data, lab experts intuitively understand the power of data—and when they realize the power of big data, they can become natural early champions for this change."

Nancy B. Williams is a freelance writer in Arlington Heights, Ill. Her email address is [email protected].