Machine Learning and Laboratory Medicine: Now and the Road Ahead

As the demand for healthcare continues to grow exponentially, so does the volume of laboratory testing. Similar to other sectors, research in the field of laboratory medicine has begun to investigate the use of machine learning (ML) to ease the burden of increasing demand for services and to improve quality and safety.

Over the past decade, the statistical performance of ML on benchmark tasks has improved significantly due to increased availability of high-speed computing on graphics processing units, integration of convolutional neural networks, optimization of deep learning, and ever larger datasets (2). The details of these achievements are beyond the scope of this article.

However, the emerging consensus is that the general performance of supervised ML—algorithms which rely on labeled datasets—has reached a tipping point where clinical laboratorians should seriously consider enterprise-level, mission-critical applications (Table 1).

In recent years, research publications related to ML have increased significantly in pathology and laboratory medicine (Figure 1). However, despite recent strides in technology and the growing body of literature, few examples exist of ML implemented into routine clinical practice. In fact, some of the more prominent examples of ML in current practice were developed prior to the recent inflection in ML-related publications (3).

This underscores the possibility that despite technological advancements, progress in ML remains slow due to intrinsic limitations of available datasets, the state of ML technology itself, and other barriers.

As laboratory medicine continues to undergo digitalization and automation, clinical laboratorians will likely be confronted with the challenges associated with evaluating, implementing, and validating ML algorithms, both inside and outside their laboratories. Understanding what ML is good for, where it can be applied, and the ML field’s state-of-the-art and limitations will be useful to practicing laboratory professionals. This article discusses current implementations of ML technology in modern clinical laboratory workflows as well as potential barriers to aligning the two historically distant fields.

Where Is the Machine Learning?

As ML continues to be adopted and integrated into the complex infrastructure of health information systems (HIS), how ML may influence laboratory medicine practice remains an open question. In particular, it is important to consider barriers to implementation and identify stakeholders for governance, development, validation, and maintenance. However, clinical laboratorians should first consider the context: is the ML application inside or downstream from a laboratory?

Machine Learning Inside Labs

Currently, there are just a handful of Food and Drug Administration (FDA)-cleared, ML-based commercial products available for clinical laboratories. The CellaVision DM96, marketed by CellaVision AB (based in Lund, Sweden), is a prominent example that has been adopted widely since gaining FDA clearance in 2001 (3). More recently, the Accelerate Pheno, marketed by Accelerate Diagnostics in Tucson, Arizona, uses a hierarchical system that combines multivariate logistic regression and computer vision (4,5). Both systems rely heavily on digital image acquisition and analysis to generate their results.

The recent appearance of FDA-cleared instruments that process digital images is not surprising considering current advancements in computer science, especially significant strides researchers have made with image-based data. Robust ML methods such as image convolution, neural networks, and deep learning have accelerated the performance of image-based ML in recent years (2). Digital images, however, are not as abundant in clinical laboratories as they are in other diagnostic specialties, such as radiology or anatomic pathology, possibly limiting future applications of image-based ML in laboratory medicine.

Beyond the limited number of commercial applications, ML research in laboratory medicine also has been growing, although the total number of publications remains relatively low. In recent years, researchers have investigated the utility of ML for a broad array of applications, such as analyzing erythrocyte morphology, bacterial colony morphology, thyroid panels, urine steroid profiles, and flow cytometry data, and reviewing test result reports for quality assurance (6-11).

While some institutions have successfully integrated homegrown ML systems into their workflows, few research models have made the transition to clinical practice. Despite developing better-performing models, researchers often struggle, for a variety of reasons, with the proverbial last mile of clinical integration. In particular, the literature offers little to no guidance on which statistical performance metrics to use when evaluating ML models, how to design clinical validation experiments, or how to create more modular ML models that integrate with current laboratory medicine information technology (IT) infrastructures and workflows.
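As a rough illustration of the kind of statistical performance metrics at issue, the sketch below computes sensitivity, specificity, and predictive values from paired reference labels and model predictions. The function name `binary_metrics` and the ten-specimen toy data are assumptions made for illustration; they are not drawn from any published validation protocol.

```python
from typing import Dict, Sequence


def binary_metrics(truth: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Return sensitivity, specificity, PPV, and NPV for binary labels (1 = positive)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical example: 10 specimens, reference labels vs. model output.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_out = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(reference, model_out)
```

Predictive values depend on disease prevalence in the evaluated cohort, so figures computed on a research dataset may not carry over to a laboratory's own patient population.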

In all likelihood, the reason for clinical laboratories’ slow adoption of ML, both from commercial and research sources, is multifactorial, and arguably emanates from more than just the intrinsic limitations of the core technology itself. Similar to other technologies that receive a lot of attention, such as “big data” or blockchain, ML remains a tool that requires a supportive system architecture. While the core technology is demonstrating promising results, its prevalence in daily practice is likely to remain limited until developers and software engineers offer clinical IT systems that allow easy integration with existing workflows.

Machine Learning Outside Labs

As electronic health records (EHRs) continue to evolve and accumulate more data, commercial EHR vendors are looking to expand their data access and analytic capabilities. They have begun offering ML models designed for use within their systems and in some cases are allowing access to third-party models. Vendors often package ML software into clinical decision support (CDS), an increasingly popular location for blending ML and clinical medicine.

While CDS tools traditionally rely on rule-based systems, vendors now are using ML in predictive alarms and syndrome surveillance tools, aimed at assisting clinical decision makers in complex scenarios.

In their current state, ML algorithms usually rely on structured data for training and subsequently generating predictions. While a significant portion of EHRs contains unstructured and semistructured data, laboratory information remains one of the largest sources for structured data, and it is not uncommon for ML-based CDS tools to rely heavily on laboratory data as input. As CDS tools proliferate, the role of laboratory medicine in developing, validating, and maintaining these models remains important yet poorly defined.

In addition, similar to calculated laboratory results such as estimated glomerular filtration rate, probability scores generated by ML models that rely on laboratory data could arguably be subject to regulation by traditional governing and accrediting organizations such as the College of American Pathologists, FDA, or the Joint Commission.

While the regulation of ML remains an openly debated topic in the field of computer science, the growing consensus among experts in the medical community is that rigorous oversight of these models is appropriate to ensure their safety and reliability in clinical medicine.

In 2017, FDA released draft guidance on CDS software in an attempt to provide clarity on the scope of its regulatory oversight (12). While these guidelines are still subject to change, it’s clear the agency is committed to oversight in this area. Until guidelines are formalized, subjecting ML models to the rigor of the peer-review process may be the next best thing.

To deliver promising ML technology at the bedside, the IT and medical communities may need to collaborate with ML researchers and vendors to support validation studies. Clinical laboratorians may be particularly suited for guiding these types of efforts, owing not only to ML models’ frequent reliance on laboratory data but also to laboratorians’ expertise in validating new technology for clinical purposes.

As ML in the post-analytic phase propagates, clinical laboratorians will need to become increasingly attentive to which laboratory data are being used and how. For example, changes within laboratory information systems may have unintended consequences on downstream applications that rely on properly mapped laboratory result data. Health systems will also benefit from clinical laboratorians’ insight on how ML can improve patient care using laboratory data.
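One way to catch this kind of unintended consequence is to compare the inputs a downstream model expects against the laboratory's current test catalog whenever the catalog changes. The following is a minimal sketch under stated assumptions: the test codes, units, and the function `find_mapping_problems` are all hypothetical and do not correspond to any particular laboratory information system or vendor interface.

```python
from typing import Dict, List

# Hypothetical: the inputs a downstream ML model was trained on, keyed by
# test code, with the unit the model expects. Codes and units are made up.
MODEL_EXPECTED_INPUTS: Dict[str, str] = {
    "LACT": "mmol/L",
    "WBC": "10^9/L",
    "CREAT": "mg/dL",
}


def find_mapping_problems(lis_catalog: Dict[str, str]) -> List[str]:
    """List model inputs that are missing from the catalog or report a different unit."""
    problems = []
    for code, expected_unit in sorted(MODEL_EXPECTED_INPUTS.items()):
        if code not in lis_catalog:
            problems.append(f"{code}: not found in LIS catalog")
        elif lis_catalog[code] != expected_unit:
            problems.append(
                f"{code}: unit is {lis_catalog[code]}, model expects {expected_unit}"
            )
    return problems


# Hypothetical catalog after a change: creatinine now reports in umol/L
# and the lactate code was renamed.
catalog_after_change = {"LACTATE": "mmol/L", "WBC": "10^9/L", "CREAT": "umol/L"}
problems = find_mapping_problems(catalog_after_change)
for line in problems:
    print(line)
```

A check like this only detects renamed codes and changed units; it would not detect subtler shifts such as a new assay method reporting under the same code and unit.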

Barriers to Development and Adoption

Three categories generally describe common approaches to ML: supervised, semisupervised, and unsupervised. Supervised ML relies on a large, accurately labeled dataset to train an ML model, such as labeling images of leukocytes as lymphocytes or neutrophils for subsequent classification. Currently, the consensus is that supervised ML will generate the best models for targeted detection of known classes of data. But in many cases the available datasets are not large enough or labeled accurately enough, and the process of curating accurately labeled datasets is difficult and time-consuming.

With EHRs, researchers certainly have greater access to data than in years past. However, health information in its native state often is insufficiently structured for the rigorous development of ML models. For example, predictive alarms and syndrome surveillance tools that use supervised ML often rely on datasets delineated by the presence or absence of clinical disease. While ICD-10 codes are a discrete data element that could be used for labeling purposes, experience at our institution indicates that ICD-10 codes are not documented reliably enough to train supervised ML models.

To avoid performance issues associated with inconsistent labels, data scientists can curate custom labels based on specific criteria to define the classes in their datasets. But criteria for defining classes are often subjective and may lack universal acceptance. For example, sepsis prediction algorithms may rely on clinical criteria of sepsis used at one institution but not another. It will become increasingly important for clinical laboratorians to consider how models are trained and which specific clinical definitions define the functional ground-truth in an ML model for the classes or disease being detected.
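To make the point about subjective class definitions concrete, the sketch below applies two labeling rules to the same patient records and measures how often they agree. Both rules, their thresholds, and the field names are invented for illustration; neither is a real clinical definition of sepsis, and the four records are toy data.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]


# Both rules below are invented for illustration only.
def label_site_a(r: Record) -> int:
    """Hypothetical site A rule: elevated lactate AND elevated white cell count."""
    return int(r["lactate_mmol_l"] >= 2.0 and r["wbc_10e9_l"] >= 12.0)


def label_site_b(r: Record) -> int:
    """Hypothetical site B rule: markedly elevated lactate alone."""
    return int(r["lactate_mmol_l"] >= 4.0)


def label_agreement(records: List[Record],
                    rule_1: Callable[[Record], int],
                    rule_2: Callable[[Record], int]) -> float:
    """Fraction of records on which two labeling rules assign the same class."""
    same = sum(1 for r in records if rule_1(r) == rule_2(r))
    return same / len(records)


patients = [
    {"lactate_mmol_l": 1.0, "wbc_10e9_l": 8.0},   # negative under both rules
    {"lactate_mmol_l": 2.5, "wbc_10e9_l": 15.0},  # positive under rule A only
    {"lactate_mmol_l": 4.5, "wbc_10e9_l": 9.0},   # positive under rule B only
    {"lactate_mmol_l": 5.0, "wbc_10e9_l": 18.0},  # positive under both rules
]
agreement = label_agreement(patients, label_site_a, label_site_b)
```

In this toy cohort the two rules disagree on half of the records, so a model trained on one set of labels would be learning a different ground truth than a model trained on the other.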

In addition to issues with variable criteria for clinical disease, some labels also have intrinsic variability that may preclude ML from optimal performance across institutions. Linear models such as logistic and linear regression have shown poor generalizability between institutions (13,14). In healthcare, the problem is multifactorial and may result from population heterogeneity, or from discrepancies between the ML training population and the use case or test population. Consequently, ML models trained outside one’s institution may benefit from retraining before go-live. However, nothing in the literature supports this practice.
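A simple way to look for a generalizability gap before go-live is to score a local cohort with the unmodified external model and compare a discrimination metric against the value seen at the development site. The sketch below does this with the area under the ROC curve, computed by pairwise comparison. The scores and outcomes are hypothetical toy values, and the function `auc` is written here for illustration rather than taken from any library.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Each positive/negative pair counts 1 if ranked correctly, 0.5 if tied.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model scores and outcomes from the development site...
dev_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
dev_labels = [1, 1, 1, 0, 0, 0]
# ...and from a local cohort scored by the same, unmodified model.
local_scores = [0.9, 0.4, 0.3, 0.8, 0.5, 0.1]
local_labels = [1, 1, 1, 0, 0, 0]

dev_auc = auc(dev_scores, dev_labels)
local_auc = auc(local_scores, local_labels)
print(f"development AUC: {dev_auc:.2f}, local AUC: {local_auc:.2f}")
```

A large drop between the two values would be one signal that the model does not transfer cleanly to the local population. How large a drop is acceptable, and whether retraining is the right response, are judgment calls this sketch does not address.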

Lastly, the black box nature of ML models themselves poses a well-described barrier to adoption. Computer scientists have sought to elucidate how and why models arrive at the answers they generate, in order to show end users the decision points behind a given score or classification. This effort is often referred to as explainable artificial intelligence (XAI).

Proponents of XAI argue that it may help investigate the source of bias when an ML model is producing erroneous results. Ideally such a tool would also include interactive features to allow correction of the bias identified. However, as ML models become more powerful and complex, the ability to derive meaningful insight into their inner logic becomes more difficult. The practice of investigating methods for XAI is young, and its utility remains an open question.

What’s Next for Machine Learning?

The powerful technology of ML offers significant potential to improve the quality of services provided by laboratory medicine. Early commercial and research-driven applications have demonstrated promising results with ML-based applications in our field. Despite nagging problems with model generalizability, oversight, and physician adoption, we should expect a steady influx of ML-based technology into laboratory medicine in the coming years.

Laboratory medicine professionals will need to understand what can be done reliably with the technology and what the pitfalls are, and to establish what constitutes best practices as we introduce ML models into clinical workflows.

Thomas J.S. Durant, MD, is a clinical fellow and resident physician in the department of laboratory medicine at Yale University School of Medicine in New Haven, Connecticut.

REFERENCES

1. Rohr U-P, Binder C, Dieterle T, et al. The value of in vitro diagnostic testing in medical practice: A status report. PLoS One 2016;11:e0149856.
2. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. Neural Information Processing Systems 2012;25:doi:10.1145/3065386.
3. 510(K) Summary: DiffMaster Octavia. https://www.accessdata.fda.gov/cdrh_docs/pdf/K003301.pdf (Accessed December 2018).
4. Pancholi P, Carroll KC, Buchan BW, et al. Multicenter evaluation of the Accelerate PhenoTest BC kit for rapid identification and phenotypic antimicrobial susceptibility testing using morphokinetic cellular analysis. J Clin Microbiol 2018;56:e01329-17.
5. Antypas HC. Machine learning and time-lapse microscopy pave the way for faster antibiotic susceptibility testing. https://www.linkedin.com/pulse/machine-learning-time-lapse-microscopy-pave-way-faster-antypas/ (Accessed November 2018).
6. Cao Y, Cheng M, Hu C. UrineCART, a machine learning method for establishment of review rules based on UF-1000i flow cytometry and dipstick or reflectance photometer. Clin Chem Lab Med 2012;50:2155–61.
7. Durant TJS, Olson EM, Schulz WL, et al. Very deep convolutional neural networks for morphologic classification of erythrocytes. Clin Chem 2017;63:1847–55.
8. Wilkes EH, Rumsby G, Woodward GM. Using machine learning to aid the interpretation of urine steroid profiles. Clin Chem 2018;doi:10.1373/clinchem.2018.292201.
9. Demirci F, Akan P, Kume T, et al. Artificial neural network approach in laboratory test reporting: learning algorithms. Am J Clin Pathol 2016;146:227–37.
10. Diri B, Albayrak S. Visualization and analysis of classifiers performance in multi-class medical data. Expert Syst Appl 2008;34:628–34.
11. Huang L, Wu T. Novel neural network application for bacterial colony classification. Theor Biol Med Model 2018;15:22.
12. Food and Drug Administration. Clinical and patient decision support software: Draft guidance for industry and Food and Drug Administration staff. https://www.fda.gov/downloads/medicaldevices/deviceregulationandguidance/guidancedocuments/ucm587819.pdf (Accessed January 2019).
13. Moon JB, DeWitt TH, Errend MN, et al. Model application niche analysis: Assessing the transferability and generalizability of ecological models. Ecosphere 2017;8:e01974.
14. Steyerberg EW. Clinical prediction models: A practical approach to development, validation, and updating. Springer Science & Business Media 2008.