Researchers are harnessing big lab data to fuel diagnostic modeling, quality control and assessment, and even epidemiological investigations. A review in Clinical Biochemistry offers seven perspectives on real-world big-data studies in laboratory medicine and explains how artificial intelligence (AI) will drive the future of this research.

AI will open up numerous potential applications for clinical laboratory big data, Ling Qiu, director of Peking Union Medical College Hospital’s clinical laboratory and the paper’s corresponding author, told CLN Stat. “Furthermore, as an important part of real-world big data, clinical laboratory big data combined with the data in other fields, like a patient’s diagnosis, diet record, etc. will be used to determine resource allocation, complex disease diagnosis, and judge the prognosis of patients,” said Qiu.

Labs face four key challenges in harnessing data for developing reference intervals, patient data-based real-time quality control, and other purposes:

  1. Because information systems are built on different database architectures, the logical structure of their data is often unclear.
  2. An absence of standardized rules and methods in data mining leads to reliability and validity issues in data models and research results.
  3. Because of scarce data resources, neglected information-system development, limited service demand, and incomplete knowledge-based systems, too little attention is paid to using evidence from real-world studies to guide clinical decision-making.
  4. The link between clinical information and test results is too weak to mine the clinical significance of the data effectively.

“The greatest challenges of studying real-world data in lab medicine vary from region to region due to the difference of development level in different regions,” said Qiu.

The authors addressed the following perspectives:

Establishing reference intervals (RIs)

There are two approaches to establishing RIs: direct and indirect. The direct approach uses a priori or a posteriori sampling, applying exclusion criteria before or after samples are collected, respectively. The indirect approach derives RIs by mining real-world laboratory data. Each approach has advantages and disadvantages, and both face technical and methodological challenges.
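To make the indirect approach more concrete, here is a minimal Python sketch of the nonparametric percentile method applied to mined results. It assumes the input is a list of results for one analyte from a single, already-partitioned subgroup (for example, one sex and age band). Tukey's 1.5 x IQR fences are used as a crude, hypothetical stand-in for the more sophisticated algorithms that indirect methods use to separate pathological from non-pathological values; the function name `nonparametric_ri` is illustrative only.

```python
import statistics


def nonparametric_ri(values, trim_outliers=True):
    """Estimate a reference interval as the 2.5th and 97.5th percentiles.

    Sketch only: assumes `values` holds results for one analyte from one
    subgroup. The outlier trimming below is a simple placeholder, not a
    published indirect RI algorithm.
    """
    data = sorted(values)
    if trim_outliers:
        # Tukey fences (1.5 x IQR): a crude, assumed way to drop results
        # that are probably pathological before estimating the interval.
        q1, _, q3 = statistics.quantiles(data, n=4)
        iqr = q3 - q1
        low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        data = [x for x in data if low_fence <= x <= high_fence]
    # n=40 yields 39 cut points at 2.5%, 5.0%, ..., 97.5%;
    # the first and last are the conventional RI limits.
    cuts = statistics.quantiles(data, n=40, method="inclusive")
    return cuts[0], cuts[-1]


lower, upper = nonparametric_ri(list(range(1, 101)))
print(f"RI: {lower:.3f} to {upper:.3f}")
```

For the evenly spaced values 1 to 100 this returns (3.475, 97.525); appending an extreme value such as 1000 leaves the result unchanged because the fences remove it first.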

Patient data-based real-time quality control (PBRTQC)

Deficiencies in internal quality control (IQC) methods have drawn attention to PBRTQC, which involves five steps. Labs should be aware that PBRTQC detects performance changes near the middle, rather than at the extremes, of an analyte’s concentration range.
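The review's five steps are not listed here, so the following is only a minimal sketch of one commonly used PBRTQC statistic: a moving average of consecutive patient results. Truncation limits exclude extreme values, the average is taken over a fixed block of results, and an alarm is raised when that average leaves the control limits. The function name and all parameter values (block size, truncation limits, control limits) are hypothetical and would need to be optimized for each analyte and patient population.

```python
from collections import deque


def moving_average_qc(results, block_size, trunc_low, trunc_high, ctrl_low, ctrl_high):
    """Return (index, moving_average, alarm) for each result that completes a full block.

    Sketch of a simple moving-average PBRTQC rule; every limit is an assumption.
    """
    window = deque(maxlen=block_size)  # keeps only the latest `block_size` results
    flags = []
    for i, x in enumerate(results):
        # Truncation: skip extreme patient results so they do not dominate the average.
        if not (trunc_low <= x <= trunc_high):
            continue
        window.append(x)
        if len(window) < block_size:
            continue  # wait until the first block is full
        ma = sum(window) / block_size
        alarm = not (ctrl_low <= ma <= ctrl_high)
        flags.append((i, ma, alarm))
    return flags


# Hypothetical sodium-like series: stable at 140, then a systematic shift to 146.
series = [140] * 20 + [146] * 20
flags = moving_average_qc(series, block_size=10, trunc_low=120, trunc_high=160,
                          ctrl_low=138, ctrl_high=142)
first_alarm = next(i for i, ma, alarm in flags if alarm)
print("first alarm at result index", first_alarm)
```

In this example the first alarm fires at index 23, the fourth shifted result, when the average reaches 142.4. Smaller blocks react faster but produce more false alarms, which is the main trade-off when tuning such a rule.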

Diagnostic or prognostic modeling

AI and machine learning have made it possible to combine measurements of multiple blood analytes into disease diagnostic and prognostic models. Building such models from real-world data requires both a model training step and a model validation step, and patient outcomes or clinical diagnoses must be included as the dependent variables in both.
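As a rough illustration of the training and validation steps, the sketch below fits a logistic regression (chosen here for simplicity; the review does not prescribe a model) that combines two analytes to predict a 0/1 diagnosis, which serves as the dependent variable. The cohort is simulated: the two "analytes" are hypothetical standardized values, and all function names are illustrative.

```python
import math
import random


def make_cohort(n, seed=0):
    """Simulate hypothetical standardized results for two analytes plus a 0/1 diagnosis."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        mu = 1.0 if y else -1.0  # assumption: diseased patients have higher values
        rows.append(([rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)], y))
    return rows


def fit_logistic(rows, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent (training step)."""
    k = len(rows[0][0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * k, 0.0
        for x, y in rows:
            z = sum(wj * xj for wj, xj in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            gw = [g + err * xj for g, xj in zip(gw, x)]
            gb += err
        w = [wj - lr * g / len(rows) for wj, g in zip(w, gw)]
        b -= lr * gb / len(rows)
    return w, b


def accuracy(model, rows):
    """Share of patients whose predicted class (probability >= 0.5) matches the diagnosis."""
    w, b = model
    hits = sum((sum(wj * xj for wj, xj in zip(w, x)) + b >= 0) == bool(y) for x, y in rows)
    return hits / len(rows)


cohort = make_cohort(300)
train, valid = cohort[:200], cohort[200:]  # hold out patients for the validation step
model = fit_logistic(train)
print(f"validation accuracy: {accuracy(model, valid):.2f}")
```

Reporting accuracy only on held-out patients is the point of the validation step; with real-world data, a temporally or externally separate validation set would usually be more convincing than a random split.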

Epidemiological investigation

Investigations that use real-world data from clinical labs can help provide accurate, reliable, and timely test results to clinicians and inform clinical decision-making and patient care. Although the external validity of real-world big-data studies (RWBDSs) can be high, their internal validity is low, the authors summarized, adding that future studies should focus on improving data integrity and the reliability of RWBDS results.

Laboratory management

Retrospectively analyzing a lab’s quality indicators, such as misidentification errors, test transcription errors, and inappropriate requests, can help improve quality monitoring during the testing process. The International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) Working Group on Laboratory Errors and Patient Safety and other organizations have issued sets of quality indicators. Clinical labs can consult these documents to develop a model suited to their needs when conducting retrospective analyses.
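A retrospective analysis can start as simply as expressing each indicator as a percentage of test requests per month, so that trends become visible. The sketch below assumes a hypothetical error log of (month, indicator) records and monthly request totals; the indicator names mirror the examples above, but the denominators and targets defined in the IFCC working group's documents may differ and are not reproduced here.

```python
from collections import Counter, defaultdict


def qi_rates(error_log, requests_per_month):
    """Percentage of requests affected by each quality indicator, per month.

    error_log: iterable of (month, indicator) tuples, one per recorded error (hypothetical format).
    requests_per_month: {month: total number of test requests}.
    """
    counts = Counter(error_log)
    rates = defaultdict(dict)
    for (month, indicator), n in sorted(counts.items()):
        rates[month][indicator] = 100.0 * n / requests_per_month[month]
    return dict(rates)


# Hypothetical two-month log.
log = ([("2023-01", "misidentification")] * 3
       + [("2023-01", "transcription")] * 2
       + [("2023-02", "inappropriate request")] * 12)
totals = {"2023-01": 4000, "2023-02": 3800}
for month, rates in qi_rates(log, totals).items():
    print(month, {qi: round(pct, 3) for qi, pct in rates.items()})
```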

Analysis of sources of variations for analytes

Real-world big data can help assess factors that influence blood analyte results. Clinical labs can use this data “to explore the distribution of analytes at various levels of some factor and to compare whether there are differences in these distributions,” explained the authors.
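One way to compare distributions across levels of a factor (for example, sex or age band) is sketched below: group the results, report each group's median, and compute the Kruskal-Wallis H statistic. A rank-based test is our assumption here, picked because laboratory results are often skewed; it is not necessarily what the authors used. This version omits the tie correction, and H is compared against a chi-square distribution with (number of groups minus 1) degrees of freedom as an approximation.

```python
from statistics import median


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def compare_by_factor(groups):
    """groups: {factor_level: [results]}. Returns per-level medians and Kruskal-Wallis H (no tie correction)."""
    levels = list(groups)
    pooled = [x for lvl in levels for x in groups[lvl]]
    ranks = rank_with_ties(pooled)
    n_total = len(pooled)
    h, start = 0.0, 0
    for lvl in levels:
        n = len(groups[lvl])
        r_sum = sum(ranks[start:start + n])  # rank sum for this level
        h += r_sum ** 2 / n
        start += n
    h = 12.0 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)
    medians = {lvl: median(groups[lvl]) for lvl in levels}
    return medians, h


# Hypothetical results for one analyte split by a two-level factor.
medians, h = compare_by_factor({"female": [1, 2, 3], "male": [4, 5, 6]})
print(medians, round(h, 3))
```

A larger H indicates a bigger shift between groups; with real data, groups should be much larger than in this toy example for the chi-square approximation to be reasonable.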

Data mining for external quality assessment (EQA)

This quality verification measure has helped to increase understanding of real-world big data. EQA based on this data “can overcome the shortcomings of low frequency of EQA samples of traditional EQA. It is possible that a proper EQA scheme with a well-combined PBRTQC utilizing real-world big-data mining could be an alternative solution for IQC in the future,” summarized the authors.

AI could serve as a game changer in big data research and in laboratory medicine. “The establishment of autoverification models based on decision trees and other algorithms, an optimally performing AI can be obtained through manual training to automatically review test reports. Furthermore, a weighted algorithm is often adopted when studying PBRTQC,” suggested the authors. Convolutional neural network-based deep learning, for example, could help diagnose various hematologic diseases by identifying different types of blood cells.
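To show what a decision-tree-style autoverification model looks like, here is a minimal hand-written rule tree. It is not a tree learned from data and not the authors' model; the analyte, thresholds, and rule order are hypothetical and would be set and validated locally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    analyte: str
    value: float
    instrument_flag: bool                  # e.g. a hemolysis or clot flag from the analyzer
    previous_value: Optional[float] = None  # patient's prior result, if any


# Hypothetical limits for potassium (mmol/L); real limits are defined by each lab.
RULES = {
    "K": {"reportable_low": 1.0, "reportable_high": 10.0,
          "critical_low": 2.8, "critical_high": 6.2, "delta_limit": 1.0},
}


def autoverify(result: Result) -> str:
    """Walk a small rule tree; return 'release' or the reason to hold for manual review."""
    rule = RULES.get(result.analyte)
    if rule is None:
        return "hold: no rule for analyte"
    if result.instrument_flag:
        return "hold: instrument flag"
    if not rule["reportable_low"] <= result.value <= rule["reportable_high"]:
        return "hold: outside reportable range"
    if result.value < rule["critical_low"] or result.value > rule["critical_high"]:
        return "hold: critical value"
    if (result.previous_value is not None
            and abs(result.value - result.previous_value) > rule["delta_limit"]):
        return "hold: delta check failed"
    return "release"


print(autoverify(Result("K", 4.1, False, previous_value=4.0)))
print(autoverify(Result("K", 5.5, False, previous_value=4.0)))
```

A model trained on previously reviewed reports, such as a learned decision tree, could replace or refine these hand-set branches while keeping the same release-or-hold output.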