Anchoring POC Quality in Clinical Decision-Making
Study Finds Errors Missed by Routine Quality Assurance
By Bill Malone
Traditional quality assurance for point-of-care (POC) testing relies on correlation analysis to ensure that these devices closely mirror the results clinicians would expect from the central lab. A recent paper, discussed in this issue of Strategies, underscores the importance of linking such evaluations to clinical decision-making.
Even as POC instruments have become ubiquitous in hospitals and other settings, laboratorians have maintained a key role in making sure these devices are reliable. The Food and Drug Administration and lab medicine guidelines recommend that labs employ statistical methods to compare results from POC devices to core lab methods in order to assure quality. The usual techniques include linear regression, bias assessment, and correlation analysis. However, as Johns Hopkins Medicine researchers describe in their recent paper, traditional statistical analysis does not always tell the whole story about how well a POC device performs (J Thromb Haemost 2011;9:1769-1775).
Kenneth Shermock, PharmD, PhD, and colleagues examined the performance of a POC hemostasis device in use at their institution and compared the lab’s traditional quality assessment analysis to their own novel assessment based on clinical decision-making. Patients at four coagulation clinics provided 1,518 pairs of International Normalized Ratio (INR) measures, one venous and one fingerstick per patient, as part of routine quality assurance analyses. The lab’s quarterly calculation of the correlation coefficient ranged from 0.84 to 0.91 during the January 2006–March 2008 study period, within acceptable limits for a POC device.
However, when the researchers sorted the paired INR values into the ranges clinicians use to adjust therapy, they found that the POC device would have led to a clinical decision different from the one implied by the venous lab result 31% of the time. In fact, when the INR was low, the results from the POC device would have resulted in the appropriate dose increase less than half of the time. “The fact that these devices were leading us to the wrong clinical decisions one third of the time frankly shocked us and really demanded action,” said Shermock, who is director of the Center for Pharmaceutical Outcomes and Policy at the Johns Hopkins Hospital.
The authors’ strikingly bleak assessment of the POC device relied on a method Shermock validated in previous research. Shermock’s method predicts that, when comparing two INR measurement systems, clinical decisions will agree when both INR values fall within the same clinically relevant range. For example, the method predicts that, with an INR below 1.9, clinicians usually increase a warfarin dose; between 1.9 and 3.3, they maintain it; and at 3.4 and above, they decrease it. These ranges are based on Shermock’s observations in previous studies published in 2002 and 2009, in which he directly measured clinical decisions in response to INR values.
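The agreement method can be sketched in a few lines of Python. This is an illustration, not the authors' actual code: the decision thresholds come from the ranges quoted above, while the function names and the paired readings are invented for the example.

```python
# Sketch of a decision-agreement comparison between two INR systems.
# Thresholds (<1.9 increase, 1.9-3.3 maintain, >=3.4 decrease) are the
# ranges reported in the article; the paired data below are hypothetical.

def dose_decision(inr):
    """Map an INR value to the warfarin dose decision it would trigger."""
    if inr < 1.9:
        return "increase"
    elif inr <= 3.3:
        return "maintain"
    return "decrease"

def decision_agreement(pairs):
    """Fraction of (POC, lab) INR pairs that imply the same dose decision."""
    agree = sum(dose_decision(poc) == dose_decision(lab) for poc, lab in pairs)
    return agree / len(pairs)

# Hypothetical paired readings: (fingerstick POC, venous lab)
pairs = [(1.7, 1.5), (2.4, 2.0), (2.8, 3.5), (3.6, 3.9), (2.1, 1.8)]
print(decision_agreement(pairs))  # 0.6: only 3 of 5 pairs agree
```

Run against a lab's full set of paired measurements, this single agreement fraction answers the clinical question directly, which a correlation coefficient does not.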
The researchers also gained surprising insight about the performance of the POC device when they applied principles of analytic design to their graphical analysis. In contrast to the linear regression plots that the lab prepared for the customary quarterly quality analysis, they combined all of the available data into one graph and shrunk the size of the dots in the scatterplot, maximizing the so-called data-to-ink ratio. They also graphed adjacent histograms of INR values from the POC device and from the core lab. In both of these new graphs, an odd trend became clearly visible: the POC device never reported seven common INR values. “This trend literally appeared to us when we shrunk the dots and saw the diagonal white lines coursing through the scatterplot,” Shermock said. “Those lines are actually there in the other scatterplots, but the dots are so big you can’t see them.” The new graphs also demonstrated a systematic pattern of bias in the device, where INR values < 3 were inflated and values > 4.6 were deflated. When Shermock shared the discovery with clinicians, work began almost immediately on replacing the POC device.
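The missing-values pattern the authors spotted graphically can also be checked numerically. The sketch below is an assumption-laden illustration of that idea, not the study's method: it rounds readings to one decimal place and lists INR values seen in lab results but never reported by the device; the data are invented.

```python
# Illustrative numeric check for the "never reported" pattern: compare
# the set of one-decimal INR values the device produces against the
# values observed in paired lab results. Sample data are hypothetical.

def never_reported(poc_values, lab_values):
    """Return lab-observed INR values the POC device never reports."""
    poc_seen = {round(v, 1) for v in poc_values}
    lab_seen = {round(v, 1) for v in lab_values}
    return sorted(lab_seen - poc_seen)

# In this made-up data the device skips 2.3 and 3.1 entirely
poc = [2.0, 2.2, 2.4, 2.4, 3.0, 3.2, 3.2]
lab = [2.0, 2.2, 2.3, 2.4, 3.0, 3.1, 3.2]
print(never_reported(poc, lab))  # [2.3, 3.1]
```

On a large pooled dataset, a gap like this is exactly what produces the diagonal white lines Shermock describes in the small-dot scatterplot.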
How could these dire problems with the device go unnoticed for so long? Shermock blames overconfidence in correlation studies among clinicians and laboratorians. “The usual statistical techniques might tell us a device has a correlation with the lab of 0.8, but that doesn’t actually provide any direct clinical information. The clinicians may think it does, because they assume that if you have a high correlation, then that means that the POC device is leading to good decisions the vast majority of the time, but that’s actually not correct at all,” Shermock said. “In our study, the device had a correlation with the lab of nearly 0.9, yet after our analysis it was judged to be so unacceptable that there was not even a debate as to whether or not to remove it. This highlights why correlation should not be the analytic standard.”
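Shermock's point that high correlation does not imply good decisions is easy to demonstrate with synthetic numbers. In the sketch below (invented data, thresholds taken from the article), a constant +0.4 INR bias leaves the Pearson correlation at a perfect 1.0 while fewer than half of the implied dose decisions agree.

```python
# Synthetic demonstration: perfect correlation, poor decision agreement.
# Decision thresholds are the ranges quoted in the article; the lab
# values and the +0.4 bias are invented for illustration.

def dose_decision(inr):
    if inr < 1.9:
        return "increase"
    elif inr <= 3.3:
        return "maintain"
    return "decrease"

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

lab = [1.2, 1.6, 1.7, 2.0, 3.0, 3.1, 3.2, 4.0]  # hypothetical venous INRs
poc = [v + 0.4 for v in lab]                     # device reads uniformly high

r = pearson(lab, poc)  # 1.0: a "perfect" correlation
agreement = sum(dose_decision(p) == dose_decision(l)
                for p, l in zip(poc, lab)) / len(lab)  # only 3 of 8 agree
```

A reviewer looking only at r would call this device excellent; a reviewer looking at decision agreement would reject it, which is the contrast the study documents.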
The paper illustrates a weakness in correlation studies that laboratorians need to come to terms with, according to Frederick Kiechle, MD, PhD. “We as laboratorians cannot forget that when the clinician sees the INR value from a POC device, they will change the dose right at that moment,” he said. “What these authors demonstrate is that laboratorians need to think about how values are going to be used clinically, and then try to adapt their validation activities around that.” Kiechle is medical director of clinical pathology at Memorial Healthcare System and Pathology Consultants of South Broward in Hollywood, Fla. He was not associated with the study.
Although the team at Shermock’s institution went on to evaluate and select a new POC device that performed much better, he still worries that problems with POC devices could be common. In a previous unpublished study, Shermock led a team of researchers at the Cleveland Clinic and found that three of the five POC devices evaluated led to the wrong clinical decision at least 30% of the time, the same proportion as the device scrutinized in the recent paper. “Based on what we’ve seen, I would surmise that this problem could be widespread,” he said. “More often than not, the POC devices that we have tested have had unacceptable performance, and our data also show that conventional measures often do not reflect this poor performance.” If this is the case, the good news is that labs should not find it difficult to adopt either of the analytical methods used in the study, Shermock said. For example, combining historical data and shrinking the dots on a scatterplot to reveal patterns both can be managed in Microsoft Excel with little trouble.
Kiechle noted that the INR ranges in Shermock’s clinical decision model might not apply to every patient. For example, guidelines call for patients with mechanical heart valves to stay in a therapeutic range of 2.5–3.5. However, this should not stop labs from evaluating instruments by reference to clinical decision points. He pointed out that, in the case of glucose meters, the Clarke Error Grid estimates which values are more or less likely to lead to an error in insulin dose, going beyond what a conventional scatterplot shows.
Shermock stressed that the concept could be applied to other types of devices. “When I talk about our techniques here, I make it a point that I don’t consider this to be just an INR thing, but that this is a framework of analysis that could be applied to any of the myriad POC devices that are coming to market that result in clinical decisions,” he said. “For doctors and patients, their ultimate concern is whether the device leads them to the correct clinical decision or not. Our analytic method provides direct information about clinical decision-making. Since we have shown that it can lead to radically different inferences regarding the acceptability of a device, we urge that laboratorians and policy makers include it as a core component of their analytic framework for POC devices.”