No systematic review of diagnostic test accuracy is immune to “spin,” or overinterpretation, according to the authors of an analysis published in Clinical Chemistry. Researchers in Canada and the Netherlands evaluated diagnostic test accuracy systematic reviews published in high-impact-factor journals and found that nearly 80% contained some form of overinterpretation. Still, the reviews in this group of journals fared better than those in the authors’ prior analysis, which covered a broader spectrum of journals. The investigators credited a database with stringent vetting protocols as the key driver of the lower incidence of spin in the high-impact-factor series.

“Spin” misleads readers into believing that a study’s results are more optimistic than the data actually support. Examples of this practice include discrepancies between a study’s aims and its conclusions, abstracts that highlight favorable results or draw stronger conclusions than the full text, and language that hypes the results. Abstracts in particular are prime territory for spin, as these summaries generally are too brief to include important nuances and other details, wrote John P.A. Ioannidis, MD, DSc, in a related editorial.

When the authors originally explored this topic, they “found ‘spin’ to be quite prevalent in a series of reviews published in unselected journals,” Trevor McGrath, MD, the study’s lead author and a radiology resident at The Ottawa Hospital/University of Ottawa, told CLN Stat. Few of these reviews had been published in what many would consider high-impact-factor journals, McGrath said.

“Although impact factor is not necessarily correlated with methodological quality or completeness of reporting, there is evidence that clinicians may view higher-impact-factor journals as publishing higher-quality work,” he explained. For the current review, McGrath and his co-authors focused on journals with higher impact factors than those in the previous series, zeroing in on systematic reviews of diagnostic test accuracy containing a meta-analysis published in laboratory medicine, microbiology, medical imaging, or general medicine journals with an impact factor of 5 or greater.

“We applied the identical scoring system to the current series of reviews as we had to the original series and compared the incidence of ‘actual overinterpretation’ in the abstract and full text and the incidence of ‘potential overinterpretation’ in the review as a whole between series,” McGrath said. Actual overinterpretation refers to reviews that make test performance look more favorable than the results justify. Potential overinterpretation refers to practices that leave room for overinterpretation but cannot be formally assessed, such as not reporting the sample size or confidence intervals around summary estimates.

Among 137 systematic reviews, 79% contained a form of potential spin in the results. Sixty-three reviews (46%) had at least one actual overinterpretation practice in the abstract, and 52 (38%) had at least one in the full-text report. The most frequent spin practice was a positive conclusion that didn’t reflect the reported summary accuracy estimates. Interestingly, the investigators noted a “relative paucity” of systematic reviews of diagnostic test accuracy published in the higher-impact-factor diagnostic specialty journals, McGrath added. More than half of the reviews in the study’s sample came from the Cochrane Database of Systematic Reviews, a source known to produce reports of higher-than-average quality.

Overall, the incidence of spin was lower in the high-impact-factor series than in the authors’ 2017 report. “Cochrane reviews were shown to be the main driver of this finding,” noted McGrath.

Several factors give Cochrane an edge: Its protocols are always peer reviewed and published, and unlike other journals, it places no word count limits on abstracts. “In addition, because the journal of publication has been predetermined for Cochrane systematic reviews, there may be less pressure for authors to make the results appealing or interesting,” observed McGrath and colleagues. In a sub-analysis that excluded the Cochrane reviews, the investigators found no significant difference in actual overinterpretation between the remaining high-impact-factor reviews and those from the prior series.

Ioannidis questioned whether there was some spin to this assumption about Cochrane’s superiority. “A problem with these assessments is that the criteria against which reviews are judged are often crafted by researchers who are sympathetic to or even members of the wonderful Cochrane effort. Some of the better scoring may be a bit circular. It is less clear whether this superior score translates eventually into these reviews being truly more useful and less misleading,” noted Ioannidis.

Whether a systematic review with less spin is more useful than one with more spin is difficult to answer, Ioannidis continued: “Spin checklists are exercises in evidence appraisal, and they have some clear educational value as such. However, it does not mean that they necessarily have high discriminating ability in detecting bias, let alone in probing the utility of the systematic review itself.”

When using a systematic review to alter clinical practice or make policy decisions, “the methodology and results of the review ought to be examined carefully,” McGrath advised. Readers should scrutinize conclusions to confirm they align with the review’s results, which in turn helps ensure that clinical practice and policy decisions affecting patients rest on sound, unbiased evidence.