Clinical Laboratory News, March 2009: Volume 35, Number 3


Should This Test Be Ordered?
Evolving Technology, Evidence Require Close Laboratorian-Clinician Ties
By Gina Rollins

As the medical armamentarium expands, physicians face an ever-growing array of tests to assist in the diagnosis, classification, and monitoring of both wellness and disease. Clinical guidelines provide assistance in the choice and use of diagnostics for selected conditions, but new tests are continually being introduced, and existing ones used for new purposes or populations—often without conclusive evidence of benefit. Concern about the appropriate use of tests has taken on more importance in light of rising healthcare costs and the introduction of companion diagnostics that link lab results with qualification for specific treatments. Recently, a number of reports in the medical literature have grappled with appropriate test usage, and as this issue gains more prominence, laboratorians may be called upon to provide clinicians a better roadmap to the testing maze.

What creates the gap between clinical evidence and practice? “Practice runs far ahead of evidence. It’s a predictable pattern. Very often there’s a high profile study that finds a certain test does something and all of a sudden there’s demand for the test. But very often it has little to do directly with helping people live longer, feel better, or save their healthcare dollars,” said David Bruns, MD, professor of pathology at the University of Virginia Medical Center in Charlottesville. “Clinical chemists have a role in helping sort out which test is appropriate in which setting, and what it’s capable of doing and not doing.”

Creeping Into Practice

One example of a test that has gained traction in practice without solid evidence for its best use is the anti-cyclic citrullinated peptide (anti-CCP) antibodies assay, according to Robert Shmerling, MD, clinical chief of the division of rheumatology at Beth Israel Deaconess Medical Center in Boston and associate professor of medicine at Harvard Medical School. Use of this test as a diagnostic for rheumatoid arthritis has taken off in recent years and has been likened to a revolution in the field. But Shmerling begs to differ. “Everybody loves it, and everybody is ordering it, but it isn’t necessarily going to move them in the right direction to make a diagnosis,” he noted. Shmerling believes that, based on current evidence, the utility of this test is limited to a relatively modest subset of patients, including those who have hepatitis C with cryoglobulinemia, and individuals with a moderate pre-test probability of having RA but in whom there still is uncertainty about the diagnosis even after a full clinical evaluation.

The anti-CCP assay has a reported sensitivity between 50% and 70% and a specificity between 95% and 98%. While the test is more specific than the rheumatoid factor assay, its sensitivity, similar to that of rheumatoid factor, is a limitation, according to Shmerling. The problem with using either test in suspected cases of RA is that RA is a clinically defined disease. “You can’t diagnose it with a blood test, but people rely a lot on a blood test. They’re over-utilized and not terribly well justified,” he said.
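
Shmerling’s caution is easier to see with the arithmetic of pre- and post-test probability. The sketch below is a minimal Python illustration, not anything from his study: it applies Bayes’ theorem via the positive likelihood ratio, using midpoints of the published sensitivity and specificity ranges, and the pre-test probabilities are assumed values.

```python
# Minimal sketch: how pre-test probability governs what a positive anti-CCP
# result means. Sensitivity/specificity midpoints come from the ranges cited
# above; the pre-test probabilities are illustrative assumptions.

def post_test_probability(pre_test: float, sensitivity: float, specificity: float) -> float:
    """Probability of disease after a positive result, via the positive likelihood ratio."""
    lr_positive = sensitivity / (1.0 - specificity)
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * lr_positive
    return post_odds / (1.0 + post_odds)

if __name__ == "__main__":
    sens, spec = 0.60, 0.96  # midpoints of the reported 50-70% and 95-98% ranges
    for pre in (0.05, 0.30, 0.70):  # low, moderate, and high pre-test probability
        post = post_test_probability(pre, sens, spec)
        print(f"pre-test {pre:.0%} -> post-test {post:.0%} after a positive result")
```

Under these assumptions, a positive result in a low-probability patient still leaves the diagnosis close to a coin flip, while the moderate-probability patient, the group Shmerling singles out, is moved decisively toward a diagnosis. That is the statistical case for restricting the test to that subset.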

At Beth Israel Deaconess, questionable usage of the test has had economic consequences. Shmerling found that 140 anti-CCP tests were ordered in 2005, versus 425 in 2007. In 2005 the test was ordered exclusively by rheumatologists; by 2007 as many as 15% were ordered by other specialties. Further investigation revealed that the hospital loses about $52 per test since reimbursement does not fully cover the cost charged by a reference lab that performs the assay. After factoring in the incidence of newly diagnosed RA patients for whom the test was ordered, Shmerling calculated that the cost per true-positive result was significant, ranging from $107 to $770, depending on the patient’s insurance coverage (Arch Intern Med 2009;169:9–14).
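
The underlying arithmetic is simple enough to reproduce. In the sketch below, only the roughly $52 net loss per test and the 425-test volume come from the article; the true-positive yields are invented placeholders, so the outputs illustrate the shape of the calculation rather than Shmerling’s actual figures, which also varied with insurance coverage.

```python
# Back-of-the-envelope sketch of a cost-per-true-positive calculation. Only
# the ~$52 net loss per test and the 425-test volume come from the article;
# the true-positive counts below are invented placeholders.

def cost_per_true_positive(n_tests: int, net_cost_per_test: float, n_true_positives: int) -> float:
    """Total net testing cost divided by the number of true-positive results."""
    return (n_tests * net_cost_per_test) / n_true_positives

if __name__ == "__main__":
    n_tests, net_cost = 425, 52.0
    for true_pos in (30, 100, 200):  # hypothetical yields of newly diagnosed RA
        cost = cost_per_true_positive(n_tests, net_cost, true_pos)
        print(f"{true_pos} true positives -> ${cost:,.2f} per true positive")
```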

Infrequently ordered tests like anti-CCP are not the only ones that find their way into broader usage. “As with drugs, tests proposed for one purpose then get used for another,” noted Alfred Berg, MD, MPH, professor of family medicine at the University of Washington in Seattle. “A lot of times it makes common sense: a marker is high in a person with a disease, so why not use it in healthy people as a screening test? But the question you want to ask is whether doing the test and intervening early will make a difference.” Berg is chair of CDC’s Evaluation of Genomic Applications in Practice and Prevention (EGAPP) Working Group, and former chair of the U.S. Preventive Services Task Force.

On the Genetic Front

Issues of costs and evidence about appropriate usage have come to the fore particularly in the realm of genetic testing. For example, HER2 testing now is recommended for all patients with invasive breast cancer, and only patients with positive test results are recommended for trastuzumab therapy, which costs about $100,000 annually. Yet there is no consensus about the best testing method, and little data about whether all eligible patients are tested for HER2, which methods are used for those who are tested, and how indeterminate results are confirmed, according to Kathryn Phillips, PhD, professor of health economics and health services research and director of the Center for Translational and Policy Research at the University of California, San Francisco.

At the same time, concerns have been raised about lab methods for analyzing fluorescence in situ hybridization (FISH) results used to determine HER2 amplification status. One recent study found that 10% of FISH assays had misclassified HER2-positive results as HER2-negative, most likely due to flawed analysis procedures (Clinical Cancer Research 2008;14:7861–7870).

CDC launched the EGAPP initiative in 2004 as a means of establishing and evaluating a systematic, evidence-based process for assessing genetic tests, 1,000 of which now are available for clinical testing. The EGAPP Working Group issued recommendations in January on the role of tumor gene expression profiling in breast cancer, UGT1A1 genotyping in metastatic colon cancer treated with irinotecan, and genetic testing strategies for Lynch syndrome in newly diagnosed colorectal cancer patients to reduce morbidity and mortality in relatives. “The EGAPP initiative is an experiment by CDC, to see whether methods used elsewhere can be adapted to this new age of genetic tests,” said Berg. “If the methods work, our hope is that others will use similar methods to look at the huge range of genetic tests out there.”

Regulators also are grappling with an evolving evidence base and practice environment, particularly in the genomic field. As reported in the February issue of CLN, the FDA Oncologic Drugs Advisory Committee sought guidance on incorporating retrospectively identified biomarkers to support labeling claims, since evidence about a biomarker’s predictive or prognostic value might not be integral to or concurrent with development of a drug (“Personalized Medicine in Colorectal Cancer Treatment”, p. 15). The manufacturers of panitumumab (Amgen) and cetuximab (ImClone) had requested labeling changes in light of clinical trials that indicated that wild-type KRAS status was associated with better progression-free survival in patients with metastatic colon cancer who had been treated with either drug.

Do Guidelines Help?

Guidelines issued by numerous professional associations and governmental agencies are a valuable resource in translating often voluminous amounts and levels of evidence into workable testing and treatment options. The National Academy of Clinical Biochemistry (NACB) has been quite active in that regard, with 17 lab medicine practice guidelines already created or under development. Like many organizations, NACB takes pains to collaborate and coordinate its recommendations with other groups, but guideline discordance does occur, complicating testing and treatment decisions. A prime example is PSA screening for prostate cancer. The USPSTF recently recommended against screening men older than age 75, while both the American Urological Association and the American Cancer Society recommend offering screening at age 50 to men with a life expectancy of at least 10 years. Meanwhile, the National Comprehensive Cancer Network recommends that screening begin at age 40.

Harmonization is particularly needed in the genetic testing realm. Citing recommendations put out by numerous public and private groups, EGAPP suggested that “a coordinated approach for effectively translating genomic applications into clinical practice and health policy is still needed.” Phillips contends that “creative collaboration” between academia, industry and government will help fill in the evidence base. She cites as an example an initiative recently launched by the Aetna Foundation to elucidate how physicians use HER2 test results to inform treatment decisions.

But even when there isn’t a clear test or treatment protocol, guidelines serve a valuable purpose, according to Berg. “They potentially reduce the huge amount of noise out there so you decrease the risk of being whipped around by every new article. A new finding needs to be placed in context to know whether it makes a difference in clinical practice or is an outlier,” he observed. A 2008 NACB survey of lab medicine practice guideline users underscored the importance of guidelines. Nearly three-quarters of respondents said that clinical laboratory guidelines drive clinical practice changes in their work setting, and 80% indicated that practice guidelines impact lab use. Full survey results are available online.

Improving Diagnostic Accuracy Studies

A sticky wicket in evaluating diagnostic evidence for guidelines is that studies and systematic reviews of test accuracy historically have not been as robust as their randomized controlled trial counterparts. But groundbreaking guidance and considerable ongoing research are boosting the quality of diagnostic testing studies. Bruns was a key participant in the Standards for Reporting of Diagnostic Accuracy (STARD) initiative, which in 2003 published a checklist of 25 items and a flow diagram for authors to use in reporting studies of diagnostic test accuracy (Ann Intern Med 2003;138:40–44). STARD now has been adopted by at least 200 scientific journals worldwide, and its impact is being felt. “There’s early evidence that there’s been some improvement in reporting,” said Bruns. An analysis of a precursor checklist, co-developed by Bruns and used by Clinical Chemistry before STARD, found that the checklist improved the quality of studies reported in Clinical Chemistry compared with another journal that did not use the list (Clinical Chemistry 2004;50:530–536).

Some of Bruns’s STARD initiative colleagues, writing in their capacity as the Cochrane Diagnostic Test Accuracy Working Group, have outlined advancements since publication of STARD in understanding methods for assessing study quality and conducting statistical analyses of test accuracy (Ann Intern Med 2008;149:889–897). For instance, new hierarchical random-effects models have been developed for use in meta-analyses, along with methods for estimating summary ROC curves and summary estimates of sensitivity and specificity. The group has endorsed the use of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) checklist for assessing the quality of diagnostic test accuracy studies, which also was published in 2003 (See Table 1, below).
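
To give a flavor of what a summary estimate involves, the toy sketch below pools sensitivity across hypothetical studies by inverse-variance weighting on the logit scale. This is a deliberate simplification: the hierarchical bivariate random-effects models the Cochrane group describes pool sensitivity and specificity jointly and model between-study variation, which this fixed-effect approximation ignores. All 2×2 counts are invented.

```python
# Toy illustration of a summary sensitivity estimate: inverse-variance
# pooling on the logit scale across hypothetical studies. The hierarchical
# bivariate random-effects models described by the Cochrane group pool
# sensitivity and specificity jointly and model between-study variation;
# this fixed-effect univariate version is only a simplified stand-in.
# All 2x2 counts are invented.
import math

studies = [(45, 15), (80, 40), (30, 20), (120, 30)]  # (true positives, false negatives)

weights, logits = [], []
for tp, fn in studies:
    sens = tp / (tp + fn)
    logits.append(math.log(sens / (1.0 - sens)))
    weights.append(1.0 / (1.0 / tp + 1.0 / fn))  # inverse of approximate logit variance

pooled_logit = sum(w * l for w, l in zip(weights, logits)) / sum(weights)
pooled_sens = 1.0 / (1.0 + math.exp(-pooled_logit))
print(f"Pooled sensitivity (fixed-effect approximation): {pooled_sens:.1%}")
```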

Table 1
Quality Assessment of Diagnostic Accuracy Studies (QUADAS) Checklist
The QUADAS checklist was developed as an evidence-based tool to assess the quality of diagnostic accuracy studies included in systematic reviews.

1. Was the spectrum of patients representative of the patients who will receive the test in practice?
2. Were selection criteria clearly described?
3. Is the reference standard likely to correctly classify the target condition?
4. Is the time period between reference standard and index test short enough to be reasonably sure that the target condition did not change between the two tests? (disease progression bias)
5. Did the whole sample or a random selection of the sample receive verification using a reference standard of diagnosis? (partial verification bias)
6. Did patients receive the same reference standard regardless of the index test result? (differential verification bias)
7. Was the reference standard independent of the index test (i.e., the index test did not form part of the reference standard)? (incorporation bias)
8. Was the execution of the index test described in sufficient detail to permit replication of the test?
9. Was the execution of the reference standard described in sufficient detail to permit its replication?
10. Were the index test results interpreted without knowledge of the results of the reference standard? (test review bias)
11. Were the reference standard results interpreted without knowledge of the results of the index test? (diagnostic review bias)
12. Were the same clinical data available when test results were interpreted as would be available when the test is used in practice? (clinical review bias)
13. Were uninterpretable/intermediate test results reported?
14. Were withdrawals from the study explained?

Reproduced under Open Access from Westwood ME, Whiting PF, et al. BMC Medical Research Methodology 2005;5:20; doi:10.1186/1471-2288-5-20.

Beyond these methods, other researchers have proposed new paradigms for evaluating diagnostic tests. For instance, Alexander Sutton, PhD, and colleagues recently published a comprehensive approach to integrating evidence synthesis and decision modeling to consider three key questions: is a test worth doing; what is the test’s optimum operating point; and if more than one test is available, which is the best (Med Decis Making 2008;28:650–667). Sutton, a reader in medical statistics at the Centre for Biostatistics and Genetic Epidemiology at the University of Leicester (U.K.), evaluated his framework using computationally intensive simulation methods to simultaneously synthesize primary data and assess the economic decision model. “In looking at different studies we wanted to take into account the variability beyond using different thresholds and reflect the variabilities we can’t explain,” Sutton said. “Our model takes different factors into account simultaneously to reflect all the uncertainties.” Part of Sutton’s work was funded by the U.K. National Institute for Health Research Health Technology Assessment program, and his findings eventually may form the basis of, or be used in, the agency’s evaluations of appropriate tests for various conditions.
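
The notion of an optimum operating point can be made concrete with a stripped-down decision calculation. The sketch below picks the threshold that minimizes expected misclassification cost given a prevalence and assumed costs; every input is invented, and it omits the evidence synthesis and full economic modeling that distinguish Sutton’s framework.

```python
# Stripped-down sketch of choosing a test's optimum operating point by
# minimizing expected misclassification cost. The ROC points, prevalence,
# and costs are all invented; Sutton's framework additionally couples
# evidence synthesis with a full economic decision model.

# (threshold, sensitivity, specificity) for a hypothetical continuous test
roc_points = [
    (1, 0.95, 0.60),
    (2, 0.90, 0.75),
    (3, 0.80, 0.88),
    (4, 0.65, 0.95),
]

prevalence = 0.20
cost_false_negative = 1000.0  # assumed cost of a missed case
cost_false_positive = 150.0   # assumed cost of an unnecessary work-up

def expected_cost(sens: float, spec: float) -> float:
    """Expected misclassification cost per patient tested."""
    fn = prevalence * (1.0 - sens)
    fp = (1.0 - prevalence) * (1.0 - spec)
    return fn * cost_false_negative + fp * cost_false_positive

for threshold, sens, spec in roc_points:
    print(f"threshold {threshold}: expected cost ${expected_cost(sens, spec):.2f}")

best = min(roc_points, key=lambda p: expected_cost(p[1], p[2]))
print(f"Optimum operating point under these assumptions: threshold {best[0]}")
```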

Yet another model has been proposed for evaluating when a new test should replace the existing reference standard. Researcher Paul Glasziou, MBBS, PhD, and colleagues recently outlined three principles for considering a new reference test, including that the consequences of the new test can be understood through disagreements between the old and new reference test, that resolving the disagreements can be accomplished through use of a fair, but not necessarily perfect “umpire” test, and that umpire tests can be used for comparison anywhere on the timeline of past risk factors and future outcomes (Ann Intern Med 2008;149:816–821). Adoption of this model would require rethinking traditional modes of comparison, according to Glasziou, who is professor of evidence based medicine at the University of Oxford (U.K.). “If you’re evaluating a test you think is better than the current reference standard, you need to throw away the old framework of evaluating sensitivity and specificity,” he explained. “In the areas where the tests agree, there would be no change in management as a result of using the new test. The possible consequences of switching from the old to new reference test need only be considered in the cells of disagreement between the tests.”
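
A small sketch illustrates the bookkeeping behind Glasziou’s argument: cross-classify patients by old and new reference results, leave the concordant cells alone, and send only the discordant cells to an umpire test. The patient counts here are invented.

```python
# Sketch of Glasziou's bookkeeping: cross-classify patients by the old and
# new reference tests, leave concordant cells alone, and resolve only the
# discordant cells with an "umpire" test. Patient counts are invented.

# Each tuple is (old reference result, new test result) for one patient.
patients = ([(True, True)] * 70 + [(False, False)] * 20 +
            [(True, False)] * 4 + [(False, True)] * 6)

agree = [p for p in patients if p[0] == p[1]]
disagree = [p for p in patients if p[0] != p[1]]

# Management is unchanged where the tests agree, so only the disagreement
# cells need an umpire (e.g., clinical follow-up or an independent method).
print(f"{len(agree)} concordant patients: no change in management, no umpire needed")
print(f"{len(disagree)} discordant patients: resolve each with the umpire test")
```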

The Role of Laboratorians

While researchers hash out both new tests and emerging means of evaluating evidence, laboratorians have a prominent role in educating clinicians about the proper use of tests and improving adoption and utilization of evidence-based testing protocols. First and foremost is ensuring top-notch testing practices. “No matter what new tests or evidence becomes available, we need to do what’s necessary to ensure high standards of quality in the performance of lab services,” noted Stephen Kahn, PhD, DABCC, FACB, vice chair of laboratory medicine and pathology and professor of pathology, cell biology, neurobiology and anatomy at Loyola University Medical Center in Chicago. Based on results of the 2008 survey and other member feedback, NACB is focusing on supporting members in implementing guidelines, according to Kahn, who is president of NACB and helped conduct the survey.

Active participation in guideline development and championing the use of and adherence to guidelines is equally important, according to Bruns. “Guidelines almost always involve some diagnostic test, and if clinical laboratorians are not involved, inevitably the guidelines are not right,” he observed. Approximately 80% of the NACB survey respondents indicated that having a lab director or pathologist champion was the most crucial factor in implementing guidelines. “No matter what organization issues a guideline, it still requires individuals, in whatever kind of practice setting, examining the key issues and recommendations and determining how to translate them into practice,” said Kahn.

Glasziou has a particular concern that laboratorians and physicians work together to understand and incorporate evolving science into testing and treatment decisions. “We need to recognize that our categorization of who is and isn’t diseased drifts with time,” he explained. “As we get better technologies to detect what histologically may appear to be disease but what actually is a different spectrum of disease, we need to have some discussion about what should go into the arguments about changing the definition of disease.”

Although lab resources are stretched thin, active engagement with clinicians can help bring clarity to diagnostic choices and better understanding of a test’s utility. As an example, Bruns participates regularly in grand rounds. “Once you get known, you get people calling you with all kinds of questions and you can be a real resource in helping clinicians sort out which test is of benefit,” he noted. Laboratorians’ expertise can steer clinicians away from faulty use of tests and explain analytic issues that may be confounding the physician’s clinical work-up. For instance, a colleague puzzled by a test result called Bruns, who discovered a hook effect at work with that particular specimen and assay. “There are all kinds of things like that that laboratorians know. They seem little, but can make the diagnosis,” he said.
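
For readers unfamiliar with the phenomenon, the hook effect occurs in sandwich immunoassays when a grossly elevated analyte saturates both capture and detection antibodies, so the signal paradoxically falls and the result reads deceptively low; serial dilution exposes it. The sketch below uses a made-up dose-response curve, not any real assay’s calibration, to show the telltale pattern.

```python
# Toy model of the high-dose hook effect in a sandwich immunoassay: past a
# certain concentration the measured signal paradoxically falls, so a grossly
# elevated specimen can read deceptively low. The dose-response curve below
# is entirely made up, not any real assay's calibration.

def assay_signal(concentration: float, hook_onset: float = 1000.0) -> float:
    """Signal tracks concentration up to the hook onset, then declines."""
    if concentration <= hook_onset:
        return concentration
    return max(0.0, hook_onset - 0.5 * (concentration - hook_onset))

neat = 1800.0        # true concentration, well above the hook onset
diluted = neat / 10  # a 1:10 dilution falls back onto the rising curve

print(f"neat specimen reads {assay_signal(neat):.0f}")
print(f"1:10 dilution reads {assay_signal(diluted):.0f}; "
      f"corrected for dilution: {assay_signal(diluted) * 10:.0f}")
# The dilution-corrected value far exceeds the neat reading -- the
# classic signature that a hook effect is suppressing the neat result.
```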

Laboratorians also will do well to revisit their results reporting protocols to ensure that there are clear explanations about the significance of test results, according to Berg. “Put the tests in context with some clinical discussion on the report,” he recommended. “What does the result mean? Is or isn’t the patient at high risk for whatever condition? It needs to be communicated clearly.”

The bottom line for making sense of new technologies and evidence is the laboratorian-clinician link, according to Kahn. “It’s a mutual responsibility between the ordering physician and the people in the lab. We all have to be vigilant about it.”