The Trials of Testosterone Testing
Immunoassays Faulty in Women and Children, But What’s the Solution?
By Genna Rollins
Are there times when a test is so unreliable and imprecise that it simply shouldn’t be used with certain patients? That’s the debate swirling around total testosterone assays, direct immunoassays in particular, when used for testing in women and children. Concentrations of this steroid hormone may be as low as 0.17 nmol/L (5 ng/dL) for these populations, compared with 11.1 nmol/L (320 ng/dL), the lower limit of the normal range in men. A growing body of evidence has clearly demonstrated that the performance of some immunoassays at such low concentrations is subpar at best, prompting experts to question their use in women and children. Earlier this year, a consortium of professional associations and government agencies examined the issue and is now calling for long-overdue testosterone assay standardization, an initiative that has major implications for manufacturers, labs, and public health.
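The paired units quoted above follow from testosterone's molar mass. As a minimal sketch, assuming a molar mass of about 288.42 g/mol (a figure not given in the article) and hypothetical function names, the conversion can be written as:

```python
# Approximate molar mass of testosterone (C19H28O2) in g/mol.
# Assumption: this value does not appear in the article.
TESTOSTERONE_MOLAR_MASS = 288.42


def nmol_per_l_to_ng_per_dl(nmol_per_l: float) -> float:
    """Convert a testosterone concentration from nmol/L to ng/dL."""
    ng_per_l = nmol_per_l * TESTOSTERONE_MOLAR_MASS  # 1 nmol weighs about 288.42 ng
    return ng_per_l / 10.0                           # 1 L = 10 dL


def ng_per_dl_to_nmol_per_l(ng_per_dl: float) -> float:
    """Convert a testosterone concentration from ng/dL to nmol/L."""
    return ng_per_dl * 10.0 / TESTOSTERONE_MOLAR_MASS


if __name__ == "__main__":
    for value in (0.17, 11.1):
        converted = nmol_per_l_to_ng_per_dl(value)
        print(f"{value} nmol/L is about {converted:.0f} ng/dL")
```

Under that assumption, 1 nmol/L corresponds to roughly 28.8 ng/dL, which is consistent with the rounded figures quoted in the text.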
William Rosner, MD, who co-chaired the consensus conference, has been a vocal advocate for improving the assays. “This [standardization effort] has implications for individual patients and for researchers. There’s a push on to treat women with testosterone, but we can’t know how much to give them or whether to give them any if we don’t have a way to accurately measure testosterone levels,” he said. “At the same time, researchers can’t compare their results to anyone else’s, and they can’t compare their own results to results they obtained 10 years ago if they changed the method they’re using to measure testosterone.” Rosner is a professor of medicine at Columbia University in New York City.
The consensus effort, spearheaded by The Endocrine Society and Centers for Disease Control and Prevention (CDC) and endorsed by AACC, is encouraging the expert scientific and medical communities, third-party payers, manufacturers, funding entities, journals, and other stakeholders to undertake specific efforts to aid in improving the accuracy of testosterone measurements. The group’s consensus statement has been submitted for publication in The Journal of Clinical Endocrinology & Metabolism.
New Uses, More Tests
Issues about the performance of testosterone assays have risen to the fore as more potential applications for the analyte have become apparent. Testosterone levels once were primarily used in the work-up of men suspected of hypogonadism and in gender assignment for newborns with ambiguous genitalia.
More recently, research—not all of it concordant—has suggested testosterone may have a protective benefit in osteoporosis, type 2 diabetes, cardiovascular disease, obesity, and depression in men. Androgen replacement therapy in women also has gained favor as a means of boosting libido, bone density, and muscle mass. In addition, there has been growing recognition of and better treatments for polycystic ovary disease, a disease of androgen excess and the most common hormonal disorder in women of reproductive age. Meanwhile, the epidemic of obesity in children has been associated with precocious puberty, thereby increasing the need to determine androgen status in these patients. All of these factors have pumped up the volume of testing, particularly in women and children, who had not been part of the traditional testing population.
Each method for measuring total testosterone has advantages and disadvantages, some of which are listed below.
Well-documented reference intervals in different populations
Large serum volumes can be used, increasing sensitivity
Labor intensive, costly, cumbersome, time consuming
Requires significant technical skills
Susceptible to matrix effects
Simple, rapid, and inexpensive to perform
Can be automated
Often overestimates testosterone concentration
Results, reference intervals method dependent
Limited accuracy at lower concentrations
Very accurate when properly validated
Throughput comparable with RIA after extraction and chromatography
Source: Adapted from J Clin Endocrinol Metab 2007;92:405–13
How Bad Are They?
Back in 1996, Robert Fitzgerald, PhD, and David Herold, MD, PhD, were among the first to document the problems with direct immunoassays for testosterone. In an analysis comparing their electron-capture, negative chemical ionization gas chromatography/mass spectrometry assay with the first fully automated nonradioactive testosterone immunoassay cleared by the Food and Drug Administration (FDA), the investigators found an excellent correlation in measuring testosterone in men, with r² = 0.98 (Clin Chem 1996;42:749–55). However, the story was quite different in women. “The R-squared was 0.31, which basically says there’s no correlation,” said Fitzgerald, a professor of pathology at University of California San Diego and associate director of clinical chemistry at VA San Diego Healthcare System. In their landmark study, he and Herold called for a reference method to standardize the assays.
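For readers less familiar with the statistic, r² here is the squared Pearson correlation between paired results from two methods. The sketch below is illustrative only: the function name is hypothetical and the paired values are invented, not data from the 1996 study. It shows how the same calculation can look very different at high and low concentrations.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two paired series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("each series must contain some variation")
    return (sxy * sxy) / (sxx * syy)


# Invented paired results in ng/dL: reference method vs. immunoassay.
reference_high = [320, 450, 510, 600, 720, 850]
immunoassay_high = [335, 440, 530, 590, 740, 830]

reference_low = [12, 18, 25, 31, 40, 55]
immunoassay_low = [48, 30, 75, 42, 38, 90]

print(f"high range r^2 = {r_squared(reference_high, immunoassay_high):.2f}")
print(f"low range  r^2 = {r_squared(reference_low, immunoassay_low):.2f}")
```

A value near 1 means one method tracks the other closely; a value well below 0.5 means most of the variation in one method's results is not explained by the other.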
Seven years later, French researchers compared the performance of 10 direct immunoassays (eight nonisotopic and two isotopic) against isotope-dilution gas chromatography-mass spectrometry (ID/GC-MS) and found that in the low female range, some assays overestimated testosterone levels by up to 500% (Clin Chem 2003;49:1381–95). Seven of the immunoassays overestimated concentrations in women and gave median concentrations in men that differed significantly from ID/GC-MS results. The researchers concluded that none of the immunoassays were reliable enough to investigate the low testosterone levels that would be expected in women and children.
In an accompanying editorial, Fitzgerald and Herold explained that they had used a random number generator to produce values close to the average female concentrations found by the French researchers (Clin Chem 2003;49:1250–1). “Although not intended to be a statistically rigorous proof that random numbers are better than measuring female testosterone values with immunoassays, guessing appears to be nearly as good as most… and clearly superior to some!” they commented. “Are assays that miss target values by 200–500% meaningful? Guessing would be more accurate and additionally could provide cheaper and faster testosterone results for females – without even having to draw the patient’s blood.” Other researchers have reported similar problems with direct immunoassays in measuring low concentrations of testosterone (J Clin Endocrinol Metab 2004;89:534–43).
In 2007, The Endocrine Society issued a position statement on the utility, limitations, and pitfalls in measuring testosterone and concluded that “the manner in which most assays for total testosterone are currently performed is decidedly unsatisfactory” (J Clin Endocrinol Metab 2007;92:405–13). Among other actions, the position statement called for accuracy-based rather than comparative proficiency testing for testosterone assays, and for labs to avoid using direct immunoassays with women and children. Although some progress has been made since publication of the position statement, “traction on this issue has been slow. We’ve been like Sisyphus,” Rosner said, referring to the king in Greek mythology who was punished in an endless cycle of forcing a huge boulder up a hill, seeing it roll back down, and having to start the process over again.
How Did We Get to This Point?
Rosner believes a big tent will be needed to achieve standardization of total testosterone assays because a complicated web of factors has contributed to their shortcomings. Medical journals, laboratories, regulators, lab testing and accrediting bodies, diagnostic manufacturers, and even payers all have had a hand in promulgating and perpetuating a lackluster assay, he contends.
Conventional radioimmunoassays (RIA) for measuring testosterone were developed in the early 1970s. These assays, which involve denaturing steroid-binding proteins to release testosterone followed by purification steps that remove numerous potentially interfering metabolites, are highly reliable when properly validated, according to experts. But they are cumbersome, time-consuming, costly, and, like other antibody assays, can have cross-reactivity issues, according to Frank Stanczyk, PhD, professor of research in obstetrics and gynecology and preventive medicine at the University of Southern California Keck School of Medicine in Los Angeles. “It’s at least a two-day assay. However, a well-validated RIA preceded by organic solvent extraction and chromatography steps in a good laboratory is typically a very accurate and precise assay for the majority of applications,” he said. “I want to emphasize that most of our knowledge about steroid hormones in endocrinology is based on these conventional RIAs.”
The attention to detail required for conventional RIAs was reflected in the medical literature of the time, according to Rosner. “When the immunoassay was invented, the methods section in published studies was one-and-a-half pages long because it was a method in the process of being invented and perfected. The journals dealt with those assays with enormous care,” he observed.
As time went on, direct chemiluminescent, enzymatic, and fluorescent immunoassays came on the scene and surpassed conventional RIAs in use. These direct immunoassays provide rapid and inexpensive results, but they have several serious drawbacks as well. Due to lack of specificity of the antibodies, they tend to overestimate testosterone measurements. They also are prone to matrix effects and have inadequate sensitivity to measure low levels of testosterone accurately and reliably.
Even with these shortcomings, the direct assays work reasonably well in the majority of patients who need a testosterone test—men. “They were really developed to measure testosterone in men, and for most men, you just needed to know whether it was high or low. Precision at the lower end was not that important when the assay was first developed,” explained Hershel Raff, PhD, director of the endocrinology research laboratory for ACL Laboratories and professor of medicine at the Medical College of Wisconsin in Milwaukee.
Perhaps because the direct immunoassays are acceptable in evaluating testosterone levels in most men, it was an easy leap to assume they would work well in women and children, too. As the assays went into the mainstream and became easier to perform, “everyone stopped paying attention to how the test was done and the whole methods section in any journal article was reduced to ‘testosterone was measured by immunoassay’,” Rosner explained.
He is not the only person who sees plenty of blame to go around. “In my opinion, we got to this point because of FDA’s 510(k) process itself,” said Fitzgerald. “It almost legislates lack of any improvement, because a new assay only has to be substantially equivalent to the predicate device. So you’re better off demonstrating substantial equivalence than to go through a bunch of hoops to show why your assay’s better and the predicate device is wrong.”
Proficiency testing programs have contributed to the problem as well, some experts contend. “The most widely used quality control program is CAP’s [College of American Pathologists] and they don’t have an accuracy standard; they have a comparative standard,” noted Rosner. “If my lab and your lab both use Company A’s platform to measure testosterone, the quality control is only based on the fact that the answers have to agree on that platform. They don’t have to agree with Company B’s or C’s machines.” That particular issue is changing, however, as CAP has launched a new accuracy-based proficiency testing program for steroid hormones.
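The difference between comparative and accuracy-based grading can be sketched in a few lines. Everything here is illustrative: the lab results, the reference value, the 20% acceptance limit, and the function name are assumptions made for the example, not CAP's actual criteria.

```python
def within_limit(result: float, target: float, limit_pct: float) -> bool:
    """True if result deviates from target by no more than limit_pct percent."""
    return abs(result - target) / target * 100.0 <= limit_pct


# Invented results (ng/dL) from four labs using the same hypothetical platform.
platform_a_results = [58.0, 61.0, 60.0, 63.0]
reference_value = 30.0  # invented value assigned by a reference method
LIMIT_PCT = 20.0        # invented acceptance limit

# Comparative grading uses the peer-group mean as the target.
peer_mean = sum(platform_a_results) / len(platform_a_results)

for result in platform_a_results:
    comparative = within_limit(result, peer_mean, LIMIT_PCT)
    accuracy_based = within_limit(result, reference_value, LIMIT_PCT)
    print(f"{result:5.1f} ng/dL  peer-graded pass: {comparative}  "
          f"accuracy-graded pass: {accuracy_based}")
```

In this made-up case every lab passes when graded against its peers yet fails when graded against the reference value, which is the gap Rosner describes.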
Rosner also lays blame at the feet of manufacturers who, he believes, have not done enough to improve their immunoassays, physicians and laboratorians who have continued to tolerate inaccurate results, and research funders who have not demanded testosterone assay standardization. “It’s a terrible, vicious cycle of badness,” he said.
Hope on the Horizon
If the prospects for improvement look bleak given all the factors involved, Rosner is quick to disagree. “We’ve taken some great steps forward and I’m finally getting optimistic,” he said. “Seven years ago, I couldn’t see the end in sight, but now I do.” Rosner’s buoyancy stems, at least in part, from notable strides towards standardization that CDC has taken since 2007. Among other things, the Division of Laboratory Sciences (DLS) has developed an accurate and precise reference method. The method uses a reference material from the Australian National Measurement Institute, the only one that’s currently available for testosterone, according to Hubert Vesper, PhD, chief of the protein biomarkers laboratory in CDC’s DLS. The National Institute of Standards and Technology (NIST) also has developed a serum-based product, which is considered a secondary reference material.
CDC took another important step earlier this year when it launched a standardization program for testosterone. The program is open to manufacturers and to commercial and reference labs, and it involves fresh-frozen, single-donor serum samples with reference values. “This program is similar to CDC’s cholesterol standardization program and the National Glycohemoglobin Standardization Program,” explained Vesper. “We provide materials that are value-assigned to the participating laboratories for calibration. Once they’ve adjusted the calibration, we assess the consistency of that calibration over time to make sure measurements are accurate and there are not trends or changes.” Ten participants have enrolled in the program, he indicated.
CAP also has stepped up to the plate by offering, starting earlier this year, a separate accuracy-based proficiency testing program for steroid hormones, including testosterone. “It’s voluntary and at present is not required for CLIA certification,” said John Eckfeldt, MD, PhD, Ellis Benson professor and vice chair for clinical affairs in the department of laboratory medicine and pathology at the University of Minnesota in Minneapolis. “Labs that are concerned about accuracy and want to pay the extra money and take the extra time to participate will do so.” Eckfeldt represented CAP at the testosterone standardization consensus conference.
The cost of CDC’s program—$9,000 in the first year and $6,000 subsequently—may make it unaffordable for many labs, contended Fitzgerald. “Even big commercial labs blink at that kind of expenditure, so there has to be a second-tier test that can be traced to that,” he said. Vesper emphasized that CDC doesn’t have funds to support standardization specifically, so the cost of the program is being financed by individual labs through the CDC Foundation. However, the agency hopes to find solutions that will enable smaller labs to benefit from the effort. CAP’s new accuracy-based initiative will cost a fraction of CDC’s program—about $450 annually—but is far less extensive than CDC’s, containing only four samples per annual mailing, according to Eckfeldt.
Getting Manufacturers on Board
As crucial as these efforts are, greater participation by manufacturers will be needed for testosterone immunoassays to leap forward in accuracy and precision at lower concentrations, according to experts. “Unless you get the manufacturers involved in improving the accuracy of their methods, you’re never going to take it very far in this country,” Eckfeldt said. “Clinical labs don’t have the time or wherewithal to go changing their calibration or improving an assay’s performance. They simply use the manufacturer’s reagents, instrument, and calibrator 99 percent of the time.”
The consensus statement calls for manufacturers to continue to work on new methodologies to ensure sensitive, specific, accurate, and cost-effective testosterone measurement, and Rosner also stressed that manufacturers have to be part of the solution. “We want their collaboration. If they can fix their assays and do it for a price that’s less than mass spec, we’d be 100 percent behind it,” he said.
Representatives from both Abbott Diagnostics and Beckman Coulter agreed that manufacturers have a crucial role in improving testosterone assays. “Clinical laboratories look to diagnostic manufacturers to provide reagents and instrumentation which lead to the best and most accurate results for medical diagnostics,” said Linda C. Rogers, PhD, DABCC, FACB, manager of market development for Beckman Coulter. In the end, “physicians need to be able to trust low concentration testosterone measurements. However, current research shows that immunoassays in the lower ranges are no better than random number generators,” added Stuart Blincko, PhD, principal research scientist at Abbott Diagnostics.
Rogers indicated that Beckman Coulter has conducted feasibility studies to evaluate the development of a highly sensitive testosterone assay, while Blincko stated that Abbott is developing a direct immunoassay that will measure both low and high testosterone concentrations. The latter will be designed to align with the method listed by the Joint Committee for Traceability in Laboratory Medicine. Abbott does not plan to pursue a direct pediatric claim for this new assay, according to Blincko.
Three immunoassay manufacturers are participating in CDC’s testosterone standardization program, according to Vesper. In addition, the agency worked with seven immunoassay manufacturers on a commutability study to assess NIST’s serum-based reference materials.
Mass Spectrometry and More
Major reference labs have invested in and use mass spectrometry to measure testosterone, particularly for women and children, and other labs, Fitzgerald’s included, have transitioned to this method. But the technology remains out of reach for many labs and has variability issues of its own. For instance, CDC conducted an inter-lab comparison study of mass spectrometry methods used to measure testosterone and found mean biases ranging from −14.1% to 19.2% at concentrations >100 ng/dL, and as high as 25.3% at levels <100 ng/dL (Steroids 2009;74:498–503).
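Mean bias figures like these are typically derived from paired results, with each measurement compared against a reference value. The following is a minimal sketch of that arithmetic, using invented numbers and hypothetical function names; it is not the procedure used in the CDC study.

```python
def percent_bias(measured: float, reference: float) -> float:
    """Signed difference from the reference value, as a percentage of it."""
    return (measured - reference) / reference * 100.0


def mean_percent_bias(pairs) -> float:
    """Average percent bias over (measured, reference) pairs."""
    biases = [percent_bias(measured, reference) for measured, reference in pairs]
    return sum(biases) / len(biases)


# Invented (measured, reference) pairs in ng/dL for one hypothetical lab.
lab_results = [(110.0, 100.0), (190.0, 200.0), (330.0, 300.0)]

print(f"mean bias = {mean_percent_bias(lab_results):+.1f}%")
```

By the same formula, a result of 180 ng/dL against a reference value of 30 ng/dL is a 500% overestimate. A negative mean bias indicates that a method reads low on average, and a positive one that it reads high.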
Until mass spectrometry is a mainstream method, Fitzgerald urged labs relying on immunoassays to consider sending out any samples from women and children for measurement. “All the major reference labs have it available by mass spec, and as far as send-out tests go, it’s not extremely expensive. It’s on par with most send-out tests,” he observed.
Vesper suggested that labs need to be very familiar with the performance of their testosterone assays at all relevant concentration ranges. “When in doubt, they should contact their assay manufacturer or re-evaluate the assay in-house,” he said.
The consensus statement goes so far as to call upon insurers to reimburse only for standardized, accuracy-based tests. “Why should a third-party payer pay for a test that’s useless?” asked Rosner. “Aren’t they better off paying 50 percent more for a test that gives you the truth rather than 50 percent less for one that you may as well throw in the toilet?” That particular recommendation gave CAP pause in endorsing the statement, although it supports testosterone standardization efforts. “We didn’t want to give payers any other reason not to pay. If a lab uses an FDA-cleared kit and a CLIA-certified method, why shouldn’t it be paid?” Eckfeldt explained. He indicated that CAP had other concerns as well. “The majority of serum testosterone assays in many labs are performed for males, and mandating potentially more expensive analytical techniques required for some indications that may not be necessary for others, could unnecessarily add to the costs of medical care,” Eckfeldt said.
The Endocrine Society is developing an ambitious timeline to achieve full standardization of testosterone assays. Although the process is fluid, the society expects to name a steering committee and relevant subcommittees this year and charge them with specific action steps, according to Loretta L. Doan, PhD, associate director of science policy.
In the meantime, labs need to keep on the lookout for updates involving this issue. Those still offering immunoassays for women and children should think carefully about this practice, according to Raff. “If you have a lab, like mine, that sends out the test for women and children to a reference lab that uses mass spectrometry, then you probably have nothing to worry about. But if you are performing the test in-house with a direct immunoassay, then you may have something to worry about,” he said. “If I’m providing a result that I’m not confident about, and I’m going to perform a test, then it’s my obligation to see that it’s done properly.”