Thyroid disease is common in the general population, and its prevalence increases with age. Because the signs and symptoms of the disease often resemble those of other disorders, physicians need to determine before initiating treatment whether the patient actually has thyroid disease or something else. The test most frequently ordered to assess thyroid function is thyrotropin, commonly referred to as thyroid-stimulating hormone (TSH). Based on the functional interrelationship of the hypothalamus, pituitary gland, and thyroid, TSH should be elevated if the thyroid gland is not producing adequate thyroid hormone, and suppressed if it is producing too much (Figure 1). Today, however, we are beginning to realize that this well-established paradigm for TSH synthesis and release is an oversimplification.
Not only does the pituitary secrete TSH in a diurnal pattern, but many substances produced in the central nervous system, even in healthy euthyroid individuals, may enhance or suppress TSH production in addition to the feedback effect of thyroid hormone. Furthermore, although TSH levels rise and fall in response to changes in the concentration of free thyroxine (T4), individuals appear to have their own set-points, and factors such as race and age also contribute to variability in TSH levels. Alterations of the normal pituitary response are also common in patients with a variety of illnesses.
Although laboratory measurement of serum TSH is an essential tool for diagnosing and managing various thyroid disorders, the laboratory medicine community has long recognized that immunoassays used to measure the hormone are yet another source of variability in patients’ results. These inter-assay discrepancies are now the focus of an international harmonization effort to improve the reliability of TSH results and their application in clinical guidelines (1). This article presents an overview of the issues and a discussion of how laboratories can assist clinicians in using TSH results to diagnose and manage thyroid disease.
Evolution of TSH Assay Sensitivity
The sensitivity of TSH assays has increased substantially over the last several decades. The first TSH tests were competitive immunoassays that used polyclonal antibodies; however, the analytical sensitivity of these assays was not sufficient to differentiate hyperthyroid patients with suppressed TSH from euthyroid individuals. Non-competitive immunoassays introduced in the 1980s achieved this level of sensitivity, and in the 1990s, manufacturers further improved the assays by using monoclonal antibodies.
Today, we refer to the performance of these so-called second- and third-generation TSH assays in terms of functional rather than analytical sensitivity, meaning that imprecision is assessed over a concentration range during an extended period of time. Functional sensitivity for TSH is defined as the level of TSH below which the imprecision of the assay exceeds a coefficient of variation (CV) of 20%. By convention, second- and third-generation assays detect TSH with this degree of reproducibility down to a level of approximately 0.1 and 0.01 mIU/L, respectively.
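To make the definition of functional sensitivity concrete, the following Python sketch estimates it from replicate measurements of low-TSH serum pools collected over an extended period. The pool names and values are hypothetical illustration data, and the rule used here (walking down the precision profile until the CV first exceeds 20%) is one simple reading of the definition above, not a validated laboratory protocol.

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation (SD / mean), expressed as a percentage."""
    return 100.0 * stdev(values) / mean(values)

def functional_sensitivity(replicates_by_pool, cv_limit=20.0):
    """Return the lowest pool mean (mIU/L) reached before the CV exceeds cv_limit.

    Pools are examined from the highest to the lowest mean concentration.
    Returns None if even the highest pool fails the CV limit.
    """
    profile = sorted(
        ((mean(v), cv_percent(v)) for v in replicates_by_pool.values()),
        reverse=True,
    )
    lowest_ok = None
    for level, cv in profile:
        if cv > cv_limit:
            break  # imprecision exceeds the limit below the previous level
        lowest_ok = level
    return lowest_ok

# Hypothetical between-run replicates (mIU/L) for four serum pools.
pools = {
    "pool_A": [0.004, 0.007, 0.003, 0.006, 0.005],
    "pool_B": [0.011, 0.009, 0.012, 0.008, 0.010],
    "pool_C": [0.10, 0.11, 0.09, 0.10, 0.10],
    "pool_D": [1.00, 1.03, 0.98, 1.01, 0.98],
}

for name, values in pools.items():
    print(f"{name}: mean={mean(values):.3f} mIU/L, CV={cv_percent(values):.1f}%")

print("Estimated functional sensitivity:", functional_sensitivity(pools))
```

In this made-up data set, the pool near 0.01 mIU/L still meets the 20% CV limit while the pool near 0.005 mIU/L does not, so the assay would be described as showing third-generation performance.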
The majority of currently available TSH immunoassays are capable of third-generation performance. However, daily precision in this low range may not be as robust as it appears when functional sensitivity is initially evaluated (2). Furthermore, most laboratories probably do not regularly monitor performance in this low range, because it would require the use of in-house pooled sera. In addition, few, if any, proficiency programs regularly challenge laboratories at the limit of third-generation performance.
The major advantage of using a third-generation TSH assay is that precision at a higher level, 0.1 mIU/L, is markedly improved compared to earlier generation assays. In fact, most clinical guidelines recommend 0.1 mIU/L as the cut-off for consideration of hyperthyroidism, followed by measuring patients’ total T4 and/or free T4. Third-generation assays also perform well in the range relevant for monitoring thyroid hormone replacement therapy after thyroid ablation.
Measuring TSH is also important when hypothyroidism is suspected, because the inverse logarithmic relationship between TSH and T4 means that TSH levels will rise long before either T4 or free T4 concentrations fall. The upper limit of euthyroidism with first-generation TSH assays was approximately 10 mIU/L, but with the introduction of second- and third-generation assays it fell to approximately 5 mIU/L. The most likely reason for this change was the reduced cross-reactivity afforded by the monoclonal antibodies used in the newer assays.
TSH Reference Interval
For many years, most physicians have considered TSH levels >10 mIU/L evidence of thyroid failure, and levels of 5–10 mIU/L evidence of mild or subclinical hypothyroidism. During the past decade, however, there has been considerable debate about the correct upper limit of the reference interval for TSH.
Although there is a consensus that the lower limit of the euthyroid reference interval for TSH should be 0.2–0.4 mIU/L, experts disagree about the appropriate upper limit. In 2002, researchers published an analysis of thyroid function test results from a large survey of individuals representative of the U.S. population (3). The study revealed that within a small standard error the mean TSH level in the general population is approximately 1.5 mIU/L. This finding prompted organizations to call for lowering the upper limit of the normal TSH reference range. The National Academy of Clinical Biochemistry recommended 4 mIU/L, while the American Association of Clinical Endocrinologists set the upper limit at 3 mIU/L, and other groups went as low as 2.5 mIU/L. Many clinicians resisted these new limits, because they worried that a significant number of patients would be unnecessarily labeled as having thyroid dysfunction, especially given the fact that there was no evidence that treatment of these individuals would provide any benefit.
In 2007, researchers analyzed the data from the survey a second time to clarify the relationship of TSH and antibodies to thyroid peroxidase (TPO), a recognized marker of autoimmune thyroid disease (4). TSH levels correlated with anti-TPO positivity, and the investigators asserted that reference interval studies would support the lower upper limit if such individuals, who probably have occult autoimmune hypothyroidism, were excluded. While some groups also have challenged this position, there is growing consensus that one TSH reference interval does not fit all.
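The effect of excluding anti-TPO-positive individuals on the upper reference limit can be illustrated with a small Python sketch. It rests on assumptions that are not taken from the studies cited above: the reference interval is computed as the central 95% of log-transformed TSH values (a common convention for skewed analytes), and the subject data are invented purely for illustration.

```python
from math import exp, log
from statistics import mean, stdev

def reference_interval(tsh_values, z=1.96):
    """Central 95% interval assuming log-normally distributed TSH (an assumption)."""
    logs = [log(v) for v in tsh_values]
    m, s = mean(logs), stdev(logs)
    return exp(m - z * s), exp(m + z * s)

# Hypothetical subjects: (TSH in mIU/L, anti-TPO positive?)
subjects = [
    (0.6, False), (0.9, False), (1.1, False), (1.3, False),
    (1.5, False), (1.8, False), (2.1, False), (2.6, False),
    (3.5, True), (4.8, True), (6.2, True),
]

all_values = [tsh for tsh, _ in subjects]
tpo_negative = [tsh for tsh, tpo_positive in subjects if not tpo_positive]

low_all, high_all = reference_interval(all_values)
low_neg, high_neg = reference_interval(tpo_negative)

print(f"All subjects:      {low_all:.2f} to {high_all:.2f} mIU/L")
print(f"Anti-TPO negative: {low_neg:.2f} to {high_neg:.2f} mIU/L")
```

With these invented values, removing the anti-TPO-positive subjects pulls the upper limit down, which is the direction of the effect the investigators described; the magnitude here has no clinical meaning.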
The Special Cases: Pregnant Women, Newborns
Diagnosing thyroid dysfunction in pregnant women has long been problematic. Pregnancy has a significant effect on thyroid function, which changes over the course of gestation and makes assessment more difficult. The recent guidelines for diagnosing and managing thyroid disease during pregnancy issued by the American Thyroid Association recommend trimester-specific reference intervals for TSH, as well as TSH targets for diagnosing and treating hypothyroidism during pregnancy.
Neonatal screening for congenital hypothyroidism is another special problem. Many programs have switched from an initial T4 screen with TSH confirmation of low T4 results to an initial TSH screen. Each screening program sets its own cut-offs, but this approach would benefit from standardization of TSH immunoassays.
In addition to inter-assay differences, evidence is now accumulating that there may be significant variability in the structure of patients’ TSH molecules. TSH is a glycoprotein hormone with a structure similar to that of other anterior pituitary glycoprotein hormones. This group of hormones consists of non-covalently linked heterodimers: an α-chain common to all and a unique β-chain that specifies the hormone’s biological activity. TSH has three sites where oligosaccharides may be attached, and these oligosaccharides can contribute up to 25% of the molecule’s mass. The pituitary actually releases a heterogeneous mixture of TSH glycoforms that consists of molecules with various side chains (Figure 2).
TSH isolated from normal pituitaries contains primarily branched glycans with sulfated acetylgalactosamine (GalNAc) residues; however, the sera of patients with hypothyroidism contain TSH with branched glycans composed of sialic acid attached to galactose residues. The liver recognizes glycoproteins with the GalNAc sulfate signal and removes them; therefore, sialylated TSH has a longer half-life. As a result, patients with hypothyroidism not only produce more TSH, but also TSH glycoforms that circulate longer in blood (5). Since laboratories use the same assay to measure TSH in all patients, it is important to know how various manufacturers’ immunoassays are affected by the presence of these different TSH glycoforms.
TSH Assays – The Status Quo
In recent years, laboratory professionals have increasingly recognized that TSH immunoassays do not perform consistently. Data from the U.K. National External Quality Assessment Service proficiency testing program revealed poor correlation between observed method bias and the reference intervals recommended in the package inserts from various manufacturers, which are essentially equivalent. These observations suggest that standardization of TSH assays would be helpful for improving patient care, particularly given the current discussions about lowering the upper decision limit for TSH in clinical practice guidelines.
All of these considerations contributed to the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) forming a Working Group for Standardization of Thyroid Function Tests (WG-STFT) in 2005. The group, of which both authors are members, is now a full-fledged committee whose goal is to investigate practical options for standardization of thyroid function tests such as TSH, total and free T4, and T3. We first screened 200 apparently healthy individuals and selected a panel of 40 individuals to use as single-serum donors. For TSH, nine manufacturers representing a total of 16 immunoassays agreed to participate in the project.
In this initial investigation of TSH assay variability, we found that currently available TSH assays are precise: within-run and total imprecision were 1.5–5.5% and 2.5–7.7%, respectively. Because no reference measurement procedure exists for TSH, we used the all-procedure trimmed mean as the target (6). With regard to comparability, the mean results of most of the assays agreed within 10% of this target value. Only three assays deviated by more than 10%, with two differing by almost 40% in opposite directions. Mathematical recalibration using the regression of each assay to the target value satisfactorily removed the individual assay biases, and the recalibrated results complied with the biological total error goal of 22.8%, except for one result in each of four assays (Figure 3). This outcome indicated that our approach, harmonization rather than standardization (see text below), would be possible. In fact, a subsequent study in which the manufacturers included their master calibrators and were asked to recalibrate their assays was similarly successful (7).
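The recalibration idea can be sketched in a few lines of Python. Everything specific in this example is an assumption made for illustration: the assay names and results are invented, one value is trimmed from each end when forming the all-procedure trimmed mean, and ordinary least squares stands in for whatever regression model was actually used in the study. Only the 22.8% total error goal comes from the text above.

```python
from statistics import mean

TOTAL_ERROR_GOAL = 22.8  # percent, biological total error goal cited in the text

def trimmed_mean(values, trim_each_side=1):
    """Mean after dropping the lowest and highest `trim_each_side` values."""
    ordered = sorted(values)
    return mean(ordered[trim_each_side:len(ordered) - trim_each_side])

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def recalibrate(assay_results, targets):
    """Invert the assay-versus-target regression to map results onto the target scale."""
    slope, intercept = fit_line(targets, assay_results)
    return [(r - intercept) / slope for r in assay_results]

def mean_percent_bias(values, targets):
    return mean(100.0 * (v - t) / t for v, t in zip(values, targets))

# Hypothetical results (mIU/L) for four single-donor sera measured by five assays.
results = {
    "assay_A": [0.50, 1.00, 2.00, 4.00],
    "assay_B": [0.525, 1.05, 2.10, 4.20],
    "assay_C": [0.475, 0.95, 1.90, 3.80],
    "assay_D": [0.70, 1.40, 2.80, 5.60],  # reads about 40% high
    "assay_E": [0.30, 0.60, 1.20, 2.40],  # reads about 40% low
}

n_samples = 4
targets = [trimmed_mean([r[i] for r in results.values()]) for i in range(n_samples)]

for name, values in results.items():
    before = mean_percent_bias(values, targets)
    after = mean_percent_bias(recalibrate(values, targets), targets)
    within_goal = abs(after) <= TOTAL_ERROR_GOAL
    print(f"{name}: bias before={before:+.1f}%, after={after:+.1f}%, within goal={within_goal}")
```

Because the trimmed mean discards the two outlying assays for each sample, the target is set by the assays that agree with one another, and the regression then pulls the outliers back onto that common scale.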
Although different TSH glycoforms did not appear to affect assay performance in these initial studies, all of the samples were from apparently healthy individuals. Only two had suppressed TSH and only six had elevated TSH. In contrast, in a more recent study conducted in 2012, we evaluated TSH assays using more than 90 samples from patients with thyroid disease, in whom TSH concentrations varied widely, from 0.04 to 80 mIU/L. While the results of this study have not yet been reported, we can reveal that this method comparison confirms that current TSH immunoassays are “glycosylation-blind” and that recalibration using the assay means significantly reduces individual assay bias.
TSH Assays – The Harmonized Future
According to the measurement paradigm, clinical laboratory results produced by assays claiming to measure the same measurand should be equivalent, regardless of where or how they are produced. This is especially important for patients who may visit different healthcare systems during the course of their treatment. Ideally, this would be accomplished by using calibrators traceable to a reference measurement procedure (RMP), which is a requirement of International Organization for Standardization (ISO) standard 17511 (8). However, the ISO standard recognizes that many measurands lack either an RMP or a commutable primary reference material for calibration, or both. When this is the case, the assay cannot be standardized, but it might be possible to harmonize it.
Although investigators have been trying to develop one, no primary RMP for TSH is available. Current TSH methods all claim “traceability” to the World Health Organization (WHO) TSH International Reference Preparation 80/558, but they do not all produce comparable results. This is probably because the WHO standard is not commutable, meaning that it does not produce the same numeric relationship within clinically meaningful limits for different measurement procedures. This characteristic can be attributed to many factors. For example, laboratorians are familiar with the term matrix effect, which refers to alteration of either the sample matrix or the measurand during production of a reference material that makes one or both of them react differently in different measurement procedures. It therefore follows that a non-commutable measurement standard cannot be used for calibration traceability (9).
The IFCC’s Committee for Standardization of Thyroid Function Tests (C-STFT) proposes harmonization of TSH using a new measurement standard: a panel of native materials with values assigned by the all-procedure trimmed mean as the surrogate RMP (Figure 4). Full traceability to the existing WHO standard would be preserved by transferring the unit of the WHO international standard to the first panel of native sera by way of the all-procedure trimmed mean. However, from then on, the unit for immunoassays would be sequentially transferred to follow-up panels by measurement in overlap, rather than to the next WHO standard, which would become a secondary calibrator. We also would like to emphasize that this harmonization approach will only be successful if certain medical, physiological, and analytical circumstances outlined elsewhere are fulfilled (10).
Clearly, this harmonization effort requires several important components: a stable infrastructure for sourcing of clinical samples; effective communication to all stakeholders; and approval by regulatory authorities such as the U.S. Food and Drug Administration and the European Commission. The role of the manufacturers in this process is also critically important, and the C-STFT gratefully acknowledges their commitment to this project, not only for doing the measurements, but also for funding the samples.
The following recommendations can be formulated with regard to TSH testing: 1) until assay harmonization is implemented, assay-specific decision limits should be used in clinical practice guidelines and clinical studies; 2) third-generation assays should be clearly identified; 3) journal editors should only accept studies that identify manufacturers’ assays and interpret study results using assay-specific cut-offs; and 4) proficiency testing/external quality assessment programs should use native samples from single donations collected according to the Clinical and Laboratory Standards Institute C37-A protocol, but without filtration.
References
1. Miller WG, Myers GL, Gantzer ML, et al. Roadmap for harmonization of clinical laboratory measurement procedures. Clin Chem 2011;57:1108–17.
2. Reix N, Massart C, Gasser F, et al. Should functional sensitivity of a new thyroid stimulating hormone immunoassay be monitored routinely? Clin Biochem 2012;45:1260–2.
3. Hollowell JG, Staehling NW, Flanders WD, et al. Serum TSH, T4, and thyroid antibodies in the United States population (1988 to 1994): National Health and Nutrition Examination Survey (NHANES III). J Clin Endocrinol Metab 2002;87:489–99.
4. Spencer CA, Hollowell JG, Kazarosyan M, et al. National Health and Nutrition Examination Survey III thyroid-stimulating hormone (TSH)-thyroperoxidase antibody relationships demonstrate that TSH upper reference limits may be skewed by occult thyroid dysfunction. J Clin Endocrinol Metab 2007;92:4236–40.
5. Donadio S, Pascual A, Thijssen JHH, et al. Feasibility study of new calibrators for thyroid-stimulating hormone (TSH) immunoprocedures based on remodeling of recombinant TSH to mimic glycoforms circulating in patients with thyroid disorders. Clin Chem 2006;52:286–97.
6. Thienpont LM, Van Uytfanghe K, Beastall G, et al. Report of the IFCC Working Group for Standardization of Thyroid Function Tests, part 1: Thyroid-stimulating hormone. Clin Chem 2010;56:902–11.
7. Thienpont LM, Van Uytfanghe K, Van Houcke S. Standardization activities in the field of thyroid function tests: A status report. Clin Chem Lab Med 2010;48:1577–83.
8. International Organization for Standardization (ISO). In vitro diagnostic medical devices—measurement of quantities in samples of biological origin—metrological traceability of values assigned to calibrators and control materials. ISO 17511:2003. Geneva, Switzerland: ISO; 2003.
9. Miller WG, Myers GL, Rej R. Why commutability matters. Clin Chem 2006;52:533–4.
10. Thienpont LM, Van Houcke SK. Traceability to a common standard for protein measurements by immunoassay for in-vitro diagnostic purposes. Clin Chim Acta 2010;411:2058–61.
Disclosure: The authors have nothing to disclose.
Acknowledgements: The authors are indebted to these manufacturers: Abbott, Beckman Coulter, bioMérieux, DiaSorin, Ortho-Clinical Diagnostics, Roche Diagnostics GmbH, Siemens Healthcare Diagnostics, and TOSOH Corporation. We also acknowledge the other members of the IFCC C-STFT: Michael Rottmann, Frank Quinn, Barnali Das, and Finlay MacKenzie.