Reference Interval Corner
Pediatric Reference Intervals and the Puberty Factor

Jon Nakamoto, M.D., Ph.D.
Laboratory Medical Director, Quest Diagnostics Nichols Institute, San Juan Capistrano, CA 
Associate Professor (Voluntary) of Pediatrics and Endocrinology, University of California, San Diego 

Painfully obvious fact of the month: Hormones and many other analytes change substantially during puberty. But the age at which puberty starts can vary by up to five years in healthy children, approximately between 8 (perhaps younger in some populations) and 13 years of age in girls and between 9 and 14 years of age in boys. Therefore, a 9-year-old "early-blooming" boy might have testosterone levels similar to those found in an otherwise healthy "late-blooming" 13-year-old adolescent. Is it therefore valuable to offer reference intervals adjusted for pubertal stage, rather than only offering age-stratified information? Absolutely, yes. Does everyone agree on how to generate these puberty-adjusted norms, and is it straightforward to obtain them? Absolutely not. Read on for a brief introduction to what, like adolescence itself, is a surprisingly complex topic.

First of all, a brief review of how puberty is classified. James Tanner, a British pediatrician, was the first to standardize how clinical examination of breasts in girls, genitals in boys, and pubic hair in both sexes could be used to define specific stages of sexual maturation. Despite his protests that efforts to do this had started well before his time, the five-point scale he defined (classically written in Roman numerals, although Arabic numerals are acceptable nowadays) became known as the Tanner stage scoring system and remains the de facto standard for clinical assessment of puberty. There are reasonably objective criteria for breast, genital and pubic hair development; for example, the transition from Tanner II to Tanner III pubic hair development is defined by when hairs become darker, coarser, and meet in the midline of the symphysis pubis. Summaries of Tanner staging (with pictures or diagrams) can be found in pediatric textbooks or online (1, 2).

Although Tanner stages have proven their utility in clinical practice, staging is not perfectly precise, and its many ambiguities and variations must be acknowledged. Training is required, as inter-rater reliability (IRR) can be quite low among untrained individuals. Even among trained individuals, certain stages are extremely difficult to distinguish (e.g., male genital stage II versus stage III). Many specialists find male genital staging so subjective that they prefer to use testicular volume or length to evaluate sexual maturity in boys, even though testicular size was not part of the original Tanner staging system. Tanner staging of breast development can also be difficult. Although the original Tanner staging was based solely on inspection, in overweight girls the examiner must actually palpate the breast in search of glandular tissue (true pubertal development) to distinguish it from increased breast size due to fat accumulation (not true puberty). The transition from Tanner II to Tanner III breast development can be difficult to define, and there may even be confusion between Tanner III development in a large-breasted individual and Tanner V in a young woman with smaller breasts, although a breast papilla (nipple) diameter > 1 cm can help define Tanner V breast development. Some women may always retain the secondary mound of the nipple and areola that defines Tanner IV, and may therefore never officially reach Tanner V.

An additional issue to consider is that breast/genital development (driven by pituitary-gonadal activation) may not always follow the same timing as pubic hair development (driven by both gonadal and adrenal activation). It is not uncommon for a girl to have Tanner III pubic hair development but only Tanner II breast development – so is her pubertal stage Tanner II or Tanner III? Ideally she should be defined simply as Tanner “B II, PH III”, but this adds substantial complexity to developing reference intervals stratified by Tanner stage. A related issue is that levels of some hormones correlate better with the stage of breast development, while others are more closely related to the pubic hair Tanner stage. Thus, it may ultimately be wise for the endocrine and clinical chemistry communities to agree that Tanner stage-adjusted reference intervals for estradiol must be based specifically on girls defined by breast development, while those for DHEA-sulfate should be based on children stratified primarily by pubic hair stage. At present, few if any laboratories make this distinction.
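One way to picture the bookkeeping this implies is to record the two Tanner components separately and let each analyte name the component it is stratified by. The Python sketch below is only an illustration under stated assumptions: the class, field, and key names are hypothetical, the "G" prefix for boys is an assumed convention, and the two analyte rules are simply the ones suggested above, not an established laboratory standard.

```python
from dataclasses import dataclass

ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV", 5: "V"}

# Which Tanner component to stratify by, per analyte. The two rules follow the
# suggestion in the text; the dictionary keys are hypothetical names.
STRATIFY_BY = {
    "estradiol": "breast_or_genital",
    "dhea_sulfate": "pubic_hair",
}


@dataclass(frozen=True)
class TannerStaging:
    """Separate Tanner components for one child, each a stage from 1 to 5."""
    breast_or_genital: int  # breast stage in girls, genital stage in boys
    pubic_hair: int

    def __post_init__(self):
        # Reject anything outside the five-point scale.
        for name in ("breast_or_genital", "pubic_hair"):
            if getattr(self, name) not in ROMAN:
                raise ValueError(f"{name} must be a Tanner stage from 1 to 5")

    def label(self, sex: str) -> str:
        """Return a combined label such as 'B II, PH III' (sex is 'F' or 'M')."""
        prefix = "B" if sex == "F" else "G"  # 'G' for boys is an assumed convention
        return f"{prefix} {ROMAN[self.breast_or_genital]}, PH {ROMAN[self.pubic_hair]}"


def stage_for_analyte(staging: TannerStaging, analyte: str) -> int:
    """Pick the Tanner component used to stratify the given analyte.

    Raises KeyError for analytes with no agreed rule rather than guessing.
    """
    return getattr(staging, STRATIFY_BY[analyte])


girl = TannerStaging(breast_or_genital=2, pubic_hair=3)
print(girl.label("F"))                          # combined label
print(stage_for_analyte(girl, "estradiol"))     # breast stage
print(stage_for_analyte(girl, "dhea_sulfate"))  # pubic hair stage
```

Keeping the components separate avoids forcing a single "Tanner stage" on a child whose breast and pubic hair development differ, at the cost of needing an explicit rule for every analyte.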

A further difficulty with Tanner stage stratification lies in the lack of complete homogeneity within a given stage, particularly within the important prepubertal (Tanner I) group. An infant, a toddler, a pre-schooler, and a 7-year-old may all be Tanner I by breast/genital and pubic hair development, and yet have very different hormonal profiles. It is clear, for example, that adrenal activation (adrenarche) and rising DHEA-sulfate start before age 6 years (4), at least 3-4 years before the first appearance of pubic hair. Therefore, a study that included children aged 1-7 years in the Tanner I category would have much lower average DHEA-sulfate levels than a study that only used Tanner I children aged 5-7 years. Currently there is no standardization in how to define the study population for Tanner I children.

Beyond these methodological issues, there are often practical barriers to generating Tanner stage-adjusted reference intervals. It can be difficult to obtain sufficient numbers of accurately staged and truly representative subjects. The laboratory must have access to the subjects or at least to the clinical information, including detailed staging information (breast/genital development versus pubic hair, or whether testicular volume was used in place of male genital development). As noted above, those doing the staging must be properly trained. Institutional review boards and parents alike are often reluctant to allow breast palpation or external genital examinations of otherwise healthy children, but Tanner staging by inspection alone has inherent limitations (particularly for assessment of breast development), and studies of children assessing their own breast or pubic hair development demonstrate unacceptably inaccurate scoring relative to that done by trained observers (3). Given known ethnic differences in the timing of puberty and typical hormonal levels, it is desirable but not always achievable to draw from a population with diverse demographics. All of these barriers tend to make the already relatively small sample sizes in pediatric reference interval studies even smaller for Tanner stage-adjusted reference interval studies. Since there is still rather large inter-individual variability of hormonal levels within a given Tanner stage, small datasets lead to increased variability due to sampling error, and even the smallest amount of data contamination (inaccurate Tanner staging; children with cryptic endocrinologic abnormalities) can skew the data analysis significantly.
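The effect of sample size on sampling error can be illustrated with a small simulation. The Python sketch below is a rough illustration, not a recommended study design: it assumes a purely synthetic log-normal "analyte" (no real hormone data), takes the reference interval as the central 95% of values (2.5th to 97.5th percentiles), and uses arbitrary sample sizes of 20 and 120. It repeats a simulated study many times and reports how much the estimated upper reference limit moves from study to study.

```python
import random
import statistics


def reference_interval(values):
    """Nonparametric central 95% interval: (2.5th, 97.5th) percentiles."""
    # n=40 yields 39 cut points at 2.5% steps; the first and last are the limits.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def upper_limit_spread(sample_size, n_studies=300, seed=0):
    """Standard deviation of the estimated upper limit across simulated studies.

    Each study draws `sample_size` values from a synthetic log-normal
    population (an assumption made only for illustration).
    """
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    upper_limits = []
    for _ in range(n_studies):
        sample = [rng.lognormvariate(0.0, 0.5) for _ in range(sample_size)]
        upper_limits.append(reference_interval(sample)[1])
    return statistics.stdev(upper_limits)


for n in (20, 120):
    print(n, round(upper_limit_spread(n), 3))
```

Under these assumptions the smaller studies give a noticeably less stable upper limit, which is the sampling-error problem described above. Mis-staged or abnormal subjects would add bias on top of this noise and are not modeled here.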

In spite of the myriad difficulties noted, having both puberty-adjusted and age-adjusted reference intervals for selected analytes is extremely useful for the specialist. During the months or years that it can take to conduct the studies needed to obtain reference intervals by Tanner stage, interim alternatives are available. First, because variation in the timing of puberty is common, groups of children stratified by age tend to include both early- and late-bloomers, and the reference intervals generated tend to be wide enough to include most children regardless of differences in pubertal stage, much as commonly used cross-sectional growth charts are still useful despite differences in the timing of growth spurts by age. Second, an approximate adjustment for early- or late-bloomers can be made by looking at the reference intervals for older or younger age groups. This crude workaround performs surprisingly well and is preferable to using Tanner stage-adjusted data from an improperly designed or inadequately sized reference interval study.

In summary: it’s clinically useful for certain hormones and other analytes to have reference intervals stratified by Tanner stage, but doing the studies requires a lot of careful thought and planning. And like much that is valuable in life, they’re certainly not easy!


  1. en.wikipedia.org/wiki/Tanner_scale
  2. brightfutures.aap.org/pdfs/preventive%20services%20pdfs/physical%20examination.pdf
  3. Bonat S, et al. Self-assessment of pubertal stage in overweight children. Pediatrics 2002; 110:743-7.
  4. Palmert MR, et al. The longitudinal study of adrenal maturation during gonadal suppression: evidence that adrenarche is a gradual process. J Clin Endocrinol Metab 2001; 86:4536-42.