Can Data Mining Produce a Healthy Reference Population for Establishing Age-specific TSH Reference Intervals?

The method endorsed by the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) and the Clinical and Laboratory Standards Institute (CLSI) for establishing age- and sex-specific reference intervals requires results from at least 120 healthy individuals for each age range and sex category. These numbers are often difficult to achieve by traditional direct sampling (for example, from healthy volunteers), but data analysis from such a reference population is generally straightforward. In contrast, indirect sampling of a laboratory database readily generates large numbers of results but usually requires data transformation and statistical methods to isolate the healthy subpopulation. Those methods are needed because most applications of indirect sampling do not clearly define the health characteristics of the reference population, a practice that neither the IFCC nor the CLSI endorses (1). Our study used indirect, a posteriori sampling of the laboratory database combined with detailed demographic and clinical information from the electronic medical record (EMR) databases to select a healthy reference population (2).

As a starting point, we generated a report of all TSH results reported during two randomly selected weeks. The EMRs of patients with results in the report were queried for thyroid disease, thyroid medications, and other criteria relevant to ensuring a euthyroid reference population. After these exclusions were applied, a reference population of 22,444 results from presumably healthy individuals remained from a total testing volume of 50,305. Data from pediatric patients tested in additional weeks were then added and similar exclusions were applied, resulting in a final reference population of 33,038 individuals.
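The exclusion step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`has_thyroid_diagnosis`, `on_thyroid_medication`) and the function names are hypothetical stand-ins for whatever an EMR query would return, and the study's full set of exclusion criteria is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TshRecord:
    """One TSH result joined to EMR flags (all field names are hypothetical)."""
    patient_id: str
    age_years: float
    tsh: float
    has_thyroid_diagnosis: bool   # assumed to come from EMR diagnosis codes
    on_thyroid_medication: bool   # assumed to come from EMR prescription data


def is_presumed_euthyroid(record: TshRecord) -> bool:
    """Return True when none of the illustrative exclusion flags is set."""
    return not (record.has_thyroid_diagnosis or record.on_thyroid_medication)


def select_reference_population(records):
    """Keep only results from patients who pass every exclusion check."""
    return [r for r in records if is_presumed_euthyroid(r)]
```

In practice each additional exclusion criterion would become another flag checked in `is_presumed_euthyroid`, so the definition of "healthy" stays explicit and auditable in one place.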

The large number of results allowed us to establish TSH reference intervals from the 2.5th and 97.5th percentiles for seven age brackets. The intervals for all but the youngest and oldest brackets were based on TSH results from over 3000 individuals. We were also able to compare data from the four most prevalent races in our racially diverse population, noting lower 97.5th percentile values in the black/African American cohort. The a posteriori-derived, age-specific TSH reference intervals were verified with a direct sampling study. Our data suggest that the upper limit of normal increases with age: the 97.5th percentiles were 4.00, 4.37, 4.84, and 5.31 for age brackets 18-49, 50-64, 65-79, and > 80 years, respectively (2), and none of the 90% confidence intervals around these upper limits overlapped. This increase in TSH in later life has been seen in other studies, and guidelines recommend treating elderly patients to higher TSH goals, yet most laboratories offer a single TSH reference interval for all adults. Offering age-specific TSH reference intervals could prevent inappropriate diagnoses of subclinical hypothyroidism in older patients.
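As a rough illustration of how percentile-based intervals can be computed per age bracket, the sketch below groups results by the adult brackets named above and takes the 2.5th and 97.5th percentiles with Python's standard `statistics.quantiles`. Several details are assumptions rather than the study's actual procedure: the function and constant names are hypothetical, age 80 is placed in the oldest bracket, the pediatric brackets are omitted because they are not listed here, and confidence intervals for the limits are not computed. Brackets with fewer than 120 results are skipped, following the IFCC/CLSI minimum.

```python
from statistics import quantiles

MIN_PER_BRACKET = 120  # IFCC/CLSI minimum number of reference individuals

# (label, inclusive lower age, exclusive upper age); boundaries are assumptions.
ADULT_BRACKETS = [
    ("18-49", 18, 50),
    ("50-64", 50, 65),
    ("65-79", 65, 80),
    ("80+", 80, float("inf")),  # assumption: age 80 belongs to the oldest bracket
]


def bracket_for(age_years):
    """Return the bracket label for an age, or None outside the adult brackets."""
    for label, low, high in ADULT_BRACKETS:
        if low <= age_years < high:
            return label
    return None


def reference_intervals(results):
    """Map bracket label -> (2.5th, 97.5th percentile) from (age, tsh) pairs."""
    grouped = {}
    for age_years, tsh in results:
        label = bracket_for(age_years)
        if label is not None:
            grouped.setdefault(label, []).append(tsh)

    intervals = {}
    for label, values in grouped.items():
        if len(values) < MIN_PER_BRACKET:
            continue  # too few results to report an interval for this bracket
        # n=40 yields 39 cut points spaced 2.5% apart; the first and last
        # are the 2.5th and 97.5th percentiles.
        cuts = quantiles(values, n=40)
        intervals[label] = (cuts[0], cuts[-1])
    return intervals
```

The default `method="exclusive"` in `statistics.quantiles` places each cut point at rank p(n + 1) with linear interpolation, which is one common nonparametric convention; a laboratory would want to confirm that it matches the estimator specified in its own procedure.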

Data mining makes it feasible to generate the large numbers of results required to establish age-specific reference intervals based on data from a laboratory’s own population. The a posteriori method has the added benefit of being able to use clinical exclusions based on diagnosis codes, laboratory data, and prescribed medications, all found in the EMR, to more precisely define a healthy reference population. Because the reference population is defined this way, interpretation of the data is also simpler: the central 95% of results (the 2.5th to 97.5th percentiles) can be used directly, without sophisticated statistics.

References:

1. Horowitz GL, et al. Defining, establishing, and verifying reference intervals in the clinical laboratory; Approved guideline—third edition. C28-A3c 2010; 28:30.

2. Drees JC, et al. Reference intervals generated by electronic medical record data mining with clinical exclusions: age-specific intervals for thyroid-stimulating hormone from 33038 euthyroid patients. JALM 2018;3:231-239.