
Performance Characteristics of Six Intact Parathyroid Hormone Assays

Sonia L. La’ulu, William L. Roberts MD, PhD
DOI: http://dx.doi.org/10.1309/AJCPLGCZR7IPVHA7. Pages 930-938. First published online: 1 December 2010


The aim of this study was to evaluate the performance characteristics of 6 intact parathyroid hormone assays: Access 2 (Beckman Coulter, Fullerton, CA), ARCHITECT i2000SR (Abbott Diagnostics, Abbott Park, IL), ADVIA Centaur (Siemens Healthcare Diagnostics, Deerfield, IL), Modular E170 (Roche Diagnostics, Indianapolis, IN), IMMULITE 2000 (Siemens Healthcare Diagnostics), and LIAISON (DiaSorin, Stillwater, MN). Sample collection tubes and storage conditions were compared. Imprecision studies were performed using commercial quality control materials. Linearity was assessed using pools prepared from samples. For method comparison, serum and EDTA plasma samples were tested by all methods, and the ARCHITECT was used as the comparison method. Reference intervals were determined using various vitamin D cutoffs. The types of collection tubes and storage conditions are more important for some methods than others. Total coefficients of variation were 10.9% or less. The maximum deviation from the target recovery for linearity ranged from 5.0% to 82.2%. Bland-Altman plots demonstrated percentage biases ranging from −36.3% to 24.4%. The lower limit of the reference interval was not influenced by vitamin D status, whereas the upper reference limit was affected.

Key Words:
  • Parathyroid hormone
  • Method comparison
  • Automated immunoassay

Parathyroid hormone (PTH) influences calcium and phosphorus homeostasis directly through its actions on bone and kidney and indirectly on the intestine through vitamin D.1 Determination of PTH is useful in the differential diagnosis of hypercalcemia and hypocalcemia, for assessing parathyroid function in renal failure, and for evaluating parathyroid function in bone and mineral disorders.1 With the introduction of reliable and specific assays for PTH, the diagnosis of parathyroid dysfunction has become much easier.2 However, numerous studies have demonstrated the lack of comparability among PTH assays.3–6 In addition, preanalytic variables and intermethod differences have the potential to adversely impact clinical decision making.3,5–7 This variability can be influenced by a variety of conditions such as the assay used, the population evaluated, vitamin D status, and numerous preanalytic conditions.5,6,8–11 PTH is extensively metabolized. Intact PTH assays measure not only PTH (1–84) but other fragments, including PTH (7–84), which may accumulate in patients with renal insufficiency.12 In the present study, we evaluated the performance characteristics that potentially have a role in intermethod variability of PTH assays, ie, sample collection tubes, sample storage conditions, imprecision, comparison of methods with 2 sample types, renal function, and reference intervals with consideration of vitamin D status.

Materials and Methods

Preanalytic comparisons, imprecision, linearity, method comparison, and reference intervals were evaluated on 6 intact PTH assays: Access 2 (Beckman Coulter, Fullerton, CA), ARCHITECT i2000SR (Abbott Diagnostics, Abbott Park, IL), ADVIA Centaur (Siemens Healthcare Diagnostics, Deerfield, IL), Modular E170 (Roche Diagnostics, Indianapolis, IN), IMMULITE 2000 (Siemens Healthcare Diagnostics), and LIAISON (DiaSorin, Stillwater, MN). All assays were performed according to manufacturers’ instructions at ARUP Laboratories, Salt Lake City, UT. All studies using human samples were approved by the University of Utah Institutional Review Board, Salt Lake City.

A study comparing collection tubes and sample stabilities was conducted with 15 apparently healthy participants each having samples drawn into 5 types of Vacutainer tubes (Becton Dickinson, Franklin Lakes, NJ) in the following order: red top, serum separator tube (SST), plasma separator tube (PST), green top (lithium heparin), and EDTA. Samples were allowed to stand at room temperature for 30 minutes so serum tubes could clot, followed by centrifugation for 10 minutes at 2,095g. An aliquot from each tube was tested by all methods after being subjected to one of the following conditions: tested immediately (fresh), frozen immediately and tested after a 3-day storage at −70°C (frozen), and tested after 24 hours and 48 hours refrigerated (2°C–8°C). Results from samples that were frozen or refrigerated were compared with those from fresh samples by using a paired t test. Also, to evaluate the impact of tube type on results, results for all collection tubes were compared with results for fresh samples collected in red-top tubes.

Imprecision studies were performed using 3 concentrations of commercially available quality control materials, Bio-Rad Liquichek Specialty Immunoassay Control (Bio-Rad Laboratories, Irvine, CA), according to the manufacturer’s instructions. For every run, a bottle of each level of control material was retrieved from −20°C storage, thawed, and tested in duplicate by all methods. Imprecision runs were performed twice a day, for 5 days, with a minimum of 2 hours separating each run.

Linearity was assessed by diluting a high patient serum pool, with an analyte concentration near the upper end of the analytical measurement range, with a low patient serum pool to yield concentrations of 100%, 90%, 80%, 70%, 60%, 50%, 40%, 30%, 20%, and 10% of the original high pool. All samples were tested in duplicate.

Method comparison was evaluated by testing 203 serum and 193 EDTA plasma samples by all methods. The ARCHITECT was arbitrarily chosen as the comparison method. To verify the accuracy of the comparison method, 1st International Reference Preparation PTH, human, NIBSC code, 79/500 (National Institute for Biological Standards and Control, Potters Bar, England) was prepared and tested.13 One ampule (100 ng) was dissolved with 2 mL of ARCHITECT Intact PTH Calibrator A (0 ng/L; Abbott Diagnostics) to yield 50,000 ng/L. This stock solution was serially diluted with ARCHITECT Calibrator A to yield concentrations of 4,000 to 15.6 ng/L. Diluted samples were frozen and stored at −20°C. A few randomly selected dilutions were thawed and tested in duplicate at the beginning and end of a method comparison sample run. The mean of 4 replicates was calculated and compared with the assigned concentration after correction for dilution. Method comparison samples were collected from −20°C storage after completion of clinical testing. Specimens were thawed in batches, mixed thoroughly, centrifuged for 5 minutes at 2,095g, checked for clots, and analyzed once by all methods on the same day. Samples were also tested for calcium and creatinine (kinetic Jaffé method) with a Modular P analyzer (Roche Diagnostics). The estimated glomerular filtration rate (eGFR) was calculated based on the Modification of Diet in Renal Disease equation: $\mathrm{GFR}\ (\mathrm{mL/min/1.73\ m^2}) = 175 \times (\mathrm{Scr})^{-1.154} \times (\mathrm{Age})^{-0.203} \times (0.742\ \text{if female})$ (conventional units). No correction was made for ethnicity because the ethnicity of the subjects from whom these samples were obtained was not known. We were not permitted to obtain clinical information on these subjects.
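The eGFR calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study; the function name `egfr_mdrd` and its parameter names are hypothetical. It assumes serum creatinine in mg/dL and age in years (conventional units) and, as stated in the text, applies no ethnicity correction.

```python
def egfr_mdrd(serum_creatinine_mg_dl: float, age_years: float, female: bool) -> float:
    """Estimated GFR (mL/min/1.73 m^2) from the MDRD equation quoted in the text.

    Assumes conventional units: creatinine in mg/dL, age in years.
    No ethnicity factor is applied, matching the approach described.
    """
    if serum_creatinine_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    egfr = 175.0 * serum_creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742  # sex factor from the equation
    return egfr


if __name__ == "__main__":
    # Hypothetical subject, for illustration only.
    value = egfr_mdrd(1.0, 50, female=True)
    print(f"eGFR = {value:.1f} mL/min/1.73 m^2")
    print("below 30 cutoff" if value < 30 else "at or above 30 cutoff")
```

A result from this function could then be compared with the cutoffs of 30 and 60 mL/min/1.73 m² used elsewhere in the study.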

To determine reference intervals, serum samples were obtained from 130 apparently healthy adult subjects in the summer (July–August; 77 women; 53 men; 20–64 years old; median age, 30 years), and a separate set of 130 samples was collected in the winter (February; 72 women; 58 men; 19–65 years old; median age, 31 years). Volunteer participants were not taking prescription medications. Before testing, specimens were stored at −70°C. Specimens were thawed, mixed thoroughly, centrifuged for 5 minutes at 2,095g, checked for clots, and analyzed once by all methods on the same day. Samples were tested for 25-hydroxyvitamin D (25OHD) by the LIAISON analyzer and for calcium, creatinine, and eGFR as detailed in the preceding sections. Only samples with an eGFR of 60 or more and that were normocalcemic were used for reference interval determinations.

EP Evaluator Release 8 software (David G. Rhoads Associates, Kennett Square, PA) was used to calculate imprecision, linearity, and nonparametric reference intervals. Passing-Bablok, linear regression, and difference plots were generated using Analyse-It, version 2.10 (Analyse-It Software, Leeds, England). Analysis of differences observed among sample stabilities, collection tubes, and reference intervals was calculated using S-PLUS, version 8.0 (Insightful, Palo Alto, CA).

Results


To evaluate the impact of sample storage conditions on PTH results, samples from healthy volunteers were frozen for 3 days or refrigerated for 24 and 48 hours. PTH results from the different storage conditions were compared with the results from the samples tested immediately after the blood draw (fresh samples). This analysis was performed on all 6 automated immunoassay analyzers to determine if there were differences based on method. Analysis of sample storage conditions showed that the majority (69/90 [77%]) of the frozen or refrigerated sample sets differed significantly (P < .05, paired t test) from the fresh samples across the immunoassay platforms tested. However, when a previously defined desirable analytic error limit of 12.5% was used to evaluate whether the differences were clinically significant, only 6 individual manufacturer/tube type/storage condition combinations exhibited a bias of 12.5% or more: the Access and Centaur for SST at 48 hours of refrigeration, the IMMULITE for PST at 48 hours of refrigeration and frozen, and the Centaur and LIAISON for green-top tube, frozen.14 No clinically significant difference was observed with any immunoassay method for storage at 24 hours of refrigeration vs fresh with any of the tube types.
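The distinction between statistical and clinical significance drawn above can be illustrated with a short Python sketch. It computes a paired t statistic and a mean percentage bias for stored vs fresh results and compares the bias with the 12.5% limit cited in the text. The function names are hypothetical, and the bias formula (difference of means relative to the fresh mean) is an assumption, because the text does not state exactly how bias was calculated. The P value is omitted because it requires a t distribution, which is outside the Python standard library.

```python
import math
from statistics import mean, stdev

DESIRABLE_BIAS_LIMIT_PCT = 12.5  # desirable analytic error limit cited in the text


def paired_t_statistic(fresh, stored):
    """t statistic for paired differences (stored minus fresh)."""
    if len(fresh) != len(stored) or len(fresh) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [s - f for f, s in zip(fresh, stored)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def mean_percent_bias(fresh, stored):
    """Assumed definition: 100 * (mean stored - mean fresh) / mean fresh."""
    return 100.0 * (mean(stored) - mean(fresh)) / mean(fresh)


if __name__ == "__main__":
    # Hypothetical PTH results (pg/mL) for 3 participants, illustration only.
    fresh = [40.0, 50.0, 60.0]
    stored = [44.0, 55.0, 66.0]
    t = paired_t_statistic(fresh, stored)
    bias = mean_percent_bias(fresh, stored)
    exceeds = abs(bias) >= DESIRABLE_BIAS_LIMIT_PCT
    print(f"t = {t:.2f}, bias = {bias:.1f}%, exceeds limit: {exceeds}")
```

In this made-up example the paired differences are very consistent, so the t statistic is large, yet the bias stays under the 12.5% limit, which mirrors the pattern reported for most storage conditions.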

To evaluate the impact of tube type on the PTH result obtained from the 6 immunoassays, the results for the fresh samples from each tube type were compared with the results obtained with the plain serum red-top tube. Evaluation of changes among the different collection tubes revealed 4 method/tube type combinations that differed significantly from the red-top tube: PST, green top, and EDTA for the IMMULITE and EDTA for the E170.

Quality control materials were used to evaluate imprecision of the PTH immunoassays. Total coefficients of variation (CVs) ranged from 1.6% to 10.9% for level 1 (28.8–85.0 pg/mL [28.8–85.0 ng/L]), 1.5% to 8.1% for level 2 (188.2–451.4 pg/mL [188.2–451.4 ng/L]), and 1.1% to 5.5% for level 3 (594.5–1,472.1 pg/mL [594.5–1,472.1 ng/L]) (Table 1). The E170 always had the lowest imprecision, and the LIAISON had the highest imprecision.
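For readers unfamiliar with the CV, the following Python sketch shows the basic calculation for one control level on one method. It is a simplification: it pools all replicates and reports the SD as a percentage of the mean, whereas the total CVs in this study were produced by EP Evaluator, whose exact computation is not described in the text. The function name `percent_cv` and the sample values are hypothetical.

```python
from statistics import mean, stdev


def percent_cv(results):
    """Coefficient of variation (%) over all pooled replicates.

    Simplified stand-in for a total CV; it does not separate within-run
    and between-run variance components.
    """
    if len(results) < 2:
        raise ValueError("need at least 2 results")
    m = mean(results)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * stdev(results) / m


if __name__ == "__main__":
    # Hypothetical duplicate results (pg/mL) from a few runs, illustration only.
    level_1 = [30.1, 29.4, 31.0, 30.6, 28.9, 30.2, 29.8, 30.9]
    print(f"CV = {percent_cv(level_1):.1f}%")
```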

The target value for each linearity sample was calculated based on the samples with the lowest and highest concentrations within the analytic measurement range for each method. The maximum deviation from target recovery ranged from 5.0% for the Centaur to 82.2% for the LIAISON (Table 2). A linear regression plot of the results for the LIAISON is shown in Figure 1.
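One way to read the target calculation above is as a straight line drawn between the measured results of the lowest and highest dilutions, with every intermediate dilution compared against that line. The Python sketch below follows that reading. It is an assumption about the procedure, not a reproduction of the EP Evaluator algorithm, and the function names and sample values are hypothetical.

```python
def linearity_targets(fractions, measured):
    """Target for each dilution, interpolated between the lowest and highest points.

    fractions: proportion of the high pool in each sample (for example 0.1 to 1.0)
    measured:  mean measured concentration for each sample
    """
    if len(fractions) != len(measured) or len(fractions) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    lo = min(range(len(fractions)), key=lambda i: fractions[i])
    hi = max(range(len(fractions)), key=lambda i: fractions[i])
    if fractions[hi] == fractions[lo]:
        raise ValueError("dilution fractions must differ")
    slope = (measured[hi] - measured[lo]) / (fractions[hi] - fractions[lo])
    return [measured[lo] + slope * (f - fractions[lo]) for f in fractions]


def max_percent_deviation(fractions, measured):
    """Largest absolute deviation (%) of a measured result from its target."""
    targets = linearity_targets(fractions, measured)
    return max(abs(100.0 * (m - t) / t) for m, t in zip(measured, targets))


if __name__ == "__main__":
    # Hypothetical dilution series (ng/L), illustration only.
    fractions = [0.1, 0.5, 1.0]
    measured = [100.0, 550.0, 1000.0]
    print(f"max deviation = {max_percent_deviation(fractions, measured):.1f}%")
```

Because the two anchor points define the line, their own deviations are zero by construction; only the intermediate dilutions contribute to the maximum.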

Table 1
Table 2

NIBSC material 79/500 was obtained and tested in various dilutions to assess the accuracy of the ARCHITECT, which was chosen as the comparison method. Recoveries for diluted World Health Organization reference materials were 103.0%, 96.2%, 97.4%, 101.5%, 94.9%, 100.0%, 98.8%, and 97.1% for 15.63, 31.25, 62.5, 125, 250, 500, 1,000, and 2,000 ng/L, respectively, with all recoveries being within ±5.1% of the target values. Method comparison samples were evaluated by Passing-Bablok regression based on renal insufficiency, using an eGFR of 30 mL/min/1.73 m² (Table 3). Method comparison was also evaluated by Bland-Altman difference plots (Figure 2). The mean percentage differences were −36.3%, −19.4%, −18.4%, 24.4%, and −8.8% for EDTA plasma samples and −25.5%, −30.2%, −18.5%, 1.6%, and 3.5% for serum samples on the Access, Centaur, E170, IMMULITE, and LIAISON, respectively. Testing of statistical outliers was repeated, and an average of the original and repeated results was used in the final method comparison analysis. In 1 case, the ARCHITECT had a repeatedly high EDTA plasma outlier that was removed from method comparison analysis but is shown in Figure 2 as “x.”
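The Bland-Altman statistics reported above can be sketched as follows. This Python example assumes that each percentage difference is the difference between the test and comparison methods divided by their mean, and that the limits are the mean ± 1.96 SD; the text does not state which denominator was used by Analyse-It, so both choices are assumptions. The function name and sample values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman_percent(comparison, test):
    """Return (mean % difference, lower limit, upper limit).

    Assumed definitions:
      % difference = 100 * (test - comparison) / mean(test, comparison)
      limits       = mean % difference +/- 1.96 * SD
    """
    if len(comparison) != len(test) or len(test) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    pct = [100.0 * (t - c) / ((t + c) / 2.0) for c, t in zip(comparison, test)]
    m = mean(pct)
    sd = stdev(pct)
    return m, m - 1.96 * sd, m + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical paired PTH results (ng/L), illustration only.
    architect = [35.0, 60.0, 120.0, 300.0, 800.0]
    other = [28.0, 50.0, 95.0, 250.0, 640.0]
    m, lower, upper = bland_altman_percent(architect, other)
    print(f"mean = {m:.1f}%, limits = {lower:.1f}% to {upper:.1f}%")
```

A negative mean percentage difference in this convention means the test method reads lower than the comparison method, as was reported for the Access, Centaur, and E170.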

Figure 1

Dilution linearity of the LIAISON parathyroid hormone method. Linear regression is shown by solid lines. The dashed lines represent ideality (x = y). Parathyroid hormone levels are given in Système International units; to convert to conventional units (pg/mL), divide by 1.0.

Table 3

In addition to testing PTH on all 6 immunoassay methods for analysis of nonparametric reference intervals, we also measured 25OHD, calcium, creatinine, and eGFR. Two subjects were excluded owing to abnormal calcium results, and samples from 2 subjects were excluded because the eGFR was less than 60 mL/min/1.73 m². Analysis of serum samples showed no statistically significant difference between samples drawn in the summer vs winter (P = .418). Therefore, both seasons were combined and used to calculate the serum reference intervals (Table 4). Reference interval results were further analyzed based on 25OHD results using cutoffs of 10, 20, and 30 ng/mL (25, 50, and 75 nmol/L). For all samples analyzed, the lower limit of the reference interval ranged from 11.0 to 21.8 pg/mL (11.0–21.8 ng/L) and the upper limit ranged from 70.6 to 123.4 pg/mL (70.6–123.4 ng/L). As the 25OHD cutoff increased, the lower reference limits for PTH showed no significant change; however, the upper reference limits trended lower. In all cases, except for the LIAISON, the upper reference limit for a 25OHD cutoff of 30 ng/mL (75 nmol/L) was significantly lower than the upper reference limit when all samples were included.
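The effect of a 25OHD cutoff on a nonparametric reference interval can be illustrated with the Python sketch below. It takes the central 95% interval as the 2.5th and 97.5th percentiles, located by the fractional rank p × (n + 1) with linear interpolation. That rank rule is a common convention and is assumed here; the exact EP Evaluator procedure is not described in the text. Function names and data are hypothetical.

```python
import math


def rank_percentile(sorted_values, p):
    """Percentile at fractional rank p * (n + 1), linearly interpolated."""
    n = len(sorted_values)
    if n == 0:
        raise ValueError("no values")
    rank = min(max(p * (n + 1), 1.0), float(n))  # clamp to valid 1-based ranks
    lower_rank = int(math.floor(rank))
    if lower_rank >= n:
        return sorted_values[-1]
    fraction = rank - lower_rank
    below = sorted_values[lower_rank - 1]
    above = sorted_values[lower_rank]
    return below + fraction * (above - below)


def reference_interval(subjects, vitd_cutoff=None):
    """Central 95% PTH interval for subjects with 25OHD above the cutoff.

    subjects: iterable of (pth, vitd_25oh) pairs
    vitd_cutoff: keep only subjects with 25OHD greater than this value;
                 None keeps everyone.
    """
    pth = sorted(p for p, d in subjects if vitd_cutoff is None or d > vitd_cutoff)
    return rank_percentile(pth, 0.025), rank_percentile(pth, 0.975)


if __name__ == "__main__":
    # Hypothetical cohort: the highest PTH values belong to low-25OHD subjects.
    cohort = [(float(i), 5.0 if i > 39 else 25.0) for i in range(1, 51)]
    print("all subjects:", reference_interval(cohort))
    print("25OHD > 20:  ", reference_interval(cohort, vitd_cutoff=20.0))
```

In this constructed cohort, excluding the low-25OHD subjects lowers the upper limit while leaving the lower limit essentially unchanged, which is the same direction of effect described in the text.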

Figure 2

Bland-Altman plots showing the percentage difference between each of the 5 parathyroid hormone assays and the ARCHITECT as the comparison method on the y-axis. An ideal mean percentage difference of 0 is indicated by a dotted line. The mean percentage difference is indicated by a dark, solid line. The limits of agreement for the mean percentage difference, as defined by 95% confidence limits, are indicated by dashed lines. A–E, EDTA plasma samples. F–J, Serum samples. A, The Access gave a mean percentage difference of −36.3% with 95% confidence limits of −66.5% and −6.2%. B, The Centaur gave a mean percentage difference of −19.4% with 95% confidence limits of −45.0% and 6.2%. C, The E170 gave a mean percentage difference of −18.4% with 95% confidence limits of −39.2% and 2.3%. D, The IMMULITE gave a mean percentage difference of 24.4% with 95% confidence limits of −22.0% and 70.7%. E, The LIAISON gave a mean percentage difference of −8.8% with 95% confidence limits of −50.6% and 33.0%. F, The Access gave a mean percentage difference of −25.5% with 95% confidence limits of −60.6% and 9.6%. G, The Centaur gave a mean percentage difference of −30.2% with 95% confidence limits of −91.9% and 31.5%. H, The E170 gave a mean percentage difference of −18.5% with 95% confidence limits of −48.1% and 11.2%. I, The IMMULITE gave a mean percentage difference of 1.6% with 95% confidence limits of −45.1% and 48.3%. J, The LIAISON gave a mean percentage difference of 3.5% with 95% confidence limits of −62.2% and 69.3%. In 1 case, the ARCHITECT had a repeatedly high EDTA plasma outlier that was removed from method comparison analysis but is shown as “x.”

Discussion


Collection tube types and sample storage conditions are more important for some methods than others. Even though the majority of sample stabilities evaluated showed statistically significant differences from frozen or refrigerated storage compared with fresh, only 6 instances were clinically significant and had a bias that was greater than the desirable variance for PTH.14 Red top and EDTA showed no clinically significant differences between analysis immediately after collection and analysis after frozen or refrigerated storage. SST samples only had a clinically significant difference for immediate analysis compared with 48 hours of refrigeration. No sample type for any method had a clinically significant difference after 24 hours of refrigerated storage compared with immediate analysis. Therefore, refrigerated storage is acceptable if samples will be analyzed within 24 hours.

Table 4

When comparing different collection tubes, we observed a statistically significant difference between serum and EDTA plasma for the IMMULITE method, in which results for EDTA plasma were significantly higher than those for serum. The E170 also showed a statistically significant difference between EDTA plasma and serum (red top). However, other Roche Elecsys methods have previously shown no significant difference between SST and EDTA for fresh samples, although that study used a different Roche platform (Elecsys 1010) and a different type of serum tube (SST instead of red top).8 Previously, significant differences were observed for triiodothyronine when comparing SST with glass or plastic collection tubes.15 In our study, the presence of a gel separator had no significant effect on PTH results for serum or heparin plasma.

While the majority of studies that have evaluated different collection tubes and their effect on PTH results have compared EDTA plasma with SST and typically used only 1 method, our study involved a comprehensive analysis of 5 collection tubes in 6 methods.5,8–11,16–19 The majority of previous studies performed the comparison on patients undergoing hemodialysis or with other disease, whereas our study used apparently healthy adults. English et al16 observed that their renal cohort produced similar results to those for normal subjects for EDTA plasma when using the Centaur, showing no significant decrease in PTH concentrations. In addition, they found that EDTA had a positive bias with respect to SST for renal patients and healthy volunteers.16 We were not able to assess the effects of sample type in hemodialysis patients.

Desirable performance for imprecision based on the analytic CV was previously described as 12.7% and optimal performance as 6.3%.14 All methods met the analytic quality goals of desirable performance with total CVs of 10.9% or less. However, 3 methods did not demonstrate optimal performance at 1 or more control levels, for a total of 4 method/level combinations: Centaur (7.4%), IMMULITE (6.4%), and LIAISON (10.9%) for level 1 and LIAISON (8.1%) for level 2. The variability of mean results observed among methods may be due to a matrix effect of the quality control material.

All methods showed acceptable dilution linearity, except for the LIAISON in which the maximum deviation from target recovery was 82.2%. In an effort to resolve this unsatisfactory finding, we repeated testing using freshly prepared samples and initially replicated the original conditions, then used a different lot of reagent, followed by testing on a different instrument. No improvement was observed. We also verified the dilutions by testing on the E170 in parallel, in which maximum deviations from target recoveries were less than 7%. Conversely, favorable results for dilution linearity and recovery have been described previously.20 Verification of linearity for the LIAISON on new lots of reagent by individual laboratories would seem to be justified.

Overall, there was good correlation for all methods based on correlation coefficients; however, there was considerable bias between methods. While others have also observed good correlation, lack of comparability and intermethod variability have also been reported.3,5,6,21 Unlike results seen by Joly et al,5 we did not see large differences between the LIAISON and ARCHITECT methods. Rather, the LIAISON and the ARCHITECT demonstrated good agreement, with the LIAISON showing the least amount of bias (−8.8%) compared with the ARCHITECT for EDTA plasma samples and only a 3.5% bias for serum samples. The greatest bias observed was between the Access and ARCHITECT for EDTA plasma and serum, with differences being most prominent for EDTA plasma. In addition, we observed a bias between EDTA plasma and serum for the IMMULITE, which has been confirmed by others.5,9,10,18 Consequently, IMMULITE users should exercise caution when comparing results based on these 2 different sample types.

When separating the method comparison samples at an eGFR of 30 mL/min/1.73 m², we did not observe large differences except for serum samples on the LIAISON. Passing-Bablok regression slopes for serum samples with an eGFR less than 30 did not fall within the 95% confidence intervals of serum samples with an eGFR of 30 or more and vice versa. One explanation is that PTH fragments accumulate in patients with renal failure, and the LIAISON is possibly susceptible to measuring these fragments of PTH. Since there were no differences observed for the other methods compared with the ARCHITECT, these methods all exhibit cross-reactivity with PTH fragments similar to that of the ARCHITECT method.

Because PTH may be increased in patients with vitamin D insufficiency and may decrease when vitamin D–insufficient people are given vitamin D, a suggestion has been made that people with vitamin D insufficiency should be excluded from a reference population for PTH.21,22 Furthermore, because vitamin D insufficiency is usually clinically silent, it has been recommended that 25OHD be measured beforehand and people with a concentration below the threshold defining vitamin D insufficiency be excluded from the reference population.21 We measured 25OHD and examined the differences in reference intervals based on the 25OHD cutoff used. The lower limit of the PTH reference interval was not affected by vitamin D status for any method. However, the upper limit seems to be influenced by the 25OHD cutoff used, except for the LIAISON. In most cases, the PTH upper reference limit for subjects with 25OHD results greater than 30 ng/mL (75 nmol/L) does not fall within the confidence limits of the calculated reference interval for all healthy subjects or subjects with 25OHD greater than 10 ng/mL (25 nmol/L). Although the threshold used to define vitamin D insufficiency remains controversial, some have suggested using a 25OHD cutoff of 20 ng/mL (50 nmol/L).21 At this cutoff, most methods had a PTH upper reference limit that overlapped with the 90% confidence interval for the other 25OHD cutoffs.

There are many factors that potentially influence variability for PTH measurements, including the method used, PTH source (synthetic vs endogenous), population evaluated, vitamin D status, preanalytic variables such as sample matrix, and storage time and temperature. We assessed some of these factors that can produce variability and confirmed that there is variability between PTH assays. Therefore, we agree that standardization efforts are warranted and assay-specific decision limits are required.3,5,6,21


We gratefully acknowledge Andrew Wilson for assistance with statistical analysis.


  • Supported by Abbott Diagnostics and the ARUP Institute for Clinical and Experimental Pathology. Abbott Diagnostics, Beckman Coulter, and Roche Diagnostics provided instrumentation to perform testing using their methods.

