
A Systematic Review and Meta-analysis of the Diagnostic Accuracy of Ultrasound-Guided Core Needle Biopsy for Salivary Gland Lesions

Robert L. Schmidt MD, PhD, MMed, MBA, Brian J. Hall MD, Lester J. Layfield MD
DOI: http://dx.doi.org/10.1309/AJCP5LTQ4RVOQAIT 516-526 First published online: 1 October 2011


Core needle biopsy (CNB) of salivary gland lesions is a relatively new technique that may offer benefits for diagnosis of the lesions. We conducted a systematic literature review to identify studies published between January 1, 1985, and March 15, 2011. Summary estimates of sensitivity and specificity were obtained by using a summary receiver-operating characteristic (SROC) curve. Study quality was assessed by using the QUADAS survey. We identified 5 studies (277 cases) for inclusion. The area under the SROC for CNB was 1.00 (95% confidence interval [CI], 0.99–1.00). Based on histologically verified cases, the sensitivity of CNB is 0.92 (95% CI, 0.77–0.98) and the specificity is 1.00 (95% CI, 0.76–1.00). We conclude that CNB has high accuracy and a low (1.2%) inadequacy rate. CNB is more accurate than fine-needle aspiration, at least in some settings, but the best selection of which test to use for an individual patient and setting remains to be defined.

Key Words:
  • Salivary gland
  • Core biopsy
  • Sensitivity and specificity
  • Meta-analysis
  • Systematic review

The diagnosis of a salivary gland mass is often a diagnostic challenge. Accurate presurgical diagnosis is important because most nonneoplastic lesions and some neoplastic lesions do not require surgery. Also, presurgical diagnosis can guide surgical planning.1,2

Fine-needle aspiration cytology (FNAC) is a well-accepted and widely used technique for the preliminary diagnosis of salivary gland masses. FNAC is safe, fast, and well-tolerated; however, it is known to have several deficiencies. The adequacy rate of FNAC is dependent on the availability of a cytopathologist for immediate specimen assessment.3-5 Thus, best practice is limited to specialized centers with a cytopathologist. On average, FNAC has high specificity (98%) but lower sensitivity (80%).6 Thus, a positive diagnosis by FNAC is reliable, but the false-negative rate associated with FNAC may be unacceptable. There is considerable variability in FNAC accuracy by practice setting, and it is difficult to assess whether a given site provides clinically acceptable accuracy.6 Finally, some diagnoses (eg, low-grade lymphoma vs reactive nodal hyperplasia) cannot be reliably made on the basis of cytology without ancillary studies such as flow cytometry.

Core needle biopsy (CNB) is a relatively new technique for the diagnosis of salivary gland masses that offers several potential advantages relative to FNAC. Because CNB obtains a larger sample, the inadequacy rate of CNB is likely to be lower than that of FNAC, which may make it possible to obtain high-quality samples in settings where a cytopathologist is not available to assess specimen adequacy. CNB preserves histologic architecture, which can improve the accuracy of diagnosis and, in addition, allow some diagnoses that are not possible on the basis of cytology alone. For example, it is sometimes possible to see capsular invasion and thereby distinguish adenoid cystic carcinoma from some forms of monomorphic adenoma by CNB.7 The separation of some forms of monomorphic adenoma from adenoid cystic carcinoma is extremely difficult by cytologic examination. Finally, unlike FNAC, immunohistochemical techniques are more likely to be reliable with core needle biopsy specimens. In addition, core needle biopsy specimens are formalin-fixed and paraffin-embedded so that immunohistochemical controls more closely simulate this material than they do smear or cytocentrifuged preparations. CNB also has several disadvantages relative to FNAC. CNB requires at least local anesthesia, is more painful than FNAC, is often frightening to patients, and, if inappropriately performed, can have greater morbidity.

The potential advantages and disadvantages of CNB raise questions about the relative roles of CNB and FNAC in the diagnosis of salivary gland lesions. Should CNB be reserved for special circumstances, for example, following an inadequate or inconclusive FNA biopsy or does it have sufficient accuracy for more widespread use? The answers to these questions require an accurate comparison of the diagnostic performance of CNB and FNAC.

Systematic reviews are the foundation of evidence-based medicine and provide the basis for the development of guidelines for patient management. Several studies on the diagnostic performance of CNB have been recently published; however, the results of these studies have never been summarized by meta-analysis. Many of these studies had relatively small samples, so it is difficult to assess the diagnostic accuracy of CNB. Our objective was to obtain improved estimates of the diagnostic accuracy of CNB to provide insight into the potential role of CNB in patient management. To that end, we conducted a comprehensive systematic review of the literature and used meta-analytic methods to develop a summary ROC (SROC) curve for the diagnostic performance of CNB in the evaluation of salivary gland tumors. We also conducted a quality assessment of included articles to explore potential sources of bias and to provide recommendations to improve future studies.

Materials and Methods

We followed current guidelines for the systematic review and meta-analysis of diagnostic studies.8,9

Literature Search

We searched MEDLINE, Embase, and the bibliographies of retrieved articles for studies evaluating the diagnostic accuracy of FNAC or CNB for salivary gland lesions published between January 1, 1985, and March 15, 2011, using a sensitive search strategy developed in consultation with an experienced medical reference librarian. Language was not restricted. Scopus was used to perform a “forward search” to obtain articles citing the set of retrieved articles. Our search strategy was broad and included articles on the diagnostic accuracy of FNAC, frozen section, or CNB for salivary gland lesions or head and neck lesions.


Titles and abstracts were evaluated independently by 2 authors (R.L.S. and B.J.H.) for eligibility. Studies were eligible if they seemed to contain accuracy data for the diagnosis of salivary gland tumors or head and neck tumors. Prospective and retrospective studies were eligible. Full reports were obtained for all eligible articles.

Inclusion Criteria

Eligible studies were independently evaluated by 2 authors (R.L.S. and B.J.H.), and discrepancies were resolved by consensus. Studies were included if they contained extractable data on salivary gland lesions, contained histologic verification of all cases, and provided data that enabled lesions to be classified into broad categories (malignant vs benign or neoplastic vs nonneoplastic). We excluded case reports and studies with fewer than 5 cases. Eligible studies were included if accuracy data could be extracted in the form required for analysis (true-positive, false-positive, false-negative, and true-negative). In cases with overlapping data sets, we included only the most recent comprehensive data.

Data Extraction

Data extraction was completed independently by 2 authors (R.L.S. and B.J.H.), and discrepancies were resolved by consensus or by correspondence with study authors. Inadequate biopsies were not counted in the calculation of accuracy.

Quality Assessment

Quality assessment of articles was conducted by using the QUADAS tool.10,11 Assessment was completed independently by 2 authors (R.L.S. and B.J.H.) using a scoring form, and discrepancies were resolved by consensus.

Statistical Analysis

SROC curves were constructed by using the hierarchical method.12,13 Computations were done by using Stata 11 (StataCorp, College Station, TX) and the metandi procedure for SROC curve analysis.14 Statistical significance was tested at an α level of .05 (ie, P ≤ .05) by comparison of confidence intervals. Heterogeneity was tested using the I² statistic15,16 and the Cochran Q statistic using the log-rank test. The study by Yamashita et al17 was not included in the statistical analysis because the software requires that the total negative cases (false-positive + true-negative) and positive cases (false-negative + true-positive) be greater than zero in each study. This exclusion would have little effect on the overall accuracy estimates because the study sample was small (n = 6). Also, the study had perfect accuracy, so the estimates obtained without it slightly underestimate the accuracy.
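The requirement that each study contribute at least one reference-positive and one reference-negative case can be illustrated with a small sketch. The class name and the counts below are hypothetical and are not taken from any included study; the sketch only shows why a study lacking either group cannot yield both a sensitivity and a specificity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one study: index test (CNB) vs reference standard."""
    tp: int  # CNB positive, reference positive
    fp: int  # CNB positive, reference negative
    fn: int  # CNB negative, reference positive
    tn: int  # CNB negative, reference negative

    @property
    def positives(self) -> int:
        # Reference-positive cases: false-negative + true-positive
        return self.fn + self.tp

    @property
    def negatives(self) -> int:
        # Reference-negative cases: false-positive + true-negative
        return self.fp + self.tn

    def is_estimable(self) -> bool:
        # Both denominators must be nonzero for the study to be usable
        return self.positives > 0 and self.negatives > 0

    def sensitivity(self) -> float:
        return self.tp / self.positives

    def specificity(self) -> float:
        return self.tn / self.negatives


# Hypothetical counts, for illustration only
all_benign = TwoByTwo(tp=0, fp=0, fn=0, tn=6)   # no reference-positive cases
mixed = TwoByTwo(tp=11, fp=0, fn=1, tn=30)

print(all_benign.is_estimable())  # sensitivity is undefined for this study
print(mixed.is_estimable(), mixed.sensitivity(), mixed.specificity())
```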


Results

Literature Search

We screened 3,848 titles and abstracts to obtain a set of 22 eligible articles. The reports of the eligible studies were screened to obtain 5 studies that met our inclusion criteria. These studies included a total of 277 histologically verified cases. In addition, these studies contained results for 126 nonsurgical cases that were verified by clinical follow-up. The numbers of cases for each study are listed in Table 1.1,7,17-19 In 1 instance, the results from several previous studies20-22 were included in a larger follow-up study,7 and we included only the larger study. We were unable to extract data from the study by Pratap et al.23

Table 1
Table 2

Study Characteristics

The parameters describing the CNBs are given in Table 2.1,7,17-19,23 All of the biopsies were performed by radiologists and typically used 2 to 4 passes of a 16- to 18-gauge needle. None of the studies contained matched samples for comparison of CNB with FNAC, although 1 study7 contained at least 16 samples that were referred for CNB owing to inadequate FNAC. The use of immunohistochemical analysis on CNB specimens was not specifically mentioned with the exception of the report by Taki et al,1 which indicated that immunohistochemical analysis was used as appropriate.

Diagnostic Accuracy (Malignant vs Benign)

The study results for histologically verified cases are shown in Table 3.1,7,17-19,23 The SROC curve for a diagnosis of malignancy based on histologically verified cases is shown in Figure 1, and the accuracy estimates are given in Table 4. On average, CNB had 100% specificity (95% confidence interval [CI], 76%–100%) and 92% sensitivity (95% CI, 77%–98%). There was more variability in sensitivity than in specificity (Figure 2). A test for study heterogeneity was not significant (log-rank Cochran Q = 0.10; df = 2; P = .5). The study results for all cases (histologically verified plus cases with clinical follow-up) are given in Table 5.1,7,17-19,23 We computed the accuracy statistics by using the data from only histologically verified cases and compared these results with the accuracy statistics using all cases (ie, histologically confirmed and clinically confirmed cases). There was no significant difference between the summary estimates obtained from either group for a diagnosis of malignancy (Table 4).

Table 3
Figure 1

Hierarchical summary receiver operating characteristic (HSROC) curve for included studies. The circles represent individual studies. The circle diameter is proportional to the weight given to each study.

Table 4
Figure 2

Forest plot of diagnostic performance statistics. Boxes denote point estimates of sensitivity and specificity. Lines denote 95% confidence intervals. Diamonds represent 95% confidence intervals for the combined estimate from all studies.

Diagnostic Accuracy (Neoplastic vs Nonneoplastic)

The study results for a diagnosis of neoplasia vs nonneoplasia are given in Table 6.1,7,17-19,23 The accuracy statistics are summarized in Table 7, in which the accuracy of the diagnosis of malignancy is compared with the accuracy of the diagnosis of neoplasia. There was no significant difference in accuracy (Δ area under the SROC = 0) between these diagnoses.

Diagnostic Accuracy for Histologic Diagnosis

Several studies reported accuracy results for specific histologic diagnoses (Table 8).1,17-19,21,23 The weighted average of the accuracy of histologic diagnoses was 95%.

Inadequacy Rate

The number of inadequate CNB specimens was 5 (1.2%) of 403 cases. The inadequacy rate for FNAC is estimated at 8.1%.6 The difference in inadequacy rate between FNAC and CNB was 0.069 (95% CI, 0.042–0.096) and was statistically significant (z > 5; P < .001).


Complications

Several studies reported small hematomas, none of which required treatment. No other complications were reported. The overall rate of hematomas was 7 (1.7%) of 403 cases.

Table 5
Table 6
Table 7
Table 8

Quality Assessment

A summary of the QUADAS quality assessment is given in Table 9. All studies were retrospective. In most studies, patients were selected on the basis of receiving CNB (item 1). It is unclear whether the patients referred for CNB were representative of patients undergoing evaluation for salivary gland swelling because the selection method was not specified (item 2). The error rate of the reference standard is unknown (item 3). No studies specifically mentioned the time between CNB and histologic evaluation (item 4). Although this could lead to timing bias, we believe the interval between CNB and histologic evaluation was probably short. All cases were evaluated by a reference standard (item 5), but studies with nonsurgical cases used both histologic confirmation and clinical follow-up as reference standards (item 6). Some studies based the analysis only on histologically verified samples, which could lead to partial verification bias because cases not referred to surgery did not receive some form of verification (item 5). The index test is independent of the reference test (item 7). The performance of the index test was generally well described (item 8). The reference test for histologic verification is quite standard and does not require a detailed description (item 9); however, the criteria used in the alternative reference test (ie, clinical follow-up) were poorly described. The index test was always interpreted without knowledge of the reference standard (item 10), but it is usual practice for the reference standard to be evaluated with knowledge of the index test results (item 11). Thus, there is potential for the knowledge of the CNB diagnosis to influence the histologic diagnosis, which would falsely elevate accuracy. The studies were retrospective, so the normal clinical data were most likely available at the time of diagnosis by the index test (item 12). 
There were no intermediate results (item 13), and withdrawals were not applicable because the studies were retrospective (item 14).

Comparison of the Accuracy of CNB and FNAC

We recently reviewed the diagnostic accuracy of FNAC6 for the diagnosis of parotid gland lesions and use those results for comparison with CNB (Table 10). The area under the SROC curve is a measure of accuracy that can be used to compare methods. The diagnostic accuracy of CNB was significantly higher than the diagnostic accuracy of FNAC (t72 = 10.6; P < .001). The specificity of CNB was quite high (1.00); however, the sensitivity was generally lower (0.92), and, across studies, there was more variability in sensitivity than in specificity. We found a similar pattern for the diagnostic performance of FNAC6 and frozen section24 for the diagnosis of parotid gland lesions. The specificity values of CNB and FNAC are clinically acceptable (1.00 for CNB vs 0.96 for FNAC). Thus, a positive result is quite reliable. Although the sensitivities of CNB and FNAC were not statistically different, the sensitivity of CNB was slightly higher (0.92; 95% CI, 0.77–0.98) than the sensitivity of FNAC (0.80; 95% CI, 0.76–0.83).

Table 9
Table 10


Discussion

Potential Sources of Heterogeneity

The degree of heterogeneity observed in this collection of studies was not statistically significant (log-rank Cochran Q = 0.10; df = 2; P = .48). Thus, there is much less variation in the diagnostic performance of CNB than in that of FNAC. This finding implies that it may be possible to develop general guidelines for the use of CNB for the presurgical diagnosis of parotid lesions, in contrast with FNAC, whose clinical usefulness must be evaluated on a case-by-case basis owing to the high variability in diagnostic performance.6 The heterogeneity in our study may have been underestimated because the statistical tests for heterogeneity are known to have low power when the number of studies is low.16 Thus, additional studies may be required to show that CNB has less performance variability than FNAC. In general, performance variation can be attributed to 4 factors: real differences between tests (population, test performance, reference test, and outcome measure), threshold effects, bias, and random variation. Understanding the causes of performance variation is important because it provides a basis to improve consistency and diagnostic performance.

Threshold Effects

Our statistical analysis suggests that most of the performance variation in the accuracy of CNB is most likely due to threshold effects. In general, variation in a direction parallel to the SROC curve can be attributed to threshold effects, whereas variation in the direction perpendicular to the SROC curve can be attributed to accuracy. In the evaluation of biopsy specimens, a difference in accuracy would mean that pathologists differ in their ability to detect features and to interpret them correctly. A difference in threshold would mean that pathologists see the same features but use different criteria for malignancy. As shown in Figure 1, almost all of the studies lie along the SROC curve and the estimated percentage of variation due to threshold effects is close to 100%. This finding suggests that pathologists recognize the same features but use different thresholds in calling a sample malignant. Threshold effects have also been recognized by Renshaw et al.25 Such variation might be minimized by more uniform application of diagnostic criteria.

Sources of Bias

The purposes of a quality assessment are to identify potential sources of bias and to estimate their impact. Based on our survey (Table 9), the most likely sources of bias are spectrum bias (item 2), misclassification bias (item 3), verification bias (item 5), differential verification (item 6), and review bias (item 11). There are also some issues concerning test definition (item 8). We discuss each of these in the following sections.

Spectrum Bias

The performance of a diagnostic test depends on the spectrum of patients in the study population. A test will perform well if there is wide separation between the negative and positive cases but will perform relatively poorly if the patient population consists of challenging cases that are not well separated. Usually, one guards against spectrum bias by selecting consecutive patients or by selecting random patients from a population of “presenting” patients. In our collection of studies, Wan et al,19 Taki et al,1 and Breeze et al7 seemed to enroll consecutive patients who underwent a CNB. Naqvi et al18 seemed to have enrolled patients who underwent a CNB and were subsequently referred for surgical treatment. Thus, the patient populations in these studies are most likely comparable to one another. It is less clear whether the patients are representative of the patients who undergo evaluation for facial swelling. In many cases, the FNAC results for the patients in these studies were inconclusive or specimens were inadequate. It is possible that such cases differ from the population that initially undergoes evaluation for facial swelling.

In general, the mechanism by which patients were selected for CNB was not well described. CNB is used relatively infrequently compared with FNAC, and it is not clear whether patients selected for CNB are comparable to patients who would ordinarily be selected for FNAC. In general, we believe authors could more clearly describe the criteria used to select patients. In particular, it is important for authors to indicate whether the study included all consecutive patients in a given period and why the patients were selected for CNB rather than FNAC. Although it is possible that patient selection in these studies was affected by spectrum bias, it is not possible to predict the effects, if any. Other than receiving CNB, there was no clear indication that the patients in this study differed in any significant way from patients who would ordinarily undergo evaluation for facial swelling.

Misclassification Bias

Misclassification bias results from an imperfect reference test (ie, definitive histologic evaluation). There are 2 types of misclassification: differential and nondifferential. Differential misclassification occurs when the error rate of the reference test (“gold standard”) is related to the result of the index test. For example, differential misclassification would occur if the error rate by definitive histologic confirmation was different for the samples called positive by CNB compared with those called negative. Nondifferential misclassification occurs when the error rate of the reference test is independent of the index test result (ie, CNB). Although no data are available on misclassification rates, we believe that error rates are most likely nondifferential (ie, independent of the CNB result). We investigated the effect of nondifferential misclassification on our summary estimates. To that end, we took the totals from all of the studies (Table 11) and conducted a sensitivity analysis (Figure 3). Given the data in our study, nondifferential misclassification would cause an underestimation of sensitivity and would have relatively little impact on specificity.
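The direction of this effect can be sketched numerically. The function below is a minimal illustration, not the calculation used for Figure 3: it assumes the reference standard flips the true disease status with a single error rate that does not depend on the CNB result, and the prevalence value is a hypothetical input because the Table 11 totals are not reproduced in the text.

```python
def observed_accuracy(sn, sp, prevalence, error_rate):
    """Apparent (sensitivity, specificity) of the index test when the
    reference standard mislabels cases at `error_rate`, independently
    of the index test result (nondifferential misclassification)."""
    p, e = prevalence, error_rate

    # Fraction of all cases that the imperfect reference labels positive/negative
    ref_pos = p * (1 - e) + (1 - p) * e
    ref_neg = (1 - p) * (1 - e) + p * e

    # Index-test positives among reference-labelled positives
    test_pos_ref_pos = sn * p * (1 - e) + (1 - sp) * (1 - p) * e
    # Index-test negatives among reference-labelled negatives
    test_neg_ref_neg = sp * (1 - p) * (1 - e) + (1 - sn) * p * e

    return test_pos_ref_pos / ref_pos, test_neg_ref_neg / ref_neg


# True sensitivity 0.95 and specificity 1.00; prevalence 0.30 is an assumption
true_sn, true_sp = 0.95, 1.00
obs_sn, obs_sp = observed_accuracy(true_sn, true_sp, prevalence=0.30, error_rate=0.05)
print(f"observed sensitivity {obs_sn:.3f}, observed specificity {obs_sp:.3f}")
```

Under these assumptions the observed sensitivity falls well below the true value while the observed specificity changes only slightly, because benign cases mislabelled as malignant by the reference are counted as CNB false-negatives.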

Verification Bias

Verification bias is a common problem in diagnostic studies that rely on histologic confirmation as a reference standard because positive results are referred for definitive histologic confirmation at a higher rate than are negative results. Retrospective studies often use surgery lists as a method of collecting cases, and, as a consequence, positive cases are sampled at a much higher rate than are negative cases. Some of the studies included in our analysis avoided this problem because they included all cases within a specified period and used clinical follow-up to verify cases with a negative CNB. In the study by Naqvi et al,18 all cases had histologic confirmation. The selection method used to obtain cases in this study was not clearly specified. Given that all cases were histologically verified, it is most likely that cases were obtained from surgery lists rather than from the set of patients undergoing evaluation for a salivary mass. Thus, estimates obtained from the study by Naqvi et al18 will have verification bias. The study by Breeze et al7 included nonsurgical cases with clinical verification; however, the clinical verification of these cases was inadequately described. Taki et al1 and Wan et al19 provided an adequate description of the clinical follow-up of nonsurgical cases.

Figure 3

The effect of nondifferential misclassification on the estimates of sensitivity and specificity. Summary totals for all included studies (Table 11) were used as a reference. The values shown in the graph are the calculated sensitivity and specificity relative to the initial sensitivity and specificity in Table 11. For example, the sensitivity based on the summary totals is 0.95. At a misclassification rate of 0.05, the observed sensitivity would be 0.83. The sensitivity shown in the graph is the relative change, 0.83/0.95 = 0.87.

Table 11

In general, there is a trade-off between verification bias (due to incomplete follow-up) and bias due to differential verification (using different reference standards). The accuracy estimates obtained from our collection could be subject to both kinds of verification bias. Estimates from the study by Naqvi et al18 (or estimates obtained only from histologically verified cases from the other studies, eg, Table 4) are most likely affected by verification bias (incomplete verification), whereas estimates from the studies by Wan et al,19 Taki et al,1 and Breeze et al7 are potentially affected by differential verification bias (Table 5). We estimate the potential impact of these sources of bias in the next section.

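Incomplete Verification. The relationship between the true and observed sensitivity and specificity is given by6:

$$\mathrm{Sn} = \frac{\mathrm{Sn}'}{r + (1 - r)\,\mathrm{Sn}'} \qquad \text{(Equation 1)}$$

$$\mathrm{Sp} = \frac{r\,\mathrm{Sp}'}{1 - (1 - r)\,\mathrm{Sp}'} \qquad \text{(Equation 2)}$$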

Where Sn and Sp are the actual sensitivity and specificity, Sn′ and Sp′ are the apparent sensitivity and specificity, α is the sampling fraction of positive cases, β is the sampling fraction of negative cases, and r = α/β, the relative sampling fraction of positive to negative cases.

By using the data from Wan et al,19 Taki et al,1 and Breeze et al,7 we obtain the following: α = .59, β = .56, and r =1.05. As expected, the sampling fraction of positive cases is greater than the sampling fraction of negative cases. By substituting the value of r and the observed estimates of sensitivity and specificity (Table 4), we obtain Sn = 0.915 and Sp = 1.00, which are within rounding error of our original estimates. Thus, the observed estimates are unlikely to be affected by incomplete verification. This conclusion is supported by the fact that the accuracy estimates are relatively unaffected by the inclusion of the nonsurgical cases (Table 5).
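The substitution described above can be checked with a short sketch of Equations 1 and 2. The function name is illustrative, and the inputs are the rounded values quoted in the text (apparent sensitivity 0.92, apparent specificity 1.00, r = 1.05), so the result may differ in the third decimal place from the reported 0.915, which was presumably computed from unrounded values.

```python
def correct_for_partial_verification(sn_obs, sp_obs, r):
    """Adjust apparent sensitivity and specificity for incomplete verification.

    r is the relative sampling fraction alpha / beta, where alpha is the
    fraction of test-positive cases verified and beta the fraction of
    test-negative cases verified.
    """
    sn = sn_obs / (r + (1 - r) * sn_obs)        # Equation 1
    sp = r * sp_obs / (1 - (1 - r) * sp_obs)    # Equation 2
    return sn, sp


sn, sp = correct_for_partial_verification(sn_obs=0.92, sp_obs=1.00, r=1.05)
print(f"adjusted sensitivity {sn:.3f}, adjusted specificity {sp:.3f}")
```

When r = 1 (positive and negative results verified at the same rate), both formulas return the apparent values unchanged, which is why an r so close to 1 produces almost no correction here.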

Differential Verification

Bias can also arise if 2 different reference tests are used and the tests have different accuracy. The observed accuracy for the clinically observed cases was 100%. If clinical observation misclassified cases (see “Misclassification Bias”), the false-positive and false-negative rates would be expected to be higher than observed in Table 11. Thus, it is unlikely that the clinical follow-up has a high misclassification rate.

In our opinion, the design of these studies is generally good because, with the exception of the study by Naqvi et al,18 verification bias was avoided by including negative cases; however, there is a need to improve the reporting of the clinical follow-up of nonsurgical cases. Overall, it does not seem that our accuracy estimates are likely to be significantly affected by verification bias.

Review Bias

All of the included studies were retrospective studies using data obtained under actual clinical conditions. Thus, pathologists were not blinded to the results of CNB when making the final diagnosis, and there is some potential for CNB results to influence the reference test. This bias would tend to increase sensitivity and specificity; however, we believe that this effect is relatively minor because final histologic confirmation is generally weighed much more heavily than CNB when making a final diagnosis.

Test Definition

It is important that tests are fully and precisely described to facilitate valid comparisons in diagnostic accuracy studies. Such descriptions also facilitate subgroup analyses for investigation of performance differences. For example, in our set of studies, the use of immunohistochemical stains was rarely documented, which would have an effect on overall diagnostic performance. Similarly, in studies of FNAC, investigators rarely document whether the “test” consists of smears or smears and cell blocks.

Random Variation

The I2 statistic indicates that none of the total variation is due to between-study variation (real differences between studies). Thus, the observed variation is mostly attributable to within-study or random variation. The within- vs between-study variation is shown in Figure 2.
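For reference, the I² statistic can be derived from the Cochran Q statistic and its degrees of freedom. The sketch below uses the standard definition, I² = max(0, (Q − df)/Q), expressed as a percentage, and applies it to the values reported for these studies (Q = 0.10, df = 2); the function name is illustrative.

```python
def i_squared(q, df):
    """Percentage of total variation attributed to between-study heterogeneity.

    Negative values of (Q - df) / Q are truncated to zero by convention.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


print(i_squared(0.10, 2))  # Q is smaller than df, so the estimate truncates to 0
```

Because Q is smaller than its degrees of freedom, the estimate is truncated to 0%, which corresponds to the statement that none of the observed variation is attributed to between-study differences.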

Overall, the variability in sensitivity seen between studies is most likely due to a combination of threshold differences and random variation. The studies were homogeneous with respect to real differences in test conditions (population characteristics, index test performance, reference test performance, and outcome measures), so test conditions are an unlikely source of variation. We identified only minor potential sources of bias (review bias, misclassification bias, and verification bias); however, these biases would tend to be fairly uniform across studies and, according to our analysis, would most likely lead to an underestimate of sensitivity.

Role of CNB in the Diagnosis of Salivary Gland Lesions

Given the drawbacks of CNB, we do not believe that CNB is likely to replace FNAC as the primary diagnostic test for the evaluation of salivary gland lesions; however, there are a number of ways in which CNB might be used to supplement FNAC, which we explore herein.

Use of CNB to Confirm FNAC Diagnoses

Given the high specificity of FNAC, we do not believe that CNB is required to support a positive diagnosis by FNAC; however, CNB may offer an advantage over FNAC because of its higher sensitivity. Thus, CNB might be used to confirm a negative diagnosis obtained by FNAC. Assuming that FNAC and CNB diagnoses are independent, such a policy would have a sensitivity of 0.99 and specificity of 0.96 and would cut the false-negative rate from 20% (FNAC only) to 1% (combined FNAC and CNB). In contrast, substituting CNB for FNAC would cut the false-negative rate from 20% (FNAC) to 5% (CNB). CNB might provide an alternative to FNAC in practice locations where the sensitivity of FNAC is low or unknown or a cytopathologist is not available for immediate specimen assessment. The potential benefits of CNB (improved accuracy, ability to obtain a diagnostic sample) would have to be weighed against the potential costs (use of anesthesia, higher morbidity, and patient discomfort).
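The combined figures follow from treating the two tests as independent and calling a case positive if either test is positive. The sketch below is an illustration under that assumption. It uses FNAC sensitivity 0.80 and specificity 0.96, and a CNB sensitivity of 0.95, which is the value implied by the 5% false-negative rate quoted above (the summary-totals estimate) rather than the pooled 0.92; the function name is illustrative.

```python
def confirm_negatives(sn_first, sp_first, sn_second, sp_second):
    """Accuracy of a policy in which a negative first test is followed by a
    second test, and the case is called positive if either test is positive.
    Assumes the two tests err independently."""
    # A malignancy is missed only if both tests miss it
    sensitivity = 1 - (1 - sn_first) * (1 - sn_second)
    # A benign case is correctly called negative only if both tests are negative
    specificity = sp_first * sp_second
    return sensitivity, specificity


sn, sp = confirm_negatives(sn_first=0.80, sp_first=0.96, sn_second=0.95, sp_second=1.00)
print(f"sensitivity {sn:.2f}, specificity {sp:.2f}, false-negative rate {1 - sn:.0%}")
```

If the two tests tend to miss the same lesions, the independence assumption fails and the combined sensitivity would be lower than this estimate.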

Use of CNB to Increase Sample Adequacy

FNAC has an inadequacy rate of approximately 8%,6 whereas the inadequacy rate of CNB in these studies averaged 1%. Thus, CNB might be used to obtain an adequate sample by repeated biopsy in cases in which an adequate sample could not be obtained by FNAC. It has been shown that repeated FNAC has a high yield in cases in which the initial sample was inadequate or nondiagnostic.26 Thus, it is uncertain whether the incremental gain in sample adequacy would be worth the additional costs associated with exposure of patients with inadequate biopsy specimens to CNB vs FNAC. We compare 2 potential policies for repeated biopsy: policy 1, FNAC followed by repeated FNAC if the initial FNAC specimen is inadequate; policy 2, FNAC followed by CNB if the initial FNAC specimen is inadequate.

A flow diagram comparing the policies is shown in Figure 4. As the diagram shows, policy 1 would result in 6 inadequate biopsy specimens per 1,000 with approximately 80 repeated FNACs. Policy 2 would result in 1 inadequate biopsy specimen per 1,000 with 80 patients undergoing CNB. Thus, policy 2 would reduce the number of inadequate biopsy specimens from 6 to 1 per 1,000 patients at an incremental “cost” of 80 patients being exposed to CNB vs FNAC. We suspect that policy 1 is preferable for addressing inadequate specimens.
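The per-1,000 figures can be reproduced with a short calculation. This sketch assumes inadequacy rates of 8% for FNAC and 1.2% for CNB, and further assumes that a repeated FNAC has the same inadequacy rate as the first attempt; that last assumption is not stated in the text but is consistent with the figure of about 6 per 1,000. The function name is illustrative.

```python
def rebiopsy_outcome(first_rate, repeat_rate, cohort=1000):
    """Return (number of repeat biopsies, number still inadequate afterward)
    for a cohort whose inadequate first biopsies are all repeated once."""
    repeats = cohort * first_rate
    still_inadequate = repeats * repeat_rate
    return repeats, still_inadequate


FNAC_INADEQUACY = 0.08   # approximate rate quoted for FNAC
CNB_INADEQUACY = 0.012   # rate observed for CNB in the included studies

policy1 = rebiopsy_outcome(FNAC_INADEQUACY, FNAC_INADEQUACY)  # FNAC, then repeat FNAC
policy2 = rebiopsy_outcome(FNAC_INADEQUACY, CNB_INADEQUACY)   # FNAC, then CNB

print(f"policy 1: {policy1[0]:.0f} repeat FNACs, {policy1[1]:.1f} inadequate per 1,000")
print(f"policy 2: {policy2[0]:.0f} CNBs, {policy2[1]:.1f} inadequate per 1,000")
```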

Study Limitations

Our study provides evidence suggesting that CNB is more accurate than FNAC for the diagnosis of salivary gland lesions. Although the results are suggestive, they are based on a small number of studies. In an earlier study,6 we found significant variability in the diagnostic performance of FNAC. Thus, in some settings, FNAC is as accurate as CNB, and, for that reason, it is not possible to provide a broad conclusion about the relative merits of the 2 techniques. Also, it is not clear how patients were selected for CNB in the studies. It may be that patients were selected for CNB by some characteristic that is associated with high diagnostic accuracy. Thus, further work is required to compare CNB and FNAC in populations with a similar spectrum of disease (eg, by randomizing patients to FNAC or CNB). Finally, the adoption of a technique should not be based solely on accuracy. The use of CNB would involve a trade-off between generic costs (patient safety, use of anesthesia, patient comfort) and the benefits of improved accuracy.

Figure 4

Comparison of core needle biopsy (CNB) vs fine-needle aspiration cytology (FNAC) for rebiopsy for inadequate FNAC biopsies. Squares represent decision nodes. Circles represent chance nodes. Triangles indicate the end of a branch. FNA, fine-needle aspiration. Policy 1, FNAC followed by repeated FNAC if the initial FNAC specimen is inadequate; policy 2, FNAC followed by CNB if the initial FNAC specimen is inadequate.


Conclusions

The overall accuracy of CNB is greater than that of FNAC and, in addition, there is less variability in performance. Both methods have the same overall pattern of high specificity with somewhat lower sensitivity. There is a need to improve the reporting and design of future studies to reduce sources of bias and variability. The selective application of CNB in conjunction with FNAC may improve the accuracy of diagnosis of salivary gland lesions. CNB is more accurate than FNAC, at least in some settings, but the best selection of which test to use for an individual patient and in specific settings remains to be defined.


Upon completion of this activity you will be able to:

  • list the advantages and disadvantages of fine-needle aspiration cytology relative to core needle biopsy for salivary gland lesions.

  • understand the definitions of the major types of bias that occur in diagnostic accuracy studies: spectrum bias, verification bias, misclassification bias, and review bias.

  • describe the purpose of a quality assessment in a systematic review.

The ASCP is accredited by the Accreditation Council for Continuing Medical Education to provide continuing medical education for physicians. The ASCP designates this journal-based CME activity for a maximum of 1 AMA PRA Category 1 Credit ™ per article. Physicians should claim only the credit commensurate with the extent of their participation in the activity. This activity qualifies as an American Board of Pathology Maintenance of Certification Part II Self-Assessment Module.

The authors of this article and the planning committee members and staff have no relevant financial relationships with commercial interests to disclose.

Questions appear on p 653. Exam is located at www.ascp.org/ajcpcme.


  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.
  15. 15.
  16. 16.
  17. 17.
  18. 18.
  19. 19.
  20. 20.
  21. 21.
  22. 22.
  23. 23.
  24. 24.
  25. 25.
  26. 26.