Image Analysis of HER2 Immunohistochemical Staining
Reproducibility and Concordance With Fluorescence In Situ Hybridization of a Laboratory-Validated Scoring Technique

Douglas M. Minot, Jesse Voss, Susan Rademacher, Toe Lwin, Jessica Orsulak, Bolette Caron, Rhett Ketterling MD, Aziza Nassar MD, Beiyun Chen MD, PhD, Amy Clayton MD
DOI: http://dx.doi.org/10.1309/AJCP9MKNLHQNK2ZX. Pages 270-276. First published online: 1 February 2012


Image analysis of the HER2 immunohistochemical (IHC) stain can help determine which breast cancer patients may benefit from HER2-targeted therapy. We studied the concordance of HER2 IHC and fluorescence in situ hybridization (FISH) as well as reproducibility of surgical pathologist (SP) and cytotechnologist (CT) interpretations using manual and image analysis methodologies on 154 IHC cases. Concordances with FISH were good for IHC negative (0, 1+) cases (range, 97%–100%) and positive (3+) cases (range, 87%–100%). Image analysis had fewer equivocal (2+) results (10.4%) than CT (14.9%) and SP (16.2%) manual methods, with higher concordances to FISH (31%, 26%, and 20% for image analysis, CT manual, and SP manual, respectively). CT manual (κ = 0.747) and image analysis (κ = 0.779) methods had better interobserver reproducibility than SP manual (κ = 0.697). CT image analysis had better intraobserver reproducibility (κ = 0.882) than CT (κ = 0.828) and SP (κ = 0.766) manual methods. HER2 IHC analysis performed by image analysis can produce accurate results with improved reproducibility.

Key Words
  • Breast neoplasms
  • Immunohistochemistry
  • Fluorescence in situ hybridization
  • Trastuzumab
  • Herceptin
  • HER2
  • Targeted therapy

The human epidermal growth factor receptor 2 (HER2) protein is a transmembrane receptor kinase, which is encoded by the HER2/neu (ERBB2) gene and is amplified in approximately 25% to 30% of all breast cancers. The HER2 signaling pathway and estrogen receptor pathway are considered 2 of the most effective drivers of cell survival and proliferation in 85% of all breast cancers. Trastuzumab (Herceptin, Genentech, San Francisco, CA) is a humanized monoclonal antibody directed at the HER2 pathway that binds the extracellular domain of HER2, thus interfering with the signal transduction cascade initiated by HER2 overexpression.1 The requirement that HER2 protein overexpression or gene amplification is present before treatment with trastuzumab is supported by multiple clinical studies that show effectiveness in this cohort of patients.2–5

There are currently 2 primary laboratory methods used to determine HER2 status: immunohistochemical analysis, which determines the level of protein expression, and HER2 fluorescence in situ hybridization (FISH), a cytogenetic technique developed to detect amplification of the ERBB2 gene. The US Food and Drug Administration (FDA)-approved immunohistochemical kits include HercepTest (DAKO, Carpinteria, CA) and Pathway (Ventana Medical Systems, Tucson, AZ). Currently, there are 3 FISH assays with FDA approval: PathVysion (Abbott Laboratories, Abbott Park, IL), Inform (Ventana Medical Systems), and PharmDx (DAKO, Glostrup, Denmark).

Considerable debate exists as to whether immunohistochemical analysis or FISH is the better method for determining a patient’s HER2 status and subsequent eligibility to receive targeted therapy. This debate exists, in part, because both testing modalities were used in the prospective randomized adjuvant trials of trastuzumab. Evidence of immunohistochemical overexpression (3+), FISH amplification, or a combination of the 2 tests (2+ immunohistochemical staining with FISH amplification) was used to determine enrollment eligibility.

Further confounding this debate are the multiple studies that demonstrate suboptimal concordance rates between immunohistochemical assessments and FISH analyses between laboratories performing HER2 testing.6–9 Guidelines and recommendations were published in 2007 by a joint American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP) Task Force in an effort to standardize testing protocols and test interpretation.10 In addition to providing specific guidance intended to reduce assay variation with specific scoring algorithms, the panel also recommended that laboratories show 95% concordance with another validated laboratory test. Some recent studies indicate that adherence to the guidelines can benefit laboratories by improving HER2 immunohistochemical and FISH concordance, reducing inconclusive FISH cases,11 and decreasing interobserver variability.12 Other studies have suggested that biologic reasons may have a large role in discordant cases,13 and further clinical studies are needed to determine whether the updated guidelines are better at predicting response to anti-HER2 therapy.14

Historically, the primary method used to determine immunohistochemical overexpression has been visual assessment by bright-field microscopy. In recent years, investigators have demonstrated that image analysis can be an effective tool for achieving accurate and reproducible interpretation of HER2 protein expression.15–18 A study previously published from our group demonstrated that by using a laboratory-validated scoring technique, one can accurately identify non-amplified and amplified FISH cases by immunohistochemical analysis with fewer 2+ cases being “reflexed” to FISH.19 Despite the inroads that digital imaging technology companies have made in pathology, there seems to be a dearth of published peer-reviewed articles that clearly describe image analysis techniques with accuracy and precision assessments. The goal of this study was to investigate the accuracy and precision of our validated scoring technique using this technology by multiple observers.

Materials and Methods

Patient Population

This retrospective study was performed on 154 of the 159 slides from our original validation study as previously described.19 In brief, 187 consecutive patients with breast cancer were selected who underwent routine HER2 testing between January 22, 2008, and February 13, 2008, at the Mayo Clinic. Of the original 187 patients, the data for 28 were excluded owing to various reasons (FISH hybridization failure, equivocal FISH results, or insufficient tissue for FISH analysis), leaving a total of 159 patients for the original validation study. Five specimens were excluded from this study because significant slide damage (eg, broken, cracked) prevented them from being scanned by the imaging instrument for analysis. All cases had FISH analysis performed to determine ERBB2 amplification status regardless of immunohistochemical score, which is a departure from our clinical practice of performing FISH analysis on immunohistochemically 2+ cases only. The mean, median, and range of ages of the patients in this study were 63.9, 64.0, and 24 to 97 years, respectively. The study population included 12 metastatic lesions and 142 primary breast tumors.

HER2 Immunohistochemical Analysis

Breast cancer specimens were stained as part of clinical practice using HercepTest, a semiquantitative assay used to determine HER2 protein overexpression. Our laboratory analyzes HER2-stained slides using the Automated Cellular Imaging System (ACIS) III (DAKO, Carpinteria), which is an automated image analysis system that uses a bright-field microscope. The ACIS digitizes immunohistochemically stained tissue slides, which can then be evaluated using proprietary image analysis software. HER2 slides were subjected to manual (visual assessment using bright-field microscopy) and ACIS-assisted assessments using our validated scoring method. Our validated scoring method consists of the mean score from 6 areas similar to the area of tissue seen using a 40× objective, with 2 areas each of high-intensity cytoplasmic membrane staining, moderate-intensity cytoplasmic membrane staining, and low-intensity membrane staining. When low-intensity staining was absent, 2 tumor areas without cell membrane staining were substituted.
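The averaging step of the scoring method described above can be sketched as follows. This is a minimal illustration only: the function name `validated_mean_score`, the argument names, and the numeric readouts are hypothetical, and the sketch does not attempt to reproduce how the ACIS software scores an individual area or how a mean is mapped to a final 0 to 3+ category, since neither is described here.

```python
from statistics import mean

def validated_mean_score(high, moderate, low_or_negative):
    """Mean of six area scores: two areas per staining-intensity tier.

    Each argument is a sequence of two per-area scores. The third tier holds
    the low-intensity areas or, when low-intensity staining is absent, two
    tumor areas without membrane staining (the substitution described above).
    """
    tiers = (high, moderate, low_or_negative)
    if any(len(tier) != 2 for tier in tiers):
        raise ValueError("each tier must contribute exactly 2 areas")
    return mean(score for tier in tiers for score in tier)

# Hypothetical per-area readouts, for illustration only.
example_score = validated_mean_score(
    high=[2.8, 2.6],
    moderate=[1.9, 1.7],
    low_or_negative=[0.6, 0.4],
)
```

Sampling a fixed number of areas from each intensity tier keeps a single strongly stained focus from dominating the slide-level score.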

The study participants were 3 surgical pathologists (SPs) and 3 cytotechnologists (CTs). CTs were included because it is our laboratory practice to have CTs prescreen all HER2 immunohistochemically stained breast cancer specimens before pathologist review and final sign-out. First, pathologists and CTs performed manual assessments of the 154 slides without knowledge of existing immunohistochemical scores or FISH results and recorded their findings based on the HercepTest scale (0, 1+, 2+, and 3+) using the specific immunohistochemical scoring criteria defined in the ASCO/CAP guidelines (Table 1).10,20 A percentage estimate of cells staining at each of the 4 levels (0, 1+, 2+, and 3+) was also collected at this time to interpret the immunohistochemical slides using the HercepTest package insert method (Table 1). Second, ACIS-assisted reviews were performed by the CTs after a minimum 2-week washout period. Finally, to determine intraobserver variability, participants reanalyzed 20 slides that represented a range of immunohistochemical scores, manually (SPs and CTs) and with ACIS assistance (CTs) after an additional 2-week period.

Table 1

Statistical Analysis

Statistical analysis comparing interobserver and intraobserver variability was performed with GraphPad software (GraphPad Software, La Jolla, CA) using the “Quantify agreement with κ” function.
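For readers unfamiliar with the κ statistic, a minimal sketch of unweighted Cohen’s κ for two sets of categorical scores is given below. It is not the GraphPad implementation, and whether weighted or unweighted κ was used is not stated in this section; the function name `cohen_kappa` and the example scores are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of slides given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequencies, summed
    # over the categories.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical scores from two observers on eight slides.
scores_1 = ["0", "1+", "2+", "3+", "3+", "1+", "0", "2+"]
scores_2 = ["0", "1+", "3+", "3+", "3+", "0", "0", "2+"]
kappa = cohen_kappa(scores_1, scores_2)
```

κ corrects raw percentage agreement for the agreement expected by chance, which is why it can be noticeably lower than the simple proportion of matching scores.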


Results

Observer concordance with FISH is summarized in Table 2. All SP and CT immunohistochemical assessments (manual and ACIS-assisted) had perfect and near-perfect concordance with FISH in the negative (0 and 1+) HercepTest scoring categories (100% and 97%–99%, respectively). There were 6 false-positive cases (HER2 immunohistochemical score 3+/FISH nonamplified) identified for all assessments with 3 identified in the SP assessments, 2 in the CT ACIS-assisted assessments, and 1 in the CT manual assessments. The average positive predictive values of a 3+ HER2 immunohistochemical score for SP, CT (manual), and CT (ACIS-assisted) were 93%, 96.7%, and 94%, respectively. The overall average correlation of immunohistochemical scores among SPs was 70.3%, whereas the correlation of CT assessments was 80.5% for manual interpretation and 78.8% for ACIS-assisted interpretation. If the 0 and 1+ categories were combined, overall average correlations for SP, CT manual assessments, and CT ACIS-assisted assessments improved to 87.9%, 90.9%, and 93.1%, respectively.
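The per-category concordance figures reported here, including the positive predictive value of a 3+ score, amount to a simple proportion. The sketch below shows one way to compute them; the function name `fish_concordance` and the example cases are hypothetical and are not study data.

```python
def fish_concordance(cases, ihc_category):
    """Fraction of cases in one IHC category whose FISH result agrees with it.

    cases: iterable of (ihc_score, fish_amplified) pairs.
    For "3+" this is the positive predictive value; for "0" and "1+" it is
    the fraction that is FISH nonamplified. "2+" is equivocal, so no
    agreement is defined for it here.
    """
    if ihc_category not in ("0", "1+", "3+"):
        raise ValueError("concordance is defined only for 0, 1+, and 3+")
    fish_results = [amplified for score, amplified in cases if score == ihc_category]
    if not fish_results:
        return None  # no cases were scored in this category
    expect_amplified = ihc_category == "3+"
    agreeing = sum(1 for amplified in fish_results if amplified == expect_amplified)
    return agreeing / len(fish_results)

# Hypothetical (IHC score, FISH amplified?) pairs, for illustration only.
example_cases = [
    ("3+", True), ("3+", True), ("3+", False), ("3+", True),
    ("0", False), ("1+", False), ("1+", True), ("2+", True),
]
ppv_3plus = fish_concordance(example_cases, "3+")
```

A 3+/FISH nonamplified case lowers the positive predictive value, while a 1+/FISH amplified case counts as a false negative in the 1+ category.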

CTs with ACIS-assistance identified fewer (mean, 17.3) equivocal immunohistochemical 2+ cases than in their manual assessments (mean, 23.3) and SP manual assessments (mean, 25.7). Of 3 CTs, 2 called an average of 9 fewer 2+ cases with ACIS assistance than the CT manual assessments, and a higher proportion of the 2+ cases identified by the 2 CTs using the ACIS method were FISH+ than their manual readings (30% vs 24%). Overall, the percentages of equivocal (2+) immunohistochemical cases that were positive by FISH were 16.9%, 25.7%, and 28.8% for the SP, CT manual, and CT ACIS-assisted assessments, respectively.

We compared the performance of HER2 immunohistochemical manual assessment using the HercepTest scoring algorithm and the ASCO/CAP-recommended scoring algorithm (Table 1) based on data collected for estimated staining percentages by the multiple observers in our study. Table 3 shows observer scores when strict adherence to the HercepTest scoring algorithm was followed as described in the package insert.20 The majority of the SPs (2/3) would have called more 2+ cases (mean, 2 cases) using the HercepTest package insert instructions than when using the ASCO/CAP guidelines, with a lower percentage of these 2+ cases being FISH+ (mean, 13% vs 16%). The majority of SPs (2/3) would also have called more 3+ cases (mean, 3 cases), with a lower percentage of the cases being FISH+ than when these cases were manually assessed under the current ASCO/CAP guidelines (87.5% vs 96%). Two CTs called fewer 2+ cases using the package insert criteria (mean, 2.5 cases), but a lower percentage of these cases was FISH+ (mean, 19% vs 27.5%) compared with using the ASCO/CAP criteria. Also in contrast with the SPs, all 3 CTs identified more 3+ cases (mean, 2.3 cases), but a similar proportion of these cases was positive by FISH on average, when compared with their original manual assessments (97.3% vs 96.7%).

Table 2
Table 3
Table 4

We also examined interobserver and intraobserver agreement levels (κ statistic; Table 4). Interobserver reproducibility statistics show average κ scores of 0.690, 0.747, and 0.779 for SP, CT manual, and CT ACIS-assisted assessments, respectively. In addition, 2 of 3 CTs saw improvements in their reproducibility using the ACIS-assisted method compared with their manual assessments. Intraobserver reproducibility statistics show average κ scores of 0.766, 0.828, and 0.882 for SP, CT manual, and CT ACIS-assisted, respectively. Of 3 cytotechnologists, 2 saw an improvement of this measure using our validated image analysis technique compared with the manual method.


Discussion

This study demonstrates that laboratory personnel can achieve high precision and high accuracy with image analysis of HER2-stained slides using a laboratory-validated image analysis scoring technique. In fact, by some measurements, the image analysis technique performed better than manual assessments. We saw increased concordance with FISH and better interobserver and intraobserver reproducibility overall with the ACIS-assisted method over manual assessments. Concordance rates with FISH were perfect (100%) in the HercepTest 0 category, regardless of the analysis method. In the HercepTest 1+ category, we saw similar false-negative rates across all analysis methods (range, 1%–3%). These are encouraging findings because they suggest that few cases are being “undercalled.” Our data also show that patients who have a strongly positive (3+) immunohistochemical result are also being appropriately identified, an important finding because patients with these results are also known to derive the most therapeutic benefit from HER2 pathway inhibitory agents. Pathologists’ concordance with FISH in the 3+ category ranged from 87% to 100%; CT 2 had perfect (100%) concordance regardless of analysis method, CT 1 improved slightly, and CT 3 saw a slight decrease in performance when using the image analysis technique.

Another significant finding in our study was the decrease in the number of cases being interpreted as equivocal (2+) using our validated ACIS-assisted technique. Of 3 CTs, 2 identified fewer 2+ cases (4 and 14 cases, respectively) when using the ACIS, while the percentage of 2+ cases that were FISH+ increased on average (30% vs 24%). This observation, coupled with the identification of more immunohistochemically negative (0, 1+) cases and a similar number of cases in the positive (3+) category for ACIS-assisted assessments, suggests that the ACIS is appropriately downgrading a proportion of these equivocal cases without missing FISH+ cases. Cantaloni et al21 found that they were able to reduce the number of specimens submitted for FISH analysis by 18% by using computer-assisted immunohistochemical analysis on a cohort of equivocal 2+ cases while maintaining a very low (1%) false-negative rate. These data suggest that a validated image analysis technique can reduce the number of equivocal (2+) cases that would be reflexed to FISH while maintaining a very low false-negative rate.

HER2 immunohistochemical analysis is typically performed using 1 of 2 FDA-approved kits, HercepTest and Pathway. An important and somewhat controversial issue is that the recommended ASCO/CAP guidelines are significantly different from the HercepTest scoring algorithm. This panel now recommends that more than 30% of tumor cells must show circumferential membrane staining to consider a tumor as strongly positive (3+). This would result in all cases that are greater than 10% but less than or equal to 30%, which would be called strongly positive (3+) according to the HercepTest package insert, now being called equivocal (2+) under the new guidelines.10 Theoretically, adoption of these criteria could lead to more equivocal immunohistochemical cases, which thus would be subjected to additional FISH testing.
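The practical effect of the two cutoffs described above can be sketched as follows. This is an illustration of the 3+ boundary only, assuming the input is the estimated percentage of tumor cells with strong circumferential membrane staining; the names `CUTOFFS`, `is_strong_positive`, and `shifts_to_equivocal` are hypothetical, and the remaining criteria of either scoring scheme are not modeled.

```python
# Percentage of tumor cells that must be exceeded for a 3+ call under each scheme.
CUTOFFS = {"herceptest": 10.0, "asco_cap_2007": 30.0}

def is_strong_positive(percent_strong, scheme):
    """True if the percentage of strongly stained tumor cells exceeds the cutoff."""
    if not 0.0 <= percent_strong <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    return percent_strong > CUTOFFS[scheme]

def shifts_to_equivocal(percent_strong):
    """True for cases that are 3+ by the package insert cutoff but not by ASCO/CAP.

    These are the cases above 10% and at or below 30% that move from
    strongly positive (3+) to equivocal (2+) under the 2007 guidelines.
    """
    return (is_strong_positive(percent_strong, "herceptest")
            and not is_strong_positive(percent_strong, "asco_cap_2007"))

# A hypothetical case with 20% strongly stained cells falls in the shifted band.
example_shift = shifts_to_equivocal(20.0)
```

Under this reading, only cases in the band above 10% and up to 30% change category, which is the group expected to generate additional reflex FISH testing.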

We were able to compare the performance of manual assessment of HER2 immunohistochemical analysis using the HercepTest scoring algorithm and the ASCO/CAP-recommended scoring algorithm (Table 1) based on the estimated staining percentages recorded by multiple observers in our study. If interpretations were made based on the percentage categorization (0, 1+, 2+, and 3+) of the tumor cells by SPs and CTs, more cases would have been called equivocal (2+) by 2 of 3 SPs, thus potentially increasing the number of cases reflexed to FISH. However, CTs did not see a similar increase in the number of equivocal (2+) cases when using the package insert method. A relatively low percentage of these equivocal cases were positive by FISH on average when using the package insert method (13% for SPs and 17% for CTs). In comparison, average FISH positivity rates were considerably higher in the equivocal category for the ASCO/CAP guideline methods for SP (16.9%) and CT manual (25.7%) interpretations and the CT ACIS-assisted method (28.8%). Our data also suggest that pathologists could have overcalled (3+ immunohistochemical staining/FISH–) a higher proportion of cases using the package insert method vs using the ASCO/CAP guidelines (15% vs 7%, respectively).

We found that by using image analysis, the number of equivocal HER2 immunohistochemical results can be reduced, thus reducing the number of cases reflexed to FISH analysis. However, the optimal HER2 testing algorithm is still under debate.22 Several modeling and meta-analysis studies have been performed and seem to offer little clarity to this issue.23–26 Elkin et al23 concluded that it would be more cost-effective to use FISH alone or as confirmation of all positive HercepTest results, rather than using FISH to confirm only weakly positive results or using HercepTest alone, whereas Lidgren et al24 concluded in their analysis that FISH testing for all patients is a cost-effective treatment option from a societal perspective. Dendukuri et al25 performed a meta-analysis and found that the strategy with the lowest cost-effectiveness ratio involved screening all newly diagnosed cases of breast cancer with immunohistochemical analysis and confirming scores of 2+ or 3+ with FISH. The current ASCO/CAP guidelines recommend that clinicians first perform immunohistochemical analysis, with equivocal results (2+) reflexed to FISH. The published guidelines identify a potential role for image analysis as a means to achieve consistent interpretation. Image analysis is being used by some institutions, as shown by a recent survey distributed to participants in the HER2 immunohistochemical analysis proficiency program that indicated that 33% of laboratories are currently using quantitative computer image analysis.27

Recent image analysis studies have been published that show positive correlations with FISH that meet or exceed the guideline goal of 95% concordance for negative and positive immunohistochemical results.15,18 Dobson et al15 achieved a 95% concordance rate between their image analysis technique and FISH by determining the extent of circumferential HER2 staining along with other tissue staining features. Our ACIS-assisted method not only increased the overall concordance of HER2 immunohistochemical analysis and FISH but also improved interobserver and intraobserver reproducibility. Of 3 CTs in our study, 2 noticed improvements in interobserver and intraobserver reproducibility by using image analysis, and the CT ACIS-assisted method had better κ statistic values overall on average than CT manual and SP assessments. This finding is consistent with the findings of other investigators who have demonstrated that image analysis can decrease the variation in HER2 immunohistochemical scoring.16,17,28

Strengths of our study include our use of clinical patient samples, a multiple observer study format with pathologist and cytotechnologist participants, a clearly defined and validated image analysis method, and a single “gold standard” for all study specimens. A potential limitation to this study is that we compared the HercepTest and ASCO/CAP criteria based on our presumption that an observer would make a manual immunohistochemical interpretation using the estimated percentage levels of tumor staining. It is also possible that other immeasurable variables could impact a final interpretation outside of simple percentage assessments. In addition, the specimens used for this study were obtained during a 2-month time frame. It is possible, but unlikely, that variabilities in staining from lot changes and/or shifts in staining personnel could have affected the results of our study if specimens had been obtained over a longer period. Image analysis with the ACIS relies on consistent staining quality because the user is unable to modify the instrument’s color thresholding algorithm in the FDA-approved HER2 software application.

The results from this study demonstrate that a lofty goal of 95% concordance rates between immunohistochemical analysis and FISH is achievable using the ACIS image analysis instrument with our validated scoring algorithm. Interobserver and intraobserver variability can also be improved over manual assessment through the use of this ACIS-assisted scoring technique. It is clear that additional experiential studies are needed from other investigators to determine whether this technique is reproducible at other institutions. As the spotlight grows brighter on laboratory testing, it will become increasingly important for laboratories to demonstrate accuracy, precision, and cost-effectiveness of HER2 immunohistochemical testing, and we believe that image analysis can be a useful method to achieve these objectives.


Upon completion of this activity you will be able to:

  • discuss the importance of concordance between immunohistochemistry and fluorescence in situ hybridization results in evaluating HER2 status in breast cancer patients.

  • describe the validated image analysis technique for HER2 evaluation.

  • compare performance characteristics of manual HER2 immunohistochemical assessment and the image analysis method.

  • discuss the effect on inter- and intraobserver variability measurements using the validated image analysis method.

The ASCP is accredited by the Accreditation Council for Continuing Medical Education to provide continuing medical education for physicians. The ASCP designates this journal-based CME activity for a maximum of 1 AMA PRA Category 1 Credit ™ per article. Physicians should claim only the credit commensurate with the extent of their participation in the activity. This activity qualifies as an American Board of Pathology Maintenance of Certification Part II Self-Assessment Module.

The authors of this article and the planning committee members and staff have no relevant financial relationships with commercial interests to disclose.

Questions appear on p 319. Exam is located at www.ascp.org/ajcpcme.

