
Software-Automated Counting of Ki-67 Proliferation Index Correlates With Pathologic Grade and Disease Progression of Follicular Lymphomas

Mark A. Samols MD, PhD, Nathan E. Smith MD, Jonathan M. Gerber MD, Milena Vuica-Ross MD, PhD, Christopher D. Gocke MD, Kathleen H. Burns MD, PhD, Michael J. Borowitz MD, PhD, Toby C. Cornish MD, PhD, Amy S. Duffield MD, PhD
DOI: http://dx.doi.org/10.1309/AJCPTMA1F6LWYTQV. Pages 579-587. First published online: 1 October 2013

Abstract

Objectives: To examine the accuracy of software-assisted measurement of the Ki-67 proliferation index (PI) and its correlation with the grade and clinical progression of follicular lymphoma (FL).

Methods: High-power field equivalents were extracted from H&E- and Ki-67–immunostained slides of FL, and a nuclear quantitation algorithm was used to calculate a PI. Representative fields were also counted manually for validation, and the manual and automated counts agreed closely.

Results: The PI was significantly higher in World Health Organization grade 3 FL than grade 1 to 2 FL. Disease progression, as defined by subsequent treatment with radiation or cytotoxic chemotherapy, was also significantly associated with elevated PI but not pathologic grade.

Conclusions: These data show that software-automated quantitation of Ki-67 can provide both a useful adjunct to pathologic grade in FL and improved prognostic information for patients.

Key Words:
  • Follicular lymphoma
  • Ki-67
  • Automated counting
  • Histologic grade
  • Proliferation index
  • Quantitation

Follicular lymphoma (FL) is the most common low-grade B-cell lymphoma and is characterized by neoplastic expansion of follicle center cells. Disease progression in FL is often relatively slow, although more rapid progression is associated with a high score on the FL International Prognostic Index, the presence of associated diffuse large B-cell lymphoma (DLBCL), and high histologic grade.1 Pathologic examination of the tissue is required to determine the histologic grade; currently, this is determined by calculating the mean number of centroblasts per ×40 high-power field (hpf) in 10 neoplastic follicles.2,3

In the 2001 World Health Organization (WHO) classification,4 FL was subdivided into 3 grades: grade 1 was defined as having 0 to 5 centroblasts/hpf, grade 2 with 6 to 15 centroblasts/hpf, and grade 3 with greater than 15 centroblasts/hpf. Grade 3 was further subdivided into 3A, in which the neoplastic follicles contain a mixture of centroblasts and centrocytes, and grade 3B, in which the neoplastic follicles only contain centroblasts. The 3 grades were shown to correlate roughly with disease prognosis, with grades 1 and 2 having a more indolent course and prolonged survival as compared with grade 3 tumors.5,6 In the 2008 WHO classification, grades 1 and 2 were combined into grade 1 to 2 (low-grade) FL due to poor interobserver reproducibility and lack of significant survival differences between grades 1 and 2.2,7,8
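The grading thresholds described above can be written as a small lookup. This is only an illustrative sketch: the function names are hypothetical, the 3A/3B distinction (which depends on the presence of centrocytes) is not modeled, and the handling of fractional means that fall between the stated bands (for example, 5.5 centroblasts/hpf) is an assumption, since the text defines the bands only for whole numbers.

```python
def who_2001_grade(mean_centroblasts_per_hpf: float) -> int:
    """Return the 2001 WHO grade (1, 2, or 3) from the mean centroblast
    count per x40 high-power field.

    Assumption: fractional means between bands (e.g. 5.5) are assigned
    to the higher band; the text does not specify this case.
    """
    if mean_centroblasts_per_hpf < 0:
        raise ValueError("centroblast count cannot be negative")
    if mean_centroblasts_per_hpf <= 5:
        return 1
    if mean_centroblasts_per_hpf <= 15:
        return 2
    return 3


def who_2008_grade(mean_centroblasts_per_hpf: float) -> str:
    """Return the 2008 WHO grade, in which grades 1 and 2 are merged."""
    return "1-2" if who_2001_grade(mean_centroblasts_per_hpf) <= 2 else "3"


print(who_2001_grade(4), who_2001_grade(12), who_2001_grade(20))
print(who_2008_grade(12), who_2008_grade(20))
```

Under the 2008 scheme, the first two example counts fall in the same low-grade category, which is the merge motivated by the poor interobserver reproducibility noted above.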

The correlation between the Ki-67 proliferation index (PI), histologic grade, and prognosis has been studied in several lymphomas, including FL. These studies relied on various methods for determining Ki-67 PI, including estimating positive cells as well as the more precise but laborious manual counting of cells. In general, these studies demonstrate that high proliferative activity is associated with more aggressive lymphomas, although there was not a clear consensus regarding a “cutoff” Ki-67 PI at which a lymphoma can be designated high or low grade.9-13

Because it is impractical to count positive cells manually for routine diagnostic purposes and visual estimates are subjective and dependent on observer experience, image analysis is becoming increasingly commonplace for the quantitation of biomarkers in formalin-fixed, paraffin-embedded tissue. Several manufacturers have received Food and Drug Administration clearance to market image analysis software for the measurement of estrogen receptor, progesterone receptor, and HER2 expression in breast cancer, and these methods are well established in current clinical practice.14-16 Studies have shown these methods to be accurate and comparable to carefully performed manual quantitation.17-19 Thus, the use of image analysis methods can provide consistent and objective measurements of labeling indices.

Several studies have examined the use of image analysis software to measure the nuclear staining index of Ki-67 in lymphomas. A few studies have focused on the measurement of PI in FL and found that automated Ki-67 counts were similar to manual counts; however, these studies were either performed before modern image analysis techniques were developed or lacked associated clinical information.20,21 Interestingly, a recent study that used automated methods to quantify the PI concluded that software-assisted counting provided no significant association with either pathologist estimate or survival in DLBCL.22

In this study, we used current technology for slide scanning and quantitation of immunohistochemical (IHC) labeling to determine the Ki-67 PI and investigated its relationship to histologic grade and disease progression in FL. We found that image analysis provides a relatively rapid and reproducible method for objective quantitation of Ki-67 PI in FL and that PI correlates with both histologic grade and clinical progression of disease.

Materials and Methods

The pathology database at The Johns Hopkins Hospital was searched using the keywords follicular and lymphoma. All cases of FL from January 2001 to December 2010 were identified. Of note, rituximab was in use at our institution by 2001, and core biopsy specimens were excluded from the study. Only cases that had at least 1 H&E slide as well as selected corresponding IHC stains (CD20, CD3, Bcl-2, and Ki-67) available for review were included. The H&E and IHC stains were reviewed in all cases. Since the WHO classification for FL changed with the 2008 edition, the collected cases follow either the old 3-tiered or newer 2-tiered grading system, depending on when the original diagnosis was made.

H&E and Ki-67 (Ventana Medical Systems, Tucson, AZ) slides were scanned using the ×20 objective (0.498 μm/pixel) on a ScanScope CS slide scanner (Aperio, San Diego, CA). Ten hpf-equivalent fields (550 × 550 μm) of representative neoplastic follicles were selected using the H&E slides, and the corresponding fields were extracted from the Ki-67 whole slide image. Only intact fields without crush or other artifact were selected.
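As a quick check on scale, the scanner resolution and field size quoted above determine the pixel dimensions of each extracted field. The sketch below uses only those two figures; the constant and function names are illustrative and are not part of the scanner software.

```python
MICRONS_PER_PIXEL = 0.498   # scan resolution at the x20 objective (from the text)
FIELD_SIDE_UM = 550.0       # side of one hpf-equivalent field (from the text)


def field_side_pixels(side_um: float = FIELD_SIDE_UM,
                      um_per_px: float = MICRONS_PER_PIXEL) -> int:
    """Side length of a square field in pixels, rounded to the nearest pixel."""
    return round(side_um / um_per_px)


def field_area_mm2(side_um: float = FIELD_SIDE_UM) -> float:
    """Area of a square field in square millimetres."""
    side_mm = side_um / 1000.0
    return side_mm * side_mm


print(field_side_pixels())   # about 1104 pixels per side
print(field_area_mm2())      # about 0.3025 mm^2 per field
```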

The extracted fields were uploaded to a Spectrum digital slide repository (Aperio). An analysis macro based on the Nuclear Quantitation algorithm from Aperio’s image analysis tool box was then tuned using a subset of images representing low, medium, and high Ki-67 indices. The algorithm parameters used in this study are provided in Table 1. After tuning, the analysis macro was applied to the entire image set, identifying nuclei and classifying them as positive or negative for Ki-67. Briefly, the Aperio Nuclear Quantitation algorithm first separates 3,3′-diaminobenzidine (DAB) and hematoxylin stains by applying color deconvolution using the nuclear (hematoxylin) and positive (DAB) optical density vectors given in Table 1.23 The hematoxylin image is smoothed by applying an averaging filter with a radius of 3 pixels. The nuclei are segmented from background using intensity-based segmentation with an intensity range of 0 to 200. A curvature-based watershed algorithm separates overlapping nuclei, and the resulting nuclei are filtered using shape descriptors, including size, roundness, compactness, and elongation.24 Nuclei with a mean DAB value of less than 190 are classified as positive.
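The final classification step described above reduces to a threshold on each nucleus's mean DAB value. The following is a minimal sketch of that step alone, not of the Aperio algorithm: it assumes segmentation has already produced one mean DAB intensity per nucleus, and the function name and example values are hypothetical.

```python
DAB_POSITIVE_THRESHOLD = 190  # nuclei with mean DAB value below this are positive (from the text)


def classify_nuclei(mean_dab_values, threshold=DAB_POSITIVE_THRESHOLD):
    """Count Ki-67-positive and Ki-67-negative nuclei.

    mean_dab_values: one mean DAB intensity per segmented nucleus.
    A lower intensity means darker DAB staining, so values strictly
    below the threshold are counted as positive.
    Returns (positive_count, negative_count).
    """
    values = list(mean_dab_values)
    positive = sum(1 for value in values if value < threshold)
    negative = len(values) - positive
    return positive, negative


# Hypothetical intensities for four nuclei; 190 itself is not "less than 190".
print(classify_nuclei([120, 185, 190, 230]))
```

The color deconvolution, smoothing, segmentation, and watershed steps depend on the parameters in Table 1 and are deliberately left out of this sketch.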

For validation of automated counts, 10 fields from 3 cases representing low, medium, and high Ki-67 indices were manually counted using a custom ImageJ-based macro (National Institutes of Health, Bethesda, MD). Using this macro, human observers (M.A.S. and N.E.S.) annotated each nucleus as positive or negative for Ki-67 using different-colored markers. The annotated images were saved, and the nuclear markers were counted by the software. Correlation between the automated and manual counts was calculated using a Pearson correlation coefficient.
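For readers who want to reproduce this kind of comparison, a Pearson correlation coefficient between paired per-field values can be computed as sketched below. The example values are hypothetical placeholders, not data from this study, and the helper names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_df(n_pairs: int) -> int:
    """Degrees of freedom for testing a Pearson correlation: n - 2."""
    return n_pairs - 2


# Hypothetical per-field PI values (%), for illustration only.
manual_pi = [5.0, 20.0, 35.0, 60.0]
automated_pi = [5.5, 21.0, 36.0, 63.0]
print(pearson_r(manual_pi, automated_pi))
print(correlation_df(30))  # 10 fields from each of 3 cases gives 30 pairs
```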

Table 1
Table 2

Clinical data were collected from The Johns Hopkins electronic patient records in accordance with institutional review board–approved protocol NA_00051478. Treatment was classified as radiation therapy and/or cytotoxic chemotherapy. Single-agent treatment with rituximab or prednisone was not regarded as cytotoxic chemotherapy. Statistical significance between the Ki-67 PI and pathologic grade or progression of disease was calculated using a Wilcoxon 1-tailed rank sum test. Pathologic grade was compared with progression of disease using a Fisher exact test.
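A 1-tailed rank sum comparison of this kind can be sketched for small samples by exact enumeration. The text does not state whether an exact or an approximate P value was used, so the exact approach here is an assumption, as are the function names; the example values are hypothetical and are not study data.

```python
from itertools import combinations


def midranks(values):
    """Assign 1-based ranks to values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_one_tailed_p(low_group, high_group):
    """Exact 1-tailed P value that low_group tends to have lower values.

    Enumerates every way of choosing len(low_group) ranks from the pooled
    sample and returns the fraction whose rank sum is at most the observed
    rank sum of low_group. Practical only for small samples.
    """
    pooled = list(low_group) + list(high_group)
    ranks = midranks(pooled)
    n_low = len(low_group)
    observed = sum(ranks[:n_low])
    hits = 0
    total = 0
    for subset in combinations(ranks, n_low):
        total += 1
        if sum(subset) <= observed + 1e-9:
            hits += 1
    return hits / total


# Hypothetical PI values (%), for illustration only.
print(rank_sum_one_tailed_p([8.0, 10.0, 12.0], [25.0, 30.0, 35.0, 40.0]))
```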

Results

Thirty-one cases of FL were identified, including 19 cases that were WHO grade 1 to 2 and 12 cases that were WHO grade 3 (Table 2). We first assessed the validity of the software-automated counts. To accomplish this, 3 cases of FL were selected that demonstrated low, medium, and high PI levels. For each case, images of all 10 selected fields were manually counted for both positive and negative nuclei (Image 1). Ki-67 PI was also estimated in all fields by 5 experienced hematopathologists (M.V.-R., C.D.G., K.H.B., M.J.B., and A.S.D.) who did not have knowledge of the tumors’ histologic grade or measured PI.

Quantitation of PI by image analysis showed a strong positive correlation with the manual Ki-67 PI count (Pearson r = 0.99, P < .001; Figure 1). The software consistently counted greater numbers of both positive and negative nuclei per field, with an average of 10.9% more positive nuclei and 4.8% more negative nuclei. The percentage of Ki-67–positive nuclei was nevertheless almost identical to that from the manual counts. In comparison, the hematopathologists’ estimates slightly overestimated the PI in cases with a higher PI but showed no significant difference from the automated results in cases with lower PIs. The hematopathologists estimated a mean ± SD PI 0.7% ± 1.3% greater than the algorithm for low PI fields, 3.0% ± 6.6% greater for the medium PI fields, and 8.0% ± 5.3% greater for the set of high PI fields.

Next, the Ki-67 PI of all 31 FL cases was compared with the original histologic grade. In keeping with the 2008 WHO classification, a distinction was not made between grade 1 and grade 2 FL. Higher histologic grade was significantly associated with a higher PI; grade 1 to 2 FL had a median Ki-67 PI of 21.9%, and grade 3 FL had a median Ki-67 PI of 39.9% (P = .02, Wilcoxon 1-tailed rank sum test; Figure 2A). Although we were able to show a significant difference between the PI of low-grade (WHO 1–2) and intermediate-grade (WHO 3A/B) FL, we were unable to set a discrete cutoff value of PI to separate the grades completely since we found several cases with either low histologic grade and high PI or higher histologic grade and low PI. A second review of these discrepant cases confirmed the original histologic grading.

Image 1

Automated and manual measurement of the Ki-67 proliferation index (PI). Representative examples of extracted fields from scanned Ki-67 immunohistochemical slides: high PI (A–D; B–D depict framed area in A) and low PI (E–H; F–H depict framed area in E) fields. Manual counts identified 2,000 positive and 1,415 negative nuclei for a PI of 58.6% (B) and 173 positive and 3,624 negative nuclei for a PI of 4.6% (F) for the high and low PI fields, respectively. The nuclear quantitation algorithm identified 2,600 positive (yellow) and 1,396 negative (blue) nuclei for a PI of 65.1% for the high PI field (C) and 216 positive and 3,672 negative nuclei for a PI of 5.6% for the low PI field (G). The Ki-67 images are overlaid with the algorithm results (outlines; yellow = positive; blue = negative) and the manual counts (dots; red = positive; green = negative), showing the close correlation for these methods (D, H).
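The PI values in the Image 1 legend follow directly from the nuclear counts, since the PI is the number of positive nuclei divided by the total number of counted nuclei. The short sketch below recomputes them from the counts quoted in the legend; the function and variable names are illustrative.

```python
def proliferation_index(positive: int, negative: int) -> float:
    """Ki-67 PI as a percentage: positive nuclei / all counted nuclei * 100."""
    total = positive + negative
    if total == 0:
        raise ValueError("no nuclei counted")
    return 100.0 * positive / total


# (positive, negative) counts as quoted in the Image 1 legend.
legend_counts = {
    "high PI field, manual": (2000, 1415),
    "high PI field, algorithm": (2600, 1396),
    "low PI field, manual": (173, 3624),
    "low PI field, algorithm": (216, 3672),
}

pi_by_field = {
    name: round(proliferation_index(pos, neg), 1)
    for name, (pos, neg) in legend_counts.items()
}

for name, pi in pi_by_field.items():
    print(f"{name}: {pi}%")
```

Rounded to one decimal place, these reproduce the 58.6%, 65.1%, 4.6%, and 5.6% given in the legend.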

At our institution, patients with FL typically do not require immediate treatment with cytotoxic chemotherapy or radiotherapy unless there is evidence of aggressive disease such as a large tumor mass, B symptoms, or organ compression. Due to the long natural history of FL and the fact that many of the selected cases were relatively recent, the need to treat with radiation therapy or cytotoxic chemotherapy was used as an end point rather than overall survival, and the clinician’s decision to treat the patient was used as a marker for disease progression. Clinical follow-up data were available for 26 of the 31 FL cases. One patient with multiple medical problems died of other causes soon after the diagnosis of FL and was not included in the analysis. Of the remaining 25 patients, 5 received no treatment, 4 received radiation therapy, and 16 received cytotoxic chemotherapy (Table 3). Higher PI was significantly associated with the need to treat; the 5 untreated patients had a median Ki-67 PI of 10.0%, whereas the 20 treated patients had a median Ki-67 PI of 31.4% (P = .02, Wilcoxon rank sum test; Figure 2B). A cutoff PI value of 15% strongly correlated with the need to treat: 4 of 5 untreated patients but only 2 of 20 treated patients fell below this threshold (P = .005, Fisher exact test). In our data set, intermediate histologic grade (WHO 3A/B) was not significantly associated with disease progression (P = .31, Fisher exact test).
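The 15% cutoff comparison can be laid out as a 2 × 2 table: 4 untreated and 2 treated patients below the cutoff, and 1 untreated and 18 treated patients at or above it. The sketch below computes a Fisher exact P value for that table. It assumes a 2-sided test that sums the probabilities of all tables no more likely than the observed one, which the text does not specify; the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The P value is the total probability of every table
    that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )


# Rows: untreated, treated. Columns: PI below 15%, PI at or above 15%.
p_value = fisher_exact_two_sided(4, 1, 2, 18)
print(round(p_value, 3))
```

Under these assumptions the result rounds to .005, in line with the reported value.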

Discussion

Follicular lymphoma is one of the most common lymphomas, but accurate grading of this neoplasm is difficult and shows relatively poor interobserver agreement.7,8 Current grading schemata rely on morphologic identification and quantitation of centroblasts; however, variations in fixation and staining can make this seemingly straightforward task relatively difficult. We investigated whether recent advances in the automated quantitation of IHC stains provide a reliable means to quantify the Ki-67 PI to improve the prognostication of FL.

Our results show that the PI in FL neoplastic follicles determined using image analysis correlates strongly with precise Ki-67 PI counts performed manually. Hematopathologists had a tendency to overestimate PIs with higher Ki-67 staining, although their PI estimates were fairly accurate in the lower range; however, it should be noted that these estimates were not made in the normal course of reviewing slides but rather during a specific exercise estimating the exact field quantified by the software. The software-quantified PI was also associated with both histologic grade (3) and the need to treat. Although a cutoff value of PI to separate the histologic grades could not be defined, a PI cutoff of 15% strongly correlated with the need to treat.

Figure 1

Validation of the algorithm-derived Ki-67 proliferation index (PI). The software algorithm counted an average of 10.9% more positive nuclei (A) and 4.8% more negative nuclei (B) for each field than the manual counts. However, the PI for each field was very similar between the two methods, with the algorithm calculating on average a PI 3.5% higher (C). The algorithm and manual counts for each field are plotted against each other, revealing a strong positive Pearson product-moment correlation (Pearson r = 0.99, df = 28, P < .001) with a slope of 1.04 (D).

Figure 2

Software-automated measurement of the proliferation index (PI) is associated with histologic grade and disease progression. A, Using a Wilcoxon 1-tailed rank sum test, higher PI showed a significant positive correlation with higher histologic grade; grade 1 to 2 follicular lymphoma (FL) had a median Ki-67 PI of 21.9% and grade 3 FL of 39.9% (P = .02). B, Higher PI was also significantly associated with the need to treat; untreated (n = 5) patients had a median Ki-67 PI of 10.0% and treated (n = 20) patients of 31.4% (P = .02). In our data set, higher histologic grade was not significantly associated with disease progression (P = .31, Fisher exact test). WHO, World Health Organization.

Table 3

A previous study demonstrated that Ki-67 can show a significantly higher PI with grade 2 or 3 compared with grade 1 FL but could not demonstrate a significant difference between grade 2 and grade 3 FL.21 When analyzing data from older cases that used the 3-tiered system, our data show similar results. Within our data set, 6 cases were designated as grade 1, 9 as grade 2, and 4 as grade 1 to 2. Excluding the cases graded 1 to 2, we were able to show significant differences in PI between grade 1 (median PI, 12.6%) and grade 2 (median PI, 35.4%; P = .005). FL grade 1 was also significantly different from FL grade 3 (median PI, 40.3%; P = .002). However, FL grade 2 was not significantly different from grade 3 (P = .14). When we used the newer 2-tiered grading system for all cases, there was a significant difference of PI between low-grade (WHO 1–2) and intermediate-grade (WHO 3A/B) FL (Figure 2A).

Another finding that emerged from these data was that a subset of low-grade FL cases had a high PI. Wang et al25 previously reported a similar group of histologically low-grade FLs that had a PI ranging from 30% to 80%. They demonstrated that although cases of low-grade FL with a high PI had an unexpectedly longer 5-year disease-free survival than those with low-grade FL with low PI, these patients had a significantly shorter overall survival. These authors suggested that these high PI/low-grade FLs had similar clinical behavior to grade 3 FL and should be considered separately from grade 1 to 2 FL. Although more cases with a discrepancy between histologic grade and PI would be needed to prove this definitively, our data are consistent with the conclusions of Wang et al. In the current study, higher PI was significantly associated with the need to treat regardless of the histologic grade, suggesting that cases of low-grade FL with a higher PI clinically behave like a higher-grade neoplasm. For our data set, 10 histologically low-grade cases with a PI greater than 15% required treatment. These high PI/low-grade FLs that act aggressively may account for the lack of correlation between higher histologic grade and the need to treat in our data set.

Other approaches to automating the grading of FL do not depend on Ki-67–derived PI values. One group has developed an algorithm-based approach to count follicular center centroblasts using scanned H&E images.26-29 The method uses CD3 and CD20 IHC stains to automatically identify follicles and then hpfs on a corresponding H&E image to identify centroblasts and determine FL grade. However, this approach had a high level of false positives, resulting in an approximately 10% positive predictive value for centroblast detection, and it was also affected by variability in fixation and staining. Different IHC stains besides Ki-67 have also been assessed for the grading of FL. Zhang et al21 reported that SKP2, another proliferation marker, was positively associated with higher FL grade and could distinguish between FL grades 2 and 3, whereas Ki-67 could not. Llanos et al11 found that Bcl-2 IHC stains were negatively associated with higher FL grade, although they lacked prognostic predictive value. In terms of outcomes, Björck et al30 demonstrated that higher expression of cyclin B1 in FL was positively associated with a better response to cyclophosphamide, hydroxydaunorubicin, vincristine, and prednisone chemotherapy. It is thus possible that, in the future, an automated software grading system could incorporate data from multiple IHC stains to establish tumor grade and also provide additional information that will help guide treatment decisions.

This study shows that image analysis provides an accurate and reproducible means to quantify Ki-67 immunostaining in FL. Existing commercial algorithms can be tuned to achieve PIs that closely agree with those determined by human observers performing manual counts. Computer-assisted quantitation of Ki-67 labeling in FL is analogous to computer-assisted quantitation of Ki-67, estrogen receptor, or progesterone receptor in breast cancer and could be integrated into the clinical workflow in a similar manner. While the patient population in this study is relatively small, these data also demonstrate that there is a statistically significant positive correlation between software-automated Ki-67 PI and both histologic grade and disease progression in FL. Thus, application of this new technology could provide a means by which current difficulties in FL grading are circumvented and may provide improved prognostic information for patients and clinicians.

CME/SAM

Upon completion of this activity you will be able to:

  • describe the criteria used to grade follicular lymphoma.

  • discuss the significance of the Ki-67 proliferation index in follicular lymphoma.

  • explain strategies for software-assisted quantification of immunostains.

The ASCP is accredited by the Accreditation Council for Continuing Medical Education to provide continuing medical education for physicians. The ASCP designates this journal-based CME activity for a maximum of 1 AMA PRA Category 1 Credit™ per article. Physicians should claim only the credit commensurate with the extent of their participation in the activity. This activity qualifies as an American Board of Pathology Maintenance of Certification Part II Self-Assessment Module.

The authors of this article and the planning committee members and staff have no relevant financial relationships with commercial interests to disclose.

Questions appear on p 596. Exam is located at www.ascp.org/ajcpcme.
