Cancer Res Treat. 2025 Jan;57(1):116-125. doi: 10.4143/crt.2024.113.

Molecular Classification of Breast Cancer Using Weakly Supervised Learning

Affiliations
  • 1 Department of Pathology, Korea University Guro Hospital, Korea University College of Medicine, Seoul, Korea
  • 2 Department of Medical and Digital Engineering, Hanyang University College of Engineering, Seoul, Korea
  • 3 Department of Pathology, Korea University Anam Hospital, Korea University College of Medicine, Seoul, Korea
  • 4 Division of Oncology/Hematology, Department of Internal Medicine, Korea University Anam Hospital, Korea University College of Medicine, Seoul, Korea
  • 5 Department of Hospital Pathology, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
  • 6 Artificial Intelligence Center, Korea University Anam Hospital, Korea University College of Medicine, Seoul, Korea
  • 7 Department of Medical Informatics, Korea University College of Medicine, Seoul, Korea

Abstract

Purpose
Molecular classification of breast cancer is crucial for effective treatment. With the emergence of digital pathology, weakly supervised learning on whole-slide images has gained prominence for developing deep learning models because it alleviates the need for extensive manual annotation. In this study, weakly supervised learning was employed to classify the molecular subtypes of breast cancer.
Materials and Methods
We used two whole-slide image datasets: one consisting of breast cancer cases from Korea University Guro Hospital (KG) and the other drawn from The Cancer Genome Atlas (TCGA). We also visualized the inferred results using an attention-based heat map and reviewed the histomorphological features of the most attentive patches.
Results
The KG+TCGA-trained model achieved an area under the receiver operating characteristic curve (AUROC) of 0.749. An inherent challenge was the imbalance among subtypes; in addition, the two datasets differed in their molecular subtype proportions. To mitigate this imbalance, we merged the two datasets, and the resulting model exhibited improved performance. The attentive patches correlated well with widely recognized histomorphologic features. The triple-negative subtype showed a high incidence of high-grade nuclei, tumor necrosis, and intratumoral tumor-infiltrating lymphocytes, whereas the luminal A subtype showed a high incidence of collagen fibers.
Conclusion
The artificial intelligence (AI) model based on weakly supervised learning showed promising performance. A review of the most attentive patches provided insights into the predictions of the AI model. AI models can become invaluable screening tools that reduce costs and workloads in practice.
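The AUROC reported in the Results can be illustrated with a short sketch. The abstract does not state how the multi-class AUROC was averaged, so the macro-averaged one-vs-rest scheme below is an assumption, and the subtype list, labels, and probabilities are invented toy values rather than study data.

```python
from typing import List, Sequence

# Hypothetical class order for illustration only; not the study's label set.
SUBTYPES = ["luminal A", "HER-2", "TNBC"]


def binary_auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outranks a negative.

    Equivalent to the Mann-Whitney U statistic; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auroc(y_true: Sequence[int],
                    probs: Sequence[Sequence[float]],
                    n_classes: int) -> float:
    """Average the one-vs-rest AUROC over all classes (unweighted)."""
    aucs: List[float] = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [p[c] for p in probs]
        aucs.append(binary_auroc(labels, scores))
    return sum(aucs) / n_classes


# Toy slide-level labels and predicted subtype probabilities (made up).
y_true = [0, 0, 1, 1, 2, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
macro_auroc = macro_ovr_auroc(y_true, probs, n_classes=len(SUBTYPES))
print(f"macro one-vs-rest AUROC: {macro_auroc:.4f}")
```

With imbalanced subtypes, an unweighted macro average gives rare classes the same influence as common ones, which is one reason it is often preferred in this setting; a prevalence-weighted average would be an equally plausible reading of the reported value.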

Keywords

Breast neoplasms; Weakly supervised learning; Molecular classification; Computational pathology

Figures

  • Fig. 1. Patient inclusion and exclusion criteria and dataset configurations. To optimize model performance and mitigate class imbalance, the TCGA dataset and the KG dataset (an in-house dataset) were merged. DCIS, ductal carcinoma in situ; KG, breast cancer cases from Korea University Guro Hospital; TCGA, breast cancer cases from The Cancer Genome Atlas.

  • Fig. 2. Representative patches of the recorded histomorphologic features (A, tumor patches; B, non-tumor patches). Tumor patches included high-grade nuclei (1), tumor necrosis (2), intratumoral tumor-infiltrating lymphocytes (TILs) (3), and stromal TILs (4). Non-tumor patches included lymphoid aggregates (5), collagen fibers (6), red blood cells (7), neutrophils (8), and skin (9) (H&E stain, ×200).

  • Fig. 3. Diagrammatic representation of the study design. The input whole-slide images were segmented into multiple instances, each encoded using a pretrained ResNet encoder. The encoded instances were then aggregated and used to classify breast cancer subtypes. The attention score, indicative of the degree of focus, identified the instances most relevant to subtype classification. HER-2, human epidermal growth factor receptor 2; TNBC, triple-negative breast cancer (H&E stain, ×200).

  • Fig. 4. The area under the receiver operating characteristic curve (AUROC) of the artificial intelligence models for each combination of training and test datasets. An upward trend is noted with the merged dataset (KG+TCGA). KG, Korea University Guro Hospital; TCGA, The Cancer Genome Atlas.

  • Fig. 5. Confusion matrices of prediction outputs. The diagonal represents correct predictions, and each confusion matrix has been normalized. HER-2, human epidermal growth factor receptor 2; TNBC, triple-negative breast cancer.

  • Fig. 6. Visualization of attention within a given whole-slide image (WSI). (A) A thumbnail of the WSI. (B) The attention weights attributed to each instance. (C) Magnified views of the instances most significant for the model prediction (H&E stain, ×200).

  • Fig. 7. Representative top attentive patches of the human epidermal growth factor receptor 2 (HER-2) (A) and triple-negative breast cancer (TNBC) (B) subtypes. Both subtypes showed a high frequency of patches with high-grade nuclei, tumor necrosis, and intratumoral tumor-infiltrating lymphocytes (TILs) (H&E stain, ×200).
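The pipeline described in the captions of Fig. 3 and Fig. 6 (patch instances encoded, aggregated with attention, then classified, with the highest-attention patches reviewed) can be sketched as below. This is a minimal illustration of generic attention-based multiple instance pooling, not the authors' implementation: the exact architecture is not given in this text, the random vectors stand in for ResNet patch embeddings, the random weights stand in for trained parameters, and every name and dimension is hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]


def attention_pool(instances, V, w):
    """Pool K instance embeddings into one slide-level embedding.

    score_k = w . tanh(V h_k); attn = softmax(scores); bag = sum_k attn_k h_k
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(v) for v in matvec(V, h)]
        scores.append(sum(wi * hi for wi, hi in zip(w, hidden)))
    attn = softmax(scores)
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(attn, instances)) for d in range(dim)]
    return bag, attn


def classify(bag, W_cls, b_cls):
    """Linear layer plus softmax over hypothetical subtype classes."""
    logits = [z + b for z, b in zip(matvec(W_cls, bag), b_cls)]
    return softmax(logits)


def top_k_instances(attn, k):
    """Indices of the k most attended patches, highest attention first."""
    return sorted(range(len(attn)), key=lambda i: attn[i], reverse=True)[:k]


# Toy setup: all sizes and weights are placeholders, not study values.
rng = random.Random(0)
N_PATCHES, FEAT_DIM, HIDDEN, N_CLASSES = 6, 8, 4, 3
instances = [[rng.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(N_PATCHES)]
V = [[rng.gauss(0, 0.5) for _ in range(FEAT_DIM)] for _ in range(HIDDEN)]
w = [rng.gauss(0, 0.5) for _ in range(HIDDEN)]
W_cls = [[rng.gauss(0, 0.5) for _ in range(FEAT_DIM)] for _ in range(N_CLASSES)]
b_cls = [0.0] * N_CLASSES

bag, attn = attention_pool(instances, V, w)
probs = classify(bag, W_cls, b_cls)
top = top_k_instances(attn, k=2)
print("attention weights:", [round(a, 3) for a in attn])
print("most attended patch indices:", top)
```

Because the attention weights sum to one over the patches of a slide, they can be mapped back to patch coordinates to draw a heat map, and the top-ranked indices point to the patches a pathologist would review.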

