J Korean Med Sci. 2015 Aug;30(8):1025-1034. doi: 10.3346/jkms.2015.30.8.1025.

Computational Discrimination of Breast Cancer for Korean Women Based on Epidemiologic Data Only

Affiliations
  • 1 The Interdisciplinary Program for Bioengineering, Graduate School, Seoul National University, Seoul, Korea.
  • 2 Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, Korea. sungwan@snu.ac.kr
  • 3 Graduate School of Cancer Science and Policy and National Cancer Control Institute, National Cancer Center, Goyang, Korea.
  • 4 Korea Aerospace Research Institute, Daejeon, Korea.
  • 5 Department of Preventive Medicine, Seoul National University College of Medicine, Seoul, Korea.
  • 6 Department of Biomedical Science, Seoul National University Graduate School, Seoul, Korea.
  • 7 Cancer Research Institute, Seoul National University, Seoul, Korea.
  • 8 Department of Mechanical and Aerospace Engineering, Seoul National University College of Engineering, Seoul, Korea.
  • 9 Institute of Advanced Aerospace Technology, Department of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Korea.
  • 10 Institute of Medical and Biological Engineering, Medical Research Center, Seoul National University, Seoul, Korea.

Abstract

Breast cancer is the second most common cancer among Korean women, and its incidence rate has been increasing annually. If early diagnosis could be performed with epidemiologic data alone, women could easily assess their breast cancer risk over the Internet. The National Cancer Institute in the United States has released a web-based Breast Cancer Risk Assessment Tool based on the Gail model. However, this tool cannot be applied directly to Korean women because breast cancer risk depends on race, and it also shows low accuracy (58%-59%). In this study, breast cancer discrimination models for Korean women were developed using only epidemiological case-control data (n = 4,574). The models were built with three classification techniques: support vector machine, artificial neural network, and Bayesian network. Repeated random sub-sampling validation (1,000 repetitions) was performed for each of several parameter conditions. Performance was evaluated and compared using the area under the receiver operating characteristic curve (AUC). For each age group and classification technique, the AUC, accuracy, sensitivity, specificity, and calculation time of every model were calculated and compared. The support vector machine required the longest calculation time but achieved the highest classification performance, with an AUC of 64% for women older than 50 yr. The proposed model relies only on demographic characteristics, reproductive factors, and lifestyle habits, without any clinical or genetic test. The model is expected to be implementable as a web-based discrimination tool for breast cancer. Such a tool could encourage women who may be prone to breast cancer to visit a hospital for diagnostic tests.
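The validation and evaluation steps summarized in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' code: the callback `fit_score` is hypothetical and stands in for any of the three classifiers, and the 30% test fraction, the class-stratified split, and the 0.5 decision threshold are assumptions that the abstract does not state. AUC is computed in its Mann-Whitney form, which equals the area under the empirical ROC curve.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical callback: trains on train_idx, returns one score per index in test_idx.
FitScore = Callable[[List[int], List[int]], List[float]]


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control
    (Mann-Whitney form; ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from 0/1 labels and predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # fraction of cases correctly flagged
        "specificity": tn / (tn + fp),  # fraction of controls correctly cleared
    }


def repeated_subsampling(y: Sequence[int], fit_score: FitScore, repeats: int = 1000,
                         test_fraction: float = 0.3, threshold: float = 0.5,
                         seed: int = 0) -> Dict[str, float]:
    """Average AUC and threshold metrics over `repeats` random train/test splits.

    Each repetition reshuffles, so test sets may overlap across repetitions
    (unlike k-fold cross-validation). The split is stratified by class (an
    assumption) so every test set contains both cases and controls.
    """
    rng = random.Random(seed)
    cases = [i for i, t in enumerate(y) if t == 1]
    controls = [i for i, t in enumerate(y) if t == 0]
    totals = {"auc": 0.0, "accuracy": 0.0, "sensitivity": 0.0, "specificity": 0.0}
    for _ in range(repeats):
        test_idx: List[int] = []
        train_idx: List[int] = []
        for group in (cases, controls):
            shuffled = rng.sample(group, len(group))  # random permutation of the group
            k = max(1, round(len(group) * test_fraction))
            test_idx += shuffled[:k]
            train_idx += shuffled[k:]
        scores = fit_score(train_idx, test_idx)
        y_test = [y[i] for i in test_idx]
        preds = [1 if s >= threshold else 0 for s in scores]
        totals["auc"] += auc(y_test, scores)
        for name, value in confusion_metrics(y_test, preds).items():
            totals[name] += value
    return {name: total / repeats for name, total in totals.items()}
```

With the default `repeats=1000`, a call such as `repeated_subsampling(labels, my_model_scores)` mirrors the 1,000-repetition design; `my_model_scores` is a hypothetical function that would wrap the SVM, ANN, or BN. Stratifying the split keeps sensitivity and specificity defined in every repetition, which matters for a case-control sample.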

Keyword

Breast Neoplasms; Support Vector Machines; Neural Networks; Computers

MeSH Terms

Adult
Aged
Aged, 80 and over
Breast Neoplasms/*diagnosis/*epidemiology
Diagnosis, Computer-Assisted/*methods
Early Detection of Cancer/*methods
Female
Humans
*Machine Learning
Middle Aged
Pattern Recognition, Automated/methods
Prevalence
Reproducibility of Results
Republic of Korea/epidemiology
Risk Assessment/methods
Risk Factors
Sensitivity and Specificity
Women's Health/*statistics & numerical data

Figure

  • Fig. 1 Incidence rates of breast cancer (in 2008): Korean women vs white women in the USA (14,15).

  • Fig. 2 Artificial neural network (ANN) structure. AFFP, age of first full-term pregnancy; NOC, number of children; AOMn, age of menarche; BMI, body mass index; FMH, family medical history of breast cancer; MS, menopausal status; RM, regular mammography; RE, regular exercise; ED, estrogen duration.

  • Fig. 3 Naive structure of a Bayesian network (BN).

  • Fig. 4 Receiver operating characteristic (ROC) curves according to the classification algorithms and age division models. (A) Support Vector Machine (SVM). (B) Artificial Neural Network (ANN). (C) Bayesian Network (BN).

  • Fig. 5 Contribution of individual risk factors to the area under the curve (AUC). AFFP, age of first full-term pregnancy; NOC, number of children; AOMn, age of menarche; BMI, body mass index; FMH, family medical history of breast cancer; MS, menopausal status; RM, regular mammography; RE, regular exercise; ED, estrogen duration; SVM, support vector machine; ANN, artificial neural network; BN, Bayesian network; U50, under 50 yr old group; O50, equal to or over 50 yr old group.
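Figs. 3 to 5 refer to a naive Bayesian network, ROC curves, and per-factor contributions to the AUC. The sketch below ties these together under stated assumptions: in the naive structure the class node is read as the sole parent of every risk-factor node; factors are treated as categorical values; Laplace smoothing (`alpha`) is added; and a factor's contribution is taken to be the drop in AUC when that factor is removed and the model is refit. The captions do not say how Fig. 5 was computed, so the leave-one-factor-out definition is an assumption, and all names here (`NaiveBayesNet`, `factor_contributions`) are hypothetical. For brevity the demo scores the training data itself; a faithful evaluation would score held-out splits.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Sequence[Hashable]


class NaiveBayesNet:
    """Naive BN: the class node is the only parent of every risk-factor node,
    so P(case | x) is proportional to P(case) * prod_i P(x_i | case).
    Both classes (0 = control, 1 = case) must appear in the training data."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing count

    def fit(self, X: Sequence[Row], y: Sequence[int]) -> "NaiveBayesNet":
        self.n = len(y)
        self.class_counts = Counter(y)
        n_feat = len(X[0])
        # counts[c][i][v]: number of class-c rows whose factor i equals v
        self.counts = {c: [Counter() for _ in range(n_feat)] for c in (0, 1)}
        self.values = [set() for _ in range(n_feat)]
        for x, c in zip(X, y):
            for i, v in enumerate(x):
                self.counts[c][i][v] += 1
                self.values[i].add(v)
        return self

    def prob_case(self, x: Row) -> float:
        log_p = {}
        for c in (0, 1):
            lp = math.log(self.class_counts[c] / self.n)
            for i, v in enumerate(x):
                # an unseen value v receives only the smoothed count alpha
                lp += math.log((self.counts[c][i][v] + self.alpha)
                               / (self.class_counts[c] + self.alpha * len(self.values[i])))
            log_p[c] = lp
        m = max(log_p.values())  # normalize in log space for numerical stability
        return math.exp(log_p[1] - m) / sum(math.exp(v - m) for v in log_p.values())


def roc_points(y: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points sweeping the threshold from high to low; ties share one step."""
    ranked = sorted(zip(scores, y), key=lambda p: -p[0])
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    tp = fp = i = 0
    points = [(0.0, 0.0)]
    while i < len(ranked):
        s = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == s:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(y: Sequence[int], scores: Sequence[float]) -> float:
    """Trapezoidal area under the ROC points."""
    pts = roc_points(y, scores)
    return sum((x1 - x0) * (t0 + t1) / 2 for (x0, t0), (x1, t1) in zip(pts, pts[1:]))


def factor_contributions(X: Sequence[Row], y: Sequence[int]) -> Dict[int, float]:
    """AUC with all factors minus AUC with factor k removed (model refit each time)."""
    def auc_with(keep: List[int]) -> float:
        Xk = [[x[i] for i in keep] for x in X]
        model = NaiveBayesNet().fit(Xk, y)
        return auc(y, [model.prob_case(x) for x in Xk])

    all_f = list(range(len(X[0])))
    base = auc_with(all_f)
    return {k: base - auc_with([f for f in all_f if f != k]) for k in all_f}
```

With rows encoded in the Fig. 2 factor order (AFFP, NOC, AOMn, BMI, FMH, MS, RM, RE, ED), `factor_contributions(X, y)` would return one AUC difference per factor index; a larger value means that removing the factor hurts discrimination more. Refitting for every removed factor keeps the comparison fair, because the remaining conditional probabilities are re-estimated rather than reused.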


Reference

1. Shin HR, Joubert C, Boniol M, Hery C, Ahn SH, Won YJ, Nishino Y, Sobue T, Chen CJ, You SL, et al. Recent trends and patterns in breast cancer incidence among Eastern and Southeastern Asian women. Cancer Causes Control. 2010; 21:1777–1785.
2. Survival analysis of Korean breast cancer patients diagnosed between 1993 and 2002 in Korea: a Nationwide Study of the Cancer Registry. J Breast Cancer. 2006; 9:214–229.
3. National Cancer Institute. Breast cancer risk assessment tool. Accessed 8 December 2014. Available at http://www.cancer.gov/bcrisktool/.
4. Gail MH, Brinton LA, Byar DP, Corle DK, Green SB, Schairer C, Mulvihill JJ. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst. 1989; 81:1879–1886.
5. Rockhill B, Spiegelman D, Byrne C, Hunter DJ, Colditz GA. Validation of the Gail et al. model of breast cancer risk prediction and implications for chemoprevention. J Natl Cancer Inst. 2001; 93:358–366.
6. Boyd CR, Tolson MA, Copes WS. Evaluating trauma care: the TRISS method. Trauma Score and the Injury Severity Score. J Trauma. 1987; 27:370–378.
7. Levy SM, Herberman RB, Maluish AM, Schlien B, Lippman M. Prognostic risk assessment in primary breast cancer by behavioral and immunological parameters. Health Psychol. 1985; 4:99–113.
8. Choi JP, Han TH, Park RW. A hybrid bayesian network model for predicting breast cancer prognosis. J Korean Soc Med Inform. 2009; 15:49–57.
9. Kiyan T, Yildirim T. Breast cancer diagnosis using statistical neural networks. IU-JEEE. 2004; 4:1149–1153.
10. Ayer T, Alagoz O, Chhatwal J, Shavlik JW, Kahn CE Jr, Burnside ES. Breast cancer risk estimation with artificial neural networks revisited: discrimination and calibration. Cancer. 2010; 116:3310–3321.
11. Burnside ES, Rubin DL, Fine JP, Shachter RD, Sisney GA, Leung WK. Bayesian network to predict breast cancer risk of mammographic microcalcifications and reduce number of benign biopsy results: initial experience. Radiology. 2006; 240:666–673.
12. Lee SM. Comparisons of predictive modeling techniques for breast cancer in Korean women. J Korean Soc Med Inform. 2008; 14:37–44.
13. Smigal C, Jemal A, Ward E, Cokkinides V, Smith R, Howe HL, Thun M. Trends in breast cancer by race and ethnicity: update 2006. CA Cancer J Clin. 2006; 56:168–183.
14. Centers for Disease Control and Prevention. United States Cancer Statistics: 1999-2011 Cancer Incidence and Mortality Data. Accessed 8 December 2014. Available at www.cdc.gov/uscs.
15. Jung KW, Park S, Kong HJ, Won YJ, Lee JY, Park EC, Lee JS. Cancer statistics in Korea: incidence, mortality, survival, and prevalence in 2008. Cancer Res Treat. 2011; 43:1–11.
16. Park B, Ma SH, Shin A, Chang MC, Choi JY, Kim S, Han W, Noh DY, Ahn SH, Kang D, et al. Korean risk assessment model for breast cancer risk prediction. PLoS One. 2013; 8:e76736.
17. McPherson K, Steel CM, Dixon JM. ABC of breast diseases. Breast cancer-epidemiology, risk factors, and genetics. BMJ. 2000; 321:624–628.
18. Suzuki S, Kojima M, Tokudome S, Mori M, Sakauchi F, Fujino Y, Wakai K, Lin Y, Kikuchi S, Tamakoshi K, et al. Japan Collaborative Cohort Study Group. Effect of physical activity on breast cancer risk: findings of the Japan collaborative cohort study. Cancer Epidemiol Biomarkers Prev. 2008; 17:3396–3401.
19. Won YJ, Sung J, Jung KW, Kong HJ, Park S, Shin HR, Park EC, Ahn YO, Hwang IK, Lee DH, et al. Nationwide cancer incidence in Korea, 2003-2005. Cancer Res Treat. 2009; 41:122–131.
20. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol. 1996; 49:1373–1379.
21. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995; 20:273–297.
22. Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics. 2000; 16:906–914.
23. Rodriguez-Moguel L, Bega-Ramos B. Risk of breast cancer of low differentiation in tumors with estrogen-negative receptors. Ginecol Obstet Mex. 1999; 67:503–507.
24. Polat K, Güneş S. Breast cancer diagnosis using least square support vector machine. Digit Signal Process. 2007; 17:694–701.
25. Hecht-Nielsen R. Theory of the backpropagation neural network. In: Proceedings of the International Joint Conference on Neural Networks; Washington, D.C.: IEEE Press; 1989. p. 593–605.
26. Pearl J. Probabilistic reasoning in intelligent systems: networks of plausible inference. San Mateo, CA: Morgan Kaufmann Publishers Inc.; 1988.
27. Shapiro SS, Wilk MB. An analysis of variance test for normality (complete samples). Biometrika. 1965; 52:591–611.
28. Clemons M, Goss P. Estrogen and the risk of breast cancer. N Engl J Med. 2001; 344:276–285.
29. Park B. Development of sporadic and hereditary breast cancer risk assessment model in Korean women. Seoul: Seoul National University; 2012. Dissertation.
30. Rokach L. Pattern classification using ensemble methods. Danvers, MA: World Scientific Pub. Co.; 2010. Series in Machine Perception and Artificial Intelligence; vol 75.