Healthc Inform Res. 2023 Jul;29(3):228-238. doi: 10.4258/hir.2023.29.3.228.

Understanding Arteriosclerotic Heart Disease Patients Using Electronic Health Records: A Machine Learning and Shapley Additive exPlanations Approach

Affiliations
  • 1Department of Information Systems, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia
  • 2Department of Cardiology and Vascular Medicine, Faculty of Medicine, Universitas Indonesia/National Cardiovascular Center Harapan Kita, Jakarta, Indonesia
  • 3Riphah Institute of Computing and Applied Sciences, Riphah International University, Raiwind, Lahore, Pakistan
  • 4Mathematics Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia

Abstract


Objectives
The number of deaths from cardiovascular disease is projected to reach 23.3 million by 2030. To contribute to prevention efforts, this paper proposed a machine learning (ML) model to predict patients with arteriosclerotic heart disease (AHD). We also interpreted the prediction model results based on the ML approach and deployed model-agnostic ML methods to identify informative features and their interpretations.
Methods
We used a hematology Electronic Health Record (EHR) with information on erythrocytes, hematocrit, hemoglobin, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, leukocytes, thrombocytes, age, and sex. To detect and predict AHD, we explored random forest (RF), XGBoost, and AdaBoost models. We examined the prediction model results based on the confusion matrix and accuracy measures. We used the Shapley Additive exPlanations (SHAP) framework to interpret the ML model and quantify the contribution of features to predictions.
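The idea behind the SHAP attribution step can be illustrated with a minimal sketch that computes exact Shapley values by enumerating feature coalitions. This is not the SHAP library and not the study's model: the `toy_model` scoring rule, its thresholds, and the `patient` and `baseline` values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one observation x relative to a baseline row."""
    names = list(x)
    n = len(names)

    def value(coalition):
        # Features inside the coalition keep their observed value;
        # all other features are replaced by the baseline value.
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return predict(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_model(row):
    # Hypothetical scoring rule for illustration only (not a clinical model).
    score = 0.0
    if row["hemoglobin"] < 12.0:
        score += 0.3
    if row["age"] > 60:
        score += 0.2
    if row["hemoglobin"] < 12.0 and row["age"] > 60:
        score += 0.1  # interaction term, shared between the two features
    return score


# Hypothetical observation and reference row (sex encoded as female=0, male=1).
patient = {"hemoglobin": 11.0, "age": 67, "sex": 1}
baseline = {"hemoglobin": 14.0, "age": 45, "sex": 0}

phi = shapley_values(toy_model, patient, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the model output for the observation and for the baseline, which is the property that lets a bar plot of per-feature values be read as a decomposition of a single prediction. Exact enumeration grows exponentially with the number of features, so practical tools rely on approximations or model-specific algorithms.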
Results
Our study included data from 6,837 patients, with 4,702 records from patients diagnosed with AHD and 2,135 records from patients without an AHD diagnosis. AdaBoost outperformed RF and XGBoost, achieving an accuracy of 0.78, precision of 0.82, F1-score of 0.85, and recall of 0.88. According to the SHAP summary bar plot method, hemoglobin was the most important attribute for detecting and predicting AHD patients. The SHAP local interpretability bar plot revealed that hemoglobin and mean corpuscular hemoglobin concentration had positive impacts on AHD prediction based on a single observation.
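As a quick consistency check on the reported figures, the sketch below recomputes the F1-score from the stated precision and recall and derives the share of AHD records in the dataset. The helper names and the small confusion-matrix counts are hypothetical and serve only to show how the metrics are defined; they are not the study's confusion matrix.

```python
def metrics_from_confusion(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Record counts reported in the Results section.
n_ahd, n_non_ahd = 4702, 2135
total = n_ahd + n_non_ahd

# Accuracy of always predicting the majority class (AHD) on the full dataset.
majority_share = n_ahd / total

# F1 implied by the reported precision (0.82) and recall (0.88).
implied_f1 = f1_from(0.82, 0.88)

print(total, round(majority_share, 3), round(implied_f1, 3))

# Hypothetical counts, for illustration of the definitions only.
print(metrics_from_confusion(tp=8, fp=2, fn=2, tn=8))
```

The implied F1 rounds to the reported 0.85. The majority-class share is computed over all records, so it is only a rough reference point when reading an accuracy measured on a held-out test split.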
Conclusions
ML models based on real clinical data can be used to predict AHD.

Keyword

Machine Learning, Coronary Artery Disease, Hematology, Supervised Machine Learning

Figure

  • Figure 1 Steps of the proposed research study. EHR: Electronic Health Record, AHD: atherosclerotic heart disease, ROC-AUC: receiver operating characteristic-area under the curve, SHAP: Shapley Additive exPlanations.

  • Figure 2 Confusion matrix for the training data (top) and testing data (bottom).

  • Figure 3 Receiver operating characteristic (ROC) curve and area under the curve (AUC) for (A) random forest, (B) XGBoost, and (C) AdaBoost.

  • Figure 4 Global interpretability: feature importance for the AdaBoost algorithm to detect and predict atherosclerotic heart disease (as visualized by the summary_plot method with plot type bar in the Python library). MCH: mean corpuscular hemoglobin, MCHC: mean corpuscular hemoglobin concentration, SHAP: Shapley Additive exPlanations.

  • Figure 5 Local interpretability for the AdaBoost algorithm to detect and predict atherosclerotic heart disease (as visualized by the bar plot method in the Python library). In data preprocessing, we converted female to 0 and male to 1. MCH: mean corpuscular hemoglobin, MCHC: mean corpuscular hemoglobin concentration, SHAP: Shapley Additive exPlanations.

