Healthc Inform Res. 2020 Oct;26(4):284-294. doi: 10.4258/hir.2020.26.4.284.

Mortality Prediction from Hospital-Acquired Infections in Trauma Patients Using an Unbalanced Dataset

Affiliations
  • 1. School of Management & Information Sciences, Shiraz University of Medical Sciences, Shiraz, Iran
  • 2. Trauma Research Center, Shahid Rajaee (Emtiaz) Trauma Hospital, Shiraz University of Medical Sciences, Shiraz, Iran
  • 3. Department of Computer Science, Laurentian University, Sudbury, Canada

Abstract


Objectives
Machine learning is widely used to predict diseases and to derive useful knowledge in the healthcare domain. Our objective was to predict in-hospital mortality from hospital-acquired infections in trauma patients using an unbalanced dataset.
Methods
Our study was a cross-sectional analysis of trauma patients with hospital-acquired infections who were admitted to Shiraz Trauma Hospital from March 20, 2017, to March 21, 2018. The study data were obtained from the hospital infection surveillance database. The data included sex, age, mechanism of injury, body region injured, severity score, type of intervention, infection day after admission, and the microorganisms causing the infections. We developed mortality prediction models for trauma patients with hospital-acquired infections using random under-sampling, random over-sampling, clustering (k-means)-C5.0, SMOTE-C5.0, ADASYN-C5.0, SMOTE-SVM, ADASYN-SVM, SMOTE-ANN, and ADASYN-ANN. All mortality predictions were conducted with IBM SPSS Modeler 18.
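The resampling ideas named above can be illustrated with a minimal sketch. It is not the authors' implementation (the study used IBM SPSS Modeler 18); the function names, the two toy features (age, severity score), and all values are hypothetical. The `smote_like` helper follows the general SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours; ADASYN differs mainly in generating more synthetic points around minority samples that are harder to learn, which is not shown here.

```python
import math
import random


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(g) for g in groups.values())
    return [(row, label)
            for label, g in groups.items()
            for row in rng.sample(g, target)]


def random_over_sample(rows, labels, seed=0):
    """Randomly duplicate rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(g) for g in groups.values())
    balanced = []
    for label, g in groups.items():
        extra = [rng.choice(g) for _ in range(target - len(g))]
        balanced.extend((row, label) for row in g + extra)
    return balanced


def smote_like(minority, n_new, k=2, seed=0):
    """Create synthetic minority points by interpolating between a sample
    and one of its k nearest minority neighbours (needs >= 2 samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> partner
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, partner)))
    return synthetic


# Hypothetical toy data: (age, severity score); NOT taken from the study.
rows = [(25, 9), (31, 12), (40, 16), (52, 20), (47, 14), (36, 10),
        (60, 27), (71, 34), (66, 30)]
labels = ["survived"] * 6 + ["died"] * 3

under = random_under_sample(rows, labels)   # 3 survived + 3 died
over = random_over_sample(rows, labels)     # 6 survived + 6 died
died_rows = [r for r, lab in zip(rows, labels) if lab == "died"]
synthetic = smote_like(died_rows, n_new=3)  # 3 new "died" points
```

Under-sampling discards majority-class information, while over-sampling and SMOTE-style interpolation keep all original rows at the cost of adding duplicated or synthetic ones.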
Results
We studied 549 individuals with hospital-acquired infections in a trauma hospital in Shiraz during 2017 and 2018. Prediction accuracy before balancing of the dataset was 86.16%. In contrast, the prediction accuracy for the balanced dataset achieved by random under-sampling, random over-sampling, clustering (k-means)-C5.0, SMOTE-C5.0, ADASYN-C5.0, and SMOTE-SVM was 70.69%, 94.74%, 93.02%, 93.66%, 90.93%, and 100%, respectively.
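Accuracy alone can be misleading on an unbalanced dataset, which is one reason to read the pre-balancing figure with care. The sketch below uses hypothetical counts (85 survivors and 15 deaths, not the study's class distribution) and a trivial classifier that always predicts the majority class; the function name and labels are assumptions made for illustration.

```python
def confusion_metrics(y_true, y_pred, positive="died"):
    """Compute accuracy, sensitivity, specificity, and balanced accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical unbalanced outcome labels; NOT the study's data.
y_true = ["survived"] * 85 + ["died"] * 15
# A classifier that ignores the minority class entirely.
y_pred = ["survived"] * 100

metrics = confusion_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the classifier is right for 85 of 100 patients yet detects no deaths, so sensitivity and balanced accuracy expose a failure that plain accuracy hides.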
Conclusions
Our findings demonstrate that cleaning an unbalanced dataset increases the accuracy of the classification model. In addition, mortality prediction with a clustered under-sampling approach was more precise than with random under-sampling or random over-sampling.
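A clustered under-sampling approach can be sketched as follows, assuming one common variant: the majority class is grouped with k-means, and retained rows are then drawn in turn from each cluster so that every region of the majority class stays represented. This is an illustration only; the exact procedure used in SPSS Modeler may differ, and the function names, parameters, and toy values are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns the list of clusters (lists of points)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters


def cluster_under_sample(majority, n_keep, k=3, seed=0):
    """Keep n_keep majority rows, drawn round-robin from k-means clusters."""
    rng = random.Random(seed)
    clusters = [list(c) for c in kmeans(majority, k, seed=seed) if c]
    for cluster in clusters:
        rng.shuffle(cluster)
    kept = []
    while len(kept) < n_keep and any(clusters):
        for cluster in clusters:
            if cluster and len(kept) < n_keep:
                kept.append(cluster.pop())
    return kept


# Hypothetical majority-class rows: (age, severity score); NOT study data.
majority = [(22, 8), (25, 9), (27, 10), (45, 16), (48, 17),
            (50, 18), (70, 12), (73, 13)]
kept = cluster_under_sample(majority, n_keep=3)  # match 3 minority rows
```

Compared with purely random under-sampling, drawing from each cluster reduces the chance that an entire subgroup of the majority class is discarded.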

Keywords

Machine Learning, Mortality, Injuries, Healthcare Associated Infections, Data Mining, Decision Tree, C5.0
