Korean J Healthc Assoc Infect Control Prev. 2024 Dec;29(2):146-154. doi: 10.14192/kjicp.2024.29.2.146.

Seq2Seq Deep Learning Architecture Based COVID-19 Infected Patient Severity Prediction Using Electronic Health Records

Affiliations
  • 1School of Computer Science and Engineering, College of IT Engineering, Kyungpook National University, Daegu, Korea
  • 2Department of Internal Medicine, School of Medicine, Kyungpook National University, Daegu, Korea

Abstract

Background
The COVID-19 pandemic has disrupted healthcare systems worldwide, and overwhelmed facilities have contributed to high morbidity and mortality rates. Deep learning models that predict patient severity can help optimize resource allocation and patient monitoring. However, conventional models rely on an excessive number of clinical features, which reduces their generalizability, and they do not provide real-time severity tracking. This study proposes a sequence-to-sequence (Seq2Seq) deep learning model for predicting COVID-19 severity from a minimal set of clinical features.
Methods
Data from 4,462 patients at two tertiary care hospitals in Daegu, Korea (2020–2022) were used to train the model, and 442 external validation cases were collected from the National Institute of Health in Korea. The proposed model, Seq2SeqAttn, takes as input up to five days of observations of 17 clinical features and outputs predicted severity levels for the current day and up to three following days.
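The input/output framing above (at most five observed days in, severity for day t and the next three days out, i.e., Li = 5 and Lo = 4 in Fig. 1) can be sketched as a sliding-window dataset builder. This is a minimal sketch under assumptions the abstract does not state: each patient is a list of daily 17-feature vectors with one binary label per day (1 = severe oxygen treatment required), histories shorter than five days are left-padded with zeros and masked, and horizons past the last recorded day are masked out. The names `make_windows`, `Window`, `IN_LEN`, and `OUT_LEN` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

IN_LEN = 5    # Li in Fig. 1: maximum number of observed days (assumed fixed window)
OUT_LEN = 4   # Lo in Fig. 1: predicted days t, t+1, t+2, t+3
N_FEATURES = 17


@dataclass
class Window:
    inputs: List[List[float]]  # IN_LEN x N_FEATURES, left-padded with zeros
    input_mask: List[int]      # 1 = observed day, 0 = padding
    targets: List[int]         # OUT_LEN binary labels (0 where masked)
    target_mask: List[int]     # 1 = label exists, 0 = beyond the last recorded day


def make_windows(days: Sequence[Sequence[float]], labels: Sequence[int]) -> List[Window]:
    """Build one (input, target) window per hospital day t of a single patient."""
    if len(days) != len(labels):
        raise ValueError("one label per day is required")
    windows = []
    for t in range(len(days)):
        # Observed days up to and including t, at most IN_LEN of them
        observed = [list(d) for d in days[max(0, t - IN_LEN + 1):t + 1]]
        pad = IN_LEN - len(observed)
        inputs = [[0.0] * N_FEATURES for _ in range(pad)] + observed
        # Labels for t .. t + OUT_LEN - 1, truncated at the end of the stay
        future = list(labels[t:t + OUT_LEN])
        missing = OUT_LEN - len(future)
        windows.append(Window(
            inputs=inputs,
            input_mask=[0] * pad + [1] * len(observed),
            targets=future + [0] * missing,
            target_mask=[1] * len(future) + [0] * missing,
        ))
    return windows


# Usage: a hypothetical 6-day stay; every feature is set to the day index for readability
stay = [[float(day)] * N_FEATURES for day in range(6)]
windows = make_windows(stay, [0, 0, 1, 1, 1, 0])
print(windows[3].targets, windows[3].target_mask)  # [1, 1, 0, 0] [1, 1, 1, 0]
```

Masking rather than dropping short histories lets a patient contribute a prediction from the first hospital day onward, which matches the real-time use described in the abstract; whether the study pads this way is not stated.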
Results
The model achieved a recall of 98% and an area under the receiver operating characteristic curve (AUROC) of 97.6% on the validation data. Seq2SeqAttn correctly identified severe cases, and lactate dehydrogenase (LDH) levels and neutrophil-to-lymphocyte ratios (NLR) differed significantly between the predicted severity groups. Integrated gradients revealed that peripheral oxygen saturation (SpO2) and LDH levels were critical predictors. The model outperformed conventional severity assessment tools, such as the WHO Clinical Progression Scale and the National Early Warning Score.
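The two headline metrics can be illustrated with a short, dependency-free sketch. Recall is the fraction of truly severe cases that the model flags; the area under the ROC curve equals the probability that a randomly chosen severe case receives a higher score than a randomly chosen non-severe case, with ties counted as one half. The 0.5 decision threshold, the helper names, and the toy labels and scores are assumptions for illustration; the abstract does not state the threshold used, and these are not study data.

```python
from typing import Sequence


def recall(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """True positives / actual positives, using a hypothetical decision threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    positives = sum(1 for y in labels if y == 1)
    return tp / positives


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUROC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Usage with made-up labels (1 = severe) and model scores
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(recall(labels, scores))  # 2 of 3 severe cases flagged
print(auroc(labels, scores))   # 11 of 12 positive-negative pairs ranked correctly
```

The pairwise loop is O(n_pos × n_neg); for a few hundred validation cases that is cheap, while larger sets would normally use a rank-based formula or a library routine such as scikit-learn's `roc_auc_score`.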
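Integrated gradients attributes a prediction F(x) to each input feature by accumulating the gradient along the straight path from a baseline x' to the input x: IG_i(x) = (x_i - x'_i) × ∫₀¹ ∂F(x' + α(x - x'))/∂x_i dα. The sketch below approximates the integral with a midpoint Riemann sum on a toy logistic model whose gradient is known in closed form. The three-feature model, its weights, the all-zero baseline, and the step count are hypothetical; in the study the attributions are computed for the trained Seq2SeqAttn network.

```python
import math
from typing import List, Sequence

# Hypothetical toy model: F(x) = sigmoid(w . x + b)
WEIGHTS = [1.5, -2.0, 0.5]
BIAS = -0.25


def model(x: Sequence[float]) -> float:
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x: Sequence[float]) -> List[float]:
    # d sigmoid(z) / d x_i = sigmoid(z) * (1 - sigmoid(z)) * w_i
    p = model(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


def integrated_gradients(x: Sequence[float], baseline: Sequence[float],
                         steps: int = 200) -> List[float]:
    """Midpoint-rule approximation of the path integral of gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(model_grad(point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


x = [0.8, -0.3, 1.2]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)
# Completeness check: attributions should sum to F(x) - F(baseline)
print(sum(attributions), model(x) - model(baseline))
```

The completeness property printed on the last line is what makes the attributions comparable across features; with a real network the gradients would come from automatic differentiation rather than a closed-form expression.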
Conclusion
This study presents a real-time COVID-19 severity prediction model that uses a minimal set of clinical features. The model's high accuracy and interpretability demonstrate its potential to improve resource allocation and patient care during pandemics. Future studies should investigate its applicability to other respiratory and infectious diseases.

Keywords

COVID-19; Clinical prediction rule; Disease severity; Deep learning

Figures

  • Fig. 1 Structure of the main model. x represents the collected data, consisting of 17 clinical features. t denotes the current day, and an input sequence of at most Li = 5 days can be fed into the model. The model predicts the requirement for severe oxygen treatment on days t, t+1, …, t+Lo−1 (Lo = 4), denoted as o. Abbreviation: LSTM, long short-term memory.

  • Fig. 2 Clinical variables in the external validation data. The six clinical features that differ significantly between the Non-Severe and Severe groups, as defined by the model's predicted values, are shown. Blue points represent feature values in the Non-Severe group, and red points represent those in the Severe group. The numbers under the feature names are P-values calculated using the t-test. Abbreviations: BPM, beats per minute; LDH, lactate dehydrogenase; NLR, neutrophil to lymphocyte ratio; CRP, C-reactive protein; WBC Count, white blood cell count.

  • Fig. 3 Average feature importance across the data. For each sample in which the model predicts that a patient will require severe oxygen treatment, the contribution of each clinical feature to that prediction is calculated using explainable AI. A higher importance value indicates that the feature is more strongly related to severity. Abbreviations: SBP, systolic blood pressure; DBP, diastolic blood pressure; SpO2, peripheral oxygen saturation; LDH, lactate dehydrogenase; NLR, neutrophil to lymphocyte ratio; CRP, C-reactive protein; WBC count, white blood cell count.

  • Fig. 4 Predicted values of the severity prediction model, the National Early Warning Score, and the WHO Clinical Progression Scale for a deteriorating patient. From admission to discharge, the model's average predicted values, the National Early Warning Score, and the WHO Clinical Progression Scale are shown for each day of hospitalization. The color of each point indicates the oxygen treatment level the patient actually received.

  • Fig. 5 Clinical feature values and feature importance values for the patient in Fig. 4. The patient's clinical feature values are displayed in red if they are higher than the average values of patients in the non-severe group and in blue if they are lower. Red contribution values indicate that the corresponding clinical feature value at that time pushed the severity prediction model toward classifying the patient as severe, while blue contribution values indicate the opposite. Abbreviations: LDH, lactate dehydrogenase; NLR, neutrophil to lymphocyte ratio; WBC count, white blood cell count; CRP, C-reactive protein; BPM, beats per minute; SpO2, peripheral oxygen saturation.
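Fig. 1 shows an LSTM encoder-decoder, and the name Seq2SeqAttn indicates that the decoder attends over the encoder's daily hidden states. The attention scoring function is not specified here, so the sketch below uses plain dot-product attention as a stand-in (the additive attention of Bahdanau et al. would replace the dot product with a small feed-forward scorer). Padded days, as in a history shorter than Li = 5, receive zero weight through a mask. The hidden-state values, their dimension, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state: Sequence[float],
                  encoder_states: Sequence[Sequence[float]],
                  mask: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Attention weights over input days and the resulting context vector.

    mask[i] == 0 marks a padded day; at least one day must be unmasked.
    """
    scores = [
        sum(d * e for d, e in zip(decoder_state, h)) if keep else -math.inf
        for h, keep in zip(encoder_states, mask)
    ]
    weights = softmax(scores)  # padded days get exp(-inf) = 0 weight
    dim = len(encoder_states[0])
    context = [sum(w * h[j] for w, h in zip(weights, encoder_states)) for j in range(dim)]
    return weights, context


# Usage: hypothetical 2-d encoder states for Li = 5 days, the first two being padding
encoder_states = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = dot_attention([2.0, 0.0], encoder_states, mask=[0, 0, 1, 1, 1])
print([round(w, 3) for w in weights])
```

At each decoding step the context vector would be combined with the decoder state to produce the severity output for that day; how the study combines them is not described in this abstract.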
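Fig. 2 reports P-values from a t-test between the predicted Non-Severe and Severe groups. The caption does not say whether equal variances were assumed, so the sketch below computes Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom using only the standard library. For the two-sided P-value it falls back on a normal approximation, which is only reasonable when the degrees of freedom are large; `scipy.stats.ttest_ind(a, b, equal_var=False)` gives the exact Student-t value. The LDH values are made up.

```python
import math
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (t statistic, Welch-Satterthwaite df, approximate two-sided P-value)."""
    va = variance(a) / len(a)  # squared standard error of mean(a), sample variance
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Normal approximation to the t distribution: adequate only for large df
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Hypothetical LDH values (U/L) for two small groups
severe = [420.0, 510.0, 388.0, 615.0, 470.0]
non_severe = [210.0, 245.0, 198.0, 260.0, 230.0, 222.0]
t, df, p = welch_t(severe, non_severe)
print(f"t = {t:.2f}, df = {df:.1f}, approximate P = {p:.2g}")
```

With groups as small as these the normal approximation understates the P-value, so the exact SciPy routine is preferable; with the 442 validation cases the approximation and the exact value would be much closer.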
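Because the model in Fig. 1 outputs predictions for day t and the following three days each time it runs, a given hospital day can be forecast up to four times: once on the day itself and once each by the runs one, two, and three days earlier. One plausible reading of the "average prediction values" in Fig. 4, which the caption does not spell out, is the mean of all forecasts that target the same day. The sketch below implements that assumption; the function name and the probabilities are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def average_daily_prediction(forecasts: Sequence[Sequence[float]]) -> Dict[int, float]:
    """Average all forecasts that target the same hospital day.

    forecasts[t][k] is the predicted probability of severe oxygen treatment on
    day t + k, produced by the run on day t (k = 0 .. Lo - 1). Forecasts that
    land after the last hospital day are ignored.
    """
    n_days = len(forecasts)  # assumes exactly one run per hospital day
    buckets: Dict[int, List[float]] = defaultdict(list)
    for t, horizon_probs in enumerate(forecasts):
        for k, prob in enumerate(horizon_probs):
            if t + k < n_days:
                buckets[t + k].append(prob)
    return {day: sum(probs) / len(probs) for day, probs in sorted(buckets.items())}


# Usage: hypothetical outputs for a 3-day stay, four horizons per daily run
runs = [
    [0.10, 0.20, 0.40, 0.50],  # run on day 0: days 0..3
    [0.30, 0.50, 0.60, 0.70],  # run on day 1: days 1..4
    [0.60, 0.80, 0.90, 0.90],  # run on day 2: days 2..5
]
daily = average_daily_prediction(runs)
print(daily)
```

Averaging overlapping horizons smooths day-to-day fluctuations in the curve while still letting earlier runs signal deterioration before it occurs, which is the behavior Fig. 4 contrasts with the National Early Warning Score and WHO Clinical Progression Scale.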
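The color rule in the Fig. 5 caption compares each of the patient's feature values with the non-severe group's mean and colors each attribution by its sign. A minimal sketch of that rule follows. The feature names come from the caption's abbreviations, but every number and the helper name `color_code` are hypothetical; values exactly equal to the mean and zero attributions are not covered by the caption and are labeled "equal" and "neutral" here.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def color_code(patient: Dict[str, float],
               attributions: Dict[str, float],
               non_severe: Sequence[Dict[str, float]]) -> Dict[str, Tuple[str, str]]:
    """Map each feature to (value color, contribution color)."""
    result = {}
    for name, value in patient.items():
        reference = mean(row[name] for row in non_severe)  # non-severe group average
        if value > reference:
            value_color = "red"
        elif value < reference:
            value_color = "blue"
        else:
            value_color = "equal"
        # A positive attribution pushes the model toward "severe"
        score = attributions[name]
        contrib_color = "red" if score > 0 else "blue" if score < 0 else "neutral"
        result[name] = (value_color, contrib_color)
    return result


# Usage with hypothetical values for one hospital day
non_severe_rows = [{"SpO2": 97.0, "LDH": 220.0}, {"SpO2": 96.0, "LDH": 250.0}]
patient_day = {"SpO2": 91.0, "LDH": 480.0}
attributions = {"SpO2": 0.31, "LDH": 0.22}
print(color_code(patient_day, attributions, non_severe_rows))
# {'SpO2': ('blue', 'red'), 'LDH': ('red', 'red')}
```

Note that the two colors answer different questions: a low SpO2 is shown in blue as a value but red as a contribution, because a below-average oxygen saturation pushes the prediction toward severe.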

