Background The COVID-19 pandemic has disrupted healthcare systems worldwide, with overwhelmed facilities leading to high morbidity and mortality rates. Deep learning models that predict patient severity can aid in optimizing resource allocation and patient monitoring.
However, conventional models rely on excessive clinical features, which reduces generalizability, and fail to provide real-time severity tracking. This study proposes a sequence-to-sequence (Seq2Seq) deep-learning model for predicting COVID-19 severity using minimal clinical features.
Methods Data from 4,462 patients from two tertiary care hospitals in Daegu, Korea (2020–2022) were used to train the model, with 442 external validation cases collected from the National Institute of Health in Korea. Seq2SeqAttn takes observations of 17 clinical features over at most five days as input and outputs the predicted severity level for up to three days.
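The input and output windows described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing: the function name `make_window`, the zero left-padding, and the mask are hypothetical choices. The output horizon follows the caption of Fig. 1, which lists predictions for days t through t+Lo−1 with Lo = 4; one possible reading is the current day plus up to three days ahead.

```python
from typing import List, Sequence, Tuple

N_FEATURES = 17      # clinical features per day (from the abstract)
MAX_INPUT_DAYS = 5   # Li in Fig. 1: at most five days of observations
MAX_OUTPUT_DAYS = 4  # Lo in Fig. 1: days t .. t+3 (assumed reading)


def make_window(
    daily_features: Sequence[Sequence[float]],
    daily_severity: Sequence[int],
    t: int,
) -> Tuple[List[List[float]], List[int], List[int]]:
    """Build one (input, mask, target) example for current day index t.

    daily_features[d] is the 17-value feature vector observed on day d;
    daily_severity[d] is 1 if severe oxygen treatment was needed on day d.
    """
    # Take up to MAX_INPUT_DAYS days ending at day t (inclusive).
    start = max(0, t - MAX_INPUT_DAYS + 1)
    observed = [list(day) for day in daily_features[start:t + 1]]

    # Assumption: shorter histories are left-padded with zeros and
    # flagged in a mask so a model could ignore the padded steps.
    n_pad = MAX_INPUT_DAYS - len(observed)
    padded = [[0.0] * N_FEATURES for _ in range(n_pad)] + observed
    mask = [0] * n_pad + [1] * len(observed)

    # Targets cover day t onward; the list is shorter near discharge.
    targets = list(daily_severity[t:t + MAX_OUTPUT_DAYS])
    return padded, mask, targets
```

For a patient with three days of data and t = 1, this yields a five-step input with three padded steps and two target labels.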
Results The model achieved a recall of 98% and an area under the receiver operating characteristic curve of 97.6% on validation. Seq2SeqAttn correctly identified severe cases, with lactate dehydrogenase (LDH) levels and neutrophil-to-lymphocyte ratios differing significantly between the severity groups.
Integrated gradients revealed that peripheral oxygen saturation and LDH levels were critical predictors. The model outperformed conventional severity assessment tools, such as the WHO Clinical Progression Scale and National Early Warning Score.
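To illustrate how integrated gradients assign importance to input features, the sketch below applies the method to a toy logistic model. The weights, the two standardized inputs, and the all-zero baseline are invented for illustration and are unrelated to the study's model or data. The path integral is approximated with a midpoint Riemann sum.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def model(x: Sequence[float], w: Sequence[float], bias: float) -> float:
    """Toy severity score: logistic function of a weighted sum."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)


def grad(x: Sequence[float], w: Sequence[float], bias: float) -> List[float]:
    """Analytic gradient of the toy model with respect to its inputs."""
    p = model(x, w, bias)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(
    x: Sequence[float],
    baseline: Sequence[float],
    w: Sequence[float],
    bias: float,
    steps: int = 200,
) -> List[float]:
    """Approximate IG_i = (x_i - x'_i) * integral of df/dx_i along the
    straight path from the baseline x' to the input x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        total = [t + g for t, g in zip(total, grad(point, w, bias))]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]


# Hypothetical standardized inputs: [SpO2, LDH]; values are made up.
w = [-2.0, 1.0]
x = [-1.0, 0.5]
baseline = [0.0, 0.0]
attributions = integrated_gradients(x, baseline, w, bias=0.0)
```

A useful sanity check is the completeness property: the attributions should sum to approximately `model(x) - model(baseline)`.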
Conclusion This study presented a real-time COVID-19 severity prediction model using minimal clinical features. The high accuracy and interpretability of the model demonstrate its potential to improve resource allocation and patient care during pandemics. Future studies should investigate its applicability to other respiratory and infectious diseases.
Fig. 1
Structure of the main model. x represents the collected data, consisting of 17 clinical features. t denotes the current day, and an input sequence of at most Li = 5 days can be fed into the model. The model predicts the requirement for severe oxygen treatment for days t, t+1, …, t+Lo−1 (Lo = 4), denoted as o.
Abbreviation: LSTM, long short-term memory.
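The name Seq2SeqAttn suggests that the decoder attends over the encoder's per-day hidden states. The text here does not specify the attention form, so the sketch below assumes plain dot-product attention. The function names and the toy vectors are hypothetical and stand in for LSTM hidden states.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    encoder_states: Sequence[Sequence[float]],
    decoder_state: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return attention weights over input days and the context vector.

    Assumption: scores are dot products between each encoder state and
    the current decoder state; the actual model may differ.
    """
    scores = [
        sum(h * s for h, s in zip(h_t, decoder_state))
        for h_t in encoder_states
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    # Context vector: attention-weighted sum of the encoder states.
    context = [
        sum(w * h_t[j] for w, h_t in zip(weights, encoder_states))
        for j in range(dim)
    ]
    return weights, context


# Toy example: three input days with 2-dimensional hidden states.
weights, context = attend([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 1.0])
```

In this toy case the third day receives the largest weight because its state aligns best with the decoder state.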
Fig. 2
Clinical variables in external validation data. Six clinical features that differ significantly between the Non-Severe and Severe groups, as defined by our prediction values, are shown. Blue points represent feature values in the Non-Severe group, while red points represent those in the Severe group. The numerical values under the feature names are P-values calculated using the t-test.
Abbreviations: BPM, beats per minute; LDH, lactate dehydrogenase; NLR, neutrophil to lymphocyte ratio; CRP, C-reactive protein; WBC Count, white blood cell count.
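A two-group comparison of this kind can be sketched as below. The caption does not say which t-test variant was used, so Welch's unequal-variance form is assumed here, and the sample values are invented. Only the test statistic and degrees of freedom are computed. Obtaining a P-value requires the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(a), len(b)
    # Per-group squared standard errors, using sample variances (n - 1).
    se2_a = statistics.variance(a) / n_a
    se2_b = statistics.variance(b) / n_b
    t_stat = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Made-up feature values for two groups (not study data).
non_severe = [1.0, 2.0, 3.0, 4.0, 5.0]
severe = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, df = welch_t(non_severe, severe)
```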
Fig. 3
Average feature importance across data. For each data sample in which the model predicts that a patient will require severe oxygen treatment, the contribution of each clinical feature to that prediction is calculated using explainable AI. A higher importance value indicates that the feature is more strongly related to severity.
Abbreviations: SBP, systolic blood pressure; DBP, diastolic blood pressure; SpO2, peripheral oxygen saturation; LDH, lactate dehydrogenase; NLR, neutrophil to lymphocyte ratio; CRP, C-reactive protein; WBC count, white blood cell count.
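The averaging described in this caption can be sketched as follows. It assumes that per-sample attributions are already available as a day-by-feature matrix, that contributions are summed over days, and that the mean is then taken over samples predicted as severe. These aggregation choices and the name `mean_importance` are assumptions, since the caption does not specify them.

```python
from typing import Dict, List, Sequence


def mean_importance(
    attributions: Sequence[Sequence[Sequence[float]]],
    predicted_severe: Sequence[bool],
    feature_names: Sequence[str],
) -> Dict[str, float]:
    """Average per-feature attribution over samples predicted as severe.

    attributions[i][d][j] is the contribution of feature j on day d
    for sample i (e.g. from an attribution method).
    """
    selected = [a for a, flag in zip(attributions, predicted_severe) if flag]
    if not selected:
        return {}
    totals: List[float] = [0.0] * len(feature_names)
    for sample in selected:
        for day in sample:
            for j, value in enumerate(day):
                totals[j] += value  # sum over days within each sample
    return {name: t / len(selected) for name, t in zip(feature_names, totals)}


# Toy attributions for two features; the values are made up.
result = mean_importance(
    attributions=[
        [[0.1, 0.2], [0.3, 0.0]],  # predicted severe
        [[0.2, 0.0]],              # predicted severe
        [[9.0, 9.0]],              # predicted non-severe, ignored
    ],
    predicted_severe=[True, True, False],
    feature_names=["SpO2", "LDH"],
)
```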
Fig. 4
Predicted values of the severity prediction model, National Early Warning Score, and WHO Clinical Progression Scale for a deteriorating patient. From admission to discharge, our model’s average prediction values, the National Early Warning Score, and the WHO Clinical Progression Scale are presented for each day of hospitalization. The color of each point indicates the oxygen treatment level the patient actually received.
Fig. 5
Clinical feature values and feature importance values for the patient in Fig. 5. The patient's clinical feature values are displayed in red if they are higher than the average values of patients in the non-severe group and in blue if they are lower. Red contribution values indicate that the corresponding clinical feature value at that time pushed the severity prediction model toward classifying the patient as severe, while blue contribution values indicate the opposite.
Abbreviations: LDH, lactate dehydrogenase; NLR, neutrophil to lymphocyte ratio; WBC count, white blood cell count; CRP, C-reactive protein; BPM, beats per minute; SpO2, peripheral oxygen saturation.