Tuberc Respir Dis. 2023 Jul;86(3):203-215. doi: 10.4046/trd.2022.0048.

Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

Affiliations
  • (1) Division of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
  • (2) Department of Applied Statistics, Yonsei University, Seoul, Republic of Korea
  • (3) Division of Allergy, Department of Internal Medicine, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
  • (4) Cancer Research Institute, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
  • (5) Postech-Catholic Biomedical Engineering Institute, Songeui Multiplex Hall, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea

Abstract

Background
Surgical resection is the standard treatment for early-stage lung cancer. Because postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to compare the performance of linear regression and machine learning models in predicting postoperative lung function.
Methods
We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data; and set III, machine learning models imputing the missing data. Six machine learning models were implemented: the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM). The outcome was the forced expiratory volume in 1 second (FEV1) measured 6 months after surgery. The dataset was split into training and test datasets at a 70:30 ratio, and five-fold cross-validation was used to tune the hyperparameters of the machine learning models. In set III, imputation was performed after the dataset was split. Predictive performance in the three sets was evaluated by R² and mean squared error (MSE).
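The ordering described in the Methods (split 70:30 first, then impute, then tune hyperparameters with five-fold cross-validation on the training data, then score R² and MSE on the held-out test data) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: it assumes a single hypothetical predictor `x`, synthetic data, mean imputation, and a one-predictor ridge regression standing in for the six models. All function names are hypothetical. The key point is that the imputation value is learned from the training split only, so the test split cannot leak into it.

```python
import random
from statistics import mean

def train_test_split(rows, test_frac=0.3, seed=0):
    """Shuffle (x, y) pairs and split them into (train, test), here 70:30."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

def fit_mean_imputer(rows):
    """Learn the fill value from the observed x in the given rows (None = missing)."""
    return mean(x for x, _ in rows if x is not None)

def impute(rows, fill):
    """Replace missing x with the previously learned fill value."""
    return [(fill if x is None else x, y) for x, y in rows]

def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge regression with one predictor; the intercept is not penalized."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    return slope, y_bar - slope * x_bar

def predict(model, xs):
    slope, intercept = model
    return [slope * x + intercept for x in xs]

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - y_bar) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot

def cv_select_alpha(rows, alphas, k=5):
    """Pick the ridge penalty with the lowest mean validation MSE over k folds."""
    folds = [rows[i::k] for i in range(k)]

    def cv_mse(alpha):
        scores = []
        for i in range(k):
            fit_rows = [r for j, f in enumerate(folds) if j != i for r in f]
            model = fit_ridge_1d([x for x, _ in fit_rows], [y for _, y in fit_rows], alpha)
            val = folds[i]
            scores.append(mse([y for _, y in val], predict(model, [x for x, _ in val])))
        return mean(scores)

    return min(alphas, key=cv_mse)

# Hypothetical synthetic data: x stands in for a preoperative predictor, y for the outcome.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(40, 120)
    y = 0.8 * x + rng.gauss(0, 10)
    data.append((None if rng.random() < 0.1 else x, y))  # about 10% of x missing

train, test = train_test_split(data)   # 1) split first
fill = fit_mean_imputer(train)         # 2) learn the imputation from the training split only
train, test = impute(train, fill), impute(test, fill)
# 3) tune alpha by 5-fold CV on the training split (for brevity, the imputer is
#    fit once on the whole training split rather than re-fit inside each fold)
alpha = cv_select_alpha(train, alphas=[0.0, 1.0, 10.0, 100.0])
model = fit_ridge_1d([x for x, _ in train], [y for _, y in train], alpha)
# 4) evaluate once on the untouched test split
y_test = [y for _, y in test]
y_hat = predict(model, [x for x, _ in test])
print(f"alpha={alpha}  R2={r2(y_test, y_hat):.2f}  MSE={mse(y_test, y_hat):.2f}")
```

Fitting the imputer before the split would let test-set values influence the fill value and could make test performance look optimistic; splitting first keeps the test evaluation honest.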
Results
A total of 1,487 patients were included in sets I and III, and 896 patients were included in set II. In set I, the R² value was 0.27. In set II, LightGBM was the best model, with the highest R² (0.5) and the lowest MSE (154.95). In set III, LightGBM was again the best model, with the highest R² (0.56) and the lowest MSE (174.07).
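Set III has both a higher R² and a higher MSE than set II, which is not contradictory because the two sets are evaluated on different test data. As a rough worked check, assume that R² was computed on the same test set as the MSE, as $R^2 = 1 - \mathrm{SS}_{\mathrm{res}}/\mathrm{SS}_{\mathrm{tot}}$ (an assumption; the abstract does not state the formula). The implied variance of the outcome in each test set and the root-mean-squared error are then approximately as follows (R² = 0.5 is rounded in the abstract, so these values are approximate):

```latex
R^2 = 1 - \frac{\mathrm{MSE}}{s_y^2}
\quad\Longrightarrow\quad
s_y^2 = \frac{\mathrm{MSE}}{1 - R^2}

\text{Set II: } s_y^2 \approx \frac{154.95}{1 - 0.5} \approx 309.9,
\qquad \mathrm{RMSE} = \sqrt{154.95} \approx 12.45

\text{Set III: } s_y^2 \approx \frac{174.07}{1 - 0.56} \approx 395.6,
\qquad \mathrm{RMSE} = \sqrt{174.07} \approx 13.19
```

Under this assumption, the set III test outcomes are more spread out, so a somewhat larger absolute error can still correspond to a larger share of explained variance.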
Conclusion
Among the models evaluated, LightGBM showed the best performance in predicting postoperative lung function.

Keywords

Lung Cancer; Chronic Obstructive Pulmonary Disease; Postoperative Lung Function; Linear Regression; Machine Learning