Korean J Intern Med. 2021 Jul;36(4):845-856. doi: 10.3904/kjim.2020.020.

Application of deep learning to predict advanced neoplasia using big clinical data in colorectal cancer screening of asymptomatic adults

  • 1Division of Gastroenterology, Department of Internal Medicine and Gastrointestinal Cancer Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul, Korea
  • 2Department of Bioinformatics, Soongsil University, Seoul, Korea
  • 3Functional Genome Institute, PDXen Biosystems Inc., Seoul, Korea


We aimed to develop a deep learning model for the prediction of the risk of advanced colorectal neoplasia (ACRN) in asymptomatic adults, based on which colorectal cancer screening could be customized.
We collected data on 26 clinical and laboratory parameters, including age, sex, smoking status, body mass index, complete blood count, blood chemistry, and tumor markers, from 70,336 first-time colonoscopy screening recipients. As a reference, we used a logistic regression (LR) model with nine variables manually selected from the 26 variables. A deep neural network (DNN) model was developed using all 26 variables. The area under the receiver operating characteristic curve (AUC), sensitivity, and specificity of the two models were compared in a randomly split validation group.
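As an illustration of the AUC used to compare the two models, the following minimal sketch computes it as the probability that a randomly chosen participant with ACRN receives a higher risk score than a randomly chosen participant without it (the Mann-Whitney formulation). The function name and the toy labels and scores are hypothetical; this is not the authors' code and uses no study data.

```python
def auc_mann_whitney(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive case has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = ACRN found at colonoscopy, 0 = no ACRN.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
toy_auc = auc_mann_whitney(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of participants, so for a cohort of this size a rank-based implementation from a statistics library would be the practical choice; the sketch only shows what the quantity means.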
In comparison with the LR model (AUC, 0.724; 95% confidence interval [CI], 0.684 to 0.765), the DNN model (AUC, 0.760; 95% CI, 0.724 to 0.795) demonstrated significantly improved performance with respect to the prediction of ACRN (p < 0.001). At a sensitivity of 90%, the specificity significantly increased with the application of the DNN model (41.0%) in comparison with the LR model (26.5%) (p < 0.001), indicating that the colonoscopy workload required to detect the same number of ACRNs could be reduced by 20%.
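The comparison of specificity at a fixed sensitivity of 90% can be sketched as follows: choose the highest score threshold that still flags at least 90% of true ACRN cases, then measure the fraction of non-ACRN participants falling below it. The function name and the toy data are hypothetical, and the sketch does not attempt to reproduce the reported 20% workload reduction, which also depends on ACRN prevalence in the validation group.

```python
import math


def specificity_at_sensitivity(labels, scores, target_sensitivity=0.90):
    """Return (threshold, achieved_sensitivity, specificity) for the highest
    threshold whose sensitivity is at least target_sensitivity.
    A participant is flagged when score >= threshold."""
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Number of positives that must be flagged; the small epsilon guards
    # against floating-point overshoot in the product.
    k = max(1, math.ceil(target_sensitivity * len(pos) - 1e-12))
    threshold = pos[k - 1]
    sensitivity = sum(1 for s in pos if s >= threshold) / len(pos)
    specificity = sum(1 for s in neg if s < threshold) / len(neg)
    return threshold, sensitivity, specificity


# Hypothetical toy data: ten ACRN cases followed by ten non-ACRN cases.
demo_labels = [1] * 10 + [0] * 10
demo_scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.20,
               0.10, 0.15, 0.25, 0.30, 0.35, 0.50, 0.58, 0.62, 0.05, 0.12]
demo_result = specificity_at_sensitivity(demo_labels, demo_scores, 0.90)
```

Applying the same function to the scores of two models on the same validation set gives two specificities at matched sensitivity, which is the kind of comparison reported above for the DNN (41.0%) and LR (26.5%) models.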
Applying a DNN to big clinical data could significantly improve the prediction of ACRN in comparison with the LR model, and may allow further customization of screening by drawing on larger quantities and more varied types of biomedical information.


Colorectal neoplasms; Deep learning; Big data; Risk assessment; Mass screening