BACKGROUND: The Acute Physiology and Chronic Health Evaluation (APACHE) IV model has not yet been validated in Korea. The aim of this study was to compare the ability of APACHE IV with that of APACHE II, Simplified Acute Physiology Score (SAPS) 3, and Korean SAPS 3 to predict hospital mortality in a surgical intensive care unit (SICU) population. METHODS: We retrospectively reviewed the electronic medical records of patients admitted to the SICU of a university hospital from March 2011 to February 2012. Discrimination and calibration were assessed using the area under the receiver operating characteristic curve (AUC) and the Hosmer-Lemeshow test, respectively. We calculated the standardized mortality ratio (SMR, actual mortality/predicted mortality) for the four models. RESULTS: The study included 1,314 patients. The hospital mortality rate was 3.3%. The discriminative power of all models was similar and very reliable. The AUCs were 0.80 for APACHE IV, 0.85 for APACHE II, 0.86 for SAPS 3, and 0.86 for Korean SAPS 3. The Hosmer-Lemeshow C and H statistics showed poor calibration for all of the models (P < 0.05). The SMRs of APACHE IV, APACHE II, SAPS 3, and Korean SAPS 3 were 0.21, 0.11 0.23, 0.34, and 0.25, respectively. CONCLUSIONS: APACHE IV showed good discrimination but poor calibration. The overall discrimination and calibration of APACHE IV were similar to those of APACHE II, SAPS 3, and Korean SAPS 3 in this study. A high level of customization is required to improve calibration in this study setting.
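
The three measures named in METHODS can be illustrated with a minimal Python sketch. It is not the study's analysis code. The function names (`smr`, `auc`, `hosmer_lemeshow_c`) and the toy data in the usage lines are hypothetical, and the sketch assumes each patient has a binary outcome (1 = died in hospital) and a model-predicted probability of death. The Hosmer-Lemeshow function returns only the C statistic over equal-size risk groups. Converting it to a P value would need a chi-square distribution, conventionally with (groups - 2) degrees of freedom, which is left out here.

```python
from typing import Sequence


def smr(died: Sequence[int], predicted: Sequence[float]) -> float:
    """Standardized mortality ratio: observed deaths / expected deaths.

    Expected deaths are the sum of the predicted death probabilities.
    """
    expected = sum(predicted)
    if expected == 0:
        raise ValueError("expected number of deaths is zero")
    return sum(died) / expected


def auc(died: Sequence[int], predicted: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen non-survivor received a
    higher predicted risk than a randomly chosen survivor (ties count 0.5).
    """
    pos = [p for d, p in zip(died, predicted) if d == 1]
    neg = [p for d, p in zip(died, predicted) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow_c(
    died: Sequence[int], predicted: Sequence[float], groups: int = 10
) -> float:
    """Hosmer-Lemeshow C statistic over equal-size groups of sorted risk."""
    pairs = sorted(zip(predicted, died))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)
        observed = sum(d for _, d in chunk)
        # Binomial variance of the group: n_g * mean_risk * (1 - mean_risk)
        variance = expected * (1 - expected / len(chunk))
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat


# Hypothetical toy data, not taken from the study.
toy_died = [0, 0, 0, 1, 0, 1, 0, 0, 1, 1]
toy_pred = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.70, 0.90]
print(round(smr(toy_died, toy_pred), 3))
print(round(auc(toy_died, toy_pred), 3))
print(round(hosmer_lemeshow_c(toy_died, toy_pred, groups=2), 4))
```

Under this definition, an SMR below 1 means that fewer patients died than the model predicted. Discrimination (AUC) and calibration (Hosmer-Lemeshow, SMR) are computed independently, which is why a model can rank patients well and still be poorly calibrated.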