Korean J Radiol.  2016 Jun;17(3):339-350. 10.3348/kjr.2016.17.3.339.

How to Develop, Validate, and Compare Clinical Prediction Models Involving Radiological Parameters: Study Design and Statistical Methods

Affiliations
  • 1Department of Radiology and Research Institute of Radiological Science, Severance Hospital, Yonsei University College of Medicine, Seoul 03722, Korea.
  • 2Department of Biostatistics and Medical Informatics, Yonsei University College of Medicine, Seoul 03722, Korea. biostat@yuhs.ac

Abstract

Clinical prediction models estimate, from multiple clinical or non-clinical parameters, the probability that a particular diagnostic or prognostic outcome is present, will occur, or will follow a given course. Radiologic imaging techniques are being developed for accurate detection and early diagnosis of disease, which will eventually affect patient outcomes. Results obtained by radiological means, especially diagnostic imaging, are therefore frequently incorporated into clinical prediction models as important predictive parameters, and doing so may improve model performance in both diagnostic and prognostic settings. This article explains, in a conceptual manner, the overall process of developing and validating a clinical prediction model involving radiological parameters, in relation to study design and statistical methods. It addresses collection of a raw dataset; selection of an appropriate statistical model; predictor selection; evaluation of model performance using a calibration plot, the Hosmer-Lemeshow test, and the c-index; internal and external validation; comparison of different models using the c-index, net reclassification improvement, and integrated discrimination improvement; and a method for creating an easy-to-use prediction score system. This article may serve as a practical methodological reference for clinical researchers.
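As a rough illustration of the model-comparison measures named in the abstract, the sketch below computes the c-index, integrated discrimination improvement (IDI), and net reclassification improvement (NRI) for a binary outcome. It is not taken from the article: the function names and the toy data are hypothetical, the NRI shown is the category-free variant (an assumption, since the abstract does not say which form is meant), and no confidence intervals or tests are computed.

```python
from itertools import product


def split_by_outcome(y, p):
    """Return predicted risks for events (y == 1) and non-events (y == 0)."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    return events, nonevents


def c_index(y, p):
    """Share of (event, non-event) pairs where the event has the higher
    predicted risk; tied pairs count as one half."""
    events, nonevents = split_by_outcome(y, p)
    score = 0.0
    for e, n in product(events, nonevents):
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(nonevents))


def idi(y, p_old, p_new):
    """Change in discrimination slope, where the slope is the mean predicted
    risk in events minus the mean predicted risk in non-events."""
    def slope(p):
        events, nonevents = split_by_outcome(y, p)
        return sum(events) / len(events) - sum(nonevents) / len(nonevents)
    return slope(p_new) - slope(p_old)


def category_free_nri(y, p_old, p_new):
    """Net share of events whose risk moved up plus net share of
    non-events whose risk moved down (assumed category-free form)."""
    ev_up = ev_down = ne_up = ne_down = 0
    n_ev = n_ne = 0
    for yi, old, new in zip(y, p_old, p_new):
        if yi == 1:
            n_ev += 1
            ev_up += new > old
            ev_down += new < old
        else:
            n_ne += 1
            ne_up += new > old
            ne_down += new < old
    return (ev_up - ev_down) / n_ev + (ne_down - ne_up) / n_ne


# Hypothetical toy data: outcome, risks from a baseline model, and risks
# from a model that adds an imaging predictor.
y = [1, 1, 1, 0, 0, 0]
p_old = [0.6, 0.4, 0.5, 0.5, 0.3, 0.2]
p_new = [0.7, 0.6, 0.5, 0.4, 0.3, 0.1]

print("c-index old:", c_index(y, p_old))
print("c-index new:", c_index(y, p_new))
print("IDI:", idi(y, p_old, p_new))
print("NRI:", category_free_nri(y, p_old, p_new))
```

With real data the pairwise loop in `c_index` scales quadratically, so an established statistics package would normally be preferred; the sketch is only meant to make the definitions concrete.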

Keywords

Prediction model; Prognosis; Diagnosis; Patient outcome

MeSH Terms

Area Under Curve
Coronary Artery Disease/*diagnosis/diagnostic imaging/mortality
Humans
Logistic Models
*Models, Statistical
Prognosis
Proportional Hazards Models
ROC Curve
Research Design
Survival Rate

Figure

  • Fig. 1 Calibration plot.

  • Fig. 2 ROC curves for two prediction models. ROC = receiver operating characteristic


Cited by  2 articles

Radiomics and Deep Learning: Hepatic Applications
Hyo Jung Park, Bumwoo Park, Seung Soo Lee
Korean J Radiol. 2020;21(4):387-401.    doi: 10.3348/kjr.2019.0752.

Development and internal validation of a nomogram for predicting outcomes in children with traumatic subdural hematoma
Anukoon Kaewborisutsakul, Thara Tunthanathip
Acute Crit Care. 2022;37(3):429-437.    doi: 10.4266/acc.2021.01795.

