Korean J Radiol. 2015 Apr;16(2):286-296. doi: 10.3348/kjr.2015.16.2.286.

Propensity Score Matching: A Conceptual Review for Radiology Researchers

  • 1Department of Clinical Epidemiology and Biostatistics, Asan Medical Center, Seoul 138-736, Korea. hello.hello.hj@gmail.com
  • 2Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 138-736, Korea.
  • 3Department of Radiology, NYU Langone Medical Center, New York, NY 10016, USA.
  • 4Office of Clinical Research Information, Asan Medical Center, Seoul 138-736, Korea.
  • 5Department of Preventive Medicine, University of Ulsan College of Medicine, Seoul 138-736, Korea.


The propensity score is defined as the probability that an individual study subject is assigned to the group of interest for comparison purposes. Propensity score adjustment is a method of achieving an even distribution of confounders between groups, thereby increasing between-group comparability. For this reason, propensity score analysis is increasingly applied in observational studies. The purpose of this article was to provide a step-by-step, nonmathematical, conceptual guide to propensity score analysis, with particular emphasis on propensity score matching. Software code for propensity score matching was also presented.
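The two steps named in the abstract, estimating a propensity score and then matching on it, can be sketched as follows. This is a minimal illustration and not the software code presented in the article: the data are simulated, the covariates are hypothetical, the logistic model is fitted by plain gradient descent, and the matching rule (greedy 1:1 nearest neighbor without replacement, with a caliper of 0.05 on the propensity score scale) is one assumed choice among several.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, t, lr=0.5, epochs=300):
    """Fit a logistic regression by batch gradient descent.

    Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for x, ti in zip(X, t):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) - ti
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def greedy_match(ps, t, caliper):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    A treated subject is paired with the closest unused control only if
    their propensity scores differ by no more than the caliper.
    """
    controls = [i for i, ti in enumerate(t) if ti == 0]
    pairs = []
    for i in (k for k, tk in enumerate(t) if tk == 1):
        if not controls:
            break
        j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
        if abs(ps[j] - ps[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs


# Simulated data (assumption): two standardized covariates that both
# raise the chance of being assigned to the group of interest.
random.seed(0)
n = 300
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
t = [1 if random.random() < sigmoid(-0.7 + 1.0 * x[0] + 0.5 * x[1]) else 0
     for x in X]

w = fit_logistic(X, t)
ps = [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) for x in X]
pairs = greedy_match(ps, t, caliper=0.05)
print(f"{sum(t)} treated, {n - sum(t)} controls, {len(pairs)} matched pairs")
```

Subjects without a sufficiently close counterpart are left unmatched, which is why a matched sample is typically smaller than the original cohort.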


Keywords

Propensity score; Matching; Observational study; Indication bias

MeSH Terms

Middle Aged
*Propensity Score
Research Design
Research Personnel


  • Fig. 1 Distribution of propensity scores. A. Distribution of propensity scores among total study subjects (940 and 470 patients who had liver CT and liver MRI, respectively). B. Distribution of propensity scores after matching for age, gender, body mass index, lesion diameter, and history of cancer (293 pairs of liver CT and liver MRI).

  • Fig. 2 Q-Q plots of each covariate from 2 groups before and after propensity score matching.

  • Fig. 3 Plot of standardized differences in means before and after propensity score matching.
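The standardized difference in means plotted in Fig. 3 can be computed as in the sketch below. It assumes the commonly used definition for a continuous covariate, the difference in group means divided by the square root of the average of the two group variances; the ages are made-up values for illustration, not data from the article.

```python
import math
import statistics


def standardized_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation,
    where the pooled SD is sqrt((var_a + var_b) / 2)."""
    pooled_sd = math.sqrt(
        (statistics.variance(group_a) + statistics.variance(group_b)) / 2
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical ages (assumption) in two groups before and after matching.
ct_before, mri_before = [50, 60, 70], [40, 50, 60]
ct_after, mri_after = [50, 60, 70], [50, 60, 70]

d_before = standardized_difference(ct_before, mri_before)
d_after = standardized_difference(ct_after, mri_after)
print(d_before, d_after)
```

Because the measure does not depend on sample size, it can be compared directly before and after matching; an absolute value below about 0.1 is often taken as a sign of adequate balance, although that threshold is a convention rather than a fixed rule.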

