J Educ Eval Health Prof.  2024;21(1):21. 10.3352/jeehp.2024.21.21.

GPT-4o’s competency in answering the simulated written European Board of Interventional Radiology exam compared to a medical student and experts in Germany and its ability to generate exam items on interventional radiology: a descriptive study

Affiliations
  • 1Department of Diagnostic and Interventional Radiology, University of Leipzig, Leipzig, Germany

Abstract

Purpose
This study aimed to determine whether ChatGPT-4o, a generative artificial intelligence (AI) platform, was able to pass a simulated written European Board of Interventional Radiology (EBIR) exam and whether GPT-4o can be used to train medical students and interventional radiologists of different levels of expertise by generating exam items on interventional radiology.
Methods
GPT-4o was asked to answer 370 simulated exam items of the Cardiovascular and Interventional Radiology Society of Europe (CIRSE) for EBIR preparation (CIRSE Prep). Subsequently, GPT-4o was requested to generate exam items on interventional radiology topics at levels of difficulty suitable for medical students and the EBIR exam. Those generated items were answered by 4 participants, including a medical student, a resident, a consultant, and an EBIR holder. The correctly answered items were counted. One investigator checked the answers and items generated by GPT-4o for correctness and relevance. This work was done from April to July 2024.
Results
GPT-4o correctly answered 248 of the 370 CIRSE Prep items (67.0%). For 50 CIRSE Prep items, the medical student answered 46.0%, the resident 42.0%, the consultant 50.0%, and the EBIR holder 74.0% correctly. All participants answered 82.0% to 92.0% of the 50 GPT-4o generated items at the student level correctly. For the 50 GPT-4o items at the EBIR level, the medical student answered 32.0%, the resident 44.0%, the consultant 48.0%, and the EBIR holder 66.0% correctly. All participants could pass the GPT-4o-generated items for the student level; while the EBIR holder could pass the GPT-4o-generated items for the EBIR level. Two items (0.3%) out of 150 generated by the GPT-4o were assessed as implausible.
Conclusion
GPT-4o could pass the simulated written EBIR exam and create exam items of varying difficulty to train medical students and interventional radiologists.

Keyword

Artificial intelligence; Consultants; Europe; Interventional radiology; Medical students
Full Text Links
  • JEEHP
Actions
Cited
CITED
export Copy
Close
Share
  • Twitter
  • Facebook
Similar articles
Copyright © 2025 by Korean Association of Medical Journal Editors. All rights reserved.     E-mail: koreamed@kamje.or.kr