Purpose This study aimed to determine whether ChatGPT-4o (GPT-4o), a generative artificial intelligence (AI) platform, could pass a simulated written European Board of Interventional Radiology (EBIR) exam and whether it could be used to train medical students and interventional radiologists of different levels of expertise by generating exam items on interventional radiology.
Methods GPT-4o was asked to answer 370 simulated exam items of the Cardiovascular and Interventional Radiology Society of Europe (CIRSE) for EBIR preparation (CIRSE Prep). Subsequently, GPT-4o was requested to generate exam items on interventional radiology topics at levels of difficulty suitable for medical students and the EBIR exam. Those generated items were answered by 4 participants, including a medical student, a resident, a consultant, and an EBIR holder. The correctly answered items were counted. One investigator checked the answers and items generated by GPT-4o for correctness and relevance. This work was done from April to July 2024.
Results GPT-4o correctly answered 248 of the 370 CIRSE Prep items (67.0%). For 50 CIRSE Prep items, the medical student answered 46.0%, the resident 42.0%, the consultant 50.0%, and the EBIR holder 74.0% correctly. All participants answered 82.0% to 92.0% of the 50 GPT-4o-generated items at the student level correctly. For the 50 GPT-4o-generated items at the EBIR level, the medical student answered 32.0%, the resident 44.0%, the consultant 48.0%, and the EBIR holder 66.0% correctly. All participants could pass the GPT-4o-generated items for the student level, whereas only the EBIR holder could pass the GPT-4o-generated items for the EBIR level. Two items (1.3%) out of 150 generated by GPT-4o were assessed as implausible.
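The reported percentages follow directly from the item counts. As a minimal sketch (the helper names `percent_correct` and `items_correct` are illustrative and not part of the study), the arithmetic can be reproduced as follows:

```python
def percent_correct(correct: int, total: int) -> float:
    """Return the share of correctly answered items as a percentage, to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(correct / total * 100, 1)


def items_correct(percent: float, total: int) -> int:
    """Convert a reported percentage back into a whole number of items."""
    return round(percent * total / 100)


# GPT-4o on the CIRSE Prep items, counts taken from the Results above
print(percent_correct(248, 370))  # 67.0

# EBIR holder on the 50 CIRSE Prep items: 74.0% corresponds to 37 items
print(items_correct(74.0, 50))  # 37
```

Converting percentages back to item counts is only meaningful here because each participant set contained exactly 50 items, so every reported percentage maps to a whole number of items.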
Conclusion GPT-4o could pass the simulated written EBIR exam and create exam items of varying difficulty to train medical students and interventional radiologists.
Purpose Frontline healthcare professionals are well positioned to improve the systems in which they work. Educational curricula, however, have not always equipped healthcare professionals with the skills or knowledge to implement and evaluate improvements. It is important to have a robust and standardized framework in order to evaluate the impact of such education in terms of improvement, both within and across European countries. The results of such evaluations will enhance the further development and delivery of healthcare improvement science (HIS) education. We aimed to describe the development and piloting of a framework for prospectively evaluating the impact of HIS education and learning.
Methods The evaluation framework was designed collaboratively and piloted in 7 European countries following a qualitative methodology. The present study used mixed methods to gather data from students and educators. The framework took the Kirkpatrick model of evaluation as a theoretical reference.
Results The framework was found to be feasible and acceptable for use across differing European higher education contexts according to the pilot study and the participants’ consensus. It can be used effectively to evaluate and develop HIS education across European higher education institutions.
Conclusion We offer a new evaluation framework to capture the impact of HIS education. The implementation of this tool has the potential to facilitate the continuous development of HIS education.