JEEHP : Journal of Educational Evaluation for Health Professions

OPEN ACCESS
2 results for "Europe"
Research articles
GPT-4o’s competency in answering the simulated written European Board of Interventional Radiology exam compared to a medical student and experts in Germany and its ability to generate exam items on interventional radiology: a descriptive study
Sebastian Ebel, Constantin Ehrengut, Timm Denecke, Holger Gößmann, Anne Bettina Beeskow
J Educ Eval Health Prof. 2024;21:21.   Published online August 20, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.21
  • 1,570 View
  • 303 Download
  • 5 Web of Science
  • 6 Crossref
Purpose
This study aimed to determine whether ChatGPT-4o, a generative artificial intelligence (AI) platform, was able to pass a simulated written European Board of Interventional Radiology (EBIR) exam and whether GPT-4o can be used to train medical students and interventional radiologists of different levels of expertise by generating exam items on interventional radiology.
Methods
GPT-4o was asked to answer 370 simulated exam items of the Cardiovascular and Interventional Radiology Society of Europe (CIRSE) for EBIR preparation (CIRSE Prep). Subsequently, GPT-4o was requested to generate exam items on interventional radiology topics at levels of difficulty suitable for medical students and the EBIR exam. Those generated items were answered by 4 participants, including a medical student, a resident, a consultant, and an EBIR holder. The correctly answered items were counted. One investigator checked the answers and items generated by GPT-4o for correctness and relevance. This work was done from April to July 2024.
Results
GPT-4o correctly answered 248 of the 370 CIRSE Prep items (67.0%). For 50 CIRSE Prep items, the medical student answered 46.0%, the resident 42.0%, the consultant 50.0%, and the EBIR holder 74.0% correctly. All participants answered 82.0% to 92.0% of the 50 GPT-4o-generated items at the student level correctly. For the 50 GPT-4o-generated items at the EBIR level, the medical student answered 32.0%, the resident 44.0%, the consultant 48.0%, and the EBIR holder 66.0% correctly. All participants could pass the GPT-4o-generated items at the student level, whereas only the EBIR holder could pass those at the EBIR level. Two (1.3%) of the 150 items generated by GPT-4o were assessed as implausible.
Conclusion
GPT-4o could pass the simulated written EBIR exam and create exam items of varying difficulty to train medical students and interventional radiologists.
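The proportions in the Results can be rechecked from the stated counts. The snippet below is only an illustrative sketch: the helper name `percent` is hypothetical and not part of the study, and it uses only the two raw counts the abstract gives (248 of 370 items answered correctly, and 2 of 150 generated items judged implausible).

```python
def percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts taken from the Results paragraph of the abstract.
gpt4o_cirse = percent(248, 370)   # GPT-4o on the CIRSE Prep items
implausible = percent(2, 150)     # implausible items among those generated

print(f"GPT-4o on CIRSE Prep items: {gpt4o_cirse}%")
print(f"Implausible generated items: {implausible}%")
```

Recomputing from raw counts in this way is a quick consistency check when a reported percentage and its underlying counts appear together.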

Developing a framework for evaluating the impact of Healthcare Improvement Science Education across Europe: a qualitative study  
Manuel Lillo-Crespo, M. Cristina Sierras-Davó, Rhoda MacRae, Kevin Rooney
J Educ Eval Health Prof. 2017;14:28.   Published online November 29, 2017
DOI: https://doi.org/10.3352/jeehp.2017.14.28
  • 35,348 View
  • 433 Download
  • 10 Web of Science
  • 10 Crossref
Purpose
Frontline healthcare professionals are well positioned to improve the systems in which they work. Educational curricula, however, have not always equipped healthcare professionals with the skills or knowledge to implement and evaluate improvements. It is important to have a robust and standardized framework in order to evaluate the impact of such education in terms of improvement, both within and across European countries. The results of such evaluations will enhance the further development and delivery of healthcare improvement science (HIS) education. We aimed to describe the development and piloting of a framework for prospectively evaluating the impact of HIS education and learning.
Methods
The evaluation framework was designed collaboratively and piloted in 7 European countries following a qualitative methodology. The present study used mixed methods to gather data from students and educators. The framework took the Kirkpatrick model of evaluation as a theoretical reference.
Results
The framework was found to be feasible and acceptable for use across differing European higher education contexts according to the pilot study and the participants’ consensus. It can be used effectively to evaluate and develop HIS education across European higher education institutions.
Conclusion
We offer a new evaluation framework to capture the impact of HIS education. The implementation of this tool has the potential to facilitate the continuous development of HIS education.
