
JEEHP : Journal of Educational Evaluation for Health Professions



Search results for "Urology": 2 articles
Research article
Performance of GPT-3.5 and GPT-4 on standardized urology knowledge assessment items in the United States: a descriptive study
Max Samuel Yudovich, Elizaveta Makarova, Christian Michael Hague, Jay Dilip Raman
J Educ Eval Health Prof. 2024;21:17.   Published online July 8, 2024
DOI:    [Epub ahead of print]
Abstract
This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) with respect to standardized urology multiple-choice items in the United States.
In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized based on topic and question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024.
GPT-4 answered 44.4% of items correctly, compared with 30.9% for GPT-3.5 (P<0.0001). GPT-4 (vs. GPT-3.5) had higher accuracy on urologic oncology (43.8% vs. 33.9%, P=0.03), sexual medicine (44.3% vs. 27.8%, P=0.046), and pediatric urology (47.1% vs. 27.1%, P=0.012) items. Endourology (38.0% vs. 25.7%, P=0.15), reconstruction and trauma (29.0% vs. 21.0%, P=0.41), and neurourology (49.0% vs. 33.3%, P=0.11) items showed no significant difference between versions. GPT-4 was also more accurate on recall (45.9% vs. 27.4%, P<0.00001), interpretation (45.6% vs. 31.5%, P=0.0005), and problem-solving (41.8% vs. 34.5%, P=0.56) items, although the difference was not significant for the highest-complexity (problem-solving) items.
ChatGPT performs relatively poorly on standardized multiple-choice urology board exam-style items, with GPT-4 outperforming GPT-3.5. The accuracy of both versions was below the proposed minimum passing standard (60%) for the American Board of Urology’s Continuing Urologic Certification knowledge reinforcement activity. As artificial intelligence grows more sophisticated, ChatGPT may become more capable and accurate on board examination items. For now, its responses should be scrutinized carefully.
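The overall comparison reported above (44.4% vs. 30.9% of 700 items) can be checked approximately with a pooled two-proportion z-test. The abstract does not say which test the authors used, so this is a sketch under stated assumptions, not their method: the counts 311 and 216 are back-calculated from the rounded percentages, and the function name `two_proportion_z_test` is hypothetical. Because both models answered the same items, a paired test such as McNemar's could account for that pairing, but it needs per-item agreement counts that the abstract does not report.

```python
import math


def two_proportion_z_test(correct_a: int, correct_b: int, n: int) -> tuple[float, float]:
    """Pooled two-proportion z-test for two accuracy rates, each out of n items.

    Assumption: the two sets of answers are treated as independent samples,
    even though both models answered the same items.
    Returns (z statistic, two-sided P value).
    """
    p_a = correct_a / n
    p_b = correct_b / n
    # Pooled proportion under the null hypothesis of equal accuracy
    pooled = (correct_a + correct_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p_a - p_b) / se
    # Two-sided P value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, back-calculated from the reported percentages of 700 items
N_ITEMS = 700
gpt4_correct = round(0.444 * N_ITEMS)   # about 311 items
gpt35_correct = round(0.309 * N_ITEMS)  # about 216 items

z, p = two_proportion_z_test(gpt4_correct, gpt35_correct, N_ITEMS)
print(f"z = {z:.2f}, two-sided P = {p:.1e}")
```

With these approximate counts, z is roughly 5.2 and the two-sided P value falls well below 0.0001, which agrees with the reported overall difference.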
Brief report
Training and implementation of handheld ultrasound technology at Georgetown Public Hospital Corporation in Guyana: a virtual learning cohort study  
Michelle Bui, Adrian Fernandez, Budheshwar Ramsukh, Onika Noel, Chris Prashad, David Bayne
J Educ Eval Health Prof. 2023;20:11.   Published online April 4, 2023
Abstract
A virtual point-of-care ultrasound (POCUS) education program was initiated to introduce handheld ultrasound technology to Georgetown Public Hospital Corporation in Guyana, a low-resource setting. We studied ultrasound competency and participant satisfaction in a cohort of 20 physicians-in-training in the urology clinic. The program consisted of a training phase, in which participants learned to use the Butterfly iQ ultrasound device, and a mentored implementation phase, in which they applied their skills in the clinic. Competency was assessed with written exams and an objective structured clinical exam (OSCE). Fourteen participants completed the program. Written exam scores were 3.36/5 in the training phase and 3.57/5 in the mentored implementation phase, and all participants scored 100% on the OSCE. Participants expressed satisfaction with the program. Our POCUS education program demonstrates the potential to teach clinical skills in low-resource settings and the value of virtual global health partnerships in advancing POCUS and minimally invasive diagnostics.


