JEEHP : Journal of Educational Evaluation for Health Professions

Search results: 19 articles for "Artificial intelligence"
Research article
Performance of large language models on Thailand’s national medical licensing examination: a cross-sectional study
Prut Saowaprut, Romen Samuel Wabina, Junwei Yang, Lertboon Siriwat
J Educ Eval Health Prof. 2025;22:16.   Published online May 12, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.16    [Epub ahead of print]
  • 586 View
  • 110 Download
Abstract
Purpose
This study aimed to evaluate the feasibility of general-purpose large language models (LLMs) in addressing inequities in medical licensure exam preparation for Thailand’s National Medical Licensing Examination (ThaiNLE), which currently lacks standardized public study materials.
Methods
We assessed 4 multi-modal LLMs (GPT-4, Claude 3 Opus, Gemini 1.0/1.5 Pro) using a 304-question ThaiNLE Step 1 mock examination (10.2% image-based), applying deterministic API configurations and 5 inference repetitions per model. Performance was measured via micro- and macro-accuracy metrics compared against historical passing thresholds.
Results
All models exceeded passing scores, with GPT-4 achieving the highest accuracy (88.9%; 95% confidence interval, 88.7–89.1), surpassing Thailand’s national average by more than 2 standard deviations. Claude 3.5 Sonnet (80.1%) and Gemini 1.5 Pro (72.8%) followed, in that order. Models demonstrated robustness across 17 of 20 medical domains, but variability was noted in genetics (74.0%) and cardiovascular topics (58.3%). While models demonstrated proficiency with images (Gemini 1.0 Pro: +9.9% vs. text), text-only accuracy remained superior (GPT-4o: 90.0% vs. 82.6%).
Conclusion
General-purpose LLMs show promise as equitable preparatory tools for ThaiNLE Step 1. However, domain-specific knowledge gaps and inconsistent multi-modal integration warrant refinement before clinical deployment.
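As a point of reference for the micro- and macro-accuracy metrics named in the Methods above: micro-accuracy pools every question-level judgment before dividing, while macro-accuracy averages per-domain accuracies so that small domains weigh equally. The following is a minimal sketch of the distinction, using invented records rather than the study's data:

```python
# Illustrative sketch of micro- vs. macro-accuracy as reported in the
# abstract above. The records below are hypothetical, not the study's data.
from collections import defaultdict

# Each record: (medical_domain, model_answer_correct)
records = [
    ("cardiology", True), ("cardiology", False), ("cardiology", False),
    ("genetics", True), ("genetics", True),
    ("pharmacology", True), ("pharmacology", True), ("pharmacology", True),
]

# Micro-accuracy: pool every item across domains, then divide.
micro = sum(ok for _, ok in records) / len(records)

# Macro-accuracy: compute accuracy per domain, then average the domain scores.
by_domain = defaultdict(list)
for domain, ok in records:
    by_domain[domain].append(ok)
macro = sum(sum(v) / len(v) for v in by_domain.values()) / len(by_domain)

print(f"micro-accuracy: {micro:.3f}")  # 6/8 = 0.750
print(f"macro-accuracy: {macro:.3f}")  # mean(1/3, 1, 1) = 0.778
```

With imbalanced domains the two metrics diverge, which is why reporting both, as this study does, is informative.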
Educational/Faculty development material
The role of large language models in the peer-review process: opportunities and challenges for medical journal reviewers and editors  
Jisoo Lee, Jieun Lee, Jeong-Ju Yoo
J Educ Eval Health Prof. 2025;22:4.   Published online January 16, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.4
  • 3,156 View
  • 242 Download
Abstract
The peer review process ensures the integrity of scientific research. This is particularly important in the medical field, where research findings directly impact patient care. However, the rapid growth of publications has strained reviewers, causing delays and potential declines in quality. Generative artificial intelligence, especially large language models (LLMs) such as ChatGPT, may assist researchers with efficient, high-quality reviews. This review explores the integration of LLMs into peer review, highlighting their strengths in linguistic tasks and challenges in assessing scientific validity, particularly in clinical medicine. Key points for integration include initial screening, reviewer matching, feedback support, and language review. However, implementing LLMs for these purposes will necessitate addressing biases, privacy concerns, and data confidentiality. We recommend using LLMs as complementary tools under clear guidelines to support, not replace, human expertise in maintaining rigorous peer review standards.
Research article
Effectiveness of ChatGPT-4o in developing continuing professional development plans for graduate radiographers: a descriptive study  
Minh Chau, Elio Stefan Arruzza, Kelly Spuur
J Educ Eval Health Prof. 2024;21:34.   Published online November 18, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.34
  • 1,487 View
  • 201 Download
  • 2 Web of Science
  • 2 Crossref
AbstractAbstract PDFSupplementary Material
Purpose
This study evaluates the use of ChatGPT-4o in creating tailored continuing professional development (CPD) plans for radiography students, addressing the challenge of aligning CPD with Medical Radiation Practice Board of Australia (MRPBA) requirements. We hypothesized that ChatGPT-4o could support students in CPD planning while meeting regulatory standards.
Methods
A descriptive, experimental design was used to generate 3 unique CPD plans with ChatGPT-4o, each tailored to a hypothetical graduate radiographer in a different clinical setting. Each plan followed MRPBA guidelines, focusing on computed tomography specialization by the second year. From October 2024 to November 2024, 3 MRPBA-registered academics assessed the plans against the criteria of appropriateness, timeliness, relevance, reflection, and completeness. Ratings were analyzed using the Friedman test and the intraclass correlation coefficient (ICC) to measure consistency among evaluators.
Results
The ChatGPT-4o-generated CPD plans generally adhered to regulatory standards across scenarios. The Friedman test indicated no significant differences among raters (P=0.420, 0.761, and 0.807 for each scenario), suggesting consistent scores within scenarios. However, ICC values were low (–0.96, 0.41, and 0.058 for scenarios 1, 2, and 3), revealing variability among raters, particularly on the timeliness and completeness criteria, and suggesting limitations in ChatGPT-4o’s ability to address individualized and context-specific needs.
Conclusion
ChatGPT-4o demonstrates the potential to ease the cognitive demands of CPD planning, offering structured support in CPD development. However, human oversight remains essential to ensure plans are contextually relevant and deeply reflective. Future research should focus on enhancing artificial intelligence’s personalization for CPD evaluation, highlighting ChatGPT-4o’s potential and limitations as a tool in professional education.
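For readers unfamiliar with the 2 statistics reported in the Results above, the following is a minimal sketch of that style of agreement analysis: SciPy's Friedman test for systematic differences among raters, plus a hand-computed two-way random-effects ICC(2,1) following Shrout and Fleiss. The 5×3 rating matrix is invented for illustration, and the study itself may have used a different ICC variant:

```python
# Hypothetical 5 rated items x 3 raters matrix of plan ratings (1-5 scale);
# illustrates the Friedman test and ICC analyses named in the abstract.
import numpy as np
from scipy.stats import friedmanchisquare

ratings = np.array([  # rows = rated items, columns = raters
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 3],
    [4, 4, 5],
], dtype=float)

# Friedman test: do the raters' score distributions differ systematically?
stat, p = friedmanchisquare(*ratings.T)
print(f"Friedman chi-square={stat:.2f}, P={p:.3f}")

# ICC(2,1): two-way random effects, absolute agreement (Shrout & Fleiss).
n, k = ratings.shape
grand = ratings.mean()
ms_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n - 1)
ms_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum() / (k - 1)
sse = ((ratings - ratings.mean(axis=1, keepdims=True)
        - ratings.mean(axis=0, keepdims=True) + grand) ** 2).sum()
ms_err = sse / ((n - 1) * (k - 1))
icc21 = (ms_rows - ms_err) / (
    ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
print(f"ICC(2,1)={icc21:.3f}")
```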

Citations

Citations to this article as recorded by  
  • Halted medical education and medical residents’ training in Korea, journal metrics, and appreciation to reviewers and volunteers
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2025; 22: 1.     CrossRef
  • The ‘Negotiator’: Assessing artificial intelligence (AI) interview preparation for graduate radiographers
    M. Chau, E. Arruzza, C.L. Singh
    Journal of Medical Imaging and Radiation Sciences.2025; 56(5): 101982.     CrossRef
Educational/Faculty development material
The performance of ChatGPT-4.0o in medical imaging evaluation: a cross-sectional study  
Elio Stefan Arruzza, Carla Marie Evangelista, Minh Chau
J Educ Eval Health Prof. 2024;21:29.   Published online October 31, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.29
  • 2,082 View
  • 273 Download
  • 6 Web of Science
  • 6 Crossref
Abstract
This study investigated the performance of ChatGPT-4.0o in evaluating the quality of positioning in radiographic images. Thirty radiographs depicting a variety of knee, elbow, ankle, hand, pelvis, and shoulder projections were produced using anthropomorphic phantoms and uploaded to ChatGPT-4.0o. The model was prompted to identify any positioning errors, justify its findings, and offer improvements. A panel of radiographers assessed the solutions for radiographic quality based on established positioning criteria, using a grading scale of 1–5. In only 20% of projections did ChatGPT-4.0o correctly recognize all errors with justifications and offer correct suggestions for improvement. The most common score was 3 (9 cases, 30%), wherein the model recognized at least 1 specific error and provided a correct improvement. The mean score was 2.9. Overall, accuracy was low, with most projections receiving only partially correct solutions. The findings reinforce the importance of robust radiography education and clinical experience.

Citations

Citations to this article as recorded by  
  • Evaluating Large Language Models for Burning Mouth Syndrome Diagnosis
    Takayuki Suga, Osamu Uehara, Yoshihiro Abiko, Akira Toyofuku
    Journal of Pain Research.2025; Volume 18: 1387.     CrossRef
  • Evaluating the performance of GPT-3.5, GPT-4, and GPT-4o in the Chinese National Medical Licensing Examination
    Dingyuan Luo, Mengke Liu, Runyuan Yu, Yulian Liu, Wenjun Jiang, Qi Fan, Naifeng Kuang, Qiang Gao, Tao Yin, Zuncheng Zheng
    Scientific Reports.2025;[Epub]     CrossRef
  • The ‘Negotiator’: Assessing artificial intelligence (AI) interview preparation for graduate radiographers
    M. Chau, E. Arruzza, C.L. Singh
    Journal of Medical Imaging and Radiation Sciences.2025; 56(5): 101982.     CrossRef
  • Transforming behavioral intention and academic performance: ChatGPT-4.0 insights through SEM, ANN, and cIPMA analysis
    Fazeelat Aziz, Cai Li, Asad Ullah Khan
    Information Development.2025;[Epub]     CrossRef
  • Conversational LLM Chatbot ChatGPT-4 for Colonoscopy Boston Bowel Preparation Scoring: An Artificial Intelligence-to-Head Concordance Analysis
    Raffaele Pellegrino, Alessandro Federico, Antonietta Gerarda Gravina
    Diagnostics.2024; 14(22): 2537.     CrossRef
  • Effectiveness of ChatGPT-4o in developing continuing professional development plans for graduate radiographers: a descriptive study
    Minh Chau, Elio Stefan Arruzza, Kelly Spuur
    Journal of Educational Evaluation for Health Professions.2024; 21: 34.     CrossRef
Research articles
GPT-4o’s competency in answering the simulated written European Board of Interventional Radiology exam compared to a medical student and experts in Germany and its ability to generate exam items on interventional radiology: a descriptive study
Sebastian Ebel, Constantin Ehrengut, Timm Denecke, Holger Gößmann, Anne Bettina Beeskow
J Educ Eval Health Prof. 2024;21:21.   Published online August 20, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.21
  • 1,771 View
  • 313 Download
  • 7 Web of Science
  • 8 Crossref
Abstract
Purpose
This study aimed to determine whether GPT-4o, a generative artificial intelligence (AI) platform, could pass a simulated written European Board of Interventional Radiology (EBIR) exam and whether it could be used to train medical students and interventional radiologists at different levels of expertise by generating exam items on interventional radiology.
Methods
GPT-4o was asked to answer 370 simulated exam items from the Cardiovascular and Interventional Radiology Society of Europe (CIRSE) for EBIR preparation (CIRSE Prep). Subsequently, GPT-4o was requested to generate exam items on interventional radiology topics at levels of difficulty suitable for medical students and for the EBIR exam. The generated items were answered by 4 participants: a medical student, a resident, a consultant, and an EBIR holder. The correctly answered items were counted, and one investigator checked the answers and items generated by GPT-4o for correctness and relevance. This work was done from April to July 2024.
Results
GPT-4o correctly answered 248 of the 370 CIRSE Prep items (67.0%). For 50 CIRSE Prep items, the medical student answered 46.0% correctly, the resident 42.0%, the consultant 50.0%, and the EBIR holder 74.0%. All participants answered 82.0% to 92.0% of the 50 GPT-4o-generated items at the student level correctly. For the 50 GPT-4o items at the EBIR level, the medical student answered 32.0% correctly, the resident 44.0%, the consultant 48.0%, and the EBIR holder 66.0%. All participants could pass the GPT-4o-generated items at the student level, while only the EBIR holder could pass those at the EBIR level. Two (1.3%) of the 150 items generated by GPT-4o were assessed as implausible.
Conclusion
GPT-4o could pass the simulated written EBIR exam and create exam items of varying difficulty to train medical students and interventional radiologists.

Citations

Citations to this article as recorded by  
  • Evaluating the performance of ChatGPT in patient consultation and image-based preliminary diagnosis in thyroid eye disease
    Yue Wang, Shuo Yang, Chengcheng Zeng, Yingwei Xie, Ya Shen, Jian Li, Xiao Huang, Ruili Wei, Yuqing Chen
    Frontiers in Medicine.2025;[Epub]     CrossRef
  • Solving Complex Pediatric Surgical Case Studies: A Comparative Analysis of Copilot, ChatGPT-4, and Experienced Pediatric Surgeons' Performance
    Richard Gnatzy, Martin Lacher, Michael Berger, Michael Boettcher, Oliver J. Deffaa, Joachim Kübler, Omid Madadi-Sanjani, Illya Martynov, Steffi Mayer, Mikko P. Pakarinen, Richard Wagner, Tomas Wester, Augusto Zani, Ophelia Aubert
    European Journal of Pediatric Surgery.2025;[Epub]     CrossRef
  • Preliminary assessment of large language models’ performance in answering questions on developmental dysplasia of the hip
    Shiwei Li, Jun Jiang, Xiaodong Yang
    Journal of Children's Orthopaedics.2025;[Epub]     CrossRef
  • AI and Interventional Radiology: A Narrative Review of Reviews on Opportunities, Challenges, and Future Directions
    Andrea Lastrucci, Nicola Iosca, Yannick Wandael, Angelo Barra, Graziano Lepri, Nevio Forini, Renzo Ricci, Vittorio Miele, Daniele Giansanti
    Diagnostics.2025; 15(7): 893.     CrossRef
  • Evaluating the performance of GPT-3.5, GPT-4, and GPT-4o in the Chinese National Medical Licensing Examination
    Dingyuan Luo, Mengke Liu, Runyuan Yu, Yulian Liu, Wenjun Jiang, Qi Fan, Naifeng Kuang, Qiang Gao, Tao Yin, Zuncheng Zheng
    Scientific Reports.2025;[Epub]     CrossRef
  • Evaluating Large Language Models for Preoperative Patient Education in Superior Capsular Reconstruction: Comparative Study of Claude, GPT, and Gemini
    Yukang Liu, Hua Li, Jianfeng Ouyang, Zhaowen Xue, Min Wang, Hebei He, Bin Song, Xiaofei Zheng, Wenyi Gan
    JMIR Perioperative Medicine.2025; 8: e70047.     CrossRef
  • From GPT-3.5 to GPT-4.o: A Leap in AI’s Medical Exam Performance
    Markus Kipp
    Information.2024; 15(9): 543.     CrossRef
  • Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study
    Yikai Chen, Xiujie Huang, Fangjie Yang, Haiming Lin, Haoyu Lin, Zhuoqun Zheng, Qifeng Liang, Jinhai Zhang, Xinxin Li
    BMC Medical Education.2024;[Epub]     CrossRef
Performance of GPT-3.5 and GPT-4 on standardized urology knowledge assessment items in the United States: a descriptive study
Max Samuel Yudovich, Elizaveta Makarova, Christian Michael Hague, Jay Dilip Raman
J Educ Eval Health Prof. 2024;21:17.   Published online July 8, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.17
  • 3,148 View
  • 332 Download
  • 5 Web of Science
  • 7 Crossref
Abstract
Purpose
This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) with respect to standardized urology multiple-choice items in the United States.
Methods
In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized based on topic and question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024.
Results
GPT-4 answered 44.4% of items correctly compared to 30.9% for GPT-3.5 (P<0.00001). GPT-4 (vs. GPT-3.5) had higher accuracy on urologic oncology (43.8% vs. 33.9%, P=0.03), sexual medicine (44.3% vs. 27.8%, P=0.046), and pediatric urology (47.1% vs. 27.1%, P=0.012) items. Endourology (38.0% vs. 25.7%, P=0.15), reconstruction and trauma (29.0% vs. 21.0%, P=0.41), and neurourology (49.0% vs. 33.3%, P=0.11) items showed no significant differences across versions. GPT-4 also outperformed GPT-3.5 on recall (45.9% vs. 27.4%, P<0.00001) and interpretation (45.6% vs. 31.5%, P=0.0005) items, but the difference on higher-complexity problem-solving items (41.8% vs. 34.5%, P=0.56) was not significant.
Conclusions
ChatGPT performs relatively poorly on standardized multiple-choice urology board exam-style items, with GPT-4 outperforming GPT-3.5. The accuracy was below the proposed minimum passing standards for the American Board of Urology’s Continuing Urologic Certification knowledge reinforcement activity (60%). As artificial intelligence progresses in complexity, ChatGPT may become more capable and accurate with respect to board examination items. For now, its responses should be scrutinized.
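The abstract does not name the statistical test behind its P-values, but a chi-square test on a 2×2 table of correct/incorrect counts is one conventional way to compare two accuracy proportions. The sketch below back-calculates the overall counts from the reported percentages (44.4% and 30.9% of 700 items); it is an illustration, not the study's analysis code:

```python
# Reconstructing the overall GPT-4 vs. GPT-3.5 comparison as a 2x2 chi-square
# test; correct-answer counts are back-calculated from the percentages above.
from scipy.stats import chi2_contingency

n_items = 700
correct_gpt4, correct_gpt35 = 311, 216   # ~44.4% and ~30.9% of 700

table = [
    [correct_gpt4, n_items - correct_gpt4],    # GPT-4: correct, incorrect
    [correct_gpt35, n_items - correct_gpt35],  # GPT-3.5: correct, incorrect
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.1f}, P={p:.2e}")  # P well below 0.00001, as reported
```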

Citations

Citations to this article as recorded by  
  • Evaluating the Performance of ChatGPT4.0 Versus ChatGPT3.5 on the Hand Surgery Self-Assessment Exam: A Comparative Analysis of Performance on Image-Based Questions
    Kiera L Vrindten, Megan Hsu, Yuri Han, Brian Rust, Heili Truumees, Brian M Katt
    Cureus.2025;[Epub]     CrossRef
  • Assessing the performance of large language models (GPT-3.5 and GPT-4) and accurate clinical information for pediatric nephrology
    Nadide Melike Sav
    Pediatric Nephrology.2025;[Epub]     CrossRef
  • Retrieval-augmented generation enhances large language model performance on the Japanese orthopedic board examination
    Juntaro Maruyama, Satoshi Maki, Takeo Furuya, Yuki Nagashima, Kyota Kitagawa, Yasunori Toki, Shuhei Iwata, Megumi Yazaki, Takaki Kitamura, Sho Gushiken, Yuji Noguchi, Masataka Miura, Masahiro Inoue, Yasuhiro Shiga, Kazuhide Inage, Sumihisa Orita, Seiji Oh
    Journal of Orthopaedic Science.2025;[Epub]     CrossRef
  • Advancements in large language model accuracy for answering physical medicine and rehabilitation board review questions
    Jason Bitterman, Alexander D'Angelo, Alexandra Holachek, James E. Eubanks
    PM&R.2025;[Epub]     CrossRef
  • Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis
    Ling Wang, Jinglin Li, Boyang Zhuang, Shasha Huang, Meilin Fang, Cunze Wang, Wen Li, Mohan Zhang, Shurong Gong
    Journal of Medical Internet Research.2025; 27: e64486.     CrossRef
  • From GPT-3.5 to GPT-4.o: A Leap in AI’s Medical Exam Performance
    Markus Kipp
    Information.2024; 15(9): 543.     CrossRef
  • Artificial Intelligence can Facilitate Application of Risk Stratification Algorithms to Bladder Cancer Patient Case Scenarios
    Max S Yudovich, Ahmad N Alzubaidi, Jay D Raman
    Clinical Medicine Insights: Oncology.2024;[Epub]     CrossRef
Review
Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review  
Xiaojun Xu, Yixiao Chen, Jing Miao
J Educ Eval Health Prof. 2024;21:6.   Published online March 15, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.6
  • 9,790 View
  • 685 Download
  • 24 Web of Science
  • 28 Crossref
Abstract
Background
ChatGPT is a large language model (LLM) based on artificial intelligence (AI) that is capable of communicating in multiple languages and generating nuanced, highly complex responses. While ChatGPT holds promising applications in medical education, its limitations and potential risks cannot be ignored.
Methods
A scoping review was conducted for English articles discussing ChatGPT in the context of medical education published after 2022. A literature search was performed using PubMed/MEDLINE, Embase, and Web of Science databases, and information was extracted from the relevant studies that were ultimately included.
Results
ChatGPT exhibits various potential applications in medical education, such as providing personalized learning plans and materials, creating clinical practice simulation scenarios, and assisting in writing articles. However, challenges associated with academic integrity, data accuracy, and potential harm to learning were also highlighted in the literature. The paper emphasizes certain recommendations for using ChatGPT, including the establishment of guidelines. Based on the review, 3 key research areas were proposed: cultivating the ability of medical students to use ChatGPT correctly, integrating ChatGPT into teaching activities and processes, and proposing standards for the use of AI by medical students.
Conclusion
ChatGPT has the potential to transform medical education, but careful consideration is required for its full integration. To harness the full potential of ChatGPT in medical education, attention should not only be given to the capabilities of AI but also to its impact on students and teachers.

Citations

Citations to this article as recorded by  
  • AI-assisted patient education: Challenges and solutions in pediatric kidney transplantation
    MZ Ihsan, Dony Apriatama, Pithriani, Riza Amalia
    Patient Education and Counseling.2025; 131: 108575.     CrossRef
  • Exploring predictors of AI chatbot usage intensity among students: Within- and between-person relationships based on the technology acceptance model
    Anne-Kathrin Kleine, Insa Schaffernak, Eva Lermer
    Computers in Human Behavior: Artificial Humans.2025; 3: 100113.     CrossRef
  • AI-powered standardised patients: evaluating ChatGPT-4o’s impact on clinical case management in intern physicians
    Selcen Öncü, Fulya Torun, Hilal Hatice Ülkü
    BMC Medical Education.2025;[Epub]     CrossRef
  • UsmleGPT: An AI application for developing MCQs via multi-agent system
    Zhehan Jiang, Shicong Feng
    Software Impacts.2025; 23: 100742.     CrossRef
  • ChatGPT’s Performance on Portuguese Medical Examination Questions: Comparative Analysis of ChatGPT-3.5 Turbo and ChatGPT-4o Mini
    Filipe Prazeres
    JMIR Medical Education.2025; 11: e65108.     CrossRef
  • Transforming medical education: leveraging large language models to enhance PBL—a proof-of-concept study
    Shoukat Ali Arain, Shahid Akhtar Akhund, Muhammad Abrar Barakzai, Sultan Ayoub Meo
    Advances in Physiology Education.2025; 49(2): 398.     CrossRef
  • Integrating artificial intelligence into pre-clinical medical education: challenges, opportunities, and recommendations
    Birgit Pohn, Lars Mehnen, Sebastian Fitzek, Kyung-Eun (Anna) Choi, Ralf J. Braun, Sepideh Hatamikia
    Frontiers in Education.2025;[Epub]     CrossRef
  • Evaluating the Accuracy and Reliability of Large Language Models (ChatGPT, Claude, DeepSeek, Gemini, Grok, and Le Chat) in Answering Item-Analyzed Multiple-Choice Questions on Blood Physiology
    Mayank Agarwal, Priyanka Sharma, Pinaki Wani
    Cureus.2025;[Epub]     CrossRef
  • Artificial intelligence-assisted academic writing: recommendations for ethical use
    Adam Cheng, Aaron Calhoun, Gabriel Reedy
    Advances in Simulation.2025;[Epub]     CrossRef
  • University Educators Perspectives on ChatGPT: A Technology Acceptance Model-Based Study
    Muna Barakat, Nesreen A. Salim, Malik Sallam
    Open Praxis.2025; 17(1): 129.     CrossRef
  • Knowledge and use, perceptions of benefits and limitations of artificial intelligence chatbots among Italian physiotherapy students: a cross-sectional national study
    Fabio Tortella, Alvisa Palese, Andrea Turolla, Greta Castellini, Paolo Pillastrini, Maria Gabriella Landuzzi, Chad Cook, Giovanni Galeoto, Giuseppe Giovannico, Lia Rodeghiero, Silvia Gianola, Giacomo Rossettini
    BMC Medical Education.2025;[Epub]     CrossRef
  • Digital and Intelligence Education in Medicine: A Bibliometric and Visualization Analysis Using CiteSpace and VOSviewer
    Bing Xiang Yang, FuLing Zhou, Nan Bai, Sichen Zhou, Chunyan Luo, Qing Wang, Arkers Kwan Ching Wong, Frances Lin
    Frontiers of Digital Education.2025;[Epub]     CrossRef
  • Prompts, privacy, and personalized learning: integrating AI into nursing education—a qualitative study
    Mingyan Shen, Yanping Shen, Fangchi Liu, Jiawen Jin
    BMC Nursing.2025;[Epub]     CrossRef
  • The role of ChatGPT-4o in differential diagnosis and management of vertigo-related disorders
    Xu Liu, Suming Shi, Xin Zhang, Qianwen Gao, Wuqing Wang
    Scientific Reports.2025;[Epub]     CrossRef
  • Situating governance and regulatory concerns for generative artificial intelligence and large language models in medical education
    Michael Tran, Chinthaka Balasooriya, Jitendra Jonnagaddala, Gilberto Ka-Kit Leung, Neeraj Mahboobani, Subha Ramani, Joel Rhee, Lambert Schuwirth, Neysan Sedaghat Najafzadeh-Tabrizi, Carolyn Semmler, Zoie SY Wong
    npj Digital Medicine.2025;[Epub]     CrossRef
  • Chatbots in neurology and neuroscience: Interactions with students, patients and neurologists
    Stefano Sandrone
    Brain Disorders.2024; 15: 100145.     CrossRef
  • ChatGPT in education: unveiling frontiers and future directions through systematic literature review and bibliometric analysis
    Buddhini Amarathunga
    Asian Education and Development Studies.2024; 13(5): 412.     CrossRef
  • Evaluating the performance of ChatGPT-3.5 and ChatGPT-4 on the Taiwan plastic surgery board examination
    Ching-Hua Hsieh, Hsiao-Yun Hsieh, Hui-Ping Lin
    Heliyon.2024; 10(14): e34851.     CrossRef
  • Preparing for Artificial General Intelligence (AGI) in Health Professions Education: AMEE Guide No. 172
    Ken Masters, Anne Herrmann-Werner, Teresa Festl-Wietek, David Taylor
    Medical Teacher.2024; 46(10): 1258.     CrossRef
  • A Comparative Analysis of ChatGPT and Medical Faculty Graduates in Medical Specialization Exams: Uncovering the Potential of Artificial Intelligence in Medical Education
    Gülcan Gencer, Kerem Gencer
    Cureus.2024;[Epub]     CrossRef
  • Research ethics and issues regarding the use of ChatGPT-like artificial intelligence platforms by authors and reviewers: a narrative review
    Sang-Jun Kim
    Science Editing.2024; 11(2): 96.     CrossRef
  • Innovation Off the Bat: Bridging the ChatGPT Gap in Digital Competence among English as a Foreign Language Teachers
    Gulsara Urazbayeva, Raisa Kussainova, Aikumis Aibergen, Assel Kaliyeva, Gulnur Kantayeva
    Education Sciences.2024; 14(9): 946.     CrossRef
  • Exploring the perceptions of Chinese pre-service teachers on the integration of generative AI in English language teaching: Benefits, challenges, and educational implications
    Ji Young Chung, Seung-Hoon Jeong
    Online Journal of Communication and Media Technologies.2024; 14(4): e202457.     CrossRef
  • Unveiling the bright side and dark side of AI-based ChatGPT : a bibliographic and thematic approach
    Chandan Kumar Tiwari, Mohd. Abass Bhat, Abel Dula Wedajo, Shagufta Tariq Khan
    Journal of Decision Systems.2024; : 1.     CrossRef
  • Artificial Intelligence in Medical Education and Mentoring in Rehabilitation Medicine
    Julie K. Silver, Mustafa Reha Dodurgali, Nara Gavini
    American Journal of Physical Medicine & Rehabilitation.2024; 103(11): 1039.     CrossRef
  • The Potential of Artificial Intelligence Tools for Reducing Uncertainty in Medicine and Directions for Medical Education
    Sauliha Rabia Alli, Soaad Qahhār Hossain, Sunit Das, Ross Upshur
    JMIR Medical Education.2024; 10: e51446.     CrossRef
  • A Systematic Literature Review of Empirical Research on Applying Generative Artificial Intelligence in Education
    Xin Zhang, Peng Zhang, Yuan Shen, Min Liu, Qiong Wang, Dragan Gašević, Yizhou Fan
    Frontiers of Digital Education.2024; 1(3): 223.     CrossRef
  • Artificial intelligence in medical problem-based learning: opportunities and challenges
    Yaoxing Chen, Hong Qi, Yu Qiu, Juan Li, Liang Zhu, Xiaoling Gao, Hao Wang, Gan Jiang
    Global Medical Education.2024;[Epub]     CrossRef
Research articles
ChatGPT (GPT-4) passed the Japanese National License Examination for Pharmacists in 2022, answering all items including those with diagrams: a descriptive study  
Hiroyasu Sato, Katsuhiko Ogasawara
J Educ Eval Health Prof. 2024;21:4.   Published online February 28, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.4
  • 4,017 View
  • 307 Download
  • 6 Web of Science
  • 10 Crossref
Abstract
Purpose
The objective of this study was to assess the performance of ChatGPT (GPT-4) on all items, including those with diagrams, in the Japanese National License Examination for Pharmacists (JNLEP) and compare it with the previous GPT-3.5 model’s performance.
Methods
This study targeted the 107th JNLEP, conducted in 2022; all 344 items were input into the GPT-4 model. Separately, 284 items, excluding those with diagrams, were entered into the GPT-3.5 model. The answers were categorized and analyzed to determine accuracy rates by category, subject, and the presence or absence of diagrams. The accuracy rates were compared to the main passing criterion (overall accuracy rate ≥62.9%).
Results
GPT-4’s overall accuracy rate for all items in the 107th JNLEP was 72.5%, meeting all the passing criteria. For the set of items without diagrams, its accuracy rate was 80.0%, significantly higher than that of the GPT-3.5 model (43.5%). The GPT-4 model demonstrated an accuracy rate of 36.1% for items that included diagrams.
Conclusion
Advancements that allow GPT-4 to process images have made it possible for LLMs to answer all items in medical-related license examinations. This study’s findings confirm that ChatGPT (GPT-4) possesses sufficient knowledge to meet the passing criteria.

Citations

Citations to this article as recorded by  
  • Performance of ChatGPT‐3.5 and ChatGPT‐4o in the Japanese National Dental Examination
    Osamu Uehara, Tetsuro Morikawa, Fumiya Harada, Nodoka Sugiyama, Yuko Matsuki, Daichi Hiraki, Hinako Sakurai, Takashi Kado, Koki Yoshida, Yukie Murata, Hirofumi Matsuoka, Toshiyuki Nagasawa, Yasushi Furuichi, Yoshihiro Abiko, Hiroko Miura
    Journal of Dental Education.2025; 89(4): 459.     CrossRef
  • Qwen-2.5 Outperforms Other Large Language Models in the Chinese National Nursing Licensing Examination: Retrospective Cross-Sectional Comparative Study
    Shiben Zhu, Wanqin Hu, Zhi Yang, Jiani Yan, Fang Zhang
    JMIR Medical Informatics.2025; 13: e63731.     CrossRef
  • ChatGPT (GPT-4V) Performance on the Healthcare Information Technologist Examination in Japan
    Kai Ishida, Eisuke Hanada
    Cureus.2025;[Epub]     CrossRef
  • Medication counseling for OTC drugs using customized ChatGPT-4: Comparison with ChatGPT-3.5 and ChatGPT-4o
    Keisuke Kiyomiya, Tohru Aomori, Hisakazu Ohtani
    DIGITAL HEALTH.2025;[Epub]     CrossRef
  • Current Use of Generative Artificial Intelligence in Pharmacy Practice: A Literature Mini-review
    Keisuke Kiyomiya, Tohru Aomori, Hitoshi Kawazoe, Hisakazu Ohtani
    Iryo Yakugaku (Japanese Journal of Pharmaceutical Health Care and Sciences).2025; 51(4): 177.     CrossRef
  • Performance evaluation of large language models for the national nursing examination in Japan
    Tomoki Kuribara, Kengo Hirayama, Kenji Hirata
    DIGITAL HEALTH.2025;[Epub]     CrossRef
  • Potential of ChatGPT to Pass the Japanese Medical and Healthcare Professional National Licenses: A Literature Review
    Kai Ishida, Eisuke Hanada
    Cureus.2024;[Epub]     CrossRef
  • Performance of Generative Pre-trained Transformer (GPT)-4 and Gemini Advanced on the First-Class Radiation Protection Supervisor Examination in Japan
    Hiroki Goto, Yoshioki Shiraishi, Seiji Okada
    Cureus.2024;[Epub]     CrossRef
  • An exploratory assessment of GPT-4o and GPT-4 performance on the Japanese National Dental Examination
    Masaki Morishita, Hikaru Fukuda, Shino Yamaguchi, Kosuke Muraoka, Taiji Nakamura, Masanari Hayashi, Izumi Yoshioka, Kentaro Ono, Shuji Awano
    The Saudi Dental Journal.2024; 36(12): 1577.     CrossRef
  • Evaluating the Accuracy of ChatGPT in the Japanese Board-Certified Physiatrist Examination
    Yuki Kato, Kenta Ushida, Ryo Momosaki
    Cureus.2024;[Epub]     CrossRef
Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study
Hyunju Lee, Soobin Park
J Educ Eval Health Prof. 2023;20:39.   Published online December 28, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.39
  • 3,470 View
  • 248 Download
  • 3 Web of Science
  • 3 Crossref
Abstract
Purpose
This study assessed the performance of 6 generative artificial intelligence (AI) platforms on the learning objectives of medical arthropodology in a parasitology class in Korea. We queried the platforms in Korean and in English and compared the amount, accuracy, and relevance of the information in their responses to prompts in both languages.
Methods
From December 15 to 17, 2023, 6 generative AI platforms—Bard, Bing, Claude, Clova X, GPT-4, and Wrtn—were tested on 7 medical arthropodology learning objectives in English and Korean. Clova X and Wrtn are platforms from Korean companies. Responses were evaluated using specific criteria for the English and Korean queries.
Results
Bard had abundant information but was fourth in accuracy and relevance. GPT-4, with high information content, ranked first in accuracy and relevance. Clova X was fourth in information amount but second in accuracy and relevance. Bing provided less information, with moderate accuracy and relevance. Wrtn’s answers were short, with average accuracy and relevance. Claude AI had a reasonable amount of information but lower accuracy and relevance. The responses in English were superior in all aspects. Clova X was notably optimized for Korean, leading in relevance.
Conclusion
In this study of 6 generative AI platforms applied to medical arthropodology, GPT-4 excelled overall, while Clova X, a Korea-based AI product, achieved 100% relevance in Korean queries, the highest among its peers. Utilizing these AI platforms in the classroom improved the authors’ self-efficacy and interest in the subject, offering a positive experience of questioning generative AI platforms and receiving information.

Citations

Citations to this article as recorded by  
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • The emergence of generative artificial intelligence platforms in 2023, journal metrics, appreciation to reviewers and volunteers, and obituary
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2024; 21: 9.     CrossRef
  • Comparison of the Performance of ChatGPT, Claude and Bard in Support of Myopia Prevention and Control
    Yan Wang, Lihua Liang, Ran Li, Yihua Wang, Changfu Hao
    Journal of Multidisciplinary Healthcare.2024; Volume 17: 3917.     CrossRef
Review
Application of artificial intelligence chatbots, including ChatGPT, in education, scholarly work, programming, and content generation and its prospects: a narrative review
Tae Won Kim
J Educ Eval Health Prof. 2023;20:38.   Published online December 27, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.38
  • 18,925 View
  • 1,304 Download
  • 30 Web of Science
  • 28 Crossref
Abstract
This study aims to explore ChatGPT’s (GPT-3.5 version) functionalities, including reinforcement learning, diverse applications, and limitations. ChatGPT is an artificial intelligence (AI) chatbot powered by OpenAI’s Generative Pre-trained Transformer (GPT) model. The chatbot’s applications span education, programming, content generation, and more, demonstrating its versatility. ChatGPT can improve education by creating assignments and offering personalized feedback, as shown by its notable performance in medical exams and the United States Medical Licensing Exam. However, concerns include plagiarism, reliability, and educational disparities. It aids in various research tasks, from design to writing, and has shown proficiency in summarizing and suggesting titles. Its use in scientific writing and language translation is promising, but professional oversight is needed for accuracy and originality. It assists in programming tasks like writing code, debugging, and guiding installation and updates. It offers diverse applications, from cheering up individuals to generating creative content like essays, news articles, and business plans. Unlike search engines, ChatGPT provides interactive, generative responses and understands context, making it more akin to human conversation, in contrast to conventional search engines’ keyword-based, non-interactive nature. ChatGPT has limitations, such as potential bias, dependence on outdated data, and revenue generation challenges. Nonetheless, ChatGPT is considered to be a transformative AI tool poised to redefine the future of generative technology. In conclusion, advancements in AI, such as ChatGPT, are altering how knowledge is acquired and applied, marking a shift from search engines to creativity engines. This transformation highlights the increasing importance of AI literacy and the ability to effectively utilize AI in various domains of life.

Citations

Citations to this article as recorded by  
  • The Development and Validation of an Artificial Intelligence Chatbot Dependence Scale
    Xing Zhang, Mingyue Yin, Mingyang Zhang, Zhaoqian Li, Hansen Li
    Cyberpsychology, Behavior, and Social Networking.2025; 28(2): 126.     CrossRef
  • Readability, quality and accuracy of generative artificial intelligence chatbots for commonly asked questions about labor epidurals: a comparison of ChatGPT and Bard
    D. Lee, M. Brown, J. Hammond, M. Zakowski
    International Journal of Obstetric Anesthesia.2025; 61: 104317.     CrossRef
  • ChatGPT-4 Performance on German Continuing Medical Education—Friend or Foe (Trick or Treat)? Protocol for a Randomized Controlled Trial
    Christian Burisch, Abhav Bellary, Frank Breuckmann, Jan Ehlers, Serge C Thal, Timur Sellmann, Daniel Gödde
    JMIR Research Protocols.2025; 14: e63887.     CrossRef
  • The effect of incorporating large language models into the teaching on critical thinking disposition: An “AI + Constructivism Learning Theory” attempt
    Peng Wang, Kexin Yin, Mingzhu Zhang, Yuanxin Zheng, Tong Zhang, Yanjun Kang, Xun Feng
    Education and Information Technologies.2025;[Epub]     CrossRef
  • The Impact of Adaptive Learning Technologies, Personalized Feedback, and Interactive AI Tools on Student Engagement: The Moderating Role of Digital Literacy
    Husam Yaseen, Abdelaziz Saleh Mohammad, Najwa Ashal, Hesham Abusaimeh, Ahmad Ali, Abdel-Aziz Ahmad Sharabati
    Sustainability.2025; 17(3): 1133.     CrossRef
  • Artificial Intelligence in Nursing: New Opportunities and Challenges
    Estel·la Ramírez‐Baraldes, Daniel García‐Gutiérrez, Cristina García‐Salido
    European Journal of Education.2025;[Epub]     CrossRef
  • Can ChatGPT be used as a scientific source of information on tooth extraction?
    Shiori Yamamoto, Masakazu Hamada, Kyoko Nishiyama, Ayako Motoki, Yusei Fujita, Narikazu Uzawa
    Journal of Oral and Maxillofacial Surgery, Medicine, and Pathology.2025;[Epub]     CrossRef
  • Exploring artificial intelligence (AI) Chatbot usage behaviors and their association with mental health outcomes in Chinese university students
    Xing Zhang, Zhaoqian Li, Mingyang Zhang, Mingyue Yin, Zhangyu Yang, Dong Gao, Hansen Li
    Journal of Affective Disorders.2025; 380: 394.     CrossRef
  • The analysis of optimization in music aesthetic education under artificial intelligence
    Yixuan Peng
    Scientific Reports.2025;[Epub]     CrossRef
  • The Role of Artificial Intelligence in Computer Science Education: A Systematic Review with a Focus on Database Instruction
    Alkmini Gaitantzi, Ioannis Kazanidis
    Applied Sciences.2025; 15(7): 3960.     CrossRef
  • A Bibliometric Exposition and Review on Leveraging LLMs for Programming Education
    Joanah Pwanedo Amos, Oluwatosin Ahmed Amodu, Raja Azlina Raja Mahmood, Akanbi Bolakale Abdulqudus, Anies Faziehan Zakaria, Abimbola Rhoda Iyanda, Umar Ali Bukar, Zurina Mohd Hanapi
    IEEE Access.2025; 13: 58364.     CrossRef
  • Can ChatGPT be trusted as a resource for a scholarly article on treatment planning implant-supported prostheses?
    Steven J. Sadowsky
    The Journal of Prosthetic Dentistry.2025;[Epub]     CrossRef
  • Use of machine translation in foreign language education
    Blanka Klimova
    Cogent Arts & Humanities.2025;[Epub]     CrossRef
  • Comparison of triage performance among DRP tool, ChatGPT, and outpatient rehabilitation doctors
    Yucong Zou, Ruixue Ye, Yan Gao, Jing Zhou, Yawei Li, Wenshi Chen, Fubing Zha, Yulong Wang
    Scientific Reports.2025;[Epub]     CrossRef
  • A Training Needs Analysis for AI and Generative AI in Medical Education: Perspectives of Faculty and Students
    Lise McCoy, Natarajan Ganesan, Viswanathan Rajagopalan, Douglas McKell, Diego F. Niño, Mary Claire Swaim
    Journal of Medical Education and Curricular Development.2025;[Epub]     CrossRef
  • Application of ChatGPT 4.0 in radiological dose management: Perceptions of radiographers with varying expertise
    L. Federico, D.D. Fusaro, G.C. Coppola, M. Gregori, S. Durante
    Radiography.2025; 31(4): 102972.     CrossRef
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • Artificial Intelligence: Fundamentals and Breakthrough Applications in Epilepsy
    Wesley Kerr, Sandra Acosta, Patrick Kwan, Gregory Worrell, Mohamad A. Mikati
    Epilepsy Currents.2024;[Epub]     CrossRef
  • A Developed Graphical User Interface-Based on Different Generative Pre-trained Transformers Models
    Ekrem Küçük, İpek Balıkçı Çiçek, Zeynep Küçükakçalı, Cihan Yetiş, Cemil Çolak
    ODÜ Tıp Dergisi.2024; 11(1): 18.     CrossRef
  • Art or Artifact: Evaluating the Accuracy, Appeal, and Educational Value of AI-Generated Imagery in DALL·E 3 for Illustrating Congenital Heart Diseases
    Mohamad-Hani Temsah, Abdullah N. Alhuzaimi, Mohammed Almansour, Fadi Aljamaan, Khalid Alhasan, Munirah A. Batarfi, Ibraheem Altamimi, Amani Alharbi, Adel Abdulaziz Alsuhaibani, Leena Alwakeel, Abdulrahman Abdulkhaliq Alzahrani, Khaled B. Alsulaim, Amr Jam
    Journal of Medical Systems.2024;[Epub]     CrossRef
  • Authentic assessment in medical education: exploring AI integration and student-as-partners collaboration
    Syeda Sadia Fatima, Nabeel Ashfaque Sheikh, Athar Osama
    Postgraduate Medical Journal.2024; 100(1190): 959.     CrossRef
  • Comparative performance analysis of large language models: ChatGPT-3.5, ChatGPT-4 and Google Gemini in glucocorticoid-induced osteoporosis
    Linjian Tong, Chaoyang Zhang, Rui Liu, Jia Yang, Zhiming Sun
    Journal of Orthopaedic Surgery and Research.2024;[Epub]     CrossRef
  • Can AI-Generated Clinical Vignettes in Japanese Be Used Medically and Linguistically?
    Yasutaka Yanagita, Daiki Yokokawa, Shun Uchida, Yu Li, Takanori Uehara, Masatomi Ikusaka
    Journal of General Internal Medicine.2024; 39(16): 3282.     CrossRef
  • ChatGPT vs. sleep disorder specialist responses to common sleep queries: Ratings by experts and laypeople
    Jiyoung Kim, Seo-Young Lee, Jee Hyun Kim, Dong-Hyeon Shin, Eun Hye Oh, Jin A Kim, Jae Wook Cho
    Sleep Health.2024; 10(6): 665.     CrossRef
  • Technology integration into Chinese as a foreign language learning in higher education: An integrated bibliometric analysis and systematic review (2000–2024)
    Binze Xu
    Language Teaching Research.2024;[Epub]     CrossRef
  • The Transformative Power of Generative Artificial Intelligence for Achieving the Sustainable Development Goal of Quality Education
    Prema Nedungadi, Kai-Yu Tang, Raghu Raman
    Sustainability.2024; 16(22): 9779.     CrossRef
  • Is AI the new course creator
    Sheri Conklin, Tom Dorgan, Daisyane Barreto
    Discover Education.2024;[Epub]     CrossRef
  • Emergency Medicine Assistants in the Field of Toxicology, Comparison of ChatGPT-3.5 and GEMINI Artificial Intelligence Systems
    Hatice Aslı Bedel, Cihan Bedel, Fatih Selvi, Ökkeş Zortuk, Yusuf Karanci
    Acta medica Lituanica.2024; 31(2): 294.     CrossRef
Brief report
ChatGPT (GPT-3.5) as an assistant tool in microbial pathogenesis studies in Sweden: a cross-sectional comparative study  
Catharina Hultgren, Annica Lindkvist, Volkan Özenci, Sophie Curbo
J Educ Eval Health Prof. 2023;20:32.   Published online November 22, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.32
  • 2,590 View
  • 154 Download
  • 4 Web of Science
  • 4 Crossref
Abstract
ChatGPT (GPT-3.5) has entered higher education, and there is a need to determine how to use it effectively. This descriptive study compared the ability of GPT-3.5 and teachers to answer questions from dental students and to construct detailed intended learning outcomes. When the responses were rated on a Likert scale, we found that GPT-3.5 answered dental students’ questions in a similar or even more elaborate way than the answers previously provided by a teacher. GPT-3.5 was also asked to construct detailed intended learning outcomes for a course in microbial pathogenesis; when these were rated on a Likert scale, they were largely found to be irrelevant. Since students are using GPT-3.5, it is important that instructors learn how to make the best use of it, both to advise students and to benefit from its potential.

Citations

Citations to this article as recorded by  
  • Unlocking learning: exploring take-home examinations and viva voce examinations in microbiology education for biomedical laboratory science students
    Sophie Curbo, Annica Lindkvist, Catharina Hultgren, Jorge Cervantes
    Journal of Microbiology & Biology Education.2025;[Epub]     CrossRef
  • Global Trends in the Use of Artificial Intelligence in Dental Education: A Bibliometric Analysis
    Margarita Iniesta, Juan José Pérez‐Higueras
    European Journal of Dental Education.2025;[Epub]     CrossRef
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study
    Hyunju Lee, Soobin Park
    Journal of Educational Evaluation for Health Professions.2023; 20: 39.     CrossRef
Research articles
Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study  
Betzy Clariza Torres-Zegarra, Wagner Rios-Garcia, Alvaro Micael Ñaña-Cordova, Karen Fatima Arteaga-Cisneros, Xiomara Cristina Benavente Chalco, Marina Atena Bustamante Ordoñez, Carlos Jesus Gutierrez Rios, Carlos Alberto Ramos Godoy, Kristell Luisa Teresa Panta Quezada, Jesus Daniel Gutierrez-Arratia, Javier Alejandro Flores-Cohaila
J Educ Eval Health Prof. 2023;20:30.   Published online November 20, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.30
  • 4,304 View
  • 255 Download
  • 22 Web of Science
  • 29 Crossref
Abstract
Purpose
We aimed to describe the performance and evaluate the educational value of justifications provided by artificial intelligence chatbots, including GPT-3.5, GPT-4, Bard, Claude, and Bing, on the Peruvian National Medical Licensing Examination (P-NLME).
Methods
This was a cross-sectional analytical study. On July 25, 2023, each multiple-choice question (MCQ) from the P-NLME was entered into each chatbot (GPT-3.5, GPT-4, Bing, Bard, and Claude) 3 times. Then, 4 medical educators categorized the MCQs in terms of medical area, item type, and whether the MCQ required Peru-specific knowledge. They assessed the educational value of the justifications from the 2 top performers (GPT-4 and Bing).
Results
GPT-4 scored 86.7% and Bing scored 82.2%, followed by Bard and Claude, and the historical performance of Peruvian examinees was 55%. Among the factors associated with correct answers, only MCQs that required Peru-specific knowledge had lower odds (odds ratio, 0.23; 95% confidence interval, 0.09–0.61), whereas the remaining factors showed no associations. In assessing the educational value of justifications provided by GPT-4 and Bing, neither showed any significant differences in certainty, usefulness, or potential use in the classroom.
Conclusion
Among the chatbots, GPT-4 and Bing were the top performers, with Bing performing better on Peru-specific MCQs. Moreover, the educational value of the justifications provided by GPT-4 and Bing could be deemed appropriate. However, it is essential to start addressing the educational value of these chatbots, rather than merely their performance on examinations.
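As background for the odds ratio reported in the Results above (likely estimated from a regression model in the study itself), the sketch below shows the crude 2×2 calculation with a Woolf 95% confidence interval; the counts are hypothetical, not the study's data:

```python
# How an odds ratio with a 95% CI (as reported above) is typically derived
# from a 2x2 table; the counts here are invented for illustration.
import math

# rows: MCQ requires Peru-specific knowledge (yes/no)
# cols: chatbot answered correctly (yes/no)
a, b = 12, 10   # Peru-specific: correct, incorrect
c, d = 140, 18  # general:       correct, incorrect

odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's method
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
print(f"OR={odds_ratio:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An odds ratio below 1 with a confidence interval excluding 1, as in the study's result (0.23; 0.09–0.61), indicates significantly lower odds of a correct answer on Peru-specific items.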

Citations

Citations to this article as recorded by  
  • PICOT questions and search strategies formulation: A novel approach using artificial intelligence automation
    Lucija Gosak, Gregor Štiglic, Lisiane Pruinelli, Dominika Vrbnjak
    Journal of Nursing Scholarship.2025; 57(1): 5.     CrossRef
  • Using large language models (ChatGPT, Copilot, PaLM, Bard, and Gemini) in Gross Anatomy course: Comparative analysis
    Volodymyr Mavrych, Paul Ganguly, Olena Bolgova
    Clinical Anatomy.2025; 38(2): 200.     CrossRef
  • Capable exam-taker and question-generator: the dual role of generative AI in medical education assessment
    Yihong Qiu, Chang Liu
    Global Medical Education.2025;[Epub]     CrossRef
  • Comparison of artificial intelligence systems in answering prosthodontics questions from the dental specialty exam in Turkey
    Busra Tosun, Zeynep Sen Yilmaz
    Journal of Dental Sciences.2025;[Epub]     CrossRef
  • Benchmarking LLM chatbots’ oncological knowledge with the Turkish Society of Medical Oncology’s annual board examination questions
    Efe Cem Erdat, Engin Eren Kavak
    BMC Cancer.2025;[Epub]     CrossRef
  • Evaluating the Performance of Large Language Models in Anatomy Education Advancing Anatomy Learning with ChatGPT-4o
    Fatma Ok, Burak Karip, Fulya Temizsoy Korkmaz
    European Journal of Therapeutics.2025; 31(1): 35.     CrossRef
  • Large Language Models in Biochemistry Education: Comparative Evaluation of Performance
    Olena Bolgova, Inna Shypilova, Volodymyr Mavrych
    JMIR Medical Education.2025; 11: e67244.     CrossRef
  • Attributional patterns toward students with and without learning disabilities: Artificial intelligence models vs. trainee teachers
    Inbar Levkovich, Eyal Rabin, Rania Hussein Farraj, Zohar Elyoseph
    Research in Developmental Disabilities.2025; 160: 104970.     CrossRef
  • The double-edged sword of generative AI: surpassing an expert or a deceptive “false friend”?
    Franziska C.S. Altorfer, Michael J. Kelly, Fedan Avrumova, Varun Rohatgi, Jiaqi Zhu, Christopher M. Bono, Darren R. Lebl
    The Spine Journal.2025;[Epub]     CrossRef
  • Claude, ChatGPT, Copilot, and Gemini performance versus students in different topics of neuroscience
    Volodymyr Mavrych, Ahmed Yaqinuddin, Olena Bolgova
    Advances in Physiology Education.2025; 49(2): 430.     CrossRef
  • Large Language Models Take on the AAMC Situational Judgment Test: Evaluating Dilemma-Based Scenarios
    Angelo Cadiente, Jamie Chen, Lora J. Kasselman, Bryan Pilkington
    International Journal of Artificial Intelligence in Education.2025;[Epub]     CrossRef
  • Validation of a generative artificial intelligence tool for the critical appraisal of articles on the epidemiology of mental health: Its application in the Middle East and North Africa
    Cheima Moussa, Sarah Altayyar, Marion Vergonjeanne, Thibaut Gelle, Pierre-Marie Preux
    Journal of Epidemiology and Population Health.2025; 73(2): 202990.     CrossRef
  • Temporal Association Between ChatGPT-Generated Diarrhea Synonyms in Internet Search Queries and Emergency Department Visits for Diarrhea-Related Symptoms in South Korea: Exploratory Study
    Jinsoo Kim, Ansun Jeong, Juseong Jin, Sangjun Lee, Do Kyoon Yoon, Soyeoun Kim
    Journal of Medical Internet Research.2025; 27: e65101.     CrossRef
  • Artificial intelligence (AI) performance on pharmacy skills laboratory course assignments
    Vivian Do, Krista L. Donohoe, Apryl N. Peddi, Eleanor Carr, Christina Kim, Virginia Mele, Dhruv Patel, Alexis N. Crawford
    Currents in Pharmacy Teaching and Learning.2025; 17(7): 102367.     CrossRef
  • Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis
    Ling Wang, Jinglin Li, Boyang Zhuang, Shasha Huang, Meilin Fang, Cunze Wang, Wen Li, Mohan Zhang, Shurong Gong
    Journal of Medical Internet Research.2025; 27: e64486.     CrossRef
  • Performance of artificial intelligence chatbots in National dental licensing examination
    Chad Chan-Chia Lin, Jui-Sheng Sun, Chin-Hao Chang, Yu-Han Chang, Jenny Zwei-Chieng Chang
    Journal of Dental Sciences.2025;[Epub]     CrossRef
  • Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study
    Masao Noda, Takayoshi Ueno, Ryota Koshu, Yuji Takaso, Mari Dias Shimada, Chizu Saito, Hisashi Sugimoto, Hiroaki Fushiki, Makoto Ito, Akihiro Nomura, Tomokazu Yoshizaki
    JMIR Medical Education.2024; 10: e57054.     CrossRef
  • Response to Letter to the Editor re: “Artificial Intelligence Versus Expert Plastic Surgeon: Comparative Study Shows ChatGPT ‘Wins' Rhinoplasty Consultations: Should We Be Worried? [1]” by Durairaj et al
    Kay Durairaj, Omer Baker
    Facial Plastic Surgery & Aesthetic Medicine.2024; 26(3): 276.     CrossRef
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis
    Mingxin Liu, Tsuyoshi Okuhara, XinYi Chang, Ritsuko Shirabe, Yuriko Nishiie, Hiroko Okada, Takahiro Kiuchi
    Journal of Medical Internet Research.2024; 26: e60807.     CrossRef
  • Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study
    Giacomo Rossettini, Lia Rodeghiero, Federica Corradi, Chad Cook, Paolo Pillastrini, Andrea Turolla, Greta Castellini, Stefania Chiappinotto, Silvia Gianola, Alvisa Palese
    BMC Medical Education.2024;[Epub]     CrossRef
  • Evaluating the competency of ChatGPT in MRCP Part 1 and a systematic literature review of its capabilities in postgraduate medical assessments
    Oliver Vij, Henry Calver, Nikki Myall, Mrinalini Dey, Koushan Kouranloo, Thiago P. Fernandes
    PLOS ONE.2024; 19(7): e0307372.     CrossRef
  • Large Language Models in Pediatric Education: Current Uses and Future Potential
    Srinivasan Suresh, Sanghamitra M. Misra
    Pediatrics.2024;[Epub]     CrossRef
  • Comparison of the Performance of ChatGPT, Claude and Bard in Support of Myopia Prevention and Control
    Yan Wang, Lihua Liang, Ran Li, Yihua Wang, Changfu Hao
    Journal of Multidisciplinary Healthcare.2024; Volume 17: 3917.     CrossRef
  • Evaluating Large Language Models in Dental Anesthesiology: A Comparative Analysis of ChatGPT-4, Claude 3 Opus, and Gemini 1.0 on the Japanese Dental Society of Anesthesiology Board Certification Exam
    Misaki Fujimoto, Hidetaka Kuroda, Tomomi Katayama, Atsuki Yamaguchi, Norika Katagiri, Keita Kagawa, Shota Tsukimoto, Akito Nakano, Uno Imaizumi, Aiji Sato-Boku, Naotaka Kishimoto, Tomoki Itamiya, Kanta Kido, Takuro Sanuki
    Cureus.2024;[Epub]     CrossRef
  • Dermatological Knowledge and Image Analysis Performance of Large Language Models Based on Specialty Certificate Examination in Dermatology
    Ka Siu Fan, Ka Hay Fan
    Dermato.2024; 4(4): 124.     CrossRef
  • ChatGPT and Other Large Language Models in Medical Education — Scoping Literature Review
    Alexandra Aster, Matthias Carl Laupichler, Tamina Rockwell-Kollmann, Gilda Masala, Ebru Bala, Tobias Raupach
    Medical Science Educator.2024; 35(1): 555.     CrossRef
  • Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study
    Yikai Chen, Xiujie Huang, Fangjie Yang, Haiming Lin, Haoyu Lin, Zhuoqun Zheng, Qifeng Liang, Jinhai Zhang, Xinxin Li
    BMC Medical Education.2024;[Epub]     CrossRef
  • Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study
    Hyunju Lee, Soobin Park
    Journal of Educational Evaluation for Health Professions.2023; 20: 39.     CrossRef
Medical students’ patterns of using ChatGPT as a feedback tool and perceptions of ChatGPT in a Leadership and Communication course in Korea: a cross-sectional study  
Janghee Park
J Educ Eval Health Prof. 2023;20:29.   Published online November 10, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.29
  • 4,332 View
  • 279 Download
  • 10 Web of Science
  • 12 Crossref
Abstract PDF Supplementary Material
Purpose
This study aimed to analyze patterns of using ChatGPT before and after group activities and to explore medical students’ perceptions of ChatGPT as a feedback tool in the classroom.
Methods
The study included 99 second-year premedical students who took a “Leadership and Communication” course from March to June 2023. Students engaged in both individual and group activities related to negotiation strategies, and ChatGPT was used to provide feedback on their solutions. A survey was administered from May 17 to 19, 2023 to assess students’ perceptions of ChatGPT’s feedback, its use in the classroom, and its strengths and challenges.
Results
The students indicated that ChatGPT’s feedback was helpful, and they revised and resubmitted their group answers in various ways after receiving it. The majority of respondents agreed with the use of ChatGPT during class. The most common response concerning the appropriate context for using ChatGPT’s feedback was “after the first round of discussion, for revisions.” Satisfaction with ChatGPT’s feedback, including its correctness, usefulness, and ethics, differed significantly depending on whether ChatGPT was used during class, but not according to gender or previous experience with ChatGPT. The greatest advantages were “providing answers to questions” and “summarizing information,” and the greatest disadvantage was “producing information without supporting evidence.”
Conclusion
The students were aware of the advantages and disadvantages of ChatGPT, and they had a positive attitude toward using ChatGPT in the classroom.
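The abstract does not state how the feedback prompts were constructed; the course presumably worked through ChatGPT’s web interface. Purely as an illustration of how such rubric-style feedback could be automated today, the following is a minimal Python sketch using the OpenAI API; the model name, system prompt, and rubric wording are assumptions, not the study’s protocol.

    # Minimal sketch: generating formative feedback on a student group's
    # negotiation-strategy answer. The model name, rubric, and prompt
    # wording are illustrative assumptions, not the study's procedure.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def feedback_on_answer(task: str, group_answer: str) -> str:
        """Ask the model for structured, rubric-style feedback."""
        prompt = (
            f"Task given to students:\n{task}\n\n"
            f"Group answer:\n{group_answer}\n\n"
            "Give concise feedback: (1) strengths, (2) weaknesses, "
            "(3) one concrete suggestion for revision. Do not rewrite "
            "the answer for the students."
        )
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model; the study used ChatGPT in 2023
            messages=[
                {"role": "system", "content": "You are a feedback assistant for "
                 "a medical-school leadership and communication course."},
                {"role": "user", "content": prompt},
            ],
        )
        return response.choices[0].message.content

    print(feedback_on_answer(
        "Propose a negotiation strategy for a scheduling conflict.",
        "We would split the difference and alternate weeks.",
    ))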

Citations

Citations to this article as recorded by  
  • Higher education students’ perceptions of ChatGPT: A global study of early reactions
    Dejan Ravšelj, Damijana Keržič, Nina Tomaževič, Lan Umek, Nejc Brezovar, Noorminshah A. Iahad, Ali Abdulla Abdulla, Anait Akopyan, Magdalena Waleska Aldana Segura, Jehan AlHumaid, Mohamed Farouk Allam, Maria Alló, Raphael Papa Kweku Andoh, Octavian Andron
    PLOS ONE.2025; 20(2): e0315011.     CrossRef
  • Generative AI in Otolaryngology Residency Personal Statement Writing: A Mixed‐Methods Analysis
    Jacob G. J. Wihlidal, Nikolaus E. Wolter, Evan J. Propst, Vincent Lin, Michael Au, Shaunak Amin, Jennifer M. Siu
    The Laryngoscope.2025;[Epub]     CrossRef
  • Applications of Artificial Intelligence for Nonpsychomotor Skills Training in Health Professions Education: A Scoping Review
    Kenya A. Costa-Dookhan, Zachary Adirim, Marta Maslej, Kayle Donner, Terri Rodak, Sophie Soklaridis, Sanjeev Sockalingam, Anupam Thakur
    Academic Medicine.2025; 100(5): 635.     CrossRef
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • Embracing ChatGPT for Medical Education: Exploring Its Impact on Doctors and Medical Students
    Yijun Wu, Yue Zheng, Baijie Feng, Yuqi Yang, Kai Kang, Ailin Zhao
    JMIR Medical Education.2024; 10: e52483.     CrossRef
  • Integration of ChatGPT Into a Course for Medical Students: Explorative Study on Teaching Scenarios, Students’ Perception, and Applications
    Anita V Thomae, Claudia M Witt, Jürgen Barth
    JMIR Medical Education.2024; 10: e50545.     CrossRef
  • A cross sectional investigation of ChatGPT-like large language models application among medical students in China
    Guixia Pan, Jing Ni
    BMC Medical Education.2024;[Epub]     CrossRef
  • A Pilot Study of Medical Student Opinions on Large Language Models
    Alan Y Xu, Vincent S Piranio, Skye Speakman, Chelsea D Rosen, Sally Lu, Chris Lamprecht, Robert E Medina, Maisha Corrielus, Ian T Griffin, Corinne E Chatham, Nicolas J Abchee, Daniel Stribling, Phuong B Huynh, Heather Harrell, Benjamin Shickel, Meghan Bre
    Cureus.2024;[Epub]     CrossRef
  • The intent of ChatGPT usage and its robustness in medical proficiency exams: a systematic review
    Tatiana Chaiban, Zeinab Nahle, Ghaith Assi, Michelle Cherfane
    Discover Education.2024;[Epub]     CrossRef
  • Feasibility of a Randomised Controlled Trial of Large Artificial Intelligence-Based Linguistic Models for Clinical Reasoning Training of Physical Therapy Students. A Pilot Study (Preprint)
    Raúl Ferrer Peña, Silvia Di Bonaventura, Alberto Pérez González, Alfredo Lerín Calvo
    JMIR Formative Research.2024;[Epub]     CrossRef
  • ChatGPT and Clinical Training: Perception, Concerns, and Practice of Pharm-D Students
    Mohammed Zawiah, Fahmi Al-Ashwal, Lobna Gharaibeh, Rana Abu Farha, Karem Alzoubi, Khawla Abu Hammour, Qutaiba A Qasim, Fahd Abrah
    Journal of Multidisciplinary Healthcare.2023; 16: 4099.     CrossRef
  • Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study
    Hyunju Lee, Soobin Park
    Journal of Educational Evaluation for Health Professions.2023; 20: 39.     CrossRef
Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study  
Aleksandra Ignjatović, Lazar Stevanović
J Educ Eval Health Prof. 2023;20:28.   Published online October 16, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.28
  • 5,392 View
  • 245 Download
  • 14 Web of Science
  • 17 Crossref
Abstract PDF Supplementary Material
Purpose
This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool for solving biostatistical problems and to identify potential drawbacks of using it in medical education, particularly for practical biostatistical problems.
Methods
In this descriptive study, ChatGPT’s ability to solve biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock was evaluated. Ten biostatistical problems were randomly chosen, the tables in those problems were transformed into textual questions, and the results were used as text-based input for conversation with ChatGPT (versions 3.5 and 4).
Results
GPT-3.5 solved 5 practical problems on the first attempt, relating to categorical data, cross-sectional studies, measuring reliability, probability properties, and the t-test, but failed to provide correct answers regarding analysis of variance, the chi-square test, and sample size within 3 attempts. GPT-4 additionally solved a task related to the confidence interval on the first attempt and, with precise guidance and monitoring, solved all questions within 3 attempts.
Conclusion
The assessment of both ChatGPT versions on 10 biostatistical problems revealed below-average performance, with first-attempt correct response rates of 5 and 6 out of 10 for GPT-3.5 and GPT-4, respectively; GPT-4 provided all correct answers within 3 attempts. These findings indicate that even when ChatGPT describes and calculates different statistical analyses, its answers can be wrong. Students should be aware of its limitations and exercise caution when incorporating this model into medical education.
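A practical safeguard follows directly from these findings: any analysis that ChatGPT suggests or computes can be re-run in statistical software before it is trusted. Below is a minimal Python sketch using SciPy, with invented illustrative data rather than the handbook problems from the study.

    # Minimal sketch of cross-checking LLM-reported statistics with SciPy.
    # The data are invented for illustration, not taken from the study.
    from scipy import stats

    group_a = [5.1, 4.9, 6.2, 5.8, 5.5, 6.0]
    group_b = [4.2, 4.8, 5.0, 4.5, 4.9, 4.4]

    # Independent-samples t-test (a task type GPT-3.5 solved correctly)
    t, p = stats.ttest_ind(group_a, group_b)
    print(f"t = {t:.3f}, p = {p:.4f}")

    # Chi-square test of independence (a task type GPT-3.5 failed):
    # recompute it directly instead of trusting the model's arithmetic.
    table = [[30, 10],
             [20, 25]]
    chi2, p, dof, expected = stats.chi2_contingency(table)
    print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")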

Citations

Citations to this article as recorded by  
  • Can Generative AI and ChatGPT Outperform Humans on Cognitive-Demanding Problem-Solving Tasks in Science?
    Xiaoming Zhai, Matthew Nyaaba, Wenchao Ma
    Science & Education.2025; 34(2): 649.     CrossRef
  • From statistics to deep learning: Using large language models in psychiatric research
    Yining Hua, Andrew Beam, Lori B. Chibnik, John Torous
    International Journal of Methods in Psychiatric Research.2025;[Epub]     CrossRef
  • Assessing the Current Limitations of Large Language Models in Advancing Health Care Education
    JaeYong Kim, Bathri Narayan Vajravelu
    JMIR Formative Research.2025; 9: e51319.     CrossRef
  • ChatGPT for Univariate Statistics: Validation of AI-Assisted Data Analysis in Healthcare Research
    Michael R Ruta, Tony Gaidici, Chase Irwin, Jonathan Lifshitz
    Journal of Medical Internet Research.2025; 27: e63550.     CrossRef
  • ChatGPT-Assisted Deep Learning Models for Influenza-Like Illness Prediction in Mainland China (Preprint)
    Weihong Huang, Wudi Wei, Xiaotao He, Baili Zhan, Xiaoting Xie, Meng Zhang, Shiyi Lai, Zongxiang Yuan, Jingzhen Lai, Rongfeng Chen, Junjun Jiang, Li Ye, Hao Liang
    Journal of Medical Internet Research.2025;[Epub]     CrossRef
  • Confirming SPSS Results With ChatGPT-4 and o3-mini Models
    Frederick Strale, Isaac Riddle, Bowen Geng, Blake Oxford, Malia Kah, Robert Sherwin
    Cureus.2025;[Epub]     CrossRef
  • A whole new world, a new fantastic point of view: Charting unexplored territories in consumer research with generative artificial intelligence
    Kiwoong Yoo, Michael Haenlein, Kelly Hewett
    Journal of the Academy of Marketing Science.2025;[Epub]     CrossRef
  • One year in the classroom with ChatGPT: empirical insights and transformative impacts
    Feng Guo, Tian Li, Christopher J. L. Cunningham
    Frontiers in Education.2025;[Epub]     CrossRef
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom’s Taxonomy
    Ambadasu Bharatha, Nkemcho Ojeh, Ahbab Mohammad Fazle Rabbi, Michael Campbell, Kandamaran Krishnamurthy, Rhaheem Layne-Yarde, Alok Kumar, Dale Springer, Kenneth Connell, Md Anwarul Majumder
    Advances in Medical Education and Practice.2024; 15: 393.     CrossRef
  • Revolutionizing Cardiology With Words: Unveiling the Impact of Large Language Models in Medical Science Writing
    Abhijit Bhattaru, Naveena Yanamala, Partho P. Sengupta
    Canadian Journal of Cardiology.2024; 40(10): 1950.     CrossRef
  • ChatGPT in medicine: prospects and challenges: a review article
    Songtao Tan, Xin Xin, Di Wu
    International Journal of Surgery.2024; 110(6): 3701.     CrossRef
  • In-depth analysis of ChatGPT’s performance based on specific signaling words and phrases in the question stem of 2377 USMLE step 1 style questions
    Leonard Knoedler, Samuel Knoedler, Cosima C. Hoch, Lukas Prantl, Konstantin Frank, Laura Soiderer, Sebastian Cotofana, Amir H. Dorafshar, Thilo Schenck, Felix Vollbach, Giuseppe Sofo, Michael Alfertshofer
    Scientific Reports.2024;[Epub]     CrossRef
  • Evaluating the quality of responses generated by ChatGPT
    Danimir Mandić, Gordana Miščević, Ljiljana Bujišić
    Metodicka praksa.2024; 27(1): 5.     CrossRef
  • A Comparative Evaluation of Statistical Product and Service Solutions (SPSS) and ChatGPT-4 in Statistical Analyses
    Al Imran Shahrul, Alizae Marny F Syed Mohamed
    Cureus.2024;[Epub]     CrossRef
  • ChatGPT and Other Large Language Models in Medical Education — Scoping Literature Review
    Alexandra Aster, Matthias Carl Laupichler, Tamina Rockwell-Kollmann, Gilda Masala, Ebru Bala, Tobias Raupach
    Medical Science Educator.2024; 35(1): 555.     CrossRef
  • Exploring the potential of large language models for integration into an academic statistical consulting service–the EXPOLS study protocol
    Urs Alexander Fichtner, Jochen Knaus, Erika Graf, Georg Koch, Jörg Sahlmann, Dominikus Stelzer, Martin Wolkewitz, Harald Binder, Susanne Weber, Bekalu Tadesse Moges
    PLOS ONE.2024; 19(12): e0308375.     CrossRef
Brief report
Comparing ChatGPT’s ability to rate the degree of stereotypes and the consistency of stereotype attribution with those of medical students in New Zealand in developing a similarity rating test: a methodological study  
Chao-Cheng Lin, Zaine Akuhata-Huntington, Che-Wei Hsu
J Educ Eval Health Prof. 2023;20:17.   Published online June 12, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.17
  • 3,891 View
  • 167 Download
  • 4 Web of Science
  • 6 Crossref
Abstract PDF Supplementary Material
Learning about one’s implicit bias is crucial for improving cultural competency and thereby reducing health inequity. To evaluate bias among medical students following a previously developed cultural training program targeting New Zealand Māori, we developed a text-based self-evaluation tool called the Similarity Rating Test (SRT). The development process of the SRT was resource-intensive, limiting its generalizability and applicability. Here, we explored the potential of ChatGPT, an automated chatbot, to assist in developing the SRT by comparing ChatGPT’s evaluations of the SRT with those of students. Although the results showed neither statistically significant equivalence nor a statistically significant difference between ChatGPT’s and students’ ratings, ChatGPT’s ratings were more consistent than the students’. The consistency rate was higher for non-stereotypical than for stereotypical statements, regardless of rater type. Further studies are warranted to validate ChatGPT’s potential to assist in SRT development for implementation in medical education and in the evaluation of ethnic stereotypes and related topics.
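The abstract does not define the consistency rate exactly; one plausible operationalization is the proportion of statements that receive identical ratings across repeated rating runs. Below is a minimal Python sketch under that assumption, with invented ratings for illustration.

    # One plausible operationalization of a "consistency rate": the share
    # of statements rated identically on two repeated passes. The paper's
    # exact definition is not given in the abstract; the ratings below are
    # invented for illustration.
    def consistency_rate(run1: list[int], run2: list[int]) -> float:
        """Proportion of statements rated identically in two rating runs."""
        assert len(run1) == len(run2), "runs must rate the same statements"
        agree = sum(a == b for a, b in zip(run1, run2))
        return agree / len(run1)

    chatgpt_run1 = [4, 2, 5, 3, 4, 1, 5, 2]
    chatgpt_run2 = [4, 2, 5, 3, 4, 2, 5, 2]
    student_run1 = [4, 3, 5, 2, 4, 1, 4, 2]
    student_run2 = [3, 2, 5, 3, 5, 1, 4, 1]

    print(f"ChatGPT:  {consistency_rate(chatgpt_run1, chatgpt_run2):.2f}")
    print(f"Students: {consistency_rate(student_run1, student_run2):.2f}")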

Citations

Citations to this article as recorded by  
  • Applications of Artificial Intelligence in Medical Education: A Systematic Review
    Eric Hallquist, Ishank Gupta, Michael Montalbano, Marios Loukas
    Cureus.2025;[Epub]     CrossRef
  • One year in the classroom with ChatGPT: empirical insights and transformative impacts
    Feng Guo, Tian Li, Christopher J. L. Cunningham
    Frontiers in Education.2025;[Epub]     CrossRef
  • The Performance of ChatGPT on Short-answer Questions in a Psychiatry Examination: A Pilot Study
    Chao-Cheng Lin, Kobus du Plooy, Andrew Gray, Deirdre Brown, Linda Hobbs, Tess Patterson, Valerie Tan, Daniel Fridberg, Che-Wei Hsu
    Taiwanese Journal of Psychiatry.2024; 38(2): 94.     CrossRef
  • ChatGPT and Other Large Language Models in Medical Education — Scoping Literature Review
    Alexandra Aster, Matthias Carl Laupichler, Tamina Rockwell-Kollmann, Gilda Masala, Ebru Bala, Tobias Raupach
    Medical Science Educator.2024; 35(1): 555.     CrossRef
  • Psychiatric Care, Training and Research in Aotearoa New Zealand
    Chao-Cheng (Chris) Lin, Charlotte Mentzel, Maria Luz C. Querubin
    Taiwanese Journal of Psychiatry.2024; 38(4): 161.     CrossRef
  • Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study
    Aleksandra Ignjatović, Lazar Stevanović
    Journal of Educational Evaluation for Health Professions.2023; 20: 28.     CrossRef
