JEEHP : Journal of Educational Evaluation for Health Professions

OPEN ACCESS
18 results for "Artificial intelligence"
Educational/Faculty development material
The role of large language models in the peer-review process: opportunities and challenges for medical journal reviewers and editors
Jisoo Lee, Jieun Lee, Jeong-Ju Yoo
J Educ Eval Health Prof. 2025;22:4.   Published online January 16, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.4    [Epub ahead of print]
  • 809 View
  • 117 Download
The peer review process ensures the integrity of scientific research. This is particularly important in the medical field, where research findings directly impact patient care. However, the rapid growth of publications has strained reviewers, causing delays and potential declines in quality. Generative artificial intelligence, especially large language models (LLMs) such as ChatGPT, may assist researchers with efficient, high-quality reviews. This review explores the integration of LLMs into peer review, highlighting their strengths in linguistic tasks and challenges in assessing scientific validity, particularly in clinical medicine. Key points for integration include initial screening, reviewer matching, feedback support, and language review. However, implementing LLMs for these purposes will necessitate addressing biases, privacy concerns, and data confidentiality. We recommend using LLMs as complementary tools under clear guidelines to support, not replace, human expertise in maintaining rigorous peer review standards.
Research article
Effectiveness of ChatGPT-4o in developing continuing professional development plans for graduate radiographers: a descriptive study  
Minh Chau, Elio Stefan Arruzza, Kelly Spuur
J Educ Eval Health Prof. 2024;21:34.   Published online November 18, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.34
  • 954 View
  • 163 Download
  • 1 Web of Science
  • 1 Crossref
Purpose
This study evaluates the use of ChatGPT-4o in creating tailored continuing professional development (CPD) plans for radiography students, addressing the challenge of aligning CPD with Medical Radiation Practice Board of Australia (MRPBA) requirements. We hypothesized that ChatGPT-4o could support students in CPD planning while meeting regulatory standards.
Methods
A descriptive, experimental design was used to generate 3 unique CPD plans using ChatGPT-4o, each tailored to hypothetical graduate radiographers in varied clinical settings. Each plan followed MRPBA guidelines, focusing on computed tomography specialization by the second year. Three MRPBA-registered academics assessed the plans using criteria of appropriateness, timeliness, relevance, reflection, and completeness from October 2024 to November 2024. Ratings were analyzed using the Friedman test and the intraclass correlation coefficient (ICC) to measure consistency among evaluators.
Results
ChatGPT-4o-generated CPD plans generally adhered to regulatory standards across scenarios. The Friedman test indicated no significant differences among raters (P=0.420, 0.761, and 0.807 for each scenario), suggesting consistent scores within scenarios. However, ICC values were low (–0.96, 0.41, and 0.058 for scenarios 1, 2, and 3), revealing variability among raters, particularly in the timeliness and completeness criteria, and suggesting limitations in ChatGPT-4o’s ability to address individualized and context-specific needs.
Conclusion
ChatGPT-4o demonstrates the potential to ease the cognitive demands of CPD planning, offering structured support in CPD development. However, human oversight remains essential to ensure plans are contextually relevant and deeply reflective. Future research should focus on enhancing artificial intelligence’s personalization for CPD evaluation, highlighting ChatGPT-4o’s potential and limitations as a tool in professional education.

Citations

Citations to this article as recorded by  
  • Halted medical education and medical residents’ training in Korea, journal metrics, and appreciation to reviewers and volunteers
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2025; 22: 1.     CrossRef
Educational/Faculty development material
The performance of ChatGPT-4.0o in medical imaging evaluation: a cross-sectional study  
Elio Stefan Arruzza, Carla Marie Evangelista, Minh Chau
J Educ Eval Health Prof. 2024;21:29.   Published online October 31, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.29
  • 1,380 View
  • 243 Download
  • 2 Web of Science
  • 2 Crossref
This study investigated the performance of ChatGPT-4.0o in evaluating the quality of positioning in radiographic images. Thirty radiographs depicting a variety of knee, elbow, ankle, hand, pelvis, and shoulder projections were produced using anthropomorphic phantoms and uploaded to ChatGPT-4.0o. The model was prompted to identify any positioning errors, justify its findings, and suggest improvements. A panel of radiographers assessed the solutions for radiographic quality based on established positioning criteria, with a grading scale of 1–5. In only 20% of projections, ChatGPT-4.0o correctly recognized all errors with justifications and offered correct suggestions for improvement. The most commonly occurring score was 3 (9 cases, 30%), wherein the model recognized at least 1 specific error and provided a correct improvement. The mean score was 2.9. Overall, low accuracy was demonstrated, with most projections receiving only partially correct solutions. The findings reinforce the importance of robust radiography education and clinical experience.

Citations

Citations to this article as recorded by  
  • Conversational LLM Chatbot ChatGPT-4 for Colonoscopy Boston Bowel Preparation Scoring: An Artificial Intelligence-to-Head Concordance Analysis
    Raffaele Pellegrino, Alessandro Federico, Antonietta Gerarda Gravina
    Diagnostics.2024; 14(22): 2537.     CrossRef
  • Effectiveness of ChatGPT-4o in developing continuing professional development plans for graduate radiographers: a descriptive study
    Minh Chau, Elio Stefan Arruzza, Kelly Spuur
    Journal of Educational Evaluation for Health Professions.2024; 21: 34.     CrossRef
Research articles
GPT-4o’s competency in answering the simulated written European Board of Interventional Radiology exam compared to a medical student and experts in Germany and its ability to generate exam items on interventional radiology: a descriptive study
Sebastian Ebel, Constantin Ehrengut, Timm Denecke, Holger Gößmann, Anne Bettina Beeskow
J Educ Eval Health Prof. 2024;21:21.   Published online August 20, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.21
  • 1,372 View
  • 300 Download
  • 3 Web of Science
  • 3 Crossref
Purpose
This study aimed to determine whether ChatGPT-4o, a generative artificial intelligence (AI) platform, could pass a simulated written European Board of Interventional Radiology (EBIR) exam and whether GPT-4o could be used to train medical students and interventional radiologists of different levels of expertise by generating exam items on interventional radiology.
Methods
GPT-4o was asked to answer 370 simulated exam items of the Cardiovascular and Interventional Radiology Society of Europe (CIRSE) for EBIR preparation (CIRSE Prep). Subsequently, GPT-4o was requested to generate exam items on interventional radiology topics at levels of difficulty suitable for medical students and the EBIR exam. Those generated items were answered by 4 participants, including a medical student, a resident, a consultant, and an EBIR holder. The correctly answered items were counted. One investigator checked the answers and items generated by GPT-4o for correctness and relevance. This work was done from April to July 2024.
Results
GPT-4o correctly answered 248 of the 370 CIRSE Prep items (67.0%). For 50 CIRSE Prep items, the medical student answered 46.0%, the resident 42.0%, the consultant 50.0%, and the EBIR holder 74.0% correctly. All participants answered 82.0% to 92.0% of the 50 GPT-4o-generated items at the student level correctly. For the 50 GPT-4o items at the EBIR level, the medical student answered 32.0%, the resident 44.0%, the consultant 48.0%, and the EBIR holder 66.0% correctly. All participants could pass the GPT-4o-generated items at the student level, whereas the EBIR holder could pass the GPT-4o-generated items at the EBIR level. Two items (0.3%) out of 150 generated by GPT-4o were assessed as implausible.
Conclusion
GPT-4o could pass the simulated written EBIR exam and create exam items of varying difficulty to train medical students and interventional radiologists.

Citations

Citations to this article as recorded by  
  • Evaluating the performance of ChatGPT in patient consultation and image-based preliminary diagnosis in thyroid eye disease
    Yue Wang, Shuo Yang, Chengcheng Zeng, Yingwei Xie, Ya Shen, Jian Li, Xiao Huang, Ruili Wei, Yuqing Chen
    Frontiers in Medicine.2025;[Epub]     CrossRef
  • From GPT-3.5 to GPT-4.o: A Leap in AI’s Medical Exam Performance
    Markus Kipp
    Information.2024; 15(9): 543.     CrossRef
  • Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study
    Yikai Chen, Xiujie Huang, Fangjie Yang, Haiming Lin, Haoyu Lin, Zhuoqun Zheng, Qifeng Liang, Jinhai Zhang, Xinxin Li
    BMC Medical Education.2024;[Epub]     CrossRef
Performance of GPT-3.5 and GPT-4 on standardized urology knowledge assessment items in the United States: a descriptive study
Max Samuel Yudovich, Elizaveta Makarova, Christian Michael Hague, Jay Dilip Raman
J Educ Eval Health Prof. 2024;21:17.   Published online July 8, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.17
  • 2,400 View
  • 319 Download
  • 3 Web of Science
  • 4 Crossref
Purpose
This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) with respect to standardized urology multiple-choice items in the United States.
Methods
In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized based on topic and question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024.
Results
GPT-4 answered 44.4% of items correctly compared to 30.9% for GPT-3.5 (P<0.00001). GPT-4 (vs. GPT-3.5) had higher accuracy with urologic oncology (43.8% vs. 33.9%, P=0.03), sexual medicine (44.3% vs. 27.8%, P=0.046), and pediatric urology (47.1% vs. 27.1%, P=0.012) items. Endourology (38.0% vs. 25.7%, P=0.15), reconstruction and trauma (29.0% vs. 21.0%, P=0.41), and neurourology (49.0% vs. 33.3%, P=0.11) items did not show significant differences in performance across versions. GPT-4 also outperformed GPT-3.5 on recall (45.9% vs. 27.4%, P<0.00001) and interpretation (45.6% vs. 31.5%, P=0.0005) items. For the higher-complexity problem-solving items (41.8% vs. 34.5%, P=0.56), the difference was not significant.
Conclusions
ChatGPT performs relatively poorly on standardized multiple-choice urology board exam-style items, with GPT-4 outperforming GPT-3.5. The accuracy was below the proposed minimum passing standards for the American Board of Urology’s Continuing Urologic Certification knowledge reinforcement activity (60%). As artificial intelligence progresses in complexity, ChatGPT may become more capable and accurate with respect to board examination items. For now, its responses should be scrutinized.
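The headline comparison in this abstract (44.4% vs. 30.9% of 700 items) can be sanity-checked with a short calculation. The sketch below is illustrative only: the abstract does not name the statistical test the authors used, the counts of correct answers are reconstructed here from the rounded percentages, and the two-proportion z-test treats the two models' answers as independent samples even though both models answered the same items (a paired test such as McNemar's would need per-item data that the abstract does not give). The function name `two_proportion_z_test` is a hypothetical helper, not something from the study.

```python
import math


def two_proportion_z_test(correct_a, correct_b, n_a, n_b):
    """Two-sided z-test for the difference between two independent proportions.

    Hypothetical helper for illustration; not the authors' method.
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled proportion under the null hypothesis of equal accuracy.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


N_ITEMS = 700  # number of items stated in the abstract

# ASSUMPTION: counts are back-calculated from the rounded percentages.
gpt4_correct = round(0.444 * N_ITEMS)
gpt35_correct = round(0.309 * N_ITEMS)

z, p = two_proportion_z_test(gpt4_correct, gpt35_correct, N_ITEMS, N_ITEMS)
print(f"GPT-4: {gpt4_correct}/{N_ITEMS}, GPT-3.5: {gpt35_correct}/{N_ITEMS}")
print(f"z = {z:.2f}, two-sided P = {p:.2g}")
```

Under these assumptions the resulting P value falls below 0.00001, which is consistent with the reported overall result. The per-topic comparisons cannot be checked the same way because the abstract does not give the number of items per topic.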

Citations

Citations to this article as recorded by  
  • Evaluating the Performance of ChatGPT4.0 Versus ChatGPT3.5 on the Hand Surgery Self-Assessment Exam: A Comparative Analysis of Performance on Image-Based Questions
    Kiera L Vrindten, Megan Hsu, Yuri Han, Brian Rust, Heili Truumees, Brian M Katt
    Cureus.2025;[Epub]     CrossRef
  • Assessing the performance of large language models (GPT-3.5 and GPT-4) and accurate clinical information for pediatric nephrology
    Nadide Melike Sav
    Pediatric Nephrology.2025;[Epub]     CrossRef
  • From GPT-3.5 to GPT-4.o: A Leap in AI’s Medical Exam Performance
    Markus Kipp
    Information.2024; 15(9): 543.     CrossRef
  • Artificial Intelligence can Facilitate Application of Risk Stratification Algorithms to Bladder Cancer Patient Case Scenarios
    Max S Yudovich, Ahmad N Alzubaidi, Jay D Raman
    Clinical Medicine Insights: Oncology.2024;[Epub]     CrossRef
Review
Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review  
Xiaojun Xu, Yixiao Chen, Jing Miao
J Educ Eval Health Prof. 2024;21:6.   Published online March 15, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.6
  • 7,103 View
  • 604 Download
  • 14 Web of Science
  • 18 Crossref
Background
ChatGPT is a large language model (LLM) based on artificial intelligence (AI) capable of responding in multiple languages and generating nuanced and highly complex responses. While ChatGPT holds promising applications in medical education, its limitations and potential risks cannot be ignored.
Methods
A scoping review was conducted for English articles discussing ChatGPT in the context of medical education published after 2022. A literature search was performed using PubMed/MEDLINE, Embase, and Web of Science databases, and information was extracted from the relevant studies that were ultimately included.
Results
ChatGPT exhibits various potential applications in medical education, such as providing personalized learning plans and materials, creating clinical practice simulation scenarios, and assisting in writing articles. However, challenges associated with academic integrity, data accuracy, and potential harm to learning were also highlighted in the literature. The paper emphasizes certain recommendations for using ChatGPT, including the establishment of guidelines. Based on the review, 3 key research areas were proposed: cultivating the ability of medical students to use ChatGPT correctly, integrating ChatGPT into teaching activities and processes, and proposing standards for the use of AI by medical students.
Conclusion
ChatGPT has the potential to transform medical education, but careful consideration is required for its full integration. To harness the full potential of ChatGPT in medical education, attention should not only be given to the capabilities of AI but also to its impact on students and teachers.

Citations

Citations to this article as recorded by  
  • AI-assisted patient education: Challenges and solutions in pediatric kidney transplantation
    MZ Ihsan, Dony Apriatama, Pithriani, Riza Amalia
    Patient Education and Counseling.2025; 131: 108575.     CrossRef
  • Exploring predictors of AI chatbot usage intensity among students: Within- and between-person relationships based on the technology acceptance model
    Anne-Kathrin Kleine, Insa Schaffernak, Eva Lermer
    Computers in Human Behavior: Artificial Humans.2025; 3: 100113.     CrossRef
  • AI-powered standardised patients: evaluating ChatGPT-4o’s impact on clinical case management in intern physicians
    Selcen Öncü, Fulya Torun, Hilal Hatice Ülkü
    BMC Medical Education.2025;[Epub]     CrossRef
  • UsmleGPT: An AI application for developing MCQs via multi-agent system
    Zhehan Jiang, Shicong Feng
    Software Impacts.2025; 23: 100742.     CrossRef
  • ChatGPT’s Performance on Portuguese Medical Examination Questions: Comparative Analysis of ChatGPT-3.5 Turbo and ChatGPT-4o Mini
    Filipe Prazeres
    JMIR Medical Education.2025; 11: e65108.     CrossRef
  • Chatbots in neurology and neuroscience: Interactions with students, patients and neurologists
    Stefano Sandrone
    Brain Disorders.2024; 15: 100145.     CrossRef
  • ChatGPT in education: unveiling frontiers and future directions through systematic literature review and bibliometric analysis
    Buddhini Amarathunga
    Asian Education and Development Studies.2024; 13(5): 412.     CrossRef
  • Evaluating the performance of ChatGPT-3.5 and ChatGPT-4 on the Taiwan plastic surgery board examination
    Ching-Hua Hsieh, Hsiao-Yun Hsieh, Hui-Ping Lin
    Heliyon.2024; 10(14): e34851.     CrossRef
  • Preparing for Artificial General Intelligence (AGI) in Health Professions Education: AMEE Guide No. 172
    Ken Masters, Anne Herrmann-Werner, Teresa Festl-Wietek, David Taylor
    Medical Teacher.2024; 46(10): 1258.     CrossRef
  • A Comparative Analysis of ChatGPT and Medical Faculty Graduates in Medical Specialization Exams: Uncovering the Potential of Artificial Intelligence in Medical Education
    Gülcan Gencer, Kerem Gencer
    Cureus.2024;[Epub]     CrossRef
  • Research ethics and issues regarding the use of ChatGPT-like artificial intelligence platforms by authors and reviewers: a narrative review
    Sang-Jun Kim
    Science Editing.2024; 11(2): 96.     CrossRef
  • Innovation Off the Bat: Bridging the ChatGPT Gap in Digital Competence among English as a Foreign Language Teachers
    Gulsara Urazbayeva, Raisa Kussainova, Aikumis Aibergen, Assel Kaliyeva, Gulnur Kantayeva
    Education Sciences.2024; 14(9): 946.     CrossRef
  • Exploring the perceptions of Chinese pre-service teachers on the integration of generative AI in English language teaching: Benefits, challenges, and educational implications
    Ji Young Chung, Seung-Hoon Jeong
    Online Journal of Communication and Media Technologies.2024; 14(4): e202457.     CrossRef
  • Unveiling the bright side and dark side of AI-based ChatGPT : a bibliographic and thematic approach
    Chandan Kumar Tiwari, Mohd. Abass Bhat, Abel Dula Wedajo, Shagufta Tariq Khan
    Journal of Decision Systems.2024; : 1.     CrossRef
  • Artificial Intelligence in Medical Education and Mentoring in Rehabilitation Medicine
    Julie K. Silver, Mustafa Reha Dodurgali, Nara Gavini
    American Journal of Physical Medicine & Rehabilitation.2024; 103(11): 1039.     CrossRef
  • The Potential of Artificial Intelligence Tools for Reducing Uncertainty in Medicine and Directions for Medical Education
    Sauliha Rabia Alli, Soaad Qahhār Hossain, Sunit Das, Ross Upshur
    JMIR Medical Education.2024; 10: e51446.     CrossRef
  • A Systematic Literature Review of Empirical Research on Applying Generative Artificial Intelligence in Education
    Xin Zhang, Peng Zhang, Yuan Shen, Min Liu, Qiong Wang, Dragan Gašević, Yizhou Fan
    Frontiers of Digital Education.2024; 1(3): 223.     CrossRef
  • Artificial intelligence in medical problem-based learning: opportunities and challenges
    Yaoxing Chen, Hong Qi, Yu Qiu, Juan Li, Liang Zhu, Xiaoling Gao, Hao Wang, Gan Jiang
    Global Medical Education.2024;[Epub]     CrossRef
Research articles
ChatGPT (GPT-4) passed the Japanese National License Examination for Pharmacists in 2022, answering all items including those with diagrams: a descriptive study  
Hiroyasu Sato, Katsuhiko Ogasawara
J Educ Eval Health Prof. 2024;21:4.   Published online February 28, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.4
  • 2,938 View
  • 288 Download
  • 5 Web of Science
  • 8 Crossref
Purpose
The objective of this study was to assess the performance of ChatGPT (GPT-4) on all items, including those with diagrams, in the Japanese National License Examination for Pharmacists (JNLEP) and compare it with the previous GPT-3.5 model’s performance.
Methods
This study targeted the 107th JNLEP, conducted in 2022, and 344 of its items were input into the GPT-4 model. Separately, 284 items, excluding those with diagrams, were entered into the GPT-3.5 model. The answers were categorized and analyzed to determine accuracy rates based on categories, subjects, and presence or absence of diagrams. The accuracy rates were compared to the main passing criteria (overall accuracy rate ≥62.9%).
Results
The overall accuracy rate of GPT-4 for all items in the 107th JNLEP was 72.5%, successfully meeting all the passing criteria. For the set of items without diagrams, the accuracy rate was 80.0%, which was significantly higher than that of the GPT-3.5 model (43.5%). The GPT-4 model demonstrated an accuracy rate of 36.1% for items that included diagrams.
Conclusion
Advancements that allow GPT-4 to process images have made it possible for LLMs to answer all items in medical-related license examinations. This study’s findings confirm that ChatGPT (GPT-4) possesses sufficient knowledge to meet the passing criteria.

Citations

Citations to this article as recorded by  
  • Qwen-2.5 Outperforms Other Large Language Models in the Chinese National Nursing Licensing Examination: Retrospective Cross-Sectional Comparative Study
    Shiben Zhu, Wanqin Hu, Zhi Yang, Jiani Yan, Fang Zhang
    JMIR Medical Informatics.2025; 13: e63731.     CrossRef
  • ChatGPT (GPT-4V) Performance on the Healthcare Information Technologist Examination in Japan
    Kai Ishida, Eisuke Hanada
    Cureus.2025;[Epub]     CrossRef
  • Medication counseling for OTC drugs using customized ChatGPT-4: Comparison with ChatGPT-3.5 and ChatGPT-4o
    Keisuke Kiyomiya, Tohru Aomori, Hisakazu Ohtani
    DIGITAL HEALTH.2025;[Epub]     CrossRef
  • Potential of ChatGPT to Pass the Japanese Medical and Healthcare Professional National Licenses: A Literature Review
    Kai Ishida, Eisuke Hanada
    Cureus.2024;[Epub]     CrossRef
  • Performance of Generative Pre-trained Transformer (GPT)-4 and Gemini Advanced on the First-Class Radiation Protection Supervisor Examination in Japan
    Hiroki Goto, Yoshioki Shiraishi, Seiji Okada
    Cureus.2024;[Epub]     CrossRef
  • Performance of ChatGPT‐3.5 and ChatGPT‐4o in the Japanese National Dental Examination
    Osamu Uehara, Tetsuro Morikawa, Fumiya Harada, Nodoka Sugiyama, Yuko Matsuki, Daichi Hiraki, Hinako Sakurai, Takashi Kado, Koki Yoshida, Yukie Murata, Hirofumi Matsuoka, Toshiyuki Nagasawa, Yasushi Furuichi, Yoshihiro Abiko, Hiroko Miura
    Journal of Dental Education.2024;[Epub]     CrossRef
  • An exploratory assessment of GPT-4o and GPT-4 performance on the Japanese National Dental Examination
    Masaki Morishita, Hikaru Fukuda, Shino Yamaguchi, Kosuke Muraoka, Taiji Nakamura, Masanari Hayashi, Izumi Yoshioka, Kentaro Ono, Shuji Awano
    The Saudi Dental Journal.2024; 36(12): 1577.     CrossRef
  • Evaluating the Accuracy of ChatGPT in the Japanese Board-Certified Physiatrist Examination
    Yuki Kato, Kenta Ushida, Ryo Momosaki
    Cureus.2024;[Epub]     CrossRef
Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study
Hyunju Lee, Soobin Park
J Educ Eval Health Prof. 2023;20:39.   Published online December 28, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.39
  • 3,001 View
  • 232 Download
  • 3 Web of Science
  • 3 Crossref
Purpose
This study assessed the performance of 6 generative artificial intelligence (AI) platforms on the learning objectives of medical arthropodology in a parasitology class in Korea. We examined the AI platforms’ performance by querying in Korean and English to determine their information amount, accuracy, and relevance in prompts in both languages.
Methods
From December 15 to 17, 2023, 6 generative AI platforms—Bard, Bing, Claude, Clova X, GPT-4, and Wrtn—were tested on 7 medical arthropodology learning objectives in English and Korean. Clova X and Wrtn are platforms from Korean companies. Responses were evaluated using specific criteria for the English and Korean queries.
Results
Bard had abundant information but was fourth in accuracy and relevance. GPT-4, with high information content, ranked first in accuracy and relevance. Clova X was fourth in information amount but second in accuracy and relevance. Bing provided less information, with moderate accuracy and relevance. Wrtn’s answers were short, with average accuracy and relevance. Claude AI had reasonable information, but lower accuracy and relevance. The responses in English were superior in all aspects. Clova X was notably optimized for Korean, leading in relevance.
Conclusion
In a study of 6 generative AI platforms applied to medical arthropodology, GPT-4 excelled overall, while Clova X, a Korea-based AI product, achieved 100% relevance in Korean queries, the highest among its peers. Utilizing these AI platforms in classrooms improved the authors’ self-efficacy and interest in the subject, offering a positive experience of interacting with generative AI platforms to question and receive information.

Citations

Citations to this article as recorded by  
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • The emergence of generative artificial intelligence platforms in 2023, journal metrics, appreciation to reviewers and volunteers, and obituary
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2024; 21: 9.     CrossRef
  • Comparison of the Performance of ChatGPT, Claude and Bard in Support of Myopia Prevention and Control
    Yan Wang, Lihua Liang, Ran Li, Yihua Wang, Changfu Hao
    Journal of Multidisciplinary Healthcare.2024; Volume 17: 3917.     CrossRef
Review
Application of artificial intelligence chatbots, including ChatGPT, in education, scholarly work, programming, and content generation and its prospects: a narrative review
Tae Won Kim
J Educ Eval Health Prof. 2023;20:38.   Published online December 27, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.38
  • 12,330 View
  • 1,170 Download
  • 18 Web of Science
  • 19 Crossref
This study aims to explore ChatGPT’s (GPT-3.5 version) functionalities, including reinforcement learning, diverse applications, and limitations. ChatGPT is an artificial intelligence (AI) chatbot powered by OpenAI’s Generative Pre-trained Transformer (GPT) model. The chatbot’s applications span education, programming, content generation, and more, demonstrating its versatility. ChatGPT can improve education by creating assignments and offering personalized feedback, as shown by its notable performance in medical exams and the United States Medical Licensing Exam. However, concerns include plagiarism, reliability, and educational disparities. It aids in various research tasks, from design to writing, and has shown proficiency in summarizing and suggesting titles. Its use in scientific writing and language translation is promising, but professional oversight is needed for accuracy and originality. It assists in programming tasks like writing code, debugging, and guiding installation and updates. It offers diverse applications, from cheering up individuals to generating creative content like essays, news articles, and business plans. Unlike search engines, ChatGPT provides interactive, generative responses and understands context, making it more akin to human conversation, in contrast to conventional search engines’ keyword-based, non-interactive nature. ChatGPT has limitations, such as potential bias, dependence on outdated data, and revenue generation challenges. Nonetheless, ChatGPT is considered to be a transformative AI tool poised to redefine the future of generative technology. In conclusion, advancements in AI, such as ChatGPT, are altering how knowledge is acquired and applied, marking a shift from search engines to creativity engines. This transformation highlights the increasing importance of AI literacy and the ability to effectively utilize AI in various domains of life.

Citations

Citations to this article as recorded by  
  • The Development and Validation of an Artificial Intelligence Chatbot Dependence Scale
    Xing Zhang, Mingyue Yin, Mingyang Zhang, Zhaoqian Li, Hansen Li
    Cyberpsychology, Behavior, and Social Networking.2025; 28(2): 126.     CrossRef
  • Readability, quality and accuracy of generative artificial intelligence chatbots for commonly asked questions about labor epidurals: a comparison of ChatGPT and Bard
    D. Lee, M. Brown, J. Hammond, M. Zakowski
    International Journal of Obstetric Anesthesia.2025; 61: 104317.     CrossRef
  • ChatGPT-4 Performance on German Continuing Medical Education—Friend or Foe (Trick or Treat)? Protocol for a Randomized Controlled Trial
    Christian Burisch, Abhav Bellary, Frank Breuckmann, Jan Ehlers, Serge C Thal, Timur Sellmann, Daniel Gödde
    JMIR Research Protocols.2025; 14: e63887.     CrossRef
  • The effect of incorporating large language models into the teaching on critical thinking disposition: An “AI + Constructivism Learning Theory” attempt
    Peng Wang, Kexin Yin, Mingzhu Zhang, Yuanxin Zheng, Tong Zhang, Yanjun Kang, Xun Feng
    Education and Information Technologies.2025;[Epub]     CrossRef
  • The Impact of Adaptive Learning Technologies, Personalized Feedback, and Interactive AI Tools on Student Engagement: The Moderating Role of Digital Literacy
    Husam Yaseen, Abdelaziz Saleh Mohammad, Najwa Ashal, Hesham Abusaimeh, Ahmad Ali, Abdel-Aziz Ahmad Sharabati
    Sustainability.2025; 17(3): 1133.     CrossRef
  • Artificial Intelligence in Nursing: New Opportunities and Challenges
    Estel·la Ramírez‐Baraldes, Daniel García‐Gutiérrez, Cristina García‐Salido
    European Journal of Education.2025;[Epub]     CrossRef
  • Can ChatGPT be used as a scientific source of information on tooth extraction?
    Shiori Yamamoto, Masakazu Hamada, Kyoko Nishiyama, Ayako Motoki, Yusei Fujita, Narikazu Uzawa
    Journal of Oral and Maxillofacial Surgery, Medicine, and Pathology.2025;[Epub]     CrossRef
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • Artificial Intelligence: Fundamentals and Breakthrough Applications in Epilepsy
    Wesley Kerr, Sandra Acosta, Patrick Kwan, Gregory Worrell, Mohamad A. Mikati
    Epilepsy Currents.2024;[Epub]     CrossRef
  • A Developed Graphical User Interface-Based on Different Generative Pre-trained Transformers Models
    Ekrem Küçük, İpek Balıkçı Çiçek, Zeynep Küçükakçalı, Cihan Yetiş, Cemil Çolak
    ODÜ Tıp Dergisi.2024; 11(1): 18.     CrossRef
  • Art or Artifact: Evaluating the Accuracy, Appeal, and Educational Value of AI-Generated Imagery in DALL·E 3 for Illustrating Congenital Heart Diseases
    Mohamad-Hani Temsah, Abdullah N. Alhuzaimi, Mohammed Almansour, Fadi Aljamaan, Khalid Alhasan, Munirah A. Batarfi, Ibraheem Altamimi, Amani Alharbi, Adel Abdulaziz Alsuhaibani, Leena Alwakeel, Abdulrahman Abdulkhaliq Alzahrani, Khaled B. Alsulaim, Amr Jam
    Journal of Medical Systems.2024;[Epub]     CrossRef
  • Authentic assessment in medical education: exploring AI integration and student-as-partners collaboration
    Syeda Sadia Fatima, Nabeel Ashfaque Sheikh, Athar Osama
    Postgraduate Medical Journal.2024; 100(1190): 959.     CrossRef
  • Comparative performance analysis of large language models: ChatGPT-3.5, ChatGPT-4 and Google Gemini in glucocorticoid-induced osteoporosis
    Linjian Tong, Chaoyang Zhang, Rui Liu, Jia Yang, Zhiming Sun
    Journal of Orthopaedic Surgery and Research.2024;[Epub]     CrossRef
  • Can AI-Generated Clinical Vignettes in Japanese Be Used Medically and Linguistically?
    Yasutaka Yanagita, Daiki Yokokawa, Shun Uchida, Yu Li, Takanori Uehara, Masatomi Ikusaka
    Journal of General Internal Medicine.2024; 39(16): 3282.     CrossRef
  • ChatGPT vs. sleep disorder specialist responses to common sleep queries: Ratings by experts and laypeople
    Jiyoung Kim, Seo-Young Lee, Jee Hyun Kim, Dong-Hyeon Shin, Eun Hye Oh, Jin A Kim, Jae Wook Cho
    Sleep Health.2024; 10(6): 665.     CrossRef
  • Technology integration into Chinese as a foreign language learning in higher education: An integrated bibliometric analysis and systematic review (2000–2024)
    Binze Xu
    Language Teaching Research.2024;[Epub]     CrossRef
  • The Transformative Power of Generative Artificial Intelligence for Achieving the Sustainable Development Goal of Quality Education
    Prema Nedungadi, Kai-Yu Tang, Raghu Raman
    Sustainability.2024; 16(22): 9779.     CrossRef
  • Is AI the new course creator
    Sheri Conklin, Tom Dorgan, Daisyane Barreto
    Discover Education.2024;[Epub]     CrossRef
  • Emergency Medicine Assistants in the Field of Toxicology, Comparison of ChatGPT-3.5 and GEMINI Artificial Intelligence Systems
    Hatice Aslı Bedel, Cihan Bedel, Fatih Selvi, Ökkeş Zortuk, Yusuf Karanci
    Acta medica Lituanica.2024; 31(2): 294.     CrossRef
Brief report
ChatGPT (GPT-3.5) as an assistant tool in microbial pathogenesis studies in Sweden: a cross-sectional comparative study  
Catharina Hultgren, Annica Lindkvist, Volkan Özenci, Sophie Curbo
J Educ Eval Health Prof. 2023;20:32.   Published online November 22, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.32
  • 2,271 View
  • 140 Download
  • 3 Web of Science
  • 3 Crossref
ChatGPT (GPT-3.5) has entered higher education and there is a need to determine how to use it effectively. This descriptive study compared the ability of GPT-3.5 and teachers to answer questions from dental students and construct detailed intended learning outcomes. When analyzed according to a Likert scale, we found that GPT-3.5 answered the questions from dental students in a similar or even more elaborate way compared to the answers that had previously been provided by a teacher. GPT-3.5 was also asked to construct detailed intended learning outcomes for a course in microbial pathogenesis, and when these were analyzed according to a Likert scale, they were, to a large degree, found to be irrelevant. Since students are using GPT-3.5, it is important that instructors learn how to make the best use of it both to be able to advise students and to benefit from its potential.

Citations to this article
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • Unlocking learning: exploring take-home examinations and viva voce examinations in microbiology education for biomedical laboratory science students
    Sophie Curbo, Annica Lindkvist, Catharina Hultgren, Jorge Cervantes
    Journal of Microbiology & Biology Education.2024;[Epub]     CrossRef
  • Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study
    Hyunju Lee, Soobin Park
    Journal of Educational Evaluation for Health Professions.2023; 20: 39.     CrossRef
Research articles
Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study  
Betzy Clariza Torres-Zegarra, Wagner Rios-Garcia, Alvaro Micael Ñaña-Cordova, Karen Fatima Arteaga-Cisneros, Xiomara Cristina Benavente Chalco, Marina Atena Bustamante Ordoñez, Carlos Jesus Gutierrez Rios, Carlos Alberto Ramos Godoy, Kristell Luisa Teresa Panta Quezada, Jesus Daniel Gutierrez-Arratia, Javier Alejandro Flores-Cohaila
J Educ Eval Health Prof. 2023;20:30.   Published online November 20, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.30
  • 3,371 View
  • 227 Download
  • 15 Web of Science
  • 18 Crossref
Purpose
We aimed to describe the performance and evaluate the educational value of justifications provided by artificial intelligence chatbots, including GPT-3.5, GPT-4, Bard, Claude, and Bing, on the Peruvian National Medical Licensing Examination (P-NLME).
Methods
This was a cross-sectional analytical study. On July 25, 2023, each multiple-choice question (MCQ) from the P-NLME was entered into each chatbot (GPT-3.5, GPT-4, Bing, Bard, and Claude) 3 times. Then, 4 medical educators categorized the MCQs in terms of medical area, item type, and whether the MCQ required Peru-specific knowledge. They assessed the educational value of the justifications from the 2 top performers (GPT-4 and Bing).
Results
GPT-4 scored 86.7% and Bing scored 82.2%, followed by Bard and Claude, and the historical performance of Peruvian examinees was 55%. Among the factors associated with correct answers, only MCQs that required Peru-specific knowledge had lower odds (odds ratio, 0.23; 95% confidence interval, 0.09–0.61), whereas the remaining factors showed no associations. In assessing the educational value of justifications provided by GPT-4 and Bing, neither showed any significant differences in certainty, usefulness, or potential use in the classroom.
Conclusion
Among chatbots, GPT-4 and Bing were the top performers, with Bing performing better at Peru-specific MCQs. Moreover, the educational value of justifications provided by GPT-4 and Bing could be deemed appropriate. However, it is essential to start addressing the educational value of these chatbots, rather than merely their performance on examinations.

Citations to this article
  • PICOT questions and search strategies formulation: A novel approach using artificial intelligence automation
    Lucija Gosak, Gregor Štiglic, Lisiane Pruinelli, Dominika Vrbnjak
    Journal of Nursing Scholarship.2025; 57(1): 5.     CrossRef
  • Using large language models (ChatGPT, Copilot, PaLM, Bard, and Gemini) in Gross Anatomy course: Comparative analysis
    Volodymyr Mavrych, Paul Ganguly, Olena Bolgova
    Clinical Anatomy.2025; 38(2): 200.     CrossRef
  • Capable exam-taker and question-generator: the dual role of generative AI in medical education assessment
    Yihong Qiu, Chang Liu
    Global Medical Education.2025;[Epub]     CrossRef
  • Comparison of artificial intelligence systems in answering prosthodontics questions from the dental specialty exam in Turkey
    Busra Tosun, Zeynep Sen Yilmaz
    Journal of Dental Sciences.2025;[Epub]     CrossRef
  • Benchmarking LLM chatbots’ oncological knowledge with the Turkish Society of Medical Oncology’s annual board examination questions
    Efe Cem Erdat, Engin Eren Kavak
    BMC Cancer.2025;[Epub]     CrossRef
  • Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study
    Masao Noda, Takayoshi Ueno, Ryota Koshu, Yuji Takaso, Mari Dias Shimada, Chizu Saito, Hisashi Sugimoto, Hiroaki Fushiki, Makoto Ito, Akihiro Nomura, Tomokazu Yoshizaki
    JMIR Medical Education.2024; 10: e57054.     CrossRef
  • Response to Letter to the Editor re: “Artificial Intelligence Versus Expert Plastic Surgeon: Comparative Study Shows ChatGPT ‘Wins' Rhinoplasty Consultations: Should We Be Worried? [1]” by Durairaj et al
    Kay Durairaj, Omer Baker
    Facial Plastic Surgery & Aesthetic Medicine.2024; 26(3): 276.     CrossRef
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis
    Mingxin Liu, Tsuyoshi Okuhara, XinYi Chang, Ritsuko Shirabe, Yuriko Nishiie, Hiroko Okada, Takahiro Kiuchi
    Journal of Medical Internet Research.2024; 26: e60807.     CrossRef
  • Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study
    Giacomo Rossettini, Lia Rodeghiero, Federica Corradi, Chad Cook, Paolo Pillastrini, Andrea Turolla, Greta Castellini, Stefania Chiappinotto, Silvia Gianola, Alvisa Palese
    BMC Medical Education.2024;[Epub]     CrossRef
  • Evaluating the competency of ChatGPT in MRCP Part 1 and a systematic literature review of its capabilities in postgraduate medical assessments
    Oliver Vij, Henry Calver, Nikki Myall, Mrinalini Dey, Koushan Kouranloo, Thiago P. Fernandes
    PLOS ONE.2024; 19(7): e0307372.     CrossRef
  • Large Language Models in Pediatric Education: Current Uses and Future Potential
    Srinivasan Suresh, Sanghamitra M. Misra
    Pediatrics.2024;[Epub]     CrossRef
  • Comparison of the Performance of ChatGPT, Claude and Bard in Support of Myopia Prevention and Control
    Yan Wang, Lihua Liang, Ran Li, Yihua Wang, Changfu Hao
    Journal of Multidisciplinary Healthcare.2024; Volume 17: 3917.     CrossRef
  • Evaluating Large Language Models in Dental Anesthesiology: A Comparative Analysis of ChatGPT-4, Claude 3 Opus, and Gemini 1.0 on the Japanese Dental Society of Anesthesiology Board Certification Exam
    Misaki Fujimoto, Hidetaka Kuroda, Tomomi Katayama, Atsuki Yamaguchi, Norika Katagiri, Keita Kagawa, Shota Tsukimoto, Akito Nakano, Uno Imaizumi, Aiji Sato-Boku, Naotaka Kishimoto, Tomoki Itamiya, Kanta Kido, Takuro Sanuki
    Cureus.2024;[Epub]     CrossRef
  • Dermatological Knowledge and Image Analysis Performance of Large Language Models Based on Specialty Certificate Examination in Dermatology
    Ka Siu Fan, Ka Hay Fan
    Dermato.2024; 4(4): 124.     CrossRef
  • ChatGPT and Other Large Language Models in Medical Education — Scoping Literature Review
    Alexandra Aster, Matthias Carl Laupichler, Tamina Rockwell-Kollmann, Gilda Masala, Ebru Bala, Tobias Raupach
    Medical Science Educator.2024;[Epub]     CrossRef
  • Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study
    Yikai Chen, Xiujie Huang, Fangjie Yang, Haiming Lin, Haoyu Lin, Zhuoqun Zheng, Qifeng Liang, Jinhai Zhang, Xinxin Li
    BMC Medical Education.2024;[Epub]     CrossRef
  • Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study
    Hyunju Lee, Soobin Park
    Journal of Educational Evaluation for Health Professions.2023; 20: 39.     CrossRef
Medical students’ patterns of using ChatGPT as a feedback tool and perceptions of ChatGPT in a Leadership and Communication course in Korea: a cross-sectional study  
Janghee Park
J Educ Eval Health Prof. 2023;20:29.   Published online November 10, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.29
  • 3,628 View
  • 242 Download
  • 6 Web of Science
  • 9 Crossref
Purpose
This study aimed to analyze patterns of using ChatGPT before and after group activities and to explore medical students’ perceptions of ChatGPT as a feedback tool in the classroom.
Methods
The study included 99 2nd-year pre-medical students who participated in a “Leadership and Communication” course from March to June 2023. Students engaged in both individual and group activities related to negotiation strategies. ChatGPT was used to provide feedback on their solutions. A survey was administered from May 17 to 19, 2023, to assess students’ perceptions of ChatGPT’s feedback, its use in the classroom, and the strengths and challenges of ChatGPT.
Results
The students indicated that ChatGPT’s feedback was helpful, and they revised and resubmitted their group answers in various ways after receiving feedback. The majority of respondents agreed with the use of ChatGPT during class. The most common response concerning the appropriate context for using ChatGPT’s feedback was “after the first round of discussion, for revisions.” There was a significant difference in satisfaction with ChatGPT’s feedback, including correctness, usefulness, and ethics, depending on whether or not ChatGPT was used during class, but there was no significant difference according to gender or whether students had previous experience with ChatGPT. The strongest advantages were “providing answers to questions” and “summarizing information,” and the most serious disadvantage was “producing information without supporting evidence.”
Conclusion
The students were aware of the advantages and disadvantages of ChatGPT, and they had a positive attitude toward using ChatGPT in the classroom.

Citations to this article
  • Higher education students’ perceptions of ChatGPT: A global study of early reactions
    Dejan Ravšelj, Damijana Keržič, Nina Tomaževič, Lan Umek, Nejc Brezovar, Noorminshah A. Iahad, Ali Abdulla Abdulla, Anait Akopyan, Magdalena Waleska Aldana Segura, Jehan AlHumaid, Mohamed Farouk Allam, Maria Alló, Raphael Papa Kweku Andoh, Octavian Andron
    PLOS ONE.2025; 20(2): e0315011.     CrossRef
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • Embracing ChatGPT for Medical Education: Exploring Its Impact on Doctors and Medical Students
    Yijun Wu, Yue Zheng, Baijie Feng, Yuqi Yang, Kai Kang, Ailin Zhao
    JMIR Medical Education.2024; 10: e52483.     CrossRef
  • Integration of ChatGPT Into a Course for Medical Students: Explorative Study on Teaching Scenarios, Students’ Perception, and Applications
    Anita V Thomae, Claudia M Witt, Jürgen Barth
    JMIR Medical Education.2024; 10: e50545.     CrossRef
  • A cross sectional investigation of ChatGPT-like large language models application among medical students in China
    Guixia Pan, Jing Ni
    BMC Medical Education.2024;[Epub]     CrossRef
  • A Pilot Study of Medical Student Opinions on Large Language Models
    Alan Y Xu, Vincent S Piranio, Skye Speakman, Chelsea D Rosen, Sally Lu, Chris Lamprecht, Robert E Medina, Maisha Corrielus, Ian T Griffin, Corinne E Chatham, Nicolas J Abchee, Daniel Stribling, Phuong B Huynh, Heather Harrell, Benjamin Shickel, Meghan Bre
    Cureus.2024;[Epub]     CrossRef
  • The intent of ChatGPT usage and its robustness in medical proficiency exams: a systematic review
    Tatiana Chaiban, Zeinab Nahle, Ghaith Assi, Michelle Cherfane
    Discover Education.2024;[Epub]     CrossRef
  • ChatGPT and Clinical Training: Perception, Concerns, and Practice of Pharm-D Students
    Mohammed Zawiah, Fahmi Al-Ashwal, Lobna Gharaibeh, Rana Abu Farha, Karem Alzoubi, Khawla Abu Hammour, Qutaiba A Qasim, Fahd Abrah
    Journal of Multidisciplinary Healthcare.2023; Volume 16: 4099.     CrossRef
  • Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study
    Hyunju Lee, Soobin Park
    Journal of Educational Evaluation for Health Professions.2023; 20: 39.     CrossRef
Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study  
Aleksandra Ignjatović, Lazar Stevanović
J Educ Eval Health Prof. 2023;20:28.   Published online October 16, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.28
  • 4,328 View
  • 226 Download
  • 11 Web of Science
  • 13 Crossref
Purpose
This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool in solving biostatistical problems and to identify any potential drawbacks that might arise from using ChatGPT in medical education, particularly in solving practical biostatistical problems.
Methods
In this descriptive study, ChatGPT was tested on its ability to solve biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock. Tables from the problems were transformed into textual questions. Ten biostatistical problems were randomly chosen and used as text-based input for conversation with ChatGPT (versions 3.5 and 4).
Results
GPT-3.5 solved 5 practical problems in the first attempt, related to categorical data, cross-sectional study, measuring reliability, probability properties, and the t-test. GPT-3.5 failed to provide correct answers regarding analysis of variance, the chi-square test, and sample size within 3 attempts. GPT-4 also solved a task related to the confidence interval in the first attempt and solved all questions within 3 attempts, with precise guidance and monitoring.
Conclusion
The assessment of both versions of ChatGPT on 10 biostatistical problems revealed that the performance of GPT-3.5 and GPT-4 was below average, with 5 and 6 of the 10 problems, respectively, answered correctly on the first attempt. GPT-4 succeeded in providing all correct answers within 3 attempts. These findings indicate that students must be aware that this tool, even when providing and calculating different statistical analyses, can be wrong, and they should keep ChatGPT’s limitations in mind and be careful when incorporating this model into medical education.

Citations to this article
  • From statistics to deep learning: Using large language models in psychiatric research
    Yining Hua, Andrew Beam, Lori B. Chibnik, John Torous
    International Journal of Methods in Psychiatric Research.2025;[Epub]     CrossRef
  • Assessing the Current Limitations of Large Language Models in Advancing Health Care Education
    JaeYong Kim, Bathri Narayan Vajravelu
    JMIR Formative Research.2025; 9: e51319.     CrossRef
  • ChatGPT for Univariate Statistics: Validation of AI-Assisted Data Analysis in Healthcare Research
    Michael R Ruta, Tony Gaidici, Chase Irwin, Jonathan Lifshitz
    Journal of Medical Internet Research.2025; 27: e63550.     CrossRef
  • Can Generative AI and ChatGPT Outperform Humans on Cognitive-Demanding Problem-Solving Tasks in Science?
    Xiaoming Zhai, Matthew Nyaaba, Wenchao Ma
    Science & Education.2024;[Epub]     CrossRef
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom’s Taxonomy
    Ambadasu Bharatha, Nkemcho Ojeh, Ahbab Mohammad Fazle Rabbi, Michael Campbell, Kandamaran Krishnamurthy, Rhaheem Layne-Yarde, Alok Kumar, Dale Springer, Kenneth Connell, Md Anwarul Majumder
    Advances in Medical Education and Practice.2024; Volume 15: 393.     CrossRef
  • Revolutionizing Cardiology With Words: Unveiling the Impact of Large Language Models in Medical Science Writing
    Abhijit Bhattaru, Naveena Yanamala, Partho P. Sengupta
    Canadian Journal of Cardiology.2024; 40(10): 1950.     CrossRef
  • ChatGPT in medicine: prospects and challenges: a review article
    Songtao Tan, Xin Xin, Di Wu
    International Journal of Surgery.2024; 110(6): 3701.     CrossRef
  • In-depth analysis of ChatGPT’s performance based on specific signaling words and phrases in the question stem of 2377 USMLE step 1 style questions
    Leonard Knoedler, Samuel Knoedler, Cosima C. Hoch, Lukas Prantl, Konstantin Frank, Laura Soiderer, Sebastian Cotofana, Amir H. Dorafshar, Thilo Schenck, Felix Vollbach, Giuseppe Sofo, Michael Alfertshofer
    Scientific Reports.2024;[Epub]     CrossRef
  • Evaluating the quality of responses generated by ChatGPT
    Danimir Mandić, Gordana Miščević, Ljiljana Bujišić
    Metodicka praksa.2024; 27(1): 5.     CrossRef
  • A Comparative Evaluation of Statistical Product and Service Solutions (SPSS) and ChatGPT-4 in Statistical Analyses
    Al Imran Shahrul, Alizae Marny F Syed Mohamed
    Cureus.2024;[Epub]     CrossRef
  • ChatGPT and Other Large Language Models in Medical Education — Scoping Literature Review
    Alexandra Aster, Matthias Carl Laupichler, Tamina Rockwell-Kollmann, Gilda Masala, Ebru Bala, Tobias Raupach
    Medical Science Educator.2024;[Epub]     CrossRef
  • Exploring the potential of large language models for integration into an academic statistical consulting service–the EXPOLS study protocol
    Urs Alexander Fichtner, Jochen Knaus, Erika Graf, Georg Koch, Jörg Sahlmann, Dominikus Stelzer, Martin Wolkewitz, Harald Binder, Susanne Weber, Bekalu Tadesse Moges
    PLOS ONE.2024; 19(12): e0308375.     CrossRef
Brief report
Comparing ChatGPT’s ability to rate the degree of stereotypes and the consistency of stereotype attribution with those of medical students in New Zealand in developing a similarity rating test: a methodological study  
Chao-Cheng Lin, Zaine Akuhata-Huntington, Che-Wei Hsu
J Educ Eval Health Prof. 2023;20:17.   Published online June 12, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.17
  • 3,404 View
  • 159 Download
  • 3 Web of Science
  • 4 Crossref
Learning about one’s implicit bias is crucial for improving one’s cultural competency and thereby reducing health inequity. To evaluate bias among medical students following a previously developed cultural training program targeting New Zealand Māori, we developed a text-based, self-evaluation tool called the Similarity Rating Test (SRT). The development process of the SRT was resource-intensive, limiting its generalizability and applicability. Here, we explored the potential of ChatGPT, an automated chatbot, to assist in the development process of the SRT by comparing ChatGPT’s and students’ evaluations of the SRT. Although the results showed neither significant equivalence nor a significant difference between ChatGPT’s and students’ ratings, ChatGPT’s ratings were more consistent than students’ ratings. The consistency rate was higher for non-stereotypical than for stereotypical statements, regardless of rater type. Further studies are warranted to validate ChatGPT’s potential for assisting in SRT development for implementation in medical education and evaluation of ethnic stereotypes and related topics.

Citations to this article
  • The Performance of ChatGPT on Short-answer Questions in a Psychiatry Examination: A Pilot Study
    Chao-Cheng Lin, Kobus du Plooy, Andrew Gray, Deirdre Brown, Linda Hobbs, Tess Patterson, Valerie Tan, Daniel Fridberg, Che-Wei Hsu
    Taiwanese Journal of Psychiatry.2024; 38(2): 94.     CrossRef
  • ChatGPT and Other Large Language Models in Medical Education — Scoping Literature Review
    Alexandra Aster, Matthias Carl Laupichler, Tamina Rockwell-Kollmann, Gilda Masala, Ebru Bala, Tobias Raupach
    Medical Science Educator.2024;[Epub]     CrossRef
  • Psychiatric Care, Training and Research in Aotearoa New Zealand
    Chao-Cheng (Chris) Lin, Charlotte Mentzel, Maria Luz C. Querubin
    Taiwanese Journal of Psychiatry.2024; 38(4): 161.     CrossRef
  • Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study
    Aleksandra Ignjatović, Lazar Stevanović
    Journal of Educational Evaluation for Health Professions.2023; 20: 28.     CrossRef
Review
Can an artificial intelligence chatbot be the author of a scholarly article?  
Ju Yoen Lee
J Educ Eval Health Prof. 2023;20:6.   Published online February 27, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.6
  • 12,619 View
  • 824 Download
  • 59 Web of Science
  • 56 Crossref
At the end of 2022, the appearance of ChatGPT, an artificial intelligence (AI) chatbot with amazing writing ability, caused a great sensation in academia. The chatbot turned out to be very capable, but also capable of deception, and the news broke that several researchers had listed the chatbot (including its earlier version) as a co-author of their academic papers. In response, Nature and Science expressed their position that this chatbot cannot be listed as an author in the papers they publish. Since an AI chatbot is not a human being, in the current legal system, the text automatically generated by an AI chatbot cannot be a copyrighted work; thus, an AI chatbot cannot be an author of a copyrighted work. Current AI chatbots such as ChatGPT are much more advanced than search engines in that they produce original text, but they still remain at the level of a search engine in that they cannot take responsibility for their writing. For this reason, they also cannot be authors from the perspective of research ethics.

Citations to this article
  • Attitudes and perceptions of medical researchers towards the use of artificial intelligence chatbots in the scientific process: an international cross-sectional survey
    Jeremy Y Ng, Sharleen G Maduranayagam, Nirekah Suthakar, Amy Li, Cynthia Lokker, Alfonso Iorio, R Brian Haynes, David Moher
    The Lancet Digital Health.2025; 7(1): e94.     CrossRef
  • Introducing Our Custom GPT: An Example of the Potential Impact of Personalized GPT Builders on Scientific Writing
    Aymen Kabir, Suraj Shah, Alexander Haddad, Daniel M.S. Raper
    World Neurosurgery.2025; 193: 461.     CrossRef
  • Integrating Artificial Intelligence in Medical Writing: Balancing Technological Innovation and Human Expertise, with Practical Applications in Lower Extremity Wounds Care
    Pak Thaichana, Myo Zin Oo, Gabriel Leiden Thorup, Chayatorn Chansakaow, Supapong Arworn, Kittipan Rerkasem
    The International Journal of Lower Extremity Wounds.2025;[Epub]     CrossRef
  • Ethical issues and violations in using chatbots in academic writing and publishing: the answers from ChatGPT
    Eren Erkılıç, Ibrahim Cifci
    Journal of Multidisciplinary Academic Tourism.2025; 10(1): 111.     CrossRef
  • Risks of abuse of large language models, like ChatGPT, in scientific publishing: Authorship, predatory publishing, and paper mills
    Graham Kendall, Jaime A. Teixeira da Silva
    Learned Publishing.2024; 37(1): 55.     CrossRef
  • Can ChatGPT be an author? A study of artificial intelligence authorship policies in top academic journals
    Brady D. Lund, K.T. Naheem
    Learned Publishing.2024; 37(1): 13.     CrossRef
  • Artificial Intelligence–Generated Scientific Literature: A Critical Appraisal
    Justyna Zybaczynska, Matthew Norris, Sunjay Modi, Jennifer Brennan, Pooja Jhaveri, Timothy J. Craig, Taha Al-Shaikhly
    The Journal of Allergy and Clinical Immunology: In Practice.2024; 12(1): 106.     CrossRef
  • Does Google’s Bard Chatbot perform better than ChatGPT on the European hand surgery exam?
    Goetsch Thibaut, Armaghan Dabbagh, Philippe Liverneaux
    International Orthopaedics.2024; 48(1): 151.     CrossRef
  • ChatGPT in medical writing: A game-changer or a gimmick?
    Shital Sarah Ahaley, Ankita Pandey, Simran Kaur Juneja, Tanvi Suhane Gupta, Sujatha Vijayakumar
    Perspectives in Clinical Research.2024; 15(4): 165.     CrossRef
  • A Brief Review of the Efficacy in Artificial Intelligence and Chatbot-Generated Personalized Fitness Regimens
    Daniel K. Bays, Cole Verble, Kalyn M. Powers Verble
    Strength & Conditioning Journal.2024; 46(4): 485.     CrossRef
  • Academic publisher guidelines on AI usage: A ChatGPT supported thematic analysis
    Mike Perkins, Jasper Roe
    F1000Research.2024; 12: 1398.     CrossRef
  • The Use of Artificial Intelligence in Writing Scientific Review Articles
    Melissa A. Kacena, Lilian I. Plotkin, Jill C. Fehrenbacher
    Current Osteoporosis Reports.2024; 22(1): 115.     CrossRef
  • Using AI to Write a Review Article Examining the Role of the Nervous System on Skeletal Homeostasis and Fracture Healing
    Murad K. Nazzal, Ashlyn J. Morris, Reginald S. Parker, Fletcher A. White, Roman M. Natoli, Jill C. Fehrenbacher, Melissa A. Kacena
    Current Osteoporosis Reports.2024; 22(1): 217.     CrossRef
  • GenAI et al.: Cocreation, Authorship, Ownership, Academic Ethics and Integrity in a Time of Generative AI
    Aras Bozkurt
    Open Praxis.2024; 16(1): 1.     CrossRef
  • An integrative decision-making framework to guide policies on regulating ChatGPT usage
    Umar Ali Bukar, Md Shohel Sayeed, Siti Fatimah Abdul Razak, Sumendra Yogarayan, Oluwatosin Ahmed Amodu
    PeerJ Computer Science.2024; 10: e1845.     CrossRef
  • Artificial Intelligence and Its Role in Medical Research
    Anurag Gola, Ambarish Das, Amar B. Gumataj, S. Amirdhavarshini, J. Venkatachalam
    Current Medical Issues.2024; 22(2): 97.     CrossRef
  • From advancements to ethics: Assessing ChatGPT’s role in writing research paper
    Vasu Gupta, Fnu Anamika, Kinna Parikh, Meet A Patel, Rahul Jain, Rohit Jain
    Turkish Journal of Internal Medicine.2024; 6(2): 74.     CrossRef
  • Yapay Zekânın Edebiyatta Kullanım Serüveni
    Nesime Ceyhan Akça, Serap Aslan Cobutoğlu, Özlem Yeşim Özbek, Mehmet Furkan Akça
    RumeliDE Dil ve Edebiyat Araştırmaları Dergisi.2024; (39): 283.     CrossRef
  • ChatGPT's Gastrointestinal Tumor Board Tango: A limping dance partner?
    Ughur Aghamaliyev, Javad Karimbayli, Clemens Giessen-Jung, Matthias Ilmer, Kristian Unger, Dorian Andrade, Felix O. Hofmann, Maximilian Weniger, Martin K. Angele, C. Benedikt Westphalen, Jens Werner, Bernhard W. Renz
    European Journal of Cancer.2024; 205: 114100.     CrossRef
  • Gout and Gout-Related Comorbidities: Insight and Limitations from Population-Based Registers in Sweden
    Panagiota Drivelegka, Lennart TH Jacobsson, Mats Dehlin
    Gout, Urate, and Crystal Deposition Disease.2024; 2(2): 144.     CrossRef
  • Artificial intelligence in academic cardiothoracic surgery
    Adham AHMED, Irbaz HAMEED
    The Journal of Cardiovascular Surgery.2024;[Epub]     CrossRef
  • The emergence of generative artificial intelligence platforms in 2023, journal metrics, appreciation to reviewers and volunteers, and obituary
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2024; 21: 9.     CrossRef
  • A survey of safety and trustworthiness of large language models through the lens of verification and validation
    Xiaowei Huang, Wenjie Ruan, Wei Huang, Gaojie Jin, Yi Dong, Changshun Wu, Saddek Bensalem, Ronghui Mu, Yi Qi, Xingyu Zhao, Kaiwen Cai, Yanghao Zhang, Sihao Wu, Peipei Xu, Dengyu Wu, Andre Freitas, Mustafa A. Mustafa
    Artificial Intelligence Review.2024;[Epub]     CrossRef
  • Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers
    Ryan D. Stadler, Suleiman Y. Sudah, Michael A. Moverman, Patrick J. Denard, Xavier A. Duralde, Grant E. Garrigues, Christopher S. Klifto, Jonathan C. Levy, Surena Namdari, Joaquin Sanchez-Sotelo, Mariano E. Menendez
    Arthroscopy: The Journal of Arthroscopic & Related Surgery.2024;[Epub]     CrossRef
  • Decision-Making Framework for the Utilization of Generative Artificial Intelligence in Education: A Case Study of ChatGPT
    Umar Ali Bukar, Md. Shohel Sayeed, Siti Fatimah Abdul Razak, Sumendra Yogarayan, Radhwan Sneesl
    IEEE Access.2024; 12: 95368.     CrossRef
  • ChatGPT or Gemini: Who Makes the Better Scientific Writing Assistant?
    Hatoon S. AlSagri, Faiza Farhat, Shahab Saquib Sohail, Abdul Khader Jilani Saudagar
    Journal of Academic Ethics.2024;[Epub]     CrossRef
  • The Syntax of Smart Writing: Artificial Intelligence Unveiled
    Balaji Arumugam, Arun Murugan, Kirubakaran S., Saranya Rajamanickam
    International Journal of Preventative & Evidence Based Medicine.2024; : 1.     CrossRef
  • Generative artificial intelligence usage by researchers at work: Effects of gender, career stage, type of workplace, and perceived barriers
    Pablo Dorta-González, Alexis Jorge López-Puig, María Isabel Dorta-González, Sara M. González-Betancor
    Telematics and Informatics.2024; 94: 102187.     CrossRef
  • Leveraging Artificial Intelligence In Project-Based Service Learning To Advance Sustainable Development: A Pedagogical Approach For Marketing Education
    C. M. Dubay, Melanie B. Richards
    Marketing Education Review.2024; 34(4): 307.     CrossRef
  • Let stochastic parrots squawk: why academic journals should allow large language models to coauthor articles
    Nicholas J. Abernethy
    AI and Ethics.2024;[Epub]     CrossRef
  • Can ChatGPT be an author? Generative AI creative writing assistance and perceptions of authorship, creatorship, responsibility, and disclosure
    Paul Formosa, Sarah Bankins, Rita Matulionyte, Omid Ghasemi
    AI & SOCIETY.2024;[Epub]     CrossRef
  • Strategies for integrating ChatGPT and generative AI into clinical studies
    Jeong-Moo Lee
    Blood Research.2024;[Epub]     CrossRef
  • Universal skepticism of ChatGPT: a review of early literature on chat generative pre-trained transformer
    Casey Watters, Michal K. Lemanski
    Frontiers in Big Data.2023;[Epub]     CrossRef
  • The importance of human supervision in the use of ChatGPT as a support tool in scientific writing
    William Castillo-González
    Metaverse Basic and Applied Research.2023;[Epub]     CrossRef
  • ChatGPT for Future Medical and Dental Research
    Bader Fatani
    Cureus.2023;[Epub]     CrossRef
  • Chatbots in Medical Research
    Punit Sharma
    Clinical Nuclear Medicine.2023; 48(9): 838.     CrossRef
  • Potential applications of ChatGPT in dermatology
    Nicolas Kluger
    Journal of the European Academy of Dermatology and Venereology.2023;[Epub]     CrossRef
  • The emergent role of artificial intelligence, natural learning processing, and large language models in higher education and research
    Tariq Alqahtani, Hisham A. Badreldin, Mohammed Alrashed, Abdulrahman I. Alshaya, Sahar S. Alghamdi, Khalid bin Saleh, Shuroug A. Alowais, Omar A. Alshaya, Ishrat Rahman, Majed S. Al Yami, Abdulkareem M. Albekairy
    Research in Social and Administrative Pharmacy.2023; 19(8): 1236.     CrossRef
  • ChatGPT Performance on the American Urological Association Self-assessment Study Program and the Potential Influence of Artificial Intelligence in Urologic Training
    Nicholas A. Deebel, Ryan Terlecki
    Urology.2023; 177: 29.     CrossRef
  • Intelligence or artificial intelligence? More hard problems for authors of Biological Psychology, the neurosciences, and everyone else
    Thomas Ritz
    Biological Psychology.2023; 181: 108590.     CrossRef
  • The ethics of disclosing the use of artificial intelligence tools in writing scholarly manuscripts
    Mohammad Hosseini, David B Resnik, Kristi Holmes
    Research Ethics.2023; 19(4): 449.     CrossRef
  • How trustworthy is ChatGPT? The case of bibliometric analyses
    Faiza Farhat, Shahab Saquib Sohail, Dag Øivind Madsen
    Cogent Engineering.2023;[Epub]     CrossRef
  • Disclosing use of Artificial Intelligence: Promoting transparency in publishing
    Parvaiz A. Koul
    Lung India.2023; 40(5): 401.     CrossRef
  • ChatGPT in medical research: challenging time ahead
    Daideepya C Bhargava, Devendra Jadav, Vikas P Meshram, Tanuj Kanchan
    Medico-Legal Journal.2023; 91(4): 223.     CrossRef
  • Academic publisher guidelines on AI usage: A ChatGPT supported thematic analysis
    Mike Perkins, Jasper Roe
    F1000Research.2023; 12: 1398.     CrossRef
  • The Role of AI in Writing an Article and Whether it Can Be a Co-author: What if it Gets Support From 2 Different AIs Like ChatGPT and Google Bard for the Same Theme?
    İlhan Bahşi, Ayşe Balat
    Journal of Craniofacial Surgery.2023;[Epub]     CrossRef
  • Ethical consideration of the use of generative artificial intelligence, including ChatGPT in writing a nursing article
    Sun Huh
    Child Health Nursing Research.2023; 29(4): 249.     CrossRef
  • Artificial Intelligence-Supported Systems in Anesthesiology and Its Standpoint to Date—A Review
    Fiona M. P. Pham
    Open Journal of Anesthesiology.2023; 13(07): 140.     CrossRef
  • ChatGPT as an innovative tool for increasing sales in online stores
    Michał Orzoł, Katarzyna Szopik-Depczyńska
    Procedia Computer Science.2023; 225: 3450.     CrossRef
  • Intelligent Plagiarism as a Misconduct in Academic Integrity
    Jesús Miguel Muñoz-Cantero, Eva Maria Espiñeira-Bellón
    Acta Médica Portuguesa.2023; 37(1): 1.     CrossRef
  • Follow-up of Artificial Intelligence Development and its Controlled Contribution to the Article: Step to the Authorship?
    Ekrem Solmaz
    European Journal of Therapeutics.2023;[Epub]     CrossRef
  • May Artificial Intelligence Be a Co-Author on an Academic Paper?
    Ayşe Balat, İlhan Bahşi
    European Journal of Therapeutics.2023; 29(3): e12.     CrossRef
  • Opportunities and challenges for ChatGPT and large language models in biomedicine and health
    Shubo Tian, Qiao Jin, Lana Yeganova, Po-Ting Lai, Qingqing Zhu, Xiuying Chen, Yifan Yang, Qingyu Chen, Won Kim, Donald C Comeau, Rezarta Islamaj, Aadit Kapoor, Xin Gao, Zhiyong Lu
    Briefings in Bioinformatics.2023;[Epub]     CrossRef
  • ChatGPT: "To be or not to be" ... in academic research. The human mind's analytical rigor and capacity to discriminate between AI bots' truths and hallucinations
    Aurelian Anghelescu, Ilinca Ciobanu, Constantin Munteanu, Lucia Ana Maria Anghelescu, Gelu Onose
    Balneo and PRM Research Journal.2023; 14: 614.     CrossRef
  • Editorial policies of Journal of Educational Evaluation for Health Professions on the use of generative artificial intelligence in article writing and peer review
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2023; 20: 40.     CrossRef
  • Should We Wait for Major Frauds to Unveil to Plan an AI Use License?
    Istemihan Coban
    European Journal of Therapeutics.2023; 30(2): 198.     CrossRef
