JEEHP : Journal of Educational Evaluation for Health Professions

165 "Medical"
Review
Insights into undergraduate medical student selection tools: a systematic review and meta-analysis
Pin-Hsiang Huang, Arash Arianpoor, Silas Taylor, Jenzel Gonzales, Boaz Shulruf
J Educ Eval Health Prof. 2024;21:22.   Published online September 12, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.22    [Epub ahead of print]
  • 185 View
  • 34 Download
Abstract
Purpose
Evaluating medical school selection tools is vital for evidence-based student selection. With previous reviews revealing knowledge gaps, this meta-analysis offers insights into the effectiveness of these selection tools.
Methods
A systematic review and meta-analysis were conducted using the following inclusion criteria: peer-reviewed articles available in English, published from 2010 onward, that include empirical data linking performance on selection tools with assessment and dropout outcomes of undergraduate-entry medical programs. Systematic reviews, meta-analyses, general opinion pieces, and commentaries were excluded. Effect sizes (ESs) for the predictability of academic and clinical performance within and by the end of the medicine program were extracted, and the pooled ESs are presented.
Results
Sixty-seven of 2,212 articles were included, yielding 236 ESs. Previous academic achievement predicted academic performance in the medical program (Cohen’s d=0.697 early in the program; 0.619 at the end of the program) and performance on clinical exams (0.545 at the end of the program). Among the aptitude tests, verbal reasoning and quantitative reasoning predicted academic achievement in the early program and in the last years (0.704 and 0.643, respectively). Aptitude tests overall predicted academic achievement in both the early and the last years (0.550 and 0.371, respectively). Neither panel interviews, multiple mini-interviews, nor situational judgement tests (SJTs) yielded a statistically significant pooled ES.
Conclusion
Current evidence suggests that learning outcomes are predicted by previous academic achievement and by aptitude tests. The predictive value of SJTs, along with topics such as selection algorithms, features of interviews (e.g., the content of the questions), and the way interviewers’ reports are used, warrants further research.
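As background for readers less familiar with pooling, the sketch below shows how study-level Cohen’s d values are typically combined under a DerSimonian–Laird random-effects model; the abstract does not state the exact pooling method used, and all numbers here are invented for illustration.

import numpy as np

def pooled_effect(d, v):
    """Pool Cohen's d values with within-study variances v (random-effects model)."""
    w = 1.0 / v                                # inverse-variance (fixed-effect) weights
    d_fixed = np.sum(w * d) / np.sum(w)
    q = np.sum(w * (d - d_fixed) ** 2)         # Cochran's Q heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(d) - 1)) / c)    # DerSimonian-Laird between-study variance
    w_star = 1.0 / (v + tau2)                  # random-effects weights
    d_pooled = np.sum(w_star * d) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return d_pooled, (d_pooled - 1.96 * se, d_pooled + 1.96 * se)

d = np.array([0.72, 0.65, 0.70])     # hypothetical study-level effect sizes
v = np.array([0.020, 0.030, 0.025])  # hypothetical sampling variances
print(pooled_effect(d, v))           # pooled d with its 95% CI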
Research articles
Impact of a change from A–F grading to honors/pass/fail grading on academic performance at Yonsei University College of Medicine in Korea: a cross-sectional serial mediation analysis  
Min-Kyeong Kim, Hae Won Kim
J Educ Eval Health Prof. 2024;21:20.   Published online August 16, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.20
  • 186 View
  • 142 Download
Abstract
Purpose
This study aimed to explore how the grading system affected medical students’ academic performance based on their perceptions of the learning environment and intrinsic motivation in the context of changing from norm-referenced A–F grading to criterion-referenced honors/pass/fail grading.
Methods
The study involved 238 second-year medical students from 2014 (n=127, A–F grading) and 2015 (n=111, honors/pass/fail grading) at Yonsei University College of Medicine in Korea. Scores on the Dundee Ready Education Environment Measure, the Academic Motivation Scale, and the Basic Medical Science Examination were used to measure overall learning environment perceptions, intrinsic motivation, and academic performance, respectively. Serial mediation analysis was conducted to examine the pathways between the grading system and academic performance, focusing on the mediating roles of student perceptions and intrinsic motivation.
Results
The honors/pass/fail grading class students reported more positive perceptions of the learning environment, higher intrinsic motivation, and better academic performance than the A–F grading class students. Mediation analysis demonstrated a serial mediation effect between the grading system and academic performance through learning environment perceptions and intrinsic motivation. Student perceptions and intrinsic motivation did not independently mediate the relationship between the grading system and performance.
Conclusion
Reducing the number of grades and eliminating rank-based grading might have created an affirming learning environment that fulfills basic psychological needs and reinforces the intrinsic motivation linked to academic performance. The cumulative effect of these 2 mediators suggests that a comprehensive approach should be used to understand student performance.
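To make the mediation pathway concrete, here is a minimal sketch (simulated data, not the study’s dataset or code) of a serial mediation model X → M1 → M2 → Y estimated as a chain of OLS regressions, where X is the grading system (0=A–F, 1=honors/pass/fail), M1 is learning environment perceptions, M2 is intrinsic motivation, and Y is exam performance; the serial indirect effect is the product a1·d21·b2.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 238
x = rng.integers(0, 2, n).astype(float)                  # grading system (hypothetical)
m1 = 0.5 * x + rng.normal(size=n)                        # learning environment perceptions
m2 = 0.4 * m1 + 0.2 * x + rng.normal(size=n)             # intrinsic motivation
y = 0.3 * m2 + 0.1 * m1 + 0.1 * x + rng.normal(size=n)   # exam score

a1 = sm.OLS(m1, sm.add_constant(x)).fit().params[1]                             # X -> M1
d21 = sm.OLS(m2, sm.add_constant(np.column_stack([x, m1]))).fit().params[2]     # M1 -> M2, adjusting for X
b2 = sm.OLS(y, sm.add_constant(np.column_stack([x, m1, m2]))).fit().params[3]   # M2 -> Y, adjusting for X, M1
print("serial indirect effect a1*d21*b2 =", a1 * d21 * b2)

In practice, the indirect effect’s significance would be assessed with bootstrap confidence intervals rather than a single point estimate.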
Comparison of real data and simulated data analysis of a stopping rule based on the standard error of measurement in computerized adaptive testing for medical examinations in Korea: a psychometric study  
Dong Gi Seo, Jeongwook Choi, Jinha Kim
J Educ Eval Health Prof. 2024;21:18.   Published online July 9, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.18
  • 473 View
  • 223 Download
Abstract
Purpose
This study aimed to compare and evaluate the efficiency and accuracy of computerized adaptive testing (CAT) under 2 stopping rules (standard error of measurement [SEM]=0.3 and 0.25) using both real and simulated data in medical examinations in Korea.
Methods
This study employed post-hoc simulation and real data analysis to explore the optimal stopping rule for CAT in medical examinations. The real data were obtained from the responses of 3rd-year medical students during examinations in 2020 at Hallym University College of Medicine. Simulated data were generated in R using parameters estimated from a real item bank. Outcome variables included the number of examinees passing or failing under SEM values of 0.25 and 0.30, the number of items administered, and the correlation between ability estimates. The consistency of the real CAT results was evaluated by examining pass/fail decisions based on a cut score of 0.0. The efficiency of all CAT designs was assessed by comparing the average number of items administered under both stopping rules.
Results
Both SEM 0.25 and SEM 0.30 provided a good balance between accuracy and efficiency in CAT. The real data showed minimal differences in pass/fail outcomes between the 2 SEM conditions, with a high correlation (r=0.99) between ability estimates. The simulation results confirmed these findings, showing similar average numbers of administered items for the real and simulated data.
Conclusion
The findings suggest that both SEM 0.25 and 0.30 are effective termination criteria in the context of the Rasch model, balancing accuracy and efficiency in CAT.
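To make the stopping rule concrete: under the Rasch model, the SEM of an ability estimate is 1/√(test information), so the test ends once enough informative items have been administered. Below is a minimal sketch with a simulated item bank (my illustration, not the study’s implementation).

import numpy as np

rng = np.random.default_rng(1)
bank = rng.normal(0.0, 1.0, 500)   # simulated Rasch item difficulties

def prob(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def ml_theta(b_given, resp, theta=0.0):
    """A few Newton-Raphson steps toward the ML ability estimate (clipped for stability)."""
    for _ in range(10):
        p = prob(theta, b_given)
        theta = float(np.clip(theta + np.sum(resp - p) / np.sum(p * (1 - p)), -4, 4))
    return theta

def run_cat(true_theta, sem_stop=0.30, max_items=100):
    used, resp, theta = [], [], 0.0
    while len(used) < max_items:
        free = [i for i in range(len(bank)) if i not in used]
        nxt = min(free, key=lambda i: abs(bank[i] - theta))  # most informative item at theta
        used.append(nxt)
        resp.append(float(rng.random() < prob(true_theta, bank[nxt])))
        theta = ml_theta(bank[used], np.array(resp), theta)
        p = prob(theta, bank[used])
        sem = 1.0 / np.sqrt(np.sum(p * (1 - p)))             # SEM = 1/sqrt(information)
        if sem <= sem_stop:                                  # the stopping rule under study
            break
    return theta, len(used), sem

print(run_cat(0.5, sem_stop=0.30))  # rerun with sem_stop=0.25 for the stricter rule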
Performance of GPT-3.5 and GPT-4 on standardized urology knowledge assessment items in the United States: a descriptive study
Max Samuel Yudovich, Elizaveta Makarova, Christian Michael Hague, Jay Dilip Raman
J Educ Eval Health Prof. 2024;21:17.   Published online July 8, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.17
  • 807 View
  • 210 Download
  • 1 Crossref
Abstract
Purpose
This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) with respect to standardized urology multiple-choice items in the United States.
Methods
In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized based on topic and question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024.
Results
GPT-4 answered 44.4% of items correctly, compared to 30.9% for GPT-3.5 (P<0.00001). GPT-4 (vs. GPT-3.5) had higher accuracy on urologic oncology (43.8% vs. 33.9%, P=0.03), sexual medicine (44.3% vs. 27.8%, P=0.046), and pediatric urology (47.1% vs. 27.1%, P=0.012) items. Endourology (38.0% vs. 25.7%, P=0.15), reconstruction and trauma (29.0% vs. 21.0%, P=0.41), and neurourology (49.0% vs. 33.3%, P=0.11) items showed no significant differences in performance between versions. GPT-4 also outperformed GPT-3.5 on recall (45.9% vs. 27.4%, P<0.00001) and interpretation (45.6% vs. 31.5%, P=0.0005) items, but the difference was not significant for the higher-complexity problem-solving items (41.8% vs. 34.5%, P=0.56).
Conclusions
ChatGPT performs relatively poorly on standardized multiple-choice urology board exam-style items, with GPT-4 outperforming GPT-3.5. The accuracy was below the proposed minimum passing standards for the American Board of Urology’s Continuing Urologic Certification knowledge reinforcement activity (60%). As artificial intelligence progresses in complexity, ChatGPT may become more capable and accurate with respect to board examination items. For now, its responses should be scrutinized.
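For context, the headline comparison (44.4% vs. 30.9% of 700 items) can be sanity-checked with a two-proportion z-test; the abstract does not state which test the authors used, and the counts below are reconstructed from the reported percentages.

from statsmodels.stats.proportion import proportions_ztest

n = 700
correct = [round(0.444 * n), round(0.309 * n)]   # GPT-4, GPT-3.5 (approximate counts)
stat, p = proportions_ztest(correct, [n, n])
print(f"z={stat:.2f}, P={p:.2e}")                # consistent with the reported P<0.00001

Because both models answered the same 700 items, a paired test such as McNemar’s would also be defensible; the sketch above simply treats the two accuracy rates as independent proportions.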

Citations to this article as recorded by:
  • From GPT-3.5 to GPT-4.o: A Leap in AI’s Medical Exam Performance
    Markus Kipp
    Information.2024; 15(9): 543.     CrossRef
Educational/Faculty development material
The 6 degrees of curriculum integration in medical education in the United States  
Julie Youm, Jennifer Christner, Kevin Hittle, Paul Ko, Cinda Stone, Angela D. Blood, Samara Ginzburg
J Educ Eval Health Prof. 2024;21:15.   Published online June 13, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.15
  • 1,090 View
  • 233 Download
Abstract
Despite explicit expectations and accreditation requirements for an integrated curriculum, clarity is still lacking around an accepted common definition, best practices for implementation, and criteria for successful curriculum integration. To address this lack of consensus, we reviewed the literature and herein propose a definition of curriculum integration for the medical education audience. We further believe that medical education is ready to move beyond “horizontal” (1-dimensional) and “vertical” (2-dimensional) integration, and we propose a model of “6 degrees of curriculum integration” to expand the 2-dimensional concept for future designs of medical education programs and to best prepare learners to meet the needs of patients. These 6 degrees are: interdisciplinary, timing and sequencing, instruction and assessment, incorporation of basic and clinical sciences, knowledge- and skills-based competency progression, and graduated responsibilities in patient care. We encourage medical educators to look beyond 2-dimensional integration to this holistic and interconnected representation of curriculum integration.
Research articles
Redesigning a faculty development program for clinical teachers in Indonesia: a before-and-after study
Rita Mustika, Nadia Greviana, Dewi Anggraeni Kusumoningrum, Anyta Pinasthika
J Educ Eval Health Prof. 2024;21:14.   Published online June 13, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.14
  • 628 View
  • 217 Download
Abstract
Purpose
Faculty development (FD) is important for supporting teaching, including by clinical teachers. The Faculty of Medicine Universitas Indonesia (FMUI) has run a clinical teacher training program developed by its medical education department since 2008, both for FMUI teachers and for those at other centers in Indonesia. However, participation is often challenging due to clinical, administrative, and research obligations. The coronavirus disease 2019 (COVID-19) pandemic amplified the need to transform this program. This study aimed to redesign and evaluate an FD program for clinical teachers that focuses on their needs and current situation.
Methods
A 5-step design thinking framework (empathizing, defining, ideating, prototyping, and testing) was used with a pre/post-test design. Design thinking made it possible to develop a participant-focused program, while the pre/post-test design enabled an assessment of the program’s effectiveness.
Results
Seven medical educationalists and 4 senior and 4 junior clinical teachers participated in a group discussion during the empathize phase of design thinking. The research team then developed a prototype of a 3-day blended learning course, with an asynchronous component using the Moodle learning management system and a synchronous component using the Zoom platform. Pre/post-testing was done in 2 rounds, with 107 and 330 participants, respectively. Evaluations of the first round provided feedback for improving the prototype in the second round.
Conclusion
Design thinking enabled an innovative-creative process of redesigning FD that emphasized participants’ needs. The pre/post-testing showed that the program was effective. Combining asynchronous and synchronous learning expands access and increases flexibility. This approach could also apply to other FD programs.
Development of examination objectives for the Korean paramedic and emergency medical technician examination: a survey study  
Tai-hwan Uhm, Heakyung Choi, Seok Hwan Hong, Hyungsub Kim, Minju Kang, Keunyoung Kim, Hyejin Seo, Eunyoung Ki, Hyeryeong Lee, Heejeong Ahn, Uk-jin Choi, Sang Woong Park
J Educ Eval Health Prof. 2024;21:13.   Published online June 12, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.13
  • 650 View
  • 185 Download
Abstract
Purpose
The duties of paramedics and emergency medical technicians (P&EMTs) are continuously changing due to developments in medical systems. This study presents evaluation goals for P&EMTs, derived by analyzing their work, especially the tasks that new P&EMTs (those with less than 3 years’ experience) find difficult, in order to foster the training of P&EMTs who can adapt to emergency situations after graduation.
Methods
A questionnaire was created based on prior job analyses of P&EMTs. The survey questions were reviewed through focus group interviews, from which 253 task elements were derived. A survey on the frequency, importance, and difficulty of these task elements was conducted from July 10, 2023 to October 13, 2023 across the 6 occupations in which P&EMTs were employed.
Results
The P&EMTs’ most common tasks involved obtaining patients’ medical histories and measuring vital signs, whereas the most important task was cardiopulmonary resuscitation (CPR). The task elements that the P&EMTs found most difficult were newborn delivery and infant CPR. New paramedics reported that treating patients with fractures, poisoning, and childhood fever was difficult, while new EMTs reported that they had difficulty keeping diaries, managing ambulances, and controlling infection.
Conclusion
Communication was the most important item for P&EMTs, whereas CPR was the most important skill. It is important for P&EMTs to have knowledge of all tasks; however, they also need to master frequently performed tasks and those that pose difficulties in the field. By deriving goals for evaluating P&EMTs, changes could be made to their education, thereby making it possible to train more capable P&EMTs.
Challenges and potential improvements in the Accreditation Standards of the Korean Institute of Medical Education and Evaluation 2019 (ASK2019) derived through meta-evaluation: a cross-sectional study  
Yoonjung Lee, Min-jung Lee, Junmoo Ahn, Chungwon Ha, Ye Ji Kang, Cheol Woong Jung, Dong-Mi Yoo, Jihye Yu, Seung-Hee Lee
J Educ Eval Health Prof. 2024;21:8.   Published online April 2, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.8
  • 1,080 View
  • 297 Download
  • 1 Web of Science
  • 1 Crossref
Abstract
Purpose
This study aimed to identify challenges and potential improvements in Korea’s medical education accreditation process according to the Accreditation Standards of the Korean Institute of Medical Education and Evaluation 2019 (ASK2019). A meta-evaluation was conducted to survey the experiences and perceptions of stakeholders, including self-assessment committee members, site visit committee members, administrative staff, and medical school professors.
Methods
A cross-sectional study was conducted using surveys sent to 40 medical schools. The 332 participants included self-assessment committee members, site visit team members, administrative staff, and medical school professors. The t-test, one-way analysis of variance and the chi-square test were used to analyze and compare opinions on medical education accreditation between the categories of participants.
Results
Site visit committee members placed greater importance on the necessity of accreditation than faculty members did. Self-evaluation committee members and professors shared a positive view of accreditation’s role in improving educational quality. Administrative staff regarded the Korean Institute of Medical Education and Evaluation as reliable and objective, whereas the self-evaluation committee members did not. Site visit committee members perceived the clarity of the accreditation standards positively, in contrast to self-assessment committee members. Administrative staff were the most optimistic about implementing the standards. However, the accreditation process faced challenges, especially content duplication and the preparation of self-evaluation reports. Finally, perceptions of the accuracy of final site visit reports differed significantly between the self-evaluation committee members and the site visit committee members.
Conclusion
This study revealed diverse views on medical education accreditation, highlighting the need for improved communication, expectation alignment, and stakeholder collaboration to refine the accreditation process and quality.

Citations to this article as recorded by:
  • The new placement of 2,000 entrants at Korean medical schools in 2025: is the government’s policy evidence-based?
    Sun Huh
    The Ewha Medical Journal.2024;[Epub]     CrossRef
Development and psychometric evaluation of a 360-degree evaluation instrument to assess medical students’ performance in clinical settings at the emergency medicine department in Iran: a methodological study  
Golnaz Azami, Sanaz Aazami, Boshra Ebrahimy, Payam Emami
J Educ Eval Health Prof. 2024;21:7.   Published online April 1, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.7
  • 1,058 View
  • 234 Download
Abstract
Background
In the Iranian context, no 360-degree evaluation tool has been developed to assess the performance of prehospital emergency medical students in clinical settings. This article describes the development of such a 360-degree evaluation tool and presents its first psychometric evaluation.
Methods
There were 2 steps in this study: step 1 involved developing the instrument (i.e., generating the items) and step 2 constituted the psychometric evaluation of the instrument. We performed exploratory and confirmatory factor analyses and also evaluated the instrument’s face, content, and convergent validity and reliability.
Results
The instrument contains 55 items across 6 domains: leadership, management, and teamwork (19 items); consciousness and responsiveness (14 items); clinical and interpersonal communication skills (8 items); integrity (7 items); knowledge and accountability (4 items); and loyalty and transparency (3 items). The instrument was confirmed to be a valid measure, as the 6 domains had eigenvalues over Kaiser’s criterion of 1 and in combination explained 60.1% of the variance (Bartlett’s test of sphericity: χ²[1,485]=19,867.99, P<0.01). Furthermore, this study provided evidence for the instrument’s convergent validity and internal consistency (α=0.98), suggesting its suitability for assessing student performance.
Conclusion
We found good evidence for the validity and reliability of the instrument. Our instrument can be used to make future evaluations of student performance in the clinical setting more structured, transparent, informative, and comparable.
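As a hedged sketch of the factor-retention logic the Results describe (Bartlett’s test of sphericity, then an exploratory factor analysis keeping factors with eigenvalues above Kaiser’s criterion of 1), the snippet below uses the Python factor_analyzer package on a hypothetical response matrix, not the study’s data.

import numpy as np
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity

rng = np.random.default_rng(2)
responses = rng.integers(1, 6, size=(400, 55)).astype(float)  # hypothetical raters x 55 items

chi2, p = calculate_bartlett_sphericity(responses)   # is the correlation matrix factorable?
fa = FactorAnalyzer(n_factors=6, rotation="varimax")
fa.fit(responses)
eigenvalues, _ = fa.get_eigenvalues()
n_retained = int(np.sum(eigenvalues > 1))            # Kaiser's eigenvalue > 1 criterion
_, _, cum_var = fa.get_factor_variance()
print(chi2, p, n_retained, cum_var[-1])              # cumulative variance explained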
Review
Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review  
Xiaojun Xu, Yixiao Chen, Jing Miao
J Educ Eval Health Prof. 2024;21:6.   Published online March 15, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.6
  • 2,938 View
  • 440 Download
  • 5 Web of Science
  • 7 Crossref
Abstract
Background
ChatGPT is a large language model (LLM) based on artificial intelligence (AI) capable of responding in multiple languages and generating nuanced and highly complex responses. While ChatGPT holds promising applications in medical education, its limitations and potential risks cannot be ignored.
Methods
A scoping review was conducted of English-language articles discussing ChatGPT in the context of medical education, published after 2022. A literature search was performed using the PubMed/MEDLINE, Embase, and Web of Science databases, and information was extracted from the relevant studies that were ultimately included.
Results
ChatGPT exhibits various potential applications in medical education, such as providing personalized learning plans and materials, creating clinical practice simulation scenarios, and assisting in writing articles. However, challenges associated with academic integrity, data accuracy, and potential harm to learning were also highlighted in the literature. The paper emphasizes certain recommendations for using ChatGPT, including the establishment of guidelines. Based on the review, 3 key research areas were proposed: cultivating the ability of medical students to use ChatGPT correctly, integrating ChatGPT into teaching activities and processes, and proposing standards for the use of AI by medical students.
Conclusion
ChatGPT has the potential to transform medical education, but careful consideration is required for its full integration. To harness the full potential of ChatGPT in medical education, attention should not only be given to the capabilities of AI but also to its impact on students and teachers.

Citations to this article as recorded by:
  • Chatbots in neurology and neuroscience: Interactions with students, patients and neurologists
    Stefano Sandrone
    Brain Disorders.2024; 15: 100145.     CrossRef
  • ChatGPT in education: unveiling frontiers and future directions through systematic literature review and bibliometric analysis
    Buddhini Amarathunga
    Asian Education and Development Studies.2024;[Epub]     CrossRef
  • Evaluating the performance of ChatGPT-3.5 and ChatGPT-4 on the Taiwan plastic surgery board examination
    Ching-Hua Hsieh, Hsiao-Yun Hsieh, Hui-Ping Lin
    Heliyon.2024; 10(14): e34851.     CrossRef
  • Preparing for Artificial General Intelligence (AGI) in Health Professions Education: AMEE Guide No. 172
    Ken Masters, Anne Herrmann-Werner, Teresa Festl-Wietek, David Taylor
    Medical Teacher.2024; : 1.     CrossRef
  • A Comparative Analysis of ChatGPT and Medical Faculty Graduates in Medical Specialization Exams: Uncovering the Potential of Artificial Intelligence in Medical Education
    Gülcan Gencer, Kerem Gencer
    Cureus.2024;[Epub]     CrossRef
  • Research ethics and issues regarding the use of ChatGPT-like artificial intelligence platforms by authors and reviewers: a narrative review
    Sang-Jun Kim
    Science Editing.2024; 11(2): 96.     CrossRef
  • Innovation Off the Bat: Bridging the ChatGPT Gap in Digital Competence among English as a Foreign Language Teachers
    Gulsara Urazbayeva, Raisa Kussainova, Aikumis Aibergen, Assel Kaliyeva, Gulnur Kantayeva
    Education Sciences.2024; 14(9): 946.     CrossRef
Research articles
Discovering social learning ecosystems during clinical clerkship from United States medical students’ feedback encounters: a content analysis  
Anna Therese Cianciolo, Heeyoung Han, Lydia Anne Howes, Debra Lee Klamen, Sophia Matos
J Educ Eval Health Prof. 2024;21:5.   Published online February 28, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.5
  • 1,270 View
  • 263 Download
Abstract
Purpose
We examined United States medical students’ self-reported feedback encounters during clerkship training to better understand in situ feedback practices. Specifically, we asked: Who do students receive feedback from, about what, when, where, and how do they use it? We explored whether curricular expectations for preceptors’ written commentary aligned with feedback as it occurs naturalistically in the workplace.
Methods
This study occurred from July 2021 to February 2022 at Southern Illinois University School of Medicine. We used qualitative survey-based experience sampling to gather students’ accounts of their feedback encounters in 8 core specialties. We analyzed the who, what, when, where, and why of 267 feedback encounters reported by 11 clerkship students over 30 weeks. Code frequencies were mapped qualitatively to explore patterns in feedback encounters.
Results
Clerkship feedback occurs in patterns apparently related to the nature of clinical work in each specialty. These patterns may be attributable to each specialty’s “social learning ecosystem”—the distinctive learning environment shaped by the social and material aspects of a given specialty’s work, which determine who preceptors are, what students do with preceptors, and what skills or attributes matter enough to preceptors to comment on.
Conclusion
Comprehensive, standardized expectations for written feedback across specialties conflict with the reality of workplace-based learning. Preceptors may be better able—and more motivated—to document student performance that occurs as a natural part of everyday work. Nurturing social learning ecosystems could facilitate workplace-based learning such that, across specialties, students acquire a comprehensive clinical skillset appropriate for graduation.
Negative effects on medical students’ scores for clinical performance during the COVID-19 pandemic in Taiwan: a comparative study  
Eunice Jia-Shiow Yuan, Shiau-Shian Huang, Chia-An Hsu, Jiing-Feng Lirng, Tzu-Hao Li, Chia-Chang Huang, Ying-Ying Yang, Chung-Pin Li, Chen-Huan Chen
J Educ Eval Health Prof. 2023;20:37.   Published online December 26, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.37
  • 1,506 View
  • 95 Download
  • 1 Web of Science
  • 1 Crossref
Abstract
Purpose
Coronavirus disease 2019 (COVID-19) has heavily impacted medical clinical education in Taiwan. Medical curricula have been altered to minimize exposure and limit transmission. This study investigated the effect of COVID-19 on Taiwanese medical students’ clinical performance using online standardized evaluation systems and explored the factors influencing medical education during the pandemic.
Methods
Medical students were scored from 0 to 100 based on their clinical performance from January 1, 2018 to June 30, 2021. The students were placed into pre-COVID-19 (before February 1, 2020) and midst-COVID-19 (on and after February 1, 2020) groups. Each group was further categorized into COVID-19-affected specialties (pulmonary, infectious, and emergency medicine) and other specialties. Generalized estimating equations (GEEs) were used to compare and examine the effects of relevant variables on student performance.
Results
In total, 16,944 clinical scores were obtained for COVID-19-affected specialties and other specialties. For the COVID-19-affected specialties, the midst-COVID-19 score (88.51±3.52) was significantly lower than the pre-COVID-19 score (90.14±3.55) (P<0.0001). For the other specialties, the midst-COVID-19 score (88.32±3.68) was also significantly lower than the pre-COVID-19 score (90.06±3.58) (P<0.0001). There were 1,322 students (837 males and 485 females). Male students had significantly lower scores than female students (89.33±3.68 vs. 89.99±3.66, P=0.0017). GEE analysis revealed that the COVID-19 pandemic (unstandardized beta coefficient [B]=-1.99, standard error [SE]=0.13, P<0.0001), COVID-19-affected specialties (B=0.26, SE=0.11, P=0.0184), female students (B=1.10, SE=0.20, P<0.0001), and female attending physicians (B=-0.19, SE=0.08, P=0.0145) were independently associated with students’ scores.
Conclusion
COVID-19 negatively impacted medical students' clinical performance, regardless of their specialty. Female students outperformed male students, irrespective of the pandemic.
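A minimal sketch (simulated data and an assumed model form, not the authors’ code) of the kind of GEE the Methods describe: repeated clinical scores clustered within students, an exchangeable working correlation, and pandemic period, specialty group, and student sex as covariates.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 2000
df = pd.DataFrame({
    "student": rng.integers(0, 300, n),            # clustering unit for repeated scores
    "midst_covid": rng.integers(0, 2, n),          # 0=pre, 1=midst pandemic
    "affected_specialty": rng.integers(0, 2, n),   # 0=other, 1=COVID-19-affected
    "female_student": rng.integers(0, 2, n),
})
df["score"] = (90 - 2.0 * df["midst_covid"] + 1.1 * df["female_student"]
               + rng.normal(0, 3.6, n))            # hypothetical data-generating process

model = smf.gee("score ~ midst_covid + affected_specialty + female_student",
                groups="student", data=df,
                cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())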

Citations to this article as recorded by:
  • The emergence of generative artificial intelligence platforms in 2023, journal metrics, appreciation to reviewers and volunteers, and obituary
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2024; 21: 9.     CrossRef
Effect of motion-graphic video-based training on the performance of operating room nurse students in cataract surgery in Iran: a randomized controlled study  
Behnaz Fatahi, Samira Fatahi, Sohrab Nosrati, Masood Bagheri
J Educ Eval Health Prof. 2023;20:34.   Published online November 28, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.34
  • 2,373 View
  • 106 Download
Abstract
Purpose
The present study was conducted to determine the effect of motion-graphic video-based training on the performance of operating room nurse students in cataract surgery using phacoemulsification at Kermanshah University of Medical Sciences in Iran.
Methods
This was a randomized controlled study conducted among 36 students training to become operating room nurses. The control group only received routine training, and the intervention group received motion-graphic video-based training on the scrub nurse’s performance in cataract surgery in addition to the educator’s training. The performance of the students in both groups as scrub nurses was measured through a researcher-made checklist in a pre-test and a post-test.
Results
The mean performance scores in the pre-test and post-test were 17.83 and 26.44 in the control group and 18.33 and 50.94 in the intervention group, respectively, and a significant difference was identified between the mean pre- and post-test scores in both groups (P=0.001). The intervention also led to a significantly greater increase in the mean performance score in the intervention group than in the control group (P=0.001).
Conclusion
Considering the significant difference in the performance score of the intervention group compared to the control group, motion-graphic video-based training had a positive effect on the performance of operating room nurse students, and such training can be used to improve clinical training.
Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study  
Betzy Clariza Torres-Zegarra, Wagner Rios-Garcia, Alvaro Micael Ñaña-Cordova, Karen Fatima Arteaga-Cisneros, Xiomara Cristina Benavente Chalco, Marina Atena Bustamante Ordoñez, Carlos Jesus Gutierrez Rios, Carlos Alberto Ramos Godoy, Kristell Luisa Teresa Panta Quezada, Jesus Daniel Gutierrez-Arratia, Javier Alejandro Flores-Cohaila
J Educ Eval Health Prof. 2023;20:30.   Published online November 20, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.30
  • 2,223 View
  • 198 Download
  • 9 Web of Science
  • 9 Crossref
Abstract
Purpose
We aimed to describe the performance and evaluate the educational value of justifications provided by artificial intelligence chatbots, including GPT-3.5, GPT-4, Bard, Claude, and Bing, on the Peruvian National Medical Licensing Examination (P-NLME).
Methods
This was a cross-sectional analytical study. On July 25, 2023, each multiple-choice question (MCQ) from the P-NLME was entered into each chatbot (GPT-3.5, GPT-4, Bing, Bard, and Claude) 3 times. Then, 4 medical educators categorized the MCQs in terms of medical area, item type, and whether the MCQ required Peru-specific knowledge. They then assessed the educational value of the justifications from the 2 top performers (GPT-4 and Bing).
Results
GPT-4 scored 86.7% and Bing scored 82.2%, followed by Bard and Claude, and the historical performance of Peruvian examinees was 55%. Among the factors associated with correct answers, only MCQs that required Peru-specific knowledge had lower odds (odds ratio, 0.23; 95% confidence interval, 0.09–0.61), whereas the remaining factors showed no associations. In assessing the educational value of justifications provided by GPT-4 and Bing, neither showed any significant differences in certainty, usefulness, or potential use in the classroom.
Conclusion
Among the chatbots, GPT-4 and Bing were the top performers, with Bing performing better on Peru-specific MCQs. Moreover, the educational value of the justifications provided by GPT-4 and Bing could be deemed appropriate. However, it is essential to start addressing the educational value of these chatbots, rather than merely their performance on examinations.
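As a small illustration of the reported odds-ratio logic (the odds of a correct answer for Peru-specific vs. other MCQs, with a Wald 95% confidence interval on the log scale), the counts below are hypothetical, not the study’s data.

import math

# hypothetical 2x2 table: rows = Peru-specific / other items, columns = correct / incorrect
a, b = 12, 6    # Peru-specific: correct, incorrect
c, d = 80, 9    # other items:   correct, incorrect
or_ = (a * d) / (b * c)                # odds ratio
se = math.sqrt(1/a + 1/b + 1/c + 1/d)  # standard error of log(OR)
lo, hi = (math.exp(math.log(or_) + s * 1.96 * se) for s in (-1, 1))
print(f"OR={or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")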

Citations to this article as recorded by:
  • Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study
    Masao Noda, Takayoshi Ueno, Ryota Koshu, Yuji Takaso, Mari Dias Shimada, Chizu Saito, Hisashi Sugimoto, Hiroaki Fushiki, Makoto Ito, Akihiro Nomura, Tomokazu Yoshizaki
    JMIR Medical Education.2024; 10: e57054.     CrossRef
  • Response to Letter to the Editor re: “Artificial Intelligence Versus Expert Plastic Surgeon: Comparative Study Shows ChatGPT ‘Wins' Rhinoplasty Consultations: Should We Be Worried? [1]” by Durairaj et al
    Kay Durairaj, Omer Baker
    Facial Plastic Surgery & Aesthetic Medicine.2024; 26(3): 276.     CrossRef
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis
    Mingxin Liu, Tsuyoshi Okuhara, XinYi Chang, Ritsuko Shirabe, Yuriko Nishiie, Hiroko Okada, Takahiro Kiuchi
    Journal of Medical Internet Research.2024; 26: e60807.     CrossRef
  • Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study
    Giacomo Rossettini, Lia Rodeghiero, Federica Corradi, Chad Cook, Paolo Pillastrini, Andrea Turolla, Greta Castellini, Stefania Chiappinotto, Silvia Gianola, Alvisa Palese
    BMC Medical Education.2024;[Epub]     CrossRef
  • Evaluating the competency of ChatGPT in MRCP Part 1 and a systematic literature review of its capabilities in postgraduate medical assessments
    Oliver Vij, Henry Calver, Nikki Myall, Mrinalini Dey, Koushan Kouranloo, Thiago P. Fernandes
    PLOS ONE.2024; 19(7): e0307372.     CrossRef
  • Large Language Models in Pediatric Education: Current Uses and Future Potential
    Srinivasan Suresh, Sanghamitra M. Misra
    Pediatrics.2024;[Epub]     CrossRef
  • Comparison of the Performance of ChatGPT, Claude and Bard in Support of Myopia Prevention and Control
    Yan Wang, Lihua Liang, Ran Li, Yihua Wang, Changfu Hao
    Journal of Multidisciplinary Healthcare.2024; Volume 17: 3917.     CrossRef
  • Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study
    Hyunju Lee, Soobin Park
    Journal of Educational Evaluation for Health Professions.2023; 20: 39.     CrossRef
Medical students’ patterns of using ChatGPT as a feedback tool and perceptions of ChatGPT in a Leadership and Communication course in Korea: a cross-sectional study  
Janghee Park
J Educ Eval Health Prof. 2023;20:29.   Published online November 10, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.29
  • 2,403 View
  • 194 Download
  • 6 Web of Science
  • 6 Crossref
Abstract
Purpose
This study aimed to analyze patterns of using ChatGPT before and after group activities and to explore medical students’ perceptions of ChatGPT as a feedback tool in the classroom.
Methods
The study included 99 2nd-year pre-medical students who participated in a “Leadership and Communication” course from March to June 2023. Students engaged in both individual and group activities related to negotiation strategies. ChatGPT was used to provide feedback on their solutions. A survey was administered to assess students’ perceptions of ChatGPT’s feedback, its use in the classroom, and the strengths and challenges of ChatGPT from May 17 to 19, 2023.
Results
The students indicated that ChatGPT’s feedback was helpful, and they revised and resubmitted their group answers in various ways after receiving it. The majority of respondents agreed with the use of ChatGPT during class. The most common response concerning the appropriate context for using ChatGPT’s feedback was “after the first round of discussion, for revisions.” Satisfaction with ChatGPT’s feedback, including its correctness, usefulness, and ethics, differed significantly depending on whether ChatGPT was used during class, but not according to gender or previous experience with ChatGPT. The strongest advantages were “providing answers to questions” and “summarizing information,” and the most serious disadvantage was “producing information without supporting evidence.”
Conclusion
The students were aware of the advantages and disadvantages of ChatGPT, and they had a positive attitude toward using ChatGPT in the classroom.

Citations to this article as recorded by:
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • Embracing ChatGPT for Medical Education: Exploring Its Impact on Doctors and Medical Students
    Yijun Wu, Yue Zheng, Baijie Feng, Yuqi Yang, Kai Kang, Ailin Zhao
    JMIR Medical Education.2024; 10: e52483.     CrossRef
  • Integration of ChatGPT Into a Course for Medical Students: Explorative Study on Teaching Scenarios, Students’ Perception, and Applications
    Anita V Thomae, Claudia M Witt, Jürgen Barth
    JMIR Medical Education.2024; 10: e50545.     CrossRef
  • A cross sectional investigation of ChatGPT-like large language models application among medical students in China
    Guixia Pan, Jing Ni
    BMC Medical Education.2024;[Epub]     CrossRef
  • ChatGPT and Clinical Training: Perception, Concerns, and Practice of Pharm-D Students
    Mohammed Zawiah, Fahmi Al-Ashwal, Lobna Gharaibeh, Rana Abu Farha, Karem Alzoubi, Khawla Abu Hammour, Qutaiba A Qasim, Fahd Abrah
    Journal of Multidisciplinary Healthcare.2023; Volume 16: 4099.     CrossRef
  • Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study
    Hyunju Lee, Soobin Park
    Journal of Educational Evaluation for Health Professions.2023; 20: 39.     CrossRef
