Journal of Educational Evaluation for Health Professions

Review

Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review: Xiaojun Xu, Yixiao Chen, Jing Miao; J Educ Eval Health Prof. 2024;21:6. Published online March 15, 2024; DOI: https://doi.org/10.3352/jeehp.2024.21.6

213 View
191 Download

Abstract PDF Supplementary Material: Background
ChatGPT is a large language model (LLM) based on artificial intelligence (AI) capable of responding in multiple languages and generating nuanced and highly complex responses. While ChatGPT holds promising applications in medical education, its limitations and potential risks cannot be ignored.
Methods
A scoping review was conducted for English articles discussing ChatGPT in the context of medical education published after 2022. A literature search was performed using PubMed/MEDLINE, Embase, and Web of Science databases, and information was extracted from the relevant studies that were ultimately included.
Results
ChatGPT exhibits various potential applications in medical education, such as providing personalized learning plans and materials, creating clinical practice simulation scenarios, and assisting in writing articles. However, challenges associated with academic integrity, data accuracy, and potential harm to learning were also highlighted in the literature. The paper emphasizes certain recommendations for using ChatGPT, including the establishment of guidelines. Based on the review, 3 key research areas were proposed: cultivating the ability of medical students to use ChatGPT correctly, integrating ChatGPT into teaching activities and processes, and proposing standards for the use of AI by medical students.
Conclusion
ChatGPT has the potential to transform medical education, but careful consideration is required for its full integration. To harness the full potential of ChatGPT in medical education, attention should not only be given to the capabilities of AI but also to its impact on students and teachers.

Research articles

Discovering social learning ecosystems during clinical clerkship from United States medical students’ feedback encounters: a content analysis: Anna Therese Cianciolo, Heeyoung Han, Lydia Anne Howes, Debra Lee Klamen, Sophia Matos; J Educ Eval Health Prof. 2024;21:5. Published online February 28, 2024; DOI: https://doi.org/10.3352/jeehp.2024.21.5

568 View
123 Download

Abstract PDF Supplementary Material: Purpose
We examined United States medical students’ self-reported feedback encounters during clerkship training to better understand in situ feedback practices. Specifically, we asked: Who do students receive feedback from, about what, when, where, and how do they use it? We explored whether curricular expectations for preceptors’ written commentary aligned with feedback as it occurs naturalistically in the workplace.
Methods
This study occurred from July 2021 to February 2022 at Southern Illinois University School of Medicine. We used qualitative survey-based experience sampling to gather students’ accounts of their feedback encounters in 8 core specialties. We analyzed the who, what, when, where, and why of 267 feedback encounters reported by 11 clerkship students over 30 weeks. Code frequencies were mapped qualitatively to explore patterns in feedback encounters.
Results
Clerkship feedback occurs in patterns apparently related to the nature of clinical work in each specialty. These patterns may be attributable to each specialty’s “social learning ecosystem”—the distinctive learning environment shaped by the social and material aspects of a given specialty’s work, which determine who preceptors are, what students do with preceptors, and what skills or attributes matter enough to preceptors to comment on.
Conclusion
Comprehensive, standardized expectations for written feedback across specialties conflict with the reality of workplace-based learning. Preceptors may be better able—and more motivated—to document student performance that occurs as a natural part of everyday work. Nurturing social learning ecosystems could facilitate workplace-based learning such that, across specialties, students acquire a comprehensive clinical skillset appropriate for graduation.

ChatGPT (GPT-4) passed the Japanese National License Examination for Pharmacists in 2022, answering all items including those with diagrams: a descriptive study: Hiroyasu Sato, Katsuhiko Ogasawara; J Educ Eval Health Prof. 2024;21:4. Published online February 28, 2024; DOI: https://doi.org/10.3352/jeehp.2024.21.4

707 View
126 Download

Abstract PDF Supplementary Material: Purpose
The objective of this study was to assess the performance of ChatGPT (GPT-4) on all items, including those with diagrams, in the Japanese National License Examination for Pharmacists (JNLEP) and compare it with the previous GPT-3.5 model’s performance.
Methods
The 107th JNLEP, conducted in 2022, with 344 items input into the GPT-4 model, was targeted for this study. Separately, 284 items, excluding those with diagrams, were entered into the GPT-3.5 model. The answers were categorized and analyzed to determine accuracy rates based on categories, subjects, and presence or absence of diagrams. The accuracy rates were compared to the main passing criteria (overall accuracy rate ≥62.9%).
Results
The overall accuracy rate for all items in the 107th JNLEP in GPT-4 was 72.5%, successfully meeting all the passing criteria. For the set of items without diagrams, the accuracy rate was 80.0%, which was significantly higher than that of the GPT-3.5 model (43.5%). The GPT-4 model demonstrated an accuracy rate of 36.1% for items that included diagrams.
Conclusion
Advancements that allow GPT-4 to process images have made it possible for LLMs to answer all items in medical-related license examinations. This study’s findings confirm that ChatGPT (GPT-4) possesses sufficient knowledge to meet the passing criteria.

Development and validity evidence for the resident-led large group teaching assessment instrument in the United States: a methodological study: Ariel Shana Frey-Vogel, Kristina Dzara, Kimberly Anne Gifford, Yoon Soo Park, Justin Berk, Allison Heinly, Darcy Wolcott, Daniel Adam Hall, Shannon Elliott Scott-Vernaglia, Katherine Anne Sparger, Erica Ye-pyng Chung; J Educ Eval Health Prof. 2024;21:3. Published online February 23, 2024; DOI: https://doi.org/10.3352/jeehp.2024.21.3

371 View
112 Download

Abstract PDF Supplementary Material: Purpose
Despite educational mandates to assess resident teaching competence, limited instruments with validity evidence exist for this purpose. Existing instruments do not allow faculty to assess resident-led teaching in a large group format or whether teaching was interactive. This study gathers validity evidence on the use of the Resident-led Large Group Teaching Assessment Instrument (Relate), an instrument used by faculty to assess resident teaching competency. Relate comprises 23 behaviors divided into 6 elements: learning environment, goals and objectives, content of talk, promotion of understanding and retention, session management, and closure.
Methods
Messick’s unified validity framework was used for this study. Investigators used video recordings of resident-led teaching from 3 pediatric residency programs to develop Relate and a rater guidebook. Faculty were trained on instrument use through frame-of-reference training. Resident teaching at all sites was video-recorded during 2018–2019. Two trained faculty raters assessed each video. Descriptive statistics on performance were obtained. Validity evidence sources include: rater training effect (response process), reliability and variability (internal structure), and impact on Milestones assessment (relations to other variables).
Results
Forty-eight videos, from 16 residents, were analyzed. Rater training improved inter-rater reliability from 0.04 to 0.64. The Φ-coefficient reliability was 0.50. There was a significant correlation between overall Relate performance and the pediatric teaching Milestone (r=0.34, P=0.019).
Conclusion
Relate provides validity evidence with sufficient reliability to measure resident-led large-group teaching competence.

Editorial

Presidential address 2024: the expansion of computer-based testing to numerous health professions licensing examinations in Korea, preparation of computer-based practical tests, and adoption of the medical metaverse: Hyunjoo Pai; J Educ Eval Health Prof. 2024;21:2. Published online February 20, 2024; DOI: https://doi.org/10.3352/jeehp.2024.21.2

389 View
42 Download

PDF

Research article

Importance, performance frequency, and predicted future importance of dietitians’ jobs by practicing dietitians in Korea: a survey study: Cheongmin Sohn, Sooyoun Kwon, Won Gyoung Kim, Kyung-Eun Lee, Sun-Young Lee, Seungmin Lee; J Educ Eval Health Prof. 2024;21:1. Published online January 2, 2024; DOI: https://doi.org/10.3352/jeehp.2024.21.1

768 View
167 Download

Abstract PDF Supplementary Material: Purpose
This study aimed to explore the perceptions held by practicing dietitians of the importance of their tasks performed in current work environments, the frequency at which those tasks are performed, and predictions about the importance of those tasks in future work environments.
Methods
This was a cross-sectional survey study. An online survey was administered to 350 practicing dietitians. They were asked to assess the importance, performance frequency, and predicted changes in the importance of 27 tasks using a 5-point scale. Descriptive statistics were calculated, and the means of the variables were compared across categorized work environments using analysis of variance.
Results
The importance scores of all surveyed tasks were higher than 3.0, except for the marketing management task. Self-development, nutrition education/counseling, menu planning, food safety management, and documentation/data management were all rated higher than 4.0. The highest performance frequency score was related to documentation/data management. The importance scores of all duties, except for professional development, differed significantly by workplace. As for predictions about the future importance of the tasks surveyed, dietitians responded that the importance of all 27 tasks would either remain at current levels or increase in the future.
Conclusion
Twenty-seven tasks were confirmed to represent dietitians’ job functions in various workplaces. These tasks can be used to improve the test specifications of the Korean Dietitian Licensing Examination and the curriculum of dietetic education programs.

Editorial

Editorial policies of Journal of Educational Evaluation for Health Professions on the use of generative artificial intelligence in article writing and peer review: Sun Huh; J Educ Eval Health Prof. 2023;20:40. Published online December 31, 2023; DOI: https://doi.org/10.3352/jeehp.2023.20.40

659 View
98 Download

PDF

Research article

Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study: Hyunju Lee, Soobin Park; J Educ Eval Health Prof. 2023;20:39. Published online December 28, 2023; DOI: https://doi.org/10.3352/jeehp.2023.20.39

1,065 View
139 Download
1 Crossref

Abstract PDF Supplementary Material

Purpose
This study assessed the performance of 6 generative artificial intelligence (AI) platforms on the learning objectives of medical arthropodology in a parasitology class in Korea. We examined the AI platforms’ performance by querying in Korean and English to determine their information amount, accuracy, and relevance in prompts in both languages.
Methods
From December 15 to 17, 2023, 6 generative AI platforms—Bard, Bing, Claude, Clova X, GPT-4, and Wrtn—were tested on 7 medical arthropodology learning objectives in English and Korean. Clova X and Wrtn are platforms from Korean companies. Responses were evaluated using specific criteria for the English and Korean queries.
Results
Bard had abundant information but was fourth in accuracy and relevance. GPT-4, with high information content, ranked first in accuracy and relevance. Clova X was 4th in amount but 2nd in accuracy and relevance. Bing provided less information, with moderate accuracy and relevance. Wrtn’s answers were short, with average accuracy and relevance. Claude AI had reasonable information, but lower accuracy and relevance. The responses in English were superior in all aspects. Clova X was notably optimized for Korean, leading in relevance.
Conclusion
In a study of 6 generative AI platforms applied to medical arthropodology, GPT-4 excelled overall, while Clova X, a Korea-based AI product, achieved 100% relevance in Korean queries, the highest among its peers. Utilizing these AI platforms in classrooms improved the authors’ self-efficacy and interest in the subject, offering a positive experience of interacting with generative AI platforms to question and receive information.

Citations

Citations to this article as recorded by

Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
Xiaojun Xu, Yixiao Chen, Jing Miao
Journal of Educational Evaluation for Health Professions.2024; 21: 6. CrossRef

Review

Application of artificial intelligence chatbots, including ChatGPT, in education, scholarly work, programming, and content generation and its prospects: a narrative review: Tae Won Kim; J Educ Eval Health Prof. 2023;20:38. Published online December 27, 2023; DOI: https://doi.org/10.3352/jeehp.2023.20.38

1,601 View
296 Download
1 Web of Science
2 Crossref

Abstract PDF Supplementary Material

This study aims to explore ChatGPT’s (GPT-3.5 version) functionalities, including reinforcement learning, diverse applications, and limitations. ChatGPT is an artificial intelligence (AI) chatbot powered by OpenAI’s Generative Pre-trained Transformer (GPT) model. The chatbot’s applications span education, programming, content generation, and more, demonstrating its versatility. ChatGPT can improve education by creating assignments and offering personalized feedback, as shown by its notable performance in medical exams and the United States Medical Licensing Exam. However, concerns include plagiarism, reliability, and educational disparities. It aids in various research tasks, from design to writing, and has shown proficiency in summarizing and suggesting titles. Its use in scientific writing and language translation is promising, but professional oversight is needed for accuracy and originality. It assists in programming tasks like writing code, debugging, and guiding installation and updates. It offers diverse applications, from cheering up individuals to generating creative content like essays, news articles, and business plans. Unlike search engines, ChatGPT provides interactive, generative responses and understands context, making it more akin to human conversation, in contrast to conventional search engines’ keyword-based, non-interactive nature. ChatGPT has limitations, such as potential bias, dependence on outdated data, and revenue generation challenges. Nonetheless, ChatGPT is considered to be a transformative AI tool poised to redefine the future of generative technology. In conclusion, advancements in AI, such as ChatGPT, are altering how knowledge is acquired and applied, marking a shift from search engines to creativity engines. This transformation highlights the increasing importance of AI literacy and the ability to effectively utilize AI in various domains of life.

Citations

Citations to this article as recorded by

Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
Xiaojun Xu, Yixiao Chen, Jing Miao
Journal of Educational Evaluation for Health Professions.2024; 21: 6. CrossRef
Artificial Intelligence: Fundamentals and Breakthrough Applications in Epilepsy
Wesley Kerr, Sandra Acosta, Patrick Kwan, Gregory Worrell, Mohamad A. Mikati
Epilepsy Currents.2024;[Epub] CrossRef

Research articles

Negative effects on medical students’ scores for clinical performance during the COVID-19 pandemic in Taiwan: a comparative study: Eunice Jia-Shiow Yuan, Shiau-Shian Huang, Chia-An Hsu, Jiing-Feng Lirng, Tzu-Hao Li, Chia-Chang Huang, Ying-Ying Yang, Chung-Pin Li, Chen-Huan Chen; J Educ Eval Health Prof. 2023;20:37. Published online December 26, 2023; DOI: https://doi.org/10.3352/jeehp.2023.20.37

751 View
71 Download

Abstract PDF Supplementary Material: Purpose
Coronavirus disease 2019 (COVID-19) has heavily impacted medical clinical education in Taiwan. Medical curricula have been altered to minimize exposure and limit transmission. This study investigated the effect of COVID-19 on Taiwanese medical students’ clinical performance using online standardized evaluation systems and explored the factors influencing medical education during the pandemic.
Methods
Medical students were scored from 0 to 100 based on their clinical performance from 1/1/2018 to 6/31/2021. The students were placed into pre-COVID-19 (before 2/1/2020) and midst-COVID-19 (on and after 2/1/2020) groups. Each group was further categorized into COVID-19-affected specialties (pulmonary, infectious, and emergency medicine) and other specialties. Generalized estimating equations (GEEs) were used to compare and examine the effects of relevant variables on student performance.
Results
In total, 16,944 clinical scores were obtained for COVID-19-affected specialties and other specialties. For the COVID-19-affected specialties, the midst-COVID-19 score (88.513.52) was significantly lower than the pre-COVID-19 score (90.143.55) (P<0.0001). For the other specialties, the midst-COVID-19 score (88.323.68) was also significantly lower than the pre-COVID-19 score (90.063.58) (P<0.0001). There were 1,322 students (837 males and 485 females). Male students had significantly lower scores than female students (89.333.68 vs. 89.993.66, P=0.0017). GEE analysis revealed that the COVID-19 pandemic (unstandardized beta coefficient=-1.99, standard error [SE]=0.13, P<0.0001), COVID-19-affected specialties (B=0.26, SE=0.11, P=0.0184), female students (B=1.10, SE=0.20, P<0.0001), and female attending physicians (B=-0.19, SE=0.08, P=0.0145) were independently associated with students’ scores.
Conclusion
COVID-19 negatively impacted medical students' clinical performance, regardless of their specialty. Female students outperformed male students, irrespective of the pandemic.

Use of learner-driven, formative, ad-hoc, prospective assessment of competence in physical therapist clinical education in the United States: a prospective cohort study: Carey Holleran, Jeffrey Konrad, Barbara Norton, Tamara Burlis, Steven Ambler; J Educ Eval Health Prof. 2023;20:36. Published online December 8, 2023; DOI: https://doi.org/10.3352/jeehp.2023.20.36

709 View
110 Download

Abstract PDF Supplementary Material: Purpose
The purpose of this project was to implement a process for learner-driven, formative, prospective, ad-hoc, entrustment assessment in Doctor of Physical Therapy clinical education. Our goals were to develop an innovative entrustment assessment tool, and then explore whether the tool detected (1) differences between learners at different stages of development and (2) differences within learners across the course of a clinical education experience. We also investigated whether there was a relationship between the number of assessments and change in performance.
Methods
A prospective, observational, cohort of clinical instructors (CIs) was recruited to perform learner-driven, formative, ad-hoc, prospective, entrustment assessments. Two entrustable professional activities (EPAs) were used: (1) gather a history and perform an examination and (2) implement and modify the plan of care, as needed. CIs provided a rating on the entrustment scale and provided narrative support for their rating.
Results
Forty-nine learners participated across 4 clinical experiences (CEs), resulting in 453 EPA learner-driven assessments. For both EPAs, statistically significant changes were detected both between learners at different stages of development and within learners across the course of a CE. Improvement within each CE was significantly related to the number of feedback opportunities.
Conclusion
The results of this pilot study provide preliminary support for the use of learner-driven, formative, ad-hoc assessments of competence based on EPAs with a novel entrustment scale. The number of formative assessments requested correlated with change on the EPA scale, suggesting that formative feedback may augment performance improvement.

Effect of a transcultural nursing course on improving the cultural competency of nursing graduate students in Korea: a before-and-after study: Kyung Eui Bae, Geum Hee Jeong; J Educ Eval Health Prof. 2023;20:35. Published online December 4, 2023; DOI: https://doi.org/10.3352/jeehp.2023.20.35

794 View
122 Download

Abstract PDF Supplementary Material: Purpose
This study aimed to evaluate the impact of a transcultural nursing course on enhancing the cultural competency of graduate nursing students in Korea. We hypothesized that participants’ cultural competency would significantly improve in areas such as communication, biocultural ecology and family, dietary habits, death rituals, spirituality, equity, and empowerment and intermediation after completing the course. Furthermore, we assessed the participants’ overall satisfaction with the course.
Methods
A before-and-after study was conducted with graduate nursing students at Hallym University, Chuncheon, Korea, from March to June 2023. A transcultural nursing course was developed based on Giger & Haddad’s transcultural nursing model and Purnell’s theoretical model of cultural competence. Data was collected using a cultural competence scale for registered nurses developed by Kim and his colleagues. A total of 18 students participated, and the paired t-test was employed to compare pre-and post-intervention scores.
Results
The study revealed significant improvements in all 7 categories of cultural nursing competence (P<0.01). Specifically, the mean differences in scores (pre–post) ranged from 0.74 to 1.09 across the categories. Additionally, participants expressed high satisfaction with the course, with an average score of 4.72 out of a maximum of 5.0.
Conclusion
The transcultural nursing course effectively enhanced the cultural competency of graduate nursing students. Such courses are imperative to ensure quality care for the increasing multicultural population in Korea.

Effect of motion-graphic video-based training on the performance of operating room nurse students in cataract surgery in Iran: a randomized controlled study: Behnaz Fatahi, Samira Fatahi, Sohrab Nosrati, Masood Bagheri; J Educ Eval Health Prof. 2023;20:34. Published online November 28, 2023; DOI: https://doi.org/10.3352/jeehp.2023.20.34

1,069 View
81 Download

Abstract PDF Supplementary Material: Purpose
The present study was conducted to determine the effect of motion-graphic video-based training on the performance of operating room nurse students in cataract surgery using phacoemulsification at Kermanshah University of Medical Sciences in Iran.
Methods
This was a randomized controlled study conducted among 36 students training to become operating room nurses. The control group only received routine training, and the intervention group received motion-graphic video-based training on the scrub nurse’s performance in cataract surgery in addition to the educator’s training. The performance of the students in both groups as scrub nurses was measured through a researcher-made checklist in a pre-test and a post-test.
Results
The mean scores for performance in the pre-test and post-test were 17.83 and 26.44 in the control group and 18.33 and 50.94 in the intervention group, respectively, and a significant difference was identified between the mean scores of the pre- and post-test in both groups (P=0.001). The intervention also led to a significant increase in the mean performance score in the intervention group compared to the control group (P=0.001).
Conclusion
Considering the significant difference in the performance score of the intervention group compared to the control group, motion-graphic video-based training had a positive effect on the performance of operating room nurse students, and such training can be used to improve clinical training.

Assessment of the viability of integrating virtual reality programs in practical tests for the Korean Radiological Technologists Licensing Examination: a survey study: Hye Min Park, Eun Seong Kim, Deok Mun Kwon, Pyong Kon Cho, Seoung Hwan Kim, Ki Baek Lee, Seong Hu Kim, Moon Il Bong, Won Seok Yang, Jin Eui Kim, Gi Bong Kang, Yong Su Yoon, Jung Su Kim; J Educ Eval Health Prof. 2023;20:33. Published online November 28, 2023; DOI: https://doi.org/10.3352/jeehp.2023.20.33

923 View
89 Download

Abstract PDF Supplementary Material: Purpose
The objective of this study was to assess the feasibility of incorporating virtual reality/augmented reality (VR/AR) programs into practical tests administered as part of the Korean Radiological Technologists Licensing Examination (KRTLE). This evaluation is grounded in a comprehensive survey that targeted enrolled students in departments of radiology across the nation.
Methods
In total, 682 students from radiology departments across the nation were participants in the survey. An online survey platform was used, and the questionnaire was structured into 5 distinct sections and 27 questions. A frequency analysis for each section of the survey was conducted using IBM SPSS ver. 27.0.
Results
Direct or indirect exposure to VR/AR content was reported by 67.7% of all respondents. Furthermore, 55.4% of the respondents expressed that VR/AR could be integrated into their classes, which signified a widespread acknowledgment of VR among the students. With regards to the integration of a VR/AR or mixed reality program into the practical tests for purposes of the KRTLE, a substantial amount of the respondents (57.3%) exhibited a positive inclination and recommended its introduction.
Conclusion
The application of VR/AR programs within practical tests of the KRTLE will be used as an alternative for evaluating clinical examination procedures and validating job skills.

Brief report

ChatGPT (GPT-3.5) as an assistant tool in microbial pathogenesis studies in Sweden: a cross-sectional comparative study: Catharina Hultgren, Annica Lindkvist, Volkan Özenci, Sophie Curbo; J Educ Eval Health Prof. 2023;20:32. Published online November 22, 2023; DOI: https://doi.org/10.3352/jeehp.2023.20.32

820 View
94 Download
1 Web of Science
2 Crossref

Abstract PDF Supplementary Material

ChatGPT (GPT-3.5) has entered higher education and there is a need to determine how to use it effectively. This descriptive study compared the ability of GPT-3.5 and teachers to answer questions from dental students and construct detailed intended learning outcomes. When analyzed according to a Likert scale, we found that GPT-3.5 answered the questions from dental students in a similar or even more elaborate way compared to the answers that had previously been provided by a teacher. GPT-3.5 was also asked to construct detailed intended learning outcomes for a course in microbial pathogenesis, and when these were analyzed according to a Likert scale they were, to a large degree, found irrelevant. Since students are using GPT-3.5, it is important that instructors learn how to make the best use of it both to be able to advise students and to benefit from its potential.

Citations

Citations to this article as recorded by

Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
Xiaojun Xu, Yixiao Chen, Jing Miao
Journal of Educational Evaluation for Health Professions.2024; 21: 6. CrossRef
Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study
Hyunju Lee, Soobin Park
Journal of Educational Evaluation for Health Professions.2023; 20: 39. CrossRef

First
Prev
Page of 34
Next
Last

Article category