JEEHP : Journal of Educational Evaluation for Health Professions

Search results for "Education": 162 articles
Review
Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review  
Xiaojun Xu, Yixiao Chen, Jing Miao
J Educ Eval Health Prof. 2024;21:6.   Published online March 15, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.6
  • 220 View
  • 193 Download
Background
ChatGPT is a large language model (LLM) based on artificial intelligence (AI) capable of responding in multiple languages and generating nuanced and highly complex responses. While ChatGPT holds promising applications in medical education, its limitations and potential risks cannot be ignored.
Methods
A scoping review was conducted for English articles discussing ChatGPT in the context of medical education published after 2022. A literature search was performed using PubMed/MEDLINE, Embase, and Web of Science databases, and information was extracted from the relevant studies that were ultimately included.
Results
ChatGPT exhibits various potential applications in medical education, such as providing personalized learning plans and materials, creating clinical practice simulation scenarios, and assisting in writing articles. However, challenges associated with academic integrity, data accuracy, and potential harm to learning were also highlighted in the literature. The paper emphasizes certain recommendations for using ChatGPT, including the establishment of guidelines. Based on the review, 3 key research areas were proposed: cultivating the ability of medical students to use ChatGPT correctly, integrating ChatGPT into teaching activities and processes, and proposing standards for the use of AI by medical students.
Conclusion
ChatGPT has the potential to transform medical education, but careful consideration is required for its full integration. To harness the full potential of ChatGPT in medical education, attention should not only be given to the capabilities of AI but also to its impact on students and teachers.
Research articles
Development and validity evidence for the resident-led large group teaching assessment instrument in the United States: a methodological study  
Ariel Shana Frey-Vogel, Kristina Dzara, Kimberly Anne Gifford, Yoon Soo Park, Justin Berk, Allison Heinly, Darcy Wolcott, Daniel Adam Hall, Shannon Elliott Scott-Vernaglia, Katherine Anne Sparger, Erica Ye-pyng Chung
J Educ Eval Health Prof. 2024;21:3.   Published online February 23, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.3
  • 380 View
  • 113 Download
Purpose
Despite educational mandates to assess resident teaching competence, limited instruments with validity evidence exist for this purpose. Existing instruments do not allow faculty to assess resident-led teaching in a large group format or whether teaching was interactive. This study gathers validity evidence on the use of the Resident-led Large Group Teaching Assessment Instrument (Relate), an instrument used by faculty to assess resident teaching competency. Relate comprises 23 behaviors divided into 6 elements: learning environment, goals and objectives, content of talk, promotion of understanding and retention, session management, and closure.
Methods
Messick’s unified validity framework was used for this study. Investigators used video recordings of resident-led teaching from 3 pediatric residency programs to develop Relate and a rater guidebook. Faculty were trained on instrument use through frame-of-reference training. Resident teaching at all sites was video-recorded during 2018–2019. Two trained faculty raters assessed each video. Descriptive statistics on performance were obtained. Validity evidence sources include: rater training effect (response process), reliability and variability (internal structure), and impact on Milestones assessment (relations to other variables).
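The abstract does not name the reliability coefficient behind the rater-training improvement reported in the Results below; as a rough, hypothetical illustration of how agreement between 2 raters scoring the same videos on an ordinal scale can be quantified, the following Python sketch computes Cohen’s weighted kappa on invented ratings.

```python
# Hypothetical illustration only: the Relate study does not report which
# reliability coefficient was used. Cohen's weighted kappa is one common
# agreement statistic for 2 raters on an ordinal scale.
from sklearn.metrics import cohen_kappa_score

# Invented ratings by 2 trained faculty raters on a 1-4 behavior scale
rater_a = [3, 4, 2, 3, 4, 1, 2, 3, 4, 3]
rater_b = [3, 4, 2, 2, 4, 1, 3, 3, 4, 3]

# Quadratic weights penalize large disagreements more than near-misses
kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(f"Weighted kappa: {kappa:.2f}")
```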
Results
Forty-eight videos, from 16 residents, were analyzed. Rater training improved inter-rater reliability from 0.04 to 0.64. The Φ-coefficient reliability was 0.50. There was a significant correlation between overall Relate performance and the pediatric teaching Milestone (r=0.34, P=0.019).
Conclusion
Relate is supported by validity evidence and has sufficient reliability to measure resident-led large-group teaching competence.
Negative effects on medical students’ scores for clinical performance during the COVID-19 pandemic in Taiwan: a comparative study  
Eunice Jia-Shiow Yuan, Shiau-Shian Huang, Chia-An Hsu, Jiing-Feng Lirng, Tzu-Hao Li, Chia-Chang Huang, Ying-Ying Yang, Chung-Pin Li, Chen-Huan Chen
J Educ Eval Health Prof. 2023;20:37.   Published online December 26, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.37
  • 752 View
  • 71 Download
Purpose
Coronavirus disease 2019 (COVID-19) has heavily impacted medical clinical education in Taiwan. Medical curricula have been altered to minimize exposure and limit transmission. This study investigated the effect of COVID-19 on Taiwanese medical students’ clinical performance using online standardized evaluation systems and explored the factors influencing medical education during the pandemic.
Methods
Medical students were scored from 0 to 100 based on their clinical performance from 1/1/2018 to 6/30/2021. The students were placed into pre-COVID-19 (before 2/1/2020) and midst-COVID-19 (on and after 2/1/2020) groups. Each group was further categorized into COVID-19-affected specialties (pulmonary, infectious, and emergency medicine) and other specialties. Generalized estimating equations (GEEs) were used to compare and examine the effects of relevant variables on student performance.
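As a minimal sketch of the kind of GEE analysis described above, the following Python snippet fits a Gaussian GEE with an exchangeable working correlation using statsmodels; all variable names and values are invented for illustration and are not the study’s data.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Invented long-format data: one row per clinical score, clustered by student
df = pd.DataFrame({
    "score":       [90.1, 88.4, 89.7, 87.9, 91.2, 88.0, 90.5, 87.5],
    "midst_covid": [0, 1, 0, 1, 0, 1, 0, 1],   # 1 = scored on/after 2/1/2020
    "affected":    [1, 1, 0, 0, 1, 1, 0, 0],   # COVID-19-affected specialty
    "female":      [1, 1, 0, 0, 0, 0, 1, 1],   # female student
    "student_id":  [1, 1, 2, 2, 3, 3, 4, 4],
})

# Gaussian GEE with an exchangeable working correlation to account for
# repeated scores from the same student
model = smf.gee("score ~ midst_covid + affected + female",
                groups="student_id", data=df,
                cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())
```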
Results
In total, 16,944 clinical scores were obtained for COVID-19-affected specialties and other specialties. For the COVID-19-affected specialties, the midst-COVID-19 score (88.51±3.52) was significantly lower than the pre-COVID-19 score (90.14±3.55) (P<0.0001). For the other specialties, the midst-COVID-19 score (88.32±3.68) was also significantly lower than the pre-COVID-19 score (90.06±3.58) (P<0.0001). There were 1,322 students (837 males and 485 females). Male students had significantly lower scores than female students (89.33±3.68 vs. 89.99±3.66, P=0.0017). GEE analysis revealed that the COVID-19 pandemic (unstandardized beta coefficient [B]=-1.99, standard error [SE]=0.13, P<0.0001), COVID-19-affected specialties (B=0.26, SE=0.11, P=0.0184), female students (B=1.10, SE=0.20, P<0.0001), and female attending physicians (B=-0.19, SE=0.08, P=0.0145) were independently associated with students’ scores.
Conclusion
COVID-19 negatively impacted medical students' clinical performance, regardless of their specialty. Female students outperformed male students, irrespective of the pandemic.
Use of learner-driven, formative, ad-hoc, prospective assessment of competence in physical therapist clinical education in the United States: a prospective cohort study  
Carey Holleran, Jeffrey Konrad, Barbara Norton, Tamara Burlis, Steven Ambler
J Educ Eval Health Prof. 2023;20:36.   Published online December 8, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.36
  • 715 View
  • 110 Download
Purpose
The purpose of this project was to implement a process for learner-driven, formative, prospective, ad-hoc, entrustment assessment in Doctor of Physical Therapy clinical education. Our goals were to develop an innovative entrustment assessment tool, and then explore whether the tool detected (1) differences between learners at different stages of development and (2) differences within learners across the course of a clinical education experience. We also investigated whether there was a relationship between the number of assessments and change in performance.
Methods
A prospective observational cohort of clinical instructors (CIs) was recruited to perform learner-driven, formative, ad-hoc, prospective entrustment assessments. Two entrustable professional activities (EPAs) were used: (1) gather a history and perform an examination and (2) implement and modify the plan of care, as needed. CIs provided a rating on the entrustment scale and narrative support for their rating.
Results
Forty-nine learners participated across 4 clinical experiences (CEs), resulting in 453 EPA learner-driven assessments. For both EPAs, statistically significant changes were detected both between learners at different stages of development and within learners across the course of a CE. Improvement within each CE was significantly related to the number of feedback opportunities.
Conclusion
The results of this pilot study provide preliminary support for the use of learner-driven, formative, ad-hoc assessments of competence based on EPAs with a novel entrustment scale. The number of formative assessments requested correlated with change on the EPA scale, suggesting that formative feedback may augment performance improvement.
Effect of a transcultural nursing course on improving the cultural competency of nursing graduate students in Korea: a before-and-after study  
Kyung Eui Bae, Geum Hee Jeong
J Educ Eval Health Prof. 2023;20:35.   Published online December 4, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.35
  • 795 View
  • 122 Download
Purpose
This study aimed to evaluate the impact of a transcultural nursing course on enhancing the cultural competency of graduate nursing students in Korea. We hypothesized that participants’ cultural competency would significantly improve in areas such as communication, biocultural ecology and family, dietary habits, death rituals, spirituality, equity, and empowerment and intermediation after completing the course. Furthermore, we assessed the participants’ overall satisfaction with the course.
Methods
A before-and-after study was conducted with graduate nursing students at Hallym University, Chuncheon, Korea, from March to June 2023. A transcultural nursing course was developed based on Giger and Haddad’s transcultural nursing model and Purnell’s theoretical model of cultural competence. Data were collected using a cultural competence scale for registered nurses developed by Kim and colleagues. A total of 18 students participated, and the paired t-test was employed to compare pre- and post-intervention scores.
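A minimal illustration of the paired t-test described above, using SciPy on invented pre- and post-intervention scores for 18 students (the values are illustrative only, not the study’s data):

```python
from scipy import stats

# Invented pre/post cultural competency scores for 18 students (1-5 scale)
pre  = [3.1, 2.8, 3.4, 3.0, 2.9, 3.3, 3.2, 2.7, 3.5,
        3.0, 2.6, 3.1, 3.3, 2.9, 3.4, 3.0, 2.8, 3.2]
post = [4.0, 3.7, 4.3, 3.9, 3.8, 4.1, 4.2, 3.6, 4.4,
        4.0, 3.5, 4.0, 4.1, 3.8, 4.3, 3.9, 3.7, 4.1]

# Paired t-test on within-student differences (post minus pre)
t_stat, p_value = stats.ttest_rel(post, pre)
print(f"t = {t_stat:.2f}, P = {p_value:.4f}")
```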
Results
The study revealed significant improvements in all 7 categories of cultural nursing competence (P<0.01). Specifically, the mean differences in scores (pre–post) ranged from 0.74 to 1.09 across the categories. Additionally, participants expressed high satisfaction with the course, with an average score of 4.72 out of a maximum of 5.0.
Conclusion
The transcultural nursing course effectively enhanced the cultural competency of graduate nursing students. Such courses are imperative to ensure quality care for the increasing multicultural population in Korea.
Brief report
ChatGPT (GPT-3.5) as an assistant tool in microbial pathogenesis studies in Sweden: a cross-sectional comparative study  
Catharina Hultgren, Annica Lindkvist, Volkan Özenci, Sophie Curbo
J Educ Eval Health Prof. 2023;20:32.   Published online November 22, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.32
  • 824 View
  • 94 Download
  • 1 Web of Science
  • 2 Crossref
ChatGPT (GPT-3.5) has entered higher education, and there is a need to determine how to use it effectively. This descriptive study compared the ability of GPT-3.5 and teachers to answer questions from dental students and to construct detailed intended learning outcomes. When the responses were rated on a Likert scale, GPT-3.5 answered the questions from dental students in a similar or even more elaborate way than the answers previously provided by a teacher. GPT-3.5 was also asked to construct detailed intended learning outcomes for a course in microbial pathogenesis; when these were rated on a Likert scale, they were largely found to be irrelevant. Since students are using GPT-3.5, it is important that instructors learn how to make the best use of it, both to advise students and to benefit from its potential.

Citations to this article as recorded by  
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study
    Hyunju Lee, Soobin Park
    Journal of Educational Evaluation for Health Professions.2023; 20: 39.     CrossRef
Technical report
Item difficulty index, discrimination index, and reliability of the 26 health professions licensing examinations in 2022, Korea: a psychometric study
Yoon Hee Kim, Bo Hyun Kim, Joonki Kim, Bokyoung Jung, Sangyoung Bae
J Educ Eval Health Prof. 2023;20:31.   Published online November 22, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.31
  • 645 View
  • 63 Download
Purpose
This study presents item analysis results of the 26 health personnel licensing examinations managed by the Korea Health Personnel Licensing Examination Institute (KHPLEI) in 2022.
Methods
The item difficulty index, item discrimination index, and reliability were calculated. The item discrimination index was calculated using a discrimination index based on the upper and lower 27% rule and the item-total correlation.
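As a sketch of how these indices are conventionally computed from a 0/1 response matrix, assuming the standard formulas for the difficulty index, the upper/lower 27% discrimination index, the corrected item-total correlation, and Cronbach’s α (the responses here are simulated, not KHPLEI data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated 0/1 response matrix: rows = examinees, columns = items
# (with random data the outputs are not meaningful; substitute real responses)
X = (rng.random((500, 40)) < 0.75).astype(float)
total = X.sum(axis=1)

# Difficulty index: proportion of examinees answering each item correctly
difficulty = X.mean(axis=0)

# Discrimination index (upper/lower 27% rule): proportion correct among the
# top 27% of total scorers minus that among the bottom 27%
n27 = int(round(0.27 * X.shape[0]))
order = np.argsort(total)
discrimination = X[order[-n27:]].mean(axis=0) - X[order[:n27]].mean(axis=0)

# Corrected item-total correlation: each item vs. the total excluding it
item_total = np.array([np.corrcoef(X[:, j], total - X[:, j])[0, 1]
                       for j in range(X.shape[1])])

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total)
k = X.shape[1]
alpha = k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum() / total.var(ddof=1))
print(difficulty.mean(), discrimination.mean(), item_total.mean(), alpha)
```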
Results
Out of 468,352 total examinees, 418,887 (89.4%) passed. The pass rates ranged from 27.3% for health educators level 1 to 97.1% for oriental medical doctors. Most examinations had a high average difficulty index, albeit to varying degrees, ranging from 61.3% for prosthetists and orthotists to 83.9% for care workers. The average discrimination index based on the upper and lower 27% rule ranged from 0.17 for oriental medical doctors to 0.38 for radiological technologists. The average item-total correlation ranged from 0.20 for oriental medical doctors to 0.38 for radiological technologists. The Cronbach α, as a measure of reliability, ranged from 0.872 for health educators level 3 to 0.978 for medical technologists. The correlation coefficient between the average difficulty index and average discrimination index was -0.2452 (P=0.1557), that between the average difficulty index and the average item-total correlation was 0.3502 (P=0.0392), and that between the average discrimination index and the average item-total correlation was 0.7944 (P<0.0001).
Conclusion
This technical report presents the item analysis results and reliability of the recent examinations by the KHPLEI, demonstrating an acceptable range of difficulty index and discrimination index values, as well as good reliability.
Research articles
Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study  
Betzy Clariza Torres-Zegarra, Wagner Rios-Garcia, Alvaro Micael Ñaña-Cordova, Karen Fatima Arteaga-Cisneros, Xiomara Cristina Benavente Chalco, Marina Atena Bustamante Ordoñez, Carlos Jesus Gutierrez Rios, Carlos Alberto Ramos Godoy, Kristell Luisa Teresa Panta Quezada, Jesus Daniel Gutierrez-Arratia, Javier Alejandro Flores-Cohaila
J Educ Eval Health Prof. 2023;20:30.   Published online November 20, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.30
  • 1,223 View
  • 159 Download
  • 4 Web of Science
  • 4 Crossref
Purpose
We aimed to describe the performance and evaluate the educational value of justifications provided by artificial intelligence chatbots, including GPT-3.5, GPT-4, Bard, Claude, and Bing, on the Peruvian National Medical Licensing Examination (P-NLME).
Methods
This was a cross-sectional analytical study. On July 25, 2023, each multiple-choice question (MCQ) from the P-NLME was entered into each chatbot (GPT-3.5, GPT-4, Bing, Bard, and Claude) 3 times. Then, 4 medical educators categorized the MCQs in terms of medical area, item type, and whether the MCQ required Peru-specific knowledge. They assessed the educational value of the justifications from the 2 top performers (GPT-4 and Bing).
Results
GPT-4 scored 86.7% and Bing scored 82.2%, followed by Bard and Claude, and the historical performance of Peruvian examinees was 55%. Among the factors associated with correct answers, only MCQs that required Peru-specific knowledge had lower odds (odds ratio, 0.23; 95% confidence interval, 0.09–0.61), whereas the remaining factors showed no associations. In assessing the educational value of justifications provided by GPT-4 and Bing, neither showed any significant differences in certainty, usefulness, or potential use in the classroom.
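To illustrate the odds-ratio result above, the following sketch computes an odds ratio with a Woolf 95% confidence interval from a hypothetical 2×2 table; the counts are invented so that the estimate lands near the reported 0.23, and the study’s actual modeling may have differed.

```python
import math

# Invented 2x2 table of answers by whether the MCQ required Peru-specific
# knowledge; counts are chosen only to land near the reported OR of 0.23
a, b = 12, 10    # Peru-specific: correct, incorrect
c, d = 145, 28   # not Peru-specific: correct, incorrect

# Odds ratio with a Woolf (log-scale) 95% confidence interval
odds_ratio = (a * d) / (b * c)
se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```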
Conclusion
Among the chatbots, GPT-4 and Bing were the top performers, with Bing performing better on Peru-specific MCQs. Moreover, the educational value of the justifications provided by GPT-4 and Bing could be deemed appropriate. However, it is essential to start addressing the educational value of these chatbots, rather than merely their performance on examinations.

Citations to this article as recorded by  
  • Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study
    Masao Noda, Takayoshi Ueno, Ryota Koshu, Yuji Takaso, Mari Dias Shimada, Chizu Saito, Hisashi Sugimoto, Hiroaki Fushiki, Makoto Ito, Akihiro Nomura, Tomokazu Yoshizaki
    JMIR Medical Education.2024; 10: e57054.     CrossRef
  • Response to Letter to the Editor re: “Artificial Intelligence Versus Expert Plastic Surgeon: Comparative Study Shows ChatGPT ‘Wins' Rhinoplasty Consultations: Should We Be Worried? [1]” by Durairaj et al.
    Kay Durairaj, Omer Baker
    Facial Plastic Surgery & Aesthetic Medicine.2024;[Epub]     CrossRef
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study
    Hyunju Lee, Soobin Park
    Journal of Educational Evaluation for Health Professions.2023; 20: 39.     CrossRef
Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study  
Aleksandra Ignjatović, Lazar Stevanović
J Educ Eval Health Prof. 2023;20:28.   Published online October 16, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.28
  • 1,768 View
  • 167 Download
  • 1 Web of Science
  • 1 Crossref
Purpose
This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool in solving biostatistical problems and to identify any potential drawbacks that might arise from using ChatGPT in medical education, particularly in solving practical biostatistical problems.
Methods
ChatGPT was tested to evaluate its ability to solve biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock in this descriptive study. Tables from the problems were transformed into textual questions. Ten biostatistical problems were randomly chosen and used as text-based input for conversation with ChatGPT (versions 3.5 and 4).
Results
GPT-3.5 solved 5 practical problems on the first attempt, related to categorical data, cross-sectional studies, measuring reliability, probability properties, and the t-test, but failed to provide correct answers regarding analysis of variance, the chi-square test, and sample size within 3 attempts. GPT-4 additionally solved a task related to the confidence interval on the first attempt and, with precise guidance and monitoring, solved all questions within 3 attempts.
Conclusion
The assessment of both versions of ChatGPT on 10 biostatistical problems revealed below-average performance, with correct response rates of 5 and 6 out of 10 on the first attempt for GPT-3.5 and GPT-4, respectively, although GPT-4 provided all correct answers within 3 attempts. These findings indicate that this tool, even when running and reporting different statistical analyses, can be wrong; students should be aware of ChatGPT’s limitations and be careful when incorporating this model into medical education.

Citations

Citations to this article as recorded by  
  • Can Generative AI and ChatGPT Outperform Humans on Cognitive-Demanding Problem-Solving Tasks in Science?
    Xiaoming Zhai, Matthew Nyaaba, Wenchao Ma
    Science & Education.2024;[Epub]     CrossRef
Effect of an interprofessional simulation program on patient safety competencies of healthcare professionals in Switzerland: a before and after study  
Sylvain Boloré, Thomas Fassier, Nicolas Guirimand
J Educ Eval Health Prof. 2023;20:25.   Published online August 28, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.25
  • 1,343 View
  • 141 Download
Purpose
This study aimed to identify the effects of a 12-week interprofessional simulation program, operated between February 2020 and January 2021, on the patient safety competencies of healthcare professionals in Switzerland.
Methods
The simulation training was based on 2 scenarios of hospitalized patients with septic shock and respiratory failure, and trainees were expected to demonstrate patient safety competencies. A single-group before-and-after study of the simulation program was conducted using the Health Professional Education in Patient Safety Survey to measure the perceived competencies of physicians, nurses, and nursing assistants. Out of 57 participants, 37 answered the questionnaire 4 times: 48 hours before the training, followed by post-surveys at 24 hours, 6 weeks, and 12 weeks after the training. A linear mixed effects model was applied for the analysis.
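A minimal sketch of a linear mixed effects model of the kind described above, fit with statsmodels on invented repeated-measures data (participant IDs, time points, and scores are illustrative only):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented repeated-measures data: 4 survey waves per participant
df = pd.DataFrame({
    "participant": [1]*4 + [2]*4 + [3]*4 + [4]*4,
    "time": ["pre", "24h", "6wk", "12wk"] * 4,
    "score": [3.2, 3.8, 3.9, 3.3, 3.0, 3.6, 3.7, 3.1,
              3.4, 3.9, 4.0, 3.5, 3.1, 3.7, 3.8, 3.2],
})

# Linear mixed model: fixed effect of survey wave ('pre' as reference),
# random intercept per participant to handle repeated measurements
model = smf.mixedlm("score ~ C(time, Treatment('pre'))", data=df,
                    groups=df["participant"])
print(model.fit().summary())
```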
Results
Four of the 6 perceived patient safety competencies improved at 6 weeks but returned to near pre-training levels at 12 weeks. The competencies of “communicating effectively,” “managing safety risks,” “understanding human and environmental factors that influence patient safety,” and “recognize and respond to remove immediate risks of harm” showed statistically significant improvement, both overall and in the comparison between before the training and 6 weeks after it.
Conclusion
Interprofessional simulation programs contributed to developing some areas of patient safety competencies of healthcare professionals, but only for a limited time. Interprofessional simulation programs should be repeated and combined with other forms of support, including case discussions and debriefings, to ensure lasting effects.
Implementation strategy for introducing a clinical skills examination to the Korean Oriental Medicine Licensing Examination: a mixed-method modified Delphi study  
Chan-Young Kwon, Sanghoon Lee, Min Hwangbo, Chungsik Cho, Sangwoo Shin, Dong-Hyeon Kim, Aram Jeong, Hye-Yoon Lee
J Educ Eval Health Prof. 2023;20:23.   Published online July 17, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.23
  • 1,525 View
  • 134 Download
Purpose
This study investigated the validity of introducing a clinical skills examination (CSE) to the Korean Oriental Medicine Licensing Examination through a mixed-method modified Delphi study.
Methods
A 3-round Delphi study was conducted between September and November 2022. The expert panel comprised 21 oriental medicine education experts who were officially recommended by relevant institutions and organizations. The questionnaires included potential content for the CSE and a detailed implementation strategy. Subcommittees were formed to discuss concerns around the introduction of the CSE, which were collected as open-ended questions. In this study, a 66.7% or greater agreement rate was defined as achieving a consensus.
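The consensus rule is straightforward to operationalize; the following minimal sketch applies the 66.7% threshold to invented panel votes (item names and counts are hypothetical):

```python
# Minimal sketch of the consensus rule described above: an item reaches
# consensus when at least 66.7% of the 21 panelists agree
CONSENSUS = 2 / 3

# Invented agree (1) / disagree (0) votes for 2 strategy items
votes_by_item = {
    "include history taking in the CSE": [1] * 18 + [0] * 3,
    "timing of CSE introduction":        [1] * 12 + [0] * 9,
}

for item, votes in votes_by_item.items():
    rate = sum(votes) / len(votes)
    verdict = "consensus" if rate >= CONSENSUS else "no consensus"
    print(f"{item}: {rate:.1%} agreement -> {verdict}")
```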
Results
The expert panel’s evaluation of the proposed clinical presentations and basic clinical skills indicated their relative priorities. Of the 10 items investigated for building a detailed implementation strategy for introducing the CSE to the Korean Oriental Medicine Licensing Examination, a consensus was achieved on 9. However, the agreement rate on the timing of the introduction of the CSE was low. Concerns around 4 clinical topics were discussed in the subcommittees, and potential solutions were proposed.
Conclusion
This study offers preliminary data and raises some concerns that can be used as a reference while discussing the introduction of the CSE to the Korean Oriental Medicine Licensing Examination.
Suggestion for item allocation to 8 nursing activity categories of the Korean Nursing Licensing Examination: a survey-based descriptive study  
Kyunghee Kim, So Young Kang, Younhee Kang, Youngran Kweon, Hyunjung Kim, Youngshin Song, Juyeon Cho, Mi-Young Choi, Hyun Su Lee
J Educ Eval Health Prof. 2023;20:18.   Published online June 12, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.18
  • 1,241 View
  • 103 Download
Purpose
This study aims to suggest the number of test items in each of 8 nursing activity categories of the Korean Nursing Licensing Examination, which comprises 134 activity statements encompassing 275 items. The examination should be able to evaluate the minimum ability that nursing graduates must have to perform their duties.
Methods
Two opinion surveys involving the members of 7 academic societies were conducted from March 19 to May 14, 2021. The survey results were reviewed by members of 4 expert associations from May 21 to June 4, 2021. The revised numbers of items in each category were compared with those reported by Tak and colleagues and with the National Council Licensure Examination for Registered Nurses of the United States.
Results
Based on the 2 opinion surveys and previous studies, the suggested item allocation across the 8 nursing activity categories of the Korean Nursing Licensing Examination is as follows: 50 items for management of care and improvement of professionalism, 33 items for safety and infection control, 40 items for management of potential risk, 28 items for basic care, 47 items for physiological integrity and maintenance, 33 items for pharmacological and parenteral therapies, 24 items for psychosocial integrity and maintenance, and 20 items for health promotion and maintenance. Twenty other items related to health and medical laws were not included due to their mandatory status.
Conclusion
These suggestions for the number of test items in each activity category will be helpful in developing new items for the Korean Nursing Licensing Examination.
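As a quick arithmetic check, the 8 suggested category counts sum exactly to the examination’s 275 items; the 20 health and medical law items are handled separately in the abstract above.

```python
# Check that the 8 suggested category counts sum to the examination's
# 275 items (the 20 health/medical law items are treated separately)
allocation = {
    "management of care and improvement of professionalism": 50,
    "safety and infection control": 33,
    "management of potential risk": 40,
    "basic care": 28,
    "physiological integrity and maintenance": 47,
    "pharmacological and parenteral therapies": 33,
    "psychosocial integrity and maintenance": 24,
    "health promotion and maintenance": 20,
}
total = sum(allocation.values())
print(total)          # 275
assert total == 275
```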
Brief report
Comparing ChatGPT’s ability to rate the degree of stereotypes and the consistency of stereotype attribution with those of medical students in New Zealand in developing a similarity rating test: a methodological study  
Chao-Cheng Lin, Zaine Akuhata-Huntington, Che-Wei Hsu
J Educ Eval Health Prof. 2023;20:17.   Published online June 12, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.17
  • 1,722 View
  • 128 Download
  • 1 Web of Science
  • 1 Crossref
Learning about one’s implicit bias is crucial for improving one’s cultural competency and thereby reducing health inequity. To evaluate bias among medical students following a previously developed cultural training program targeting New Zealand Māori, we developed a text-based, self-evaluation tool called the Similarity Rating Test (SRT). The development process of the SRT was resource-intensive, limiting its generalizability and applicability. Here, we explored the potential of ChatGPT, an automated chatbot, to assist in the development process of the SRT by comparing ChatGPT’s and students’ evaluations of the SRT. Although tests of both equivalence and difference between ChatGPT’s and students’ ratings were non-significant, ChatGPT’s ratings were more consistent than students’ ratings. The consistency rate was higher for non-stereotypical than for stereotypical statements, regardless of rater type. Further studies are warranted to validate ChatGPT’s potential for assisting in SRT development for implementation in medical education and the evaluation of ethnic stereotypes and related topics.

Citations to this article as recorded by  
  • Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study
    Aleksandra Ignjatović, Lazar Stevanović
    Journal of Educational Evaluation for Health Professions.2023; 20: 28.     CrossRef
Research articles
Relationships between undergraduate medical students’ attitudes toward communication skills learning and demographics in Zambia: a survey-based descriptive study  
Mercy Ijeoma Okwudili Ezeala, John Volk
J Educ Eval Health Prof. 2023;20:16.   Published online June 1, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.16
  • 1,230 View
  • 84 Download
  • 1 Web of Science
  • 1 Crossref
Purpose
This study aimed to detect relationships between undergraduate students’ attitudes toward communication skills learning and demographic variables (such as age, academic year, and gender). Understanding these relationships could provide information for communication skills facilitators and curriculum planners on structuring course delivery and integrating communication skills training into the medical curriculum.
Methods
This descriptive study surveyed 369 undergraduate students from 2 medical schools in Zambia who had participated in communication skills training, stratified by academic year, using the Communication Skills Attitude Scale. Data were collected between October and December 2021 and analyzed using IBM SPSS for Windows version 28.0.
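As a sketch of the one-way ANOVA and pairwise comparisons reported in the Results below, using SciPy on invented attitude scores grouped by academic year (the study analyzed real survey data in SPSS; this mirrors only the tests):

```python
from scipy import stats

# Invented attitude scale scores grouped by academic year
year2 = [48, 52, 50, 47, 53, 49]
year3 = [51, 54, 52, 50, 55, 53]
year5 = [56, 58, 57, 55, 59, 57]

# One-way ANOVA across academic years, then a pairwise follow-up t-test
f_stat, p_anova = stats.f_oneway(year2, year3, year5)
t_stat, p_pair = stats.ttest_ind(year2, year5)
print(f"ANOVA: F = {f_stat:.2f}, P = {p_anova:.4f}")
print(f"Year 2 vs. year 5: t = {t_stat:.2f}, P = {p_pair:.4f}")
```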
Results
One-way analysis of variance revealed a significant difference in attitudes across the academic years. There was a significant difference in attitudes between the 2nd and 5th academic years (t=5.95, P<0.001). No significant differences in attitudes existed among the academic years on the negative subscale, whereas the 2nd academic year differed significantly from the 3rd (t=3.82, P=0.004), 4th (t=3.61, P=0.011), 5th (t=8.36, P<0.001), and 6th (t=4.20, P=0.001) academic years on the positive subscale. Age showed no correlation with attitudes. Women participants held more favorable attitudes toward learning communication skills than men participants (P=0.006).
Conclusion
Despite generally positive attitudes toward learning communication skills, the differences in attitude between genders and between the 2nd and 5th academic years and subsequent classes suggest that the curriculum and teaching methods should be re-evaluated to structure courses appropriately for each academic year and to create a learning process that addresses gender differences.

Citations to this article as recorded by  
  • Attitudes toward learning communication skills among Iranian medical students
    Naser Yousefzadeh Kandevani, Ali Labaf, Azim Mirzazadeh, Pegah Salimi Pormehr
    BMC Medical Education.2024;[Epub]     CrossRef
Students’ performance of and perspective on an objective structured practical examination for the assessment of preclinical and practical skills in biomedical laboratory science students in Sweden: a 5-year longitudinal study  
Catharina Hultgren, Annica Lindkvist, Sophie Curbo, Maura Heverin
J Educ Eval Health Prof. 2023;20:13.   Published online April 6, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.13
  • 1,272 View
  • 113 Download
Purpose
This study aimed to assess students’ performance on, and perspectives regarding, an objective structured practical examination (OSPE) for the assessment of laboratory and preclinical skills in biomedical laboratory science (BLS). It also aimed to investigate the perception, acceptability, and usefulness of the OSPE from the students’ and examiners’ points of view.
Methods
This was a longitudinal study to implement an OSPE in BLS. The student group consisted of 198 BLS students enrolled in semester 4, 2015–2019 at Karolinska University Hospital Huddinge, Sweden. Fourteen teachers evaluated the performance by completing a checklist and global rating scales. A student survey questionnaire was administered to the participants to evaluate the student perspective. To assess quality, 4 independent observers were included to monitor the examiners.
Results
Almost 50% of the students passed the initial OSPE, and 73% passed the repeat OSPE. There was a statistically significant difference between the first and second attempts (P<0.01) but not between the first and third attempts (P=0.09). The student survey questionnaire was completed by 99 of the 198 students (50%), and only 63 students (32%) responded to the free-text questions. According to these responses, some stations were perceived as more difficult, although students considered the assessment to be valid. The observers found that the assessment protocols and examiners’ instructions ensured the objectivity of the examination.
Conclusion
The introduction of an OSPE into the education of biomedical laboratory scientists provided a reliable and useful examination of practical skills.
