ChatGPT (GPT-3.5) has entered higher education and there is a need to determine how to use it effectively. This descriptive study compared the ability of GPT-3.5 and teachers to answer questions from dental students and construct detailed intended learning outcomes. When analyzed according to a Likert scale, we found that GPT-3.5 answered the questions from dental students in a similar or even more elaborate way compared to the answers that had previously been provided by a teacher. GPT-3.5 was also asked to construct detailed intended learning outcomes for a course in microbial pathogenesis, and when these were analyzed according to a Likert scale they were, to a large degree, found irrelevant. Since students are using GPT-3.5, it is important that instructors learn how to make the best use of it both to be able to advise students and to benefit from its potential.
Purpose This study presents item analysis results of the 26 health personnel licensing examinations managed by the Korea Health Personnel Licensing Examination Institute (KHPLEI) in 2022.
Methods The item difficulty index, item discrimination index, and reliability were calculated. The item discrimination index was calculated using a discrimination index based on the upper and lower 27% rule and the item-total correlation.
Results Out of 468,352 total examinees, 418,887 (89.4%) passed. The pass rates ranged from 27.3% for health educators level 1 to 97.1% for oriental medical doctors. Most examinations had a high average difficulty index, albeit to varying degrees, ranging from 61.3% for prosthetists and orthotists to 83.9% for care workers. The average discrimination index based on the upper and lower 27% rule ranged from 0.17 for oriental medical doctors to 0.38 for radiological technologists. The average item-total correlation ranged from 0.20 for oriental medical doctors to 0.38 for radiological technologists. The Cronbach α, as a measure of reliability, ranged from 0.872 for health educators-level 3 to 0.978 for medical technologists. The correlation coefficient between the average difficulty index and average discrimination index was -0.2452 (P=0.1557), that between the average difficulty index and the average item-total correlation was 0.3502 (P=0.0392), and that between the average discrimination index and the average item-total correlation was 0.7944 (P<0.0001).
Conclusion This technical report presents the item analysis results and reliability of the recent examinations by the KHPLEI, demonstrating an acceptable range of difficulty index and discrimination index values, as well as good reliability.
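The indices named in the Methods above can be illustrated with a minimal sketch. It assumes dichotomous (0/1) item scores, an uncorrected item-total correlation (the item is left in the total), and a toy response matrix invented purely for illustration; none of the function names or data come from the KHPLEI report.

```python
from math import sqrt
from statistics import mean, pvariance


def difficulty_index(item):
    """Proportion of examinees who answered the item correctly (0/1 scores)."""
    return mean(item)


def discrimination_index(item, totals, fraction=0.27):
    """Upper-lower index: proportion correct in the top group minus the bottom group."""
    n = len(totals)
    k = max(1, round(n * fraction))  # group size under the 27% rule
    # Examinee indices ordered from lowest to highest total score;
    # ties at the group boundary are broken by input order (an assumption).
    order = sorted(range(n), key=lambda i: totals[i])
    lower, upper = order[:k], order[-k:]
    return mean([item[i] for i in upper]) - mean([item[i] for i in lower])


def item_total_correlation(item, totals):
    """Pearson correlation between the item score and the uncorrected total score."""
    mx, my = mean(item), mean(totals)
    cov = sum((x - mx) * (y - my) for x, y in zip(item, totals))
    sx = sqrt(sum((x - mx) ** 2 for x in item))
    sy = sqrt(sum((y - my) ** 2 for y in totals))
    return cov / (sx * sy)


def cronbach_alpha(responses):
    """Cronbach's alpha from an examinee-by-item score matrix."""
    k = len(responses[0])
    item_vars = [pvariance(col) for col in zip(*responses)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 6 examinees (rows) by 4 items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in responses]
item0 = [row[0] for row in responses]
item1 = [row[1] for row in responses]

if __name__ == "__main__":
    print(difficulty_index(item1))
    print(discrimination_index(item1, totals))
    print(item_total_correlation(item1, totals))
    print(cronbach_alpha(responses))
```

With real examination data the upper and lower groups would each contain 27% of a large examinee pool, so the boundary tie-breaking used here matters much less than it does in this toy matrix.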
Purpose We aimed to describe the performance and evaluate the educational value of justifications provided by artificial intelligence chatbots, including GPT-3.5, GPT-4, Bard, Claude, and Bing, on the Peruvian National Medical Licensing Examination (P-NLME).
Methods This was a cross-sectional analytical study. On July 25, 2023, each multiple-choice question (MCQ) from the P-NLME was entered into each chatbot (GPT-3.5, GPT-4, Bing, Bard, and Claude) 3 times. Then, 4 medical educators categorized the MCQs in terms of medical area, item type, and whether the MCQ required Peru-specific knowledge. They assessed the educational value of the justifications from the 2 top performers (GPT-4 and Bing).
Results GPT-4 scored 86.7% and Bing scored 82.2%, followed by Bard and Claude, and the historical performance of Peruvian examinees was 55%. Among the factors associated with correct answers, only MCQs that required Peru-specific knowledge had lower odds (odds ratio, 0.23; 95% confidence interval, 0.09–0.61), whereas the remaining factors showed no associations. In assessing the educational value of justifications provided by GPT-4 and Bing, neither showed any significant differences in certainty, usefulness, or potential use in the classroom.
Conclusion Among chatbots, GPT-4 and Bing were the top performers, with Bing performing better at Peru-specific MCQs. Moreover, the educational value of justifications provided by GPT-4 and Bing could be deemed appropriate. However, it is essential to start addressing the educational value of these chatbots, rather than merely their performance on examinations.
Purpose This study aimed to analyze patterns of using ChatGPT before and after group activities and to explore medical students’ perceptions of ChatGPT as a feedback tool in the classroom.
Methods The study included 99 2nd-year pre-medical students who participated in a “Leadership and Communication” course from March to June 2023. Students engaged in both individual and group activities related to negotiation strategies. ChatGPT was used to provide feedback on their solutions. A survey was administered to assess students’ perceptions of ChatGPT’s feedback, its use in the classroom, and the strengths and challenges of ChatGPT from May 17 to 19, 2023.
Results The students indicated that ChatGPT’s feedback was helpful, and they revised and resubmitted their group answers in various ways after receiving feedback. The majority of respondents agreed with the use of ChatGPT during class. The most common response concerning the appropriate context of using ChatGPT’s feedback was “after the first round of discussion, for revisions.” There was a significant difference in satisfaction with ChatGPT’s feedback, including correctness, usefulness, and ethics, depending on whether or not ChatGPT was used during class, but there was no significant difference according to gender or whether students had previous experience with ChatGPT. The strongest advantages were “providing answers to questions” and “summarizing information,” and the worst disadvantage was “producing information without supporting evidence.”
Conclusion The students were aware of the advantages and disadvantages of ChatGPT, and they had a positive attitude toward using ChatGPT in the classroom.
Purpose This study investigated the prevalence of burnout in physical therapists in the United States and the relationships between burnout and education, mentorship, and self-efficacy.
Methods This was a cross-sectional survey study. An electronic survey was distributed to practicing physical therapists across the United States over a 6-week period from December 2020 to January 2021. The survey was completed by 2,813 physical therapists from all states. The majority were female (68.72%), White or Caucasian (80.13%), and employed full-time (77.14%). Respondents completed questions on demographics, education, mentorship, self-efficacy, and burnout. The Burnout Clinical Subtypes Questionnaire 12 (BCSQ-12) and self-reports were used to quantify burnout, and the General Self-Efficacy Scale (GSES) was used to measure self-efficacy. Descriptive and inferential analyses were performed.
Results Respondents from home health (median BCSQ-12=42.00) and skilled nursing facility settings (median BCSQ-12=42.00) displayed the highest burnout scores. Burnout was significantly lower among those who provided formal mentorship (median BCSQ-12=39.00, P=0.0001) compared to no mentorship (median BCSQ-12=41.00). Respondents who received formal mentorship (median BCSQ-12=38.00, P=0.0028) displayed significantly lower burnout than those who received no mentorship (median BCSQ-12=41.00). A moderate negative correlation (rho=-0.49) was observed between the GSES and burnout scores. A strong positive correlation was found between self-reported burnout status and burnout scores (rrb=0.61).
Conclusion Burnout is prevalent in the physical therapy profession, as almost half of respondents (49.34%) reported burnout. Providing or receiving mentorship and higher self-efficacy were associated with lower burnout. Organizations should consider measuring burnout levels, investing in mentorship programs, and implementing strategies to improve self-efficacy.
Purpose This study aimed to devise a valid measurement for assessing clinical students’ perceptions of teaching practices.
Methods A new tool was developed based on a meta-analysis encompassing effective clinical teaching-learning factors. Seventy-nine items were generated using a frequency (never to always) scale. The tool was applied to the University of New South Wales year 2, 3, and 6 medical students. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were conducted to establish the tool’s construct validity and goodness of fit, and Cronbach’s α was used for reliability.
Results In total, 352 students (44.2%) completed the questionnaire. The EFA identified student-centered learning, problem-solving learning, self-directed learning, and visual technology (reliability, 0.77 to 0.89). CFA showed acceptable goodness of fit (chi-square P<0.01, comparative fit index=0.930 and Tucker-Lewis index=0.917, root mean square error of approximation=0.069, standardized root mean square residual=0.06).
Conclusion The established tool—Student Ratings in Clinical Teaching (STRICT)—is a valid and reliable tool that demonstrates how students perceive clinical teaching efficacy. STRICT measures the frequency of teaching practices to mitigate the biases of acquiescence and social desirability. Clinical teachers may use the tool to adapt their teaching practices with more active learning activities and to utilize visual technology to facilitate clinical learning efficacy. Clinical educators may apply STRICT to assess how these teaching practices are implemented in current clinical settings.
Purpose This study aimed to identify the effects of a 12-week interprofessional simulation program, operated between February 2020 and January 2021, on the patient safety competencies of healthcare professionals in Switzerland.
Methods The simulation training was based on 2 scenarios of hospitalized patients with septic shock and respiratory failure, and trainees were expected to demonstrate patient safety competencies. A single-group before-and-after study of the intervention (the simulation program) was conducted, using a measurement tool (the Health Professional Education in Patient Safety Survey) to measure the perceived competencies of physicians, nurses, and nursing assistants. Out of 57 participants, 37 completed the questionnaire 4 times: 48 hours before the training and at 24 hours, 6 weeks, and 12 weeks after the training. A linear mixed-effects model was applied for the analysis.
Results Four of the 6 perceived patient safety competencies improved at 6 weeks but returned to a level similar to that before training at 12 weeks. The competencies of “communicating effectively,” “managing safety risks,” “understanding human and environmental factors that influence patient safety,” and “recognize and respond to remove immediate risks of harm” showed statistically significant changes both overall and in the comparison between before the training and 6 weeks after the training.
Conclusion Interprofessional simulation programs contributed to developing some areas of patient safety competencies of healthcare professionals, but only for a limited time. Interprofessional simulation programs should be repeated and combined with other forms of support, including case discussions and debriefings, to ensure lasting effects.
Purpose This study investigated the validity of introducing a clinical skills examination (CSE) to the Korean Oriental Medicine Licensing Examination through a mixed-method modified Delphi study.
Methods A 3-round Delphi study was conducted between September and November 2022. The expert panel comprised 21 oriental medicine education experts who were officially recommended by relevant institutions and organizations. The questionnaires included potential content for the CSE and a detailed implementation strategy. Subcommittees were formed to discuss concerns around the introduction of the CSE, which were collected as open-ended questions. In this study, a 66.7% or greater agreement rate was defined as achieving a consensus.
Results The expert panel’s evaluation of the proposed clinical presentations and basic clinical skills suggested their priorities. Of the 10 items investigated for building a detailed implementation strategy for the introduction of the CSE to the Korean Oriental Medicine Licensing Examination, a consensus was achieved on 9. However, the agreement rate on the timing of the introduction of the CSE was low. Concerns around 4 clinical topics were discussed in the subcommittees, and potential solutions were proposed.
Conclusion This study offers preliminary data and raises some concerns that can be used as a reference while discussing the introduction of the CSE to the Korean Oriental Medicine Licensing Examination.
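The consensus rule stated in the Methods (an agreement rate of 66.7% or greater) can be sketched as follows. This sketch assumes that 66.7% is the rounded form of two-thirds, so that 14 of 21 panelists (66.67%) counts as a consensus; the abstract does not state this explicitly, and the function names are hypothetical.

```python
from fractions import Fraction

# Assumption: the reported 66.7% cutoff is two-thirds, rounded to one decimal.
CONSENSUS_THRESHOLD = Fraction(2, 3)


def agreement_rate(agree, panel_size):
    """Exact share of panelists who agreed with an item."""
    if panel_size <= 0:
        raise ValueError("panel_size must be positive")
    return Fraction(agree, panel_size)


def has_consensus(agree, panel_size, threshold=CONSENSUS_THRESHOLD):
    """True when the agreement rate meets or exceeds the threshold."""
    return agreement_rate(agree, panel_size) >= threshold


if __name__ == "__main__":
    for agree in (13, 14, 15):
        rate = agreement_rate(agree, 21)
        print(agree, f"{float(rate):.1%}", has_consensus(agree, 21))
```

Exact fractions are used so that the boundary case is not decided by floating-point rounding; comparing 14/21 against a literal 0.667 would reject it.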
Purpose There is limited literature related to the assessment of electronic medical record (EMR)-related competencies. To address this gap, this study explored the feasibility of an EMR objective structured clinical examination (OSCE) station to evaluate medical students’ communication skills by psychometric analyses and standardized patients’ (SPs) perspectives on EMR use in an OSCE.
Methods An OSCE station that incorporated the use of an EMR was developed and pilot-tested in March 2020. Students’ communication skills were assessed by SPs and physician examiners. Students’ scores were compared between the EMR station and 9 other stations. A psychometric analysis, including item total correlation, was done. SPs participated in a post-OSCE focus group to discuss their perception of EMRs’ effect on communication.
Results Ninety-nine 3rd-year medical students participated in a 10-station OSCE that included the use of the EMR station. The EMR station had an acceptable item total correlation (0.217). Students who leveraged graphical displays in counseling received higher OSCE station scores from the SPs (P=0.041). The thematic analysis of SPs’ perceptions of students’ EMR use from the focus group revealed the following domains of themes: technology, communication, case design, ownership of health information, and timing of EMR usage.
Conclusion This study demonstrated the feasibility of incorporating EMR in assessing learner communication skills in an OSCE. The EMR station had acceptable psychometric characteristics. Some medical students were able to efficiently use the EMRs as an aid in patient counseling. Teaching students how to be patient-centered even in the presence of technology may promote engagement.
Purpose This study aimed to develop a test scale to measure the character qualities of medical students as a follow-up study on the 8 core character qualities revealed in a previous report.
Methods In total, 160 preliminary items were developed to measure 8 core character qualities. Twenty questions were assigned to each quality, and a questionnaire survey was conducted among 856 students in 5 medical schools in Korea. Using the partial credit model, polytomous item response theory analysis was carried out to analyze the goodness-of-fit, followed by exploratory factor analysis. Finally, confirmatory factor and reliability analyses were conducted with the final selected items.
Results The preliminary items for the 8 core character qualities were administered to the participants. Data from 767 students were included in the final analysis. Of the 160 preliminary items, 25 were removed by classical test theory analysis and 17 more by polytomous item response theory assessment. A total of 118 items and sub-factors were selected for exploratory factor analysis. Finally, 79 items were selected, and the validity and reliability were confirmed through confirmatory factor analysis and intra-item relevance analysis.
Conclusion The character qualities test scale developed through this study can be used to measure the character qualities corresponding to the educational goals and visions of individual medical schools in Korea. Furthermore, this measurement tool can serve as primary data for developing character qualities tools tailored to each medical school’s vision and educational goals.
Purpose This study investigated the effect of evaluations based on the Anesthetic List Management Assessment Tool (ALMAT) form on improving the technical and non-technical skills of final-year nurse anesthesia students at Ahvaz Jundishapur University of Medical Sciences (AJUMS).
Methods This was a semi-experimental study with a pre-test and post-test design. It included 45 final-year nurse anesthesia students of AJUMS and lasted for 3 months. The technical and non-technical skills of the intervention group were assessed at 4 university hospitals using formative-feedback evaluation based on the ALMAT form, from induction of anesthesia until reaching mastery and independence. Finally, the students’ degree of improvement in technical and non-technical skills was compared between the intervention and control groups. Statistical tests (the independent t-test, paired t-test, and Mann-Whitney test) were used to analyze the data.
Results The rate of improvement in post-test scores of technical skills was significantly higher in the intervention group than in the control group (P<0.0001). Similarly, the students in the intervention group received significantly higher post-test scores for non-technical skills than the students in the control group (P<0.0001).
Conclusion The findings of this study showed that the use of ALMAT as a formative-feedback evaluation method to evaluate technical and non-technical skills had a significant effect on improving these skills and was effective in helping students learn and reach mastery and independence.
Purpose This study aims to suggest the number of test items in each of 8 nursing activity categories of the Korean Nursing Licensing Examination, which comprises 134 activity statements including 275 items. The examination will be able to evaluate the minimum ability that nursing graduates must have to perform their duties.
Methods Two opinion surveys involving the members of 7 academic societies were conducted from March 19 to May 14, 2021. The survey results were reviewed by members of 4 expert associations from May 21 to June 4, 2021. The results for revised numbers of items in each category were compared with those reported by Tak and his colleagues and the National Council License Examination for Registered Nurses of the United States.
Results Based on 2 opinion surveys and previous studies, the suggestions for item allocation to 8 nursing activity categories of the Korean Nursing Licensing Examination in this study are as follows: 50 items for management of care and improvement of professionalism, 33 items for safety and infection control, 40 items for management of potential risk, 28 items for basic care, 47 items for physiological integrity and maintenance, 33 items for pharmacological and parenteral therapies, 24 items for psychosocial integrity and maintenance, and 20 items for health promotion and maintenance. Twenty other items related to health and medical laws were not included due to their mandatory status.
Conclusion These suggestions for the number of test items for each activity category will be helpful in developing new items for the Korean Nursing Licensing Examination.
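As a quick arithmetic check, the suggested allocation can be summed and compared with the 275 items mentioned in the Purpose. The sketch below only restates the numbers in the abstract; the dictionary and variable names are illustrative, and treating the 20 law items as additional to the 275 is an assumption based on the wording "not included."

```python
# Suggested number of items per nursing activity category, as listed in the Results.
allocation = {
    "management of care and improvement of professionalism": 50,
    "safety and infection control": 33,
    "management of potential risk": 40,
    "basic care": 28,
    "physiological integrity and maintenance": 47,
    "pharmacological and parenteral therapies": 33,
    "psychosocial integrity and maintenance": 24,
    "health promotion and maintenance": 20,
}

LAW_ITEMS = 20  # health and medical law items, handled separately in the abstract

category_total = sum(allocation.values())

# Share of each category among the allocated items, in percent (one decimal place).
shares = {name: round(100 * n / category_total, 1) for name, n in allocation.items()}

if __name__ == "__main__":
    print("items across the 8 categories:", category_total)
    print("including law items (assumed additive):", category_total + LAW_ITEMS)
    for name, share in shares.items():
        print(f"{share:5.1f}%  {name}")
```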
Learning about one’s implicit bias is crucial for improving one’s cultural competency and thereby reducing health inequity. To evaluate bias among medical students following a previously developed cultural training program targeting New Zealand Māori, we developed a text-based, self-evaluation tool called the Similarity Rating Test (SRT). The development process of the SRT was resource-intensive, limiting its generalizability and applicability. Here, we explored the potential of ChatGPT, an automated chatbot, to assist in the development process of the SRT by comparing ChatGPT’s and students’ evaluations of the SRT. Despite results showing non-significant equivalence and difference between ChatGPT’s and students’ ratings, ChatGPT’s ratings were more consistent than students’ ratings. The consistency rate was higher for non-stereotypical than for stereotypical statements, regardless of rater type. Further studies are warranted to validate ChatGPT’s potential for assisting in SRT development for implementation in medical education and evaluation of ethnic stereotypes and related topics.
Purpose This study aimed to detect relationships between undergraduate students’ attitudes toward communication skills learning and demographic variables (such as age, academic year, and gender). Understanding these relationships could provide information for communication skills facilitators and curriculum planners on structuring course delivery and integrating communication skills training into the medical curriculum.
Methods The descriptive study involved a survey of 369 undergraduate students from 2 medical schools in Zambia who participated in communication skills training stratified by academic year using the Communication Skills Attitude Scale. Data were collected between October and December 2021 and analyzed using IBM SPSS for Windows version 28.0.
Results One-way analysis of variance revealed a significant difference in attitude between at least 5 academic years. There was a significant difference in attitudes between the 2nd and 5th academic years (t=5.95, P<0.001). No significant difference in attitudes existed among the academic years on the negative subscale; the 2nd and 3rd (t=3.82, P=0.004), 4th (t=3.61, P=0.011), 5th (t=8.36, P<0.001), and 6th (t=4.20, P=0.001) academic years showed significant differences on the positive subscale. Age showed no correlation with attitudes. Women participants had a more favorable attitude toward learning communication skills than men participants (P=0.006).
Conclusion Despite positive general attitudes toward learning communication skills, the differences in attitude between the genders, between academic years 2 and 5, and among the subsequent classes suggest a need to re-evaluate the curriculum and teaching methods so that the course structure suits each academic year and the learning process addresses gender differences.
Purpose The number of Korean midwifery licensing examination applicants has steadily decreased due to the low birth rate and lack of training institutions for midwives. This study aimed to evaluate the adequacy of the examination-based licensing system and the possibility of a training-based licensing system.
Methods A survey questionnaire was developed and dispatched to 230 professionals from December 28, 2022 to January 13, 2023, through an online form using Google Surveys. Descriptive statistics were used to analyze the results.
Results Responses from 217 persons (94.3%) were analyzed after excluding incomplete responses. Out of the 217 participants, 198 (91.2%) agreed with maintaining the current examination-based licensing system; 94 (43.3%) agreed with implementing a training-based licensing system to cover the examination costs due to the decreasing number of applicants; 132 (60.8%) agreed with establishing a midwifery education evaluation center for a training-based licensing system; 163 (75.1%) said that the quality of midwifery might be lowered if midwives were produced only by a training-based licensing system, and 197 (90.8%) said that the training of midwives as birth support personnel should be promoted in Korea.
Conclusion Favorable results were reported for the examination-based licensing system; however, if a training-based licensing system is implemented, it will be necessary to establish a midwifery education evaluation center to manage the quality of midwives. As the annual number of candidates for the Korean midwifery licensing examination has been approximately 10 in recent years, it is necessary to consider more actively granting midwifery licenses through a training-based licensing system.
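The percentages reported in the Results can be reproduced from the stated counts. The following is a small sketch that assumes each percentage is the count divided by the relevant denominator (230 surveyed, 217 analyzed) and rounded to one decimal place; the labels are shorthand invented for this illustration.

```python
SURVEYED = 230   # professionals who were sent the questionnaire
ANALYZED = 217   # responses retained after excluding incomplete ones

# Counts of agreeing respondents, keyed by a short label for each survey statement.
agree_counts = {
    "maintain examination-based system": 198,
    "implement training-based system": 94,
    "establish education evaluation center": 132,
    "quality may be lowered by training-only route": 163,
    "promote training of midwives": 197,
}


def percent(count, denominator):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * count / denominator, 1)


response_rate = percent(ANALYZED, SURVEYED)
agree_percent = {label: percent(n, ANALYZED) for label, n in agree_counts.items()}

if __name__ == "__main__":
    print("analyzed responses (%):", response_rate)
    for label, value in agree_percent.items():
        print(f"{value:5.1f}%  {label}")
```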