Purpose To generate Cronbach’s α reliability estimates and further mixed-methods construct validity evidence for the Blended Learning Usability Evaluation–Questionnaire (BLUE-Q).
Methods Forty interprofessional clinicians completed the BLUE-Q after finishing a 3-month-long blended learning professional development program in Ontario, Canada. Reliability was assessed with Cronbach’s α for each of the 3 sections of the BLUE-Q and for all quantitative items together. Construct validity was evaluated through the Grand-Guillaume-Perrenoud et al. framework, which consists of 3 elements: congruence, convergence, and credibility. To compare quantitative and qualitative results, descriptive statistics, including means and standard deviations for each Likert-scale item of the BLUE-Q, were calculated.
Results Cronbach’s α was 0.95 for the pedagogical usability section, 0.85 for the synchronous modality section, 0.93 for the asynchronous modality section, and 0.96 for all quantitative items together. Mean ratings (with standard deviations) were 4.77 (0.506) for pedagogy, 4.64 (0.654) for synchronous learning, and 4.75 (0.536) for asynchronous learning. Of the 239 qualitative comments received, 178 were identified as substantive, of which 88% were considered congruent and 79% were considered convergent with the high means. Among all congruent responses, 69% were considered confirming statements and 31% were considered clarifying statements, suggesting appropriate credibility. Analysis of the clarifying statements assisted in identifying 5 categories of suggestions for program improvement.
Conclusion The BLUE-Q demonstrates high reliability and appropriate construct validity in the context of a blended learning program with interprofessional clinicians, making it a valuable tool for comprehensive program evaluation, quality improvement, and evaluative research in health professions education.
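As a point of reference for the reliability analysis described above, Cronbach’s α for a questionnaire section can be computed from a respondents-by-items matrix as in the minimal sketch below (simulated Likert data only; this is not the authors’ code or dataset).

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of scores."""
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Example: 40 respondents answering 10 five-point Likert items (simulated).
rng = np.random.default_rng(0)
likert = rng.integers(3, 6, size=(40, 10)).astype(float)
print(round(cronbach_alpha(likert), 3))
```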
Purpose This study aimed to develop and validate the 21st Century Skills Assessment Scale (21CSAS) for Thai public health (PH) undergraduate students using the Partnership for 21st Century Skills framework.
Methods A cross-sectional survey was conducted among 727 first- to fourth-year PH undergraduate students from 4 autonomous universities in Thailand. Data were collected using self-administered questionnaires between January and March 2023. Exploratory factor analysis (EFA) was used to explore the underlying dimensions of 21CSAS, while confirmatory factor analysis (CFA) was conducted to test the hypothesized factor structure using Mplus software (Muthén & Muthén). Reliability and item discrimination were assessed using Cronbach’s α and the corrected item-total correlation, respectively.
Results EFA performed on a dataset of 300 students revealed a 20-item scale with a 6-factor structure: (1) creativity and innovation; (2) critical thinking and problem-solving; (3) information, media, and technology; (4) communication and collaboration; (5) initiative and self-direction; and (6) social and cross-cultural skills. The rotated eigenvalues ranged from 1.73 to 2.12. CFA performed on another dataset of 427 students confirmed a good model fit (χ2/degrees of freedom=2.67, comparative fit index=0.93, Tucker-Lewis index=0.91, root mean square error of approximation=0.06, standardized root mean square residual=0.06), explaining 34%–71% of variance in the items. Item loadings ranged from 0.58 to 0.84. The 21CSAS had a Cronbach’s α of 0.92.
Conclusion The 21CSAS proved to be a valid and reliable tool for assessing 21st century skills among Thai PH undergraduate students. These findings provide insights for educational systems to inform policy, practice, and research regarding 21st-century skills among undergraduate students.
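For context on the item discrimination analysis mentioned in the Methods, the corrected item-total correlation (each item correlated with the total of the remaining items) can be sketched as follows; the data here are simulated and the code is illustrative, not taken from the 21CSAS study.

```python
import numpy as np
import pandas as pd

def corrected_item_total(df: pd.DataFrame) -> pd.Series:
    """Correlation of each item with the sum of all other items."""
    out = {}
    for col in df.columns:
        rest_total = df.drop(columns=col).sum(axis=1)
        out[col] = df[col].corr(rest_total)
    return pd.Series(out)

# Simulated responses: 300 students x 20 five-point items.
rng = np.random.default_rng(1)
items = pd.DataFrame(rng.integers(1, 6, size=(300, 20)),
                     columns=[f"item{i+1}" for i in range(20)])
print(corrected_item_total(items).round(2))
```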
Purpose With the coronavirus disease 2019 pandemic, online high-stakes exams have become a viable alternative. This study evaluated the feasibility of computer-based testing (CBT) for medical residency applications in Brazil and its impacts on item quality and applicants’ access compared to paper-based testing.
Methods In 2020, an online CBT was conducted at the Ribeirao Preto Clinical Hospital in Brazil. In total, 120 multiple-choice question items were constructed. Two years later, the exam was administered as a paper-based test. Item construction processes were similar for both exams. Difficulty and discrimination indexes, point-biserial coefficients, and the Cronbach’s α coefficient were measured based on classical test theory, and difficulty, discrimination, and guessing parameters were estimated based on item response theory. Internet stability for applicants was monitored.
Results In 2020, 4,846 individuals (57.1% female, mean age of 26.64±3.37 years) applied to the residency program, versus 2,196 individuals (55.2% female, mean age of 26.47±3.20 years) in 2022. For CBT, there was an increase of 2,650 applicants (120.7%), albeit with significant differences in demographic characteristics. There was a significant increase in applicants from more distant and lower-income Brazilian regions, such as the North (5.6% vs. 2.7%) and Northeast (16.9% vs. 9.0%). No significant differences were found in difficulty and discrimination indexes, point-biserial coefficients, and Cronbach’s α coefficients between the 2 exams.
Conclusion Online CBT with multiple-choice questions was a viable format for a residency application exam, improving accessibility without compromising exam integrity and quality.
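As background for the classical test theory indexes reported above, the sketch below shows one common way to compute item difficulty (proportion correct) and corrected point-biserial coefficients from a 0/1 response matrix; the data are simulated and do not reproduce either exam.

```python
import numpy as np

def item_difficulty(responses: np.ndarray) -> np.ndarray:
    """Proportion of examinees answering each item correctly (0/1 matrix)."""
    return responses.mean(axis=0)

def point_biserial(responses: np.ndarray) -> np.ndarray:
    """Correlation of each item score with the rest-of-test total."""
    n_items = responses.shape[1]
    totals = responses.sum(axis=1)
    r = np.empty(n_items)
    for j in range(n_items):
        rest = totals - responses[:, j]          # exclude the item itself
        r[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return r

# Simulated 0/1 responses: 1,000 examinees x 120 items.
rng = np.random.default_rng(2)
resp = (rng.random((1000, 120)) < 0.7).astype(float)
print(item_difficulty(resp)[:5].round(2), point_biserial(resp)[:5].round(2))
```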
Purpose Immersive simulation is an innovative training approach in health education that enhances student learning. This study examined its impact on engagement, motivation, and academic performance in nursing and midwifery students.
Methods A systematic search was conducted in 4 databases (Scopus, PubMed, Web of Science, and ScienceDirect) following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. The research protocol was pre-registered in the PROSPERO registry. The quality of the included studies was assessed using the Medical Education Research Study Quality Instrument.
Results Out of 90 identified studies, 11 were included in the present review, involving 1,090 participants. Four out of 5 studies observed high post-test engagement scores in the intervention groups. Additionally, 5 out of 6 studies that evaluated motivation found higher post-test motivational scores in the intervention groups than in control groups using traditional approaches. Furthermore, among the 8 out of 11 studies that evaluated academic performance during immersive simulation training, 5 reported significant differences (P<0.001) in favor of the students in the intervention groups.
Conclusion Immersive simulation shows significant potential to enhance student engagement, motivation, and academic performance beyond traditional teaching methods. Future research in various contexts is needed to better integrate this innovative educational approach into nursing and midwifery education curricula.
Ariel Shana Frey-Vogel, Kristina Dzara, Kimberly Anne Gifford, Yoon Soo Park, Justin Berk, Allison Heinly, Darcy Wolcott, Daniel Adam Hall, Shannon Elliott Scott-Vernaglia, Katherine Anne Sparger, Erica Ye-pyng Chung. J Educ Eval Health Prof. 2024;21:3. Published online February 23, 2024.
Purpose Despite educational mandates to assess resident teaching competence, limited instruments with validity evidence exist for this purpose. Existing instruments do not allow faculty to assess resident-led teaching in a large group format or whether teaching was interactive. This study gathers validity evidence on the use of the Resident-led Large Group Teaching Assessment Instrument (Relate), an instrument used by faculty to assess resident teaching competency. Relate comprises 23 behaviors divided into 6 elements: learning environment, goals and objectives, content of talk, promotion of understanding and retention, session management, and closure.
Methods Messick’s unified validity framework was used for this study. Investigators used video recordings of resident-led teaching from 3 pediatric residency programs to develop Relate and a rater guidebook. Faculty were trained on instrument use through frame-of-reference training. Resident teaching at all sites was video-recorded during 2018–2019. Two trained faculty raters assessed each video. Descriptive statistics on performance were obtained. Validity evidence sources include: rater training effect (response process), reliability and variability (internal structure), and impact on Milestones assessment (relations to other variables).
Results Forty-eight videos, from 16 residents, were analyzed. Rater training improved inter-rater reliability from 0.04 to 0.64. The Φ-coefficient reliability was 0.50. There was a significant correlation between overall Relate performance and the pediatric teaching Milestone (r=0.34, P=0.019).
Conclusion Relate provides validity evidence with sufficient reliability to measure resident-led large-group teaching competence.
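The abstract does not specify the exact index behind the inter-rater reliability figures, so the sketch below simply illustrates one common two-rater agreement statistic (Cohen’s kappa on categorical ratings) using simulated data; it is not the study’s analysis.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Simulated behavior-level ratings (0 = not done, 1 = partially, 2 = done)
# from 2 trained raters scoring the same 48 videos x 23 behaviors.
rng = np.random.default_rng(3)
n = 48 * 23
rater_a = rng.integers(0, 3, size=n)
rater_b = np.where(rng.random(n) < 0.8, rater_a,      # ~80% agreement
                   rng.integers(0, 3, size=n))
print(round(cohen_kappa_score(rater_a, rater_b), 2))
```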
Purpose This study presents item analysis results of the 26 health personnel licensing examinations managed by the Korea Health Personnel Licensing Examination Institute (KHPLEI) in 2022.
Methods The item difficulty index, item discrimination index, and reliability were calculated. The item discrimination index was calculated using a discrimination index based on the upper and lower 27% rule and the item-total correlation.
Results Out of 468,352 total examinees, 418,887 (89.4%) passed. The pass rates ranged from 27.3% for health educators level 1 to 97.1% for oriental medical doctors. Most examinations had a high average difficulty index, albeit to varying degrees, ranging from 61.3% for prosthetists and orthotists to 83.9% for care workers. The average discrimination index based on the upper and lower 27% rule ranged from 0.17 for oriental medical doctors to 0.38 for radiological technologists. The average item-total correlation ranged from 0.20 for oriental medical doctors to 0.38 for radiological technologists. The Cronbach α, as a measure of reliability, ranged from 0.872 for health educators level 3 to 0.978 for medical technologists. The correlation coefficient between the average difficulty index and the average discrimination index was -0.2452 (P=0.1557), that between the average difficulty index and the average item-total correlation was 0.3502 (P=0.0392), and that between the average discrimination index and the average item-total correlation was 0.7944 (P<0.0001).
Conclusion This technical report presents the item analysis results and reliability of the recent examinations by the KHPLEI, demonstrating an acceptable range of difficulty index and discrimination index values, as well as good reliability.
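The upper-and-lower-27% discrimination index described in the Methods is the difference between the proportion correct in the top 27% and the bottom 27% of examinees ranked by total score; a minimal sketch follows (simulated responses, not KHPLEI data).

```python
import numpy as np

def discrimination_27(responses: np.ndarray) -> np.ndarray:
    """Upper-lower 27% discrimination index for a 0/1 response matrix."""
    totals = responses.sum(axis=1)
    k = int(round(0.27 * len(totals)))
    order = np.argsort(totals)
    lower, upper = order[:k], order[-k:]
    return responses[upper].mean(axis=0) - responses[lower].mean(axis=0)

# Simulated 0/1 responses: 500 examinees x 50 items of varying difficulty.
rng = np.random.default_rng(4)
resp = (rng.random((500, 50)) < rng.uniform(0.5, 0.9, size=50)).astype(float)
print(discrimination_27(resp)[:5].round(2))
```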
Purpose We aimed to describe the performance and evaluate the educational value of justifications provided by artificial intelligence chatbots, including GPT-3.5, GPT-4, Bard, Claude, and Bing, on the Peruvian National Medical Licensing Examination (P-NLME).
Methods This was a cross-sectional analytical study. On July 25, 2023, each multiple-choice question (MCQ) from the P-NLME was entered into each chatbot (GPT-3.5, GPT-4, Bing, Bard, and Claude) 3 times. Then, 4 medical educators categorized the MCQs in terms of medical area, item type, and whether the MCQ required Peru-specific knowledge. They assessed the educational value of the justifications from the 2 top performers (GPT-4 and Bing).
Results GPT-4 scored 86.7% and Bing scored 82.2%, followed by Bard and Claude, and the historical performance of Peruvian examinees was 55%. Among the factors associated with correct answers, only MCQs that required Peru-specific knowledge had lower odds (odds ratio, 0.23; 95% confidence interval, 0.09–0.61), whereas the remaining factors showed no associations. In assessing the educational value of justifications provided by GPT-4 and Bing, neither showed any significant differences in certainty, usefulness, or potential use in the classroom.
Conclusion Among chatbots, GPT-4 and Bing were the top performers, with Bing performing better on Peru-specific MCQs. Moreover, the educational value of the justifications provided by GPT-4 and Bing could be deemed appropriate. However, it is essential to start addressing the educational value of these chatbots, rather than merely their performance on examinations.
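For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained from a 2×2 table (correct/incorrect by Peru-specific/non-specific), the following sketch shows the standard log-odds calculation; the counts are hypothetical and are not the study’s data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and 95% CI for a 2x2 table [[a, b], [c, d]] (exposure x outcome)."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)   # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical counts: correct vs. incorrect answers for Peru-specific
# and non-specific MCQs.
print(tuple(round(x, 2) for x in odds_ratio_ci(10, 15, 70, 25)))
```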
A virtual point-of-care ultrasound (POCUS) education program was initiated to introduce handheld ultrasound technology to Georgetown Public Hospital Corporation in Guyana, a low-resource setting. We studied ultrasound competency and participant satisfaction in a cohort of 20 physicians-in-training through the urology clinic. The program consisted of a training phase, where they learned how to use the Butterfly iQ ultrasound device, and a mentored implementation phase, where they applied their skills in the clinic. Assessment was conducted through written exams and an objective structured clinical examination (OSCE). Fourteen students completed the program. Written exam scores were 3.36/5 in the training phase and 3.57/5 in the mentored implementation phase, and all students earned 100% on the OSCE. Students expressed satisfaction with the program. Our POCUS education program demonstrates the potential to teach clinical skills in low-resource settings and the value of virtual global health partnerships in advancing POCUS and minimally invasive diagnostics.
This study aimed to compare the knowledge and interpretation ability of ChatGPT, a language model of artificial general intelligence, with those of medical students in Korea by administering a parasitology examination to both ChatGPT and medical students. The examination consisted of 79 items and was administered to ChatGPT on January 1, 2023. The examination results were analyzed in terms of ChatGPT’s overall performance score, its correct answer rate by the items’ knowledge level, and the acceptability of its explanations of the items. ChatGPT’s performance was lower than that of the medical students, and ChatGPT’s correct answer rate was not related to the items’ knowledge level. However, there was a relationship between acceptable explanations and correct answers. In conclusion, ChatGPT’s knowledge and interpretation ability for this parasitology examination were not yet comparable to those of medical students in Korea.
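The reported relationship between acceptable explanations and correct answers is the kind of association that can be checked with a chi-square test on a 2×2 table, as in the hedged sketch below (hypothetical counts for illustration, not the study’s data).

```python
from scipy.stats import chi2_contingency

# Rows: explanation acceptable / not acceptable.
# Columns: answer correct / incorrect (hypothetical counts for 79 items).
table = [[40, 6],
         [10, 23]]
chi2, p, dof, expected = chi2_contingency(table)
print(round(chi2, 2), round(p, 4))
```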
Purpose This study aimed to identify factors that have been studied for their associations with National Licensing Examination (ENAM) scores in Peru.
Methods A search was conducted of literature databases and registers, including EMBASE, SciELO, Web of Science, MEDLINE, Peru’s National Register of Research Work, and Google Scholar. The following key terms were used: “ENAM” and “associated factors.” Studies in English and Spanish were included. The quality of the included studies was evaluated using the Medical Education Research Study Quality Instrument (MERSQI).
Results In total, 38,500 participants were enrolled in 12 studies. Most (11/12) studies were cross-sectional, except for one case-control study. Three studies were published in peer-reviewed journals. The mean MERSQI score was 10.33. Better performance on the ENAM was associated with a higher grade point average (GPA) (n=8), an internship setting in EsSalud (n=4), and regular academic status (n=3). Other factors showed associations in various studies, such as medical school, internship setting, age, gender, socioeconomic status, simulation tests, study resources, preparation time, learning styles, study techniques, test anxiety, and self-regulated learning strategies.
Conclusion Performance on the ENAM is a multifactorial phenomenon; our model gives students a locus of control over what they can do to improve their score (i.e., implement self-regulated learning strategies) and gives faculty, health policymakers, and managers a framework to improve ENAM scores (e.g., design remediation programs to improve GPA and integrate anxiety-management courses into the curriculum).
Purpose This study aims to apply the yes/no Angoff and Hofstee methods to actual Korean Medical Licensing Examination (KMLE) 2022 written examination data to estimate cut scores for the written KMLE.
Methods Fourteen panelists gathered to derive the cut score of the 86th KMLE written examination data using the yes/no Angoff method. The panel reviewed the items individually before the meeting and shared their respective understanding of the minimum-competency physician. The standard setting process was conducted in 5 rounds over a total of 800 minutes. In addition, 2 rounds of the Hofstee method were conducted before starting the standard setting process and after the second round of yes/no Angoff.
Results For yes/no Angoff, as each round progressed, the panel’s opinion gradually converged to a cut score of 198 points, and the final passing rate was 95.1%. The Hofstee cut score was 208 points out of a maximum of 320, with a passing rate of 92.1%, in the first round, and 204 points, with a passing rate of 93.3%, in the second round.
Conclusion The difference between the cut scores obtained through the yes/no Angoff and Hofstee methods did not exceed 2 percentage points, and both were within the range of cut scores from previous studies. In both methods, the differences between panelists decreased as rounds were repeated. Overall, our findings suggest the acceptability of the cut scores and the possibility of independent use of both methods.
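In the yes/no Angoff method, each panelist judges for every item whether a minimally competent examinee would answer it correctly; a panelist’s cut score is the count of “yes” judgments, and the panel cut score is typically the mean across panelists. A minimal sketch with simulated judgments (not the 86th KMLE data) follows.

```python
import numpy as np

def yes_no_angoff_cut(judgments: np.ndarray) -> float:
    """judgments: (panelists x items) matrix of 1 ('yes') / 0 ('no')."""
    per_panelist_cut = judgments.sum(axis=1)   # items expected to be answered correctly
    return per_panelist_cut.mean()             # panel cut score

# Simulated judgments: 14 panelists, 320 one-point items.
rng = np.random.default_rng(5)
panel = (rng.random((14, 320)) < 0.62).astype(int)
print(round(yes_no_angoff_cut(panel), 1))
```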
Purpose Undertaking a standard-setting exercise is a common method for setting pass/fail cut scores for high-stakes examinations. The recently introduced equal Z standard-setting method (EZ method) has been found to be a valid and effective alternative for the commonly used Angoff and Hofstee methods and their variants. The current study aims to estimate the minimum number of panelists required for obtaining acceptable and reliable cut scores using the EZ method.
Methods The primary data were extracted from 31 panelists who used the EZ method to set cut scores for a 12-station final objective structured clinical examination (OSCE) at a medical school in Taiwan. For this study, a new dataset composed of 1,000 random samples of different panel sizes, ranging from 5 to 25 panelists, was established and analyzed. Analysis of variance was performed to measure the differences in the cut scores set by the sampled groups across all sizes within each station.
Results On average, a panel of 10 or more experts yielded cut scores with confidence of at least 90%, and a panel of 15 experts yielded cut scores with confidence of at least 95%. No significant differences in cut scores associated with panel size were identified for panels of 5 or more experts.
Conclusion The EZ method was found to be valid and feasible. Less than an hour was required for 12 panelists to assess 12 OSCE stations. Calculating the cut scores required only basic statistical skills.
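The resampling design described in the Methods (drawing many random panels of each size and comparing the resulting cut scores) can be sketched roughly as follows, using simulated panelist-level cut scores rather than the study’s EZ-method data.

```python
import numpy as np

rng = np.random.default_rng(6)
panelist_cuts = rng.normal(60, 5, size=31)     # simulated cut scores from 31 panelists

def sample_cut_scores(cuts, panel_size, n_samples=1000):
    """Cut scores from repeated random panels of a given size (with replacement)."""
    idx = rng.integers(0, len(cuts), size=(n_samples, panel_size))
    return cuts[idx].mean(axis=1)

for size in (5, 10, 15, 20, 25):
    sampled = sample_cut_scores(panelist_cuts, size)
    print(size, round(sampled.mean(), 2), round(sampled.std(), 2))
```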
Purpose This study investigated whether the reliability was acceptable when the number of cases in the objective structured clinical examination (OSCE) decreased from 12 to 8 using generalizability theory (GT).
Methods This psychometric study analyzed the OSCE data of 439 fourth-year medical students conducted in the Busan and Gyeongnam areas of South Korea from July 12 to 15, 2021. The generalizability study (G-study) considered 3 facets—students (p), cases (c), and items (i)—and designed the analysis as p×(i:c) due to items being nested in a case. The acceptable generalizability (G) coefficient was set to 0.70. The G-study and decision study (D-study) were performed using G String IV ver. 6.3.8 (Papawork, Hamilton, ON, Canada).
Results All G coefficients except for July 14 (0.69) were above 0.70. The major sources of variance components (VCs) were items nested in cases (i:c), from 51.34% to 57.70%, and residual error (pi:c), from 39.55% to 43.26%. The proportion of VCs in cases was negligible, ranging from 0% to 2.03%.
Conclusion Although the number of cases decreased in the 2021 Busan and Gyeongnam OSCE, reliability remained acceptable. In the D-study, reliability was maintained at 0.70 or higher with more than 21 items per case across 8 cases and more than 18 items per case across 9 cases. However, according to the G-study, increasing the number of items nested in cases, rather than the number of cases, could further improve reliability. The consortium needs to maintain a case bank with various items to implement a reliable blueprinting combination for the OSCE.
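For the p×(i:c) design, a D-study can project the generalizability coefficient from the estimated variance components as Eρ² = σ²p / (σ²p + σ²pc/nc + σ²pi:c/(nc·ni)). The sketch below uses illustrative variance components, not the study’s estimates, to show how different case/item combinations would be compared.

```python
def g_coefficient(var_p, var_pc, var_pic, n_cases, n_items_per_case):
    """Relative G coefficient for a p x (i:c) design."""
    rel_error = var_pc / n_cases + var_pic / (n_cases * n_items_per_case)
    return var_p / (var_p + rel_error)

# Illustrative variance components: person, person-by-case, residual (pi:c).
var_p, var_pc, var_pic = 0.02, 0.01, 0.40
for n_cases, n_items in [(12, 15), (9, 18), (8, 21)]:
    print(n_cases, n_items,
          round(g_coefficient(var_p, var_pc, var_pic, n_cases, n_items), 2))
```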
Purpose The percent Angoff (PA) method has been recommended as a reliable method to set the cutoff score instead of a fixed cut point of 60% in the Korean Medical Licensing Examination (KMLE). The yes/no Angoff (YNA) method, which is easy for panelists to judge, can be considered as an alternative because the KMLE has many items to evaluate. This study aimed to compare the cutoff score and the reliability depending on whether the PA or the YNA standard-setting method was used in the KMLE.
Methods The materials were the open-access PA data of the KMLE. The PA data were converted to YNA data in 5 categories, in which the probabilities for a “yes” decision by panelists were 50%, 60%, 70%, 80%, and 90%. SPSS for descriptive analysis and G-string for generalizability theory were used to present the results.
Results The PA method and the YNA method with a 60% probability counted as “yes” estimated similar cutoff scores. Those cutoff scores were deemed acceptable based on the results of the Hofstee method. The highest reliability coefficients estimated by the generalizability test were from the PA method, followed by the YNA method with probabilities of 70%, 80%, 60%, and 50% for deciding “yes,” in descending order. The panelists’ specialty was the main cause of the error variance. The error size was similar regardless of the standard-setting method.
Conclusion These results showed that the PA method was more reliable than the YNA method in estimating the cutoff score of the KMLE. However, the YNA method with a 60% probability for deciding “yes” can also be used as a substitute for the PA method in estimating the cutoff score of the KMLE.
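The conversion described in the Methods, in which a percent Angoff rating is recoded as “yes” when it reaches a given probability threshold, can be sketched as follows (simulated ratings, not the open-access KMLE data).

```python
import numpy as np

def pa_to_yna_cut(pa_ratings: np.ndarray, threshold: float) -> float:
    """Recode percent Angoff ratings (panelists x items, 0-100) as yes/no
    at the given threshold and return the resulting panel cut score."""
    yes = (pa_ratings >= threshold).astype(int)
    return yes.sum(axis=1).mean()          # mean count of 'yes' items per panelist

# Simulated percent Angoff ratings: 10 panelists x 300 items.
rng = np.random.default_rng(7)
pa = rng.uniform(30, 95, size=(10, 300))
for thr in (50, 60, 70, 80, 90):
    print(thr, round(pa_to_yna_cut(pa, thr), 1))
```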
Purpose Setting standards is critical in health professions. However, appropriate standard setting methods do not always apply to the set cut score in performance assessment. The aim of this study was to compare the cut score when the standard setting is changed from the norm-referenced method to the borderline group method (BGM) and borderline regression method (BRM) in an objective structured clinical examination (OSCE) in medical school.
Methods This was an explorative study to model the implementation of the BGM and BRM. A total of 107 fourth-year medical students attended the OSCE on July 15, 2021, at 7 stations involving encounters with standardized patients (SPs) and 1 station involving skills performed on a manikin. Thirty-two physician examiners evaluated the performance by completing a checklist and global rating scales.
Results The cut score of the norm-referenced method was lower than that of the BGM (P<0.01) and BRM (P<0.02). There was no significant difference in the cut score between the BGM and BRM (P=0.40). The station with the highest standard deviation and the highest proportion of the borderline group showed the largest cut score difference in standard setting methods.
Conclusion Cut scores prefixed by the norm-referenced method, without considering station content or examinee performance, can vary with station difficulty and content, affecting the appropriateness of standard-setting decisions. If there is adequate consensus on the criteria for the borderline group, standard setting with the BRM could be applied as a practical and defensible method to determine the cut score for the OSCE.
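In the borderline regression method, each station’s checklist score is regressed on the global rating, and the cut score is the predicted checklist score at the borderline rating; a minimal sketch with simulated station data (not the study’s OSCE data) follows.

```python
import numpy as np

def brm_cut_score(checklist, global_rating, borderline=2):
    """Predicted checklist score at the borderline global rating (simple OLS)."""
    slope, intercept = np.polyfit(global_rating, checklist, deg=1)
    return intercept + slope * borderline

# Simulated station data for 107 examinees:
# global ratings 1 (clear fail) to 5 (excellent), checklist scores out of 100.
rng = np.random.default_rng(8)
ratings = rng.integers(1, 6, size=107)
scores = 40 + 10 * ratings + rng.normal(0, 5, 107)
print(round(brm_cut_score(scores, ratings, borderline=2), 1))
```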