JEEHP: Journal of Educational Evaluation for Health Professions

Search results: 36 articles for "Educational measurement"
Research articles
Reliability and construct validation of the Blended Learning Usability Evaluation–Questionnaire with interprofessional clinicians in Canada: a methodological study
Anish Kumar Arora, Jeff Myers, Tavis Apramian, Kulamakan Kulasegaram, Daryl Bainbridge, Hsien Seow
J Educ Eval Health Prof. 2025;22:5.   Published online January 16, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.5    [Epub ahead of print]
  • 246 View
  • 71 Download
Abstract
Purpose
To generate Cronbach’s alpha and further mixed methods construct validity evidence for the Blended Learning Usability Evaluation–Questionnaire (BLUE-Q).
Methods
Forty interprofessional clinicians completed the BLUE-Q after finishing a 3-month-long blended learning professional development program in Ontario, Canada. Reliability was assessed with Cronbach’s α for each of the 3 sections of the BLUE-Q and for all quantitative items together. Construct validity was evaluated through the Grand-Guillaume-Perrenoud et al. framework, which consists of 3 elements: congruence, convergence, and credibility. To compare quantitative and qualitative results, descriptive statistics, including means and standard deviations for each Likert-scale item of the BLUE-Q, were calculated.
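The section-wise reliability described above is a standard Cronbach's α computation. A minimal sketch (the Likert-scale data here are hypothetical, not the study's):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert responses (rows = clinicians, columns = items)
scores = np.array([
    [5, 4, 5, 5],
    [4, 4, 4, 5],
    [5, 5, 5, 5],
    [3, 4, 3, 4],
    [4, 3, 4, 4],
])
print(round(cronbach_alpha(scores), 2))  # → 0.88
```

Alpha per questionnaire section is obtained by passing only that section's columns; pooling all quantitative items yields the overall figure.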
Results
Cronbach’s α was 0.95 for the pedagogical usability section, 0.85 for the synchronous modality section, 0.93 for the asynchronous modality section, and 0.96 for all quantitative items together. Mean ratings (with standard deviations) were 4.77 (0.506) for pedagogy, 4.64 (0.654) for synchronous learning, and 4.75 (0.536) for asynchronous learning. Of the 239 qualitative comments received, 178 were identified as substantive, of which 88% were considered congruent and 79% were considered convergent with the high means. Among all congruent responses, 69% were considered confirming statements and 31% were considered clarifying statements, suggesting appropriate credibility. Analysis of the clarifying statements assisted in identifying 5 categories of suggestions for program improvement.
Conclusion
The BLUE-Q demonstrates high reliability and appropriate construct validity in the context of a blended learning program with interprofessional clinicians, making it a valuable tool for comprehensive program evaluation, quality improvement, and evaluative research in health professions education.
Validation of the 21st Century Skills Assessment Scale for public health students in Thailand: a methodological study  
Suphawadee Panthumas, Kaung Zaw, Wirin Kittipichai
J Educ Eval Health Prof. 2024;21:37.   Published online December 10, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.37
  • 542 View
  • 144 Download
Abstract
Purpose
This study aimed to develop and validate the 21st Century Skills Assessment Scale (21CSAS) for Thai public health (PH) undergraduate students using the Partnership for 21st Century Skills framework.
Methods
A cross-sectional survey was conducted among 727 first- to fourth-year PH undergraduate students from 4 autonomous universities in Thailand. Data were collected using self-administered questionnaires between January and March 2023. Exploratory factor analysis (EFA) was used to explore the underlying dimensions of 21CSAS, while confirmatory factor analysis (CFA) was conducted to test the hypothesized factor structure using Mplus software (Muthén & Muthén). Reliability and item discrimination were assessed using Cronbach’s α and the corrected item-total correlation, respectively.
Results
EFA performed on a dataset of 300 students revealed a 20-item scale with a 6-factor structure: (1) creativity and innovation; (2) critical thinking and problem-solving; (3) information, media, and technology; (4) communication and collaboration; (5) initiative and self-direction; and (6) social and cross-cultural skills. The rotated eigenvalues ranged from 2.12 to 1.73. CFA performed on another dataset of 427 students confirmed a good model fit (χ2/degrees of freedom=2.67, comparative fit index=0.93, Tucker-Lewis index=0.91, root mean square error of approximation=0.06, standardized root mean square residual=0.06), explaining 34%–71% of variance in the items. Item loadings ranged from 0.58 to 0.84. The 21CSAS had a Cronbach’s α of 0.92.
Conclusion
The 21CSAS proved to be a valid and reliable tool for assessing 21st-century skills among Thai PH undergraduate students. These findings provide insights for educational systems to inform policy, practice, and research regarding 21st-century skills among undergraduate students.
Technical report
Increased accessibility of computer-based testing for residency application to a hospital in Brazil with item characteristics comparable to paper-based testing: a psychometric study  
Marcos Carvalho Borges, Luciane Loures Santos, Paulo Henrique Manso, Elaine Christine Dantas Moisés, Pedro Soler Coltro, Priscilla Costa Fonseca, Paulo Roberto Alves Gentil, Rodrigo de Carvalho Santana, Lucas Faria Rodrigues, Benedito Carlos Maciel, Hilton Marcos Alves Ricz
J Educ Eval Health Prof. 2024;21:32.   Published online November 11, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.32
  • 536 View
  • 128 Download
Abstract
Purpose
With the coronavirus disease 2019 pandemic, online high-stakes exams have become a viable alternative. This study evaluated the feasibility of computer-based testing (CBT) for medical residency applications in Brazil and its impacts on item quality and applicants’ access compared to paper-based testing.
Methods
In 2020, an online CBT was conducted at the Ribeirao Preto Clinical Hospital in Brazil. In total, 120 multiple-choice question items were constructed. Two years later, the exam was administered as paper-based testing. Item construction processes were similar for both exams. The difficulty and discrimination indexes and the point-biserial coefficient (classical test theory), the difficulty, discrimination, and guessing parameters (item response theory), and the Cronbach’s α coefficient were measured. Internet stability for applicants was monitored.
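The point-biserial coefficient used above is the Pearson correlation between a dichotomous (0/1) item score and examinees' total scores. A minimal sketch with invented data:

```python
import numpy as np

def point_biserial(item: np.ndarray, totals: np.ndarray) -> float:
    """Pearson correlation between a dichotomous item (0/1) and total scores;
    for a binary variable this equals the point-biserial coefficient."""
    item = np.asarray(item, dtype=float)
    totals = np.asarray(totals, dtype=float)
    return float(np.corrcoef(item, totals)[0, 1])

# Hypothetical responses: 1 = correct, 0 = incorrect, with each examinee's total score
item   = np.array([1, 1, 0, 1, 0, 0, 1, 0])
totals = np.array([92, 85, 60, 78, 55, 62, 88, 70])
print(round(point_biserial(item, totals), 2))  # → 0.92
```

A high value indicates that examinees who answer the item correctly also tend to score well overall, which is the sense in which the coefficient measures discrimination.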
Results
In 2020, 4,846 individuals (57.1% female, mean age of 26.64±3.37 years) applied to the residency program, versus 2,196 individuals (55.2% female, mean age of 26.47±3.20 years) in 2022. For CBT, there was an increase of 2,650 applicants (120.7%), albeit with significant differences in demographic characteristics. There was a significant increase in applicants from more distant and lower-income Brazilian regions, such as the North (5.6% vs. 2.7%) and Northeast (16.9% vs. 9.0%). No significant differences were found in difficulty and discrimination indexes, point-biserial coefficients, and Cronbach’s α coefficients between the 2 exams.
Conclusion
Online CBT with multiple-choice questions was a viable format for a residency application exam, improving accessibility without compromising exam integrity and quality.
Review
Immersive simulation in nursing and midwifery education: a systematic review  
Lahoucine Ben Yahya, Aziz Naciri, Mohamed Radid, Ghizlane Chemsi
J Educ Eval Health Prof. 2024;21:19.   Published online August 8, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.19
  • 3,337 View
  • 395 Download
  • 1 Web of Science
  • 1 Crossref
Abstract
Purpose
Immersive simulation is an innovative training approach in health education that enhances student learning. This study examined its impact on engagement, motivation, and academic performance in nursing and midwifery students.
Methods
A comprehensive systematic search was meticulously conducted in 4 reputable databases—Scopus, PubMed, Web of Science, and Science Direct—following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. The research protocol was pre-registered in the PROSPERO registry, ensuring transparency and rigor. The quality of the included studies was assessed using the Medical Education Research Study Quality Instrument.
Results
Out of 90 identified studies, 11 were included in the present review, involving 1,090 participants. Four out of 5 studies observed high post-test engagement scores in the intervention groups. Additionally, 5 out of 6 studies that evaluated motivation found higher post-test motivational scores in the intervention groups than in control groups using traditional approaches. Furthermore, among the 8 out of 11 studies that evaluated academic performance during immersive simulation training, 5 reported significant differences (P<0.001) in favor of the students in the intervention groups.
Conclusion
As demonstrated by this review, immersive simulation has significant potential to enhance student engagement, motivation, and academic performance beyond traditional teaching methods. This potential underscores the need for future research in various contexts to better integrate this innovative educational approach into nursing and midwifery education curricula.

Citations to this article as recorded by  
  • Application of Virtual Reality, Artificial Intelligence, and Other Innovative Technologies in Healthcare Education (Nursing and Midwifery Specialties): Challenges and Strategies
    Galya Georgieva-Tsaneva, Ivanichka Serbezova, Silvia Beloeva
    Education Sciences.2024; 15(1): 11.     CrossRef
Research article
Development and validity evidence for the resident-led large group teaching assessment instrument in the United States: a methodological study  
Ariel Shana Frey-Vogel, Kristina Dzara, Kimberly Anne Gifford, Yoon Soo Park, Justin Berk, Allison Heinly, Darcy Wolcott, Daniel Adam Hall, Shannon Elliott Scott-Vernaglia, Katherine Anne Sparger, Erica Ye-pyng Chung
J Educ Eval Health Prof. 2024;21:3.   Published online February 23, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.3
  • 1,418 View
  • 206 Download
Abstract
Purpose
Despite educational mandates to assess resident teaching competence, limited instruments with validity evidence exist for this purpose. Existing instruments do not allow faculty to assess resident-led teaching in a large group format or whether teaching was interactive. This study gathers validity evidence on the use of the Resident-led Large Group Teaching Assessment Instrument (Relate), an instrument used by faculty to assess resident teaching competency. Relate comprises 23 behaviors divided into 6 elements: learning environment, goals and objectives, content of talk, promotion of understanding and retention, session management, and closure.
Methods
Messick’s unified validity framework was used for this study. Investigators used video recordings of resident-led teaching from 3 pediatric residency programs to develop Relate and a rater guidebook. Faculty were trained on instrument use through frame-of-reference training. Resident teaching at all sites was video-recorded during 2018–2019. Two trained faculty raters assessed each video. Descriptive statistics on performance were obtained. Sources of validity evidence included the rater training effect (response process), reliability and variability (internal structure), and the impact on Milestones assessment (relations to other variables).
Results
Forty-eight videos, from 16 residents, were analyzed. Rater training improved inter-rater reliability from 0.04 to 0.64. The Φ-coefficient reliability was 0.50. There was a significant correlation between overall Relate performance and the pediatric teaching Milestone (r=0.34, P=0.019).
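The abstract does not name the inter-rater reliability statistic. Assuming a chance-corrected agreement measure such as Cohen's κ for the two raters, a minimal sketch (the ratings are invented for illustration) looks like:

```python
import numpy as np

def cohens_kappa(r1, r2, categories):
    """Cohen's kappa for two raters assigning the same items to categories."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    po = np.mean(r1 == r2)                                             # observed agreement
    pe = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in categories)  # chance agreement
    return (po - pe) / (1 - pe)

# Hypothetical ratings on a 3-point behavior scale from two trained raters
rater_a = [2, 3, 3, 1, 2, 3, 2, 1, 3, 2]
rater_b = [2, 3, 2, 1, 2, 3, 2, 2, 3, 2]
print(round(cohens_kappa(rater_a, rater_b, [1, 2, 3]), 2))  # → 0.68
```

Frame-of-reference training aims to raise exactly this kind of chance-corrected agreement by aligning raters on a shared interpretation of each behavior anchor.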
Conclusion
Relate provides validity evidence with sufficient reliability to measure resident-led large-group teaching competence.
Technical report
Item difficulty index, discrimination index, and reliability of the 26 health professions licensing examinations in 2022, Korea: a psychometric study
Yoon Hee Kim, Bo Hyun Kim, Joonki Kim, Bokyoung Jung, Sangyoung Bae
J Educ Eval Health Prof. 2023;20:31.   Published online November 22, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.31
  • 2,157 View
  • 142 Download
Abstract
Purpose
This study presents item analysis results of the 26 health personnel licensing examinations managed by the Korea Health Personnel Licensing Examination Institute (KHPLEI) in 2022.
Methods
The item difficulty index, item discrimination index, and reliability were calculated. The item discrimination index was calculated using a discrimination index based on the upper and lower 27% rule and the item-total correlation.
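As an illustration of these statistics, here is a minimal sketch of the difficulty index and the upper-lower 27% discrimination index on a hypothetical 0/1 response matrix (the item-total correlation would be computed analogously as a Pearson correlation):

```python
import numpy as np

def item_analysis(resp: np.ndarray):
    """Difficulty index and upper-lower 27% discrimination index per item.
    `resp` is an (n_examinees, n_items) matrix of 0/1 scores."""
    resp = np.asarray(resp, dtype=float)
    n = resp.shape[0]
    difficulty = resp.mean(axis=0)                       # proportion answering correctly
    order = np.argsort(resp.sum(axis=1), kind="stable")  # examinees sorted by total score
    k = max(1, int(round(0.27 * n)))                     # size of upper/lower groups
    lower, upper = resp[order[:k]], resp[order[-k:]]
    discrimination = upper.mean(axis=0) - lower.mean(axis=0)
    return difficulty, discrimination

# Hypothetical responses for 8 examinees on 3 items (1 = correct)
resp = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
])
difficulty, discrimination = item_analysis(resp)
print(difficulty)       # proportion correct per item
print(discrimination)   # upper-group minus lower-group proportion correct
```

The stable sort keeps tied total scores in a deterministic order; in operational analyses ties at the 27% cut are usually handled by a documented rule.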
Results
Out of 468,352 total examinees, 418,887 (89.4%) passed. The pass rates ranged from 27.3% for health educators level 1 to 97.1% for oriental medical doctors. Most examinations had a high average difficulty index, albeit to varying degrees, ranging from 61.3% for prosthetists and orthotists to 83.9% for care workers. The average discrimination index based on the upper and lower 27% rule ranged from 0.17 for oriental medical doctors to 0.38 for radiological technologists. The average item-total correlation ranged from 0.20 for oriental medical doctors to 0.38 for radiological technologists. The Cronbach α, as a measure of reliability, ranged from 0.872 for health educators level 3 to 0.978 for medical technologists. The correlation coefficient between the average difficulty index and average discrimination index was -0.2452 (P=0.1557), that between the average difficulty index and the average item-total correlation was 0.3502 (P=0.0392), and that between the average discrimination index and the average item-total correlation was 0.7944 (P<0.0001).
Conclusion
This technical report presents the item analysis results and reliability of the recent examinations by the KHPLEI, demonstrating an acceptable range of difficulty index and discrimination index values, as well as good reliability.
Research article
Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study  
Betzy Clariza Torres-Zegarra, Wagner Rios-Garcia, Alvaro Micael Ñaña-Cordova, Karen Fatima Arteaga-Cisneros, Xiomara Cristina Benavente Chalco, Marina Atena Bustamante Ordoñez, Carlos Jesus Gutierrez Rios, Carlos Alberto Ramos Godoy, Kristell Luisa Teresa Panta Quezada, Jesus Daniel Gutierrez-Arratia, Javier Alejandro Flores-Cohaila
J Educ Eval Health Prof. 2023;20:30.   Published online November 20, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.30
  • 3,092 View
  • 222 Download
  • 13 Web of Science
  • 18 Crossref
Abstract
Purpose
We aimed to describe the performance and evaluate the educational value of justifications provided by artificial intelligence chatbots, including GPT-3.5, GPT-4, Bard, Claude, and Bing, on the Peruvian National Medical Licensing Examination (P-NLME).
Methods
This was a cross-sectional analytical study. On July 25, 2023, each multiple-choice question (MCQ) from the P-NLME was entered into each chatbot (GPT-3.5, GPT-4, Bing, Bard, and Claude) 3 times. Then, 4 medical educators categorized the MCQs in terms of medical area, item type, and whether the MCQ required Peru-specific knowledge. They assessed the educational value of the justifications from the 2 top performers (GPT-4 and Bing).
Results
GPT-4 scored 86.7% and Bing scored 82.2%, followed by Bard and Claude, and the historical performance of Peruvian examinees was 55%. Among the factors associated with correct answers, only MCQs that required Peru-specific knowledge had lower odds (odds ratio, 0.23; 95% confidence interval, 0.09–0.61), whereas the remaining factors showed no associations. In assessing the educational value of justifications provided by GPT-4 and Bing, neither showed any significant differences in certainty, usefulness, or potential use in the classroom.
Conclusion
Among the chatbots, GPT-4 and Bing were the top performers, with Bing performing better on Peru-specific MCQs. Moreover, the educational value of the justifications provided by GPT-4 and Bing could be deemed appropriate. However, it is essential to start addressing the educational value of these chatbots, rather than merely their performance on examinations.

Citations to this article as recorded by  
  • PICOT questions and search strategies formulation: A novel approach using artificial intelligence automation
    Lucija Gosak, Gregor Štiglic, Lisiane Pruinelli, Dominika Vrbnjak
    Journal of Nursing Scholarship.2025; 57(1): 5.     CrossRef
  • Capable exam-taker and question-generator: the dual role of generative AI in medical education assessment
    Yihong Qiu, Chang Liu
    Global Medical Education.2025;[Epub]     CrossRef
  • Comparison of artificial intelligence systems in answering prosthodontics questions from the dental specialty exam in Turkey
    Busra Tosun, Zeynep Sen Yilmaz
    Journal of Dental Sciences.2025;[Epub]     CrossRef
  • Benchmarking LLM chatbots’ oncological knowledge with the Turkish Society of Medical Oncology’s annual board examination questions
    Efe Cem Erdat, Engin Eren Kavak
    BMC Cancer.2025;[Epub]     CrossRef
  • Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study
    Masao Noda, Takayoshi Ueno, Ryota Koshu, Yuji Takaso, Mari Dias Shimada, Chizu Saito, Hisashi Sugimoto, Hiroaki Fushiki, Makoto Ito, Akihiro Nomura, Tomokazu Yoshizaki
    JMIR Medical Education.2024; 10: e57054.     CrossRef
  • Response to Letter to the Editor re: “Artificial Intelligence Versus Expert Plastic Surgeon: Comparative Study Shows ChatGPT ‘Wins' Rhinoplasty Consultations: Should We Be Worried? [1]” by Durairaj et al
    Kay Durairaj, Omer Baker
    Facial Plastic Surgery & Aesthetic Medicine.2024; 26(3): 276.     CrossRef
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis
    Mingxin Liu, Tsuyoshi Okuhara, XinYi Chang, Ritsuko Shirabe, Yuriko Nishiie, Hiroko Okada, Takahiro Kiuchi
    Journal of Medical Internet Research.2024; 26: e60807.     CrossRef
  • Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study
    Giacomo Rossettini, Lia Rodeghiero, Federica Corradi, Chad Cook, Paolo Pillastrini, Andrea Turolla, Greta Castellini, Stefania Chiappinotto, Silvia Gianola, Alvisa Palese
    BMC Medical Education.2024;[Epub]     CrossRef
  • Evaluating the competency of ChatGPT in MRCP Part 1 and a systematic literature review of its capabilities in postgraduate medical assessments
    Oliver Vij, Henry Calver, Nikki Myall, Mrinalini Dey, Koushan Kouranloo, Thiago P. Fernandes
    PLOS ONE.2024; 19(7): e0307372.     CrossRef
  • Large Language Models in Pediatric Education: Current Uses and Future Potential
    Srinivasan Suresh, Sanghamitra M. Misra
    Pediatrics.2024;[Epub]     CrossRef
  • Comparison of the Performance of ChatGPT, Claude and Bard in Support of Myopia Prevention and Control
    Yan Wang, Lihua Liang, Ran Li, Yihua Wang, Changfu Hao
    Journal of Multidisciplinary Healthcare.2024; Volume 17: 3917.     CrossRef
  • Evaluating Large Language Models in Dental Anesthesiology: A Comparative Analysis of ChatGPT-4, Claude 3 Opus, and Gemini 1.0 on the Japanese Dental Society of Anesthesiology Board Certification Exam
    Misaki Fujimoto, Hidetaka Kuroda, Tomomi Katayama, Atsuki Yamaguchi, Norika Katagiri, Keita Kagawa, Shota Tsukimoto, Akito Nakano, Uno Imaizumi, Aiji Sato-Boku, Naotaka Kishimoto, Tomoki Itamiya, Kanta Kido, Takuro Sanuki
    Cureus.2024;[Epub]     CrossRef
  • Dermatological Knowledge and Image Analysis Performance of Large Language Models Based on Specialty Certificate Examination in Dermatology
    Ka Siu Fan, Ka Hay Fan
    Dermato.2024; 4(4): 124.     CrossRef
  • ChatGPT and Other Large Language Models in Medical Education — Scoping Literature Review
    Alexandra Aster, Matthias Carl Laupichler, Tamina Rockwell-Kollmann, Gilda Masala, Ebru Bala, Tobias Raupach
    Medical Science Educator.2024;[Epub]     CrossRef
  • Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study
    Yikai Chen, Xiujie Huang, Fangjie Yang, Haiming Lin, Haoyu Lin, Zhuoqun Zheng, Qifeng Liang, Jinhai Zhang, Xinxin Li
    BMC Medical Education.2024;[Epub]     CrossRef
  • Using large language models (ChatGPT, Copilot, PaLM, Bard, and Gemini) in Gross Anatomy course: Comparative analysis
    Volodymyr Mavrych, Paul Ganguly, Olena Bolgova
    Clinical Anatomy.2024;[Epub]     CrossRef
  • Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study
    Hyunju Lee, Soobin Park
    Journal of Educational Evaluation for Health Professions.2023; 20: 39.     CrossRef
Brief reports
Training and implementation of handheld ultrasound technology at Georgetown Public Hospital Corporation in Guyana: a virtual learning cohort study  
Michelle Bui, Adrian Fernandez, Budheshwar Ramsukh, Onika Noel, Chris Prashad, David Bayne
J Educ Eval Health Prof. 2023;20:11.   Published online April 4, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.11
  • 3,223 View
  • 103 Download
  • 2 Web of Science
  • 2 Crossref
Abstract
A virtual point-of-care ultrasound (POCUS) education program was initiated to introduce handheld ultrasound technology to Georgetown Public Hospital Corporation in Guyana, a low-resource setting. We studied ultrasound competency and participant satisfaction in a cohort of 20 physicians-in-training recruited through the urology clinic. The program consisted of a training phase, in which they learned how to use the Butterfly iQ ultrasound, and a mentored implementation phase, in which they applied their skills in the clinic. Assessment was conducted through written exams and an objective structured clinical exam (OSCE). Fourteen students completed the program. The written exam scores were 3.36/5 in the training phase and 3.57/5 in the mentored implementation phase, and all students earned 100% on the OSCE. Students expressed satisfaction with the program. Our POCUS education program demonstrates the potential to teach clinical skills in low-resource settings and the value of virtual global health partnerships in advancing POCUS and minimally invasive diagnostics.

Citations to this article as recorded by  
  • A Clinician’s Guide to the Implementation of Point-of-Care Ultrasound (POCUS) in the Outpatient Practice
    Joshua Overgaard, Bright P. Thilagar, Mohammed Nadir Bhuiyan
    Journal of Primary Care & Community Health.2024;[Epub]     CrossRef
  • Efficacy of Handheld Ultrasound in Medical Education: A Comprehensive Systematic Review and Narrative Analysis
    Mariam Haji-Hassan, Roxana-Denisa Capraș, Sorana D. Bolboacă
    Diagnostics.2023; 13(24): 3665.     CrossRef
Are ChatGPT’s knowledge and interpretation ability comparable to those of medical students in Korea for taking a parasitology examination?: a descriptive study  
Sun Huh
J Educ Eval Health Prof. 2023;20:1.   Published online January 11, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.1
  • 15,034 View
  • 1,119 Download
  • 186 Web of Science
  • 94 Crossref
Abstract
This study aimed to compare the knowledge and interpretation ability of ChatGPT, an artificial intelligence language model, with those of medical students in Korea by administering a parasitology examination to both ChatGPT and medical students. The examination consisted of 79 items and was administered to ChatGPT on January 1, 2023. The examination results were analyzed in terms of ChatGPT’s overall performance score, its correct answer rate by the items’ knowledge level, and the acceptability of its explanations of the items. ChatGPT’s performance was lower than that of the medical students, and ChatGPT’s correct answer rate was not related to the items’ knowledge level. However, there was a relationship between acceptable explanations and correct answers. In conclusion, ChatGPT’s knowledge and interpretation ability for this parasitology examination were not yet comparable to those of medical students in Korea.

Citations to this article as recorded by  
  • ChatGPT and the AI revolution: a comprehensive investigation of its multidimensional impact and potential
    Mohd Afjal
    Library Hi Tech.2025; 43(1): 353.     CrossRef
  • Utility of ChatGPT as a preparation tool for the Orthopaedic In‐Training Examination
    Dhruv Mendiratta, Isabel Herzog, Rohan Singh, Ashok Para, Tej Joshi, Michael Vosbikian, Neil Kaushal
    Journal of Experimental Orthopaedics.2025;[Epub]     CrossRef
  • Exploring knowledge, attitudes, and practices of academics in the field of educational sciences towards using ChatGPT
    Burcu Karafil, Ahmet Uyar
    Education and Information Technologies.2025;[Epub]     CrossRef
  • Factors influencing Chinese pre-service teachers’ adoption of generative AI in teaching: an empirical study based on UTAUT2 and PLS-SEM
    Linlin Hu, Hao Wang, Yunfei Xin
    Education and Information Technologies.2025;[Epub]     CrossRef
  • Integrating AI Technology Into Language Teacher Education: Challenges, Potentials, and Assumptions
    Rod Case, Leping Liu, Joseph Mintz
    Computers in the Schools.2025; : 1.     CrossRef
  • Performance of ChatGPT-3.5 and ChatGPT-4 in the Taiwan National Pharmacist Licensing Examination: Comparative Evaluation Study
    Ying-Mei Wang, Hung-Wei Shen, Tzeng-Ji Chen, Shu-Chiung Chiang, Ting-Guan Lin
    JMIR Medical Education.2025; 11: e56850.     CrossRef
  • Unveiling the impact of ChatGPT: investigating self-efficacy, anxiety and motivation on student performance in blended learning environments
    Ridwan Daud Mahande, M. Miftach Fakhri, Irwansyah Suwahyu, Dwi Rezky Anandari Sulaiman
    Journal of Applied Research in Higher Education.2025;[Epub]     CrossRef
  • Performance of ChatGPT on the India Undergraduate Community Medicine Examination: Cross-Sectional Study
    Aravind P Gandhi, Felista Karen Joesph, Vineeth Rajagopal, P Aparnavi, Sushma Katkuri, Sonal Dayama, Prakasini Satapathy, Mahalaqua Nazli Khatib, Shilpa Gaidhane, Quazi Syed Zahiruddin, Ashish Behera
    JMIR Formative Research.2024; 8: e49964.     CrossRef
  • Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT
    Jad Abi-Rafeh, Hong Hao Xu, Roy Kazan, Ruth Tevlin, Heather Furnas
    Aesthetic Surgery Journal.2024; 44(3): 329.     CrossRef
  • Redesigning Tertiary Educational Evaluation with AI: A Task-Based Analysis of LIS Students’ Assessment on Written Tests and Utilizing ChatGPT at NSTU
    Shamima Yesmin
    Science & Technology Libraries.2024; 43(4): 355.     CrossRef
  • Unveiling the ChatGPT phenomenon: Evaluating the consistency and accuracy of endodontic question answers
    Ana Suárez, Víctor Díaz‐Flores García, Juan Algar, Margarita Gómez Sánchez, María Llorente de Pedro, Yolanda Freire
    International Endodontic Journal.2024; 57(1): 108.     CrossRef
  • Bob or Bot: Exploring ChatGPT's Answers to University Computer Science Assessment
    Mike Richards, Kevin Waugh, Mark Slaymaker, Marian Petre, John Woodthorpe, Daniel Gooch
    ACM Transactions on Computing Education.2024; 24(1): 1.     CrossRef
  • A systematic review of ChatGPT use in K‐12 education
    Peng Zhang, Gemma Tur
    European Journal of Education.2024;[Epub]     CrossRef
  • Evaluating ChatGPT as a self‐learning tool in medical biochemistry: A performance assessment in undergraduate medical university examination
    Krishna Mohan Surapaneni, Anusha Rajajagadeesan, Lakshmi Goudhaman, Shalini Lakshmanan, Saranya Sundaramoorthi, Dineshkumar Ravi, Kalaiselvi Rajendiran, Porchelvan Swaminathan
    Biochemistry and Molecular Biology Education.2024; 52(2): 237.     CrossRef
  • Examining the use of ChatGPT in public universities in Hong Kong: a case study of restricted access areas
    Michelle W. T. Cheng, Iris H. Y. YIM
    Discover Education.2024;[Epub]     CrossRef
  • Performance of ChatGPT on Ophthalmology-Related Questions Across Various Examination Levels: Observational Study
    Firas Haddad, Joanna S Saade
    JMIR Medical Education.2024; 10: e50842.     CrossRef
  • Assessment of Artificial Intelligence Platforms With Regard to Medical Microbiology Knowledge: An Analysis of ChatGPT and Gemini
    Jai Ranjan, Absar Ahmad, Monalisa Subudhi, Ajay Kumar
    Cureus.2024;[Epub]     CrossRef
  • A comparative vignette study: Evaluating the potential role of a generative AI model in enhancing clinical decision‐making in nursing
    Mor Saban, Ilana Dubovi
    Journal of Advanced Nursing.2024;[Epub]     CrossRef
  • Comparison of the Performance of GPT-3.5 and GPT-4 With That of Medical Students on the Written German Medical Licensing Examination: Observational Study
    Annika Meyer, Janik Riese, Thomas Streichert
    JMIR Medical Education.2024; 10: e50965.     CrossRef
  • From hype to insight: Exploring ChatGPT's early footprint in education via altmetrics and bibliometrics
    Lung‐Hsiang Wong, Hyejin Park, Chee‐Kit Looi
    Journal of Computer Assisted Learning.2024; 40(4): 1428.     CrossRef
  • A scoping review of artificial intelligence in medical education: BEME Guide No. 84
    Morris Gordon, Michelle Daniel, Aderonke Ajiboye, Hussein Uraiby, Nicole Y. Xu, Rangana Bartlett, Janice Hanson, Mary Haas, Maxwell Spadafore, Ciaran Grafton-Clarke, Rayhan Yousef Gasiea, Colin Michie, Janet Corral, Brian Kwan, Diana Dolmans, Satid Thamma
    Medical Teacher.2024; 46(4): 446.     CrossRef
  • University Students' Experiences with ChatGPT 3.5: Fairy Tale Variants Written with Artificial Intelligence
    Bilge GÖK, Fahri TEMİZYÜREK, Özlem BAŞ
    Korkut Ata Türkiyat Araştırmaları Dergisi.2024; (14): 1040.     CrossRef
  • Tracking ChatGPT Research: Insights From the Literature and the Web
    Omar Mubin, Fady Alnajjar, Zouheir Trabelsi, Luqman Ali, Medha Mohan Ambali Parambil, Zhao Zou
    IEEE Access.2024; 12: 30518.     CrossRef
  • Potential applications of ChatGPT in obstetrics and gynecology in Korea: a review article
    YooKyung Lee, So Yun Kim
    Obstetrics & Gynecology Science.2024; 67(2): 153.     CrossRef
  • Application of generative language models to orthopaedic practice
    Jessica Caterson, Olivia Ambler, Nicholas Cereceda-Monteoliva, Matthew Horner, Andrew Jones, Arwel Tomos Poacher
    BMJ Open.2024; 14(3): e076484.     CrossRef
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions.2024; 21: 6.     CrossRef
  • The advent of ChatGPT: Job Made Easy or Job Loss to Data Analysts
    Abiola Timothy Owolabi, Oluwaseyi Oluwadamilare Okunlola, Emmanuel Taiwo Adewuyi, Janet Iyabo Idowu, Olasunkanmi James Oladapo
    WSEAS TRANSACTIONS ON COMPUTERS.2024; 23: 24.     CrossRef
  • ChatGPT in dentomaxillofacial radiology education
    Hilal Peker Öztürk, Hakan Avsever, Buğra Şenel, Şükran Ayran, Mustafa Çağrı Peker, Hatice Seda Özgedik, Nurten Baysal
    Journal of Health Sciences and Medicine.2024; 7(2): 224.     CrossRef
  • Performance of ChatGPT on the Korean National Examination for Dental Hygienists
    Soo-Myoung Bae, Hye-Rim Jeon, Gyoung-Nam Kim, Seon-Hui Kwak, Hyo-Jin Lee
    Journal of Dental Hygiene Science.2024; 24(1): 62.     CrossRef
  • Medical knowledge of ChatGPT in public health, infectious diseases, COVID-19 pandemic, and vaccines: multiple choice questions examination based performance
    Sultan Ayoub Meo, Metib Alotaibi, Muhammad Zain Sultan Meo, Muhammad Omair Sultan Meo, Mashhood Hamid
    Frontiers in Public Health.2024;[Epub]     CrossRef
  • Unlock the potential for Saudi Arabian higher education: a systematic review of the benefits of ChatGPT
    Eman Faisal
    Frontiers in Education.2024;[Epub]     CrossRef
  • Does the Information Quality of ChatGPT Meet the Requirements of Orthopedics and Trauma Surgery?
    Adnan Kasapovic, Thaer Ali, Mari Babasiz, Jessica Bojko, Martin Gathen, Robert Kaczmarczyk, Jonas Roos
    Cureus.2024;[Epub]     CrossRef
  • Exploring the Profile of University Assessments Flagged as Containing AI-Generated Material
    Daniel Gooch, Kevin Waugh, Mike Richards, Mark Slaymaker, John Woodthorpe
    ACM Inroads.2024; 15(2): 39.     CrossRef
  • Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom’s Taxonomy
    Ambadasu Bharatha, Nkemcho Ojeh, Ahbab Mohammad Fazle Rabbi, Michael Campbell, Kandamaran Krishnamurthy, Rhaheem Layne-Yarde, Alok Kumar, Dale Springer, Kenneth Connell, Md Anwarul Majumder
    Advances in Medical Education and Practice.2024; 15: 393.     CrossRef
  • The emergence of generative artificial intelligence platforms in 2023, journal metrics, appreciation to reviewers and volunteers, and obituary
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2024; 21: 9.     CrossRef
  • ChatGPT, a Friend or a Foe in Medical Education: A Review of Strengths, Challenges, and Opportunities
    Mahdi Zarei, Maryam Zarei, Sina Hamzehzadeh, Sepehr Shakeri Bavil Oliyaei, Mohammad-Salar Hosseini
    Shiraz E-Medical Journal.2024;[Epub]     CrossRef
  • Augmenting intensive care unit nursing practice with generative AI: A formative study of diagnostic synergies using simulation‐based clinical cases
    Chedva Levin, Moriya Suliman, Etti Naimi, Mor Saban
    Journal of Clinical Nursing.2024;[Epub]     CrossRef
  • Artificial intelligence chatbots for the nutrition management of diabetes and the metabolic syndrome
    Farah Naja, Mandy Taktouk, Dana Matbouli, Sharfa Khaleel, Ayah Maher, Berna Uzun, Maryam Alameddine, Lara Nasreddine
    European Journal of Clinical Nutrition.2024; 78(10): 887.     CrossRef
  • Large language models in healthcare: from a systematic review on medical examinations to a comparative analysis on fundamentals of robotic surgery online test
    Andrea Moglia, Konstantinos Georgiou, Pietro Cerveri, Luca Mainardi, Richard M. Satava, Alfred Cuschieri
    Artificial Intelligence Review.2024;[Epub]     CrossRef
  • Is ChatGPT Enhancing Youth’s Learning, Engagement and Satisfaction?
    Christina Sanchita Shah, Smriti Mathur, Sushant Kr. Vishnoi
    Journal of Computer Information Systems.2024; : 1.     CrossRef
  • Comparison of ChatGPT, Gemini, and Le Chat with physician interpretations of medical laboratory questions from an online health forum
    Annika Meyer, Ari Soleman, Janik Riese, Thomas Streichert
    Clinical Chemistry and Laboratory Medicine (CCLM).2024;[Epub]     CrossRef
  • Performance of ChatGPT-3.5 and GPT-4 in national licensing examinations for medicine, pharmacy, dentistry, and nursing: a systematic review and meta-analysis
    Hye Kyung Jin, Ha Eun Lee, EunYoung Kim
    BMC Medical Education.2024;[Epub]     CrossRef
  • Role of ChatGPT in Dentistry: A Review
    Pratik Surana, Priyanka P. Ostwal, Shruti Vishal Dev, Jayesh Tiwari, Kadire Shiva Charan Yadav, Gajji Renuka
    Research Journal of Pharmacy and Technology.2024; : 3489.     CrossRef
  • Exploring the Current Applications and Effectiveness of ChatGPT in Nursing: An Integrative Review
    Yuan Luo, Yiqun Miao, Yuhan Zhao, Jiawei Li, Ying Wu
    Journal of Advanced Nursing.2024;[Epub]     CrossRef
  • A Scoping Review on the Educational Applications of Generative AI in Primary and Secondary Education
    Solmoe Ahn, Jeongyoon Lee, Jungmin Park, Soyoung Jung, Jihoon Song
    The Journal of Korean Association of Computer Education.2024; 27(6): 11.     CrossRef
  • Performance of GPT-3.5 and GPT-4 on the Korean Pharmacist Licensing Examination: Comparison Study
    Hye Kyung Jin, EunYoung Kim
    JMIR Medical Education.2024; 10: e57451.     CrossRef
  • ChatGPT-Produced Content as a Resource in the Language Education Classroom: A Guiding Hand
    Rod E. Case, Leping Liu
    Computers in the Schools.2024; : 1.     CrossRef
  • Evaluating the Feasibility of ChatGPT in Dental Morphology Education: A Pilot Study on AI-Assisted Learning in Dental Morphology
    Eun-Young Jeon, Hyun-Na Ahn, Jeong-Hyun Lee
    Journal of Dental Hygiene Science.2024; 24(4): 309.     CrossRef
  • Detecting AI-generated versus human-written medical student essays: a semi-randomized controlled study (Preprint)
    Berin Doru, Christoph Maier, Johanna Sophie Busse, Thomas Lücke, Judith Schönhoff, Elena Enax-Krumova, Steffen Hessler, Maria Berger, Marianne Tokic
    JMIR Medical Education.2024;[Epub]     CrossRef
  • Is ChatGPT reliable in education?
    Amal Abdullah Alibrahim
    South African Journal of Education.2024; 44(4): 1.     CrossRef
  • Applicability of ChatGPT in Assisting to Solve Higher Order Problems in Pathology
    Ranwir K Sinha, Asitava Deb Roy, Nikhil Kumar, Himel Mondal
    Cureus.2023;[Epub]     CrossRef
  • Issues in the 3rd year of the COVID-19 pandemic, including computer-based testing, study design, ChatGPT, journal metrics, and appreciation to reviewers
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2023; 20: 5.     CrossRef
  • Emergence of the metaverse and ChatGPT in journal publishing after the COVID-19 pandemic
    Sun Huh
    Science Editing.2023; 10(1): 1.     CrossRef
  • Assessing the Capability of ChatGPT in Answering First- and Second-Order Knowledge Questions on Microbiology as per Competency-Based Medical Education Curriculum
    Dipmala Das, Nikhil Kumar, Langamba Angom Longjam, Ranwir Sinha, Asitava Deb Roy, Himel Mondal, Pratima Gupta
    Cureus.2023;[Epub]     CrossRef
  • Evaluating ChatGPT's Ability to Solve Higher-Order Questions on the Competency-Based Medical Education Curriculum in Medical Biochemistry
    Arindam Ghosh, Aritri Bir
    Cureus.2023;[Epub]     CrossRef
  • Overview of Early ChatGPT’s Presence in Medical Literature: Insights From a Hybrid Literature Review by ChatGPT and Human Experts
    Omar Temsah, Samina A Khan, Yazan Chaiah, Abdulrahman Senjab, Khalid Alhasan, Amr Jamal, Fadi Aljamaan, Khalid H Malki, Rabih Halwani, Jaffar A Al-Tawfiq, Mohamad-Hani Temsah, Ayman Al-Eyadhy
    Cureus.2023;[Epub]     CrossRef
  • ChatGPT for Future Medical and Dental Research
    Bader Fatani
    Cureus.2023;[Epub]     CrossRef
  • ChatGPT in Dentistry: A Comprehensive Review
    Hind M Alhaidry, Bader Fatani, Jenan O Alrayes, Aljowhara M Almana, Nawaf K Alfhaed
    Cureus.2023;[Epub]     CrossRef
  • Can we trust AI chatbots’ answers about disease diagnosis and patient care?
    Sun Huh
    Journal of the Korean Medical Association.2023; 66(4): 218.     CrossRef
  • Large Language Models in Medical Education: Opportunities, Challenges, and Future Directions
    Alaa Abd-alrazaq, Rawan AlSaad, Dari Alhuwail, Arfan Ahmed, Padraig Mark Healy, Syed Latifi, Sarah Aziz, Rafat Damseh, Sadam Alabed Alrazak, Javaid Sheikh
    JMIR Medical Education.2023; 9: e48291.     CrossRef
  • Early applications of ChatGPT in medical practice, education and research
    Sam Sedaghat
    Clinical Medicine.2023; 23(3): 278.     CrossRef
  • A Review of Research on Teaching and Learning Transformation under the Influence of ChatGPT Technology
    璇 师
    Advances in Education.2023; 13(05): 2617.     CrossRef
  • Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study
    Soshi Takagi, Takashi Watari, Ayano Erabi, Kota Sakaguchi
    JMIR Medical Education.2023; 9: e48002.     CrossRef
  • ChatGPT’s quiz skills in different otolaryngology subspecialties: an analysis of 2576 single-choice and multiple-choice board certification preparation questions
    Cosima C. Hoch, Barbara Wollenberg, Jan-Christoffer Lüers, Samuel Knoedler, Leonard Knoedler, Konstantin Frank, Sebastian Cotofana, Michael Alfertshofer
    European Archives of Oto-Rhino-Laryngology.2023; 280(9): 4271.     CrossRef
  • Analysing the Applicability of ChatGPT, Bard, and Bing to Generate Reasoning-Based Multiple-Choice Questions in Medical Physiology
    Mayank Agarwal, Priyanka Sharma, Ayan Goswami
    Cureus.2023;[Epub]     CrossRef
  • The Intersection of ChatGPT, Clinical Medicine, and Medical Education
    Rebecca Shin-Yee Wong, Long Chiau Ming, Raja Affendi Raja Ali
    JMIR Medical Education.2023; 9: e47274.     CrossRef
  • The Role of Artificial Intelligence in Higher Education: ChatGPT Assessment for Anatomy Course
    Tarık TALAN, Yusuf KALINKARA
    Uluslararası Yönetim Bilişim Sistemleri ve Bilgisayar Bilimleri Dergisi.2023; 7(1): 33.     CrossRef
  • Comparing ChatGPT’s ability to rate the degree of stereotypes and the consistency of stereotype attribution with those of medical students in New Zealand in developing a similarity rating test: a methodological study
    Chao-Cheng Lin, Zaine Akuhata-Huntington, Che-Wei Hsu
    Journal of Educational Evaluation for Health Professions.2023; 20: 17.     CrossRef
  • Examining Real-World Medication Consultations and Drug-Herb Interactions: ChatGPT Performance Evaluation
    Hsing-Yu Hsu, Kai-Cheng Hsu, Shih-Yen Hou, Ching-Lung Wu, Yow-Wen Hsieh, Yih-Dih Cheng
    JMIR Medical Education.2023; 9: e48433.     CrossRef
  • Assessing the Efficacy of ChatGPT in Solving Questions Based on the Core Concepts in Physiology
    Arijita Banerjee, Aquil Ahmad, Payal Bhalla, Kavita Goyal
    Cureus.2023;[Epub]     CrossRef
  • ChatGPT Performs on the Chinese National Medical Licensing Examination
    Xinyi Wang, Zhenye Gong, Guoxin Wang, Jingdan Jia, Ying Xu, Jialu Zhao, Qingye Fan, Shaun Wu, Weiguo Hu, Xiaoyang Li
    Journal of Medical Systems.2023;[Epub]     CrossRef
  • Artificial intelligence and its impact on job opportunities among university students in North Lima, 2023
    Doris Ruiz-Talavera, Jaime Enrique De la Cruz-Aguero, Nereo García-Palomino, Renzo Calderón-Espinoza, William Joel Marín-Rodriguez
    ICST Transactions on Scalable Information Systems.2023;[Epub]     CrossRef
  • Revolutionizing Dental Care: A Comprehensive Review of Artificial Intelligence Applications Among Various Dental Specialties
    Najd Alzaid, Omar Ghulam, Modhi Albani, Rafa Alharbi, Mayan Othman, Hasan Taher, Saleem Albaradie, Suhael Ahmed
    Cureus.2023;[Epub]     CrossRef
  • Opportunities, Challenges, and Future Directions of Generative Artificial Intelligence in Medical Education: Scoping Review
    Carl Preiksaitis, Christian Rose
    JMIR Medical Education.2023; 9: e48785.     CrossRef
  • Exploring the impact of language models, such as ChatGPT, on student learning and assessment
    Araz Zirar
    Review of Education.2023;[Epub]     CrossRef
  • Evaluating the reliability of ChatGPT as a tool for imaging test referral: a comparative study with a clinical decision support system
    Shani Rosen, Mor Saban
    European Radiology.2023; 34(5): 2826.     CrossRef
  • The Significance of Artificial Intelligence Platforms in Anatomy Education: An Experience With ChatGPT and Google Bard
    Hasan B Ilgaz, Zehra Çelik
    Cureus.2023;[Epub]     CrossRef
  • Is ChatGPT’s Knowledge and Interpretative Ability Comparable to First Professional MBBS (Bachelor of Medicine, Bachelor of Surgery) Students of India in Taking a Medical Biochemistry Examination?
    Abhra Ghosh, Nandita Maini Jindal, Vikram K Gupta, Ekta Bansal, Navjot Kaur Bajwa, Abhishek Sett
    Cureus.2023;[Epub]     CrossRef
  • Ethical consideration of the use of generative artificial intelligence, including ChatGPT in writing a nursing article
    Sun Huh
    Child Health Nursing Research.2023; 29(4): 249.     CrossRef
  • Potential Use of ChatGPT for Patient Information in Periodontology: A Descriptive Pilot Study
    Osman Babayiğit, Zeynep Tastan Eroglu, Dilek Ozkan Sen, Fatma Ucan Yarkac
    Cureus.2023;[Epub]     CrossRef
  • Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study
    Aleksandra Ignjatović, Lazar Stevanović
    Journal of Educational Evaluation for Health Professions.2023; 20: 28.     CrossRef
  • Assessing the Performance of ChatGPT in Medical Biochemistry Using Clinical Case Vignettes: Observational Study
    Krishna Mohan Surapaneni
    JMIR Medical Education.2023; 9: e47191.     CrossRef
  • Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study
    Betzy Clariza Torres-Zegarra, Wagner Rios-Garcia, Alvaro Micael Ñaña-Cordova, Karen Fatima Arteaga-Cisneros, Xiomara Cristina Benavente Chalco, Marina Atena Bustamante Ordoñez, Carlos Jesus Gutierrez Rios, Carlos Alberto Ramos Godoy, Kristell Luisa Teresa
    Journal of Educational Evaluation for Health Professions.2023; 20: 30.     CrossRef
  • ChatGPT’s performance in German OB/GYN exams – paving the way for AI-enhanced medical education and clinical practice
    Maximilian Riedel, Katharina Kaefinger, Antonia Stuehrenberg, Viktoria Ritter, Niklas Amann, Anna Graf, Florian Recker, Evelyn Klein, Marion Kiechle, Fabian Riedel, Bastian Meyer
    Frontiers in Medicine.2023;[Epub]     CrossRef
  • Medical students’ patterns of using ChatGPT as a feedback tool and perceptions of ChatGPT in a Leadership and Communication course in Korea: a cross-sectional study
    Janghee Park
    Journal of Educational Evaluation for Health Professions.2023; 20: 29.     CrossRef
  • From Text to Diagnose: ChatGPT’s Efficacy in Medical Decision-Making
    Yaroslav Mykhalko, Pavlo Kish, Yelyzaveta Rubtsova, Oleksandr Kutsyn, Valentyna Koval
    Wiadomości Lekarskie.2023; 76(11): 2345.     CrossRef
  • Using ChatGPT for Clinical Practice and Medical Education: Cross-Sectional Survey of Medical Students’ and Physicians’ Perceptions
    Pasin Tangadulrat, Supinya Sono, Boonsin Tangtrakulwanich
    JMIR Medical Education.2023; 9: e50658.     CrossRef
  • Below average ChatGPT performance in medical microbiology exam compared to university students
    Malik Sallam, Khaled Al-Salahat
    Frontiers in Education.2023;[Epub]     CrossRef
  • ChatGPT: "To be or not to be" ... in academic research. The human mind's analytical rigor and capacity to discriminate between AI bots' truths and hallucinations
    Aurelian Anghelescu, Ilinca Ciobanu, Constantin Munteanu, Lucia Ana Maria Anghelescu, Gelu Onose
    Balneo and PRM Research Journal.2023; 14(Vol.14, no): 614.     CrossRef
  • ChatGPT Review: A Sophisticated Chatbot Models in Medical & Health-related Teaching and Learning
    Nur Izah Ab Razak, Muhammad Fawwaz Muhammad Yusoff, Rahmita Wirza O.K. Rahmat
    Malaysian Journal of Medicine and Health Sciences.2023; 19(s12): 98.     CrossRef
  • Application of artificial intelligence chatbots, including ChatGPT, in education, scholarly work, programming, and content generation and its prospects: a narrative review
    Tae Won Kim
    Journal of Educational Evaluation for Health Professions.2023; 20: 38.     CrossRef
  • Trends in research on ChatGPT and adoption-related issues discussed in articles: a narrative review
    Sang-Jun Kim
    Science Editing.2023; 11(1): 3.     CrossRef
  • Information amount, accuracy, and relevance of generative artificial intelligence platforms’ answers regarding learning objectives of medical arthropodology evaluated in English and Korean queries in December 2023: a descriptive study
    Hyunju Lee, Soobin Park
    Journal of Educational Evaluation for Health Professions.2023; 20: 39.     CrossRef
  • What will ChatGPT revolutionize in the financial industry?
    Hassnian Ali, Ahmet Faruk Aysan
    Modern Finance.2023; 1(1): 116.     CrossRef
Review
Factors associated with medical students’ scores on the National Licensing Exam in Peru: a systematic review  
Javier Alejandro Flores-Cohaila
J Educ Eval Health Prof. 2022;19:38.   Published online December 29, 2022
DOI: https://doi.org/10.3352/jeehp.2022.19.38
  • 4,790 View
  • 325 Download
  • 1 Web of Science
  • 3 Crossref
Abstract
Purpose
This study aimed to identify factors that have been studied for their associations with National Licensing Examination (ENAM) scores in Peru.
Methods
A search was conducted of literature databases and registers, including EMBASE, SciELO, Web of Science, MEDLINE, Peru’s National Register of Research Work, and Google Scholar. The following key terms were used: “ENAM” and “associated factors.” Studies in English and Spanish were included. The quality of the included studies was evaluated using the Medical Education Research Study Quality Instrument (MERSQI).
Results
In total, 38,500 participants were enrolled across the 12 included studies. Eleven of the 12 studies were cross-sectional, and 1 was a case-control study. Three studies were published in peer-reviewed journals. The mean MERSQI score was 10.33. Better performance on the ENAM was associated with a higher grade point average (GPA) (n=8), an internship setting in EsSalud (n=4), and regular academic status (n=3). Other factors showed associations in various studies, such as medical school, internship setting, age, gender, socioeconomic status, simulation tests, study resources, preparation time, learning styles, study techniques, test anxiety, and self-regulated learning strategies.
Conclusion
Performance on the ENAM is a multifactorial phenomenon. Our model gives students a locus of control over what they can do to improve their scores (e.g., implementing self-regulated learning strategies), and it gives faculty, health policymakers, and managers a framework for improving ENAM scores (e.g., designing remediation programs to raise GPAs and integrating anxiety-management courses into the curriculum).

Citations

Citations to this article as recorded by  
  • Peruvian medical residency selection: a portrayal of scores, distribution, and predictors of 28,872 applicants between 2019 and 2023
    Javier A. Flores-Cohaila, Brayan Miranda-Chavez, Cesar Copaja-Corzo, Xiomara C. Benavente-Chalco, Wagner Rios-García, Vanessa P. Moreno-Ccama, Angel Samanez-Obeso, Marco Rivarola-Hidalgo
    BMC Medical Education.2025;[Epub]     CrossRef
  • Medical Student’s Attitudes towards Implementation of National Licensing Exam (NLE) – A Qualitative Exploratory Study
    Saima Bashir, Rehan Ahmed Khan
    Pakistan Journal of Health Sciences.2024; : 153.     CrossRef
  • Performance of ChatGPT on the Peruvian National Licensing Medical Examination: Cross-Sectional Study
    Javier A Flores-Cohaila, Abigaíl García-Vicente, Sonia F Vizcarra-Jiménez, Janith P De la Cruz-Galán, Jesús D Gutiérrez-Arratia, Blanca Geraldine Quiroga Torres, Alvaro Taype-Rondan
    JMIR Medical Education.2023; 9: e48039.     CrossRef
Research articles
Possibility of independent use of the yes/no Angoff and Hofstee methods for the standard setting of the Korean Medical Licensing Examination written test: a descriptive study  
Do-Hwan Kim, Ye Ji Kang, Hoon-Ki Park
J Educ Eval Health Prof. 2022;19:33.   Published online December 12, 2022
DOI: https://doi.org/10.3352/jeehp.2022.19.33
  • 2,751 View
  • 134 Download
  • 2 Web of Science
  • 2 Crossref
Abstract
Purpose
This study aims to apply the yes/no Angoff and Hofstee methods to actual Korean Medical Licensing Examination (KMLE) 2022 written examination data to estimate cut scores for the written KMLE.
Methods
Fourteen panelists gathered to derive the cut score of the 86th KMLE written examination data using the yes/no Angoff method. The panel reviewed the items individually before the meeting and shared their respective understanding of the minimum-competency physician. The standard setting process was conducted in 5 rounds over a total of 800 minutes. In addition, 2 rounds of the Hofstee method were conducted before starting the standard setting process and after the second round of yes/no Angoff.
Results
For the yes/no Angoff method, the panel’s opinions gradually converged as the rounds progressed, yielding a final cut score of 198 points and a passing rate of 95.1%. The Hofstee method yielded a cut score of 208 points out of a maximum of 320, with a passing rate of 92.1%, in the first round, and a cut score of 204 points, with a passing rate of 93.3%, in the second round.
Conclusion
The difference between the cut scores obtained through the yes/no Angoff and Hofstee methods did not exceed 2 percentage points, and both fell within the range of cut scores reported in previous studies. With both methods, the differences between panelists decreased as the rounds were repeated. Overall, our findings support the acceptability of the cut scores and the possibility of using either method independently.
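The arithmetic behind a yes/no Angoff cut score is simple enough to sketch. In the following illustration, each panelist marks each item "yes" (1) if a minimally competent examinee would answer it correctly, and the cut score is the sum over items of the proportion of "yes" judgments. The function name and panel ratings are invented for illustration; this does not reproduce the study's actual tooling:

```python
# Sketch of yes/no Angoff scoring (illustrative, not the study's code).
# judgments: one list per panelist of 0/1 (no/yes) ratings per item.

def yes_no_angoff_cut_score(judgments):
    """Return the cut score: expected items correct for a borderline examinee."""
    n_panelists = len(judgments)
    n_items = len(judgments[0])
    cut = 0.0
    for item in range(n_items):
        yes_share = sum(p[item] for p in judgments) / n_panelists
        cut += yes_share  # this item's contribution to the expected score
    return cut

# Hypothetical 3-panelist, 4-item example
ratings = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
print(round(yes_no_angoff_cut_score(ratings), 2))
```

In a real exercise the same computation runs over hundreds of items, and the converged cut score is compared across rounds as the panel discusses discrepant items.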

Citations

Citations to this article as recorded by  
  • Issues in the 3rd year of the COVID-19 pandemic, including computer-based testing, study design, ChatGPT, journal metrics, and appreciation to reviewers
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2023; 20: 5.     CrossRef
  • Presidential address: improving item validity and adopting computer-based testing, clinical skills assessments, artificial intelligence, and virtual reality in health professions licensing examinations in Korea
    Hyunjoo Pai
    Journal of Educational Evaluation for Health Professions.2023; 20: 8.     CrossRef
Equal Z standard-setting method to estimate the minimum number of panelists for a medical school’s objective structured clinical examination in Taiwan: a simulation study  
Ying-Ying Yang, Pin-Hsiang Huang, Ling-Yu Yang, Chia-Chang Huang, Chih-Wei Liu, Shiau-Shian Huang, Chen-Huan Chen, Fa-Yauh Lee, Shou-Yen Kao, Boaz Shulruf
J Educ Eval Health Prof. 2022;19:27.   Published online October 17, 2022
DOI: https://doi.org/10.3352/jeehp.2022.19.27
  • 2,413 View
  • 130 Download
Abstract
Purpose
Undertaking a standard-setting exercise is a common method for setting pass/fail cut scores for high-stakes examinations. The recently introduced equal Z standard-setting method (EZ method) has been found to be a valid and effective alternative to the commonly used Angoff and Hofstee methods and their variants. The current study aimed to estimate the minimum number of panelists required to obtain acceptable and reliable cut scores using the EZ method.
Methods
The primary data were extracted from 31 panelists who used the EZ method to set cut scores for a 12-station final objective structured clinical examination (OSCE) at a medical school in Taiwan. For this study, a new dataset composed of 1,000 random samples of each panel size, ranging from 5 to 25 panelists, was established and analyzed. Analysis of variance was performed to measure differences in the cut scores set by the sampled groups across all panel sizes within each station.
Results
On average, panels of 10 or more experts yielded cut scores with confidence of at least 90%, and panels of 15 experts yielded cut scores with confidence of at least 95%. No significant differences in cut scores associated with panel size were identified for panels of 5 or more experts.
Conclusion
The EZ method was found to be valid and feasible. Less than an hour was required for 12 panelists to assess 12 OSCE stations. Calculating the cut scores required only basic statistical skills.
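The study's resampling logic (drawing many random panels of each size and checking how stable the resulting cut scores are) can be sketched as follows. This is an illustrative simulation under invented panelist data, a simple mean-based cut score, and an assumed tolerance; it does not reproduce the EZ method's internal calculations:

```python
# Illustrative panel-size resampling sketch. For each panel size k, draw
# many random subsamples of panelists and measure how often the subsample
# cut score lands within a tolerance of the full-panel cut score.
import random
import statistics

def confidence_by_panel_size(panelist_cuts, k, n_samples=1000, tol=2.0, seed=0):
    """Fraction of size-k panels whose mean cut score is within tol of the full panel's."""
    rng = random.Random(seed)
    full_cut = statistics.mean(panelist_cuts)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(panelist_cuts, k)  # sampling without replacement
        if abs(statistics.mean(sample) - full_cut) <= tol:
            hits += 1
    return hits / n_samples

# Hypothetical cut scores from 31 panelists for one OSCE station
cuts = [60 + (i % 7) for i in range(31)]
for k in (5, 10, 15):
    print(k, confidence_by_panel_size(cuts, k))
```

Larger panels produce tighter sampling distributions of the cut score, which is why the confidence figures rise with panel size in the abstract's results.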
Acceptability of the 8-case objective structured clinical examination of medical students in Korea using generalizability theory: a reliability study  
Song Yi Park, Sang-Hwa Lee, Min-Jeong Kim, Ki-Hwan Ji, Ji Ho Ryu
J Educ Eval Health Prof. 2022;19:26.   Published online September 8, 2022
DOI: https://doi.org/10.3352/jeehp.2022.19.26
  • 3,263 View
  • 227 Download
  • 1 Web of Science
  • 1 Crossref
Abstract
Purpose
This study investigated whether the reliability was acceptable when the number of cases in the objective structured clinical examination (OSCE) decreased from 12 to 8 using generalizability theory (GT).
Methods
This psychometric study analyzed the OSCE data of 439 fourth-year medical students collected in the Busan and Gyeongnam areas of South Korea from July 12 to 15, 2021. The generalizability study (G-study) considered 3 facets—students (p), cases (c), and items (i)—and used a p×(i:c) design because items were nested within cases. The acceptable generalizability (G) coefficient was set at 0.70. The G-study and decision study (D-study) were performed using G String IV ver. 6.3.8 (Papawork, Hamilton, ON, Canada).
Results
All G coefficients except for July 14 (0.69) were above 0.70. The major sources of variance components (VCs) were items nested in cases (i:c), from 51.34% to 57.70%, and residual error (pi:c), from 39.55% to 43.26%. The proportion of VCs in cases was negligible, ranging from 0% to 2.03%.
Conclusion
Although the number of cases decreased in the 2021 Busan and Gyeongnam OSCE, the reliability remained acceptable. In the D-study, reliability was maintained at 0.70 or higher with more than 21 items per case across 8 cases, or more than 18 items per case across 9 cases. However, according to the G-study, increasing the number of items nested within cases, rather than the number of cases, could further improve reliability. The consortium needs to maintain a case bank with various items to implement a reliable blueprinting combination for the OSCE.
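The D-study projection described above follows from the standard G coefficient formula for a p×(i:c) design, where the relative error variance is var_pc/n_c + var_pic/(n_c·n_i). The variance components below are hypothetical, chosen only to show how the G coefficient responds to the number of cases and items per case; they are not the article's estimates:

```python
# D-study sketch for a p x (i:c) design (hypothetical variance components).
# G = var_p / (var_p + var_pc/n_cases + var_pic/(n_cases * n_items))

def g_coefficient(var_p, var_pc, var_pic, n_cases, n_items):
    """Relative (norm-referenced) G coefficient for a p x (i:c) design."""
    rel_error = var_pc / n_cases + var_pic / (n_cases * n_items)
    return var_p / (var_p + rel_error)

# Hypothetical components: person, person-by-case, person-by-item-in-case
var_p, var_pc, var_pic = 0.045, 0.10, 1.00
for n_cases in (8, 9, 12):
    for n_items in (15, 18, 21):
        g = g_coefficient(var_p, var_pc, var_pic, n_cases, n_items)
        print(n_cases, n_items, round(g, 2))
```

With these illustrative components, 8 cases clear the 0.70 threshold only at the higher items-per-case counts, mirroring the trade-off the abstract describes.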

Citations

Citations to this article as recorded by  
  • Applying the Generalizability Theory to Identify the Sources of Validity Evidence for the Quality of Communication Questionnaire
    Flávia Del Castanhel, Fernanda R. Fonseca, Luciana Bonnassis Burg, Leonardo Maia Nogueira, Getúlio Rodrigues de Oliveira Filho, Suely Grosseman
    American Journal of Hospice and Palliative Medicine®.2024; 41(7): 792.     CrossRef
Possibility of using the yes/no Angoff method as a substitute for the percent Angoff method for estimating the cutoff score of the Korean Medical Licensing Examination: a simulation study  
Janghee Park
J Educ Eval Health Prof. 2022;19:23.   Published online August 31, 2022
DOI: https://doi.org/10.3352/jeehp.2022.19.23
  • 3,533 View
  • 173 Download
  • 2 Web of Science
  • 2 Crossref
Abstract
Purpose
The percent Angoff (PA) method has been recommended as a reliable method to set the cutoff score instead of a fixed cut point of 60% in the Korean Medical Licensing Examination (KMLE). The yes/no Angoff (YNA) method, which is easy for panelists to judge, can be considered as an alternative because the KMLE has many items to evaluate. This study aimed to compare the cutoff score and the reliability depending on whether the PA or the YNA standard-setting method was used in the KMLE.
Methods
The materials were the open-access PA data of the KMLE. The PA data were converted to YNA data in 5 categories, in which the probabilities counted as a “yes” decision by panelists were 50%, 60%, 70%, 80%, and 90%. SPSS was used for descriptive analysis, and G-string was used for the generalizability theory analysis.
Results
The PA method and the YNA method that counted a probability of 60% as “yes” estimated similar cutoff scores. Those cutoff scores were deemed acceptable based on the results of the Hofstee method. The highest reliability coefficients estimated by the generalizability test were from the PA method, followed by the YNA methods with probabilities of 70%, 80%, 60%, and 50% for deciding “yes,” in descending order. The panelists’ specialty was the main source of error variance, and the size of the error was similar regardless of the standard-setting method.
Conclusion
These results showed that the PA method was more reliable than the YNA method for estimating the cutoff score of the KMLE. However, the YNA method with a 60% probability counted as “yes” can also be used as a substitute for the PA method in estimating the cutoff score of the KMLE.
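The conversion the Methods section describes, recoding percent Angoff ratings as "yes" when they reach a chosen probability threshold, can be sketched as follows. The function name and ratings are invented for illustration:

```python
# Sketch of converting percent Angoff (PA) ratings to yes/no Angoff (YNA)
# data at a chosen threshold (e.g., a rating >= 60% becomes "yes").
# Illustrative only; the study used open-access KMLE panel data.

def pa_to_yna_cut(pa_ratings, threshold=0.60):
    """pa_ratings: per-panelist lists of item probabilities in [0, 1].
    Returns the YNA cut score as expected items correct."""
    n_panelists = len(pa_ratings)
    n_items = len(pa_ratings[0])
    cut = 0.0
    for item in range(n_items):
        yes = sum(1 for p in pa_ratings if p[item] >= threshold)
        cut += yes / n_panelists
    return cut

# Hypothetical 2-panelist, 4-item PA ratings
panel = [
    [0.9, 0.55, 0.7, 0.4],
    [0.8, 0.65, 0.5, 0.3],
]
print(round(pa_to_yna_cut(panel), 2))
```

Repeating the computation at thresholds of 0.50 through 0.90 reproduces the 5 YNA categories the study compared against the original PA cut score.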

Citations

Citations to this article as recorded by  
  • Issues in the 3rd year of the COVID-19 pandemic, including computer-based testing, study design, ChatGPT, journal metrics, and appreciation to reviewers
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2023; 20: 5.     CrossRef
  • Possibility of independent use of the yes/no Angoff and Hofstee methods for the standard setting of the Korean Medical Licensing Examination written test: a descriptive study
    Do-Hwan Kim, Ye Ji Kang, Hoon-Ki Park
    Journal of Educational Evaluation for Health Professions.2022; 19: 33.     CrossRef
Comparing the cut score for the borderline group method and borderline regression method with norm-referenced standard setting in an objective structured clinical examination in medical school in Korea  
Song Yi Park, Sang-Hwa Lee, Min-Jeong Kim, Ki-Hwan Ji, Ji Ho Ryu
J Educ Eval Health Prof. 2021;18:25.   Published online September 27, 2021
DOI: https://doi.org/10.3352/jeehp.2021.18.25
  • 6,518 View
  • 316 Download
  • 3 Web of Science
  • 3 Crossref
Abstract
Purpose
Setting standards is critical in the health professions. However, appropriate standard-setting methods are not always applied to set cut scores in performance assessments. The aim of this study was to compare the cut scores obtained when the standard-setting method for an objective structured clinical examination (OSCE) at a medical school was changed from the norm-referenced method to the borderline group method (BGM) and the borderline regression method (BRM).
Methods
This was an exploratory study modeling the implementation of the BGM and BRM. A total of 107 fourth-year medical students attended an OSCE on July 15, 2021, comprising 7 stations with standardized patient (SP) encounters and 1 station for performing skills on a manikin. Thirty-two physician examiners evaluated performance by completing a checklist and global rating scales.
Results
The cut score of the norm-referenced method was lower than that of the BGM (P<0.01) and BRM (P<0.02). There was no significant difference in the cut score between the BGM and BRM (P=0.40). The station with the highest standard deviation and the highest proportion of the borderline group showed the largest cut score difference in standard setting methods.
Conclusion
Cut scores prefixed by the norm-referenced method, without considering station content or examinee performance, can vary with station difficulty and content, affecting the appropriateness of standard-setting decisions. If there is adequate consensus on the criteria for the borderline group, standard setting with the BRM could be applied as a practical and defensible method for determining the cut score of an OSCE.
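For readers unfamiliar with the BRM, the core computation is an ordinary least-squares regression of examinees' checklist scores on examiners' global ratings, with the cut score taken as the predicted checklist score at the borderline global rating. The sketch below uses invented data and a hand-rolled regression; it is illustrative only:

```python
# Borderline regression method (BRM) sketch: regress checklist scores on
# global ratings, then predict the checklist score at the "borderline"
# rating. Data below are hypothetical.

def brm_cut_score(global_ratings, checklist_scores, borderline=2):
    """OLS fit of checklist score on global rating; cut = prediction at borderline."""
    n = len(global_ratings)
    mx = sum(global_ratings) / n
    my = sum(checklist_scores) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(global_ratings, checklist_scores))
    sxx = sum((x - mx) ** 2 for x in global_ratings)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept + slope * borderline

# Global rating scale 1-5 (2 = borderline); checklist scores out of 100
g = [1, 2, 2, 3, 3, 4, 4, 5]
s = [40, 52, 50, 60, 64, 72, 70, 85]
print(round(brm_cut_score(g, s), 1))
```

Unlike the BGM, which averages the checklist scores of only the examinees rated borderline, the BRM uses every examinee's data, which makes it more stable when the borderline group is small.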

Citations

Citations to this article as recorded by  
  • Analyzing the Quality of Objective Structured Clinical Examination in Alborz University of Medical Sciences
    Suleiman Ahmadi, Amin Habibi, Mitra Rahimzadeh, Shahla Bahrami
    Alborz University Medical Journal.2023; 12(4): 485.     CrossRef
  • Possibility of using the yes/no Angoff method as a substitute for the percent Angoff method for estimating the cutoff score of the Korean Medical Licensing Examination: a simulation study
    Janghee Park
    Journal of Educational Evaluation for Health Professions.2022; 19: 23.     CrossRef
  • Newly appointed medical faculty members’ self-evaluation of their educational roles at the Catholic University of Korea College of Medicine in 2020 and 2021: a cross-sectional survey-based study
    Sun Kim, A Ra Cho, Chul Woon Chung
    Journal of Educational Evaluation for Health Professions.2021; 18: 28.     CrossRef
