JEEHP : Journal of Educational Evaluation for Health Professions


New issue

Volume 22; 2025

Research article
Longitudinal relationships between Korean medical students’ academic performance in medical knowledge and clinical performance examinations: a retrospective longitudinal study
Yulim Kang, Hae Won Kim
J Educ Eval Health Prof. 2025;22:18.   Published online June 10, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.18    [Epub ahead of print]
Purpose
This study investigated the longitudinal relationships between performance on 3 examinations assessing medical knowledge and clinical skills among Korean medical students in the clinical phase. This study addressed the stability of each examination score and the interrelationships among examinations over time.
Methods
A retrospective longitudinal study was conducted at Yonsei University College of Medicine in Korea with a cohort of 112 medical students over 2 years. The students were in their third year in 2022 and progressed to the fourth year in 2023. We obtained comprehensive clinical science examination (CCSE) and progress test (PT) scores 3 times (T1–T3), and clinical performance examination (CPX) scores twice (T1 and T2). Autoregressive cross-lagged models were fitted to analyze their relationships.
Results
For each of the 3 examinations, the score at 1 time point predicted the subsequent score. Regarding cross-lagged effects, the CCSE at T1 predicted PT at T2 (β=0.472, P<0.001) and CCSE at T2 predicted PT at T3 (β=0.527, P<0.001). The CPX at T1 predicted the CCSE at T2 (β=0.163, P=0.006), and the CPX at T2 predicted the CCSE at T3 (β=0.154, P=0.006). The PT at T1 predicted the CPX at T2 (β=0.273, P=0.006).
Conclusion
The study identified each examination’s stability and the complexity of the longitudinal relationships between them. These findings may help predict medical students’ performance on subsequent examinations, potentially informing the provision of necessary student support.
Educational/Faculty development material
Radiotorax.es: a web-based tool for formative self-assessment in chest X-ray interpretation
Verónica Illescas-Megías, Jorge Manuel Maqueda-Pérez, Dolores Domínguez-Pinos, Teodoro Rudolphi Solero, Francisco Sendra-Portero
J Educ Eval Health Prof. 2025;22:17.   Published online June 9, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.17    [Epub ahead of print]
Radiotorax.es is a free, non-profit web-based tool designed to support formative self-assessment in chest X-ray interpretation. This article presents its structure, educational applications, and usage data from 11 years of continuous operation. Users complete interpretation rounds of 20 clinical cases, compare their reports with expert evaluations, and conduct a structured self-assessment. From 2011 to 2022, 14,389 users registered, and 7,726 completed at least one session. Most were medical students (75.8%), followed by residents (15.2%) and practicing physicians (9.0%). The platform has been integrated into undergraduate medical curricula and used in various educational contexts, including tutorials, peer and expert review, and longitudinal tracking. Its flexible design supports self-directed learning, instructor-guided use, and multicenter research. As a freely accessible resource based on real clinical cases, Radiotorax.es provides a scalable, realistic, and well-received training environment that promotes diagnostic skill development, reflection, and educational innovation in radiology education.
Research articles
Performance of large language models on Thailand’s national medical licensing examination: a cross-sectional study
Prut Saowaprut, Romen Samuel Wabina, Junwei Yang, Lertboon Siriwat
J Educ Eval Health Prof. 2025;22:16.   Published online May 12, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.16    [Epub ahead of print]
Purpose
This study aimed to evaluate the feasibility of general-purpose large language models (LLMs) in addressing inequities in medical licensure exam preparation for Thailand’s National Medical Licensing Examination (ThaiNLE), which currently lacks standardized public study materials.
Methods
We assessed 4 multi-modal LLMs (GPT-4, Claude 3 Opus, Gemini 1.0/1.5 Pro) using a 304-question ThaiNLE Step 1 mock examination (10.2% image-based), applying deterministic API configurations and 5 inference repetitions per model. Performance was measured via micro- and macro-accuracy metrics compared against historical passing thresholds.
Results
All models exceeded passing scores, with GPT-4 achieving the highest accuracy (88.9%; 95% confidence interval, 88.7–89.1), surpassing Thailand’s national average by more than 2 standard deviations. Claude 3.5 Sonnet (80.1%) and Gemini 1.5 Pro (72.8%) followed hierarchically. Models demonstrated robustness across 17 of 20 medical domains, but variability was noted in genetics (74.0%) and cardiovascular topics (58.3%). While models demonstrated proficiency with images (Gemini 1.0 Pro: +9.9% vs. text), text-only accuracy remained superior (GPT-4o: 90.0% vs. 82.6%).
Conclusion
General-purpose LLMs show promise as equitable preparatory tools for ThaiNLE Step 1. However, domain-specific knowledge gaps and inconsistent multi-modal integration warrant refinement before clinical deployment.
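The micro- and macro-accuracy metrics reported above can be sketched as follows. This is a minimal illustration, not the study's evaluation code; the domain names and toy responses are assumptions for demonstration only.

```python
from statistics import mean

def micro_macro_accuracy(results):
    """results maps each exam domain to a list of booleans (item correct or not).
    Micro-accuracy pools all items; macro-accuracy averages per-domain
    accuracies, so small domains weigh equally."""
    correct = sum(sum(v) for v in results.values())
    total = sum(len(v) for v in results.values())
    micro = correct / total
    macro = mean(sum(v) / len(v) for v in results.values())
    return micro, macro

# Hypothetical domains and outcomes, not the ThaiNLE data:
demo = {"genetics": [True, True], "cardiovascular": [True, False, False, False]}
print(micro_macro_accuracy(demo))  # (0.5, 0.625)
```

The gap between the two values shows why both are reported: micro-accuracy favors large domains, while macro-accuracy exposes weak domains such as the cardiovascular topics noted in the results.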
Mixed reality versus manikins in basic life support simulation-based training for medical students in France: the mixed reality non-inferiority randomized controlled trial
Sofia Barlocco De La Vega, Evelyne Guerif-Dubreucq, Jebrane Bouaoud, Myriam Awad, Léonard Mathon, Agathe Beauvais, Thomas Olivier, Pierre-Clément Thiébaud, Anne-Laure Philippon
J Educ Eval Health Prof. 2025;22:15.   Published online May 12, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.15    [Epub ahead of print]
Purpose
To compare the effectiveness of mixed reality with traditional manikin-based simulation in basic life support (BLS) training, testing the hypothesis that mixed reality is non-inferior to manikin-based simulation.
Methods
A non-inferiority randomized controlled trial was conducted. Third-year medical students were randomized into 2 groups. The mixed reality group received 32 minutes of individual training using a virtual reality headset and a torso for chest compressions (CC). The manikin group participated in 2 hours of group training consisting of theoretical and practical sessions using a low-fidelity manikin. The primary outcome was the overall BLS performance score, assessed at 1 month through a standardized BLS scenario using a 10-item assessment scale. The quality of CC, student satisfaction, and confidence levels were secondary outcomes and assessed through superiority analyses.
Results
Data from 155 participants were analyzed, with 84 in the mixed reality group and 71 in the manikin group. The mean overall BLS performance score was 6.4 (mixed reality) vs. 6.5 (manikin) (mean difference, –0.1; 95% confidence interval [CI], –0.45 to +∞). CC depth was greater in the manikin group (50.3 mm vs. 46.6 mm; mean difference, –3.7 mm; 95% CI, –6.5 to –0.9), with 61.2% achieving optimal depth compared to 43.8% in the mixed reality group (mean difference, –17.4%; 95% CI, –29.3 to –5.5). Satisfaction was higher in the mixed reality group (4.9/5 vs. 4.7/5 in the manikin group; difference, 0.2; 95% CI, 0.07 to 0.33), as was confidence in performing BLS (3.9/5 vs. 3.6/5; difference, 0.3; 95% CI, 0.11 to 0.58). No other significant differences were observed for secondary outcomes.
Conclusion
Mixed reality is non-inferior to manikin simulation in terms of overall BLS performance score assessed at 1 month.
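The non-inferiority logic used above reduces to a one-sided confidence-interval check and can be sketched as follows. The abstract does not state the non-inferiority margin, so the margin in the example is an assumption for illustration.

```python
def non_inferior(ci_lower, margin):
    """Non-inferiority of (new - standard) on a higher-is-better score is
    declared when the lower bound of the one-sided CI for the mean
    difference lies above -margin."""
    return ci_lower > -margin

# With the abstract's lower CI bound of -0.45 for the score difference and
# an assumed margin of 1 point on the 10-item scale:
print(non_inferior(-0.45, 1.0))  # True
```

Note the asymmetry of the design: the upper bound is +∞ because only the lower bound matters for declaring non-inferiority.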
The effect of strengthening nurse practitioners’ competency in occupational health services for agricultural workers exposed to pesticides in primary care units, Thailand: a before-and-after study  
Napamon Pumsopa, Ann Jirapongsuwan, Surintorn Kalampakorn, Sukhontha Siri
J Educ Eval Health Prof. 2025;22:14.   Published online April 21, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.14
Purpose
This study aimed to evaluate the effect of the Strengthening Nurse Practitioners’ Competency in Occupational Health Service (SNPCOHS) program. It was hypothesized that nurse practitioners (NPs) participating in the program would demonstrate increased competency in providing occupational health services to agricultural workers exposed to pesticides in primary care units (PCUs) compared to their baseline competency and to a comparison group.
Methods
A quasi-experimental study was conducted between August and December 2023. The 4-week intervention included 5 hours of an e-learning program, 3 hours of online discussion, and 2 hours dedicated to completing an assignment. The program was evaluated at 3 time points: pre-intervention, post-intervention (week 4), and follow-up (week 8). Sixty NPs volunteered to participate, with 30 in the experimental group and 30 in the comparison group. Data on demographics, professional attributes, knowledge, skills, and perceived self-efficacy were collected using self-administered questionnaires via Google Forms. Data analysis involved descriptive statistics, independent t-tests, and repeated measures analysis of variance.
Results
The experimental group demonstrated significantly higher mean scores in professional attributes, knowledge, skills, and perceived self-efficacy in providing occupational health services to agricultural workers exposed to pesticides compared to the comparison group at both week 4 and week 8 post-intervention.
Conclusion
The SNPCOHS program is well-suited for self-directed learning for nurses in PCUs, supporting effective occupational health service delivery. It should be disseminated and supported as an e-learning resource for NPs in PCUs (Thai Clinical Trials Registry: TCTR20250115004).
Assessing genetic and genomic literacy concepts among Albanian nursing and midwifery students: a cross-sectional study
Elona Gaxhja, Mitilda Gugu, Angelo Dante, Armelda Teta, Armela Kapaj, Liljana Ramasaco
J Educ Eval Health Prof. 2025;22:13.   Published online April 21, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.13
Purpose
This study aimed to adapt and validate the Albanian version of the Genomic Nursing Concept Inventory (GNCI) and to assess the level of genomic literacy among nursing and midwifery students.
Methods
Data were collected via a monocentric online cross-sectional study using the Albanian version of the GNCI. Participants included first-, second-, and third-year nursing and midwifery students. Demographic data such as age, sex, year level, and prior exposure to genetics were collected. The Kruskal-Wallis, Mann-Whitney U, and chi-square tests were used to compare demographic characteristics and GNCI scores between groups.
Results
Among the 715 participants, most were female (88.5%), with a median age of 19 years. Most respondents (65%) had not taken a genetics course, and 83.5% had not attended any related training. The mean score was 7.49, corresponding to 24.38% correct responses and indicating a difficult scale.
Conclusion
The findings reveal a low foundational knowledge of genetics/genomics among future nurses and midwives. It is essential to enhance learning strategies and update curricula to prepare a competent healthcare workforce in precision health.
Evaluation of a virtual objective structured clinical examination in the metaverse (Second Life) to assess the clinical skills in emergency radiology of medical students in Spain: a cross-sectional study  
Alba Virtudes Perez-Baena, Teodoro Rudolphi-Solero, Rocio Lorenzo-Alvarez, Dolores Dominguez-Pinos, Miguel Jose Ruiz-Gomez, Francisco Sendra-Portero
J Educ Eval Health Prof. 2025;22:12.   Published online April 21, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.12
Purpose
The objective structured clinical examination (OSCE) is an effective but resource-intensive tool for assessing clinical competence. This study hypothesized that implementing a virtual OSCE on the Second Life (SL) platform in the metaverse, as a cost-effective alternative, would effectively assess and enhance clinical skills in emergency radiology while being feasible and well-received. The aim was to evaluate a virtual radiology OSCE in SL as a formative assessment, focusing on feasibility, educational impact, and students’ perceptions.
Methods
Two virtual 6-station OSCE rooms dedicated to emergency radiology were developed in SL. Sixth-year medical students completed the OSCE during a 1-hour session in 2022–2023, followed by feedback including a correction checklist, individual scores, and group comparisons. Students completed a questionnaire with Likert-scale questions, a 10-point rating, and open-ended comments. Quantitative data were analyzed using the Student t-test and the Mann-Whitney U test, and qualitative data through thematic analysis.
Results
In total, 163 students participated, achieving mean scores of 5.1±1.4 and 4.9±1.3 (out of 10) in the 2 virtual OSCE rooms, respectively (P=0.287). One hundred seventeen students evaluated the OSCE, praising the teaching staff (9.3±1.0), project organization (8.8±1.2), OSCE environment (8.7±1.5), training usefulness (8.6±1.5), and formative self-assessment (8.5±1.4). Likert-scale questions and students’ open-ended comments highlighted the virtual environment’s attractiveness, case selection, self-evaluation usefulness, project excellence, and training impact. Technical difficulties were reported by 13 students (8%).
Conclusion
This study demonstrated the feasibility of incorporating formative OSCEs in SL as a useful teaching tool for undergraduate radiology education, which was cost-effective and highly valued by students.
A nationwide survey on the curriculum and educational resources related to the Clinical Skills Test of the Korean Medical Licensing Examination: a cross-sectional descriptive study  
Eun-Kyung Chung, Seok Hoon Kang, Do-Hoon Kim, MinJeong Kim, Ji-Hyun Seo, Keunmi Lee, Eui-Ryoung Han
J Educ Eval Health Prof. 2025;22:11.   Published online March 13, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.11
Purpose
The revised Clinical Skills Test (CST) of the Korean Medical Licensing Exam aims to provide a better assessment of physicians’ clinical competence and ability to interact with patients. This study examined the impact of the revised CST on medical education curricula and resources nationwide, while also identifying areas for improvement within the revised CST.
Methods
This study surveyed faculty responsible for clinical clerkships at 40 medical schools throughout Korea to evaluate the status and changes in clinical skills education, assessment, and resources related to the CST. The researchers distributed the survey via email through regional consortia between December 7, 2023 and January 19, 2024.
Results
Nearly all schools implemented preliminary student–patient encounters during core clinical rotations. Schools primarily conducted clinical skills assessments in the third and fourth years, with a simplified form introduced in the first and second years. Remedial education was conducted through various methods, including one-on-one feedback from faculty after the assessment. All schools established clinical skills centers and made ongoing improvements. Faculty members did not perceive the CST revisions as significantly altering clinical clerkship or skills assessments. They suggested several improvements, including assessing patient records to improve accuracy and increasing the objectivity of standardized patient assessments to ensure fairness.
Conclusion
During the CST, students’ involvement in patient encounters and clinical skills education increased, improving the assessment and feedback processes for clinical skills within the curriculum. To enhance students’ clinical competencies and readiness, strengthening the validity and reliability of the CST is essential.
Correlation between a motion analysis method and Global Operative Assessment of Laparoscopic Skills for assessing interns’ performance in a simulated peg transfer task in Jordan: a validation study  
Esraa Saleh Abdelall, Shadi Mohammad Hamouri, Abdallah Fawaz Al Dwairi, Omar Mefleh Al- Araidah
J Educ Eval Health Prof. 2025;22:10.   Published online March 6, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.10
Purpose
This study aimed to validate the use of ProAnalyst (Xcitex Inc.), professional motion analysis software, to assess the performance of surgical interns on the peg transfer task in a box simulator, enabling safe practice for real minimally invasive surgery.
Methods
A correlation study was conducted in a multidisciplinary skills simulation lab at the Faculty of Medicine, Jordan University of Science and Technology from October 2019 to February 2020. Forty-one interns (novices and intermediates) were recruited, and an expert surgeon participated as a reference benchmark. Videos of participants’ performance were analyzed with ProAnalyst and with the Global Operative Assessment of Laparoscopic Skills (GOALS). The two sets of results were then analyzed for correlation.
Results
The motion analysis scores from ProAnalyst correlated with GOALS scores for both novices (r=–0.62925, P=0.009) and intermediates (r=–0.53422, P=0.033). Both assessment methods differentiated participants’ performance by experience level.
Conclusion
The motion analysis scoring method using ProAnalyst provides an objective, time-efficient, and reproducible assessment of interns’ performance that is comparable to GOALS. It may require initial training and set-up; however, it eliminates the need for expert surgeon judgment.
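The correlation reported in this study can be computed as below. The abstract does not name the coefficient, so a Pearson product-moment r is assumed here, and the score lists are illustrative values rather than the study's data.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Illustrative only: a motion metric that penalizes extra path length falls
# as the (higher-is-better) GOALS rating rises, giving a negative r as in
# the study's results.
print(round(pearson_r([10, 8, 6, 4], [2, 3, 4, 5]), 3))  # -1.0
```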
Correspondence
Accuracy of ChatGPT in answering cardiology board-style questions
Albert Andrew
J Educ Eval Health Prof. 2025;22:9.   Published online February 27, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.9
Research article
Simulation-based teaching versus traditional small group teaching for first-year medical students among high and low scorers in respiratory physiology, India: a randomized controlled trial  
Nalini Yelahanka Channegowda, Dinker Ramanand Pai, Shivasakthy Manivasakan
J Educ Eval Health Prof. 2025;22:8.   Published online February 21, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.8
Purpose
Although simulation-based education (SBE) is widely used for skills training in clinical subjects, its use for teaching basic science concepts to phase I (pre-clinical) medical students is limited. SBE is preferred in cardiovascular and respiratory physiology over other systems because both normal physiology and its alterations are easy to recreate in a simulated environment, thus promoting a deep understanding of the core concepts.
Methods
A block randomized study was conducted among 107 phase I (first-year) medical undergraduate students at a Deemed to be University in India. Group A received SBE, and Group B received traditional small group teaching. The effectiveness of the teaching intervention was assessed using pre- and post-tests. Student feedback was obtained through a self-administered structured questionnaire delivered as an anonymous online survey, and through in-depth interviews.
Results
The intervention group showed a statistically significant improvement in post-test scores compared to the control group. A sub-analysis revealed that high scorers performed better than low scorers in both groups, but the knowledge gain among low scorers was more significant in the intervention group.
Conclusion
This teaching strategy offers a valuable supplement to traditional methods, fostering a deeper comprehension of clinical concepts from the outset of medical training.

Cited by (Crossref):
  • Dumaguing AG, Alava WG Jr. Simulation and Augmented Reality on Academic Performance and Engagement in Grade 11 Earth and Life Science. International Journal of Innovative Science and Research Technology. 2025:2817.
Research articles
Empirical effect of the Dr LEE Jong-wook Fellowship Program to empower sustainable change for the health workforce in Tanzania: a mixed-methods study  
Masoud Dauda, Swabaha Aidarus Yusuph, Harouni Yasini, Issa Mmbaga, Perpetua Mwambinngu, Hansol Park, Gyeongbae Seo, Kyoung Kyun Oh
J Educ Eval Health Prof. 2025;22:6.   Published online January 20, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.6
Purpose
This study evaluated the Dr LEE Jong-wook Fellowship Program’s impact on Tanzania’s health workforce, focusing on relevance, effectiveness, efficiency, impact, and sustainability in addressing healthcare gaps.
Methods
A mixed-methods research design was employed. Data were collected from 97 out of 140 alumni through an online survey, 35 in-depth interviews, and one focus group discussion. The study was conducted from November to December 2023 and included alumni from 2009 to 2022. Measurement instruments included structured questionnaires for quantitative data and semi-structured guides for qualitative data. Quantitative analysis involved descriptive and inferential statistics (Spearman’s rank correlation, non-parametric tests) using Python ver. 3.11.0 and Stata ver. 14.0. Thematic analysis was employed to analyze qualitative data using NVivo ver. 12.0.
Results
Findings indicated high relevance (mean=91.6, standard deviation [SD]=8.6), effectiveness (mean=86.1, SD=11.2), efficiency (mean=82.7, SD=10.2), and impact (mean=87.7, SD=9.9), with improved skills, confidence, and institutional service quality. However, sustainability had a lower score (mean=58.0, SD=11.1), reflecting challenges in follow-up support and resource allocation. Effectiveness strongly correlated with impact (ρ=0.746, P<0.001). The qualitative findings revealed that participants valued tailored training but highlighted barriers, such as language challenges and insufficient practical components. Alumni-led initiatives contributed to knowledge sharing, but limited resources constrained sustainability.
Conclusion
The Fellowship Program enhanced Tanzania’s health workforce capacity, but it requires localized curricula and strengthened alumni networks for sustainability. These findings provide actionable insights for improving similar programs globally, confirming the hypothesis that tailored training positively influences workforce and institutional outcomes.
Reliability and construct validation of the Blended Learning Usability Evaluation–Questionnaire with interprofessional clinicians in Canada: a methodological study  
Anish Kumar Arora, Jeff Myers, Tavis Apramian, Kulamakan Kulasegaram, Daryl Bainbridge, Hsien Seow
J Educ Eval Health Prof. 2025;22:5.   Published online January 16, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.5
Purpose
To estimate reliability (Cronbach’s α) and gather further mixed-methods construct validity evidence for the Blended Learning Usability Evaluation–Questionnaire (BLUE-Q).
Methods
Forty interprofessional clinicians completed the BLUE-Q after finishing a 3-month-long blended learning professional development program in Ontario, Canada. Reliability was assessed with Cronbach’s α for each of the 3 sections of the BLUE-Q and for all quantitative items together. Construct validity was evaluated through the Grand-Guillaume-Perrenoud et al. framework, which consists of 3 elements: congruence, convergence, and credibility. To compare quantitative and qualitative results, descriptive statistics, including means and standard deviations for each Likert scale item of the BLUE-Q, were calculated.
Results
Cronbach’s α was 0.95 for the pedagogical usability section, 0.85 for the synchronous modality section, 0.93 for the asynchronous modality section, and 0.96 for all quantitative items together. Mean ratings (with standard deviations) were 4.77 (0.506) for pedagogy, 4.64 (0.654) for synchronous learning, and 4.75 (0.536) for asynchronous learning. Of the 239 qualitative comments received, 178 were identified as substantive, of which 88% were considered congruent and 79% were considered convergent with the high means. Among all congruent responses, 69% were considered confirming statements and 31% were considered clarifying statements, suggesting appropriate credibility. Analysis of the clarifying statements assisted in identifying 5 categories of suggestions for program improvement.
Conclusion
The BLUE-Q demonstrates high reliability and appropriate construct validity in the context of a blended learning program with interprofessional clinicians, making it a valuable tool for comprehensive program evaluation, quality improvement, and evaluative research in health professions education.
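Cronbach’s α as reported above can be sketched as follows. This is a minimal illustration of the formula, assuming sample variances; the toy Likert responses are not the BLUE-Q data.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one row per respondent, one column per questionnaire item.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(items[0])
    item_vars = [variance([row[i] for row in items]) for i in range(k)]
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Toy 5-point Likert responses (4 respondents x 3 items), illustrative only;
# perfectly consistent items give the maximum alpha of 1.0:
demo = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
print(cronbach_alpha(demo))  # 1.0
```

Values such as the 0.95 and 0.96 reported for the BLUE-Q sections indicate that respondents answered the items within each section very consistently.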
Educational/Faculty development material
The role of large language models in the peer-review process: opportunities and challenges for medical journal reviewers and editors  
Jisoo Lee, Jieun Lee, Jeong-Ju Yoo
J Educ Eval Health Prof. 2025;22:4.   Published online January 16, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.4
The peer review process ensures the integrity of scientific research. This is particularly important in the medical field, where research findings directly impact patient care. However, the rapid growth of publications has strained reviewers, causing delays and potential declines in quality. Generative artificial intelligence, especially large language models (LLMs) such as ChatGPT, may assist researchers with efficient, high-quality reviews. This review explores the integration of LLMs into peer review, highlighting their strengths in linguistic tasks and challenges in assessing scientific validity, particularly in clinical medicine. Key points for integration include initial screening, reviewer matching, feedback support, and language review. However, implementing LLMs for these purposes will necessitate addressing biases, privacy concerns, and data confidentiality. We recommend using LLMs as complementary tools under clear guidelines to support, not replace, human expertise in maintaining rigorous peer review standards.
