Purpose This study investigated the longitudinal relationships between performance on 3 examinations assessing medical knowledge and clinical skills among Korean medical students in the clinical phase. It addressed the stability of each examination score and the interrelationships among the examinations over time.
Methods A retrospective longitudinal study was conducted at Yonsei University College of Medicine in Korea with a cohort of 112 medical students over 2 years. The students were in their third year in 2022 and progressed to the fourth year in 2023. We obtained comprehensive clinical science examination (CCSE) and progress test (PT) scores 3 times (T1–T3), and clinical performance examination (CPX) scores twice (T1 and T2). Autoregressive cross-lagged models were fitted to analyze their relationships.
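As a minimal sketch of the model structure (not the authors' exact specification), a 2-variable autoregressive cross-lagged model for the CCSE and PT at consecutive time points can be written in LaTeX as follows, assuming standardized scores; the full model also includes the CPX:

\begin{align}
\mathrm{CCSE}_{t} &= \beta_{1}\,\mathrm{CCSE}_{t-1} + \gamma_{1}\,\mathrm{PT}_{t-1} + \varepsilon_{t} \\
\mathrm{PT}_{t} &= \beta_{2}\,\mathrm{PT}_{t-1} + \gamma_{2}\,\mathrm{CCSE}_{t-1} + \delta_{t}
\end{align}

Here the β coefficients are the autoregressive (stability) paths and the γ coefficients are the cross-lagged paths of the kind reported in the Results.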
Results For each of the 3 examinations, the score at 1 time point predicted the subsequent score. Regarding cross-lagged effects, the CCSE at T1 predicted PT at T2 (β=0.472, P<0.001) and CCSE at T2 predicted PT at T3 (β=0.527, P<0.001). The CPX at T1 predicted the CCSE at T2 (β=0.163, P=0.006), and the CPX at T2 predicted the CCSE at T3 (β=0.154, P=0.006). The PT at T1 predicted the CPX at T2 (β=0.273, P=0.006).
Conclusion The study identified each examination’s stability and the complexity of the longitudinal relationships between them. These findings may help predict medical students’ performance on subsequent examinations, potentially informing the provision of necessary student support.
Radiotorax.es is a free, non-profit web-based tool designed to support formative self-assessment in chest X-ray interpretation. This article presents its structure, educational applications, and usage data from 11 years of continuous operation. Users complete interpretation rounds of 20 clinical cases, compare their reports with expert evaluations, and conduct a structured self-assessment. From 2011 to 2022, 14,389 users registered, and 7,726 completed at least one session. Most were medical students (75.8%), followed by residents (15.2%) and practicing physicians (9.0%). The platform has been integrated into undergraduate medical curricula and used in various educational contexts, including tutorials, peer and expert review, and longitudinal tracking. Its flexible design supports self-directed learning, instructor-guided use, and multicenter research. As a freely accessible resource based on real clinical cases, Radiotorax.es provides a scalable, realistic, and well-received training environment that promotes diagnostic skill development, reflection, and educational innovation in radiology education.
Purpose This study aimed to evaluate the feasibility of general-purpose large language models (LLMs) in addressing inequities in medical licensure exam preparation for Thailand’s National Medical Licensing Examination (ThaiNLE), which currently lacks standardized public study materials.
Methods We assessed 4 multi-modal LLMs (GPT-4, Claude 3 Opus, Gemini 1.0/1.5 Pro) using a 304-question ThaiNLE Step 1 mock examination (10.2% image-based), applying deterministic API configurations and 5 inference repetitions per model. Performance was measured via micro- and macro-accuracy metrics compared against historical passing thresholds.
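For readers unfamiliar with the two accuracy metrics, the following minimal Python sketch illustrates the distinction; the record format and values are hypothetical and do not represent the study's actual pipeline:

from collections import defaultdict

def micro_macro_accuracy(records):
    """records: (domain, is_correct) pairs pooled over all questions and repetitions."""
    correct, total = 0, 0
    per_domain = defaultdict(lambda: [0, 0])  # domain -> [correct, total]
    for domain, ok in records:
        correct += int(ok)
        total += 1
        per_domain[domain][0] += int(ok)
        per_domain[domain][1] += 1
    micro = correct / total  # each answer weighted equally
    macro = sum(c / n for c, n in per_domain.values()) / len(per_domain)  # each domain weighted equally
    return micro, macro

# Example with 3 invented answers in 2 domains: returns approximately (0.667, 0.75)
print(micro_macro_accuracy([("genetics", True), ("genetics", False), ("cardiology", True)]))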
Results All models exceeded passing scores, with GPT-4 achieving the highest accuracy (88.9%; 95% confidence interval, 88.7–89.1), surpassing Thailand’s national average by more than 2 standard deviations. Claude 3.5 Sonnet (80.1%) and Gemini 1.5 Pro (72.8%) followed in descending order. Models demonstrated robustness across 17 of 20 medical domains, but variability was noted in genetics (74.0%) and cardiovascular topics (58.3%). Although the models handled image-based items competently (Gemini 1.0 Pro: +9.9% vs. text), text-only accuracy remained superior (GPT-4o: 90.0% vs. 82.6%).
Conclusion General-purpose LLMs show promise as equitable preparatory tools for ThaiNLE Step 1. However, domain-specific knowledge gaps and inconsistent multi-modal integration warrant refinement before clinical deployment.
Purpose To compare the effectiveness of mixed reality with traditional manikin-based simulation in basic life support (BLS) training, testing the hypothesis that mixed reality is non-inferior to manikin-based simulation.
Methods A non-inferiority randomized controlled trial was conducted. Third-year medical students were randomized into 2 groups. The mixed reality group received 32 minutes of individual training using a virtual reality headset and a torso for chest compressions (CC). The manikin group participated in 2 hours of group training consisting of theoretical and practical sessions using a low-fidelity manikin. The primary outcome was the overall BLS performance score, assessed at 1 month through a standardized BLS scenario using a 10-item assessment scale. The quality of CC, student satisfaction, and confidence levels were secondary outcomes and assessed through superiority analyses.
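As a hedged illustration of the non-inferiority logic, a one-sided confidence interval for the mean difference can be computed as in the Python sketch below; the standard deviations, margin, and alpha are assumptions chosen for demonstration, not the trial's actual parameters:

from math import sqrt
from scipy.stats import norm

def one_sided_lower_bound(m1, s1, n1, m2, s2, n2, alpha=0.05):
    """Lower bound of the one-sided (1 - alpha) CI for m1 - m2; the upper bound is +inf."""
    diff = m1 - m2
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - norm.ppf(1 - alpha) * se

# Illustrative values only: group means from the Results, SDs and margin invented
lb = one_sided_lower_bound(6.4, 1.1, 84, 6.5, 1.1, 71)
margin = -0.5  # hypothetical non-inferiority margin
print(f"lower bound = {lb:.2f}; non-inferior if {lb:.2f} > {margin}")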
Results Data from 155 participants were analyzed, with 84 in the mixed reality group and 71 in the manikin group. The mean overall BLS performance score was 6.4 (mixed reality) vs. 6.5 (manikin) (mean difference, –0.1; 95% confidence interval [CI], –0.45 to +∞). CC depth was greater in the manikin group (50.3 mm vs. 46.6 mm; mean difference, –3.7 mm; 95% CI, –6.5 to –0.9), with 61.2% achieving optimal depth compared to 43.8% in the mixed reality group (mean difference, –17.4%; 95% CI, –29.3 to –5.5). Satisfaction was higher in the mixed reality group (4.9/5 vs. 4.7/5 in the manikin group; difference, 0.2; 95% CI, 0.07 to 0.33), as was confidence in performing BLS (3.9/5 vs. 3.6/5; difference, 0.3; 95% CI, 0.11 to 0.58). No other significant differences were observed for secondary outcomes.
Conclusion Mixed reality is non-inferior to manikin simulation in terms of overall BLS performance score assessed at 1 month.
Purpose This study aimed to evaluate the effect of the Strengthening Nurse Practitioners’ Competency in Occupational Health Service (SNPCOHS) program. It was hypothesized that nurse practitioners (NPs) participating in the program would demonstrate increased competency in providing occupational health services to agricultural workers exposed to pesticides in primary care units (PCUs) compared to their baseline competency and to a comparison group.
Methods A quasi-experimental study was conducted between August and December 2023. The 4-week intervention included 5 hours of an e-learning program, 3 hours of online discussion, and 2 hours dedicated to completing an assignment. The program was evaluated at 3 time points: pre-intervention, post-intervention (week 4), and follow-up (week 8). Sixty NPs volunteered to participate, with 30 in the experimental group and 30 in the comparison group. Data on demographics, professional attributes, knowledge, skills, and perceived self-efficacy were collected using self-administered questionnaires via Google Forms. Data analysis involved descriptive statistics, independent t-tests, and repeated measures analysis of variance.
Results The experimental group demonstrated significantly higher mean scores in professional attributes, knowledge, skills, and perceived self-efficacy in providing occupational health services to agricultural workers exposed to pesticides compared to the comparison group at both week 4 and week 8 post-intervention.
Conclusion The SNPCOHS program is well-suited for self-directed learning for nurses in PCUs, supporting effective occupational health service delivery. It should be disseminated and supported as an e-learning resource for NPs in PCUs (Thai Clinical Trials Registry: TCTR20250115004).
Purpose This study aimed to adapt and validate the Albanian version of the Genomic Nursing Concept Inventory (GNCI) and to assess the level of genomic literacy among nursing and midwifery students.
Methods Data were collected via a monocentric online cross-sectional study using the Albanian version of the GNCI. Participants included first-, second-, and third-year nursing and midwifery students. Demographic data such as age, sex, year level, and prior exposure to genetics were collected. The Kruskal-Wallis, Mann-Whitney U, and chi-square tests were used to compare demographic characteristics and GNCI scores between groups.
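As a minimal sketch of the kinds of group comparisons named above (scipy-based, with invented scores; not the study's actual data or code), one might compute:

from scipy.stats import kruskal, mannwhitneyu, chi2_contingency

# Hypothetical GNCI scores by year level
year1, year2, year3 = [7, 8, 6, 9], [8, 7, 10, 6], [9, 10, 8, 11]
h, p_kw = kruskal(year1, year2, year3)  # 3 or more independent groups
u, p_mw = mannwhitneyu(year1, year3)  # 2 independent groups
chi2, p_chi, dof, _ = chi2_contingency([[40, 60], [55, 45]])  # hypothetical counts, e.g., prior genetics course by sex
print(p_kw, p_mw, p_chi)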
Results Among the 715 participants, most were female (88.5%) with a median age of 19 years. Most respondents (65%) had not taken a genetics course, and 83.5% had not attended any related training. The mean score was 7.49, corresponding to 24.38% correct responses and indicating a difficult scale for this population.
Conclusion The findings reveal a low foundational knowledge of genetics/genomics among future nurses and midwives. It is essential to enhance learning strategies and update curricula to prepare a competent healthcare workforce in precision health.
Purpose The objective structured clinical examination (OSCE) is an effective but resource-intensive tool for assessing clinical competence. This study hypothesized that a virtual OSCE implemented in the Second Life (SL) platform in the metaverse, as a cost-effective alternative, would effectively assess and enhance clinical skills in emergency radiology while being feasible and well received. The aim was to evaluate a virtual radiology OSCE in SL as a formative assessment, focusing on feasibility, educational impact, and students’ perceptions.
Methods Two virtual 6-station OSCE rooms dedicated to emergency radiology were developed in SL. Sixth-year medical students completed the OSCE during a 1-hour session in 2022–2023, followed by feedback including a correction checklist, individual scores, and group comparisons. Students completed a questionnaire with Likert-scale questions, a 10-point rating, and open-ended comments. Quantitative data were analyzed using the Student t-test and the Mann-Whitney U test, and qualitative data through thematic analysis.
Results In total, 163 students participated, achieving mean scores of 5.1±1.4 and 4.9±1.3 (out of 10) in the 2 virtual OSCE rooms, respectively (P=0.287). One hundred seventeen students evaluated the OSCE, praising the teaching staff (9.3±1.0), project organization (8.8±1.2), OSCE environment (8.7±1.5), training usefulness (8.6±1.5), and formative self-assessment (8.5±1.4). Likert-scale questions and students’ open-ended comments highlighted the virtual environment’s attractiveness, case selection, self-evaluation usefulness, project excellence, and training impact. Technical difficulties were reported by 13 students (8%).
Conclusion This study demonstrated the feasibility of incorporating formative OSCEs in SL as a useful teaching tool for undergraduate radiology education, which was cost-effective and highly valued by students.
Purpose The revised Clinical Skills Test (CST) of the Korean Medical Licensing Exam aims to provide a better assessment of physicians’ clinical competence and ability to interact with patients. This study examined the impact of the revised CST on medical education curricula and resources nationwide, while also identifying areas for improvement within the revised CST.
Methods This study surveyed faculty responsible for clinical clerkships at 40 medical schools throughout Korea to evaluate the status and changes in clinical skills education, assessment, and resources related to the CST. The researchers distributed the survey via email through regional consortia between December 7, 2023 and January 19, 2024.
Results Nearly all schools implemented preliminary student–patient encounters during core clinical rotations. Schools primarily conducted clinical skills assessments in the third and fourth years, with a simplified form introduced in the first and second years. Remedial education was conducted through various methods, including one-on-one feedback from faculty after the assessment. All schools established clinical skills centers and made ongoing improvements. Faculty members did not perceive the CST revisions as significantly altering clinical clerkship or skills assessments. They suggested several improvements, including assessing patient records to improve accuracy and increasing the objectivity of standardized patient assessments to ensure fairness.
Conclusion Following the CST revision, students’ involvement in patient encounters and clinical skills education increased, improving the assessment and feedback processes for clinical skills within the curriculum. To enhance students’ clinical competencies and readiness, strengthening the validity and reliability of the CST is essential.
Purpose This study aimed to validate the use of ProAnalyst (Xcitex Inc.), a professional motion analysis software package, to assess the performance of surgical interns on the peg transfer task in a box simulator, with a view to safe practice in real minimally invasive surgery.
Methods A correlation study was conducted in a multidisciplinary skills simulation lab at the Faculty of Medicine, Jordan University of Science and Technology from October 2019 to February 2020. Forty-one interns (i.e., novices and intermediates) were recruited, and an expert surgeon participated as a reference benchmark. Videos of participants’ performance were analyzed using ProAnalyst and the Global Operative Assessment of Laparoscopic Skills (GOALS), and the 2 sets of results were analyzed for correlation.
Results The motion analysis scores from ProAnalyst were correlated with those from GOALS for novices (r=–0.62925, P=0.009) and intermediates (r=–0.53422, P=0.033). Both assessment methods differentiated the participants’ performance based on their experience level.
Conclusion The motion analysis scoring method with ProAnalyst provides an objective, time-efficient, and reproducible assessment of interns’ performance that is comparable to GOALS. It may require initial training and set-up; however, it eliminates the need for expert surgeon judgment.
Purpose Although simulation-based education (SBE) is widely utilized for skill training in clinical subjects, its use for teaching basic science concepts to phase I (pre-clinical) medical students is limited. SBE is preferred in cardiovascular and respiratory physiology over other systems because both the normal physiological state and its alterations are easy to recreate in a simulated environment, thus promoting a deep understanding of the core concepts.
Methods A block randomized study was conducted among 107 phase I (first-year) medical undergraduate students at a Deemed to be University in India. Group A received SBE, and group B received traditional small group teaching. The effectiveness of the teaching intervention was assessed using pre- and post-tests. Student feedback was obtained through a self-administered structured questionnaire via an anonymous online survey and through in-depth interviews.
Results The intervention group showed a statistically significant improvement in post-test scores compared to the control group. A sub-analysis revealed that high scorers performed better than low scorers in both groups, but the knowledge gain among low scorers was greater in the intervention group.
Conclusion This teaching strategy offers a valuable supplement to traditional methods, fostering a deeper comprehension of clinical concepts from the outset of medical training.
Purpose This study evaluated the Dr LEE Jong-wook Fellowship Program’s impact on Tanzania’s health workforce, focusing on relevance, effectiveness, efficiency, impact, and sustainability in addressing healthcare gaps.
Methods A mixed-methods research design was employed. Data were collected from 97 out of 140 alumni through an online survey, 35 in-depth interviews, and one focus group discussion. The study was conducted from November to December 2023 and included alumni from 2009 to 2022. Measurement instruments included structured questionnaires for quantitative data and semi-structured guides for qualitative data. Quantitative analysis involved descriptive and inferential statistics (Spearman’s rank correlation, non-parametric tests) using Python ver. 3.11.0 and Stata ver. 14.0. Thematic analysis was employed to analyze qualitative data using NVivo ver. 12.0.
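The correlation reported below can be reproduced in form (not in substance) with a short Python snippet using scipy; the ratings here are invented for illustration and are not the study's data:

from scipy.stats import spearmanr

effectiveness = [86, 92, 78, 88, 95, 70, 84]  # hypothetical alumni ratings
impact = [84, 90, 75, 91, 96, 68, 80]
rho, p = spearmanr(effectiveness, impact)
print(f"Spearman's rho = {rho:.3f}, P = {p:.4f}")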
Results Findings indicated high relevance (mean=91.6, standard deviation [SD]=8.6), effectiveness (mean=86.1, SD=11.2), efficiency (mean=82.7, SD=10.2), and impact (mean=87.7, SD=9.9), with improved skills, confidence, and institutional service quality. However, sustainability had a lower score (mean=58.0, SD=11.1), reflecting challenges in follow-up support and resource allocation. Effectiveness strongly correlated with impact (ρ=0.746, P<0.001). The qualitative findings revealed that participants valued tailored training but highlighted barriers, such as language challenges and insufficient practical components. Alumni-led initiatives contributed to knowledge sharing, but limited resources constrained sustainability.
Conclusion The Fellowship Program enhanced Tanzania’s health workforce capacity, but it requires localized curricula and strengthened alumni networks for sustainability. These findings provide actionable insights for improving similar programs globally, confirming the hypothesis that tailored training positively influences workforce and institutional outcomes.
Purpose To generate Cronbach’s alpha reliability evidence and further mixed-methods construct validity evidence for the Blended Learning Usability Evaluation–Questionnaire (BLUE-Q).
Methods Forty interprofessional clinicians completed the BLUE-Q after finishing a 3-month-long blended learning professional development program in Ontario, Canada. Reliability was assessed with Cronbach’s α for each of the 3 sections of the BLUE-Q and for all quantitative items together. Construct validity was evaluated through the framework of Grand-Guillaume-Perrenoud et al., which consists of 3 elements: congruence, convergence, and credibility. To compare quantitative and qualitative results, descriptive statistics, including means and standard deviations for each Likert scale item of the BLUE-Q, were calculated.
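For reference, Cronbach's α for a section can be computed from a respondents-by-items score matrix as in the minimal Python sketch below; the ratings are invented, not the study's data:

import numpy as np

def cronbach_alpha(scores):
    """scores: respondents x items matrix of Likert ratings."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)  # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# 5 hypothetical respondents x 4 items; prints 0.92
x = np.array([[5, 4, 5, 5],
              [4, 4, 4, 5],
              [3, 3, 4, 4],
              [5, 5, 5, 5],
              [4, 3, 4, 4]])
print(round(cronbach_alpha(x), 2))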
Results Cronbach’s α was 0.95 for the pedagogical usability section, 0.85 for the synchronous modality section, 0.93 for the asynchronous modality section, and 0.96 for all quantitative items together. Mean ratings (with standard deviations) were 4.77 (0.506) for pedagogy, 4.64 (0.654) for synchronous learning, and 4.75 (0.536) for asynchronous learning. Of the 239 qualitative comments received, 178 were identified as substantive, of which 88% were considered congruent and 79% were considered convergent with the high means. Among all congruent responses, 69% were considered confirming statements and 31% were considered clarifying statements, suggesting appropriate credibility. Analysis of the clarifying statements assisted in identifying 5 categories of suggestions for program improvement.
Conclusion The BLUE-Q demonstrates high reliability and appropriate construct validity in the context of a blended learning program with interprofessional clinicians, making it a valuable tool for comprehensive program evaluation, quality improvement, and evaluative research in health professions education.
The peer review process ensures the integrity of scientific research. This is particularly important in the medical field, where research findings directly impact patient care. However, the rapid growth of publications has strained reviewers, causing delays and potential declines in quality. Generative artificial intelligence, especially large language models (LLMs) such as ChatGPT, may assist researchers with efficient, high-quality reviews. This review explores the integration of LLMs into peer review, highlighting their strengths in linguistic tasks and challenges in assessing scientific validity, particularly in clinical medicine. Key points for integration include initial screening, reviewer matching, feedback support, and language review. However, implementing LLMs for these purposes will necessitate addressing biases, privacy concerns, and data confidentiality. We recommend using LLMs as complementary tools under clear guidelines to support, not replace, human expertise in maintaining rigorous peer review standards.