Research articles
Comparison between GPT-4 and human raters in grading pharmacy students’ exam responses in Malaysia: a cross-sectional study
Wuan Shuen Yap, Pui San Saw, Li Ling Yeap, Shaun Wen Huey Lee, Wei Jin Wong, Ronald Fook Seng Lee
J Educ Eval Health Prof. 2025;22:20. Published online July 28, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.20
Purpose
Manual grading is time-consuming and prone to inconsistencies, prompting the exploration of generative artificial intelligence tools such as GPT-4 to enhance efficiency and reliability. This study investigated GPT-4’s potential in grading pharmacy students’ exam responses, focusing on the impact of optimized prompts. Specifically, it evaluated the alignment between GPT-4 and human raters, assessed GPT-4’s consistency over time, and determined its error rates in grading pharmacy students’ exam responses.
Methods
We conducted a comparative study using past exam responses graded by university-trained raters and by GPT-4. Responses were randomized before evaluation by GPT-4, accessed via a Plus account between April and September 2024. Prompt optimization was performed on 16 responses, followed by evaluation of 3 prompt delivery methods. We then applied the optimized approach across 4 item types. Intraclass correlation coefficients and error analyses were used to assess consistency and agreement between GPT-4 and human ratings.
Results
GPT-4’s ratings aligned reasonably well with human raters, demonstrating moderate to excellent reliability (intraclass correlation coefficient=0.617–0.933), depending on item type and the optimized prompt. When stratified by grade bands, GPT-4 was less consistent in marking high-scoring responses (Z=–5.71–4.62, P<0.001). Overall, despite achieving substantial alignment with human raters in many cases, discrepancies across item types and a tendency to commit basic errors necessitate continued educator involvement to ensure grading accuracy.
Conclusion
With optimized prompts, GPT-4 shows promise as a supportive tool for grading pharmacy students’ exam responses, particularly for objective tasks. However, its limitations—including errors and variability in grading high-scoring responses—require ongoing human oversight. Future research should explore advanced generative artificial intelligence models and broader assessment formats to further enhance grading reliability.
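The agreement statistic named in the Methods, the intraclass correlation coefficient (ICC), can be illustrated with a minimal sketch. The abstract does not state which ICC form the authors used, so the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)) is an assumption here, as are the function name `icc2_1` and the toy scores, which are invented for illustration and are not data from the study.

```python
from statistics import mean


def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per exam response, each row
    holding one score per rater (for example [human_score, gpt4_score]).
    This ICC form is an assumption; the abstract does not specify one.
    """
    n = len(ratings)      # number of responses
    k = len(ratings[0])   # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 responses and 2 raters, no gaps")

    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical scores (not from the study): [human, GPT-4] per response.
identical = [[1, 1], [2, 2], [3, 3]]
offset_by_one = [[1, 2], [2, 3], [3, 4]]
print(icc2_1(identical))       # perfect agreement
print(icc2_1(offset_by_one))   # same ranking, but a constant +1 offset
```

Because this form measures absolute agreement, a rater who ranks responses identically but scores every one a mark higher is still penalized, which is why the second toy example scores below the first.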
Pharmacy students’ perspective on remote flipped classrooms in Malaysia: a qualitative study
Wei Jin Wong, Shaun Wen Huey Lee, Ronald Fook Seng Lee
J Educ Eval Health Prof. 2025;22:2. Published online January 14, 2025
DOI: https://doi.org/10.3352/jeehp.2025.22.2
Purpose
This study aimed to explore pharmacy students’ perceptions of remote flipped classrooms in Malaysia, focusing on their learning experiences and identifying areas for potential improvement to inform future educational strategies.
Methods
A qualitative approach was employed, utilizing inductive thematic analysis. Twenty Bachelor of Pharmacy students (18 women, 2 men; age range, 19–24 years) from Monash University participated in 8 focus group discussions over 2 rounds during the coronavirus disease 2019 pandemic. Participants were recruited via convenience sampling. The focus group discussions, led by experienced academics, were conducted in English via Zoom, recorded, and transcribed for analysis using NVivo. Themes were identified through emergent coding and iterative discussions to ensure thematic saturation.
Results
Five major themes emerged: flexibility, communication, technological challenges, skill-based learning challenges, and time-based effects. Students appreciated the flexibility of accessing and reviewing pre-class materials at their convenience. Increased engagement through anonymous question submission was noted, yet communication difficulties and lack of non-verbal cues in remote workshops were significant drawbacks. Technological issues, such as internet connectivity problems, hindered learning, especially during assessments. Skill-based learning, including lab activities and clinical examinations, faced challenges in remote settings. Additionally, prolonged remote learning led to feelings of isolation, fatigue, and a desire to return to in-person interactions.
Conclusion
Remote flipped classrooms offer flexibility and engagement benefits but present notable challenges related to communication, technology, and skill-based learning. To improve remote education, institutions should integrate robust technological support, enhance communication strategies, and incorporate virtual simulations for practical skills. Balancing asynchronous and synchronous methods while addressing academic success and socioemotional wellness is essential for effective remote learning environments.