JEEHP : Journal of Educational Evaluation for Health Professions

3 results for "Biostatistics"
Research articles
Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study  
Aleksandra Ignjatović, Lazar Stevanović
J Educ Eval Health Prof. 2023;20:28.   Published online October 16, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.28
  • 4,662 Views
  • 233 Downloads
  • 13 Web of Science citations
  • 13 Crossref citations
Purpose
This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool in solving biostatistical problems and to identify any potential drawbacks that might arise from using ChatGPT in medical education, particularly in solving practical biostatistical problems.
Methods
In this descriptive study, ChatGPT was tested to evaluate its ability to solve biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock. Tables from the problems were transformed into textual questions. Ten biostatistical problems were randomly chosen and used as text-based input for conversation with ChatGPT (versions 3.5 and 4).
Results
GPT-3.5 solved 5 practical problems on the first attempt, related to categorical data, a cross-sectional study, measuring reliability, probability properties, and the t-test. Within 3 attempts, it failed to provide correct answers on analysis of variance, the chi-square test, and sample size. GPT-4 additionally solved a task related to the confidence interval on the first attempt and, with precise guidance and monitoring, solved all questions within 3 attempts.
Conclusion
Assessing both versions of ChatGPT on 10 biostatistical problems revealed below-average performance, with correct response rates of 5 and 6 out of 10 on the first attempt for GPT-3.5 and GPT-4, respectively. GPT-4 provided all correct answers within 3 attempts. These findings indicate that ChatGPT can be wrong even when performing and explaining statistical analyses; students should be aware of its limitations and exercise caution when incorporating this model into medical education.
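The caution above about verifying ChatGPT's calculations can be made concrete: one of the tasks GPT-3.5 solved involved the t-test, and a reported t statistic is easy to recompute independently. A minimal sketch (the data below are hypothetical illustration, not from the study's problems):

```python
# Recompute a two-sample (Student's) t statistic by hand to verify a
# chatbot-reported result. The groups are hypothetical example data.
from statistics import mean, variance

def two_sample_t(a, b):
    """Student's t statistic for two independent samples (equal variances)."""
    na, nb = len(a), len(b)
    # Pooled variance: sample variances weighted by degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

group_a = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3]
group_b = [4.2, 4.5, 4.0, 4.7, 4.3, 4.1]
print(f"t = {two_sample_t(group_a, group_b):.3f}")
```

If the value a chatbot reports disagrees with such a recomputation, the chatbot's answer should not be trusted without further checking.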

Citations to this article, as recorded by Crossref:
  • From statistics to deep learning: Using large language models in psychiatric research
    Yining Hua, Andrew Beam, Lori B. Chibnik, John Torous
    International Journal of Methods in Psychiatric Research. 2025;[Epub]
  • Assessing the Current Limitations of Large Language Models in Advancing Health Care Education
    JaeYong Kim, Bathri Narayan Vajravelu
    JMIR Formative Research. 2025;9:e51319.
  • ChatGPT for Univariate Statistics: Validation of AI-Assisted Data Analysis in Healthcare Research
    Michael R Ruta, Tony Gaidici, Chase Irwin, Jonathan Lifshitz
    Journal of Medical Internet Research. 2025;27:e63550.
  • Can Generative AI and ChatGPT Outperform Humans on Cognitive-Demanding Problem-Solving Tasks in Science?
    Xiaoming Zhai, Matthew Nyaaba, Wenchao Ma
    Science & Education. 2024;[Epub]
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions. 2024;21:6.
  • Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom’s Taxonomy
    Ambadasu Bharatha, Nkemcho Ojeh, Ahbab Mohammad Fazle Rabbi, Michael Campbell, Kandamaran Krishnamurthy, Rhaheem Layne-Yarde, Alok Kumar, Dale Springer, Kenneth Connell, Md Anwarul Majumder
    Advances in Medical Education and Practice. 2024;15:393.
  • Revolutionizing Cardiology With Words: Unveiling the Impact of Large Language Models in Medical Science Writing
    Abhijit Bhattaru, Naveena Yanamala, Partho P. Sengupta
    Canadian Journal of Cardiology. 2024;40(10):1950.
  • ChatGPT in medicine: prospects and challenges: a review article
    Songtao Tan, Xin Xin, Di Wu
    International Journal of Surgery. 2024;110(6):3701.
  • In-depth analysis of ChatGPT’s performance based on specific signaling words and phrases in the question stem of 2377 USMLE step 1 style questions
    Leonard Knoedler, Samuel Knoedler, Cosima C. Hoch, Lukas Prantl, Konstantin Frank, Laura Soiderer, Sebastian Cotofana, Amir H. Dorafshar, Thilo Schenck, Felix Vollbach, Giuseppe Sofo, Michael Alfertshofer
    Scientific Reports. 2024;[Epub]
  • Evaluating the quality of responses generated by ChatGPT
    Danimir Mandić, Gordana Miščević, Ljiljana Bujišić
    Metodicka praksa. 2024;27(1):5.
  • A Comparative Evaluation of Statistical Product and Service Solutions (SPSS) and ChatGPT-4 in Statistical Analyses
    Al Imran Shahrul, Alizae Marny F Syed Mohamed
    Cureus. 2024;[Epub]
  • ChatGPT and Other Large Language Models in Medical Education — Scoping Literature Review
    Alexandra Aster, Matthias Carl Laupichler, Tamina Rockwell-Kollmann, Gilda Masala, Ebru Bala, Tobias Raupach
    Medical Science Educator. 2024;[Epub]
  • Exploring the potential of large language models for integration into an academic statistical consulting service–the EXPOLS study protocol
    Urs Alexander Fichtner, Jochen Knaus, Erika Graf, Georg Koch, Jörg Sahlmann, Dominikus Stelzer, Martin Wolkewitz, Harald Binder, Susanne Weber, Bekalu Tadesse Moges
    PLOS ONE. 2024;19(12):e0308375.
Training in statistical analysis reduces the framing effect among medical students and residents in Argentina  
Raúl Alfredo Borracci, Eduardo Benigno Arribalzaga, Jorge Thierer
J Educ Eval Health Prof. 2020;17:25.   Published online September 1, 2020
DOI: https://doi.org/10.3352/jeehp.2020.17.25
  • 5,852 Views
  • 140 Downloads
  • 1 Web of Science citation
  • 1 Crossref citation
Purpose
The framing effect is a phenomenon in which people's decisions change significantly when the same problem is presented using different representations of the information. This study aimed to explore whether the framing effect could be reduced in medical students and residents by teaching them the statistical concepts of effect size, probability, and sampling for use in the medical decision-making process.
Methods
Ninety-five second-year medical students and 100 second-year medical residents of Austral University and Buenos Aires University, Argentina, were invited to participate in the study between March and June 2017. A questionnaire was developed to assess the different types of framing effects in medical situations. After an initial administration of the survey, students and residents were taught statistical concepts, including effect size, probability, and sampling, during 2 separate official biostatistics courses. After these interventions, the same questionnaire was randomly administered again, and pre- and post-intervention outcomes were compared among students and residents.
Results
Almost every type of framing effect was reproduced either in the students or in the residents. After teaching medical students and residents the analytical process behind statistical concepts, a significant reduction in sample-size, risky-choice, pseudo-certainty, number-size, attribute, goal, and probabilistic formulation framing effects was observed.
Conclusion
The decision-making of medical students and residents in simulated medical situations may be affected by different frame descriptions, and these framing effects can be partially reduced by training individuals in probability analysis and statistical sampling methods.
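The pre/post design described in the Methods is, at heart, a paired comparison of the same respondents before and after training. One common way to test such paired binary outcomes is McNemar's test; the sketch below uses hypothetical data and is only an illustration of that general approach, not the study's actual analysis:

```python
# McNemar's chi-square (continuity-corrected) for paired pre/post binary
# outcomes, where 1 = respondent showed the framing effect on an item.
# All data here are hypothetical.
pre  = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1]
post = [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]

# Only discordant pairs (changed answers) carry information.
b = sum(1 for p, q in zip(pre, post) if p == 1 and q == 0)  # effect disappeared
c = sum(1 for p, q in zip(pre, post) if p == 0 and q == 1)  # effect appeared
chi2 = (abs(b - c) - 1) ** 2 / (b + c)
print(f"discordant pairs: b={b}, c={c}, chi2 = {chi2:.2f}")
```

A chi-square value above 3.84 (1 degree of freedom) would indicate a significant pre-to-post change at the 5% level.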

Citations to this article, as recorded by Crossref:
  • Numeracy Education for Health Care Providers: A Scoping Review
    Casey Goldstein, Nicole Woods, Rebecca MacKinnon, Rouhi Fazelzad, Bhajan Gill, Meredith Elana Giuliani, Tina Papadakos, Qinge Wei, Janet Papadakos
    Journal of Continuing Education in the Health Professions. 2024;44(1):35.
Brief Report
An objective structured biostatistics examination: a pilot study based on computer-assisted evaluation for undergraduates
Abdul Sattar Khan, Hamit Acemoglu, Zekeriya Akturk
J Educ Eval Health Prof. 2012;9:9.   Published online July 17, 2012
DOI: https://doi.org/10.3352/jeehp.2012.9.9
  • 27,624 Views
  • 168 Downloads
  • 1 Crossref citation
We designed and evaluated an objective structured biostatistics examination (OSBE) on a trial basis to determine whether it was feasible for formative or summative assessment. At Ataturk University, the curriculum for every cohort across all 5 years of undergraduate education follows a seminar system. Each seminar integrates different subjects; each year comprises 3 to 6 seminars meeting for 6 to 8 weeks, and at the end of each seminar term an examination is conducted as a formative assessment. In 2010, 201 students took the OSBE, and in 2011, 211 students took the same examination at the end of a seminar that included biostatistics as one module. The examination was conducted in 4 groups, with 2 groups examined together. Each group completed 5 stations arranged in a row; two parallel lines with different instructions ran simultaneously, so 10 students were examined at a time. After the examination, students were invited to receive feedback from the examiners and to provide their reflections. There was a significant difference between male and female scores in 2010 (P=0.004), but no gender difference was found in 2011. Groups A and B did not differ significantly from each other (P>0.05) in either year; however, among the 4 groups there was a significant difference in both 2010 (P=0.001) and 2011 (P=0.001). The inter-rater reliability coefficient was 0.60. Overall, the students were satisfied with the testing method, although they reported some stress. The OSBE experience was useful both for learning and for assessment.

Citations to this article, as recorded by Crossref:
  • The comparison of different assessment techniques used in physiology practical assessment
    Ksh. Lakshmikumari, Sarada N, Lalit Kumar L
    International Journal of Scientific Research. 2022;:7.
