JEEHP : Journal of Educational Evaluation for Health Professions

3 results for "Biostatistics"
Research articles
Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study  
Aleksandra Ignjatović, Lazar Stevanović
J Educ Eval Health Prof. 2023;20:28.   Published online October 16, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.28
  • 4,662 Views
  • 233 Downloads
  • 13 Web of Science citations
  • 13 Crossref citations
Purpose
This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool in solving biostatistical problems and to identify any potential drawbacks that might arise from using ChatGPT in medical education, particularly in solving practical biostatistical problems.
Methods
In this descriptive study, ChatGPT was tested to evaluate its ability to solve biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock. Tables from the problems were transformed into textual questions. Ten biostatistical problems were randomly chosen and used as text-based input for conversation with ChatGPT (versions 3.5 and 4).
Results
GPT-3.5 solved 5 practical problems on the first attempt, related to categorical data, a cross-sectional study, measuring reliability, probability properties, and the t-test. Within 3 attempts, it failed to provide correct answers on analysis of variance, the chi-square test, and sample size. GPT-4 additionally solved a task related to the confidence interval on the first attempt and, with precise guidance and monitoring, solved all questions within 3 attempts.
Conclusion
Assessing both versions of ChatGPT on 10 biostatistical problems revealed below-average performance, with correct response rates of 5 and 6 out of 10 on the first attempt for GPT-3.5 and GPT-4, respectively. GPT-4 provided all correct answers within 3 attempts. These findings indicate that ChatGPT can be wrong even when performing and explaining statistical analyses; students should be aware of its limitations and exercise caution when incorporating this model into medical education.
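The caution above about verifying ChatGPT's calculations can be made concrete: one of the tasks GPT-3.5 solved involved the t-test, and a reported t statistic is easy to recompute independently. A minimal sketch (the data below are hypothetical illustration, not from the study's problems):

```python
# Recompute a two-sample (Student's) t statistic by hand to verify a
# chatbot-reported result. The groups are hypothetical example data.
from statistics import mean, variance

def two_sample_t(a, b):
    """Student's t statistic for two independent samples (equal variances)."""
    na, nb = len(a), len(b)
    # Pooled variance: sample variances weighted by degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

group_a = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3]
group_b = [4.2, 4.5, 4.0, 4.7, 4.3, 4.1]
print(f"t = {two_sample_t(group_a, group_b):.3f}")
```

If the value a chatbot reports disagrees with such a recomputation, the chatbot's answer should not be trusted without further checking.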

Citations to this article, as recorded by Crossref:
  • From statistics to deep learning: Using large language models in psychiatric research
    Yining Hua, Andrew Beam, Lori B. Chibnik, John Torous
    International Journal of Methods in Psychiatric Research. 2025;[Epub]
  • Assessing the Current Limitations of Large Language Models in Advancing Health Care Education
    JaeYong Kim, Bathri Narayan Vajravelu
    JMIR Formative Research. 2025;9:e51319.
  • ChatGPT for Univariate Statistics: Validation of AI-Assisted Data Analysis in Healthcare Research
    Michael R Ruta, Tony Gaidici, Chase Irwin, Jonathan Lifshitz
    Journal of Medical Internet Research. 2025;27:e63550.
  • Can Generative AI and ChatGPT Outperform Humans on Cognitive-Demanding Problem-Solving Tasks in Science?
    Xiaoming Zhai, Matthew Nyaaba, Wenchao Ma
    Science & Education. 2024;[Epub]
  • Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
    Xiaojun Xu, Yixiao Chen, Jing Miao
    Journal of Educational Evaluation for Health Professions. 2024;21:6.
  • Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom’s Taxonomy
    Ambadasu Bharatha, Nkemcho Ojeh, Ahbab Mohammad Fazle Rabbi, Michael Campbell, Kandamaran Krishnamurthy, Rhaheem Layne-Yarde, Alok Kumar, Dale Springer, Kenneth Connell, Md Anwarul Majumder
    Advances in Medical Education and Practice. 2024;15:393.
  • Revolutionizing Cardiology With Words: Unveiling the Impact of Large Language Models in Medical Science Writing
    Abhijit Bhattaru, Naveena Yanamala, Partho P. Sengupta
    Canadian Journal of Cardiology. 2024;40(10):1950.
  • ChatGPT in medicine: prospects and challenges: a review article
    Songtao Tan, Xin Xin, Di Wu
    International Journal of Surgery. 2024;110(6):3701.
  • In-depth analysis of ChatGPT’s performance based on specific signaling words and phrases in the question stem of 2377 USMLE step 1 style questions
    Leonard Knoedler, Samuel Knoedler, Cosima C. Hoch, Lukas Prantl, Konstantin Frank, Laura Soiderer, Sebastian Cotofana, Amir H. Dorafshar, Thilo Schenck, Felix Vollbach, Giuseppe Sofo, Michael Alfertshofer
    Scientific Reports. 2024;[Epub]
  • Evaluating the quality of responses generated by ChatGPT
    Danimir Mandić, Gordana Miščević, Ljiljana Bujišić
    Metodicka praksa. 2024;27(1):5.
  • A Comparative Evaluation of Statistical Product and Service Solutions (SPSS) and ChatGPT-4 in Statistical Analyses
    Al Imran Shahrul, Alizae Marny F Syed Mohamed
    Cureus. 2024;[Epub]
  • ChatGPT and Other Large Language Models in Medical Education — Scoping Literature Review
    Alexandra Aster, Matthias Carl Laupichler, Tamina Rockwell-Kollmann, Gilda Masala, Ebru Bala, Tobias Raupach
    Medical Science Educator. 2024;[Epub]
  • Exploring the potential of large language models for integration into an academic statistical consulting service–the EXPOLS study protocol
    Urs Alexander Fichtner, Jochen Knaus, Erika Graf, Georg Koch, Jörg Sahlmann, Dominikus Stelzer, Martin Wolkewitz, Harald Binder, Susanne Weber, Bekalu Tadesse Moges
    PLOS ONE. 2024;19(12):e0308375.
Training in statistical analysis reduces the framing effect among medical students and residents in Argentina  
Raúl Alfredo Borracci, Eduardo Benigno Arribalzaga, Jorge Thierer
J Educ Eval Health Prof. 2020;17:25.   Published online September 1, 2020
DOI: https://doi.org/10.3352/jeehp.2020.17.25
  • 5,852 Views
  • 140 Downloads
  • 1 Web of Science citation
  • 1 Crossref citation
Purpose
The framing effect is a phenomenon in which people's decisions change significantly when the same problem is presented using different representations of the information. This study aimed to explore whether the framing effect could be reduced in medical students and residents by teaching them the statistical concepts of effect size, probability, and sampling for use in the medical decision-making process.
Methods
Ninety-five second-year medical students and 100 second-year medical residents of Austral University and Buenos Aires University, Argentina, were invited to participate in the study between March and June 2017. A questionnaire was developed to assess the different types of framing effects in medical situations. After an initial administration of the survey, students and residents were taught statistical concepts, including effect size, probability, and sampling, during 2 separate official biostatistics courses. After these interventions, the same questionnaire was randomly administered again, and pre- and post-intervention outcomes were compared among students and residents.
Results
Almost every type of framing effect was reproduced either in the students or in the residents. After teaching medical students and residents the analytical process behind statistical concepts, a significant reduction in sample-size, risky-choice, pseudo-certainty, number-size, attribute, goal, and probabilistic formulation framing effects was observed.
Conclusion
The decision-making of medical students and residents in simulated medical situations may be affected by different frame descriptions, and these framing effects can be partially reduced by training individuals in probability analysis and statistical sampling methods.
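The pre/post design described in the Methods is, at heart, a paired comparison of the same respondents before and after training. One common way to test such paired binary outcomes is McNemar's test; the sketch below uses hypothetical data and is only an illustration of that general approach, not the study's actual analysis:

```python
# McNemar's chi-square (continuity-corrected) for paired pre/post binary
# outcomes, where 1 = respondent showed the framing effect on an item.
# All data here are hypothetical.
pre  = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1]
post = [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]

# Only discordant pairs (changed answers) carry information.
b = sum(1 for p, q in zip(pre, post) if p == 1 and q == 0)  # effect disappeared
c = sum(1 for p, q in zip(pre, post) if p == 0 and q == 1)  # effect appeared
chi2 = (abs(b - c) - 1) ** 2 / (b + c)
print(f"discordant pairs: b={b}, c={c}, chi2 = {chi2:.2f}")
```

A chi-square value above 3.84 (1 degree of freedom) would indicate a significant pre-to-post change at the 5% level.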

Citations to this article, as recorded by Crossref:
  • Numeracy Education for Health Care Providers: A Scoping Review
    Casey Goldstein, Nicole Woods, Rebecca MacKinnon, Rouhi Fazelzad, Bhajan Gill, Meredith Elana Giuliani, Tina Papadakos, Qinge Wei, Janet Papadakos
    Journal of Continuing Education in the Health Professions. 2024;44(1):35.
Brief Report
An objective structured biostatistics examination: a pilot study based on computer-assisted evaluation for undergraduates
Abdul Sattar Khan, Hamit Acemoglu, Zekeriya Akturk
J Educ Eval Health Prof. 2012;9:9.   Published online July 17, 2012
DOI: https://doi.org/10.3352/jeehp.2012.9.9
  • 27,624 Views
  • 168 Downloads
  • 1 Crossref citation
We designed and evaluated an objective structured biostatistics examination (OSBE) on a trial basis to determine whether it was feasible for formative or summative assessment. At Ataturk University, the curriculum for every cohort across all 5 years of undergraduate education follows a seminar system. Each seminar integrates different subjects; each year comprises 3 to 6 seminars meeting for 6 to 8 weeks, and at the end of each seminar term an examination is conducted as a formative assessment. In 2010, 201 students took the OSBE, and in 2011, 211 students took the same examination at the end of a seminar that included biostatistics as one module. The examination was conducted in 4 groups, with 2 groups examined together. Each group completed 5 stations arranged in a row; two parallel lines with different instructions ran simultaneously, so 10 students were examined at a time. After the examination, students were invited to receive feedback from the examiners and to provide their reflections. There was a significant difference between male and female scores in 2010 (P=0.004), but no gender difference was found in 2011. Groups A and B did not differ significantly from each other (P>0.05) in either year; however, among the 4 groups there was a significant difference in both 2010 (P=0.001) and 2011 (P=0.001). The inter-rater reliability coefficient was 0.60. Overall, the students were satisfied with the testing method, although they reported some stress. The OSBE experience was useful both for learning and for assessment.

Citations to this article, as recorded by Crossref:
  • The comparison of different assessment techniques used in physiology practical assessment
    Ksh. Lakshmikumari, Sarada N, Lalit Kumar L
    International Journal of Scientific Research. 2022;:7.
