Purpose With the coronavirus disease 2019 pandemic, online high-stakes exams have become a viable alternative to in-person testing. This study evaluated the feasibility of computer-based testing (CBT) for medical residency applications in Brazil and its impacts on item quality and applicants’ access compared to paper-based testing.
Methods In 2020, an online CBT was administered at the Ribeirao Preto Clinical Hospital in Brazil. In total, 120 multiple-choice question items were constructed. Two years later, the exam was administered as a paper-based test. Item construction processes were similar for both exams. Difficulty and discrimination indexes, point-biserial coefficients, and Cronbach’s α were computed under classical test theory, and difficulty, discrimination, and guessing parameters were estimated under item response theory. Internet stability for applicants was monitored.
Results In 2020, 4,846 individuals (57.1% female, mean age of 26.64±3.37 years) applied to the residency program, versus 2,196 individuals (55.2% female, mean age of 26.47±3.20 years) in 2022. For CBT, there was an increase of 2,650 applicants (120.7%), albeit with significant differences in demographic characteristics. There was a significant increase in applicants from more distant and lower-income Brazilian regions, such as the North (5.6% vs. 2.7%) and Northeast (16.9% vs. 9.0%). No significant differences were found in difficulty and discrimination indexes, point-biserial coefficients, and Cronbach’s α coefficients between the 2 exams.
Conclusion Online CBT with multiple-choice questions was a viable format for a residency application exam, improving accessibility without compromising exam integrity and quality.
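The classical test theory statistics compared between the two exams (item difficulty as the proportion of correct responses, and discrimination as the point-biserial correlation) can be sketched as follows. This is an illustrative example, not the study's actual analysis code; the toy response matrix is invented.

```python
import numpy as np

def item_statistics(responses: np.ndarray):
    """Classical test theory item statistics.

    responses: binary matrix (examinees x items), 1 = correct.
    Returns the difficulty index (proportion correct) and the
    point-biserial discrimination for each item.
    """
    n_examinees, n_items = responses.shape
    total = responses.sum(axis=1)
    difficulty = responses.mean(axis=0)
    point_biserial = np.empty(n_items)
    for j in range(n_items):
        # Correlate the item score with the total score excluding
        # the item itself (the "rest score"), to avoid inflation.
        rest = total - responses[:, j]
        point_biserial[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return difficulty, point_biserial

# Toy data: 5 examinees, 3 items
resp = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
])
diff, pbis = item_statistics(resp)
```

In practice, items with difficulty indexes near 0 or 1, or with low or negative point-biserial values, would be flagged for review.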
Purpose Computerized adaptive testing (CAT) has been adopted in licensing examinations because it improves the efficiency and accuracy of the tests, as shown in many studies. This simulation study investigated CAT scoring and item selection methods for the Korean Medical Licensing Examination (KMLE).
Methods This study used a post-hoc (real data) simulation design. The item bank used in this study included all items from the January 2017 KMLE. All CAT algorithms for this study were implemented using the ‘catR’ package in R.
Results In terms of accuracy, the Rasch and 2-parametric logistic (PL) models performed better than the 3PL model. The ‘modal a posteriori’ and ‘expected a posteriori’ methods provided more accurate estimates than maximum likelihood estimation or weighted likelihood estimation. Furthermore, maximum posterior weighted information and minimum expected posterior variance performed better than other item selection methods. In terms of efficiency, the Rasch model is recommended to reduce test length.
Conclusion Before implementing live CAT, a simulation study should be performed under varied test conditions, and specific scoring and item selection methods should be predetermined based on the results.
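The ‘expected a posteriori’ (EAP) scoring method that the simulation found most accurate can be illustrated with a minimal Python sketch under the Rasch model. The study itself used the ‘catR’ package in R; the quadrature grid, prior, and item difficulties below are illustrative assumptions.

```python
import numpy as np

def eap_estimate(responses, difficulties, n_quad=61):
    """Expected a posteriori (EAP) ability estimate under the Rasch
    model, using a standard-normal prior on a quadrature grid."""
    theta = np.linspace(-4, 4, n_quad)
    prior = np.exp(-0.5 * theta**2)  # unnormalized N(0, 1) density
    b = np.asarray(difficulties)
    # Rasch probability of a correct response at each grid point
    p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
    resp = np.asarray(responses)
    likelihood = np.prod(np.where(resp[None, :] == 1, p, 1 - p), axis=1)
    posterior = prior * likelihood
    posterior /= posterior.sum()
    return float((theta * posterior).sum())

# An examinee who answers the two easier items correctly and the
# two harder items incorrectly (difficulties are hypothetical)
theta_hat = eap_estimate([1, 1, 0, 0], difficulties=[-1.0, -0.5, 0.5, 1.0])
```

Because the posterior mean always exists, EAP avoids the divergent estimates that maximum likelihood estimation produces for all-correct or all-incorrect response patterns.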
Citations to this article as recorded by
Large-Scale Parallel Cognitive Diagnostic Test Assembly Using A Dual-Stage Differential Evolution-Based Approach Xi Cao, Ying Lin, Dong Liu, Henry Been-Lirn Duh, Jun Zhang IEEE Transactions on Artificial Intelligence.2024; 5(6): 3120. CrossRef
Assessing the Potentials of Computerized Adaptive Testing to Enhance Mathematics and Science Students’ Achievement in Secondary Schools Mary Patrick Uko, I.O. Eluwa, Patrick J. Uko European Journal of Theoretical and Applied Sciences.2024; 2(4): 85. CrossRef
Comparison of real data and simulated data analysis of a stopping rule based on the standard error of measurement in computerized adaptive testing for medical examinations in Korea: a psychometric study Dong Gi Seo, Jeongwook Choi, Jinha Kim Journal of Educational Evaluation for Health Professions.2024; 21: 18. CrossRef
Presidential address: improving item validity and adopting computer-based testing, clinical skills assessments, artificial intelligence, and virtual reality in health professions licensing examinations in Korea Hyunjoo Pai Journal of Educational Evaluation for Health Professions.2023; 20: 8. CrossRef
Developing Computerized Adaptive Testing for a National Health Professionals Exam: An Attempt from Psychometric Simulations Lingling Xu, Zhehan Jiang, Yuting Han, Haiying Liang, Jinying Ouyang Perspectives on Medical Education.2023;[Epub] CrossRef
Optimizing Computer Adaptive Test Performance: A Hybrid Simulation Study to Customize the Administration Rules of the CAT-EyeQ in Macular Edema Patients T. Petra Rausch-Koster, Michiel A. J. Luijten, Frank D. Verbraak, Ger H. M. B. van Rens, Ruth M. A. van Nispen Translational Vision Science & Technology.2022; 11(11): 14. CrossRef
The accuracy and consistency of mastery for each content domain using the Rasch and deterministic inputs, noisy “and” gate diagnostic classification models: a simulation study and a real-world analysis using data from the Korean Medical Licensing Examination Dong Gi Seo, Jae Kum Kim Journal of Educational Evaluation for Health Professions.2021; 18: 15. CrossRef
A Seed Usage Issue on Using catR for Simulation and the Solution Zhongmin Cui Applied Psychological Measurement.2020; 44(5): 409. CrossRef
Linear programming method to construct equated item sets for the implementation of periodical computer-based testing for the Korean Medical Licensing Examination Dong Gi Seo, Myeong Gi Kim, Na Hui Kim, Hye Sook Shin, Hyun Jung Kim Journal of Educational Evaluation for Health Professions.2018; 15: 26. CrossRef
Funding information of the article entitled “Post-hoc simulation study of computerized adaptive testing for the Korean Medical Licensing Examination” Dong Gi Seo, Jeongwook Choi Journal of Educational Evaluation for Health Professions.2018; 15: 27. CrossRef
Updates from 2018: Being indexed in Embase, becoming an affiliated journal of the World Federation for Medical Education, implementing an optional open data policy, adopting principles of transparency and best practice in scholarly publishing, and appreci Sun Huh Journal of Educational Evaluation for Health Professions.2018; 15: 36. CrossRef
Computerized adaptive testing (CAT) greatly improves measurement efficiency in high-stakes testing operations through the selection and administration of test items with the difficulty level that is most relevant to each individual test taker. This paper explains the 3 components of a conventional CAT item selection algorithm: test content balancing, the item selection criterion, and item exposure control. Several noteworthy methodologies underlie each component. The test script method and constrained CAT method are used for test content balancing. Item selection criteria include the maximized Fisher information criterion, the b-matching method, the a-stratification method, the weighted likelihood information criterion, the efficiency balanced information criterion, and the Kullback-Leibler information criterion. The randomesque method, the Sympson-Hetter method, the unconditional and conditional multinomial methods, and the fade-away method are used for item exposure control. Several holistic approaches to CAT use automated test assembly methods, such as the shadow test approach and the weighted deviation model. Item usage and exposure count vary depending on the item selection criterion and exposure control method. Finally, other important factors to consider when determining an appropriate CAT design are the computer resource requirements, the size of item pools, and the test length. The logic of CAT is now being adopted in the field of adaptive learning, which integrates the learning aspect and the (formative) assessment aspect of education into a continuous, individualized learning experience. Therefore, the algorithms and technologies described in this review may be able to help medical health educators and high-stakes test developers to adopt CAT more actively and efficiently.
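As a minimal illustration of the maximized Fisher information criterion named above, the following Python sketch selects the unadministered item that is most informative at the current ability estimate under the 2PL model. The item pool and parameter values are hypothetical, and real implementations would layer content balancing and exposure control on top of this criterion.

```python
import math

def fisher_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability level theta."""
    p = 1 / (1 + math.exp(-a * (theta - b)))
    return a * a * p * (1 - p)

def select_next_item(theta, item_pool, administered):
    """Maximized Fisher information criterion: among unadministered
    items, pick the one with the greatest information at theta."""
    candidates = [i for i in range(len(item_pool)) if i not in administered]
    return max(candidates, key=lambda i: fisher_information(theta, *item_pool[i]))

# Hypothetical pool of (discrimination a, difficulty b) pairs;
# item 3 has already been administered and is excluded.
pool = [(1.0, -2.0), (1.2, 0.1), (0.8, 2.0), (1.5, 0.0)]
next_item = select_next_item(theta=0.0, item_pool=pool, administered={3})
```

Note how the criterion favors items whose difficulty is close to the current estimate and whose discrimination is high, which is exactly why exposure control methods such as Sympson-Hetter are needed to keep a few highly discriminating items from being overused.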
Citations to this article as recorded by
Utilizing Real-Time Test Data to Solve Attenuation Paradox in Computerized Adaptive Testing to Enhance Optimal Design Jyun-Hong Chen, Hsiu-Yi Chao Journal of Educational and Behavioral Statistics.2024; 49(4): 630. CrossRef
A Context-based Question Selection Model to Support the Adaptive Assessment of Learning: A study of online learning assessment in elementary schools in Indonesia Umi Laili Yuhana, Eko Mulyanto Yuniarno, Wenny Rahayu, Eric Pardede Education and Information Technologies.2024; 29(8): 9517. CrossRef
A shortened test is feasible: Evaluating a large-scale multistage adaptive English language assessment Shangchao Min, Kyoungwon Bishop Language Testing.2024; 41(3): 627. CrossRef
Efficiency of PROMIS MCAT Assessments for Orthopaedic Care Michael Bass, Scott Morris, Sheng Zhang Measurement: Interdisciplinary Research and Perspectives.2024; : 1. CrossRef
The Effects of Different Item Selection Methods on Test Information and Test Efficiency in Computer Adaptive Testing Merve ŞAHİN KÜRŞAD Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi.2023; 14(1): 33. CrossRef
Presidential address: improving item validity and adopting computer-based testing, clinical skills assessments, artificial intelligence, and virtual reality in health professions licensing examinations in Korea Hyunjoo Pai Journal of Educational Evaluation for Health Professions.2023; 20: 8. CrossRef
Remote Symptom Monitoring With Ecological Momentary Computerized Adaptive Testing: Pilot Cohort Study of a Platform for Frequent, Low-Burden, and Personalized Patient-Reported Outcome Measures Conrad Harrison, Ryan Trickett, Justin Wormald, Thomas Dobbs, Przemysław Lis, Vesselin Popov, David J Beard, Jeremy Rodrigues Journal of Medical Internet Research.2023; 25: e47179. CrossRef
Evaluating a Computerized Adaptive Testing Version of a Cognitive Ability Test Using a Simulation Study Ioannis Tsaousis, Georgios D. Sideridis, Hannan M. AlGhamdi Journal of Psychoeducational Assessment.2021; 39(8): 954. CrossRef
Developing Multistage Tests Using D-Scoring Method Kyung (Chris) T. Han, Dimiter M. Dimitrov, Faisal Al-Mashary Educational and Psychological Measurement.2019; 79(5): 988. CrossRef
Conducting simulation studies for computerized adaptive testing using SimulCAT: an instructional piece Kyung (Chris) Tyek Han Journal of Educational Evaluation for Health Professions.2018; 15: 20. CrossRef
Updates from 2018: Being indexed in Embase, becoming an affiliated journal of the World Federation for Medical Education, implementing an optional open data policy, adopting principles of transparency and best practice in scholarly publishing, and appreci Sun Huh Journal of Educational Evaluation for Health Professions.2018; 15: 36. CrossRef
Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994. Subsequently, the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This review was written with the goal of introducing the implementation of CAT for medical health licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specification for the final CAT related to the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are as follows: item bank, starting item, item selection rule, scoring procedure, and termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and deploying the computer technology required to build an item bank. In conclusion, in order to ensure more accurate estimations of examinees’ ability, CAT may be a good option for national licensing examinations. Measurement theory can support its implementation for high-stakes examinations.
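The 5 components of the CAT algorithm listed above can be sketched as a minimal loop. This is an illustrative Rasch-based toy, not any operational licensing algorithm: the item bank, the difficulty-matching selection rule, the EAP scoring grid, and the stopping values are all assumptions made for the example.

```python
import numpy as np

def run_cat(bank_b, answer, max_items=5, se_target=0.4):
    """Minimal CAT loop with the 5 components:
    - item bank: bank_b, a list of Rasch item difficulties
    - starting item: difficulty closest to the initial estimate (0.0)
    - item selection rule: difficulty closest to the current estimate
    - scoring procedure: expected a posteriori (EAP) on a grid
    - termination criterion: posterior SE below se_target, or max_items
    `answer(item_index)` returns 1 (correct) or 0 (incorrect)."""
    grid = np.linspace(-4, 4, 81)
    posterior = np.exp(-0.5 * grid**2)       # standard-normal prior
    posterior /= posterior.sum()
    administered, theta = [], 0.0
    while len(administered) < max_items:
        remaining = [i for i in range(len(bank_b)) if i not in administered]
        item = min(remaining, key=lambda i: abs(bank_b[i] - theta))
        administered.append(item)
        p = 1 / (1 + np.exp(-(grid - bank_b[item])))
        posterior *= p if answer(item) else 1 - p
        posterior /= posterior.sum()
        theta = float((grid * posterior).sum())
        se = float(np.sqrt((((grid - theta) ** 2) * posterior).sum()))
        if se < se_target:
            break
    return theta, administered

# Simulated examinee who deterministically answers items with
# difficulty <= 1.0 correctly (a stand-in for a true ability ~1)
bank = [-2.0, -1.0, 0.0, 1.0, 2.0, 0.5, 1.5]
theta_hat, used = run_cat(bank, answer=lambda i: int(bank[i] <= 1.0))
```

Each of the 5 components can be swapped independently, which is why the implementation steps above call for fixing the full specification via simulation before deployment.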
Citations to this article as recorded by
Validation of the cognitive section of the Penn computerized adaptive test for neurocognitive and clinical psychopathology assessment (CAT-CCNB) Akira Di Sandro, Tyler M. Moore, Eirini Zoupou, Kelly P. Kennedy, Katherine C. Lopez, Kosha Ruparel, Lucky J. Njokweni, Sage Rush, Tarlan Daryoush, Olivia Franco, Alesandra Gorgone, Andrew Savino, Paige Didier, Daniel H. Wolf, Monica E. Calkins, J. Cobb S Brain and Cognition.2024; 174: 106117. CrossRef
Comparison of real data and simulated data analysis of a stopping rule based on the standard error of measurement in computerized adaptive testing for medical examinations in Korea: a psychometric study Dong Gi Seo, Jeongwook Choi, Jinha Kim Journal of Educational Evaluation for Health Professions.2024; 21: 18. CrossRef
The current utilization of the patient-reported outcome measurement information system (PROMIS) in isolated or combined total knee arthroplasty populations Puneet Gupta, Natalia Czerwonka, Sohil S. Desai, Alirio J. deMeireles, David P. Trofa, Alexander L. Neuwirth Knee Surgery & Related Research.2023;[Epub] CrossRef
Evaluating a Computerized Adaptive Testing Version of a Cognitive Ability Test Using a Simulation Study Ioannis Tsaousis, Georgios D. Sideridis, Hannan M. AlGhamdi Journal of Psychoeducational Assessment.2021; 39(8): 954. CrossRef
Accuracy and Efficiency of Web-based Assessment Platform (LIVECAT) for Computerized Adaptive Testing Do-Gyeong Kim, Dong-Gi Seo The Journal of Korean Institute of Information Technology.2020; 18(4): 77. CrossRef
Transformaciones en educación médica: innovaciones en la evaluación de los aprendizajes y avances tecnológicos (parte 2) Veronica Luna de la Luz, Patricia González-Flores Investigación en Educación Médica.2020; 9(34): 87. CrossRef
Introduction to the LIVECAT web-based computerized adaptive testing platform Dong Gi Seo, Jeongwook Choi Journal of Educational Evaluation for Health Professions.2020; 17: 27. CrossRef
Computerised adaptive testing accurately predicts CLEFT-Q scores by selecting fewer, more patient-focused questions Conrad J. Harrison, Daan Geerards, Maarten J. Ottenhof, Anne F. Klassen, Karen W.Y. Wong Riff, Marc C. Swan, Andrea L. Pusic, Chris J. Sidey-Gibbons Journal of Plastic, Reconstructive & Aesthetic Surgery.2019; 72(11): 1819. CrossRef
Presidential address: Preparing for permanent test centers and computerized adaptive testing Chang Hwi Kim Journal of Educational Evaluation for Health Professions.2018; 15: 1. CrossRef
Updates from 2018: Being indexed in Embase, becoming an affiliated journal of the World Federation for Medical Education, implementing an optional open data policy, adopting principles of transparency and best practice in scholarly publishing, and appreci Sun Huh Journal of Educational Evaluation for Health Professions.2018; 15: 36. CrossRef
Linear programming method to construct equated item sets for the implementation of periodical computer-based testing for the Korean Medical Licensing Examination Dong Gi Seo, Myeong Gi Kim, Na Hui Kim, Hye Sook Shin, Hyun Jung Kim Journal of Educational Evaluation for Health Professions.2018; 15: 26. CrossRef
The aim of this study was to investigate respondents’ satisfaction with smart device-based testing (SBT), as well as its convenience and advantages, in order to improve its implementation. The survey was conducted among 108 junior medical students at Kyungpook National University School of Medicine, Korea, who took a practice licensing examination using SBT in September 2015. The survey contained 28 items scored using a 5-point Likert scale. The items were divided into the following three categories: satisfaction with SBT administration, convenience of SBT features, and advantages of SBT compared to paper-and-pencil testing or computer-based testing. The reliability of the survey was 0.95. Of the three categories, the convenience of the SBT features received the highest mean (M) score (M= 3.75, standard deviation [SD]= 0.69), while the category of satisfaction with SBT received the lowest (M= 3.13, SD= 1.07). No statistically significant differences across these categories with respect to sex, age, or experience were observed. These results indicate that SBT was practical and effective to take and to administer.
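The reported survey reliability of 0.95 can be illustrated with Cronbach’s α (assuming that is the reliability coefficient used, which the summary does not specify). The sketch below computes α on invented 5-point Likert data; it is not the study’s data or code.

```python
import numpy as np

def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for a respondents-x-items score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total))."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy 5-point Likert responses: 4 respondents x 3 items
survey = np.array([
    [5, 4, 5],
    [3, 3, 4],
    [4, 4, 4],
    [2, 1, 2],
])
alpha = cronbach_alpha(survey)
```

Values of α above roughly 0.9, like the 0.95 reported here, indicate highly consistent responses across the survey items.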
Citations to this article as recorded by
(NON)COMPUTER-ORIENTED TESTING IN HIGHER EDUCATION: VIEWS OF THE PARTICIPANTS OF THE EDUCATIONAL PROCESS ON (IN)CONVENIENCE USING Volodymyr Starosta OPEN EDUCATIONAL E-ENVIRONMENT OF MODERN UNIVERSITY.2024; (16): 173. CrossRef
Survey of dental students’ perception of ubiquitous-based test (UBT) Hyoung Seok Shin, Jae-Hoon Kim The Journal of The Korean Dental Association.2024; 62(5): 270. CrossRef
Development and application of a mobile-based multimedia nursing competency evaluation system for nursing students: A mixed-method randomized controlled study Soyoung Jang, Eunyoung E. Suh Nurse Education in Practice.2022; 64: 103458. CrossRef
Effects of School-Based Exercise Program on Obesity and Physical Fitness of Urban Youth: A Quasi-Experiment Ji Hwan Song, Ho Hyun Song, Sukwon Kim Healthcare.2021; 9(3): 358. CrossRef
Development, Application, and Effectiveness of a Smart Device-based Nursing Competency Evaluation Test Soyoung Jang, Eunyoung E. Suh CIN: Computers, Informatics, Nursing.2021; 39(11): 634. CrossRef
Evaluation of Student Satisfaction with Ubiquitous-Based Tests in Women’s Health Nursing Course Mi-Young An, Yun-Mi Kim Healthcare.2021; 9(12): 1664. CrossRef
How to Deal with the Concept of Authorship and the Approval of an Institutional Review Board When Writing and Editing Journal Articles Sun Huh Laboratory Medicine and Quality Assurance.2020; 42(2): 63. CrossRef
Evaluation of usefulness of smart device-based testing: a survey study of Korean medical students Youngsup Christopher Lee, Oh Young Kwon, Ho Jin Hwang, Seok Hoon Ko Korean Journal of Medical Education.2020; 32(3): 213. CrossRef
Presidential address: Preparing for permanent test centers and computerized adaptive testing Chang Hwi Kim Journal of Educational Evaluation for Health Professions.2018; 15: 1. CrossRef
Journal Metrics of Infection & Chemotherapy and Current Scholarly Journal Publication Issues Sun Huh Infection & Chemotherapy.2018; 50(3): 219. CrossRef
The relationship of examinees’ individual characteristics and perceived acceptability of smart device-based testing to test scores on the practice test of the Korea Emergency Medicine Technician Licensing Examination Eun Young Lim, Mi Kyoung Yim, Sun Huh Journal of Educational Evaluation for Health Professions.2018; 15: 33. CrossRef
A variety of structured assessment tools for use in surgical training have been reported, but extant assessment tools often employ paper-based rating forms. Digital assessment forms for evaluating surgical skills could potentially offer advantages over paper-based forms, especially in complex assessment situations. In this paper, we report on the development of cross-platform digital assessment forms for use with multiple raters in order to facilitate the automatic processing of surgical skills assessments that include structured ratings. The FileMaker 13 platform was used to create a database containing the digital assessment forms, because this software has cross-platform functionality on both desktop computers and handheld devices. The database is hosted online, and the rating forms can therefore also be accessed through most modern web browsers. Cross-platform digital assessment forms were developed for the rating of surgical skills. The database platform used in this study was reasonably priced, intuitive for the user, and flexible. The forms have been provided online as free downloads that may serve as the basis for further development or as inspiration for future efforts. In conclusion, digital assessment forms can be used for the structured rating of surgical skills and have the potential to be especially useful in complex assessment situations with multiple raters, repeated assessments at various times and locations, and situations requiring substantial subsequent data processing or complex score calculations.