Computerized adaptive testing (CAT) has become a widely adopted test design for high-stakes licensing and certification exams, particularly in the health professions in the United States, due to its ability to tailor test difficulty in real time, reducing testing time while providing precise ability estimates. A key component of CAT is item response theory (IRT), which facilitates the dynamic selection of items based on examinees' ability levels during a test. Accurate estimation of item and ability parameters is essential for successful CAT implementation, necessitating convenient and reliable software to ensure precise parameter estimation. This paper introduces the irtQ R package (http://CRAN.R-project.org/), which simplifies IRT-based analysis and item calibration under unidimensional IRT models. While it does not directly simulate CAT, it provides essential tools to support CAT development, including parameter estimation using marginal maximum likelihood estimation via the expectation-maximization algorithm, pretest item calibration through fixed item parameter calibration and fixed ability parameter calibration methods, and examinee ability estimation. The package also enables users to compute item and test characteristic curves and information functions necessary for evaluating the psychometric properties of a test. This paper illustrates the key features of the irtQ package through examples using simulated datasets, demonstrating its utility in IRT applications such as test data analysis and ability scoring. By providing a user-friendly environment for IRT analysis, irtQ significantly enhances the capacity for efficient adaptive testing research and operations. Finally, the paper highlights additional core functionalities of irtQ, emphasizing its broader applicability to the development and operation of IRT-based assessments.
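As a rough illustration of the workflow described above, the following R sketch simulates 2-parameter logistic response data, calibrates the items with marginal maximum likelihood estimation, and scores examinees with expected a posteriori (EAP) estimation. The function names and arguments shown (shape_df, simdat, est_irt, est_score) are assumptions based on the irtQ documentation and should be checked against the current package manual.

```r
# Sketch of an irtQ workflow: simulate 2PL responses, calibrate with MMLE-EM,
# and score examinees with EAP. Function names and arguments are assumed from
# the irtQ documentation and may differ across package versions.
library(irtQ)

set.seed(123)
n_item <- 30
n_person <- 1000

# True item parameters for a 2PL model
a <- rlnorm(n_item, meanlog = 0, sdlog = 0.3)   # discriminations
b <- rnorm(n_item)                               # difficulties
meta <- shape_df(par.drm = list(a = a, b = b, g = rep(0, n_item)),
                 item.id = paste0("I", 1:n_item), cats = 2, model = "2PLM")

# Simulate dichotomous responses
theta <- rnorm(n_person)
resp <- simdat(x = meta, theta = theta, D = 1)

# Calibrate item parameters via marginal maximum likelihood (EM algorithm)
fit <- est_irt(data = resp, D = 1, model = "2PLM", cats = 2)

# Estimate examinee abilities with EAP scoring using the item metadata
scores <- est_score(x = meta, data = resp, D = 1, method = "EAP")
```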
Purpose This study aimed to compare and evaluate the efficiency and accuracy of computerized adaptive testing (CAT) under 2 stopping rules (standard error of measurement [SEM]=0.30 and 0.25) using both real and simulated data in medical examinations in Korea.
Methods This study employed post-hoc simulation and real data analysis to explore the optimal stopping rule for CAT in medical examinations. The real data were obtained from the responses of 3rd-year medical students during examinations in 2020 at Hallym University College of Medicine. Simulated data were generated in R using parameters estimated from a real item bank. Outcome variables included the number of examinees passing or failing at SEM values of 0.25 and 0.30, the number of items administered, and the correlation between ability estimates. The consistency of the real CAT results was evaluated by examining pass/fail agreement based on a cut score of 0.0. The efficiency of all CAT designs was assessed by comparing the average number of items administered under both stopping rules.
Results Both SEM 0.25 and SEM 0.30 provided a good balance between accuracy and efficiency in CAT. The real data showed minimal differences in pass/fail outcomes between the 2 SEM conditions, with a high correlation (r=0.99) between ability estimates. The simulation results confirmed these findings, indicating similar average numbers of administered items in the real and simulated data.
Conclusion The findings suggest that both SEM 0.25 and 0.30 are effective termination criteria in the context of the Rasch model, balancing accuracy and efficiency in CAT.
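The following base-R sketch (not the study's code) illustrates how an SEM-based stopping rule operates in a Rasch-model CAT: items are selected by maximum information, ability is re-estimated after each response, and testing stops once the standard error falls below the chosen criterion. The item bank, true ability, and responses are simulated purely for illustration.

```r
# Simulated Rasch CAT with an SEM-based termination criterion (illustrative only)
set.seed(1)
bank_b <- rnorm(300)                         # Rasch difficulties in the item bank
true_theta <- 0.4
resp_prob <- function(theta, b) 1 / (1 + exp(-(theta - b)))

run_cat <- function(sem_stop, max_items = 100) {
  administered <- integer(0)
  responses <- integer(0)
  theta <- 0
  sem <- Inf
  while (sem > sem_stop && length(administered) < max_items) {
    # Maximum-information selection: under the Rasch model, the most informative
    # item is the unused item whose difficulty is closest to the current theta
    remaining <- setdiff(seq_along(bank_b), administered)
    nxt <- remaining[which.min(abs(bank_b[remaining] - theta))]
    administered <- c(administered, nxt)
    responses <- c(responses, rbinom(1, 1, resp_prob(true_theta, bank_b[nxt])))
    # Newton-Raphson ML update of theta, bounded to avoid divergence early on
    for (i in 1:20) {
      p <- resp_prob(theta, bank_b[administered])
      theta <- theta + sum(responses - p) / sum(p * (1 - p))
      theta <- max(min(theta, 4), -4)
    }
    p <- resp_prob(theta, bank_b[administered])
    sem <- 1 / sqrt(sum(p * (1 - p)))        # SE = 1 / sqrt(test information)
  }
  c(theta = theta, sem = sem, n_items = length(administered))
}

run_cat(sem_stop = 0.30)   # terminates earlier
run_cat(sem_stop = 0.25)   # needs more items for the tighter criterion
```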
Background In the Iranian context, no 360-degree evaluation tool has been developed to assess the performance of prehospital medical emergency students in clinical settings. This article describes the development of a 360-degree evaluation tool and presents its first psychometric evaluation.
Methods There were 2 steps in this study: step 1 involved developing the instrument (i.e., generating the items) and step 2 constituted the psychometric evaluation of the instrument. We performed exploratory and confirmatory factor analyses and also evaluated the instrument’s face, content, and convergent validity and reliability.
Results The instrument contains 55 items across 6 domains, including leadership, management, and teamwork (19 items), consciousness and responsiveness (14 items), clinical and interpersonal communication skills (8 items), integrity (7 items), knowledge and accountability (4 items), and loyalty and transparency (3 items). The instrument was confirmed to be a valid measure, as the 6 domains had eigenvalues over Kaiser’s criterion of 1 and in combination explained 60.1% of the variance (Bartlett’s test of sphericity: χ²[1,485]=19,867.99, P<0.01). Furthermore, this study provided evidence for the instrument’s convergent validity and internal consistency (α=0.98), suggesting its suitability for assessing student performance.
Conclusion We found good evidence for the validity and reliability of the instrument. Our instrument can be used to make future evaluations of student performance in the clinical setting more structured, transparent, informative, and comparable.
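For readers unfamiliar with the statistics reported above, the base-R sketch below shows how eigenvalues can be checked against Kaiser's criterion, how Bartlett's test of sphericity is computed from the correlation matrix, and how Cronbach's α is obtained from item and total-score variances. The data are simulated and unrelated to the study.

```r
# Kaiser's criterion, Bartlett's test of sphericity, and Cronbach's alpha
# on simulated item data (illustrative only)
set.seed(7)
n <- 400; k <- 10
X <- matrix(rnorm(n * k), n, k) + rnorm(n)   # 10 correlated "items"
R <- cor(X)

# Kaiser's criterion: retain components with eigenvalues greater than 1
eigen_vals <- eigen(R)$values
sum(eigen_vals > 1)

# Bartlett's test of sphericity
chi_sq <- -(n - 1 - (2 * k + 5) / 6) * log(det(R))
df <- k * (k - 1) / 2
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

# Cronbach's alpha from item variances and total-score variance
item_var <- apply(X, 2, var)
alpha <- (k / (k - 1)) * (1 - sum(item_var) / var(rowSums(X)))
```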
Purpose There is limited literature related to the assessment of electronic medical record (EMR)-related competencies. To address this gap, this study explored the feasibility of an EMR objective structured clinical examination (OSCE) station for evaluating medical students’ communication skills, based on psychometric analyses and standardized patients’ (SPs) perspectives on EMR use in an OSCE.
Methods An OSCE station that incorporated the use of an EMR was developed and pilot-tested in March 2020. Students’ communication skills were assessed by SPs and physician examiners. Students’ scores were compared between the EMR station and 9 other stations. A psychometric analysis, including item-total correlations, was done. SPs participated in a post-OSCE focus group to discuss their perception of EMRs’ effect on communication.
Results Ninety-nine 3rd-year medical students participated in a 10-station OSCE that included the EMR station. The EMR station had an acceptable item-total correlation (0.217). Students who leveraged graphical displays in counseling received higher OSCE station scores from the SPs (P=0.041). The thematic analysis of SPs’ perceptions of students’ EMR use from the focus group revealed the following thematic domains: technology, communication, case design, ownership of health information, and timing of EMR usage.
Conclusion This study demonstrated the feasibility of incorporating EMR in assessing learner communication skills in an OSCE. The EMR station had acceptable psychometric characteristics. Some medical students were able to efficiently use the EMRs as an aid in patient counseling. Teaching students how to be patient-centered even in the presence of technology may promote engagement.
Purpose This study aimed to develop a test scale to measure the character qualities of medical students as a follow-up study on the 8 core character qualities revealed in a previous report.
Methods In total, 160 preliminary items were developed to measure 8 core character qualities. Twenty questions were assigned to each quality, and a questionnaire survey was conducted among 856 students in 5 medical schools in Korea. Polytomous item response theory analysis using the partial credit model was carried out to assess goodness of fit, followed by exploratory factor analysis. Finally, confirmatory factor and reliability analyses were conducted with the final selected items.
Results The preliminary items for the 8 core character qualities were administered to the participants. Data from 767 students were included in the final analysis. Of the 160 preliminary items, 25 were removed by classical test theory analysis and 17 more by polytomous item response theory assessment. A total of 118 items and sub-factors were selected for exploratory factor analysis. Finally, 79 items were selected, and the validity and reliability were confirmed through confirmatory factor analysis and intra-item relevance analysis.
Conclusion The character qualities test scale developed through this study can be used to measure the character qualities corresponding to the educational goals and visions of individual medical schools in Korea. Furthermore, this measurement tool can serve as primary data for developing character qualities tools tailored to each medical school’s vision and educational goals.
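The category probabilities of the partial credit model used in the polytomous item response theory analysis above can be written compactly; the base-R sketch below, with made-up step parameters, is illustrative only.

```r
# Partial credit model (PCM): probability of responding in category x (0..m)
# given ability theta and item step parameters delta_1..delta_m
pcm_prob <- function(theta, delta) {
  # Cumulative sums of (theta - delta_k); category 0 has an empty sum of 0
  numerators <- exp(c(0, cumsum(theta - delta)))
  numerators / sum(numerators)
}

# Example: a 5-point Likert item (4 hypothetical step parameters)
pcm_prob(theta = 0.5, delta = c(-1.2, -0.3, 0.4, 1.5))
```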
Purpose This study investigated whether the reliability was acceptable when the number of cases in the objective structured clinical examination (OSCE) decreased from 12 to 8 using generalizability theory (GT).
Methods This psychometric study analyzed the OSCE data of 439 fourth-year medical students conducted in the Busan and Gyeongnam areas of South Korea from July 12 to 15, 2021. The generalizability study (G-study) considered 3 facets—students (p), cases (c), and items (i)—and used a p×(i:c) design because items were nested within cases. The acceptable generalizability (G) coefficient was set to 0.70. The G-study and decision study (D-study) were performed using G String IV ver. 6.3.8 (Papawork, Hamilton, ON, Canada).
Results All G coefficients except for July 14 (0.69) were above 0.70. The major sources of variance components (VCs) were items nested in cases (i:c), from 51.34% to 57.70%, and residual error (pi:c), from 39.55% to 43.26%. The proportion of VCs in cases was negligible, ranging from 0% to 2.03%.
Conclusion Although the number of cases decreased in the 2021 Busan and Gyeongnam OSCE, reliability remained acceptable. In the D-study, reliability was maintained at 0.70 or higher if there were more than 21 items/case in 8 cases and more than 18 items/case in 9 cases. However, according to the G-study, increasing the number of items nested in cases rather than the number of cases could further improve reliability. The consortium needs to maintain a case bank with various items to implement a reliable blueprinting combination for the OSCE.
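The base-R sketch below illustrates the D-study logic for a p×(i:c) design: the relative G coefficient is σ²_p / (σ²_p + σ²_pc/n_c + σ²_pi:c/(n_c·n_i)), so reliability can be projected for different numbers of cases and items per case. The variance components shown are hypothetical, not the study's estimates.

```r
# D-study for the p x (i:c) design using hypothetical variance components
g_coef <- function(vc, ni, nc) {
  rel_error <- vc["pc"] / nc + vc["pic"] / (nc * ni)   # relative error variance
  unname(vc["p"] / (vc["p"] + rel_error))
}

vc <- c(p = 0.02, c = 0.001, ic = 0.30, pc = 0.005, pic = 0.22)  # made-up values

# How does the projected G coefficient change with cases and items per case?
d_study <- expand.grid(ni = c(12, 18, 21), nc = c(8, 9, 12))
d_study$G <- mapply(g_coef, ni = d_study$ni, nc = d_study$nc,
                    MoreArgs = list(vc = vc))
d_study
```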
Citations to this article as recorded by
Applying the Generalizability Theory to Identify the Sources of Validity Evidence for the Quality of Communication Questionnaire Flávia Del Castanhel, Fernanda R. Fonseca, Luciana Bonnassis Burg, Leonardo Maia Nogueira, Getúlio Rodrigues de Oliveira Filho, Suely Grosseman American Journal of Hospice and Palliative Medicine®.2024; 41(7): 792. CrossRef
Purpose Diagnostic classification models (DCMs) were developed to identify the mastery or non-mastery of the attributes required for solving test items, but their application has been limited to very low-level attributes, and the accuracy and consistency of high-level attributes using DCMs have rarely been reported in comparison with classical test theory (CTT) and item response theory models. This paper compared the accuracy of high-level attribute mastery classification among the deterministic inputs, noisy “and” gate (DINA) model, the Rasch model, and CTT-based sub-scores.
Methods First, a simulation study explored the effects of attribute length (number of items per attribute) and the correlations among attributes with respect to the accuracy of mastery. Second, a real-data study examined model and item fit and investigated the consistency of mastery for each attribute among the 3 models using the 2017 Korean Medical Licensing Examination with 360 items.
Results Accuracy of mastery increased with a higher number of items measuring each attribute across all conditions. The DINA model was more accurate than the CTT and Rasch models for attributes with high correlations (>0.5) and few items. In the real-data analysis, the DINA and Rasch models generally showed better item fits and appropriate model fit. The consistency of mastery between the Rasch and DINA models ranged from 0.541 to 0.633 and the correlations of person attribute scores between the Rasch and DINA models ranged from 0.579 to 0.786.
Conclusion Although all 3 models provide a mastery decision for each examinee, the individual mastery profile using the DINA model provides more accurate decisions for attributes with high correlations than the CTT and Rasch models. The DINA model can also be directly applied to tests with complex structures, unlike the CTT and Rasch models, and it provides different diagnostic information from the CTT and Rasch models.
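The DINA item response function referred to above is simple to state: an examinee answers an item correctly with probability 1−s (one minus the slip parameter) if they have mastered every attribute the Q-matrix requires for that item, and with probability g (the guessing parameter) otherwise. A minimal base-R sketch with illustrative parameters:

```r
# DINA item response probability (parameters are illustrative, not the study's)
dina_prob <- function(alpha, q, slip, guess) {
  eta <- prod(alpha^q)                 # 1 if all required attributes are mastered
  (1 - slip)^eta * guess^(1 - eta)
}

q_row <- c(1, 0, 1)                    # item requires attributes 1 and 3
dina_prob(alpha = c(1, 1, 1), q = q_row, slip = 0.1, guess = 0.2)  # 0.9
dina_prob(alpha = c(1, 1, 0), q = q_row, slip = 0.1, guess = 0.2)  # 0.2
```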
Citations to this article as recorded by
Stable Knowledge Tracing Using Causal Inference Jia Zhu, Xiaodong Ma, Changqin Huang IEEE Transactions on Learning Technologies.2024; 17: 124. CrossRef
Development of a character qualities test for medical students in Korea using polytomous item response theory and factor analysis: a preliminary scale development study Yera Hur, Dong Gi Seo Journal of Educational Evaluation for Health Professions.2023; 20: 20. CrossRef
Purpose The aim of this study was to develop and validate a scale to measure nursing students’ readiness for the flipped classroom in Sri Lanka.
Methods A literature review provided the theoretical framework for developing the Nursing Students’ Readiness for Flipped Classroom (NSR-FC) questionnaire. Five content experts evaluated the NSR-FC, and content validity indices (CVI) were calculated. Cross-sectional surveys among 355 undergraduate nursing students from 3 state universities in Sri Lanka were carried out to assess the psychometric properties of the NSR-FC. Principal component analysis (PCA, n=265), internal consistency (using the Cronbach α coefficient, n=265), and confirmatory factor analysis (CFA, n=90) were done to test construct validity and reliability.
Results Thirty-seven items were included in the NSR-FC for content validation, resulting in an average scale CVI of 0.94. Two items received item-level CVI values of less than 0.78. The factor structures of the 35 items were explored through PCA with orthogonal factor rotation, culminating in the identification of 5 factors. These factors were classified as technological readiness, environmental readiness, personal readiness, pedagogical readiness, and interpersonal readiness. The NSR-FC also showed an overall acceptable level of internal consistency (Cronbach α=0.9). CFA verified a 4-factor model (excluding the interpersonal readiness factor) and 20 items that achieved acceptable fit (standardized root mean square residual=0.08, root mean square error of approximation=0.08, comparative fit index=0.87, and χ2/degrees of freedom=1.57).
Conclusion The NSR-FC, as a 4-factor model, is an acceptable measurement scale for assessing nursing students’ readiness for the flipped classroom in terms of its construct validity and reliability.
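The content validity indices reported above follow a standard computation; the base-R sketch below, using simulated expert ratings rather than the study's data, shows how item-level CVI and the average scale-level CVI are obtained.

```r
# Item-level CVI (I-CVI): proportion of experts rating an item 3 or 4 on a
# 4-point relevance scale; S-CVI/Ave: mean of the I-CVIs. Simulated ratings.
set.seed(42)
ratings <- matrix(sample(2:4, 5 * 10, replace = TRUE, prob = c(0.1, 0.4, 0.5)),
                  nrow = 5, ncol = 10)          # 5 experts x 10 items

i_cvi <- colMeans(ratings >= 3)                  # one I-CVI per item
s_cvi_ave <- mean(i_cvi)                         # average scale-level CVI

# Flag items below the 0.78 threshold mentioned above
which(i_cvi < 0.78)
```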
Citations to this article as recorded by
Design and validation of a preliminary instrument to contextualize interactions through information technologies of health professionals José Fidencio López Luna, Eddie Nahúm Armendáriz Mireles, Marco Aurelio Nuño Maganda, Hiram Herrera Rivas, Rubén Machucho Cadena, Jorge Arturo Hernández Almazán Health Informatics Journal.2024;[Epub] CrossRef
AI readiness scale for teachers: Development and validation Mehmet Ramazanoglu, Tayfun Akın Education and Information Technologies.2024;[Epub] CrossRef
Content validity of the Constructivist Learning in Higher Education Settings (CLHES) scale in the context of the flipped classroom in higher education Turki Mesfer Alqahtani, Farrah Dina Yusop, Siti Hajar Halili Humanities and Social Sciences Communications.2023;[Epub] CrossRef
The intensivist's assessment of gastrointestinal function: A pilot study Varsha M. Asrani, Colin McArthur, Ian Bissett, John A. Windsor Australian Critical Care.2022; 35(6): 636. CrossRef
Psychometric evidence of a perception scale about covid-19 vaccination process in Peruvian dentists: a preliminary validation César F. Cayo-Rojas, Nancy Córdova-Limaylla, Gissela Briceño-Vergel, Marysela Ladera-Castañeda, Hernán Cachay-Criado, Carlos López-Gurreonero, Alberto Cornejo-Pinto, Luis Cervantes-Ganoza BMC Health Services Research.2022;[Epub] CrossRef
Implementation of a Web-Based Educational Intervention for Promoting Flipped Classroom Pedagogy: A Mixed-Methods Study Punithalingam Youhasan, Mataroria P. Lyndon, Yan Chen, Marcus A. Henning Medical Science Educator.2022; 33(1): 91. CrossRef
Assess the feasibility of flipped classroom pedagogy in undergraduate nursing education in Sri Lanka: A mixed-methods study Punithalingam Youhasan, Yan Chen, Mataroria Lyndon, Marcus A. Henning, Gwo-Jen Hwang PLOS ONE.2021; 16(11): e0259003. CrossRef
Newly appointed medical faculty members’ self-evaluation of their educational roles at the Catholic University of Korea College of Medicine in 2020 and 2021: a cross-sectional survey-based study Sun Kim, A Ra Cho, Chul Woon Chung Journal of Educational Evaluation for Health Professions.2021; 18: 28. CrossRef
This study introduces LIVECAT, a web-based computerized adaptive testing platform. This platform provides many functions, including writing item content, managing an item bank, creating and administering a test, reporting test results, and providing information about a test and examinees. LIVECAT offers examination administrators an easy and flexible environment for composing and managing examinations. It is available at http://www.thecatkorea.com/. Several tools were used to program LIVECAT, as follows: operating system, Amazon Linux; web server, nginx 1.18; web application server, Apache Tomcat 8.5; database, Amazon RDS (MariaDB); and languages, Java 8, HTML5/CSS, JavaScript, and jQuery. The LIVECAT platform can be used to implement several item response theory (IRT) models, such as the Rasch and 1-, 2-, and 3-parameter logistic models, and the administrator can choose a specific model for test construction in LIVECAT. Multimedia data such as images, audio files, and movies can be uploaded to items in LIVECAT. Two scoring methods (maximum likelihood estimation and expected a posteriori) are available in LIVECAT, and the maximum Fisher information item selection method is applied to every IRT model. The LIVECAT platform showed equal or better performance compared with a conventional test platform. LIVECAT enables users without psychometric expertise to easily implement and perform computerized adaptive testing at their institutions. The most recent LIVECAT version provides only a dichotomous item response model and the basic components of CAT. In the near future, LIVECAT will include advanced functions, such as polytomous item response models, the weighted likelihood estimation method, and content balancing.
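The maximum Fisher information selection rule mentioned above can be illustrated with a few lines of base R for the 2-parameter logistic model: item information at the current ability estimate is D²a²P(1−P), and the next item is the unused item with the largest information. The small item bank below is hypothetical and unrelated to LIVECAT's internals.

```r
# Maximum Fisher information item selection under the 2PL model (illustrative)
item_info_2pl <- function(theta, a, b, D = 1.702) {
  p <- 1 / (1 + exp(-D * a * (theta - b)))
  D^2 * a^2 * p * (1 - p)
}

bank <- data.frame(a = c(0.8, 1.2, 1.5, 0.9, 1.1),
                   b = c(-1.0, -0.2, 0.3, 0.8, 1.5))
used <- c(2)                                  # items already administered
theta_hat <- 0.25                             # current ability estimate

info <- item_info_2pl(theta_hat, bank$a, bank$b)
info[used] <- -Inf                            # exclude administered items
next_item <- which.max(info)
```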
Citations to this article as recorded by
Comparison of real data and simulated data analysis of a stopping rule based on the standard error of measurement in computerized adaptive testing for medical examinations in Korea: a psychometric study Dong Gi Seo, Jeongwook Choi, Jinha Kim Journal of Educational Evaluation for Health Professions.2024; 21: 18. CrossRef
Educational Technology in the University: A Comprehensive Look at the Role of a Professor and Artificial Intelligence Cheolkyu Shin, Dong Gi Seo, Seoyeon Jin, Soo Hwa Lee, Hyun Je Park IEEE Access.2024; 12: 116727. CrossRef
The irtQ R package: a user-friendly tool for item response theory-based test data analysis and calibration Hwanggyu Lim, Kyungseok Kang Journal of Educational Evaluation for Health Professions.2024; 21: 23. CrossRef
Presidential address: improving item validity and adopting computer-based testing, clinical skills assessments, artificial intelligence, and virtual reality in health professions licensing examinations in Korea Hyunjoo Pai Journal of Educational Evaluation for Health Professions.2023; 20: 8. CrossRef
Patient-reported outcome measures in cancer care: Integration with computerized adaptive testing Minyu Liang, Zengjie Ye Asia-Pacific Journal of Oncology Nursing.2023; 10(12): 100323. CrossRef
Development of a character qualities test for medical students in Korea using polytomous item response theory and factor analysis: a preliminary scale development study Yera Hur, Dong Gi Seo Journal of Educational Evaluation for Health Professions.2023; 20: 20. CrossRef
Purpose Moral courage refers to the conviction to take action on one’s ethical beliefs despite the risk of adverse consequences. This study aimed to evaluate correlations between social desirability scores and moral courage scores among medical residents and fellows, and to explore gender- and specialty-based differences in moral courage scores.
Methods In April 2018, the Moral Courage Scale for Physicians (MCSP), the Professional Moral Courage (PMC) scale, and the Marlowe-Crowne social desirability scale were administered to 87 medical residents from Hospital Alemán in Buenos Aires, Argentina.
Results The Cronbach α coefficients were 0.78, 0.74, and 0.81 for the Marlowe-Crowne, MCSP, and PMC scales, respectively. Correlation analysis showed that moral courage scores were weakly correlated with social desirability scores, while both moral courage scales were strongly correlated with each other. Physicians who were training in a surgical specialty showed lower moral courage scores than nonsurgical specialty trainees, and men from any specialty tended to have lower moral courage scores than women. Specifically, individuals training in surgical specialties ranked lower on assessments of the “multiple values,” “endurance of threats,” and “going beyond compliance” dimensions of the PMC scale. Men tended to rank lower than women on the “multiple values,” “moral goals,” and “endurance of threats” dimensions.
Conclusion There was a poor correlation between 2 validated moral courage scores and social desirability scores among medical residents and fellows in Argentina. Conversely, both moral courage tools showed a close correlation and concordance, suggesting that these scales are reasonably interchangeable.
Citations to this article as recorded by
Moral courage level of nurses: a systematic review and meta-analysis Hang Li, JuLan Guo, ZhiRong Ren, Dingxi Bai, Jing Yang, Wei Wang, Han Fu, Qing Yang, Chaoming Hou, Jing Gao BMC Nursing.2024;[Epub] CrossRef
CESARET NEDİR? CESARET TANIMLARININ İÇERİK ANALİZİ [What is courage? A content analysis of definitions of courage] İbrahim Sani MERT Uluslararası İktisadi ve İdari Bilimler Dergisi.2023; 9(2): 126. CrossRef
The Impact of Active Bystander Training on Officer Confidence and Ability to Address Ethical Challenges Travis Taniguchi, Heather Vovak, Gary Cordner, Karen Amendola, Yukun Yang, Katherine Hoogesteyn, Martin Bartness Policing: A Journal of Policy and Practice.2022; 16(3): 508. CrossRef
The Role of Academic Medicine in the Call for Justice Danielle Laraque-Arena, Ilene Fennoy, Leslie L. Davidson Journal of the National Medical Association.2021; 113(4): 388. CrossRef
Can Careproviders Still Bond with Patients after They Are Turned Down for a Treatment They Need? Edmund G. Howe The Journal of Clinical Ethics.2021; 32(3): 185. CrossRef
Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994. Subsequently, the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This article was written with the goal of facilitating the introduction of CAT into medical and health licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and the items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specifications for the final CAT related to the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are as follows: item bank, starting item, item selection rule, scoring procedure, and termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and deploying the computer technology required to build an item bank. In conclusion, to ensure more accurate estimation of examinees’ ability, CAT may be a good option for national licensing examinations, and measurement theory can support its implementation for high-stakes examinations.
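Of the 5 components listed above, the scoring procedure and termination criterion can be illustrated together: the base-R sketch below computes an expected a posteriori (EAP) ability estimate on a quadrature grid under the 2-parameter logistic model, with the posterior standard deviation serving as the standard error used by SE-based termination rules. Item parameters and responses are illustrative only.

```r
# EAP ability estimation on a quadrature grid under the 2PL model (illustrative)
eap_estimate <- function(resp, a, b, D = 1.702,
                         grid = seq(-4, 4, length.out = 81)) {
  # Likelihood of the observed response pattern at each grid point
  lik <- sapply(grid, function(th) {
    p <- 1 / (1 + exp(-D * a * (th - b)))
    prod(p^resp * (1 - p)^(1 - resp))
  })
  post <- lik * dnorm(grid)                # standard normal prior
  post <- post / sum(post)
  eap <- sum(grid * post)
  psd <- sqrt(sum((grid - eap)^2 * post))  # posterior SD, used as the SE
  c(theta = eap, se = psd)
}

eap_estimate(resp = c(1, 1, 0, 1), a = c(1.0, 1.3, 0.9, 1.1),
             b = c(-0.5, 0.0, 0.4, 1.0))
```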
Citations to this article as recorded by
Validation of the cognitive section of the Penn computerized adaptive test for neurocognitive and clinical psychopathology assessment (CAT-CCNB) Akira Di Sandro, Tyler M. Moore, Eirini Zoupou, Kelly P. Kennedy, Katherine C. Lopez, Kosha Ruparel, Lucky J. Njokweni, Sage Rush, Tarlan Daryoush, Olivia Franco, Alesandra Gorgone, Andrew Savino, Paige Didier, Daniel H. Wolf, Monica E. Calkins, J. Cobb S Brain and Cognition.2024; 174: 106117. CrossRef
Comparison of real data and simulated data analysis of a stopping rule based on the standard error of measurement in computerized adaptive testing for medical examinations in Korea: a psychometric study Dong Gi Seo, Jeongwook Choi, Jinha Kim Journal of Educational Evaluation for Health Professions.2024; 21: 18. CrossRef
The current utilization of the patient-reported outcome measurement information system (PROMIS) in isolated or combined total knee arthroplasty populations Puneet Gupta, Natalia Czerwonka, Sohil S. Desai, Alirio J. deMeireles, David P. Trofa, Alexander L. Neuwirth Knee Surgery & Related Research.2023;[Epub] CrossRef
Evaluating a Computerized Adaptive Testing Version of a Cognitive Ability Test Using a Simulation Study Ioannis Tsaousis, Georgios D. Sideridis, Hannan M. AlGhamdi Journal of Psychoeducational Assessment.2021; 39(8): 954. CrossRef
Accuracy and Efficiency of Web-based Assessment Platform (LIVECAT) for Computerized Adaptive Testing Do-Gyeong Kim, Dong-Gi Seo The Journal of Korean Institute of Information Technology.2020; 18(4): 77. CrossRef
Transformaciones en educación médica: innovaciones en la evaluación de los aprendizajes y avances tecnológicos (parte 2) [Transformations in medical education: innovations in learning assessment and technological advances (part 2)] Veronica Luna de la Luz, Patricia González-Flores Investigación en Educación Médica.2020; 9(34): 87. CrossRef
Introduction to the LIVECAT web-based computerized adaptive testing platform Dong Gi Seo, Jeongwook Choi Journal of Educational Evaluation for Health Professions.2020; 17: 27. CrossRef
Computerised adaptive testing accurately predicts CLEFT-Q scores by selecting fewer, more patient-focused questions Conrad J. Harrison, Daan Geerards, Maarten J. Ottenhof, Anne F. Klassen, Karen W.Y. Wong Riff, Marc C. Swan, Andrea L. Pusic, Chris J. Sidey-Gibbons Journal of Plastic, Reconstructive & Aesthetic Surgery.2019; 72(11): 1819. CrossRef
Presidential address: Preparing for permanent test centers and computerized adaptive testing Chang Hwi Kim Journal of Educational Evaluation for Health Professions.2018; 15: 1. CrossRef
Updates from 2018: Being indexed in Embase, becoming an affiliated journal of the World Federation for Medical Education, implementing an optional open data policy, adopting principles of transparency and best practice in scholarly publishing, and appreci Sun Huh Journal of Educational Evaluation for Health Professions.2018; 15: 36. CrossRef
Linear programming method to construct equated item sets for the implementation of periodical computer-based testing for the Korean Medical Licensing Examination Dong Gi Seo, Myeong Gi Kim, Na Hui Kim, Hye Sook Shin, Hyun Jung Kim Journal of Educational Evaluation for Health Professions.2018; 15: 26. CrossRef
Purpose Prior descriptions of the psychometric properties of validated knowledge assessment tools designed to determine emergency medicine (EM) residents’ understanding of physiologic and clinical concepts related to mechanical ventilation are lacking. In this setting, we performed this study to describe the psychometric and performance properties of a novel knowledge assessment tool that measures EM residents’ knowledge of topics in mechanical ventilation.
Methods Results from a multicenter, prospective survey study involving 219 EM residents from 8 academic hospitals in the northeastern United States were analyzed to quantify the reliability, item difficulty, and item discrimination of each of the 9 questions included in the knowledge assessment tool, which was administered for 3 weeks beginning in January 2013.
Results The response rate for residents completing the knowledge assessment tool was 68.6% (214 out of 312 EM residents). Reliability was assessed by both Cronbach’s alpha coefficient (0.6293) and the Spearman-Brown coefficient (0.6437). Item difficulty ranged from 0.39 to 0.96, with a mean item difficulty of 0.75 for all 9 questions. Uncorrected item discrimination values ranged from 0.111 to 0.556. Corrected item-total correlations were determined by removing the question being assessed from analysis, resulting in a range of item discrimination from 0.139 to 0.498.
Conclusion Reliability, item difficulty, and item discrimination were within satisfactory ranges in this study, demonstrating acceptable psychometric properties of this knowledge assessment tool. These findings indicate that the tool is sufficiently rigorous for use in future research studies or for assessment of EM residents for evaluative purposes.
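The classical item statistics reported above can be reproduced with a few lines of base R; the sketch below, using simulated 0/1 responses rather than the study's data, computes item difficulty, corrected item-total correlations, Cronbach's α, and a Spearman-Brown split-half coefficient.

```r
# Classical item analysis on simulated dichotomous responses (illustrative only)
set.seed(3)
n <- 214; k <- 9
theta <- rnorm(n)
X <- sapply(rnorm(k), function(b) rbinom(n, 1, plogis(theta - b)))

difficulty <- colMeans(X)                                  # proportion correct
corrected_r <- sapply(1:k, function(j) cor(X[, j], rowSums(X[, -j])))

item_var <- apply(X, 2, var)
alpha <- (k / (k - 1)) * (1 - sum(item_var) / var(rowSums(X)))

# Spearman-Brown coefficient from an odd-even split
half1 <- rowSums(X[, seq(1, k, by = 2)])
half2 <- rowSums(X[, seq(2, k, by = 2)])
r_half <- cor(half1, half2)
spearman_brown <- 2 * r_half / (1 + r_half)
```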
Citations to this article as recorded by
Comparison of three methods for teaching mechanical ventilation in an emergency setting to sixth-year medical students: a randomized trial Fernando Sabia Tallo, Letícia Sandre Vendrame, André Luciano Baitello Revista da Associação Médica Brasileira.2020; 66(10): 1409. CrossRef
Critical Appraisal of Emergency Medicine Educational Research: The Best Publications of 2016 Nicole M. Dubosh, Jaime Jordan, Lalena M. Yarris, Edward Ullman, Joshua Kornegay, Daniel Runde, Amy Miller Juve, Jonathan Fisher, Teresa Chan AEM Education and Training.2019; 3(1): 58. CrossRef
Mechanical Ventilation Training During Graduate Medical Education: Perspectives and Review of the Literature Jonathan M. Keller, Dru Claar, Juliana Carvalho Ferreira, David C. Chu, Tanzib Hossain, William Graham Carlos, Jeffrey A. Gold, Stephanie A. Nonas, Nitin Seam Journal of Graduate Medical Education.2019; 11(4): 389. CrossRef
Development and validation of a questionnaire to assess the knowledge of mechanical ventilation in urgent care among students in their last-year medical course in Brazil Fernando Sabia Tallo, Simone de Campos Vieira Abib, Andre Luciano Baitello, Renato Delascio Lopes Clinics.2019; 74: e663. CrossRef
Purpose The aim of this paper is to provide evidence for the validity and reliability of a questionnaire for assessing the implementation of problem-based learning (PBL). This questionnaire was developed to assess the quality of PBL implementation from the perspective of medical school graduates.
Methods A confirmatory factor analysis was conducted to assess the validity of the questionnaire. The analysis was based on a survey of 225 graduates of a problem-based medical school in Indonesia.
Results The results showed that the confirmatory factor analysis model had a good fit to the data. Further, the values of the standardized loading estimates, the squared inter-construct correlations, the average variances extracted, and the composite reliabilities all provided evidence of construct validity.
Conclusion The PBL implementation questionnaire was found to be valid and reliable, making it suitable for evaluation purposes.
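Two of the construct validity checks mentioned above, the average variance extracted (AVE) and composite reliability (CR), are simple functions of the standardized loadings; the base-R sketch below uses hypothetical loadings, not those estimated in the study.

```r
# AVE and composite reliability from standardized factor loadings (illustrative)
construct_validity <- function(loadings) {
  ave <- mean(loadings^2)                                   # average variance extracted
  cr  <- sum(loadings)^2 / (sum(loadings)^2 + sum(1 - loadings^2))
  c(AVE = ave, CR = cr)
}

construct_validity(c(0.72, 0.78, 0.81, 0.69, 0.75))         # hypothetical loadings
```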
Citations to this article as recorded by
Changes in Learning Outcomes of Students Participating in Problem-Based Learning for the First Time: A Case Study of a Financial Management Course Yung-Chuan Lee The Asia-Pacific Education Researcher.2024;[Epub] CrossRef
After briefly reviewing theories of standard setting, we analyzed the problems of the current cut scores. We then reported the results of a needs assessment on standard setting among medical educators and psychometricians, along with analyses of the standard-setting methods used in developed countries. Based on these findings, we suggested the Bookmark and modified Angoff methods as alternative standard-setting methods. Possible problems and challenges in applying these methods to the National Medical Licensing Examination were discussed.
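As a hedged illustration of the modified Angoff method suggested above: each judge estimates, for every item, the probability that a minimally competent examinee would answer correctly, and the cut score is the sum of the mean judge estimates across items. A base-R sketch with simulated ratings follows.

```r
# Modified Angoff cut score from simulated judge ratings (illustrative only)
set.seed(11)
ratings <- matrix(runif(5 * 20, min = 0.4, max = 0.9), nrow = 5, ncol = 20)  # 5 judges x 20 items

item_means <- colMeans(ratings)     # mean estimated probability per item
cut_score <- sum(item_means)        # expected raw score of a borderline examinee
cut_score / 20                      # cut score as a proportion of the 20-item test
```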
Citations to this article as recorded by
Predicting medical graduates’ clinical performance using national competency examination results in Indonesia Prattama Santoso Utomo, Amandha Boy Timor Randita, Rilani Riskiyana, Felicia Kurniawan, Irwin Aras, Cholis Abrori, Gandes Retno Rahayu BMC Medical Education.2022;[Epub] CrossRef
Possibility of independent use of the yes/no Angoff and Hofstee methods for the standard setting of the Korean Medical Licensing Examination written test: a descriptive study Do-Hwan Kim, Ye Ji Kang, Hoon-Ki Park Journal of Educational Evaluation for Health Professions.2022; 19: 33. CrossRef
Applying the Bookmark method to medical education: Standard setting for an aseptic technique station Monica L. Lypson, Steven M. Downing, Larry D. Gruppen, Rachel Yudkowsky Medical Teacher.2013; 35(7): 581. CrossRef
Standard Setting in Student Assessment: Is a Defensible Method Yet to Come? A Barman Annals of the Academy of Medicine, Singapore.2008; 37(11): 957. CrossRef