JEEHP : Journal of Educational Evaluation for Health Professions

14 "Psychometrics"
Software report
The irtQ R package: a user-friendly tool for item response theory-based test data analysis and calibration  
Hwanggyu Lim, Kyungseok Kang
J Educ Eval Health Prof. 2024;21:23.   Published online September 12, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.23
  • 769 View
  • 189 Download
Abstract
Computerized adaptive testing (CAT) has become a widely adopted test design for high-stakes licensing and certification exams, particularly in the health professions in the United States, due to its ability to tailor test difficulty in real time, reducing testing time while providing precise ability estimates. A key component of CAT is item response theory (IRT), which facilitates the dynamic selection of items based on examinees' ability levels during a test. Accurate estimation of item and ability parameters is essential for successful CAT implementation, necessitating convenient and reliable software to ensure precise parameter estimation. This paper introduces the irtQ R package (http://CRAN.R-project.org/), which simplifies IRT-based analysis and item calibration under unidimensional IRT models. While it does not directly simulate CAT, it provides essential tools to support CAT development, including parameter estimation using marginal maximum likelihood estimation via the expectation-maximization algorithm, pretest item calibration through fixed item parameter calibration and fixed ability parameter calibration methods, and examinee ability estimation. The package also enables users to compute item and test characteristic curves and information functions necessary for evaluating the psychometric properties of a test. This paper illustrates the key features of the irtQ package through examples using simulated datasets, demonstrating its utility in IRT applications such as test data analysis and ability scoring. By providing a user-friendly environment for IRT analysis, irtQ significantly enhances the capacity for efficient adaptive testing research and operations. Finally, the paper highlights additional core functionalities of irtQ, emphasizing its broader applicability to the development and operation of IRT-based assessments.
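To make the item characteristic curves and information functions mentioned above concrete, here is a minimal base R sketch (independent of the irtQ package; the item parameters are hypothetical) that computes both quantities for a 2-parameter logistic item:

  # 2PL item characteristic curve: P(theta) = 1 / (1 + exp(-a * (theta - b)))
  icc_2pl <- function(theta, a, b) {
    1 / (1 + exp(-a * (theta - b)))
  }
  # Fisher information for a 2PL item: I(theta) = a^2 * P * (1 - P)
  info_2pl <- function(theta, a, b) {
    p <- icc_2pl(theta, a, b)
    a^2 * p * (1 - p)
  }
  theta <- seq(-4, 4, by = 0.1)          # ability grid
  p <- icc_2pl(theta, a = 1.2, b = 0.5)  # hypothetical item parameters
  i <- info_2pl(theta, a = 1.2, b = 0.5)
  plot(theta, p, type = "l", ylab = "P(correct)")   # item characteristic curve
  plot(theta, i, type = "l", ylab = "Information")  # item information function

Summing such item information functions over a test yields the test information function that packages like irtQ report.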
Research articles
Comparison of real data and simulated data analysis of a stopping rule based on the standard error of measurement in computerized adaptive testing for medical examinations in Korea: a psychometric study  
Dong Gi Seo, Jeongwook Choi, Jinha Kim
J Educ Eval Health Prof. 2024;21:18.   Published online July 9, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.18
  • 830 View
  • 301 Download
Abstract
Purpose
This study aimed to compare and evaluate the efficiency and accuracy of computerized adaptive testing (CAT) under 2 stopping rules (standard error of measurement [SEM]=0.3 and 0.25) using both real and simulated data in medical examinations in Korea.
Methods
This study employed post hoc simulation and real data analysis to explore the optimal stopping rule for CAT in medical examinations. The real data were obtained from the responses of 3rd-year medical students during examinations in 2020 at Hallym University College of Medicine. Simulated data were generated in R using parameters estimated from a real item bank. Outcome variables included the number of examinees passing or failing under SEM values of 0.25 and 0.30, the number of items administered, and the correlation between ability estimates. The consistency of the real CAT results was evaluated by examining pass/fail agreement based on a cut score of 0.0. The efficiency of each CAT design was assessed by comparing the average number of items administered under the 2 stopping rules.
Results
Both SEM 0.25 and SEM 0.30 provided a good balance between accuracy and efficiency in CAT. The real data showed minimal differences in pass/fail outcomes between the 2 SEM conditions, with a high correlation (r=0.99) between ability estimates. The simulation results confirmed these findings, indicating similar average item numbers between real and simulated data.
Conclusion
The findings suggest that both SEM 0.25 and 0.30 are effective termination criteria in the context of the Rasch model, balancing accuracy and efficiency in CAT.
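For readers unfamiliar with how an SEM-based stopping rule operates, the following minimal R sketch (hypothetical item parameters, not the authors' code) shows how the conditional standard error of measurement is derived from the accumulated test information and compared against the 0.25 and 0.30 thresholds:

  # Fisher information of a 2PL item at ability theta
  item_info <- function(theta, a, b) {
    p <- 1 / (1 + exp(-a * (theta - b)))
    a^2 * p * (1 - p)
  }
  # Hypothetical parameters of items already administered in a CAT session
  a <- c(1.1, 0.9, 1.4, 1.2, 1.0)
  b <- c(-0.5, 0.2, 0.8, -0.1, 0.4)
  theta_hat <- 0.3                              # current ability estimate
  test_info <- sum(item_info(theta_hat, a, b))  # test information is additive
  sem <- 1 / sqrt(test_info)                    # conditional SEM
  # Termination check against the two stopping rules compared in the study
  sem <= 0.30  # stop under the SEM = 0.30 rule?
  sem <= 0.25  # stop under the stricter SEM = 0.25 rule?

The stricter 0.25 rule requires more accumulated information, so it generally administers more items before terminating.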
Development and psychometric evaluation of a 360-degree evaluation instrument to assess medical students’ performance in clinical settings at the emergency medicine department in Iran: a methodological study  
Golnaz Azami, Sanaz Aazami, Boshra Ebrahimy, Payam Emami
J Educ Eval Health Prof. 2024;21:7.   Published online April 1, 2024
DOI: https://doi.org/10.3352/jeehp.2024.21.7
  • 1,358 View
  • 256 Download
Abstract
Background
In the Iranian context, no 360-degree evaluation tool has been developed to assess the performance of prehospital medical emergency students in clinical settings. This article describes the development of a 360-degree evaluation tool and presents its first psychometric evaluation.
Methods
There were 2 steps in this study: step 1 involved developing the instrument (i.e., generating the items) and step 2 constituted the psychometric evaluation of the instrument. We performed exploratory and confirmatory factor analyses and also evaluated the instrument’s face, content, and convergent validity and reliability.
Results
The instrument contains 55 items across 6 domains, including leadership, management, and teamwork (19 items), consciousness and responsiveness (14 items), clinical and interpersonal communication skills (8 items), integrity (7 items), knowledge and accountability (4 items), and loyalty and transparency (3 items). The instrument was confirmed to be a valid measure, as the 6 domains had eigenvalues over Kaiser’s criterion of 1 and in combination explained 60.1% of the variance (Bartlett’s test of sphericity [1,485]=19,867.99, P<0.01). Furthermore, this study provided evidence for the instrument’s convergent validity and internal consistency (α=0.98), suggesting its suitability for assessing student performance.
Conclusion
We found good evidence for the validity and reliability of the instrument. Our instrument can be used to make future evaluations of student performance in the clinical setting more structured, transparent, informative, and comparable.
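The eigenvalue-based retention decision reported above can be reproduced conceptually with a few lines of base R; the sketch below uses simulated data rather than the study's instrument, so all numbers are purely illustrative:

  set.seed(1)
  # Simulated responses: 300 raters scoring 10 Likert-type items (illustrative only)
  x <- matrix(sample(1:5, 300 * 10, replace = TRUE), nrow = 300, ncol = 10)
  r <- cor(x)              # inter-item correlation matrix
  ev <- eigen(r)$values    # eigenvalues of the correlation matrix
  sum(ev > 1)              # Kaiser's criterion: retain factors with eigenvalue > 1
  cumsum(ev) / sum(ev)     # cumulative proportion of variance explained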
Experience of introducing an electronic health records station in an objective structured clinical examination to evaluate medical students’ communication skills in Canada: a descriptive study  
Kuan-chin Jean Chen, Ilona Bartman, Debra Pugh, David Topps, Isabelle Desjardins, Melissa Forgie, Douglas Archibald
J Educ Eval Health Prof. 2023;20:22.   Published online July 4, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.22
  • 3,722 View
  • 150 Download
  • 1 Web of Science
Abstract
Purpose
There is limited literature related to the assessment of electronic medical record (EMR)-related competencies. To address this gap, this study explored the feasibility of an EMR objective structured clinical examination (OSCE) station to evaluate medical students’ communication skills by psychometric analyses and standardized patients’ (SPs) perspectives on EMR use in an OSCE.
Methods
An OSCE station that incorporated the use of an EMR was developed and pilot-tested in March 2020. Students’ communication skills were assessed by SPs and physician examiners. Students’ scores were compared between the EMR station and 9 other stations. A psychometric analysis, including item total correlation, was done. SPs participated in a post-OSCE focus group to discuss their perception of EMRs’ effect on communication.
Results
Ninety-nine 3rd-year medical students participated in a 10-station OSCE that included the EMR station. The EMR station had an acceptable item-total correlation (0.217). Students who leveraged graphical displays in counseling received higher OSCE station scores from the SPs (P=0.041). The thematic analysis of SPs’ perceptions of students’ EMR use from the focus group revealed the following thematic domains: technology, communication, case design, ownership of health information, and timing of EMR usage.
Conclusion
This study demonstrated the feasibility of incorporating EMR in assessing learner communication skills in an OSCE. The EMR station had acceptable psychometric characteristics. Some medical students were able to efficiently use the EMRs as an aid in patient counseling. Teaching students how to be patient-centered even in the presence of technology may promote engagement.
Development of a character qualities test for medical students in Korea using polytomous item response theory and factor analysis: a preliminary scale development study  
Yera Hur, Dong Gi Seo
J Educ Eval Health Prof. 2023;20:20.   Published online June 26, 2023
DOI: https://doi.org/10.3352/jeehp.2023.20.20
  • 1,903 View
  • 123 Download
Abstract
Purpose
This study aimed to develop a test scale to measure the character qualities of medical students as a follow-up study on the 8 core character qualities revealed in a previous report.
Methods
In total, 160 preliminary items were developed to measure 8 core character qualities. Twenty questions were assigned to each quality, and a questionnaire survey was conducted among 856 students in 5 medical schools in Korea. Using the partial credit model, polytomous item response theory analysis was carried out to analyze the goodness-of-fit, followed by exploratory factor analysis. Finally, confirmatory factor and reliability analyses were conducted with the final selected items.
Results
The preliminary items for the 8 core character qualities were administered to the participants. Data from 767 students were included in the final analysis. Of the 160 preliminary items, 25 were removed by classical test theory analysis and 17 more by polytomous item response theory assessment. A total of 118 items and sub-factors were selected for exploratory factor analysis. Finally, 79 items were selected, and the validity and reliability were confirmed through confirmatory factor analysis and intra-item relevance analysis.
Conclusion
The character qualities test scale developed through this study can be used to measure the character qualities corresponding to the educational goals and visions of individual medical schools in Korea. Furthermore, this measurement tool can serve as primary data for developing character qualities tools tailored to each medical school’s vision and educational goals.
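To clarify what the partial credit model used in this study estimates, the short R sketch below computes the category response probabilities for a single polytomous item; the step parameters are hypothetical and chosen only for illustration:

  # Partial credit model: P(X = x | theta) is proportional to
  # exp(sum over k <= x of (theta - delta_k)), with the sum defined as 0 for x = 0.
  pcm_probs <- function(theta, delta) {
    num <- exp(c(0, cumsum(theta - delta)))  # numerators for categories 0..m
    num / sum(num)                           # normalize to probabilities
  }
  delta <- c(-0.8, 0.1, 1.2)     # hypothetical step (threshold) parameters
  pcm_probs(theta = 0.5, delta)  # probabilities of scoring 0, 1, 2, or 3

Item fit analysis under the partial credit model compares these model-implied probabilities with the observed category frequencies.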
Acceptability of the 8-case objective structured clinical examination of medical students in Korea using generalizability theory: a reliability study  
Song Yi Park, Sang-Hwa Lee, Min-Jeong Kim, Ki-Hwan Ji, Ji Ho Ryu
J Educ Eval Health Prof. 2022;19:26.   Published online September 8, 2022
DOI: https://doi.org/10.3352/jeehp.2022.19.26
  • 2,948 View
  • 221 Download
  • 1 Web of Science
  • 1 Crossref
Abstract
Purpose
This study investigated whether the reliability was acceptable when the number of cases in the objective structured clinical examination (OSCE) decreased from 12 to 8 using generalizability theory (GT).
Methods
This psychometric study analyzed data from an OSCE administered to 439 fourth-year medical students in the Busan and Gyeongnam areas of South Korea from July 12 to 15, 2021. The generalizability study (G-study) considered 3 facets (students [p], cases [c], and items [i]) and used a p×(i:c) design because items were nested within cases. The acceptable generalizability (G) coefficient was set to 0.70. The G-study and decision study (D-study) were performed using G String IV ver. 6.3.8 (Papawork, Hamilton, ON, Canada).
Results
All G coefficients except for July 14 (0.69) were above 0.70. The major sources of variance components (VCs) were items nested in cases (i:c), from 51.34% to 57.70%, and residual error (pi:c), from 39.55% to 43.26%. The proportion of VCs in cases was negligible, ranging from 0% to 2.03%.
Conclusion
The case numbers decreased in the 2021 Busan and Gyeongnam OSCE. However, the reliability was acceptable. In the D-study, reliability was maintained at 0.70 or higher if there were more than 21 items/case in 8 cases and more than 18 items/case in 9 cases. However, according to the G-study, increasing the number of items nested in cases rather than the number of cases could further improve reliability. The consortium needs to maintain a case bank with various items to implement a reliable blueprinting combination for the OSCE.
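For readers who want to see how a D-study projection like the one above is obtained, the following R sketch applies the standard generalizability coefficient formula for a p×(i:c) design; the variance component values are hypothetical placeholders, not those estimated in this study:

  # Generalizability coefficient for a p x (i:c) design:
  # G = var_p / (var_p + var_pc / nc + var_pic / (nc * ni))
  g_coef <- function(var_p, var_pc, var_pic, nc, ni) {
    var_p / (var_p + var_pc / nc + var_pic / (nc * ni))
  }
  # Hypothetical variance components (person, person-by-case, residual)
  var_p <- 0.05; var_pc <- 0.10; var_pic <- 0.60
  g_coef(var_p, var_pc, var_pic, nc = 8, ni = 21)  # 8 cases, 21 items per case
  g_coef(var_p, var_pc, var_pic, nc = 9, ni = 18)  # 9 cases, 18 items per case

Varying nc and ni in this way is exactly what a D-study does to find the smallest design that keeps G at or above 0.70.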

Citations

Citations to this article as recorded by  
  • Applying the Generalizability Theory to Identify the Sources of Validity Evidence for the Quality of Communication Questionnaire
    Flávia Del Castanhel, Fernanda R. Fonseca, Luciana Bonnassis Burg, Leonardo Maia Nogueira, Getúlio Rodrigues de Oliveira Filho, Suely Grosseman
    American Journal of Hospice and Palliative Medicine®.2024; 41(7): 792.     CrossRef
The accuracy and consistency of mastery for each content domain using the Rasch and deterministic inputs, noisy “and” gate diagnostic classification models: a simulation study and a real-world analysis using data from the Korean Medical Licensing Examination  
Dong Gi Seo, Jae Kum Kim
J Educ Eval Health Prof. 2021;18:15.   Published online July 5, 2021
DOI: https://doi.org/10.3352/jeehp.2021.18.15
  • 4,965 View
  • 295 Download
  • 2 Web of Science
  • 2 Crossref
Abstract
Purpose
Diagnostic classification models (DCMs) were developed to identify the mastery or non-mastery of the attributes required for solving test items, but their application has been limited to very low-level attributes, and the accuracy and consistency of high-level attributes using DCMs have rarely been reported compared with classical test theory (CTT) and item response theory models. This paper compared the accuracy of high-level attribute mastery between deterministic inputs, noisy “and” gate (DINA) and Rasch models, along with sub-scores based on CTT.
Methods
First, a simulation study explored the effects of attribute length (number of items per attribute) and the correlations among attributes with respect to the accuracy of mastery. Second, a real-data study examined model and item fit and investigated the consistency of mastery for each attribute among the 3 models using the 2017 Korean Medical Licensing Examination with 360 items.
Results
Accuracy of mastery increased with a higher number of items measuring each attribute across all conditions. The DINA model was more accurate than the CTT and Rasch models for attributes with high correlations (>0.5) and few items. In the real-data analysis, the DINA and Rasch models generally showed better item fits and appropriate model fit. The consistency of mastery between the Rasch and DINA models ranged from 0.541 to 0.633 and the correlations of person attribute scores between the Rasch and DINA models ranged from 0.579 to 0.786.
Conclusion
Although all 3 models provide a mastery decision for each examinee, the individual mastery profile using the DINA model provides more accurate decisions for attributes with high correlations than the CTT and Rasch models. The DINA model can also be directly applied to tests with complex structures, unlike the CTT and Rasch models, and it provides different diagnostic information from the CTT and Rasch models.
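As a concrete illustration of how the DINA model referenced above converts attribute mastery into a response probability, here is a short R sketch with hypothetical slip and guess parameters and a hypothetical Q-matrix row:

  # DINA model: eta = 1 if the examinee masters every attribute the item requires,
  # otherwise 0; P(correct) = (1 - slip)^eta * guess^(1 - eta)
  dina_p <- function(alpha, q, slip, guess) {
    eta <- as.numeric(all(alpha[q == 1] == 1))  # conjunctive ("and") condition
    (1 - slip)^eta * guess^(1 - eta)
  }
  q     <- c(1, 1, 0)   # item requires attributes 1 and 2 (hypothetical Q-matrix row)
  alpha <- c(1, 1, 0)   # examinee masters attributes 1 and 2
  dina_p(alpha, q, slip = 0.1, guess = 0.2)       # 0.9: all required attributes mastered
  dina_p(c(1, 0, 0), q, slip = 0.1, guess = 0.2)  # 0.2: one required attribute missing

The conjunctive eta term is what makes the model "noisy and": missing any required attribute drops the success probability to the guessing level.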

Citations

Citations to this article as recorded by  
  • Stable Knowledge Tracing Using Causal Inference
    Jia Zhu, Xiaodong Ma, Changqin Huang
    IEEE Transactions on Learning Technologies.2024; 17: 124.     CrossRef
  • Development of a character qualities test for medical students in Korea using polytomous item response theory and factor analysis: a preliminary scale development study
    Yera Hur, Dong Gi Seo
    Journal of Educational Evaluation for Health Professions.2023; 20: 20.     CrossRef
Development and validation of a measurement scale to assess nursing students’ readiness for the flipped classroom in Sri Lanka  
Punithalingam Youhasan, Yan Chen, Mataroria Lyndon, Marcus Alexander Henning
J Educ Eval Health Prof. 2020;17:41.   Published online December 14, 2020
DOI: https://doi.org/10.3352/jeehp.2020.17.41
  • 7,053 View
  • 270 Download
  • 9 Web of Science
  • 8 Crossref
Abstract
Purpose
The aim of this study was to develop and validate a scale to measure nursing students’ readiness for the flipped classroom in Sri Lanka.
Methods
A literature review provided the theoretical framework for developing the Nursing Students’ Readiness for Flipped Classroom (NSR-FC) questionnaire. Five content experts evaluated the NSR-FC, and content validity indices (CVI) were calculated. Cross-sectional surveys among 355 undergraduate nursing students from 3 state universities in Sri Lanka were carried out to assess the psychometric properties of the NSR-FC. Principal component analysis (PCA, n=265), internal consistency (using the Cronbach α coefficient, n=265), and confirmatory factor analysis (CFA, n=90) were done to test construct validity and reliability.
Results
Thirty-seven items were included in the NSR-FC for content validation, resulting in an average scale CVI of 0.94. Two items received item level CVI of less than 0.78. The factor structures of the 35 items were explored through PCA with orthogonal factor rotation, culminating in the identification of 5 factors. These factors were classified as technological readiness, environmental readiness, personal readiness, pedagogical readiness, and interpersonal readiness. The NSR-FC also showed an overall acceptable level of internal consistency (Cronbach α=0.9). CFA verified a 4-factor model (excluding the interpersonal readiness factor) and 20 items that achieved acceptable fit (standardized root mean square residual=0.08, root mean square error of approximation=0.08, comparative fit index=0.87, and χ2/degrees of freedom=1.57).
Conclusion
The NSR-FC, as a 4-factor model, is an acceptable measurement scale for assessing nursing students’ readiness for the flipped classroom in terms of its construct validity and reliability.
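A confirmatory factor analysis of the kind reported above can be run in R with the lavaan package; because the NSR-FC data are not publicly available, the sketch below uses lavaan's built-in HolzingerSwineford1939 dataset purely to show where fit indices such as SRMR, RMSEA, CFI, and chi-square/df come from:

  library(lavaan)  # install.packages("lavaan") if needed
  # Illustrative 3-factor model on lavaan's built-in dataset (not the NSR-FC items)
  model <- '
    visual  =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
    speed   =~ x7 + x8 + x9
  '
  fit <- cfa(model, data = HolzingerSwineford1939)
  fitMeasures(fit, c("srmr", "rmsea", "cfi", "chisq", "df"))
  # chi-square / df can then be computed by hand from the last two values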

Citations

Citations to this article as recorded by  
  • Design and validation of a preliminary instrument to contextualize interactions through information technologies of health professionals
    José Fidencio López Luna, Eddie Nahúm Armendáriz Mireles, Marco Aurelio Nuño Maganda, Hiram Herrera Rivas, Rubén Machucho Cadena, Jorge Arturo Hernández Almazán
    Health Informatics Journal.2024;[Epub]     CrossRef
  • AI readiness scale for teachers: Development and validation
    Mehmet Ramazanoglu, Tayfun Akın
    Education and Information Technologies.2024;[Epub]     CrossRef
  • Content validity of the Constructivist Learning in Higher Education Settings (CLHES) scale in the context of the flipped classroom in higher education
    Turki Mesfer Alqahtani, Farrah Dina Yusop, Siti Hajar Halili
    Humanities and Social Sciences Communications.2023;[Epub]     CrossRef
  • The intensivist's assessment of gastrointestinal function: A pilot study
    Varsha M. Asrani, Colin McArthur, Ian Bissett, John A. Windsor
    Australian Critical Care.2022; 35(6): 636.     CrossRef
  • Psychometric evidence of a perception scale about covid-19 vaccination process in Peruvian dentists: a preliminary validation
    César F. Cayo-Rojas, Nancy Córdova-Limaylla, Gissela Briceño-Vergel, Marysela Ladera-Castañeda, Hernán Cachay-Criado, Carlos López-Gurreonero, Alberto Cornejo-Pinto, Luis Cervantes-Ganoza
    BMC Health Services Research.2022;[Epub]     CrossRef
  • Implementation of a Web-Based Educational Intervention for Promoting Flipped Classroom Pedagogy: A Mixed-Methods Study
    Punithalingam Youhasan, Mataroria P. Lyndon, Yan Chen, Marcus A. Henning
    Medical Science Educator.2022; 33(1): 91.     CrossRef
  • Assess the feasibility of flipped classroom pedagogy in undergraduate nursing education in Sri Lanka: A mixed-methods study
    Punithalingam Youhasan, Yan Chen, Mataroria Lyndon, Marcus A. Henning, Gwo-Jen Hwang
    PLOS ONE.2021; 16(11): e0259003.     CrossRef
  • Newly appointed medical faculty members’ self-evaluation of their educational roles at the Catholic University of Korea College of Medicine in 2020 and 2021: a cross-sectional survey-based study
    Sun Kim, A Ra Cho, Chul Woon Chung
    Journal of Educational Evaluation for Health Professions.2021; 18: 28.     CrossRef
Software report
Introduction to the LIVECAT web-based computerized adaptive testing platform  
Dong Gi Seo, Jeongwook Choi
J Educ Eval Health Prof. 2020;17:27.   Published online September 29, 2020
DOI: https://doi.org/10.3352/jeehp.2020.17.27
  • 6,055 View
  • 143 Download
  • 6 Web of Science
  • 6 Crossref
Abstract
This study introduces LIVECAT, a web-based computerized adaptive testing platform. This platform provides many functions, including writing item content, managing an item bank, creating and administering a test, reporting test results, and providing information about a test and examinees. LIVECAT provides examination administrators with an easy and flexible environment for composing and managing examinations. It is available at http://www.thecatkorea.com/. Several tools were used to program LIVECAT, as follows: operating system, Amazon Linux; web server, nginx 1.18; web application server, Apache Tomcat 8.5; database, Amazon RDS (MariaDB); and languages, Java 8, HTML5/CSS, JavaScript, and jQuery. The LIVECAT platform can be used to implement several item response theory (IRT) models, such as the Rasch and 1-, 2-, and 3-parameter logistic models. The administrator can choose a specific model for test construction in LIVECAT. Multimedia data such as images, audio files, and movies can be uploaded to items in LIVECAT. Two scoring methods (maximum likelihood estimation and expected a posteriori) are available in LIVECAT, and the maximum Fisher information item selection method is applied to every IRT model in LIVECAT. The LIVECAT platform showed equal or better performance compared with a conventional test platform. The LIVECAT platform enables users without psychometric expertise to easily implement and perform computerized adaptive testing at their institutions. The most recent LIVECAT version only provides dichotomous item response models and the basic components of CAT. In the near future, LIVECAT will include advanced functions, such as polytomous item response models, the weighted likelihood estimation method, and content balancing methods.
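To illustrate the expected a posteriori (EAP) scoring method that LIVECAT offers, the following base R sketch (independent of LIVECAT's actual implementation; all item parameters are hypothetical) estimates ability from a short dichotomous response pattern under the 2-parameter logistic model with a standard normal prior:

  # EAP ability estimate under a 2PL model with a N(0, 1) prior,
  # approximated on a discrete quadrature grid
  eap_2pl <- function(resp, a, b, grid = seq(-4, 4, length.out = 81)) {
    prior <- dnorm(grid)                      # standard normal prior weights
    lik <- sapply(grid, function(theta) {
      p <- 1 / (1 + exp(-a * (theta - b)))    # probability of a correct response
      prod(p^resp * (1 - p)^(1 - resp))       # likelihood of the response pattern
    })
    post <- prior * lik
    sum(grid * post) / sum(post)              # posterior mean = EAP estimate
  }
  a <- c(1.2, 0.8, 1.5, 1.0)    # hypothetical discrimination parameters
  b <- c(-0.4, 0.3, 0.9, 0.0)   # hypothetical difficulty parameters
  eap_2pl(resp = c(1, 1, 0, 1), a, b)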

Citations

Citations to this article as recorded by  
  • Comparison of real data and simulated data analysis of a stopping rule based on the standard error of measurement in computerized adaptive testing for medical examinations in Korea: a psychometric study
    Dong Gi Seo, Jeongwook Choi, Jinha Kim
    Journal of Educational Evaluation for Health Professions.2024; 21: 18.     CrossRef
  • Educational Technology in the University: A Comprehensive Look at the Role of a Professor and Artificial Intelligence
    Cheolkyu Shin, Dong Gi Seo, Seoyeon Jin, Soo Hwa Lee, Hyun Je Park
    IEEE Access.2024; 12: 116727.     CrossRef
  • The irtQ R package: a user-friendly tool for item response theory-based test data analysis and calibration
    Hwanggyu Lim, Kyungseok Kang
    Journal of Educational Evaluation for Health Professions.2024; 21: 23.     CrossRef
  • Presidential address: improving item validity and adopting computer-based testing, clinical skills assessments, artificial intelligence, and virtual reality in health professions licensing examinations in Korea
    Hyunjoo Pai
    Journal of Educational Evaluation for Health Professions.2023; 20: 8.     CrossRef
  • Patient-reported outcome measures in cancer care: Integration with computerized adaptive testing
    Minyu Liang, Zengjie Ye
    Asia-Pacific Journal of Oncology Nursing.2023; 10(12): 100323.     CrossRef
  • Development of a character qualities test for medical students in Korea using polytomous item response theory and factor analysis: a preliminary scale development study
    Yera Hur, Dong Gi Seo
    Journal of Educational Evaluation for Health Professions.2023; 20: 20.     CrossRef
Research article
Correlations between moral courage scores and social desirability scores among medical residents and fellows in Argentina  
Raúl Alfredo Borracci, Graciana Ciambrone, José María Alvarez Gallesio
J Educ Eval Health Prof. 2020;17:6.   Published online February 18, 2020
DOI: https://doi.org/10.3352/jeehp.2020.17.6
  • 7,321 View
  • 191 Download
  • 3 Web of Science
  • 5 Crossref
Abstract
Purpose
Moral courage refers to the conviction to take action on one’s ethical beliefs despite the risk of adverse consequences. This study aimed to evaluate correlations between social desirability scores and moral courage scores among medical residents and fellows, and to explore gender- and specialty-based differences in moral courage scores.
Methods
In April 2018, the Moral Courage Scale for Physicians (MCSP), the Professional Moral Courage (PMC) scale and the Marlowe-Crowne scale to measure social desirability were administered to 87 medical residents from Hospital Alemán in Buenos Aires, Argentina.
Results
The Cronbach α coefficients were 0.78, 0.74, and 0.81 for the Marlowe-Crowne, MCSP, and PMC scales, respectively. Correlation analysis showed that moral courage scores were weakly correlated with social desirability scores, while both moral courage scales were strongly correlated with each other. Physicians who were training in a surgical specialty showed lower moral courage scores than nonsurgical specialty trainees, and men from any specialty tended to have lower moral courage scores than women. Specifically, individuals training in surgical specialties ranked lower on assessments of the “multiple values,” “endurance of threats,” and “going beyond compliance” dimensions of the PMC scale. Men tended to rank lower than women on the “multiple values,” “moral goals,” and “endurance of threats” dimensions.
Conclusion
There was a poor correlation between 2 validated moral courage scores and social desirability scores among medical residents and fellows in Argentina. Conversely, both moral courage tools showed a close correlation and concordance, suggesting that these scales are reasonably interchangeable.

Citations

Citations to this article as recorded by  
  • Moral courage level of nurses: a systematic review and meta-analysis
    Hang Li, JuLan Guo, ZhiRong Ren, Dingxi Bai, Jing Yang, Wei Wang, Han Fu, Qing Yang, Chaoming Hou, Jing Gao
    BMC Nursing.2024;[Epub]     CrossRef
  • What is courage? A content analysis of courage definitions [CESARET NEDİR? CESARET TANIMLARININ İÇERİK ANALİZİ]
    İbrahim Sani MERT
    Uluslararası İktisadi ve İdari Bilimler Dergisi.2023; 9(2): 126.     CrossRef
  • The Impact of Active Bystander Training on Officer Confidence and Ability to Address Ethical Challenges
    Travis Taniguchi, Heather Vovak, Gary Cordner, Karen Amendola, Yukun Yang, Katherine Hoogesteyn, Martin Bartness
    Policing: A Journal of Policy and Practice.2022; 16(3): 508.     CrossRef
  • The Role of Academic Medicine in the Call for Justice
    Danielle Laraque-Arena, Ilene Fennoy, Leslie L. Davidson
    Journal of the National Medical Association.2021; 113(4): 388.     CrossRef
  • Can Careproviders Still Bond with Patients after They Are Turned Down for a Treatment They Need?
    Edmund G. Howe
    The Journal of Clinical Ethics.2021; 32(3): 185.     CrossRef
Review article
Overview and current management of computerized adaptive testing in licensing/certification examinations  
Dong Gi Seo
J Educ Eval Health Prof. 2017;14:17.   Published online July 26, 2017
DOI: https://doi.org/10.3352/jeehp.2017.14.17
  • 39,591 View
  • 380 Download
  • 14 Web of Science
  • 11 Crossref
Abstract
Computerized adaptive testing (CAT) has been implemented in high-stakes examinations such as the National Council Licensure Examination-Registered Nurses in the United States since 1994. Subsequently, the National Registry of Emergency Medical Technicians in the United States adopted CAT for certifying emergency medical technicians in 2007. This review was written with the goal of introducing the implementation of CAT for medical and health licensing examinations. Most implementations of CAT are based on item response theory, which hypothesizes that both the examinee and the items have their own characteristics that do not change. There are 5 steps for implementing CAT: first, determining whether the CAT approach is feasible for a given testing program; second, establishing an item bank; third, pretesting, calibrating, and linking item parameters via statistical analysis; fourth, determining the specifications for the final CAT with respect to the 5 components of the CAT algorithm; and finally, deploying the final CAT after specifying all the necessary components. The 5 components of the CAT algorithm are as follows: the item bank, starting item, item selection rule, scoring procedure, and termination criterion. CAT management includes content balancing, item analysis, item scoring, standard setting, practice analysis, and item bank updates. Remaining issues include the cost of constructing CAT platforms and deploying the computer technology required to build an item bank. In conclusion, to ensure more accurate estimation of examinees’ ability, CAT may be a good option for national licensing examinations. Measurement theory can support its implementation for high-stakes examinations.
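The 5 components listed above can be tied together in a few dozen lines of R; the sketch below is a simplified simulation (hypothetical item bank, a fixed "true" ability for demonstration), not an operational CAT engine:

  set.seed(42)
  # 1. Item bank: hypothetical 2PL parameters for 100 items
  bank <- data.frame(a = runif(100, 0.8, 2.0), b = rnorm(100))
  p2pl <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))
  info <- function(theta, a, b) { p <- p2pl(theta, a, b); a^2 * p * (1 - p) }

  theta_true <- 0.7          # simulated examinee ability
  theta_hat  <- 0            # 2. Starting point: provisional ability of 0
  administered <- integer(0)
  responses <- integer(0)

  repeat {
    # 3. Item selection: maximum Fisher information at the current estimate
    available <- setdiff(seq_len(nrow(bank)), administered)
    next_item <- available[which.max(info(theta_hat, bank$a[available], bank$b[available]))]
    administered <- c(administered, next_item)
    # Simulate a response from the "true" ability
    responses <- c(responses, rbinom(1, 1, p2pl(theta_true, bank$a[next_item], bank$b[next_item])))

    # 4. Scoring: EAP estimate on a quadrature grid with a N(0, 1) prior
    grid <- seq(-4, 4, length.out = 81)
    lik <- sapply(grid, function(t)
      prod(p2pl(t, bank$a[administered], bank$b[administered])^responses *
           (1 - p2pl(t, bank$a[administered], bank$b[administered]))^(1 - responses)))
    post <- dnorm(grid) * lik
    theta_hat <- sum(grid * post) / sum(post)

    # 5. Termination: stop when the SEM falls below 0.30 or 30 items are used
    sem <- 1 / sqrt(sum(info(theta_hat, bank$a[administered], bank$b[administered])))
    if (sem <= 0.30 || length(administered) >= 30) break
  }
  c(theta_hat = theta_hat, sem = sem, items_used = length(administered))

An operational system would add the management layers discussed above (content balancing, exposure control, and item bank maintenance) around this core loop.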

Citations

Citations to this article as recorded by  
  • Validation of the cognitive section of the Penn computerized adaptive test for neurocognitive and clinical psychopathology assessment (CAT-CCNB)
    Akira Di Sandro, Tyler M. Moore, Eirini Zoupou, Kelly P. Kennedy, Katherine C. Lopez, Kosha Ruparel, Lucky J. Njokweni, Sage Rush, Tarlan Daryoush, Olivia Franco, Alesandra Gorgone, Andrew Savino, Paige Didier, Daniel H. Wolf, Monica E. Calkins, J. Cobb S
    Brain and Cognition.2024; 174: 106117.     CrossRef
  • Comparison of real data and simulated data analysis of a stopping rule based on the standard error of measurement in computerized adaptive testing for medical examinations in Korea: a psychometric study
    Dong Gi Seo, Jeongwook Choi, Jinha Kim
    Journal of Educational Evaluation for Health Professions.2024; 21: 18.     CrossRef
  • The current utilization of the patient-reported outcome measurement information system (PROMIS) in isolated or combined total knee arthroplasty populations
    Puneet Gupta, Natalia Czerwonka, Sohil S. Desai, Alirio J. deMeireles, David P. Trofa, Alexander L. Neuwirth
    Knee Surgery & Related Research.2023;[Epub]     CrossRef
  • Evaluating a Computerized Adaptive Testing Version of a Cognitive Ability Test Using a Simulation Study
    Ioannis Tsaousis, Georgios D. Sideridis, Hannan M. AlGhamdi
    Journal of Psychoeducational Assessment.2021; 39(8): 954.     CrossRef
  • Accuracy and Efficiency of Web-based Assessment Platform (LIVECAT) for Computerized Adaptive Testing
    Do-Gyeong Kim, Dong-Gi Seo
    The Journal of Korean Institute of Information Technology.2020; 18(4): 77.     CrossRef
  • Transformations in medical education: innovations in learning assessment and technological advances (part 2) [Transformaciones en educación médica: innovaciones en la evaluación de los aprendizajes y avances tecnológicos (parte 2)]
    Veronica Luna de la Luz, Patricia González-Flores
    Investigación en Educación Médica.2020; 9(34): 87.     CrossRef
  • Introduction to the LIVECAT web-based computerized adaptive testing platform
    Dong Gi Seo, Jeongwook Choi
    Journal of Educational Evaluation for Health Professions.2020; 17: 27.     CrossRef
  • Computerised adaptive testing accurately predicts CLEFT-Q scores by selecting fewer, more patient-focused questions
    Conrad J. Harrison, Daan Geerards, Maarten J. Ottenhof, Anne F. Klassen, Karen W.Y. Wong Riff, Marc C. Swan, Andrea L. Pusic, Chris J. Sidey-Gibbons
    Journal of Plastic, Reconstructive & Aesthetic Surgery.2019; 72(11): 1819.     CrossRef
  • Presidential address: Preparing for permanent test centers and computerized adaptive testing
    Chang Hwi Kim
    Journal of Educational Evaluation for Health Professions.2018; 15: 1.     CrossRef
  • Updates from 2018: Being indexed in Embase, becoming an affiliated journal of the World Federation for Medical Education, implementing an optional open data policy, adopting principles of transparency and best practice in scholarly publishing, and appreci
    Sun Huh
    Journal of Educational Evaluation for Health Professions.2018; 15: 36.     CrossRef
  • Linear programming method to construct equated item sets for the implementation of periodical computer-based testing for the Korean Medical Licensing Examination
    Dong Gi Seo, Myeong Gi Kim, Na Hui Kim, Hye Sook Shin, Hyun Jung Kim
    Journal of Educational Evaluation for Health Professions.2018; 15: 26.     CrossRef
Research articles
Psychometric properties of a novel knowledge assessment tool of mechanical ventilation for emergency medicine residents in the northeastern United States  
Jeremy B. Richards, Tania D. Strout, Todd A. Seigel, Susan R. Wilcox
J Educ Eval Health Prof. 2016;13:10.   Published online February 16, 2016
DOI: https://doi.org/10.3352/jeehp.2016.13.10
  • 28,174 View
  • 181 Download
  • 4 Web of Science
  • 4 Crossref
Abstract
Purpose
Prior descriptions of the psychometric properties of validated knowledge assessment tools designed to determine emergency medicine (EM) residents' understanding of physiologic and clinical concepts related to mechanical ventilation are lacking. We therefore performed this study to describe the psychometric and performance properties of a novel knowledge assessment tool that measures EM residents' knowledge of topics in mechanical ventilation.
Methods
Results from a multicenter, prospective survey study involving 219 EM residents from 8 academic hospitals in the northeastern United States were analyzed to quantify the reliability, item difficulty, and item discrimination of each of the 9 questions in the knowledge assessment tool, which was administered over 3 weeks beginning in January 2013.
Results
The response rate for residents completing the knowledge assessment tool was 68.6% (214 out of 312 EM residents). Reliability was assessed by both Cronbach’s alpha coefficient (0.6293) and the Spearman-Brown coefficient (0.6437). Item difficulty ranged from 0.39 to 0.96, with a mean item difficulty of 0.75 for all 9 questions. Uncorrected item discrimination values ranged from 0.111 to 0.556. Corrected item-total correlations were determined by removing the question being assessed from analysis, resulting in a range of item discrimination from 0.139 to 0.498.
Conclusion
Reliability, item difficulty, and item discrimination were within satisfactory ranges in this study, demonstrating acceptable psychometric properties of the knowledge assessment tool. These findings indicate that the tool is sufficiently rigorous for use in future research studies or for the evaluative assessment of EM residents.
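The classical item statistics reported above (item difficulty, corrected item-total correlation, and Cronbach's alpha) can be computed directly in base R; the sketch below uses randomly generated 0/1 responses, so the values are illustrative only:

  set.seed(7)
  # Simulated 0/1 responses: 214 examinees by 9 items (illustrative, not the study data)
  resp <- matrix(rbinom(214 * 9, 1, 0.75), nrow = 214, ncol = 9)

  item_difficulty <- colMeans(resp)   # proportion answering each item correctly

  # Corrected item-total correlation: correlate each item with the total score
  # computed from the remaining items
  corrected_item_total <- sapply(seq_len(ncol(resp)), function(j)
    cor(resp[, j], rowSums(resp[, -j])))

  # Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of total)
  k <- ncol(resp)
  alpha <- (k / (k - 1)) * (1 - sum(apply(resp, 2, var)) / var(rowSums(resp)))

  item_difficulty; corrected_item_total; alpha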

Citations

Citations to this article as recorded by  
  • Comparison of three methods for teaching mechanical ventilation in an emergency setting to sixth-year medical students: a randomized trial
    Fernando Sabia Tallo, Letícia Sandre Vendrame, André Luciano Baitello
    Revista da Associação Médica Brasileira.2020; 66(10): 1409.     CrossRef
  • Critical Appraisal of Emergency Medicine Educational Research: The Best Publications of 2016
    Nicole M. Dubosh, Jaime Jordan, Lalena M. Yarris, Edward Ullman, Joshua Kornegay, Daniel Runde, Amy Miller Juve, Jonathan Fisher, Teresa Chan
    AEM Education and Training.2019; 3(1): 58.     CrossRef
  • Mechanical Ventilation Training During Graduate Medical Education: Perspectives and Review of the Literature
    Jonathan M. Keller, Dru Claar, Juliana Carvalho Ferreira, David C. Chu, Tanzib Hossain, William Graham Carlos, Jeffrey A. Gold, Stephanie A. Nonas, Nitin Seam
    Journal of Graduate Medical Education.2019; 11(4): 389.     CrossRef
  • Development and validation of a questionnaire to assess the knowledge of mechanical ventilation in urgent care among students in their last-year medical course in Brazil
    Fernando Sabia Tallo, Simone de Campos Vieira Abib, Andre Luciano Baitello, Renato Delascio Lopes
    Clinics.2019; 74: e663.     CrossRef
The validity and reliability of a problem-based learning implementation questionnaire  
Bhina Patria
J Educ Eval Health Prof. 2015;12:22.   Published online June 8, 2015
DOI: https://doi.org/10.3352/jeehp.2015.12.22
  • 51,792 View
  • 312 Download
  • 3 Web of Science
  • 1 Crossref
Abstract
Purpose
The aim of this paper is to provide evidence for the validity and reliability of a questionnaire for assessing the implementation of problem-based learning (PBL). This questionnaire was developed to assess the quality of PBL implementation from the perspective of medical school graduates.
Methods
A confirmatory factor analysis was conducted to assess the validity of the questionnaire. The analysis was based on a survey of 225 graduates of a problem-based medical school in Indonesia.
Results
The results showed that the confirmatory factor analysis model had a good fit to the data. Further, the values of the standardized loading estimates, the squared inter-construct correlations, the average variances extracted, and the composite reliabilities all provided evidence of construct validity.
Conclusion
The PBL implementation questionnaire was found to be valid and reliable, making it suitable for evaluation purposes.
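Because the abstract cites average variances extracted and composite reliabilities as validity evidence, the short R sketch below shows how both are obtained from standardized factor loadings; the loading values are hypothetical:

  # Standardized loadings of one construct's indicators (hypothetical values)
  loadings <- c(0.72, 0.68, 0.81, 0.75)
  ave <- mean(loadings^2)                         # average variance extracted
  cr  <- sum(loadings)^2 /
         (sum(loadings)^2 + sum(1 - loadings^2))  # composite reliability
  ave; cr   # AVE > 0.5 and CR > 0.7 are common rules of thumb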

Citations

Citations to this article as recorded by  
  • Changes in Learning Outcomes of Students Participating in Problem-Based Learning for the First Time: A Case Study of a Financial Management Course
    Yung-Chuan Lee
    The Asia-Pacific Education Researcher.2024;[Epub]     CrossRef
Review article
Reconsidering the Cut Score of Korean National Medical Licensing Examination
Duck Sun Ahn, Sowon Ahn
J Educ Eval Health Prof. 2007;4:1.   Published online April 28, 2007
DOI: https://doi.org/10.3352/jeehp.2007.4.1
  • 42,692 View
  • 181 Download
  • 4 Crossref
Abstract
After briefly reviewing theories of standard setting, we analyzed the problems of the current cut scores. We then reported the results of a needs assessment on standard setting among medical educators and psychometricians, along with analyses of the standard-setting methods used in developed countries. Based on these findings, we suggested the Bookmark and modified Angoff methods as alternative methods for setting the standard. Possible problems and challenges in applying these methods to the National Medical Licensing Examination were discussed.
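To make the modified Angoff procedure mentioned above concrete, the following R sketch averages hypothetical judges' estimates of the probability that a minimally competent examinee answers each item correctly and sums them into a cut score; the ratings are invented for illustration:

  # Rows = judges, columns = items: each entry is a judge's estimate of the
  # probability that a minimally competent examinee answers the item correctly
  ratings <- rbind(
    judge1 = c(0.6, 0.7, 0.5, 0.8, 0.4),
    judge2 = c(0.5, 0.8, 0.6, 0.7, 0.5),
    judge3 = c(0.7, 0.6, 0.5, 0.9, 0.4)
  )
  item_means <- colMeans(ratings)   # consensus expected score per item
  cut_score  <- sum(item_means)     # modified Angoff cut score (expected raw score)
  cut_score / ncol(ratings) * 100   # expressed as a percent-correct standard

The Bookmark method instead orders items by IRT difficulty and locates the cut at the point where a minimally competent examinee's success probability drops below a chosen response probability.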

Citations

Citations to this article as recorded by  
  • Predicting medical graduates’ clinical performance using national competency examination results in Indonesia
    Prattama Santoso Utomo, Amandha Boy Timor Randita, Rilani Riskiyana, Felicia Kurniawan, Irwin Aras, Cholis Abrori, Gandes Retno Rahayu
    BMC Medical Education.2022;[Epub]     CrossRef
  • Possibility of independent use of the yes/no Angoff and Hofstee methods for the standard setting of the Korean Medical Licensing Examination written test: a descriptive study
    Do-Hwan Kim, Ye Ji Kang, Hoon-Ki Park
    Journal of Educational Evaluation for Health Professions.2022; 19: 33.     CrossRef
  • Applying the Bookmark method to medical education: Standard setting for an aseptic technique station
    Monica L. Lypson, Steven M. Downing, Larry D. Gruppen, Rachel Yudkowsky
    Medical Teacher.2013; 35(7): 581.     CrossRef
  • Standard Setting in Student Assessment: Is a Defensible Method Yet to Come?
    A Barman
    Annals of the Academy of Medicine, Singapore.2008; 37(11): 957.     CrossRef
