Calibrating the Medical Council of Canada's Qualifying Examination Part I using an integrated item response theory framework: a comparison of models and designs

De Champlain, Andre F; Boulais, Andre Philippe; Dallas, Andrew

J Educ Eval Health Prof. 2016;13:6. 10.3352/jeehp.2016.13.6.

Affiliations

¹Research & Development, Medical Council of Canada, Ottawa, Ontario, Canada. adechamplain@mcc.ca
²Educational Research Methodology Department, School of Education, University of North Carolina at Greensboro, Greensboro, North Carolina, USA.

KMID: 2413755
DOI: http://doi.org/10.3352/jeehp.2016.13.6

Abstract

PURPOSE
The aim of this research was to compare different methods of calibrating the multiple-choice question (MCQ) and clinical decision-making (CDM) components of the Medical Council of Canada's Qualifying Examination Part I (MCCQEI) based on item response theory (IRT).
METHODS
Our data consisted of test results from 8,213 first-time applicants to the MCCQEI in the spring and fall 2010 and 2011 test administrations. The data set contained several thousand multiple-choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0. All 3 mixed-item-format calibrations (dichotomous MCQ responses and polytomous CDM case scores) were conducted using PARSCALE 4.
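As a rough illustration of the model families named above, the following sketch computes response probabilities under the 1-PL and 2-PL models for dichotomous items and under a 2-PL graded response model for polytomous scores. It is not the BILOG-MG or PARSCALE implementation. The function names, the plain logistic metric (no 1.7 scaling constant), and the example parameter values are all assumptions made for illustration.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under a 2-PL model.

    theta: examinee ability, a: item discrimination, b: item difficulty.
    Assumes the plain logistic metric (no scaling constant).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def p_1pl(theta, b):
    """1-PL model: a 2-PL model with discrimination fixed at 1 (an assumption here)."""
    return p_2pl(theta, 1.0, b)


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a 2-PL graded response model.

    thresholds: ascending category boundaries b_1 < b_2 < ... < b_m.
    Returns probabilities for score categories 0..m.
    """
    # Cumulative probabilities of scoring in category k or higher.
    cumulative = [1.0] + [p_2pl(theta, a, b) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulative values.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical parameter values, not estimates from the MCCQEI data.
mcq_prob = p_2pl(theta=0.5, a=1.2, b=0.0)
cdm_probs = grm_category_probs(theta=0.5, a=1.0, thresholds=[-1.0, 0.0, 1.5])
```

Here a dichotomous MCQ contributes a single probability of a correct response, while a polytomous CDM case contributes one probability per score category, and those probabilities sum to 1.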
RESULTS
The 2-PL model had identical numbers of items with chi-square p-values at or below a Type I error rate of 0.01 (83/3,499, or 0.02). In all 3 polytomous models, whether the MCQs were anchored or run concurrently with the CDM cases, the results suggested very poor fit. All IRT abilities estimated from the dichotomous calibration designs correlated very highly with each other. IRT-based pass-fail rates were extremely similar, not only across calibration designs and methods, but also with regard to the actual decisions reported to candidates. The largest difference noted in pass rates was 4.78%, which occurred between the mixed-format concurrent 2-PL graded response model (pass rate = 80.43%) and the dichotomous anchored 1-PL calibration (pass rate = 85.21%).
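The two summary figures quoted above can be checked with simple arithmetic. The short sketch below uses only the numbers given in the abstract; the variable names are illustrative.

```python
# Items flagged by the chi-square fit statistic under the 2-PL model.
flagged_items = 83
total_items = 3499
misfit_proportion = flagged_items / total_items  # reported as 0.02 after rounding

# Pass rates (%) for the two calibrations with the largest reported gap.
pass_rate_1pl_anchored = 85.21
pass_rate_grm_concurrent = 80.43
largest_gap = round(pass_rate_1pl_anchored - pass_rate_grm_concurrent, 2)
```

The difference is expressed in percentage points, which matches the 4.78% reported above.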
CONCLUSION
Simpler calibration designs with dichotomized items should be implemented. The dichotomous calibrations provided better fit of the item response matrix than more complex, polytomous calibrations.

Keyword

Calibration; Canada; Educational measurement; Item response theory; Licensure

MeSH Terms

Calibration
Canada
Clinical Decision-Making
Dataset
Educational Measurement
Licensure

Figure

Fig. 1. Number of clinical decision making questions for each score category (N = 270).
Fig. 2. Pass/fail rates for each item response theory-based calibration and reported z-score. P: pass, F: fail.

Reference

1. Zimowski M, Muraki E, Mislevy R, Bock D. BILOG-MG 3. Multiple-group IRT analysis and test maintenance for binary items. Chicago (IL): Scientific Software International, Inc;2003.

2. Muraki E. PARSCALE 4: IRT based test scoring and item analysis for graded items and rating scales. Chicago (IL): Scientific Software International, Inc;2003.

3. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977; 33:159–174.