J Educ Eval Health Prof. 2020;17:12. https://doi.org/10.3352/jeehp.2020.17.12

Performance of the Ebel standard-setting method for the spring 2019 Royal College of Physicians and Surgeons of Canada internal medicine certification examination consisting of multiple-choice questions

Affiliations
  • 1Exam Quality and Analytics Unit, Royal College of Physicians and Surgeons of Canada, Ottawa, ON, Canada
  • 2Department of Medicine, University of Calgary, Calgary, AB, Canada
  • 3Department of Medicine, University of Manitoba, Winnipeg, MB, Canada

Abstract

Purpose
This study aimed to assess the performance of the Ebel standard-setting method for the spring 2019 Royal College of Physicians and Surgeons of Canada internal medicine certification examination consisting of multiple-choice questions. Specifically, the following parameters were evaluated: inter-rater agreement, the correlations between Ebel scores and item facility indices, the impact of raters’ knowledge of correct answers on the Ebel score, and the effects of raters’ specialty on inter-rater agreement and Ebel scores.
Methods
Data were drawn from a Royal College of Physicians and Surgeons of Canada certification exam. The Ebel method was applied to 203 multiple-choice questions by 49 raters. Facility indices were computed from the responses of 194 candidates. We computed the Fleiss kappa to assess inter-rater agreement and Pearson correlations between Ebel scores and item facility indices. Using t-tests, we investigated whether Ebel scores differed according to whether raters were provided with the correct answers, and whether inter-rater agreement and Ebel scores differed between internists and other specialists.
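
As a concrete illustration of this pipeline, the following is a minimal, self-contained Python sketch using synthetic data. The 3×4 grid of expected proportions correct, the category counts, and the internist labels are illustrative assumptions for demonstration only, not the actual panel values or rater data used for this examination.

```python
import numpy as np
from scipy.stats import pearsonr, ttest_ind
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
n_items, n_raters, n_candidates = 203, 49, 194

# Each rater assigns every item a facility category (e.g., easy/medium/hard)
# and a relevance category (e.g., essential ... questionable). Synthetic here.
facility_cat = rng.integers(0, 3, size=(n_items, n_raters))
relevance_cat = rng.integers(0, 4, size=(n_items, n_raters))

# Hypothetical expected-proportion-correct grid for a borderline candidate
# (rows: facility category, columns: relevance category). Real panels agree
# on these values in advance; these numbers are illustrative.
grid = np.array([[0.90, 0.85, 0.80, 0.70],
                 [0.70, 0.65, 0.60, 0.50],
                 [0.50, 0.45, 0.40, 0.30]])

# Item Ebel score: look up each rater's grid cell, then average over raters.
per_rater = grid[facility_cat, relevance_cat]      # shape (items, raters)
ebel_scores = per_rater.mean(axis=1)

# Item facility index: proportion of candidates answering correctly.
responses = rng.integers(0, 2, size=(n_candidates, n_items))
facility_index = responses.mean(axis=0)

# Fleiss' kappa for the facility categorizations (same call for relevance).
counts, _ = aggregate_raters(facility_cat)
print("Fleiss kappa (facility):", fleiss_kappa(counts))

# Pearson correlation between item Ebel scores and facility indices.
r, p = pearsonr(ebel_scores, facility_index)
print(f"Pearson r = {r:.3f} (p = {p:.3g})")

# t-test comparing mean Ebel scores between two rater groups
# (is_internist is an illustrative label, not real rater data).
is_internist = rng.integers(0, 2, size=n_raters).astype(bool)
rater_means = per_rater.mean(axis=0)               # one mean per rater
t, p = ttest_ind(rater_means[is_internist], rater_means[~is_internist])
print(f"t = {t:.3f} (p = {p:.3g})")
```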
Results
The Fleiss kappa was below 0.15 for both facility and relevance. The correlation between Ebel scores and facility indices was low when correct answers were provided and negligible when they were not. Ebel scores did not differ significantly according to whether the correct answers were provided. Inter-rater agreement and Ebel scores were not significantly different between internists and other specialists.
Conclusion
Inter-rater agreement and correlations between item Ebel scores and facility indices were consistently low; furthermore, raters’ knowledge of the correct answers and raters’ specialty had no effect on Ebel scores in the present setting.

Keywords

Canada; Certification; Medicine; Specialization; Standard-setting

Figures

  • Fig. 1. Correlation between Ebel scores and item facility indices.

  • Fig. 2. Bland-Altman plot of the difference between item facility indices and Ebel scores.
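
A plot of the kind shown in Fig. 2 can be reproduced generically as below. This is a sketch of the standard Bland-Altman construction (mean difference with ±1.96 SD limits of agreement), not the authors' plotting code; the sign convention (facility index minus Ebel score) is assumed from the caption.

```python
import numpy as np
import matplotlib.pyplot as plt

def bland_altman(ebel_scores, facility_index, ax=None):
    """Plot per-item difference vs. mean of the two measures,
    with the mean difference and 95% limits of agreement."""
    ebel_scores = np.asarray(ebel_scores)
    facility_index = np.asarray(facility_index)
    mean = (facility_index + ebel_scores) / 2
    diff = facility_index - ebel_scores    # assumed sign convention of Fig. 2
    md, sd = diff.mean(), diff.std(ddof=1)
    ax = ax or plt.gca()
    ax.scatter(mean, diff, s=12, alpha=0.6)
    ax.axhline(md, color="k", linestyle="--", label=f"mean diff = {md:.3f}")
    for lim in (md - 1.96 * sd, md + 1.96 * sd):
        ax.axhline(lim, color="gray", linestyle=":")
    ax.set_xlabel("Mean of facility index and Ebel score")
    ax.set_ylabel("Facility index - Ebel score")
    ax.legend()
    return ax

# Example usage with the synthetic arrays from the sketch above:
# bland_altman(ebel_scores, facility_index); plt.show()
```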

