Korean J Radiol. 2019 Feb;20(2):218-224. doi: 10.3348/kjr.2018.0193.

Interpretive Performance and Inter-Observer Agreement on Digital Mammography Test Sets

Affiliations
  • 1Department of Radiology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea.
  • 2Department of Radiology, Soonchunhyang University Hospital Bucheon, Soonchunhyang University College of Medicine, Bucheon, Korea. grace927@hanmail.net
  • 3National Cancer Control Institute, National Cancer Center, Goyang, Korea.
  • 4Department of Radiology, Dankook University Hospital, Dankook University College of Medicine, Cheonan, Korea.
  • 5Department of Radiology, Soonchunhyang University Hospital, Soonchunhyang University College of Medicine, Seoul, Korea.
  • 6Department of Radiology, Dong-A University Hospital, Busan, Korea.
  • 7Department of Radiology, Wonkwang University Hospital, Wonkwang University School of Medicine, Iksan, Korea.
  • 8Department of Radiology, Chonbuk National University Hospital, Jeonju, Korea.

Abstract


OBJECTIVE
To evaluate radiologists' interpretive performance and inter-observer agreement on digital mammography test sets, and to investigate whether radiologist characteristics affect performance and agreement.
MATERIALS AND METHODS
The test sets consisted of full-field digital mammograms and contained 12 cancer cases among 1000 total cases. Twelve radiologists independently interpreted all mammograms. Performance indicators included the recall rate, cancer detection rate (CDR), positive predictive value (PPV), sensitivity, specificity, false positive rate (FPR), and area under the receiver operating characteristic curve (AUC). Inter-radiologist agreement was measured using percent agreement and kappa statistics. The radiologist characteristics examined were the number of years of experience interpreting mammography, fellowship training in breast imaging, and the annual volume of mammography interpretation.
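The performance indicators above follow standard screening-audit definitions. As a minimal illustrative sketch (not the authors' code; the function and variable names are hypothetical), they can be computed in Python from per-exam ground truth and recall decisions:

    # Minimal sketch: screening-audit metrics for a single reader, assuming
    # two equal-length lists of booleans: truth (cancer present) and
    # recalled (reader recalled the exam).
    def audit_metrics(truth, recalled):
        n = len(truth)
        tp = sum(t and r for t, r in zip(truth, recalled))      # cancers recalled
        fp = sum(r and not t for t, r in zip(truth, recalled))  # cancer-free exams recalled
        fn = sum(t and not r for t, r in zip(truth, recalled))  # cancers not recalled
        tn = n - tp - fp - fn
        return {
            "recall_rate_%": 100 * (tp + fp) / n,
            "CDR_per_1000": 1000 * tp / n,
            "PPV_%": 100 * tp / (tp + fp) if (tp + fp) else 0.0,
            "sensitivity_%": 100 * tp / (tp + fn),
            "specificity_%": 100 * tn / (tn + fp),
            "FPR_%": 100 * fp / (tn + fp),
        }

For example, a reader who recalls 75 of 1000 exams and detects 10 of the 12 cancers would have a recall rate of 7.5%, a CDR of 10 per 1000 examinations, and a PPV of about 13.3% (10/75).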
RESULTS
The mean and range of the interpretive performance indicators were as follows: recall rate, 7.5% (3.3-10.2%); CDR, 10.6 per 1000 examinations (8.0-12.0); PPV, 15.9% (8.8-33.3%); sensitivity, 88.2% (66.7-100%); specificity, 93.5% (90.6-97.8%); FPR, 6.5% (2.2-9.4%); and AUC, 0.93 (0.82-0.99). Radiologists who interpreted more than 3000 screening mammograms annually tended to show higher CDRs and sensitivities than those who interpreted fewer (p = 0.064). Inter-radiologist percent agreement ranged from 77.2% to 88.8%, with kappa values of 0.27-0.34. Radiologist characteristics did not affect agreement.
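Kappa values well below the percent agreement are expected here because recalls are rare, so chance agreement is high. As a rough sketch (not the authors' analysis, which may have used a different agreement statistic), percent agreement and Cohen's kappa for one pair of readers' binary recall decisions could be computed as:

    # Minimal sketch: percent agreement and Cohen's kappa for two readers'
    # binary recall decisions (True = recall), given equal-length lists a and b.
    def pairwise_agreement(a, b):
        n = len(a)
        p_obs = sum(x == y for x, y in zip(a, b)) / n    # observed agreement
        p_a, p_b = sum(a) / n, sum(b) / n                # each reader's recall fraction
        p_exp = p_a * p_b + (1 - p_a) * (1 - p_b)        # agreement expected by chance
        kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else 1.0
        return 100 * p_obs, kappa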
CONCLUSION
The interpretive performance of the radiologists fulfilled the mammography screening goal of the American College of Radiology, although there was inter-observer variability. Radiologists who interpreted more than 3000 screening mammograms annually tended to perform better than radiologists who did not.

Keywords

Screening; Medical audit; Radiologists; Observer variation; Sensitivity and specificity

MeSH Terms

Area Under Curve
Breast
Fellowships and Scholarships
Mammography*
Mass Screening
Medical Audit
Observer Variation
ROC Curve
Sensitivity and Specificity

Figure

  • Fig. 1. Areas under the ROC curves of the twelve radiologists ranged from 0.82 to 0.99, with a mean value of 0.93. ROC = receiver operating characteristic

