Korean J Radiol. 2019 Mar;20(3):405-410. doi: 10.3348/kjr.2019.0025.

Design Characteristics of Studies Reporting the Performance of Artificial Intelligence Algorithms for Diagnostic Analysis of Medical Images: Results from Recently Published Papers

Affiliations
  • 1 Department of Radiology, Taean-gun Health Center and County Hospital, Taean-gun, Korea.
  • 2 Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea. seongho@amc.seoul.kr

Abstract


OBJECTIVE
To assess the design characteristics of studies that evaluated the performance of artificial intelligence (AI) algorithms for the diagnostic analysis of medical images.
MATERIALS AND METHODS
PubMed MEDLINE and Embase databases were searched to identify original research articles published between January 1, 2018 and August 17, 2018 that investigated the performance of AI algorithms that analyze medical images to provide diagnostic decisions. Eligible articles were evaluated to determine 1) whether the study used external validation rather than internal validation and, for studies with external validation, whether the validation data were collected 2) with a diagnostic cohort design instead of a diagnostic case-control design, 3) from multiple institutions, and 4) in a prospective manner. These are fundamental methodologic features recommended for clinical validation of AI performance in real-world practice. The studies that fulfilled the above criteria were identified. The publishing journals were classified as medical or non-medical, and the results were compared between the two groups.
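The screening logic described above can be sketched in a few lines of Python. This is only an illustration of the four criteria, not the authors' actual extraction procedure: the `Study` record, its field names, and the `summarize` helper are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Study:
    """Hypothetical record for one eligible article (field names are illustrative)."""
    external_validation: bool
    # The next three fields describe the external validation data only;
    # they stay None when the study used internal validation.
    cohort_design: Optional[bool] = None      # diagnostic cohort vs. case-control
    multi_institution: Optional[bool] = None  # data from multiple institutions
    prospective: Optional[bool] = None        # prospective data collection
    medical_journal: bool = True              # medical vs. non-medical journal


def fulfils_all_criteria(study: Study) -> bool:
    """True only if external validation was done with all three design features."""
    return bool(
        study.external_validation
        and study.cohort_design
        and study.multi_institution
        and study.prospective
    )


def summarize(studies: Iterable[Study]) -> dict:
    """Count eligible studies, externally validated studies, and fully compliant ones."""
    studies = list(studies)
    external = [s for s in studies if s.external_validation]
    return {
        "eligible": len(studies),
        "external_validation": len(external),
        "all_criteria": sum(fulfils_all_criteria(s) for s in external),
    }
```

Keeping the three design features as optional fields mirrors the wording of the criteria: they are assessed only for studies that performed external validation.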
RESULTS
Of 516 eligible published studies, only 6% (31 studies) performed external validation. None of the 31 studies adopted all three design features: diagnostic cohort design, the inclusion of multiple institutions, and prospective data collection for external validation. No significant difference was found between medical and non-medical journals.
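As a quick arithmetic check of the headline figure, the sketch below recomputes the proportion of studies with external validation from the two counts given above (31 of 516). The Wilson score interval is added purely for illustration; the abstract does not report a confidence interval, and the function name `wilson_interval` is an assumption of this sketch.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the results above: 31 of 516 eligible studies.
external, eligible = 31, 516
proportion = external / eligible
low, high = wilson_interval(external, eligible)

print(f"External validation: {proportion:.1%} of eligible studies")
print(f"Illustrative 95% Wilson interval: {low:.1%} to {high:.1%}")
```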
CONCLUSION
Nearly all of the studies published in the study period that evaluated the performance of AI algorithms for diagnostic analysis of medical images were designed as proof-of-concept technical feasibility studies and did not have the design features that are recommended for robust validation of the real-world clinical performance of AI algorithms.

Keywords

Artificial intelligence; Machine learning; Deep learning; Clinical validation; Clinical trial; Accuracy; Study design; Quality; Appropriateness; Systematic review; Meta-analysis

MeSH Terms

Artificial Intelligence*
Case-Control Studies
Cohort Studies
Data Collection
Feasibility Studies
Machine Learning
Prospective Studies

Figure

  • Fig. 1 Flowchart of article selection based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.


Cited by 3 articles

Radiomics and Deep Learning: Hepatic Applications
Hyo Jung Park, Bumwoo Park, Seung Soo Lee
Korean J Radiol. 2020;21(4):387-401. doi: 10.3348/kjr.2019.0752.

What should medical students know about artificial intelligence in medicine?
Seong Ho Park, Kyung-Hyun Do, Sungwon Kim, Joo Hyun Park, Young-Suk Lim, A Ra Cho
J Educ Eval Health Prof. 2019;16:18. doi: 10.3352/jeehp.2019.16.18.

Applications of Machine Learning in Bone and Mineral Research
Sung Hye Kong, Chan Soo Shin
Endocrinol Metab. 2021;36(5):928-937. doi: 10.3803/EnM.2021.1111.

