1. Fink MA, Bischoff A, Fink CA, Moll M, Kroschke J, Dulz L, et al. Potential of ChatGPT and GPT-4 for data mining of free-text CT reports on lung cancer. Radiology. 2023; 308(3):e231362. https://doi.org/10.1148/radiol.231362.
2. Gu K, Lee JH, Shin J, Hwang JA, Min JH, Jeong WK, et al. Using GPT-4 for LI-RADS feature extraction and categorization with multilingual free-text reports. Liver Int. 2024; 44(7):1578–87. https://doi.org/10.1111/liv.15891.
5. Alsentzer E, Rasmussen MJ, Fontoura R, Cull AL, Beaulieu-Jones B, Gray KJ, et al. Zero-shot interpretable phenotyping of postpartum hemorrhage using large language models. NPJ Digit Med. 2023; 6(1):212. https://doi.org/10.1038/s41746-023-00957-x.
6. Banerjee I, Davis MA, Vey BL, Mazaheri S, Khan F, Zavaletta V, et al. Natural language processing model for identifying critical findings: a multi-institutional study. J Digit Imaging. 2023; 36(1):105–13. https://doi.org/10.1007/s10278-022-00712-w.
7. Woo KC, Simon GW, Akindutire O, Aphinyanaphongs Y, Austrian JS, Kim JG, et al. Evaluation of GPT-4 ability to identify and generate patient instructions for actionable incidental radiology findings. J Am Med Inform Assoc. 2024; 31(9):1983–93. https://doi.org/10.1093/jamia/ocae117.
8. Lau W, Payne TH, Uzuner O, Yetisgen M. Extraction and analysis of clinically important follow-up recommendations in a large radiology dataset. AMIA Jt Summits Transl Sci Proc. 2020; 2020:335–44.
9. Irvin J, Rajpurkar P, Ko M, Yu Y, Ciurea-Ilcus S, Chute C, et al. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. Proc AAAI Conf Artif Intell. 2019; 33(1):590–7. https://doi.org/10.1609/aaai.v33i01.3301590.
10. Adams LC, Truhn D, Busch F, Kader A, Niehues SM, Makowski MR, et al. Leveraging GPT-4 for post hoc transformation of free-text radiology reports into structured reporting: a multilingual feasibility study. Radiology. 2023; 307(4):e230725. https://doi.org/10.1148/radiol.230725.
12. Mukherjee P, Hou B, Lanfredi RB, Summers RM. Feasibility of using the privacy-preserving large language model Vicuna for labeling radiology reports. Radiology. 2023; 309(1):e231147. https://doi.org/10.1148/radiol.231147.
13. Kim S, Kim D, Shin HJ, Lee SH, Kang Y, Jeong S, et al. Large-scale validation of the feasibility of GPT-4 as a proofreading tool for head CT reports. Radiology. 2025; 314(1):e240701. https://doi.org/10.1148/radiol.240701.
15. Schmidt RA, Seah JC, Cao K, Lim L, Lim W, Yeung J. Generative large language models for detection of speech recognition errors in radiology reports. Radiol Artif Intell. 2024; 6(2):e230205. https://doi.org/10.1148/ryai.230205.
16. Savage CH, Park H, Kwak K, Smith AD, Rothenberg SA, Parekh VS, et al. General-purpose large language models versus a domain-specific natural language processing tool for label extraction from chest radiograph reports. AJR Am J Roentgenol. 2024; 222(4):e2330573. https://doi.org/10.2214/AJR.23.30573.
17. Dong Q, Li L, Dai D, Zheng C, Ma J, Li R, et al. A survey on in-context learning [Internet]. Ithaca (NY): arXiv.org; 2024 [cited 2025 Jul 1]. Available from: https://arxiv.org/abs/2301.00234.
18. Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E, et al. Chain-of-thought prompting elicits reasoning in large language models. Adv Neural Inf Process Syst. 2022; 35:24824–37.
19. Liu J, Shen D, Zhang Y, Dolan B, Carin L, Chen W. What makes good in-context examples for GPT-3? [Internet]. Ithaca (NY): arXiv.org; 2021 [cited 2025 Jul 1]. Available from: https://arxiv.org/abs/2101.06804.
20. Rouzrokh P, Khosravi B, Faghani S, Moassefi M, Vera Garcia DV, Singh Y, et al. Mitigating bias in radiology machine learning: 1. Data handling. Radiol Artif Intell. 2022; 4(5):e210290. https://doi.org/10.1148/ryai.210290.
22. Larson PA, Berland LL, Griffith B, Kahn CE Jr, Liebscher LA. Actionable findings and the role of IT support: report of the ACR Actionable Reporting Work Group. J Am Coll Radiol. 2014; 11(6):552–8. https://doi.org/10.1016/j.jacr.2013.12.016.
23. Stureborg R, Alikaniotis D, Suhara Y. Large language models are inconsistent and biased evaluators [Internet]. Ithaca (NY): arXiv.org; 2024 [cited 2025 Jul 1]. Available from: https://arxiv.org/abs/2405.01724.
24. Krishna S, Bhambra N, Bleakney R, Bhayana R. Evaluation of reliability, repeatability, robustness, and confidence of GPT-3.5 and GPT-4 on a radiology board-style examination. Radiology. 2024; 311(2):e232715. https://doi.org/10.1148/radiol.232715.
25. Yan A, McAuley J, Lu X, Du J, Chang EY, Gentili A, et al. RadBERT: adapting transformer-based language models to radiology. Radiol Artif Intell. 2022; 4(4):e210258. https://doi.org/10.1148/ryai.210258.
26. Zaman S, Petri C, Vimalesvaran K, Howard J, Bharath A, Francis D, et al. Automatic diagnosis labeling of cardiovascular MRI by using semisupervised natural language processing of text reports. Radiol Artif Intell. 2021; 4(1):e210085. https://doi.org/10.1148/ryai.210085.
27. Tejani AS, Ng YS, Xi Y, Fielding JR, Browning TG, Rayan JC. Performance of multiple pretrained BERT models to automate and accelerate data annotation for large datasets. Radiol Artif Intell. 2022; 4(4):e220007. https://doi.org/10.1148/ryai.220007.
28. Weng KH, Liu CF, Chen CJ. Deep learning approach for negation and speculation detection for automated important finding flagging and extraction in radiology report: internal validation and technique comparison study. JMIR Med Inform. 2023; 11:e46348. https://doi.org/10.2196/46348.
29. Lopez-Ubeda P, Martin-Noguerol T, Luna A. Automatic classification and prioritisation of actionable BI-RADS categories using natural language processing models. Clin Radiol. 2024; 79(1):e1–e7. https://doi.org/10.1016/j.crad.2023.09.009.
30. Wei J, Wei J, Tay Y, Tran D, Webson A, Lu Y, et al. Larger language models do in-context learning differently [Internet]. Ithaca (NY): arXiv.org; 2023 [cited 2025 Jul 1]. Available from: https://arxiv.org/abs/2303.03846.