J Korean Med Sci. 2024 Dec;39(46):e291. doi: 10.3346/jkms.2024.39.e291.

Using Large Language Models to Extract Core Injury Information From Emergency Department Notes

Affiliations
  • 1. Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, Korea
  • 2. Laboratory of Emergency Medical Services, Seoul National University Hospital Biomedical Research Institute, Seoul, Korea
  • 3. Department of Emergency Medicine, Seoul National University Hospital, Seoul, Korea
  • 4. Department of Emergency Medicine, Seoul National University College of Medicine, Seoul, Korea
  • 5. Office of Hospital Information, Seoul National University Hospital, Seoul, Korea
  • 6. Department of Emergency Medicine, Seoul National University Bundang Hospital, Seongnam, Korea
  • 7. Disaster Medicine Research Center, Seoul National University Medical Research Center, Seoul, Korea

Abstract

Background
Injuries pose a significant global health challenge due to their high incidence and mortality rates. Although injury surveillance is essential for prevention, it is resource-intensive. This study aimed to develop and validate locally deployable large language models (LLMs) to extract core injury-related information from Emergency Department (ED) clinical notes.
Methods
We conducted a diagnostic study using data collected retrospectively from January 2014 to December 2020 at two urban academic tertiary hospitals. One hospital served as the derivation cohort and the other as the external test cohort. Adult patients presenting to the ED with injury-related complaints were included. The primary outcomes were classification accuracies for five information extraction tasks: injury mechanism, place of occurrence, activity, intent, and severity. To extract this information from initial ED physician notes, we fine-tuned a single generalizable Llama-2 model and five distinct Bidirectional Encoder Representations from Transformers (BERT) models, one for each task. The Llama-2 model performed the different tasks by modifying the instruction prompt. Data recorded in injury registries provided the gold standard labels. Model performance was assessed using accuracy and macro-average F1 scores.
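The idea of one generative model serving all five tasks through a changed instruction prompt can be illustrated with a minimal Python sketch. The template wording, the `TASKS` dictionary, the label sets, and the `build_prompt` helper are all hypothetical placeholders for illustration; the abstract does not give the prompts or category lists used in the study.

```python
# Hypothetical instruction templates: the wording and label sets below are
# placeholders, not the prompts or categories used in the study.
TASKS = {
    "mechanism": ["traffic accident", "fall", "struck by object", "others"],
    "place of occurrence": ["home", "road", "workplace", "others"],
    "activity": ["leisure", "paid work", "daily living", "others"],
    "intent": ["unintentional", "self-harm", "assault", "others"],
    "severity": ["severe", "non-severe"],
}


def build_prompt(task: str, note: str) -> str:
    """Build a task-specific instruction prompt for a single ED note."""
    labels = TASKS[task]  # raises KeyError for an unknown task
    options = ", ".join(labels)
    return (
        "### Instruction:\n"
        f"Read the emergency department note and classify the injury {task}. "
        f"Answer with exactly one of: {options}.\n\n"
        f"### Note:\n{note.strip()}\n\n"
        "### Answer:\n"
    )


if __name__ == "__main__":
    note = "Pt slipped on icy stairs at home, left wrist pain and swelling."
    for task in TASKS:
        print(build_prompt(task, note))
```

Only the instruction and the allowed answers change between tasks, while the note and the model stay the same; a BERT-style classifier, by contrast, needs a separate output head (here, a separate model) per task.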
Results
The derivation and external test cohorts comprised 36,346 and 32,232 patients, respectively. In the derivation cohort’s test set, the Llama-2 model achieved accuracies (95% confidence intervals) of 0.899 (0.889–0.909) for injury mechanism, 0.774 (0.760–0.789) for place of occurrence, 0.679 (0.665–0.694) for activity, 0.972 (0.967–0.977) for intent, and 0.935 (0.926–0.943) for severity. The Llama-2 model outperformed the BERT models in accuracy and macro-average F1 scores across all tasks in both cohorts. Imposing constraints on the Llama-2 model to avoid uncertain predictions further improved its accuracy.
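For readers who want to see how an accuracy is paired with a 95% confidence interval, the following Python sketch computes a Wilson score interval for a proportion. This is an assumption for illustration: the abstract does not say which interval method the authors used, and the counts in the example are hypothetical, not the study's test-set sizes.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    p = correct / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical counts: 900 correct predictions out of 1,000 notes.
    low, high = wilson_interval(900, 1000)
    print(f"accuracy = {900 / 1000:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The interval narrows as the number of evaluated notes grows, which is why accuracies reported on several thousand notes carry intervals only about one or two percentage points wide.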
Conclusion
Locally deployable LLMs, trained to extract core injury-related information from free-text ED clinical notes, demonstrated good performance. Generative LLMs can serve as versatile solutions for various injury-related information extraction tasks.

Keywords

Large Language Model; Injuries; Information Extraction; Clinical Note; Emergency Department

Figures

  • Fig. 1 Overall study flowchart. SNUH = Seoul National University Hospital, ED = Emergency Department, SNUBH = Seoul National University Bundang Hospital.

  • Fig. 2 Pipelines for (A) model development and (B) prediction phases. ED = Emergency Department.

  • Fig. 3 Percentages, accuracies, and macro-average F1 scores of Llama-2 predictions made across various probability thresholds in (A) the test set of the derivation cohort and (B) the external test cohort. Cases predicted as “others” are excluded from the analysis at each probability threshold (0.5, 0.7, and 0.9). The macro-average F1 score at each threshold is the mean of the per-class F1 scores, excluding the “others” class. Error bars indicate 95% confidence intervals.
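The threshold analysis described in the Fig. 3 caption can be sketched in Python as follows. This is one reading of the caption, not the authors' code: a prediction is kept only if it is not “others” and its predicted probability meets the threshold, and the macro-average F1 is the unweighted mean of per-class F1 scores with “others” left out. The function names and the toy data are hypothetical.

```python
def macro_f1(y_true, y_pred, exclude=("others",)):
    """Unweighted mean of per-class F1 scores, skipping the excluded classes."""
    labels = sorted((set(y_true) | set(y_pred)) - set(exclude))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def evaluate_at_threshold(y_true, y_pred, probs, threshold):
    """Keep confident, non-'others' predictions and score only those."""
    kept = [
        (t, p)
        for t, p, q in zip(y_true, y_pred, probs)
        if p != "others" and q >= threshold
    ]
    if not kept:
        return {"coverage": 0.0, "accuracy": None, "macro_f1": None}
    true_kept, pred_kept = zip(*kept)
    return {
        "coverage": len(kept) / len(y_true),  # share of cases still predicted
        "accuracy": sum(t == p for t, p in kept) / len(kept),
        "macro_f1": macro_f1(true_kept, pred_kept),
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    y_true = ["fall", "fall", "traffic", "traffic", "others", "fall"]
    y_pred = ["fall", "traffic", "traffic", "traffic", "others", "fall"]
    probs = [0.95, 0.55, 0.92, 0.75, 0.80, 0.60]
    for threshold in (0.5, 0.7, 0.9):
        print(threshold, evaluate_at_threshold(y_true, y_pred, probs, threshold))
```

Raising the threshold trades coverage for accuracy: fewer notes receive an automatic label, but the labels that remain are more reliable, and the withheld cases can be routed to manual review.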


Copyright © 2024 by Korean Association of Medical Journal Editors. All rights reserved.     E-mail: koreamed@kamje.or.kr