J Korean Med Sci.  2015 Jan;30(1):7-15. 10.3346/jkms.2015.30.1.7.

A De-identification Method for Bilingual Clinical Texts of Various Note Types

Affiliations
  • 1Department of Biomedical Informatics, Asan Medical Center, Seoul, Korea. rufiji@gmail.com
  • 2Office of Clinical Research Information, Asan Medical Center, Seoul, Korea.
  • 3Department of Clinical Epidemiology and Biostatistics, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea.
  • 4Department of Pulmonary and Critical Care Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea.
  • 5Department of Oncology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea.
  • 6Department of Emergency Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea.
  • 7Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA.

Abstract

De-identifying personal health information is essential if clinical data are to be used for research without requiring written informed consent from each patient. Previous de-identification methods have used natural language processing to remove identifiers from clinical narrative text, but they focused only on text written in English. In this study, we propose a regular expression-based de-identification method for bilingual clinical records written in Korean and English. To develop and validate the regular expression rules, we obtained a development dataset of 6,039 clinical notes of 20 types and a validation dataset of 5,000 notes of 33 types. Fifteen regular expression rules were constructed using the development dataset, and those rules achieved 99.87% precision and 96.25% recall on the validation dataset. Our de-identification method successfully removed the identifiers in diverse types of bilingual clinical narrative text. This method should therefore make it easier for physicians to conduct retrospective research.
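To make the approach concrete, the following is a minimal Python sketch of regular expression-based de-identification of mixed Korean and English text, together with the precision and recall computation used to score it. The four rules, the placeholder tags, and the function names are assumptions made for illustration; they are not the fifteen rules described in the paper, and the identifier formats they cover are only examples.

```python
import re

# Hypothetical rules for illustration only; NOT the fifteen rules from the paper.
# Digit lookarounds are used instead of \b because Hangul counts as a word
# character in Python, so \b fails between a digit and an attached particle.
RULES = [
    ("RRN", re.compile(r"(?<!\d)\d{6}-\d{7}(?!\d)")),                # resident registration number
    ("PHONE", re.compile(r"(?<!\d)01[016789]-\d{3,4}-\d{4}(?!\d)")),  # mobile phone number
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")),
    ("NAME", re.compile(r"(?:환자명|성명)\s*:\s*[가-힣]{2,4}")),       # labeled Korean name
]


def deidentify(text: str) -> str:
    """Replace every match of every rule with a bracketed placeholder tag."""
    for tag, pattern in RULES:
        text = pattern.sub(f"[{tag}]", text)
    return text


def precision_recall(predicted: set, gold: set) -> tuple:
    """Compute precision and recall of predicted identifiers against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    note = "성명: 홍길동, 연락처 010-1234-5678로 연락. RRN 800101-1234567. Email: kim@example.com"
    print(deidentify(note))
    print(precision_recall({"a", "b", "c"}, {"a", "b", "d", "e"}))
```

In this sketch, precision is the fraction of flagged items that are true identifiers and recall is the fraction of true identifiers that were flagged, which is how the reported 99.87% and 96.25% figures should be read. A rule set tuned this way tends to favor precision, since each pattern only fires on formats it was written for.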

Keywords

De-identification; Anonymization; Clinical Text; Bilingual Text; Patient Privacy; Medical Informatics; Text Mining

MeSH Terms

Algorithms
*Data Anonymization
*Electronic Health Records
*Health Records, Personal
Humans
Multilingualism
Natural Language Processing
Research Design

Figures

  • Fig. 1 The process of developing the de-identification method.

  • Fig. 2 Development and validation dataset.

  • Fig. 3 Fifteen regular expression rules for de-identification.


Cited by 8 articles

Issues and Solutions of Healthcare Data De-identification: the Case of South Korea
Soo-Yong Shin
J Korean Med Sci. 2018;33(5):.    doi: 10.3346/jkms.2018.33.e41.

Prescription Refill Gap of Endocrine Treatment from Electronic Medical Records as a Prognostic Factor in Breast Cancer Patients
Yura Lee, Yu Rang Park, Ji Sung Lee, Sae Byul Lee, Il Yong Chung, Byung Ho Son, Sei Hyun Ahn, Jong Won Lee
J Breast Cancer. 2019;22(1):86-95.    doi: 10.4048/jbc.2019.22.e14.

Status and Direction of Healthcare Data in Korea for Artificial Intelligence
Yu Rang Park, Soo-Yong Shin
Hanyang Med Rev. 2017;37(2):86-92.    doi: 10.7599/hmr.2017.37.2.86.

Data Pseudonymization in a Range That Does Not Affect Data Quality: Correlation with the Degree of Participation of Clinicians
Soo-Yong Shin, Hun-Sung Kim
J Korean Med Sci. 2021;36(44):e299.    doi: 10.3346/jkms.2021.36.e299.

Extracting Structured Genotype Information from Free-Text HLA Reports Using a Rule-Based Approach
Kye Hwa Lee, Hyo Jung Kim, Yi-Jun Kim, Ju Han Kim, Eun Young Song
J Korean Med Sci. 2020;35(12):.    doi: 10.3346/jkms.2020.35.e78.

Protected Health Information Recognition by Fine-Tuning a Pre-training Transformer Model
Seo Hyun Oh, Min Kang, Youngho Lee
Healthc Inform Res. 2022;28(1):16-24.    doi: 10.4258/hir.2022.28.1.16.

Diagnostic Accuracy of Computed Tomography in Predicting Primary Aldosteronism Subtype According to Age
Seung Hun Lee, Jong Woo Kim, Hyun-Ki Yoon, Jung-Min Koh, Chan Soo Shin, Sang Wan Kim, Jung Hee Kim
Endocrinol Metab. 2021;36(2):401-412.    doi: 10.3803/EnM.2020.901.

Contralateral Suppression at Adrenal Venous Sampling Is Associated with Renal Impairment Following Adrenalectomy for Unilateral Primary Aldosteronism
Ye Seul Yang, Seung Hun Lee, Jung Hee Kim, Jee Hee Yoo, Jung Hyun Lee, Seo Young Lee, A Ram Hong, Dong-Hwa Lee, Jung-Min Koh, Jae Hyeon Kim, Sang Wan Kim
Endocrinol Metab. 2021;36(4):875-884.    doi: 10.3803/EnM.2021.1047.

