J Korean Med Sci. 2021 Nov;36(44):e299. doi: 10.3346/jkms.2021.36.e299.

Data Pseudonymization in a Range That Does Not Affect Data Quality: Correlation with the Degree of Participation of Clinicians

Affiliations
  • 1Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology (SAIHST), Sungkyunkwan University, Seoul, Korea
  • 2Center for Research Resource Standardization, Samsung Medical Center, Seoul, Korea
  • 3Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea
  • 4Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea

Abstract

Personal medical information is an essential resource for research; however, laws regulate its use, and it typically has to be pseudonymized or anonymized first. When data are anonymized, the quantity and quality of extractable information decrease significantly. From the perspective of a clinical researcher, this paper proposes a method of pseudonymizing data without degrading data quality or losing data. Because the appropriate level of pseudonymization varies with the research purpose, the pseudonymization method should be chosen carefully, and the active participation of clinicians is crucial for transforming the data to suit that purpose. Simply transforming the data through secondary data processing can itself contribute to data security. Case studies demonstrated that, compared with the initial baseline data, there was a clinically significant difference in the number of datapoints added with the participation of a clinician (from 267,979 to 280,127 points, P < 0.001). Thus, depending on the degree of clinician participation, data anonymization may not affect data quality and quantity, and both proper data quality management and data security are emphasized. Although the pseudonymization level and the clinical use of data are in a trade-off relationship, it is possible to create pseudonymized data while maintaining the data quality required for a given research purpose. Therefore, rather than relying solely on security guidelines, the active participation of clinicians is important.

Keywords

Cardiovascular Diseases; Data Anonymization; Data Quality; De-identification; Electronic Health Records

Figures

  • Fig. 1 Example of changing the height of the patient. Because a BMI of 25.0 kg/m² is the criterion for obesity, researchers should consider that this simple conversion of values may lead to a completely different treatment method. BMI = body mass index.

  • Fig. 2 Real clinical example of transformation of data according to research purpose. There was a clinically significant difference in the number of datapoints added from the number of initial baseline data (from 267,979 to 280,127 points, P < 0.001). HDL-C = high-density lipoprotein cholesterol, LDL-C = low-density lipoprotein cholesterol, TC = total cholesterol, TG = triglyceride.

  • Fig. 3 Example of emphasizing anonymization by consolidating diagnosis names. Anonymization is emphasized, but the quality of the data is not affected. All data have been completely changed from, or added differently to, the original data. HF = heart failure, MACE = major adverse cardiac events, MI = myocardial infarction.
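
The concern raised in the Fig. 1 caption can be illustrated with a minimal sketch. It is not the authors' procedure: the patient values, the 2 cm height shift, and the function names are hypothetical and chosen only for illustration. The only figure taken from the caption is the obesity criterion of 25.0 kg/m2. The sketch computes BMI before and after a small change in recorded height and checks whether the obesity classification flips.

```python
# Obesity criterion quoted in the Fig. 1 caption (kg/m^2).
OBESITY_BMI_THRESHOLD = 25.0


def bmi(weight_kg: float, height_cm: float) -> float:
    """Return body mass index in kg/m^2."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def is_obese(weight_kg: float, height_cm: float) -> bool:
    """Classify as obese when BMI meets or exceeds the threshold."""
    return bmi(weight_kg, height_cm) >= OBESITY_BMI_THRESHOLD


def shift_flips_category(weight_kg: float, height_cm: float, shift_cm: float) -> bool:
    """Check whether altering the recorded height changes the obesity label.

    shift_cm is a hypothetical perturbation applied during pseudonymization.
    """
    before = is_obese(weight_kg, height_cm)
    after = is_obese(weight_kg, height_cm + shift_cm)
    return before != after


if __name__ == "__main__":
    # Hypothetical patient: 72 kg, 170 cm, height shifted by -2 cm.
    print(round(bmi(72, 170), 2), is_obese(72, 170))
    print(round(bmi(72, 168), 2), is_obese(72, 168))
    print(shift_flips_category(72, 170, -2))
```

A check of this kind is one way a clinician could flag records near a diagnostic threshold before a value is transformed, which is where clinical input on the pseudonymization method matters most.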


Cited by 2 articles

A Study on Methodologies of Drug Repositioning Using Biomedical Big Data: A Focus on Diabetes Mellitus
Suehyun Lee, Seongwoo Jeon, Hun-Sung Kim
Endocrinol Metab. 2022;37(2):195-207. doi: 10.3803/EnM.2022.1404.

Long-Term Changes in HbA1c According to Blood Glucose Control Status During the First 3 Months After Visiting a Tertiary University Hospital
Hyunah Kim, Da Young Jung, Seung-Hwan Lee, Jae-Hyoung Cho, Hyeon Woo Yim, Hun-Sung Kim
J Korean Med Sci. 2022;37(38):e281. doi: 10.3346/jkms.2022.37.e281.

