J Korean Med Sci. 2024 Apr;39(14):e127. doi: 10.3346/jkms.2024.39.e127.

Evaluating Linkage Quality of Population-Based Administrative Data for Health Service Research

Affiliations
  • 1 Big Data Linkage Division, Health Insurance Review & Assessment Service, Wonju, Korea
  • 2 Digital Medical Technology Listing Division, Health Insurance Review & Assessment Service, Wonju, Korea
  • 3 DRG Administration Division, Health Insurance Review & Assessment Service, Wonju, Korea
  • 4 Center for Research on Big Data Information, Korea Institute for Health and Social Affairs, Sejong, Korea
  • 5 Division of Data Science, Yonsei University, Wonju, Korea

Abstract

Background
To overcome the limitations of relying on data from a single institution, many researchers have studied data linkage methodologies. Data linkage is subject to errors arising from legal constraints on personal information and from technical issues in data processing. Linkage errors can introduce selection bias and affect both external and internal validity. Therefore, verifying the quality of each linkage method while adhering to personal information protection is an important issue. This study evaluated the quality of linked data and analyzed the potential bias resulting from linkage errors.
Methods
This study analyzed claims data submitted to the Health Insurance Review and Assessment Service (HIRA DATA). The linkage errors of two deterministic linkage methods were evaluated according to the match key used. The first method links on a unique identification number; the second links on name, gender, and date of birth as a set of partial identifiers. The linkage error of each method was assessed by comparing baseline characteristics using Cohen’s absolute standardized difference (ASD), and linkage quality was evaluated with the following indicators: linked rate, false match rate, missed match rate, positive predictive value, sensitivity, specificity, and F1-score.
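The comparison described above can be illustrated with a minimal sketch. It assumes that links made on the unique identification number serve as the reference standard against which links on the partial identifiers are scored. The toy records, the field names (`uid`, `name`, `gender`, `dob`), the function names, and the exact indicator definitions are assumptions chosen for illustration; they follow common conventions and may differ from those used in the study.

```python
from collections import defaultdict

# Toy records; every name, ID, and date here is invented for illustration.
FILE_A = [
    {"uid": 1, "name": "Kim Minji", "gender": "F", "dob": "1990-01-01"},
    {"uid": 2, "name": "Lee Jun", "gender": "M", "dob": "1985-05-05"},
    {"uid": 3, "name": "Park Sora", "gender": "F", "dob": "1970-03-03"},
    {"uid": 4, "name": "Choi Min", "gender": "M", "dob": "1960-07-07"},
]
FILE_B = [
    {"uid": 1, "name": "Kim Minji", "gender": "F", "dob": "1990-01-01"},
    {"uid": 2, "name": "Lee Joon", "gender": "M", "dob": "1985-05-05"},   # spelling variant
    {"uid": 5, "name": "Park Sora", "gender": "F", "dob": "1970-03-03"},  # different person
    {"uid": 6, "name": "Han Yuna", "gender": "F", "dob": "2000-02-02"},
]


def deterministic_link(file_a, file_b, key_fields):
    """Return the set of (row_a, row_b) pairs that agree exactly on key_fields."""
    index = defaultdict(list)
    for j, rec in enumerate(file_b):
        index[tuple(rec[f] for f in key_fields)].append(j)
    return {
        (i, j)
        for i, rec in enumerate(file_a)
        for j in index.get(tuple(rec[f] for f in key_fields), [])
    }


def ratio(numerator, denominator):
    """Divide safely; an undefined ratio is reported as NaN."""
    return numerator / denominator if denominator else float("nan")


def linkage_quality(links, truth, n_a, n_b):
    """Score candidate links against reference links over all n_a * n_b pairs."""
    tp = len(links & truth)   # true matches that were linked
    fp = len(links - truth)   # false matches
    fn = len(truth - links)   # missed matches
    tn = n_a * n_b - tp - fp - fn
    ppv = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "linked_rate": ratio(len({i for i, _ in links}), n_a),
        "false_match_rate": ratio(fp, tp + fp),
        "missed_match_rate": ratio(fn, tp + fn),
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * ppv * sensitivity, ppv + sensitivity),
    }


# Reference standard: exact agreement on the unique identifier (assumed error-free).
truth = deterministic_link(FILE_A, FILE_B, ["uid"])
# Method under evaluation: exact agreement on the set of partial identifiers.
links = deterministic_link(FILE_A, FILE_B, ["name", "gender", "dob"])
quality = linkage_quality(links, truth, len(FILE_A), len(FILE_B))
```

In this toy example the spelling variant produces a missed match and the shared name, gender, and date of birth produce a false match, which are the two error types the indicators are meant to separate.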
Results
For the deterministic linkage method that used name, gender, and date of birth as a set of partial identifiers, the true match rate was 83.5% and the missed match rate was 16.5%. Although some characteristics of the data showed bias, most ASD values were less than 0.1 and none exceeded 0.5. It is therefore difficult to conclude that linked data constructed with deterministic linkage differ substantially in their characteristics.
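As a hedged illustration of how such ASD values can be obtained, the sketch below uses the standardized difference formulas commonly applied in balance diagnostics: the absolute difference in means or proportions divided by the pooled standard deviation of the two groups. The function names and the example values are assumptions for illustration only and are not data from the study.

```python
import math
from statistics import mean, variance


def asd_continuous(group_1, group_2):
    """Absolute standardized difference for a continuous characteristic."""
    pooled_sd = math.sqrt((variance(group_1) + variance(group_2)) / 2)
    return abs(mean(group_1) - mean(group_2)) / pooled_sd


def asd_binary(p_1, p_2):
    """Absolute standardized difference for a proportion (binary characteristic)."""
    pooled_sd = math.sqrt((p_1 * (1 - p_1) + p_2 * (1 - p_2)) / 2)
    return abs(p_1 - p_2) / pooled_sd


# Hypothetical ages of linked and unlinked records (invented values).
linked_ages = [34, 45, 52, 61, 70]
unlinked_ages = [30, 41, 50, 58, 66]
age_asd = asd_continuous(linked_ages, unlinked_ages)

# Hypothetical proportion of women among linked and unlinked records.
sex_asd = asd_binary(0.52, 0.50)

# Flag characteristics against the 0.1 threshold mentioned in the text.
flagged = {name: value >= 0.1 for name, value in [("age", age_asd), ("sex", sex_asd)]}
```

Because the measure is scaled by the pooled standard deviation rather than by sample size, it stays interpretable in very large administrative datasets, where significance tests would flag even trivial differences.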
Conclusion
As the first data linkage quality verification study using big data from the HIRA, this study confirms the feasibility of building health and medical data at the national level. Analyzing linkage quality is crucial for understanding linkage errors and generating reliable analytical outcomes. Linkers should increase the reliability of linked data by providing researchers with information on linkage errors. The results of this study will serve as reference data to increase the reliability of multicenter data linkage studies.

Keywords

Data Linkage; Deterministic Linkage; Linkage Quality; Linkage Errors; Bias; National Health Claim Data
