J Korean Med Sci. 2025 Mar;40(8):e110. doi: 10.3346/jkms.2025.40.e110.

Statistical Methods for Baseline Adjustment and Cohort Analysis in Korean National Health Insurance Claims Data: A Review of PSM, IPTW, and Survival Analysis With Future Directions

Affiliations
  • Department of Information and Statistics, Department of Bio & Medical Big Data, Research Institute of Natural Science, Gyeongsang National University, Jinju, Korea

Abstract

The utilization of health insurance claims data has expanded significantly, enabling researchers to conduct epidemiological studies on a large scale. This review examines key statistical methods for addressing baseline differences and conducting cohort analyses using Korean National Health Insurance claims data. Propensity score matching and inverse probability of treatment weighting are widely used to mitigate selection bias and enhance causal inference in observational studies. These methods help improve study validity by balancing covariates between treatment and control groups. Additionally, survival analysis techniques, such as the Cox proportional hazards model, are essential for assessing time-to-event outcomes and estimating hazard ratios while accounting for censoring. However, the application of these statistical methods is accompanied by challenges, including unmeasured confounding, instability in weight estimation, and violations of model assumptions. To address these limitations, emerging approaches, such as doubly robust estimation, machine learning-based causal inference, and the marginal structural model, have gained prominence. These techniques offer greater flexibility and robustness in real-world data analysis. Future research should focus on refining methodologies for integrating high-dimensional health datasets and leveraging artificial intelligence to enhance predictive modeling and causal inference. Furthermore, the expansion of international collaborations and the adoption of standardized data models will facilitate large-scale multi-center studies. Ethical considerations, including data privacy and algorithmic transparency, should also be prioritized to ensure responsible data use. Maximizing the utility of health insurance claims data requires interdisciplinary collaboration, methodological advancements, and the implementation of rigorous statistical techniques to support evidence-based healthcare policy and improve public health outcomes.
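As an illustration of how inverse probability of treatment weighting balances a covariate between treatment and control groups, the following is a minimal sketch, not the procedure used in the review. It assumes a simulated cohort with a single confounder `x` (not claims data), and a propensity score fitted by a simple gradient-ascent logistic regression. All function and variable names (`fit_propensity`, `smd`, `weights`) are hypothetical. Balance is summarized with the standardized mean difference (SMD), a diagnostic assumed here for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the simulated cohort is reproducible


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_propensity(x, t, lr=0.5, n_iter=300):
    """Fit P(T=1 | x) = sigmoid(b0 + b1*x) by batch gradient ascent."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, ti in zip(x, t):
            resid = ti - sigmoid(b0 + b1 * xi)
            g0 += resid
            g1 += resid * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def smd(x, t, w):
    """Weighted standardized mean difference of x, treated minus control."""
    stats = {}
    for group in (0, 1):
        xs = [xi for xi, ti in zip(x, t) if ti == group]
        ws = [wi for wi, ti in zip(w, t) if ti == group]
        m = weighted_mean(xs, ws)
        v = weighted_mean([(xi - m) ** 2 for xi in xs], ws)
        stats[group] = (m, v)
    (m0, v0), (m1, v1) = stats[0], stats[1]
    return (m1 - m0) / math.sqrt((v0 + v1) / 2.0)


# Simulated cohort (an assumption for illustration): treatment is more
# likely when the confounder x is large, so the groups differ at baseline.
n = 2000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
t = [1 if random.random() < sigmoid(0.8 * xi) else 0 for xi in x]

# Estimated propensity scores and unstabilized IPTW weights:
# 1/ps for treated subjects, 1/(1 - ps) for controls.
b0, b1 = fit_propensity(x, t)
ps = [sigmoid(b0 + b1 * xi) for xi in x]
weights = [1.0 / p if ti == 1 else 1.0 / (1.0 - p) for p, ti in zip(ps, t)]

smd_before = smd(x, t, [1.0] * n)
smd_after = smd(x, t, weights)
print(f"SMD before weighting: {smd_before:.3f}")
print(f"SMD after IPTW:       {smd_after:.3f}")
```

In this sketch the weighted SMD should move toward zero relative to the unweighted one. Extreme propensity scores would produce very large weights, which is one form of the weight instability noted above; stabilized or truncated weights are common remedies but are not shown here.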

Keywords

Korean National Health Insurance Claims Data; Selection Bias; Propensity Score; Cox Proportional Hazard Model; Inverse Probability of Treatment Weighting

Copyright © 2025 by Korean Association of Medical Journal Editors. All rights reserved.     E-mail: koreamed@kamje.or.kr