Health Policy Manag. 2020 Mar;30(1):15-25. doi: 10.4332/KJHPA.2020.30.1.15.

A Study on the Application of Natural Language Processing in Health Care Big Data: Focusing on Word Embedding Methods

  • 1Review and Assessment Research Department, Health Insurance Review & Assessment Service, Wonju, Korea
  • 2Department of Data Science, Kookmin University, Seoul, Korea


Health care data sets contain extensive information about patients, but researchers are limited in analyzing them by intrinsic characteristics of the data such as heterogeneity, longitudinal irregularity, and noise. In particular, because most medical history information is recorded as text codes, its use has been limited by the high dimensionality of the resulting explanatory variables. To address this problem, recent studies have applied word embedding techniques, originally developed for natural language processing, and reported positive results in terms of dimensionality reduction and prediction model accuracy. This paper reviews deep learning-based natural language processing techniques (word embedding) and summarizes research cases that have used those techniques in the health care field. Finally, we propose a research framework for applying deep learning-based natural language processing to the analysis of domestic health insurance data.
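The dimensionality problem described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the diagnosis codes, the toy visit records, the embedding size, and the helper names `multi_hot` and `dense` are hypothetical, and the code vectors are random stand-ins rather than trained embeddings. A method such as word2vec would instead learn the vectors from how codes co-occur in patient histories.

```python
import random

# Hypothetical visit records: each visit lists diagnosis codes (ICD-10 style).
visits = [
    ["E11", "I10", "E78"],
    ["I10", "E78"],
    ["J45", "J30"],
    ["E11", "I10"],
]

# Vocabulary of distinct codes; a real claims data set may have many thousands.
vocab = sorted({code for visit in visits for code in visit})
index = {code: i for i, code in enumerate(vocab)}


def multi_hot(visit):
    """Sparse representation: one dimension per distinct code in the vocabulary."""
    vec = [0] * len(vocab)
    for code in visit:
        vec[index[code]] = 1
    return vec


EMBED_DIM = 2  # assumed embedding size, chosen only to keep the example small
rng = random.Random(0)
# Random stand-in vectors; a trained embedding would replace this lookup table.
embedding = {code: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for code in vocab}


def dense(visit):
    """Dense representation: average of code vectors, always EMBED_DIM long."""
    return [sum(embedding[c][d] for c in visit) / len(visit) for d in range(EMBED_DIM)]


sparse_vec = multi_hot(visits[0])
dense_vec = dense(visits[0])
print(len(sparse_vec), len(dense_vec))
```

The sparse vector grows with every new code added to the vocabulary, while the dense vector keeps a fixed length, which is the dimensionality reduction the abstract refers to.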


Keywords: Health care big data; High dimensionality; Deep learning; Natural language processing; Word embedding; Word2vec