Ewha Med J.  2022 Apr;45(2):46-54. 10.12771/emj.2022.45.2.46.

Topic and Trends of Public Perception and Sentiments of COVID-19 Pandemic in South Korea: A Text Mining Approach

Affiliations
  • 1Department of Environmental Medicine, Ewha Womans University College of Medicine, Seoul, Korea

Abstract


Objectives
Public health risks and anxiety have been increasing since the outbreak of coronavirus disease 2019 (COVID-19), and the public raises questions about COVID-19 on web-based platforms. The aim of this study was to analyze public perception and sentiment regarding the COVID-19 pandemic in South Korea.
Methods
We collected text data (252,181 questions) related to COVID-19 that were posted on Naver Knowledge-iN from January 1, 2020 to December 31, 2020. The search keywords were the Korean words for “SARS-CoV-2”, “COVID19”, “COVID-19”, “Wuhan pneumonia”, “Coronavirus”, and “Corona”. Topic modeling was used to investigate trends in public perception, and sentiment analysis was conducted to analyze public emotions in the questions related to COVID-19. We performed Pearson’s correlation analysis between the daily number of COVID-19 cases and the daily proportion of negative sentiment in documents related to COVID-19, separately for each COVID-19 outbreak period.
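The per-period correlation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the stage boundaries are taken from the Fig. 5 caption, while the function names (`pearson_r`, `correlation_by_stage`), the record layout, and the toy numbers are assumptions made up for the example.

```python
from datetime import date
from math import sqrt

# Stage boundaries as listed in the Fig. 5 caption (all dates in 2020).
STAGES = {
    "Stage 1": (date(2020, 1, 20), date(2020, 2, 17)),
    "Stage 2": (date(2020, 2, 18), date(2020, 5, 5)),
    "Stage 3": (date(2020, 5, 6), date(2020, 8, 11)),
    "Stage 4": (date(2020, 8, 12), date(2020, 11, 12)),
    "Stage 5": (date(2020, 11, 13), date(2020, 12, 31)),
}


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlation_by_stage(daily):
    """Compute Pearson's r per stage.

    daily: list of (date, case_count, negative_proportion) tuples
    (a hypothetical record layout chosen for this sketch).
    Stages with fewer than two records are skipped.
    """
    result = {}
    for name, (start, end) in STAGES.items():
        rows = [(c, p) for d, c, p in daily if start <= d <= end]
        if len(rows) >= 2:
            cases, props = zip(*rows)
            result[name] = pearson_r(cases, props)
    return result


# Made-up toy records for illustration only; NOT data from the study.
toy = [
    (date(2020, 8, 12), 50, 0.30),
    (date(2020, 8, 13), 100, 0.35),
    (date(2020, 8, 14), 160, 0.41),
    (date(2020, 8, 15), 270, 0.52),
]
print(correlation_by_stage(toy))
```

With real inputs, `daily` would hold one record per calendar day, pairing the confirmed case count with the share of that day's documents classified as negative.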
Results
A total of 241,776 documents were used in this study. The most frequent words in the documents included cough, symptoms, tests, confirmed patients, and mask. Twenty topics (COVID-test, Economy, School, Hospital/Diagnose, Travel/Overseas, Health, Social issue, Symptom 1 (respiratory), Relationships, Symptom 2 (e.g., fever), Workplace, Mask/Social distancing, Infection/Vaccine, Stimulus Package, Family, Delivery Service, Unclassified, Region, Study/Exam, Worry/Anxiety) were extracted using topic modeling. There was a positive association between the daily count of COVID-19 patients and the proportion of negative sentiment. Among the COVID-19 outbreak periods, Stage 4 had the highest correlation.
Conclusion
This study identified the South Korean public’s interest and emotions about COVID-19 during the prolonged pandemic crisis.

Keywords

COVID-19; Data mining; Sentiment analysis; Korea

Figures

  • Fig. 1. Perplexity of topic models related to COVID-19. The x-axis indicates the number of topics. The y-axis indicates the perplexity of the latent Dirichlet allocation (LDA) models.

  • Fig. 2. Top 20 most frequent words in COVID-19-related documents.

  • Fig. 3. Monthly trends of the top 20 most frequent words from January 1, 2020 to December 31, 2020. The x-axis indicates months. The y-axis indicates the frequency of words.

  • Fig. 4. Time plot of the 20 topics in COVID-19-related documents from January 1, 2020 to December 31, 2020. The x-axis indicates weeks. The y-axis indicates the probability that each topic appears.

  • Fig. 5. Correlation between daily COVID-19 cases and the proportion of negative sentiment in documents from January 1, 2020 to December 31, 2020. The x-axis indicates the date. The y-axis indicates the daily number of confirmed COVID-19 cases (counts) in Korea. The auxiliary axis indicates the proportion of negative sentiment in documents. The blue bars indicate the daily number of confirmed COVID-19 cases. The orange line indicates the proportion of negative sentiment. The five stages of the COVID-19 outbreak period are defined in Table 2 (Stage 1: 2020.1.20–2.17, Stage 2: 2.18–5.5, Stage 3: 5.6–8.11, Stage 4: 8.12–11.12, Stage 5: 11.13–12.31).
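
As background for the quantity plotted in Fig. 1: the perplexity of a topic model on a set of M documents is conventionally defined, following the original LDA formulation, as shown here. The abstract does not state the exact formula or the evaluation split the authors used, so this is the standard definition offered as an assumption rather than a description of their computation. Here $\mathbf{w}_d$ denotes the words of document $d$ and $N_d$ its length in tokens.

```latex
\mathrm{perplexity}(D) \;=\; \exp\!\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)
```

Lower perplexity indicates that the model assigns higher likelihood to the documents, which is why the number of topics is often chosen where the curve flattens or reaches its minimum.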
