Healthc Inform Res. 2024 Oct;30(4):398-408. doi: 10.4258/hir.2024.30.4.398.

Cancer-related Keywords in 2023: Insights from Text Mining of a Major Consumer Portal

Affiliations
  • Cancer Knowledge & Information Center, National Cancer Control Institute, National Cancer Center, Goyang, Korea

Abstract


Objectives
As monitoring cancer patients’ internet use becomes increasingly important, there is a growing need for text-mining technologies that expand access to relevant information. This study analyzed online news articles published on a major Korean portal site in 2023 to identify trends in the information available to cancer patients and to derive meaningful insights.
Methods
This study analyzed 19,578 news articles published on Naver, a major Korean portal site, from January 1, 2023, to December 31, 2023. Natural language processing, text mining, network analysis, and word cloud analysis were employed. The search term “am” (Korean for “cancer”) was used to identify keywords related to cancer.
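The two keyword rankings reported in the Results (raw frequency and term frequency-inverse document frequency, TF-IDF) can be illustrated with a short Python sketch. It is not the authors' pipeline: it assumes each article has already been reduced to a list of keyword tokens (the abstract does not describe the Korean tokenization step), and it assumes one common TF-IDF variant, raw term counts times the natural-log IDF, summed over all articles to give a corpus-level score. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter

# Assumption: each article is already a list of keyword tokens (e.g., nouns
# from a Korean morphological analyzer); this preprocessing is not shown.
Doc = list[str]


def keyword_frequencies(docs: list[Doc]) -> Counter:
    """Total occurrences of each keyword across all articles."""
    counts: Counter = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def corpus_tfidf(docs: list[Doc]) -> dict[str, float]:
    """Corpus-level TF-IDF: sum over articles of tf(t, d) * ln(N / df(t)).

    Assumption: raw counts for tf, natural-log idf, no smoothing or
    normalization. The study's exact weighting is not stated in the abstract.
    """
    n_docs = len(docs)
    # df(t): number of articles that contain keyword t at least once
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(tokens))
    scores: dict[str, float] = {}
    for tokens in docs:
        for term, tf in Counter(tokens).items():
            idf = math.log(n_docs / df[term])
            scores[term] = scores.get(term, 0.0) + tf * idf
    return scores


def top_k(scores: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Highest-scoring keywords; ties broken alphabetically for stable output."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy corpus (hypothetical, not the study's data).
articles = [
    ["cure", "lung cancer"],
    ["cure", "breast cancer", "cure"],
    ["cure", "struggle", "struggle"],
]
print(keyword_frequencies(articles).most_common(2))  # [('cure', 4), ('struggle', 2)]
print([t for t, _ in top_k(corpus_tfidf(articles), 2)])  # ['struggle', 'breast cancer']
```

In this toy corpus “cure” appears in every article, so its IDF is zero and it falls to the bottom of the TF-IDF ranking even though it is the most frequent term. This only shows how the two rankings can diverge; it does not explain why the study's rankings differ.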
Results
In 2023, an average of 1,631 cancer-related articles were published monthly, with a peak of 1,946 in September and a low of 1,371 in February. A total of 132,456 keywords were extracted; the most frequent were “cure” (2,218 occurrences), “lung cancer” (1,652), and “breast cancer” (1,235). Term frequency-inverse document frequency (TF-IDF) analysis ranked “struggle” (score, 1,064.172) as the most significant keyword, followed by “lung cancer” (839.988) and “breast cancer” (744.840). Network analysis revealed four distinct clusters, centered on treatment, celebrity-related issues, major cancer types, and cancer-causing factors.
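The keyword network step can be sketched in the same hedged way. The figures are built from the top 50 keywords, so the sketch keeps that cutoff as a default; everything else is an assumption. Two keywords are linked when they appear in the same article (edge weight = number of such articles), and clusters are found with deterministic weighted label propagation, a simple stand-in for whatever community-detection method produced the four clusters, which the abstract does not name. Function names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_network(docs: list[list[str]], top_n: int = 50) -> Counter:
    """Weighted co-occurrence edges among the top_n most frequent keywords.

    Assumption: keywords co-occur when they appear in the same article, and
    the edge weight is the number of such articles.
    """
    freq = Counter(t for tokens in docs for t in tokens)
    keep = {t for t, _ in freq.most_common(top_n)}
    edges: Counter = Counter()
    for tokens in docs:
        present = sorted(set(tokens) & keep)  # sorted so each pair has one key
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def label_propagation(edges: Counter, max_iter: int = 100) -> list[set[str]]:
    """Deterministic weighted label propagation (a stand-in clustering method,
    not necessarily the one used in the study)."""
    neighbors: dict[str, Counter] = defaultdict(Counter)
    for (a, b), w in edges.items():
        neighbors[a][b] += w
        neighbors[b][a] += w
    labels = {node: node for node in neighbors}  # each node starts alone
    for _ in range(max_iter):
        changed = False
        for node in sorted(neighbors):
            # Each neighbor votes for its current label with its edge weight.
            votes: Counter = Counter()
            for nb, w in neighbors[node].items():
                votes[labels[nb]] += w
            best = max(votes.values())
            new_label = min(lbl for lbl, v in votes.items() if v == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    clusters: dict[str, set[str]] = defaultdict(set)
    for node, lbl in labels.items():
        clusters[lbl].add(node)
    return sorted(clusters.values(), key=sorted)


# Toy corpus (hypothetical, not the study's data).
docs = [
    ["chemotherapy", "surgery", "cure"],
    ["chemotherapy", "cure"],
    ["surgery", "cure"],
    ["smoking", "radon", "lung cancer"],
    ["smoking", "lung cancer"],
]
clusters = label_propagation(cooccurrence_network(docs))
print([sorted(c) for c in clusters])
# [['chemotherapy', 'cure', 'surgery'], ['lung cancer', 'radon', 'smoking']]
```

Label propagation was chosen here only because it fits in a few dependency-free lines; modularity-based methods would be the more usual choice for keyword networks of this size.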
Conclusions
The analysis of cancer-related keywords in 2023 indicates that news articles often prioritize gossip over essential information. These findings provide foundational data for future policy directions and strategies to address misinformation. This study underscores the importance of understanding the nature of cancer-related information consumed by the public and offers insights to guide official policies and healthcare practices.

Keywords

Neoplasms, Data Mining, Newspaper Article, Information Dissemination, Natural Language Processing

Figures

  • Figure 1 Number of cancer-related news articles published each month.

  • Figure 2 Results of keyword extraction by frequency. Panel A shows the original Korean results and panel B their English translation; personal names are anonymized to protect privacy.

  • Figure 3 Results of network analysis using the top 50 keywords by term frequency. Panel A shows the original Korean results and panel B their English translation; personal names are anonymized to protect privacy.

  • Figure 4 Results of network analysis using the top 50 keywords by keyword importance. Panel A shows the original Korean results and panel B their English translation; personal names are anonymized to protect privacy.


Copyright © 2024 by Korean Association of Medical Journal Editors. All rights reserved.     E-mail: koreamed@kamje.or.kr