Healthc Inform Res. 2017 Jul;23(3):141-146. doi: 10.4258/hir.2017.23.3.141.

Text Mining in Biomedical Domain with Emphasis on Document Clustering

  • 1. Head of Institutional Research, Skyline University College, Sharjah, UAE.


With the number of articles published each year in the biomedical domain growing exponentially, automated systems are needed to extract previously unknown information from the published literature. Text mining techniques enable the extraction of such knowledge from unstructured documents.
This paper reviews text mining processes in detail, along with the software tools available to carry them out. It also reviews the roles and applications of text mining in the biomedical domain.
Text mining processes, such as document search and retrieval, document pre-processing, natural language processing, text clustering methods, and text classification methods, are described in detail.
Text mining techniques can help mine the vast body of knowledge on a given topic from published biomedical research articles and support meaningful conclusions that would not otherwise be possible.
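
As a rough illustration of the pre-processing and text clustering steps named above, the following is a minimal sketch rather than the method used in the paper. It assumes a toy stop-word list, smoothed TF-IDF weighting, cosine similarity, and a simple k-means loop with farthest-first seeding. All function names (`tokenize`, `tfidf_vectors`, `kmeans`) and the sample titles are hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "of", "in", "and", "a", "to", "for", "with", "is", "on"}


def tokenize(text):
    """Pre-processing: lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def normalize(vec):
    """Scale a sparse {term: weight} vector to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Return one unit-length {term: tf * idf} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed inverse document frequency (one common variant).
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        vectors.append(normalize(vec))
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def kmeans(vectors, k, max_iter=20):
    """Cluster unit vectors by cosine similarity; returns one label per vector."""
    # Farthest-first seeding: start at document 0, then repeatedly pick the
    # document least similar to the seeds chosen so far (deterministic).
    seeds = [0]
    while len(seeds) < k:
        candidates = (i for i in range(len(vectors)) if i not in seeds)
        seeds.append(min(candidates,
                         key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    centroids = [vectors[s] for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: centroid becomes the normalized sum of its members.
        for c in range(k):
            total = Counter()
            for v, lab in zip(vectors, labels):
                if lab == c:
                    total.update(v)
            if total:
                centroids[c] = normalize(total)
    return labels


# Hypothetical article titles, used only to exercise the sketch.
docs = [
    "Protein interactions in yeast cells",
    "Extracting protein protein interactions from literature",
    "Mammography reports for breast cancer screening",
    "Breast cancer detection in mammography images",
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)
```

In this toy run the two protein-interaction titles share one cluster label and the two mammography titles share the other, because the two groups have no terms in common. A real biomedical pipeline would likely add stemming, a fuller stop-word list, and domain vocabularies before clustering.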


Keywords: Text Mining; Cluster Analysis; Classification; Natural Language Processing; Software

MeSH Terms

Cluster Analysis*
Data Mining*
Natural Language Processing

Cited by 1 article

Text Mining of Biomedical Articles Using the Konstanz Information Miner (KNIME) Platform: Hemolytic Uremic Syndrome as a Case Study
Ricardo A. Dorr, Juan J. Casal, Roxana Toriano
Healthc Inform Res. 2022;28(3):276-283. doi: 10.4258/hir.2022.28.3.276.

