Healthc Inform Res. 2017 Jul;23(3):141-146. doi: 10.4258/hir.2017.23.3.141.

Text Mining in Biomedical Domain with Emphasis on Document Clustering

Affiliations
  • 1Head of Institutional Research, Skyline University College, Sharjah, UAE. vinairesearch@yahoo.com

Abstract


OBJECTIVES
With the number of articles published each year in the biomedical domain growing exponentially, automated systems are needed to extract previously unknown information from the published literature. Text mining techniques enable the extraction of such knowledge from unstructured documents.
METHODS
This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain.
RESULTS
Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification, are described in detail.
CONCLUSIONS
Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and support meaningful conclusions that would not otherwise be possible.
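
The processes named in the abstract (pre-processing, term weighting, and text clustering) can be illustrated with a minimal sketch. The abstract does not specify any implementation, so everything below is an assumption made for illustration: the stop-word list, the smoothed TF-IDF formula, the farthest-first seeding, the cosine-based k-means loop, and the four sample titles are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOPWORDS = {"the", "of", "in", "and", "a", "to", "is", "for", "with", "on"}

def tokenize(text):
    """Pre-processing: lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def normalise(vec):
    """Scale a {term: weight} mapping to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def tfidf_vectors(docs):
    """Term weighting: one unit-length TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        vectors.append(normalise(vec))
    return vectors

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def cluster(docs, k, max_iter=20):
    """Clustering: k-means on cosine similarity; returns one label per document."""
    vectors = tfidf_vectors(docs)
    # Farthest-first seeding: start with the first document, then repeatedly
    # add the document least similar to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assign each document to its most similar centroid.
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each centroid as the normalised sum of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[j] = normalise(total)
    return labels

# Hypothetical sample titles, invented for this sketch.
docs = [
    "Gene methylation and cancer gene expression",
    "Cancer gene expression profiling of tumors",
    "Drug interactions and drug metabolism",
    "Predicting drug interactions from metabolism data",
]
labels = cluster(docs, k=2)
```

With these sample titles, the two gene-related documents share no vocabulary with the two drug-related ones, so they fall into separate clusters. A real corpus would need stemming, a fuller stop-word list, and a principled choice of k.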

Keywords

Text Mining; Cluster Analysis; Classification; Natural Language Processing; Software

MeSH Terms

Classification
Cluster Analysis*
Data Mining*
Mining
Natural Language Processing

Cited by 1 article

Text Mining of Biomedical Articles Using the Konstanz Information Miner (KNIME) Platform: Hemolytic Uremic Syndrome as a Case Study
Ricardo A. Dorr, Juan J. Casal, Roxana Toriano
Healthc Inform Res. 2022;28(3):276-283. doi: 10.4258/hir.2022.28.3.276.

