Genomics Inform.  2018 Dec;16(4):e40. 10.5808/GI.2018.16.4.e40.

Opinion: Strategy of Semi-Automatically Annotating a Full-Text Corpus of Genomics & Informatics

Affiliations
  • 1Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University, Seoul 03760, Korea. neo@ewha.ac.kr
  • 2Center for Convergence Research of Advanced Technologies, Ewha Womans University, Seoul 03760, Korea.

Abstract

There is a communal need for an annotated corpus consisting of the full texts of biomedical journal articles. In response to community needs, a prototype version of the full-text corpus of Genomics & Informatics, called GNI version 1.0, has recently been published, with 499 annotated full-text articles available as a corpus resource. However, GNI needs to be updated, as the texts were shallow-parsed and annotated with several existing parsers. I list issues associated with upgrading annotations and give an opinion on the methodology for developing the next version of the GNI corpus, based on a semi-automatic strategy for more linguistically rich corpus annotation.

Keyword

biomedical text mining; corpus; text analytics

MeSH Terms

Genomics*
Informatics*
Full Text Links
  • GNI
Actions
Cited
CITED
export Copy
Close
Share
  • Twitter
  • Facebook
Similar articles
Copyright © 2024 by Korean Association of Medical Journal Editors. All rights reserved.     E-mail: koreamed@kamje.or.kr