Healthc Inform Res. 2022 Jan;28(1):89-94. doi: 10.4258/hir.2022.28.1.89

ANNO: A General Annotation Tool for Bilingual Clinical Note Information Extraction

Affiliations
  • 1. Department of Information Medicine, Asan Medical Center, Seoul, Korea
  • 2. Research & Development Team, iKooB, Seoul, Korea
  • 3. Department of IT Convergence Engineering, Gachon University, Seongnam, Korea
  • 4. Institute of Convergence Medicine, Ewha Womans University Mokdong Hospital, Seoul, Korea

Abstract


Objectives
This study aimed to develop a generalizable tool for annotating complex bilingual clinical text. The result is ANNO, a clinical text annotation tool.
Methods
We designed ANNO to help human annotators annotate information in clinical documents efficiently and accurately. First, using the dictionary function, annotators can tag words or phrases with different classes according to their type. Second, annotation results from different human annotators can be compared so that differences can be evaluated and reconciled. Third, if the regular expression set for a class is updated during annotation, the change is automatically applied to new documents. Because these regular expression sets are built from the annotators' own tags, a word tagged once is automatically labeled in new documents.
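The dictionary-based auto-labeling described in the Methods can be sketched in Python. This is a minimal illustration under stated assumptions, not ANNO's implementation: the class `RegexDictionary`, its methods `add_term` and `annotate`, the labels `SYMPTOM` and `DRUG`, and the choice to escape each tagged term literally and join a class's terms into a single alternation are all hypothetical. The example mixes an English and a Korean term to reflect the bilingual setting.

```python
import re
from collections import defaultdict


class RegexDictionary:
    """Hypothetical per-class store of regular expressions built from tagged terms."""

    def __init__(self):
        # {class label: [escaped pattern, ...]}
        self.patterns = defaultdict(list)

    def add_term(self, label, term):
        """Record a word or phrase that an annotator tagged once with `label`."""
        pattern = re.escape(term)  # treat the tagged text literally
        if pattern not in self.patterns[label]:
            self.patterns[label].append(pattern)

    def annotate(self, text):
        """Pre-label a new document: return sorted (start, end, label, surface) spans."""
        spans = []
        for label, patterns in self.patterns.items():
            if not patterns:
                continue
            # Longer patterns go first, so at a given position a phrase is
            # preferred over a shorter term that starts at the same place.
            alternation = "|".join(sorted(patterns, key=len, reverse=True))
            for match in re.finditer(alternation, text):
                spans.append((match.start(), match.end(), label, match.group()))
        return sorted(spans)


if __name__ == "__main__":
    regex_dict = RegexDictionary()
    regex_dict.add_term("SYMPTOM", "chest pain")  # English term
    regex_dict.add_term("SYMPTOM", "흉통")         # Korean term for chest pain
    regex_dict.add_term("DRUG", "aspirin")
    note = "Patient reports chest pain (흉통); started aspirin."
    for span in regex_dict.annotate(note):
        print(span)
```

Matching here is plain substring matching. A production tool would likely need word-boundary handling, which is not trivial for Korean because particles attach directly to nouns (for example, 흉통을).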
Results
Because ANNO is a Docker-based web application, users can run it without dependency issues. Human annotators can share their annotation markups as regular expression sets organized in a dictionary structure, and they can cross-check each other's annotated corpora. The main features of ANNO are the dictionary-based regular expression sharing function, the per-annotator cross-check function, and standardized input (Microsoft Excel) and output (extensible markup language [XML]) formats.
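The cross-check and XML output features can likewise be illustrated with a short, hypothetical sketch. The span representation `(start, end, label)`, the exact-match agreement ratio computed by `cross_check`, and the XML element and attribute names used by `to_xml` (`document`, `text`, `annotations`, `annotation`, `start`, `end`, `label`) are assumptions for illustration. The abstract does not specify ANNO's comparison metric or its output schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, class label)


def cross_check(spans_a: Iterable[Span], spans_b: Iterable[Span]) -> Dict[str, object]:
    """Compare two annotators' spans for one document using exact matching."""
    a, b = set(spans_a), set(spans_b)
    union = a | b
    return {
        "agreed": sorted(a & b),
        "only_a": sorted(a - b),  # candidates for reconciliation
        "only_b": sorted(b - a),
        # Share of all distinct spans on which both annotators agree exactly.
        "agreement": len(a & b) / len(union) if union else 1.0,
    }


def to_xml(doc_id: str, text: str, spans: Iterable[Span]) -> str:
    """Serialize spans to XML; the element and attribute names are assumptions."""
    root = ET.Element("document", id=doc_id)
    ET.SubElement(root, "text").text = text
    annotations = ET.SubElement(root, "annotations")
    for start, end, label in sorted(spans):
        node = ET.SubElement(annotations, "annotation",
                             start=str(start), end=str(end), label=label)
        node.text = text[start:end]  # the annotated surface string
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    note = "Patient reports chest pain; started aspirin."
    annotator_a = [(16, 26, "SYMPTOM"), (36, 43, "DRUG")]
    annotator_b = [(16, 26, "SYMPTOM"), (36, 43, "SYMPTOM")]
    result = cross_check(annotator_a, annotator_b)
    print(result["only_a"], result["only_b"], result["agreement"])
    print(to_xml("note-001", note, result["agreed"]))
```

Exact matching is only one option; partial-overlap matching or chance-corrected statistics such as Cohen's kappa are common alternatives for measuring inter-annotator agreement.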
Conclusions
Given the growing need for large-scale annotated clinical data to support the development of machine learning models, we expect ANNO to be helpful to many researchers.

Keywords

Medical Records; Data Mining; Information Storage and Retrieval; Personal Health Records

Figures

  • Figure 1 Workflow of the annotation process. PHI: personal health information.

  • Figure 2 Overview of the system architecture of ANNO.

  • Figure 3 Interface of ANNO software.

  • Figure 4 Output format of ANNO.

