Genomics Inform. 2021 Sep;19(3):e26. doi: 10.5808/gi.21014.

O-JMeSH: creating a bilingual English-Japanese controlled vocabulary of MeSH UIDs through machine translation and mutual information

Affiliations
  • 1 Computer Science Department, The University of Sheffield, Western Bank, Sheffield S10 2TN, UK
  • 2 National Bioscience Database Center, Japan Science and Technology Agency, Tokyo 102-8666, Japan
  • 3 Database Center for Life Science, Research Organization of Information and Systems, Kashiwa 277-0871, Japan
  • 4 Graduate School of Integrative Science and Engineering, Tokyo City University, Tokyo 158-8557, Japan

Abstract

Previous approaches to creating a controlled vocabulary for Japanese have relied on existing bilingual dictionaries and transformation rules to map terms between languages. However, given the new terms that may have been introduced by coronavirus disease 2019 (COVID-19) and the emphasis on respiratory and infection-related terms, coverage cannot be guaranteed. In this work, we propose creating a bilingual English-Japanese controlled vocabulary based on the MeSH terms assigned to COVID-19-related publications. To this end, we combined manual curation of several bilingual dictionaries with a computational approach based on machine translation of sentences containing these terms and ranking of candidate translations for the individual terms by mutual information. Our results show that we achieved nearly 99% occurrence coverage in LitCovid, while our computational approach achieved an average accuracy of 63.33% for all terms and 84.51% for drugs and chemicals.
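As an illustration of the ranking step, the following is a minimal sketch of scoring candidate translations by pointwise mutual information (PMI) over sentence pairs. It is not the authors' implementation: the abstract does not specify the exact mutual information formulation, so the use of PMI, the function name `rank_by_pmi`, the tie-breaking rule, and the toy corpus are all assumptions made for illustration. The Japanese side is assumed to be already machine-translated and tokenized by some external tool.

```python
import math
from collections import Counter


def rank_by_pmi(sentence_pairs, term):
    """Rank Japanese tokens as candidate translations of an English term.

    sentence_pairs: list of (english_sentence, japanese_tokens) tuples, where
    japanese_tokens is the tokenized machine translation of the sentence
    (hypothetical input format, assumed for this sketch).
    Returns a list of (token, pmi) sorted from best to worst candidate.
    """
    n = len(sentence_pairs)
    term_count = 0            # sentences containing the English term
    cand_count = Counter()    # sentences containing each Japanese token
    joint_count = Counter()   # sentences containing both term and token

    for english, japanese_tokens in sentence_pairs:
        unique_tokens = set(japanese_tokens)
        cand_count.update(unique_tokens)
        if term.lower() in english.lower():
            term_count += 1
            joint_count.update(unique_tokens)

    if term_count == 0:
        return []

    scored = []
    for token, joint in joint_count.items():
        # PMI = log2( p(term, token) / (p(term) * p(token)) )
        pmi = math.log2((joint * n) / (term_count * cand_count[token]))
        scored.append((token, pmi, joint))

    # PMI favours rare tokens, so ties are broken by co-occurrence count
    # (an assumption of this sketch, not something stated in the abstract).
    scored.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return [(token, pmi) for token, pmi, _ in scored]


# Toy corpus, invented for illustration only.
pairs = [
    ("Patients with pneumonia were admitted.",
     ["肺炎", "の", "患者", "が", "入院", "した"]),
    ("Pneumonia is common in COVID-19.",
     ["肺炎", "は", "COVID-19", "で", "よく", "見られる"]),
    ("Patients were treated with remdesivir.",
     ["患者", "は", "レムデシビル", "で", "治療", "された"]),
    ("Fever was reported in most patients.",
     ["発熱", "は", "ほとんど", "の", "患者", "で", "報告", "された"]),
]

ranked = rank_by_pmi(pairs, "pneumonia")
print(ranked[:3])
```

In practice, a minimum co-occurrence threshold or a smoothed variant of PMI would likely be needed, because raw PMI gives high scores to tokens that appear only once.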

Keywords

controlled vocabulary; COVID-19; Japanese; multilingualism; natural language processing; translation