Genomics Inform. 2012 Mar;10(1):44-50. doi: 10.5808/GI.2012.10.1.44.

Efficient Mining of Interesting Patterns in Large Biological Sequences

Affiliations
  • 1. Department of Computer Engineering, College of Electronics and Information, Kyung Hee University, Yongin 446-701, Korea.
  • 2. Department of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea. hojinc@kaist.ac.kr

Abstract

Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. In most existing approaches, the number of occurrences is the main measure for determining whether a pattern is interesting. In computational biology, however, an infrequent pattern may still be highly informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interestingness measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time.
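
The abstract does not give the definition of the proposed measure, so the following is only a minimal sketch of the general idea that a pattern can be informative when its observed support far exceeds what is expected. It assumes an independent, identically distributed background model estimated from the sequence itself and scores a pattern by the base-2 log ratio of observed to expected occurrences. The function names (`count_occurrences`, `expected_count`, `surprise_score`) and the scoring formula are illustrative assumptions, not the measure or the index-based method of the paper.

```python
import math
from collections import Counter


def count_occurrences(seq: str, pattern: str) -> int:
    """Count occurrences of pattern in seq, allowing overlaps."""
    k = len(pattern)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == pattern)


def expected_count(seq: str, pattern: str) -> float:
    """Expected occurrences under an assumed i.i.d. background model.

    Symbol probabilities are estimated from the sequence itself; the
    expectation is the pattern probability times the number of start positions.
    """
    n = len(seq)
    freqs = Counter(seq)
    prob = 1.0
    for symbol in pattern:
        prob *= freqs[symbol] / n
    return prob * (n - len(pattern) + 1)


def surprise_score(seq: str, pattern: str) -> float:
    """Hypothetical score: log2(observed / expected), or 0.0 if undefined."""
    observed = count_occurrences(seq, pattern)
    expected = expected_count(seq, pattern)
    if observed == 0 or expected == 0:
        return 0.0
    return math.log2(observed / expected)


if __name__ == "__main__":
    dna = "ACGTACGTACGT"
    for candidate in ("ACGT", "AAAA"):
        print(candidate, count_occurrences(dna, candidate), surprise_score(dna, candidate))
```

Under this sketch, a pattern with only a handful of occurrences can still receive a high score when the background model predicts it should almost never appear, which is the situation a pure frequency threshold would miss.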

Keywords

DNA sequence; index-based method; information gain; pattern mining

MeSH Terms

Base Sequence
Computational Biology
DNA
Mining