Int Neurourol J. 2014 Jun;18(2):50-57.

Big Data Analysis Using Modern Statistical and Machine Learning Methods in Medicine

Affiliations
  • 1 Department of Biostatistics, Florida International University, Miami, FL, USA. cyoo@fiu.edu
  • 2 Department of Dietetics and Nutrition, Florida International University, Miami, FL, USA.

Abstract

In this article, we introduce modern statistical machine learning and bioinformatics approaches that have been used to learn statistical relationships from big data in medicine and behavioral science, which typically include clinical, genomic (and proteomic), and environmental variables. Every year, the data collected in biomedical and behavioral science become larger and more complicated. Those of us in medicine therefore need to be aware of this trend and understand the statistical tools available for analyzing such datasets. Many statistical methods aimed at analyzing big datasets have been introduced recently. However, given the many different types of clinical, genomic, and environmental data, it is rather uncommon to see statistical methods that combine the knowledge resulting from those different data types. To this end, we introduce big data in terms of clinical data, single nucleotide polymorphism and gene expression studies, and their interactions with the environment. We introduce the concepts of well-known regression analyses, such as linear and logistic regression, that have been widely used in clinical data analysis, and of modern statistical models, such as Bayesian networks, that have been introduced to analyze more complicated data. We also discuss how to represent the interactions among clinical, genomic, and environmental data using modern statistical models. We conclude with a promising modern statistical method, Bayesian networks, that is suitable for analyzing big datasets consisting of different types of large clinical, genomic, and environmental data. Such statistical models from big data will provide us with a more comprehensive understanding of human physiology and disease.

Keywords

Bayesian analysis; Statistical data interpretation; Systems biology

MeSH Terms

Bayes Theorem
Behavioral Sciences
Computational Biology
Data Interpretation, Statistical
Dataset
Gene Expression
Humans
Learning
Logistic Models
Machine Learning*
Models, Statistical
Physiology
Polymorphism, Single Nucleotide
Statistics as Topic*
Systems Biology