Osong Public Health Res Perspect. 2011 Sep;2(2):75-82. doi: 10.1016/j.phrp.2011.07.005.

Development of a Predictive Model for Type 2 Diabetes Mellitus Using Genetic and Clinical Data

Affiliations
  • 1 Division of Structural and Functional Genomics, Korea National Institute, Osong, Korea
  • 2 Division of Biobank for Health Sciences, Korea National Institute, Osong, Korea
  • 3 Division of Epidemic Intelligence Service, Korea Centers for Disease Control and Prevention, Osong, Korea
  • 4 Division of Bio-Medical Informatics, Korea National Institute, Osong, Korea
  • 5 Department of Internal Medicine and Lung Institute, Seoul National University College of Medicine, Seoul, Korea
  • 6 Division of Quarantine Support, Korea Centers for Disease Control and Prevention, Osong, Korea
  • 7 Department of Biomedical Engineering, School of Medicine, Kyung Hee University, Seoul, Korea
  • 8 Department of Statistics, Inha University, Incheon, Korea
  • 9 Medical Genomics Laboratory, Pochon CHA University, Seongnam, Korea
  • 10 Center for Genome Research, Korea Centers for Disease Control and Prevention, Osong, Korea

Abstract


Objectives
Recent genetic association studies have provided convincing evidence that several novel loci and single nucleotide polymorphisms (SNPs) are associated with the risk of developing type 2 diabetes mellitus (T2DM). The aims of this study were: 1) to develop a predictive model of T2DM using genetic and clinical data; and 2) to compare misclassification rates of different models.
Methods
We selected 212 individuals with newly diagnosed T2DM and 472 controls aged in their 60s from the Korean Genome and Epidemiology Study. A total of 499 known SNPs from 87 T2DM-related genes were genotyped using germline DNA. SNPs were analyzed for significant association with T2DM using various classification algorithms, including QUEST (Quick, Unbiased, Efficient, Statistical Tree), support vector machine, C4.5, logistic regression, and K-nearest neighbor.
Results
We tested these models using the complete Korean Genome and Epidemiology Study cohort (n = 10,038) and computed the T2DM misclassification rate for each model. Average misclassification rates ranged from 28.2% to 52.7%. The misclassification rates for the logistic regression and machine-learning algorithms were lower than those for the statistical tree algorithms. Using 1-to-1 matched data, the QUEST statistical tree algorithm using body mass index and SNP variables had the lowest misclassification rate, but overall logistic regression performed best.
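The abstract does not define the misclassification rate. The sketch below assumes the usual definition, the fraction of individuals whose predicted status differs from the observed one. The function name and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Sequence


def misclassification_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of individuals whose predicted class differs from the observed class.

    Assumed definition: (false positives + false negatives) / total.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


# Toy labels (1 = T2DM, 0 = control), invented purely for illustration.
observed = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]

rate = misclassification_rate(observed, predicted)
print(f"misclassification rate: {rate:.1%}")  # 2 errors out of 8 -> 25.0%
```

Under this definition, a rate near 50% on a balanced two-class sample is close to what random guessing would produce.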
Conclusions
The K-nearest neighbor method exhibited more robust results than the other algorithms. For clinical and genetic data, our “multistage adjustment” model outperformed other models, yielding lower misclassification rates. To improve the performance of these models, further studies are warranted, and strategies to estimate better classifiers for the quantification of SNPs need to be developed.

Keywords

classification; early predictive model; single nucleotide polymorphism (SNP); type 2 diabetes mellitus (T2DM)