Yonsei Med J. 2019 Feb;60(2):191-199. doi: 10.3349/ymj.2019.60.2.191.

Machine Learning for the Prediction of New-Onset Diabetes Mellitus during 5-Year Follow-up in Non-Diabetic Patients with Cardiovascular Risks

Affiliations
  • 1. Research Institute of Health Sciences, Korea University College of Health Science, Seoul, Korea.
  • 2. Cardiovascular Center, Korea University Guro Hospital, Seoul, Korea. swrha617@yahoo.co.kr
  • 3. Center for Gastric Cancer, National Cancer Center, Goyang, Korea.
  • 4. Division of Cardiology, Nowon Eulji Hospital, Eulji University, Seoul, Korea.
  • 5. School of Mechanical & Aerospace Engineering, Seoul National University, Seoul, Korea. nohyung@snu.ac.kr

Abstract

PURPOSE
Many studies have proposed predictive models for type 2 diabetes mellitus (T2DM). However, these models have several limitations, particularly in user convenience and reproducibility. The purpose of this study was to develop a T2DM predictive model using electronic medical records (EMRs) and machine learning, and to compare its performance with that of traditional statistical methods.
MATERIALS AND METHODS
A total of 8454 patients who had no history of diabetes and were treated at the cardiovascular center of Korea University Guro Hospital were enrolled. All subjects completed 5 years of follow-up. T2DM developed in 4.78% (404/8454) of patients during follow-up. A total of 28 variables were extracted from the EMRs. Logistic regression (LR), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and K-nearest neighbor (KNN) models were generated and evaluated by cross-validation. The LR model was taken to represent the existing statistical analysis method.
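The evaluation scheme named in this abstract, k-fold cross-validation scored by the area under the ROC curve, can be sketched in plain Python. This is a minimal illustration and not the authors' pipeline. Only a KNN scorer is shown. The synthetic two-feature data, the stratified fold assignment, the choice of 15 neighbours, and the function names (`auc`, `stratified_folds`, `knn_score`, `cross_validated_auc`) are all assumptions made for the example.

```python
import random
from statistics import mean, stdev

def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, rng):
    """Split indices into k folds, keeping the class ratio similar per fold."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def knn_score(train_x, train_y, x, k):
    """Risk score: fraction of positives among the k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(t, x)), y)
        for t, y in zip(train_x, train_y)
    )
    return sum(y for _, y in dists[:k]) / k

def cross_validated_auc(xs, ys, n_folds=10, k=15, seed=0):
    """Return one held-out AUC per fold."""
    rng = random.Random(seed)
    fold_aucs = []
    for fold in stratified_folds(ys, n_folds, rng):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        scores = [knn_score(train_x, train_y, xs[i], k) for i in fold]
        fold_aucs.append(auc([ys[i] for i in fold], scores))
    return fold_aucs

# Synthetic stand-in data (NOT study data): 80 positives, 320 negatives,
# one informative feature and one pure-noise feature.
data_rng = random.Random(42)
ys = [1] * 80 + [0] * 320
xs = [(data_rng.gauss(2.0 if y else 0.0, 1.0), data_rng.gauss(0.0, 1.0)) for y in ys]

fold_aucs = cross_validated_auc(xs, ys)
print(f"mean AUC = {mean(fold_aucs):.2f}, SD across folds = {stdev(fold_aucs):.2f}")
```

The per-fold list makes it possible to report both the mean AUC and its spread across folds. Swapping `knn_score` for another scoring function would let the same loop compare different models on identical folds.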
RESULTS
In 10-fold cross-validation, the standard deviation of the area under the curve (AUC) stayed below 0.01 for every predictive model. The LR model showed the highest prediction performance, with an AUC of 0.78; however, the LDA, QDA, and KNN models did not differ from the LR model to a statistically significant degree.
CONCLUSION
We developed and verified a T2DM prediction system using machine learning and an EMR database; it predicted the 5-year occurrence of T2DM with performance similar to that of a traditional prediction model. Further clinical research is needed to apply and validate the prediction model.

Keywords

Type 2 diabetes mellitus; diabetes; machine learning; prediction; big data

MeSH Terms

Area Under Curve
Diabetes Mellitus*
Diabetes Mellitus, Type 2
Electronic Health Records
Follow-Up Studies*
Humans
Korea
Learning
Logistic Models
Machine Learning*
Methods
Prevalence

Figures

  • Fig. 1 Study flow chart. KUGH: Korea University Guro Hospital, EMR: electronic medical record.

  • Fig. 2 Selection of features for type 2 diabetes mellitus prediction model generation using ‘Information Gain Attribute Evaluation.’ CAD, coronary artery disease; CKD-MDRD, chronic kidney disease–the modification of diet in renal disease; PCI, percutaneous coronary intervention; ARB, angiotensin receptor blockers; ACEI, angiotensin-converting enzyme inhibitors; CCB, calcium channel blockers; DHP, dihydropyridine; BB, beta blockers.

  • Fig. 3 ROC analysis of the cross-validation tests ranging from 0 to 30 quartile according to the learning model. Change in AUC (A) and amount of change in AUC (B). ROC, receiver-operating characteristic; AUC, area under the curve; KNN, K-nearest neighbor.

  • Fig. 4 10-fold cross-validation test of the predictive models of type 2 diabetes mellitus. KNN, K-nearest neighbor; AUC, area under the curve.
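Fig. 2 refers to feature selection by 'Information Gain Attribute Evaluation'. As a rough illustration of what such a score measures, the sketch below computes the information gain of one categorical feature with respect to a binary outcome: the entropy of the outcome minus its expected entropy after splitting on the feature. The function names and the toy data are assumptions for this example and are not taken from the study; continuous variables would need to be discretised before a score like this could be applied.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for one categorical feature."""
    n = len(labels)
    groups = {}
    for value, y in zip(feature, labels):
        groups.setdefault(value, []).append(y)
    # Expected entropy of the outcome within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy data (hypothetical, not from the study).
outcome = [1, 1, 1, 0, 0, 0, 0, 0]
features = {
    "informative": ["high", "high", "high", "low", "low", "low", "low", "low"],
    "uninformative": ["a", "b", "a", "b", "a", "b", "a", "b"],
}

# Rank features from most to least informative about the outcome.
ranking = sorted(features, key=lambda name: information_gain(features[name], outcome), reverse=True)
for name in ranking:
    print(f"{name}: {information_gain(features[name], outcome):.3f} bits")
```

A feature that separates the outcome perfectly recovers the full outcome entropy, while a feature unrelated to the outcome scores close to zero, which is what makes the score usable for ranking candidate variables.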


Cited by  2 articles

Machine Learning Application in Diabetes and Endocrine Disorders
Namki Hong, Heajeong Park, Yumie Rhee
J Korean Diabetes. 2020;21(3):130-139. doi: 10.4093/jkd.2020.21.3.130.

Development and Validation of a Deep Learning Based Diabetes Prediction System Using a Nationwide Population-Based Cohort
Sang Youl Rhee, Ji Min Sung, Sunhee Kim, In-Jeong Cho, Sang-Eun Lee, Hyuk-Jae Chang
Diabetes Metab J. 2021;45(4):515-525. doi: 10.4093/dmj.2020.0081.

