Nutr Res Pract. 2025 Apr;19(2):273-291. doi: 10.4162/nrp.2025.19.2.273.

Plasma metabolite based clustering of breast cancer survivors and identification of dietary and health related characteristics: an application of unsupervised machine learning

Affiliations
  • 1 Department of Food and Nutrition, College of Human Ecology, Seoul National University, Seoul 08826, Korea
  • 2 Department of Surgery, Soonchunhyang University Bucheon Hospital, Soonchunhyang University College of Medicine, Bucheon 14584, Korea
  • 3 Department of Surgery, Jeonbuk National University Medical School, Jeonju 54907, Korea
  • 4 Department of Surgery, Keimyung University School of Medicine, Daegu 42601, Korea
  • 5 Department of Surgery, Dankook University Hospital, Dankook University College of Medicine, Cheonan 31116, Korea
  • 6 Department of Surgery, Chosun University Hospital, Chosun University College of Medicine, Gwangju 61453, Korea
  • 7 Research Institute of Human Ecology, Seoul National University, Seoul 08826, Korea

Abstract

BACKGROUND/OBJECTIVES
This study aimed to identify clusters of breast cancer survivors from their plasma metabolites using unsupervised machine learning and to compare dietary characteristics and health-related factors across the clusters.
SUBJECTS/METHODS
A total of 419 breast cancer survivors were included in this cross-sectional study. We considered 30 plasma metabolites quantified by high-throughput nuclear magnetic resonance (NMR) metabolomics. Participants were clustered on these metabolites using 4 unsupervised clustering methods: k-means (KM), partitioning around medoids (PAM), self-organizing maps (SOM), and hierarchical agglomerative clustering (HAC). Sociodemographic, lifestyle, clinical, and dietary characteristics were compared across the clusters using the t-test, χ² test, and Fisher's exact test, and P-values were adjusted for multiple comparisons by controlling the false discovery rate (FDR).
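A minimal sketch of the k-means (KM) step in standard-library Python, assuming each metabolite is z-scored before clustering (the abstract does not describe preprocessing, although Fig. 3 reports z-scores) and that distances are Euclidean. It uses Lloyd's algorithm, one of several k-means variants (R's `kmeans`, for instance, defaults to Hartigan-Wong), and is not the authors' code; PAM, SOM, and HAC replace the assignment and update rules with their own. The function names and the toy data are hypothetical.

```python
import math
import random


def zscore_columns(rows):
    """Standardize each column (one metabolite) to mean 0 and SD 1."""
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_cols)]
    # Population SD; fall back to 1.0 for a constant column to avoid dividing by zero.
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(n_cols)
    ]
    return [[(r[j] - means[j]) / sds[j] for j in range(n_cols)] for r in rows]


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    # Initialize centroids with k randomly chosen participants.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: 6 participants x 2 metabolites (made-up values, not study data).
data = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [8.0, 9.0], [8.2, 9.1], [7.9, 8.8]]
labels, _ = kmeans(zscore_columns(data), k=2)
print(labels)  # two groups of three; which group is labeled 0 depends on the seed
```

With k = 2, matching the number of clusters reported in the Results, the six made-up participants split into two groups of three. Cluster labels are arbitrary, so "cluster 1" in one run or method need not correspond to "cluster 1" in another without an explicit matching step.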
RESULTS
Two clusters were identified by each of the 4 methods. Compared with cluster 1, participants in cluster 2 had lower concentrations of apolipoprotein A1 and of large high-density lipoprotein (HDL) particles and a smaller average HDL particle size, but higher concentrations of chylomicrons and extremely large very-low-density lipoprotein (VLDL) particles, higher glycoprotein acetyls, a higher ratio of monounsaturated fatty acids to total fatty acids, and a larger average VLDL particle size. Body mass index was significantly higher in cluster 2 than in cluster 1 (FDR-adjusted P < 0.001 for KM, P = 0.001 for PAM, P < 0.001 for SOM, and P = 0.043 for HAC).
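The abstract reports FDR-adjusted P-values but does not name the procedure; the sketch below assumes the Benjamini-Hochberg step-up adjustment, the most widely used FDR method. Each raw p-value is scaled by m/rank, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone and no greater than 1. The function name is hypothetical and the p-values are invented; they do not correspond to the study's results.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Step up from the largest p-value, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four characteristics compared between two clusters.
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))  # approximately [0.04, 0.0533, 0.0533, 0.2]
```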
CONCLUSION
Breast cancer survivors clustered on the basis of plasma metabolites differed in lipoprotein and fatty acid profiles and in body mass index. Further prospective studies are needed to investigate the associations among metabolites, obesity, dietary factors, and breast cancer prognosis.

Keywords

Breast cancer; East Asian people; metabolome; machine learning

Figures

  • Fig. 1 Silhouette index plot for the KM, PAM, SOM, and HAC clustering methods. KM, k-means; PAM, partitioning around medoids; SOM, self-organizing maps; HAC, hierarchical agglomerative clustering.

  • Fig. 2 t-SNE visualization of cluster assignments by KM, PAM, SOM, and HAC. t-SNE, t-distributed stochastic neighbor embedding; KM, k-means; C1, cluster 1; C2, cluster 2; PAM, partitioning around medoids; SOM, self-organizing maps; HAC, hierarchical agglomerative clustering.

  • Fig. 3 Boxplots of z-scores of the top 7 metabolites that contributed most to cluster formation. KM, k-means; VLDL size, average diameter for very-low-density lipoprotein particles; VLDL-C, very-low-density lipoprotein cholesterol; LDL size, average diameter for low-density lipoprotein particles; Total-TG, total triglycerides; Omega-6%, ratio of omega-6 fatty acids to total fatty acids; MUFA, monounsaturated fatty acid; MUFA%, ratio of MUFA to total fatty acids; PAM, partitioning around medoids; XXL-VLDL-P, extremely large very-low-density lipoprotein particles; GlycA, glycoprotein acetyls; XS-VLDL-PL, phospholipids in very small very-low-density lipoprotein; SOM, self-organizing maps; L-HDL-P, large high-density lipoprotein particles; ApoA1, apolipoprotein A1; HDL size, average diameter for high-density lipoprotein particles; HAC, hierarchical agglomerative clustering; Total-PL, total phospholipids in lipoprotein particles; XL-HDL-L, total lipids in very large high-density lipoprotein; LA, linoleic acid; C1, cluster 1; C2, cluster 2.
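
Fig. 1 plots a silhouette index for each clustering method. A minimal sketch of the average silhouette width (Rousseeuw's definition) for one cluster assignment follows, assuming Euclidean distance on the metabolite values; the caption does not state the distance measure or software used. For each participant i, a(i) is the mean distance to the other members of its own cluster, b(i) is the smallest mean distance to the members of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)), so values near 1 indicate compact, well-separated clusters. The function name and toy points are hypothetical.

```python
import math


def silhouette_score(points, labels):
    """Average silhouette width with Euclidean distance; needs >= 2 clusters."""
    n = len(points)
    clusters = set(labels)
    total = 0.0
    for i in range(n):
        # Mean distance from point i to the members of each cluster (excluding i itself).
        mean_dist = {}
        for c in clusters:
            d = [math.dist(points[i], points[j]) for j in range(n) if j != i and labels[j] == c]
            if d:
                mean_dist[c] = sum(d) / len(d)
        own = labels[i]
        if own not in mean_dist:
            continue  # singleton cluster: s(i) = 0 by convention
        a = mean_dist[own]  # cohesion within its own cluster
        b = min(v for c, v in mean_dist.items() if c != own)  # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


# Made-up one-dimensional points with two obvious groups.
pts = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
good = silhouette_score(pts, [0, 0, 0, 1, 1, 1])
bad = silhouette_score(pts, [0, 1, 0, 1, 0, 1])
print(good > bad)  # True
```

The pairwise loop is O(n²) in the number of participants, which is unproblematic at the scale of a few hundred participants.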

