J Korean Med Sci. 2025 Mar;40(11):e26. doi: 10.3346/jkms.2025.40.e26.

Explainability Enhanced Machine Learning Model for Classifying Intellectual Disability and Attention-Deficit/Hyperactivity Disorder With Psychological Test Reports

Affiliations
  • 1Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea
  • 2Department of Pediatrics, Uijeongbu St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Uijeongbu, Korea
  • 3Wellysis Corp., Seoul, Korea
  • 4Department of Psychiatry, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
  • 5Department of Medical Sciences, College of Medicine, The Catholic University of Korea, Seoul, Korea
  • 6CMC Institute for Basic Medical Science, The Catholic Medical Center of The Catholic University of Korea, Seoul, Korea

Abstract

Background
Psychological test reports are essential in assessing intellectual functioning and aid in diagnosing and treating intellectual disability (ID) and attention-deficit/hyperactivity disorder (ADHD). However, these reports are difficult to analyze because they are diverse, unstructured, and subjective, and they are prone to human error. In addition, physicians often do not read the entire report, and far fewer reports are available than diagnoses.
Methods
To address these issues, we developed explainable predictive models that classify ID and ADHD from written reports. The models were trained on the reports of 1,475 patients with ID or ADHD who underwent intelligence tests; each report was analyzed with natural language processing (NLP) and paired with the physician's diagnosis. To make the models explainable, we selected n-gram features from the models' results by extracting important features with SHapley Additive exPlanations (SHAP) and permutation importance. Because NLP pre-processing destroys human readability, we also developed an n-gram feature-based original text search system that reconstructs human-readable texts from the selected n-gram features.
Results
The maximum model accuracy was 0.92, and 80 human-readable texts were restored from the four models.
Conclusion
The results showed that the models can accurately classify ID and ADHD even with relatively few reports, and that they can explain their predictions. The explainability-enhanced models can help physicians understand the classification process for ID and ADHD and provide evidence-based insights.

Keywords

Neurodevelopmental Disorder; Intellectual Disability; Attention-Deficit/Hyperactivity Disorder; Psychological Test Reports; Natural Language Processing; Machine Learning; Explainable Model

Figures

  • Fig. 1 Flowchart of developing explainable predictive models for classifying intellectual disabilities and attention-deficit/hyperactivity disorders. (A) Data pre-processing, (B) natural language processing, (C) classification model development, (D) explainable model development. POS = part of speech, KoNLPy = Korean natural language processing in Python, BoW = Bag-of-Words, TF = term frequency, IDF = inverse document frequency, SHAP = SHapley Additive exPlanations, PI = permutation importance, NOTS = n-gram feature-based original text search.
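The TF-IDF weighting named in the Fig. 1 pipeline can be computed by hand for a tiny corpus. The sketch below uses scikit-learn's default smoothed formulation, idf = ln((1 + N)/(1 + df)) + 1; the three toy tokenized documents are invented for illustration.

```python
# Sketch: smoothed TF-IDF for one term of one document in a toy corpus.
import math

docs = [["attention", "low"], ["memory", "low"], ["attention", "impulsive"]]
N = len(docs)  # number of documents

def tfidf(term, doc):
    tf = doc.count(term)                    # raw term frequency in the document
    df = sum(term in d for d in docs)       # documents containing the term
    idf = math.log((1 + N) / (1 + df)) + 1  # smoothed inverse document frequency
    return tf * idf

score = tfidf("attention", docs[0])
```

A rare term (low df) gets a larger idf and thus a larger weight, which is why discriminative report phrases surface as important n-gram features.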

  • Fig. 2 Example of an English-translated version of a report.

  • Fig. 3 Flowchart of the explainable model development. (A) Extract the top 30 important features from each method. (B) Select ten n-gram features from the top 30 important features. (C) Insert one of the ten n-gram features in the n-gram feature-based original text search system. (D) Check if the words of the n-gram feature are in the report. (E) Check if the words in the n-gram feature are in order. (F) Search texts with no other principal POS between the words of the n-gram feature. (G) Restore the original human-readable text. NB = Naïve Bayes, RF = random forest, XGB = eXtreme Gradient Boosting, LGBM = Light Gradient Boosting Machine, SHAP = SHapley Additive exPlanations, PI = permutation importance, IF = important features, POS = part of speech.

