Dement Neurocogn Disord. 2024 Jul;23(3):146-160. doi: 10.12779/dnd.2024.23.3.146.

Speech Emotion Recognition in People at High Risk of Dementia

Affiliations
  • 1 Department of Silver Business, Sookmyung Women’s University, Seoul, Korea
  • 2 Department of Communication Disorders, Korea Nazarene University, Cheonan, Korea
  • 3 Baikal AI Co. Ltd., Seoul, Korea

Abstract

Background and Purpose
The emotions of people at various stages of dementia should be put to effective use in prevention, early intervention, and care planning. Because technology now exists for understanding and addressing people's emotional needs, this study aims to develop speech emotion recognition (SER) technology that classifies the emotions of people at high risk of dementia.
Methods
Speech samples from people at high risk of dementia were categorized into distinct emotions by human listeners, and the resulting labels were used as annotations to guide a deep-learning method. To develop automated speech emotion recognition, the architecture combined a convolutional neural network, long short-term memory, attention layers, and Wav2Vec2, a novel feature extractor.
Results
Twenty-seven kinds of emotions were found in the participants' speech. These were grouped into 6 detailed emotions (happiness, interest, sadness, frustration, anger, and neutrality) and further into 3 basic emotions (positive, negative, and neutral). To improve algorithmic performance, multiple learning approaches were applied, using different data sources (voice and text) and varying the number of emotion classes. Ultimately, a 2-stage algorithm, in which an initial text-based classification is followed by voice-based analysis, achieved the highest accuracy, reaching 70%.
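As an illustration only, the 2-stage idea can be sketched in a few lines of Python. The abstract does not say how the two stages are combined, nor which detailed emotion belongs to which basic emotion, so the mapping `DETAILED_TO_BASIC`, the function `two_stage_predict`, and both toy models below are hypothetical stand-ins, not the authors' implementation. The sketch assumes the text stage picks a basic emotion and the voice stage then chooses among the detailed emotions in that group.

```python
from typing import Callable, Dict, List

# ASSUMPTION: the abstract names the 6 detailed and 3 basic emotions but
# does not state this mapping explicitly; it is a plausible guess.
DETAILED_TO_BASIC: Dict[str, str] = {
    "happiness": "positive",
    "interest": "positive",
    "sadness": "negative",
    "frustration": "negative",
    "anger": "negative",
    "neutrality": "neutral",
}


def candidates_for(basic: str) -> List[str]:
    """Return the detailed emotions that fall under one basic emotion."""
    return [d for d, b in DETAILED_TO_BASIC.items() if b == basic]


def two_stage_predict(
    text: str,
    audio: Dict[str, float],
    text_model: Callable[[str], str],
    voice_model: Callable[[Dict[str, float], List[str]], str],
) -> str:
    """Stage 1: text -> basic emotion. Stage 2: voice -> detailed emotion."""
    basic = text_model(text)
    options = candidates_for(basic)
    if not options:
        raise ValueError(f"unknown basic emotion: {basic!r}")
    if len(options) == 1:
        # Only one detailed emotion in this group, so no voice stage is needed.
        return options[0]
    return voice_model(audio, options)


def toy_text_model(text: str) -> str:
    """HYPOTHETICAL keyword rule standing in for a trained text classifier."""
    lowered = text.lower()
    if "thank" in lowered:
        return "positive"
    if "tired" in lowered:
        return "negative"
    return "neutral"


def toy_voice_model(audio: Dict[str, float], options: List[str]) -> str:
    """HYPOTHETICAL stand-in: 'audio' is a dict of made-up per-emotion scores."""
    return max(options, key=lambda emotion: audio.get(emotion, 0.0))


result = two_stage_predict(
    "I am so tired today",
    {"sadness": 0.7, "anger": 0.2},
    toy_text_model,
    toy_voice_model,
)
print(result)
```

In a real system the two toy functions would be replaced by trained text and voice classifiers; the cascade structure is the only point of the sketch.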
Conclusions
The diverse emotions identified in this study were attributed to the characteristics of the participants and the method of data collection. The fact that the participants were speaking to companion robots also explains the relatively low performance of the SER algorithm. Accordingly, this study suggests the systematic and comprehensive construction of a dataset from people with dementia.

Keywords

People at High Risk of Dementia; Speech Emotion Recognition; CNN+LSTM Algorithm; Deep Learning; Voice and Text Analysis