SUSpace Home › Faculty of Science and Engineering › Department of Computer Science and Engineering › 2021 - 2025

    Towards Real-Time Emotion Analytics: Integrating Facial Landmarks and Speech Prosody

File: CSE-250232.pdf (2.211 MB)
Date: 2025-05-19
Author: Islam, Md. Wahidul
    Abstract
Emotion recognition has garnered significant attention in fields such as mental health, human-computer interaction, and personalized services. This research explores a multimodal approach to emotion recognition, integrating facial expression analysis and speech prosody to achieve a more accurate and context-sensitive understanding of human emotions. A distinctive aspect of this study is the creation of a custom video dataset designed specifically for facial expression recognition, capturing a wide range of emotional states under varied real-world conditions. In parallel, speech emotion detection is performed on publicly available audio datasets, from which features such as pitch, tone, and rhythm are analyzed to discern emotions expressed vocally. Facial expression recognition is based on Convolutional Neural Networks (CNNs), which extract visual features from the video data, while the emotional cues in speech are analyzed using Long Short-Term Memory (LSTM) networks. By combining these modalities, this research addresses limitations commonly faced by unimodal systems, such as noisy environments or occluded faces. The findings demonstrate that integrating facial and auditory data significantly improves emotion classification accuracy, particularly in real-time applications. This research advances the field of affective computing by highlighting the complementary strengths of visual and auditory emotion cues, and it offers practical implications for customer service, virtual assistants, and mental health diagnostics.
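The fusion of the two modalities described in the abstract — per-modality predictions from a facial CNN and a speech LSTM combined into one decision — can be illustrated with a weighted late-fusion of class probabilities. This is a minimal sketch, not the thesis's actual implementation: the emotion label set, the fusion weights, and the probability vectors below are all hypothetical.

```python
import numpy as np

# Hypothetical emotion label set; the thesis's actual classes may differ.
EMOTIONS = ["angry", "happy", "neutral", "sad"]

def late_fusion(p_face, p_speech, w_face=0.6, w_speech=0.4):
    """Weighted average of per-modality class probabilities.

    p_face and p_speech stand in for softmax outputs of the facial CNN
    and the speech LSTM, respectively. The weights are illustrative;
    in practice they would be tuned on validation data.
    """
    p_face = np.asarray(p_face, dtype=float)
    p_speech = np.asarray(p_speech, dtype=float)
    fused = w_face * p_face + w_speech * p_speech
    return fused / fused.sum()  # renormalize to a probability vector

# Example: the face model leans "happy", the speech model leans "neutral".
p_face = [0.05, 0.70, 0.20, 0.05]
p_speech = [0.10, 0.30, 0.50, 0.10]
fused = late_fusion(p_face, p_speech)
print(EMOTIONS[int(np.argmax(fused))])  # → happy (under these weights)
```

Late fusion like this is one common way multimodal systems degrade gracefully: if one modality is unreliable (an occluded face, a noisy recording), its weight can be reduced without retraining either network.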
URI: http://suspace.su.edu.bd/handle/123456789/1581
    Collections
    • 2021 - 2025 [125]

    Copyright © 2022-2025 Library Home | Sonargaon University
    Contact Us | Send Feedback
     

     
