    SUSpace Home / Faculty of Science and Engineering / Department of Electrical and Electronics Engineering (EEE) / 2021 - 2025 / View Item

    Speech Emotion Recognition Using Machine Learning Technique.

    View/Open
    EEE-200189.pdf (6.141Mb)
    Date
    2020-02-06
    Author
    Haque, Md. Enamul
    Islam, Md. Shakirul
    Ahmed, Kawser
    Abstract
    Speech emotion recognition is a challenging problem, partly because it is not clear which features are effective for the task. In this thesis, we present a comparative study of speech emotion recognition (SER) systems. Theoretical definitions, the categorization of affective states, and the modalities of emotion expression are presented. To carry out this study, we performed the pre-processing necessary for emotion recognition from speech data in an SER system based on Multi-Layer Perceptron (MLP) classifiers, generating training and testing datasets that contain the emotions Neutral, Calm, Happy, Sad, Angry, Fearful, Disgust and Surprised. The MLP classifiers are then used in the classification stage to predict the emotion. Mel-Frequency Cepstral Coefficient (MFCC), Chroma, and Mel features are extracted from the speech signals and used to train the MLP classifiers with the stochastic L-BFGS algorithm. The Bangla and RAVDESS databases are used as the experimental datasets. This study shows that the classifiers achieve an accuracy of 53.89% on the RAVDESS database and 45.83% on the Bangla database when speaker normalization (SN) and feature selection are applied to the features. The demand for machines that can interact with their users through speech is growing. For example, four of the world's largest IT companies (Amazon, Apple, Google and Microsoft) are developing intelligent personal assistants that are able to communicate through speech. In this thesis, we have investigated the effect of feature extraction when classifying emotions in speech using an Artificial Neural Network (ANN). We used "kernels" on Kaggle to extract sets of features from recorded audio and compared the MLP classification accuracy of these sets across eight classes of emotions. We used a single ANN architecture so that each feature set could be compared fairly.
The ANN architecture was developed through an experimental approach, and Python was used for the implementation. In recent years, work requiring human-machine interaction, such as speech recognition and emotion recognition from speech, has been increasing. Beyond speech recognition itself, features of the conversation, such as melody, emotion and chunking, are also studied. Research has shown that meaningful results can be reached using prosodic features of speech. In addition, a Confusion Matrix (CM) technique is used to evaluate the performance of these classifiers. The proposed system is tested on the RAVDESS and Bangla databases and has achieved a prediction rate of 70.89%.
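    The pipeline the abstract describes (per-utterance feature vectors fed to an MLP trained with L-BFGS, evaluated with a confusion matrix) can be sketched as follows. This is a minimal illustration, not the thesis code: the MFCC/Chroma/Mel extraction step is replaced here by random feature vectors of a plausible size, and scikit-learn's MLPClassifier is assumed as the MLP implementation, so the numbers it prints are illustrative only.

    ```python
    # Sketch of the SER pipeline: feature vectors -> MLP (L-BFGS) -> confusion matrix.
    # The real MFCC/Chroma/Mel extraction is stubbed out with random data.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import confusion_matrix

    EMOTIONS = ["neutral", "calm", "happy", "sad",
                "angry", "fearful", "disgust", "surprised"]

    rng = np.random.default_rng(0)
    n_utterances, n_features = 200, 180   # e.g. 40 MFCC + 12 chroma + 128 mel bands
    X = rng.normal(size=(n_utterances, n_features))
    y = rng.integers(0, len(EMOTIONS), size=n_utterances)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # One fixed MLP architecture, trained with the L-BFGS solver.
    clf = MLPClassifier(hidden_layer_sizes=(300,), solver="lbfgs",
                        max_iter=500, random_state=0)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)

    # Confusion matrix over all eight emotion classes.
    cm = confusion_matrix(y_test, pred, labels=list(range(len(EMOTIONS))))
    print(f"test accuracy: {(pred == y_test).mean():.2%}")
    ```

    With real features, the random data above would be replaced by per-utterance vectors pooled from the extracted MFCC, Chroma and Mel frames; the rest of the pipeline is unchanged.
    
    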
    URI
    http://suspace.su.edu.bd/handle/123456789/1132
    Collections
    • 2021 - 2025 [152]

    Copyright © 2022-2025 Library Home | Sonargaon University
    Contact Us | Send Feedback