dc.description.abstract | Speech emotion recognition (SER) is a challenging problem, partly because it is
not clear which features are effective for the task. In this thesis, we present a comparative study of
SER systems. Theoretical definitions, the categorization of affective states, and the modalities of
emotion expression are presented. To carry out this study, we performed the pre-processing necessary
for emotion recognition from speech data in an SER system based on Multi-Layer Perceptron (MLP)
classifiers, generating training and testing datasets that contain eight emotions: Neutral, Calm,
Happy, Sad, Angry, Fearful, Disgust, and Surprised. Mel-Frequency Cepstral Coefficient (MFCC),
Chroma, and Mel features are extracted from the speech signals and used to train the MLP classifiers,
optimized with the stochastic L-BFGS algorithm; the trained classifiers are then used in the
classification stage to predict the emotion. The Bangla and RAVDESS databases are used as the
experimental data sets. This study shows that the classifiers achieve an accuracy of 53.89% on the
RAVDESS database and 45.83% on the Bangla database when speaker normalization (SN) and
feature selection are applied to the features. The demand for machines that can interact with their
users through speech is growing. For example, four of the world's largest IT companies, Amazon,
Apple, Google, and Microsoft, are developing intelligent personal assistants that are able to
communicate through speech. We therefore investigated the effect of feature extraction when
classifying emotions in speech using an Artificial Neural Network (ANN). We used Kaggle kernels
to extract sets of features from recorded audio and compared the MLP classification accuracy of
these sets over the eight emotion classes. A single ANN architecture, developed through an
experimental approach, was used for all feature sets to keep the comparison fair. In recent years,
work requiring human-machine interaction, such as speech recognition and emotion recognition
from speech, has been increasing, much of it implemented in Python. Beyond recognizing the words
themselves, features of the conversation such as melody, emotion, and chunking are also studied,
and research has shown that meaningful results can be obtained using prosodic features of speech.
In addition, a Confusion Matrix (CM) technique is used to evaluate the performance of these
classifiers. The proposed system is tested on the RAVDESS and Bangla databases and has
achieved a prediction rate of 70.89%. | en_US |
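
The classification stage described in the abstract can be sketched as follows. This is a minimal, illustrative example only: it trains an MLP with the L-BFGS solver on eight emotion classes, but the random feature vectors stand in for the MFCC/Chroma/Mel features (the 180-dimensional size, the sample count, and all variable names are assumptions, not details taken from the thesis; real features would come from an audio library such as librosa).

```python
# Sketch of an MLP emotion classifier with an L-BFGS solver.
# Assumption: random vectors substitute for MFCC/Chroma/Mel features
# (e.g. 40 MFCC + 12 chroma + 128 mel = 180 dims, purely illustrative).
import numpy as np
from sklearn.neural_network import MLPClassifier

EMOTIONS = ["neutral", "calm", "happy", "sad", "angry",
            "fearful", "disgust", "surprised"]

rng = np.random.default_rng(0)
X = rng.normal(size=(160, 180))                 # 160 clips x 180-dim features
y = rng.integers(0, len(EMOTIONS), size=160)    # integer emotion labels 0..7

# L-BFGS is the solver the abstract names for optimizing the MLP.
clf = MLPClassifier(hidden_layer_sizes=(64,), solver="lbfgs",
                    max_iter=300, random_state=0)
clf.fit(X, y)
pred = clf.predict(X)                           # one emotion label per clip
```

In practice the feature matrix `X` would be built per utterance from the RAVDESS or Bangla recordings, and accuracy would be measured on a held-out test split (e.g. via a confusion matrix) rather than on the training data.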