| dc.description.abstract | Emotion detection in low-resource languages remains an underexplored area in natural language
processing (NLP). This study develops a unified framework for detecting emotions in Bangla,
Banglish (code-mixed Bangla-English), English, and multilingual texts using a diverse set of language
models, including Bangla-specialized transformers, code-mixed models, and instruction-tuned
multilingual LLMs. By integrating the DU-BEC and BTEd datasets with lexicon-guided augmentation
from EmoLex-BN, the framework provides robust supervision across six canonical emotions. A
modular pipeline automates preprocessing, synthetic augmentation, and model-agnostic training,
enabling systematic comparison across monolingual, code-mixed, and multilingual settings.
Experimental results demonstrate that traditional machine learning approaches (TF-IDF + Logistic
Regression) achieve the best performance with a macro F1-score of 0.357, significantly outperforming
fine-tuned transformers (0.088–0.106 F1) and direct LLM prompting (0.000 F1). A novel translation-based
LLM approach achieved an F1-score of 0.232, representing the first successful zero-shot emotion
classification for Bangla without labeled training data. Bangla-native transformers excel in supervised
in-domain tasks, code-mixed models outperform in Banglish contexts, and multilingual LLMs
achieve strong zero-shot cross-lingual generalization when combined with translation pipelines. This
work establishes the first comprehensive benchmark for emotion detection across Bangla, Banglish, and
multilingual texts, providing reproducible pipelines, datasets, and evaluation metrics that advance low-resource
and cross-lingual affective computing. The findings demonstrate that increased model
complexity does not guarantee better performance under severe data constraints, and that simple, well-designed
supervised methods remain highly effective for low-resource language NLP.
Keywords: Bangla NLP, Emotion Detection, Code-Mixed Language, Low-Resource Language, Large
Language Models, Prompt Engineering, Cross-Lingual Transfer, Lexicon-Based Augmentation,
DU-BEC, BTEd, EmoLex-BN, Multi-label Classification, Traditional Machine Learning. | en_US |