Evaluating Prompt Engineering Techniques for Low Resource Language NLP Tasks: A Case Study on Bangla Emotion Recognition

Al Amin

Date

2025-02-12

Author

Al Amin

Abstract

Emotion detection in low-resource languages remains an underexplored area in natural language processing (NLP). This study develops a unified framework for detecting emotions in Bangla, Banglish (code-mixed Bangla-English), English, and multilingual texts using a diverse set of language models, including Bangla-specialized transformers, code-mixed models, and instruction-tuned multilingual LLMs. By integrating the DU-BEC and BTEd datasets with lexicon-guided augmentation from EmoLex-BN, the framework provides robust supervision across six canonical emotions. A modular pipeline automates preprocessing, synthetic augmentation, and model-agnostic training, enabling systematic comparison across monolingual, code-mixed, and multilingual settings. Experimental results demonstrate that a traditional machine learning approach (TF-IDF + Logistic Regression) achieves the best performance, with a macro F1-score of 0.357, significantly outperforming fine-tuned transformers (0.088–0.106 F1) and direct LLM prompting (0.000 F1). A novel translation-based LLM approach achieved an F1-score of 0.232, representing the first successful zero-shot emotion classification for Bangla without labeled training data. Bangla-native transformers excel in supervised in-domain tasks, code-mixed models outperform in Banglish contexts, and multilingual LLMs achieve strong zero-shot cross-lingual generalization when combined with translation pipelines. This work establishes the first comprehensive benchmark for emotion detection across Bangla, Banglish, and multilingual texts, providing reproducible pipelines, datasets, and evaluation metrics that advance low-resource and cross-lingual affective computing. The findings demonstrate that increased model complexity does not guarantee better performance under severe data constraints, and that simple, well-designed supervised methods remain highly effective for low-resource language NLP.
Keywords: Bangla NLP, Emotion Detection, Code-Mixed Language, Low-Resource Language, Large Language Models, Prompt Engineering, Cross-Lingual Transfer, Lexicon-Based Augmentation, DU-BEC, BTEd, EmoLex-BN, Multi-label Classification, Traditional Machine Learning.
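
The abstract compares all systems by macro F1-score. As a reading aid, the following is a minimal sketch of how macro F1 is commonly computed for single-label predictions: per-class F1 is averaged with equal weight, so classes a model never predicts correctly pull the score down. The function name `macro_f1`, the six emotion names, and the toy labels are illustrative assumptions; the abstract does not list the emotion labels or describe the evaluation code, and the thesis may use a multi-label variant.

```python
from collections import Counter

# Hypothetical label set: the abstract says "six canonical emotions"
# but does not name them; these are assumed for illustration only.
EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise", "disgust"]


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 over `labels` (single-label case)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class
        # never appears in either the truth or the predictions.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy data, not taken from the thesis.
y_true = ["joy", "sadness", "anger", "joy"]
y_pred = ["joy", "joy", "anger", "fear"]
score = macro_f1(y_true, y_pred, EMOTIONS)
```

Because every class counts equally, a system whose outputs never match any gold label scores exactly 0.0 under this definition, regardless of how many examples it sees.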

URI

http://suspace.su.edu.bd/handle/123456789/2612
