Cyberbullying Detection and Sentiment Analysis in Bangla Social Media using Deep Learning Techniques
Abstract
Cyberbullying is an emerging threat on web-based social media platforms, where the use of
offensive and abusive language can impact the mental well-being and social health of users.
This issue is exacerbated in low-resource languages like Bangla, where the creation of
automated moderation systems is hindered by complex language structure, informal writing
style, and limited availability of annotated corpora. Bangla (or Bengali) is one of the most
widely spoken languages in the Indo-Aryan family and one of the top seven languages by
number of speakers, with approximately 242 million native speakers, along with 43 to 44
million who speak it as a second language. This thesis presents a deep learning-based
framework for cyberbullying detection and sentiment analysis in Bangla social media text,
aimed at better detection of harmful content in digital communication. The work builds
on ideas drawn from a number of prior studies. A corpus of labeled sentences drawn
from Bangla comments was collected and preprocessed through text normalization,
noise removal, and tokenization to handle challenges such as spelling variation,
code-mixing, and idiomatic expressions, which are abundant on social platforms. Three
neural architectures (LSTM, BiLSTM, and a transformer-based BanglaBERT model) were
employed for binary cyberbullying classification and for sentiment polarity analysis, which
assesses the emotional orientation of text. The models were evaluated with standard metrics such as
accuracy, precision, recall, and F1-score. The experimental results show that the LSTM
model reaches 81.21% accuracy and the BanglaBERT model a higher 81.70%, which points
to the benefit of bidirectional context learning (results of the task-layer models).
The BiLSTM model achieved 81.80% accuracy, 81.77% precision, 81.825% recall, and an
F1 score of 81.75%, demonstrating that RNN-based contextual representations are effective
for modeling Bangla text. Our results indicate that transformer-based models learn the
semantic and contextual subtleties of Bangla social media language about as effectively
as the recurrent approaches, with the reported accuracies of all three models within one
percentage point of each other. This work contributes to Bangla natural language
processing by validating deep learning models, and it can serve as a practical resource for
building automated systems that help make Bangla online communities safer and more
responsible.
Keywords: Cyberbullying detection; Low-resource language; Deep learning; Sentiment
analysis; LSTM; BiLSTM; BanglaBERT; Transformer models; Natural language
processing (NLP).
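
The evaluation metrics named in the abstract (accuracy, precision, recall, and F1-score) can be illustrated with a minimal Python sketch for the binary cyberbullying task. The function name `binary_metrics`, the label convention (1 = bullying, 0 = not bullying), and the toy labels are assumptions made for illustration only; they are not taken from the thesis, and the thesis may use a different averaging scheme or library.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper for illustration; `positive` marks the class
    treated as "bullying" (an assumed convention).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for this example (not data from the thesis).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

F1 is the harmonic mean of precision and recall, so it only stays high when both are high, which makes it a useful complement to accuracy when the bullying and non-bullying classes are imbalanced.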