Cyberbullying Detection and Sentiment Analysis in Bangla Social Media using Deep Learning Techniques

Md., Shihab Mia

Date

2025-01-12

Abstract

Cyberbullying is a growing threat on social media platforms, where offensive and abusive language can harm users' mental well-being and social health. The problem is more acute in low-resource languages such as Bangla, where building automated moderation systems is hindered by complex language structure, informal writing styles, and the limited availability of annotated corpora. Bangla (or Bengali) is one of the most widely spoken languages in the Indo-Aryan family and one of the top seven languages by number of speakers, with approximately 242 million native speakers and a further 43 to 44 million who speak it as a second language. This thesis presents a deep learning-based framework for cyberbullying detection and sentiment analysis in Bangla social media text, with the aim of improving the detection of harmful content in online communication, and builds on ideas drawn from a number of prior studies. A corpus of labeled sentences drawn from Bangla comments was collected and preprocessed by normalizing noisy text, removing noise, and tokenizing, in order to handle the spelling variation, code mixing, and idiomatic expressions that are common on social platforms. Three neural architectures (LSTM, BiLSTM, and a transformer-based BanglaBERT model) were employed for binary cyberbullying classification and for sentiment polarity analysis, which assesses the emotional orientation of text. The models were evaluated with standard metrics: accuracy, precision, recall, and F1-score. The experimental results show that the LSTM model reaches 81.21% accuracy and the BanglaBERT model a higher 81.70%, which suggests the benefit of bidirectional context learning (results of the Task layer Models). The BiLSTM model achieved 81.80% accuracy, 81.77% precision, 81.825% recall, and an F1-score of 81.75%, demonstrating that RNN-based contextual representations are effective for representing Bangla text.
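As an illustration of the four evaluation metrics named above, the following is a minimal sketch in plain Python for a binary classifier. The function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are hypothetical and are not taken from the thesis; the abstract does not state how the metrics were averaged across classes, so this sketch assumes a single positive class.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical toy labels (1 = bullying, 0 = not bullying); not thesis data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters for moderation tasks, because a model can score well on accuracy while still missing many abusive comments when the classes are imbalanced.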
Our results indicate that models with bidirectional context, the transformer-based BanglaBERT and the BiLSTM, capture the semantic and contextual subtleties of Bangla social media language more effectively than the unidirectional LSTM. This work contributes to Bangla natural language processing by validating deep learning models on this task, and it can serve as a practical resource for building automated systems that help ensure safer, more responsible Bangla online communities.

Keywords: Cyberbullying detection; Low-resource language; Deep learning; Sentiment analysis; LSTM; BiLSTM; BanglaBERT; Transformer models; Natural language processing (NLP).

URI

http://suspace.su.edu.bd/handle/123456789/2617

Collections

2021 - 2025 [184]