Analysis of Bangla dialect translation to Standard Bangla and English using Natural Language Processing (NLP)

Haque, Mustafa Mohammad Nashwan

Date

2025-05-19

Abstract

Demand for effective language translation applications has surged with the growth of global communication and collaboration. This study applies machine learning (ML) and Natural Language Processing (NLP) techniques to translate various Bangla dialects into Standard Bangla and subsequently into English. The linguistic diversity of Bangla, marked by numerous regional dialects, presents significant challenges for automated language processing and translation systems. Because publicly available data on Bangla dialects is scarce, the entire dataset was collected manually from different regions of Bangladesh, including Chittagong, Sylhet, Barishal and Khulna. Using corpus-based analysis and dialect normalization frameworks, the research aims to bridge the gap between spoken dialects and their standard written forms. We designed and implemented a custom translation model based on deep learning and machine learning architectures, incorporating domain-specific pre-training and fine-tuning strategies. The proposed model achieved an accuracy of 95% in translating regional dialects to both Standard Bangla and English, outperforming several existing benchmarks. The study also discusses the challenges of data annotation, linguistic pre-processing, and transliteration, and assesses the impact of contextual embeddings on translation quality. The results highlight the potential of NLP to support language preservation, inclusive digital communication, and the development of regionally aware AI systems tailored to linguistically diverse communities.

URI

http://suspace.su.edu.bd/handle/123456789/1573

Collections

2021 - 2025 [184]