Analysis of Bangla dialect translation to Standard Bangla and English using Natural Language Processing (NLP)
Abstract
In an era of global communication and collaboration, the demand for effective language
translation applications has surged. This paper applies machine learning (ML) to improve
the capabilities of such applications. The linguistic
diversity of the Bangla language, marked by numerous regional dialects, presents significant
challenges for automated language processing and translation systems. This study explores
the application of Natural Language Processing (NLP) techniques to translate various Bangla
dialects into Standard Bangla and subsequently into English. Due to the scarcity of publicly
available data on Bangla dialects, the whole dataset was manually collected from different
regions of Bangladesh, including Chittagong, Sylhet, Barishal and Khulna. By leveraging
corpus-based analysis and dialect normalization frameworks, this research aims to bridge the
gap between spoken dialects and their standard written forms. We designed and implemented
a custom translation model built on deep learning and machine learning architectures,
incorporating domain-specific pre-training and fine-tuning strategies. Our
proposed model achieved an accuracy of 95% in translating regional dialects to both Standard
Bangla and English, outperforming several existing baselines. The study also discusses the
challenges of data annotation, linguistic pre-processing, and transliteration, and assesses the
impact of contextual embeddings on translation quality. The results highlight the potential of
NLP to support language preservation, inclusive digital communication, and the development
of regionally aware AI systems tailored to linguistically diverse communities.
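The two-stage design described above (regional dialect to Standard Bangla, then Standard Bangla to English) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model. It replaces the deep learning normalizer with a simple lexicon lookup, treats the English stage as a pluggable function, and scores outputs with sentence-level exact-match accuracy. The metric is also an assumption, because the abstract does not say how its 95% accuracy was computed. The lexicon entries and the names `DIALECT_LEXICON`, `normalize_dialect`, `translate_pipeline`, and `exact_match_accuracy` are hypothetical placeholders.

```python
from typing import Callable, Dict, List

# Hypothetical dialect -> Standard Bangla lexicon. Real entries would come from
# the manually collected regional corpus; these placeholder tokens are illustrative only.
DIALECT_LEXICON: Dict[str, str] = {
    "dial_word_1": "std_word_1",
    "dial_word_2": "std_word_2",
}


def normalize_dialect(sentence: str, lexicon: Dict[str, str]) -> str:
    """Stage 1 (assumed lexicon baseline): map each whitespace-separated
    dialect token to its Standard Bangla form; unknown tokens pass through."""
    return " ".join(lexicon.get(tok, tok) for tok in sentence.split())


def translate_pipeline(
    sentence: str,
    lexicon: Dict[str, str],
    to_english: Callable[[str], str],
) -> Dict[str, str]:
    """Run dialect -> Standard Bangla -> English and keep both outputs.
    `to_english` stands in for a trained Standard Bangla -> English model."""
    standard = normalize_dialect(sentence, lexicon)
    return {"standard_bangla": standard, "english": to_english(standard)}


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Assumed metric: fraction of predictions identical to their reference
    after collapsing runs of whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        " ".join(p.split()) == " ".join(r.split())
        for p, r in zip(predictions, references)
    )
    return hits / len(references)


if __name__ == "__main__":
    # Hypothetical stand-in for the English stage: a fixed lookup table.
    std_to_en = {"std_word_1 std_word_2": "example output"}
    result = translate_pipeline(
        "dial_word_1 dial_word_2", DIALECT_LEXICON, lambda s: std_to_en.get(s, s)
    )
    print(result)
    print(exact_match_accuracy(["a b", "c"], ["a  b", "d"]))
```

Keeping the Standard Bangla output alongside the English output lets each stage be evaluated separately, which matters when errors in dialect normalization propagate into the English translation.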