A Reliable and Efficient Approach to Suicidal Ideation Detection in a Low Resource Language
Abstract
Suicide is a persistent and devastating global public health problem, which calls for
scalable, proactive early-detection methods that go beyond conventional clinical
frameworks. Despite remarkable computational progress in high-resource languages such as
English, the Bangla (Bengali) speaker population, estimated at 250 to 290 million
worldwide, remains severely underrepresented because of a computational imbalance
characterized by data scarcity, limited linguistic resources, and inherent challenges such as
rich morphology, which hinders standard Natural Language Processing (NLP)
methods. This research fills this technology gap by developing, evaluating, and rigorously
validating a highly accurate, effective, and operationally robust Bangla Suicide Risk
Classification system for user-generated digital text, designed for real-world use in
low-resource healthcare environments. Using a high-quality, clinically annotated corpus,
this research shows empirically that character n-gram TF-IDF vectorization is the most
effective feature engineering method evaluated, outperforming word-level embeddings
because it handles data sparsity better. Extensive benchmarking across
thirteen diverse Machine Learning (ML) and Deep Learning (DL) models exposes a
critical Deployment Paradox: a trade-off between predictive performance and
computational cost. The best safety performance (Recall: 0.9280, 92 False Negatives) was
achieved by the Bidirectional Long Short-Term Memory (BiLSTM) model, but at the
cost of prohibitive latency (5.23 seconds), which makes it impractical for real-time triage.
In contrast, the lightweight RidgeClassifier (RC) with the same feature representation
obtained a comparable Recall of 0.9170 (106 False Negatives) with near-zero latency (0.001
seconds), making it the Optimal Deployable Triage System for large-scale real-time
intervention. This paper highlights that interpretable and computationally efficient ML
models can outperform state-of-the-art DL architectures in real-world deployment scenarios.
It also supports ethical deployment through interpretable feature weights and Dynamic
Threshold Tuning (Human-in-the-Loop), which lets operators adjust system sensitivity as
available resources change, with the aim of ensuring a sustainable, safe, and effective
suicide prevention tool for Bangla-speaking populations worldwide.
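
The threshold tuning idea mentioned in the abstract can be illustrated with a minimal
sketch. It is not the authors' implementation: the function names (recall_at,
pick_threshold, flagged_count) and the toy scores and labels are hypothetical, and the
scores stand in for any classifier's risk scores (for example, a linear model's decision
values). The sketch picks the highest decision threshold that still meets a target recall,
so that reviewers see as few flagged posts as possible for a given safety level.

```python
from typing import List


def recall_at(scores: List[float], labels: List[int], threshold: float) -> float:
    """Recall of the positive (at-risk) class when flagging scores >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


def pick_threshold(scores: List[float], labels: List[int], target_recall: float) -> float:
    """Return the highest observed score that, used as a threshold, meets target_recall.

    Recall can only grow as the threshold drops, so scanning candidates from
    high to low and stopping at the first hit gives the highest valid threshold.
    """
    for t in sorted(set(scores), reverse=True):
        if recall_at(scores, labels, t) >= target_recall:
            return t
    # Fallback (e.g. no positive labels): flag everything.
    return min(scores)


def flagged_count(scores: List[float], threshold: float) -> int:
    """Number of items sent to human review at this threshold."""
    return sum(1 for s in scores if s >= threshold)


# Hypothetical toy data: risk scores and ground-truth labels (1 = at risk).
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1]
toy_labels = [1, 0, 1, 0, 1, 0]

# A team with limited reviewers might accept lower recall; a better-staffed
# team might demand that no at-risk post is missed.
for target in (0.66, 1.0):
    t = pick_threshold(toy_scores, toy_labels, target)
    print(target, t, flagged_count(toy_scores, t))
```

In this toy setting, raising the recall target lowers the threshold and increases the
number of posts routed to human review, which is the resource trade-off that a
human-in-the-loop operator would be adjusting.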