A Reliable and Efficient Approach to Suicidal Ideation Detection in a Low Resource Language

Jahangir, Hussen

File

CSE- 250273.pdf (1.932Mb)

Date

2025-01-12



Abstract

Suicide is a pervasive and devastating global public health issue, which calls for scalable, proactive early-detection methods that extend beyond conventional clinical frameworks. Despite remarkable computational progress in high-resource languages such as English, the Bangla (Bengali) speaking population, estimated at 250 to 290 million people worldwide, remains severely underrepresented. This computational imbalance stems from data scarcity, limited linguistic resources, and inherent challenges such as Bangla's rich morphology, which hinders standard Natural Language Processing (NLP) methods. This research addresses the technology gap by developing, evaluating, and rigorously validating a highly accurate, effective, and operationally robust Bangla Suicide Risk Classification system for user-generated digital text, with real-world applicability in low-resource healthcare settings. Experiments on a high-quality, clinically annotated corpus show that character n-gram TF-IDF vectorization is the optimal feature engineering method, outperforming word-level embeddings because it copes better with data sparsity. Extensive benchmarking of thirteen diverse Machine Learning (ML) and Deep Learning (DL) models reveals a critical Deployment Paradox: a trade-off between predictive performance and computational cost. The Bidirectional Long Short-Term Memory (BiLSTM) model achieved the best safety performance (Recall: 0.9280, 92 False Negatives), but at the cost of prohibitive latency (5.23 seconds), which makes it impractical for real-time triage. In contrast, the lightweight RidgeClassifier (RC), using the same feature representation, achieved a comparable Recall of 0.9170 (106 False Negatives) with near-zero latency (0.001 seconds), making it the Optimal Deployable Triage System for large-scale, real-time intervention.
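The feature representation and triage model described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the thesis's code. The helper names `build_triage_pipeline` and `safety_report`, the `char_wb` analyzer, the 2 to 5 character n-gram range, `sublinear_tf=True` and the default `alpha` are all hypothetical choices. The sketch also shows how the two safety figures relate, since Recall = TP / (TP + FN) for the at-risk class.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.pipeline import Pipeline


def build_triage_pipeline(ngram_range=(2, 5), alpha=1.0):
    """Character n-gram TF-IDF features feeding a linear RidgeClassifier.

    Hypothetical defaults: ngram_range and alpha are illustrative,
    not the hyperparameters reported in the thesis.
    """
    return Pipeline([
        # "char_wb" builds character n-grams inside word boundaries, so
        # inflected forms of the same Bangla stem share sub-word features.
        ("tfidf", TfidfVectorizer(analyzer="char_wb",
                                  ngram_range=ngram_range,
                                  sublinear_tf=True)),
        ("clf", RidgeClassifier(alpha=alpha)),
    ])


def safety_report(y_true, y_pred):
    """Return (recall, false_negatives) for the at-risk class (label 1).

    Recall = TP / (TP + FN), so every missed at-risk post lowers it.
    """
    recall = recall_score(y_true, y_pred, pos_label=1)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return recall, int(fn)


if __name__ == "__main__":
    # Hypothetical toy posts; 1 = suicidal ideation, 0 = non-suicidal.
    texts = ["আমি আর বাঁচতে চাই না", "আজ আমি খুব খুশি",
             "সব কিছু অর্থহীন লাগে", "বন্ধুদের সাথে খেলা দেখলাম"]
    labels = [1, 0, 1, 0]
    pipe = build_triage_pipeline()
    pipe.fit(texts, labels)
    print(safety_report(labels, pipe.predict(texts)))
```

Because RidgeClassifier is linear, each character n-gram receives a single weight (`pipe.named_steps["clf"].coef_`), so the learned weights can be inspected directly. As a rough consistency check, FN / (1 − Recall) gives 92 / 0.072 ≈ 1,278 and 106 / 0.083 ≈ 1,277. This suggests both reported figures describe the same set of roughly 1,277 at-risk evaluation samples. That count is inferred from rounded values and assumes both Recall figures refer to the at-risk class. The abstract does not state it.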
This paper highlights that interpretable, computationally efficient ML models can outperform state-of-the-art DL architectures in real-world deployment scenarios. It also promotes ethical deployment through interpretable feature weights and Dynamic Threshold Tuning (Human-in-the-Loop), which adjusts system sensitivity as available resources change. The aim is to provide a sustainable, safe, and effective suicide prevention tool for Bangla-speaking populations worldwide.
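The abstract does not detail how Dynamic Threshold Tuning works. The following is therefore a hypothetical sketch of one way it could be done with a linear model. Scores from a fitted scikit-learn pipeline's `decision_function` are compared against an adjustable cut-off instead of RidgeClassifier's default of 0. The helper names (`flag_for_review`, `threshold_for_recall`, `threshold_for_capacity`) are assumptions for illustration. The same applies to both tuning rules: one targets a minimum validation Recall, and the other matches the number of posts human reviewers can currently handle.

```python
import numpy as np


def flag_for_review(scores, threshold):
    """Return 1 for every post whose decision score is >= threshold."""
    return (np.asarray(scores, dtype=float) >= threshold).astype(int)


def threshold_for_recall(val_scores, val_labels, target_recall):
    """Largest threshold whose validation Recall is at least target_recall.

    Hypothetical rule: lowering the threshold trades more false positives
    (extra human review) for fewer false negatives (missed at-risk posts).
    """
    scores = np.asarray(val_scores, dtype=float)
    labels = np.asarray(val_labels)
    pos = np.sort(scores[labels == 1])[::-1]  # at-risk scores, descending
    if pos.size == 0:
        raise ValueError("validation data contains no at-risk examples")
    # Catching k positives requires threshold <= the k-th highest positive score.
    k = int(np.ceil(target_recall * pos.size - 1e-9))
    k = min(max(k, 1), pos.size)
    return float(pos[k - 1])


def threshold_for_capacity(scores, review_capacity):
    """Threshold that flags the top `review_capacity` posts (more if scores tie)."""
    ranked = np.sort(np.asarray(scores, dtype=float))[::-1]
    if review_capacity <= 0 or ranked.size == 0:
        return float("inf")  # no review capacity: flag nothing
    k = min(review_capacity, ranked.size)
    return float(ranked[k - 1])


if __name__ == "__main__":
    # Scores would normally come from pipe.decision_function(texts).
    val_scores = [2.0, 1.5, 0.4, -0.2, 0.9, -1.0]
    val_labels = [1, 1, 1, 0, 0, 1]
    t = threshold_for_recall(val_scores, val_labels, target_recall=0.75)
    print(t, flag_for_review(val_scores, t))
```

In this sketch, raising either the target Recall or the review capacity lowers the threshold. Fewer at-risk posts are missed, but more posts are sent to human reviewers.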

URI

http://suspace.su.edu.bd/handle/123456789/2618
