Harnessing Language Models for Automated Detection of Depression Severity and Suicide Risk

Ren, Xinyang

Date

2026-02-05

Abstract

Depression is one of the most common mental disorders globally and can carry an increased risk of adverse outcomes, including suicide. Suicide is one of the leading causes of death worldwide, and many more individuals attempt suicide or experience suicidal thoughts. Compounding these severe public health problems is a longstanding shortage of mental health professionals: there are too many patients for the available professionals to monitor effectively, which creates opportunities for technology to expand their capacity. Natural language processing (NLP) methods have been widely applied to psychology-related text analysis tasks, relating text to the thoughts and feelings of the person who wrote it as indicators of that person's mental status. Two prominent areas are the detection of depression symptom severity and the detection of suicide risk. In this work, I investigated how language models can be harnessed to automatically detect depression symptom severity and suicide risk. A common way to obtain a numerical representation of text for further analysis is to extract contextual embeddings from language models. Contextual embeddings are word representations that take the surrounding context into account, so they can represent the complexities of linguistic expression better than models that represent a given word the same way regardless of its context. However, little research involving clinical populations has used contextual embeddings from state-of-the-art language models to detect linguistic indicators of depression and suicide risk. Moreover, certain patient-generated data sources that can reveal mental status, notably text-based therapy sessions, Google search logs, and YouTube activity, remain underexplored. Relevant research has concentrated primarily on electronic health record (EHR) data and social media posts, which are subject to certain limitations.
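As a minimal sketch of how contextual embeddings can become features for the pronoun-focused analysis this work describes, the Python below mean-pools the vectors found at first-person singular pronoun positions in one message. It assumes a contextual language model has already produced one vector per word token; the pronoun set, the function name `pool_pronoun_embeddings`, and the toy 3-dimensional vectors are hypothetical illustrations, not the method or data used in this work.

```python
from typing import List, Optional, Sequence

# Hypothetical set of first-person singular pronouns (lowercased surface forms).
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}


def pool_pronoun_embeddings(
    tokens: Sequence[str], vectors: Sequence[Sequence[float]]
) -> Optional[List[float]]:
    """Mean-pool the contextual vectors at first-person singular pronoun positions.

    `tokens` and `vectors` are assumed to be aligned word-level outputs of some
    contextual language model (one vector per token); producing them is out of scope.
    Returns None when the text contains no target pronoun.
    """
    if len(tokens) != len(vectors):
        raise ValueError("tokens and vectors must be aligned")
    selected = [v for t, v in zip(tokens, vectors) if t.lower() in FIRST_PERSON_SINGULAR]
    if not selected:
        return None
    dim = len(selected[0])
    # Average each dimension across the selected pronoun vectors.
    return [sum(v[d] for v in selected) / len(selected) for d in range(dim)]


# Toy example: 3-dimensional stand-ins for real contextual embeddings.
tokens = ["I", "feel", "like", "nobody", "hears", "me"]
vectors = [
    [1.0, 0.0, 2.0],
    [0.5, 0.5, 0.5],
    [0.1, 0.2, 0.3],
    [0.0, 1.0, 0.0],
    [0.2, 0.2, 0.2],
    [3.0, 2.0, 0.0],
]
print(pool_pronoun_embeddings(tokens, vectors))  # [2.0, 1.0, 1.0]
```

Because a contextual model assigns "I" and "me" different vectors depending on the surrounding words, the pooled vector carries context that a simple pronoun count would miss; in an assumed setup it could be passed to any binary classifier for a PHQ-9-derived label.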
Furthermore, despite the rapid development of large language models, their clinical application remains challenging due to high computational costs and ethical concerns. To fill these gaps, I have developed a series of methods for automatically detecting or predicting depression symptom severity and suicide risk, using state-of-the-art language models with underexplored data sources. Specifically, I have analyzed the use of contextual embeddings of first-person singular pronouns as predictors of depression symptom severity. Positive classification results on a PHQ-9-derived binary outcome were obtained when these methods were applied to deidentified psychotherapy messages. To explore the use of individualized web searches for suicide risk assessment, I have evaluated how effectively anomaly detection methods identify changes in search patterns that precede a suicide attempt, using Google search data. The proposed framework for semantic feature construction, which first filters queries with a small language model and then uses a more advanced large language model to adjudicate their relevance to suicide-related constructs, provides a computationally efficient, tractable approach that can be applied to web search logs at scale. The methods were further applied to study participants' YouTube activity data, which were combined with Google search logs to enhance anomaly detection performance. This work demonstrates the potential of language models for the automatic prediction of depression symptom severity and the detection of suicide risk in real-world datasets. It helps bridge the gap between advances in NLP and the growing need to expand mental health service capacity, offering scalable computational tools for timely risk detection and intervention.
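The two-stage semantic filtering and the anomaly detection step can be sketched under simplifying assumptions. In the Python below, `cheap_score` and `adjudicate` are hypothetical placeholders for the small language model and the larger LLM, and the anomaly detector is a plain trailing-window z-score on weekly counts of retained queries, swapped in for illustration; it is not necessarily the anomaly detection method evaluated in this work. The window size, z threshold, standard-deviation floor, and toy data are all assumptions.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence


def two_stage_filter(
    queries: Sequence[str],
    cheap_score: Callable[[str], float],
    adjudicate: Callable[[str], bool],
    threshold: float = 0.5,
) -> List[str]:
    """Screen queries with a cheap scorer, then confirm survivors with a costlier judge.

    `cheap_score` stands in for a small language model returning a relevance score;
    `adjudicate` stands in for a larger LLM's yes/no judgment. Both are hypothetical.
    """
    candidates = [q for q in queries if cheap_score(q) >= threshold]
    # Only the (usually few) candidates incur the expensive adjudication call.
    return [q for q in candidates if adjudicate(q)]


def flag_anomalies(
    counts: Sequence[float], window: int = 4, z_cut: float = 3.0, sd_floor: float = 1.0
) -> List[int]:
    """Return indices whose count deviates from the trailing window by more than z_cut SDs.

    The SD is floored at `sd_floor` (an assumed value) so a nearly flat baseline
    does not turn tiny fluctuations into alarms.
    """
    flagged = []
    for i in range(window, len(counts)):
        baseline = counts[i - window : i]
        mu = mean(baseline)
        sd = max(pstdev(baseline), sd_floor)
        if abs(counts[i] - mu) / sd > z_cut:
            flagged.append(i)
    return flagged


# Toy stand-ins for the two models (keyword rules, purely illustrative).
queries = ["weather tomorrow", "how to cope with hopelessness", "pizza near me"]
kept = two_stage_filter(
    queries,
    cheap_score=lambda q: 1.0 if "hopeless" in q else 0.0,
    adjudicate=lambda q: True,
)
print(kept)  # ['how to cope with hopelessness']

# Toy weekly counts of retained queries; week 6 shows an abrupt increase.
print(flag_anomalies([1, 2, 1, 2, 1, 2, 9, 2]))  # [6]
```

Only queries that clear the cheap screen reach the expensive adjudicator, which is where the computational savings of such a cascade come from. A second series, such as filtered YouTube activity counts, could in principle be scored the same way, although the combination used in this work may differ.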