Harnessing Language Models for Automated Detection of Depression Severity and Suicide Risk

Ren, Xinyang

dc.contributor.advisor	Cohen, Trevor A
dc.contributor.author	Ren, Xinyang
dc.date.accessioned	2026-02-05T19:29:30Z
dc.date.issued	2026-02-05
dc.date.submitted	2025
dc.description	Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract	Depression is one of the most common mental disorders globally, and can carry an increased risk of adverse outcomes including suicide. Suicide is one of the leading causes of death worldwide and many more individuals attempt it or experience suicidal thoughts. Compounding these severe public health problems is a longstanding shortage of mental health professionals. There are too many patients for available professionals to monitor effectively, presenting opportunities for the use of technology to expand their capacity. Natural language processing (NLP) methods have been widely applied to psychologically related text analysis tasks to draw relationships between text and the thoughts and feelings of the person who generated it, as indicators of their mental status. Two prominent areas are the detection of depression symptom severity and suicide risk. In this work, I investigated how language models can be harnessed to automatically detect depression symptom severity and suicide risk. To obtain a numerical representation of the text for further analysis, one common method is to extract contextual embeddings from language models. Contextual embeddings are word representations that take surrounding context into account, which can better represent the complexities of linguistic expressions than models that represent the same word the same way, regardless of its context. However, there is limited research involving clinical populations that utilizes contextual embeddings from state-of-the-art language models to detect linguistic indicators of depression and suicide risk. Moreover, certain patient-generated data sources that can reveal mental status, notably text-based therapy sessions, Google search logs, and YouTube activities, remain underexplored. Relevant research has primarily concentrated on electronic health record (EHR) data and social media posts, which are subject to certain limitations.
Furthermore, despite the rapid development of large language models, their clinical application remains challenging due to high computational costs and ethical concerns. To fill these gaps, I have developed a series of methods for automatic depression symptom severity and suicide risk detection or prediction utilizing state-of-the-art language models with under-explored data sources. Specifically, I have analyzed the use of contextual embeddings of first-person singular pronouns as predictors of depression symptom severity. Positive classification results on a PHQ-9-derived binary outcome were obtained when applying the methods to the deidentified psychotherapy messages. To explore the use of individualized web searches for suicide risk assessment, I have evaluated the effectiveness of anomaly detection methods in identifying search pattern changes that precede a suicide attempt using Google search data. The proposed framework for semantic feature construction, which consists of initial filtering with a small language model followed by adjudication with a more advanced large language model to assess relevance to suicide-related constructs, provides a computationally efficient, tractable approach that can be applied to web search logs at scale. The methods were further applied to study participants’ YouTube activity data, which were combined with Google search logs in order to enhance anomaly detection performance. This work demonstrates the potential of effectively using language models for automatic prediction of depression symptom severity and detection of suicide risk using real-world datasets. It helps bridge the gap between advances in NLP and the growing need to enhance mental health service capacity, offering scalable computational tools for timely risk detection and intervention.
dc.embargo.lift	2031-01-10T19:29:30Z
dc.embargo.terms	Restrict to UW for 5 years -- then make Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Ren_washington_0250E_29081.pdf
dc.identifier.uri	https://hdl.handle.net/1773/55110
dc.language.iso	en_US
dc.rights	CC BY-NC-ND
dc.subject	Depression symptom severity
dc.subject	Large language model
dc.subject	Machine learning
dc.subject	Mental health informatics
dc.subject	Natural language processing
dc.subject	Suicide prevention
dc.subject	Artificial intelligence
dc.subject	Information science
dc.subject	Mental health
dc.subject.other	To Be Assigned
dc.title	Harnessing Language Models for Automated Detection of Depression Severity and Suicide Risk
dc.type	Thesis
