Enhancing Transformer Models for Dialogue Summarization

Lu, Bo Ru

Files

Lu_washington_0250E_27440.pdf (1.76 MB)

Date

2024-10-16

Author

Lu, Bo Ru

Abstract

Understanding and generating summaries of conversational speech and text are crucial for various applications such as virtual assistants, customer service calls, sales calls, and doctor-patient consultations. Transcribed human-human dialogues often lack explicit structure, making them time-consuming to read and challenging to skim for essential information. This thesis explores the use of natural language processing (NLP) techniques to automatically summarize two-party conversations, improving the efficiency of understanding large volumes of dialogue data. We address the limitations of existing summarization methods by proposing innovative approaches aimed at improving performance and reducing the cost of transformer models. We introduce two primary types of summarization tasks: corpus-level and conversation-level. Corpus-level summarization aims to provide an overview of speaker behaviors across multiple dialogues by using a graph-based approach, which simplifies the understanding and summarizing of large dialogue corpora. This method highlights common and distinct subdialogues, and can potentially aid in the refinement of agent training and the enhancement of call center analytics. On the other hand, conversation-level summarization focuses on individual dialogues, producing structured summaries that capture essential details for follow-up interactions. These summaries are particularly useful in domains such as customer service and healthcare, where accurate and efficient information extraction is critical. To address the challenges of summarizing large and complex dialogues, we propose new modeling algorithms and a data collection framework. We enhance transformer models by incorporating additional structured information derived from dialogue graphs and expert-designed schemas, improving both performance and efficiency.
Furthermore, we demonstrate that smaller encoder-decoder models can outperform larger decoder-only models in specialized domains, offering faster inference speeds and comparable performance. Our human-language model collaborative data synthesis framework also increases annotation efficiency and reduces costs, particularly for proprietary data. Through these contributions, this thesis advances the field of conversational AI, providing more effective and efficient methods for summarizing dialogues.