A Versatility Analysis: Investigating Large Language Models' Performance Beyond Conventional Benchmarks

dc.contributor.advisor: Xia, Fei
dc.contributor.author: LI, CHENXI
dc.date.accessioned: 2024-09-09T23:12:02Z
dc.date.available: 2024-09-09T23:12:02Z
dc.date.issued: 2024-09-09
dc.date.submitted: 2024
dc.description: Thesis (Master's)--University of Washington, 2024
dc.description.abstract: Recent progress in large language models (LLMs) has marked a notable milestone in the field of artificial intelligence. The conventional evaluation of LLMs relies primarily on existing tasks and benchmarks, raising concerns about test set contamination and the genuine comprehension abilities of LLMs. This study introduces a novel approach by developing unique datasets to circumvent potential data contamination issues and scrutinizes LLMs' adaptability to new tasks, their sensitivity to prompt variations, and their error tendencies. We investigate the capacity of LLMs to adapt to new but simple tasks, especially when those tasks diverge from the models' pre-existing knowledge. Our methodology emphasizes the creation of straightforward tasks, facilitating a precise error analysis that uncovers the underlying causes of LLM failures. This approach also aims to identify effective strategies for enhancing LLM performance, informed by the detailed error analysis of system output.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: LI_washington_0250O_26616.pdf
dc.identifier.uri: https://hdl.handle.net/1773/52075
dc.language.iso: en_US
dc.rights: none
dc.subject: Evaluation
dc.subject: Large Language Model
dc.subject: Robustness
dc.subject: Linguistics
dc.subject: Computer science
dc.subject.other: Linguistics
dc.title: A Versatility Analysis: Investigating Large Language Models' Performance Beyond Conventional Benchmarks
dc.type: Thesis

Files

Original bundle

Name: LI_washington_0250O_26616.pdf
Size: 382 KB
Format: Adobe Portable Document Format