A Versatility Analysis: Investigating Large Language Models' Performance Beyond Conventional Benchmarks

dc.contributor.advisor: Xia, Fei
dc.contributor.author: LI, CHENXI
dc.date.accessioned: 2024-09-09T23:12:02Z
dc.date.available: 2024-09-09T23:12:02Z
dc.date.issued: 2024-09-09
dc.date.submitted: 2024
dc.description: Thesis (Master's)--University of Washington, 2024
dc.description.abstract: Recent progress in large language models (LLMs) has marked a notable milestone in the field of artificial intelligence. The conventional evaluation of LLMs relies primarily on existing tasks and benchmarks, raising concerns about test set contamination and the genuine comprehension abilities of LLMs. This study introduces a novel approach by developing unique datasets to circumvent potential data contamination issues and scrutinizes LLMs' adaptability to new tasks, their sensitivity to prompt variations, and their error tendencies. We investigate the capacity of LLMs to adapt to new but simple tasks, especially when those tasks diverge from the models' pre-existing knowledge. Our methodology emphasizes the creation of straightforward tasks, facilitating a precise error analysis that uncovers the underlying causes of LLM failures. This approach also aims to identify effective strategies for enhancing LLM performance, informed by the detailed error analysis of system output.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: LI_washington_0250O_26616.pdf
dc.identifier.uri: https://hdl.handle.net/1773/52075
dc.language.iso: en_US
dc.rights: none
dc.subject: Evaluation
dc.subject: Large Language Model
dc.subject: Robustness
dc.subject: Linguistics
dc.subject: Computer science
dc.subject.other: Linguistics
dc.title: A Versatility Analysis: Investigating Large Language Models' Performance Beyond Conventional Benchmarks
dc.type: Thesis

Files

Original bundle

Name: LI_washington_0250O_26616.pdf
Size: 382 KB
Format: Adobe Portable Document Format