mSCAN - a Multilingual Dataset for Compositional Generalization Evaluation
Abstract
Language models achieve remarkable results on a variety of tasks, yet still struggle on compositional generalization benchmarks. Most of these benchmarks evaluate performance in English only, leaving open the question of whether the results generalize to other languages. As an initial step toward answering this question, we introduce mSCAN, a multilingual adaptation of the SCAN dataset covering Mandarin Chinese, French, Hindi, and Russian. It was produced by a rule-based translation procedure developed in cooperation with native speakers. We then showcase the dataset in in-context learning experiments with multiple open-source multilingual models.
Description
Thesis (Master's)--University of Washington, 2025
