Bilingual alignment transfers to multilingual alignment for unsupervised parallel text mining

dc.contributor.advisor: Steinert-Threlkeld, Shane
dc.contributor.author: Tien, Chih-chan
dc.date.accessioned: 2021-03-19T22:56:06Z
dc.date.issued: 2021-03-19
dc.date.submitted: 2020
dc.description: Thesis (Master's)--University of Washington, 2020
dc.description.abstract: This work presents methods for learning cross-lingual sentence representations from paired or unpaired bilingual texts. We hypothesize that the cross-lingual alignment strategy is transferable, so that a model trained to align only two languages can encode representations that are also more aligned across other languages. We call this novel method, which transfers bilingual alignment between two pivot languages to multilingual alignment among other languages, dual-pivot transfer. To study the applicability of the transfer, we train two models, both based on the unsupervised language model XLM-R: an unsupervised model trained on unpaired sentences and a single-pair supervised model trained on bitexts. The experiments evaluate both models as universal sentence encoders on the task of unsupervised bitext mining on two datasets. The unsupervised model reaches the state of the art in unsupervised retrieval, and the single-pair supervised model approaches the performance of multilingually supervised models. The results suggest that the proposed bilingual training techniques can be applied to obtain sentence representations with higher multilingual alignment.
dc.embargo.lift: 2023-03-09T22:56:06Z
dc.embargo.terms: Restrict to UW for 2 years -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Tien_washington_0250O_22440.pdf
dc.identifier.uri: http://hdl.handle.net/1773/46826
dc.language.iso: en_US
dc.rights: none
dc.subject: Linguistics
dc.subject: Computer science
dc.subject.other: Linguistics
dc.title: Bilingual alignment transfers to multilingual alignment for unsupervised parallel text mining
dc.type: Thesis

Files

Original bundle

Name: Tien_washington_0250O_22440.pdf
Size: 373.35 KB
Format: Adobe Portable Document Format