A Linguist-Friendly Machine Translation System for Low-Resource Languages
Lockwood, Ronald Milton
Low-resource languages have largely been left out of the machine translation revolution. Their speakers would benefit from machine translation for many different tasks if it were available. Because too little text data exists for these languages, statistical machine translation produces subpar results. The best choice for them is probably a transfer-based approach, in which rules define how to translate from one language to another. Unfortunately, the transfer-based systems available today are not easy to use for anyone outside the computational linguistics field. This thesis presents a transfer-based system that is easy for ordinary linguists to use. It is linguist-friendly because a central component is the intuitive application FieldWorks Language Explorer. This application serves as the repository for lexicons, the place where entries are linked, and the tool for the analysis piece of the analysis-transfer-synthesis-style system. Apertium, the well-established open-source machine translation platform, is used for the transfer piece of the system, and STAMP is used for the synthesis piece. All of these programs are well-documented. The linguist’s role is to link lexicon entries and to write transfer rules that perform either word-level or syntactic-level translation. Although this machine translation system is a proof of concept, I show that it translates texts successfully in a test case using Persian and Gilaki. Such a system can be used by ordinary linguists all over the world for almost any language pair where machine translation is needed.
- Linguistics