An Investigation Into Supervision for Seq2Seq Techniques for Natural Language to Code Translation

dc.contributor.advisor: Steinert-Threlkeld, Shane
dc.contributor.author: Yeditha, Meheresh Sai
dc.date.accessioned: 2023-01-21T05:04:08Z
dc.date.available: 2023-01-21T05:04:08Z
dc.date.issued: 2023-01-21
dc.date.submitted: 2022
dc.description: Thesis (Master's)--University of Washington, 2022
dc.description.abstract: This thesis examines the role of supervised data in the natural language to code (NL2C) task using small-scale datasets. The primary angles of inquiry are the balance between unsupervised and supervised learning and experimentation with several training techniques. Two publicly available datasets were used: CodeSearchNet, for English documentation to Python code, and the Mostly Basic Python Problems (MBPP) dataset. Experiments were run with the mBART seq2seq framework. The best-performing models were pretrained on the full CodeSearchNet dataset and finetuned on the MBPP dataset. Several avenues for future inquiry and effective experimentation were discovered and solidified, including lample masking, the creation of more datasets fitting the NL2C paradigm, and the size and division of datasets. Finetuning is significantly more important than pretraining, although both are crucial when using the seq2seq framework. Overall, this thesis solidifies the utility of seq2seq frameworks for the NL2C task and the promise of transfer learning and further inquiry for this task going forward.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Yeditha_washington_0250O_25027.pdf
dc.identifier.uri: http://hdl.handle.net/1773/49699
dc.language.iso: en_US
dc.rights: CC BY-SA
dc.subject: code
dc.subject: deep learning
dc.subject: language model
dc.subject: machine learning
dc.subject: nl2c
dc.subject: seq2seq
dc.subject: Computer science
dc.subject: Linguistics
dc.subject.other: Linguistics
dc.title: An Investigation Into Supervision for Seq2Seq Techniques for Natural Language to Code Translation
dc.type: Thesis

Files

Original bundle

Name: Yeditha_washington_0250O_25027.pdf
Size: 552.08 KB
Format: Adobe Portable Document Format
