On the Statistical Significance Testing for Natural Language Processing
| dc.contributor.advisor | Xia, Fei | |
| dc.contributor.author | Zhu, Haotian | |
| dc.date.accessioned | 2020-04-30T17:43:47Z | |
| dc.date.issued | 2020-04-30 | |
| dc.date.submitted | 2020 | |
| dc.description | Thesis (Master's)--University of Washington, 2020 | |
| dc.description.abstract | This thesis explores and compares, in several respects, the statistical significance tests frequently used to compare the performance of Natural Language Processing (NLP) systems. We begin by establishing the fundamentals of NLP system performance comparison and formulating it as four major tasks specific to NLP. Each statistical significance test is explained in detail, with its assumptions made explicit and its testing procedure outlined. We stress the importance of verifying test assumptions before conducting a test. In addition, we examine effect size and statistical power and discuss their importance for statistical significance testing in NLP. To account for potential dependencies within a test set, we introduce the block bootstrap and employ it to calibrate statistical significance testing when comparing the average performance of two systems. Four case studies, using both simulated and real data with varying complexity of data dependency, illustrate the process of properly using a statistical significance test to compare NLP system performance under different settings. We then discuss the topic from different perspectives and raise some open issues, such as cross-domain comparison and violation of the i.i.d. assumption, that call for further study. In conclusion, this thesis advocates the proper use of statistical significance testing in comparing NLP system performance and the reporting of comparison results with greater transparency and completeness. | |
| dc.embargo.lift | 2021-04-30T17:43:47Z | |
| dc.embargo.terms | Restrict to UW for 1 year -- then make Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Zhu_washington_0250O_21188.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/45512 | |
| dc.language.iso | en_US | |
| dc.rights | none | |
| dc.subject | Effect size | |
| dc.subject | Power analysis | |
| dc.subject | Significance testing | |
| dc.subject | Linguistics | |
| dc.subject | Statistics | |
| dc.subject.other | Linguistics | |
| dc.title | On the Statistical Significance Testing for Natural Language Processing | |
| dc.type | Thesis |
Files
Original bundle
- Name: Zhu_washington_0250O_21188.pdf
- Size: 730.31 KB
- Format: Adobe Portable Document Format
