Deep Learning Solutions for High Expertise Domains
| dc.contributor.advisor | Howe, Bill | |
| dc.contributor.advisor | Shapiro, Linda | |
| dc.contributor.author | Yang, Sean T | |
| dc.date.accessioned | 2022-09-23T20:45:41Z | |
| dc.date.available | 2022-09-23T20:45:41Z | |
| dc.date.issued | 2022-09-23 | |
| dc.date.submitted | 2022 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2022 | |
| dc.description.abstract | Deep learning has had significant success in addressing big data's knowledge organization and effective communication problems. However, the technology is difficult to apply to high expertise domains due to limited accessibility to structured data. While data labeling in most deep learning problems only needs common sense, data curation in high expertise domains requires extensive knowledge and experience in these specialized domains. Thus, acquiring large-scale labeled data for high expertise domains is expensive and sometimes difficult. The scientific community is one example of a high expertise application where it is more difficult to apply deep learning due to a lack of structured data. We offer solutions to communication challenges caused by an overwhelming number of publications in the scientific community. We demonstrate that scientific figures are a significant channel of communication and that they can serve as a tracker of the popularity and propagation of ideas and methods. We next propose networks that automatically identify Central Figures, which are selected from the existing publications and summarize the main contributions of research papers. Central figures can be deployed on online search engines to facilitate the literature review process. We also provide evidence supporting the idea that citation behaviors in individual research documents predict acceptance decisions, even more so than existing natural language processing models. This bibliography analysis provides additional submission reviewing strategies for publishers or conference coordinators. We extend our studies to broader high-expertise domains based on observations from the exploration of the scientific community. First, we find that application-agnostic ontologies are often invested in these high-expertise domains. These ontologies can be utilized in Hierarchical Multi-label Classification for knowledge organization. We propose a novel framework to address the multi-label classification problem, and we demonstrate that the proposed model outperforms existing methods by a significant margin. We introduce Global Hierarchical Violation to measure whether the predictions follow the hierarchy constraints. We show that the current benchmarks in hierarchical multi-label classification do not properly represent the problem space, and we further introduce a declarative query system to produce customizable datasets along with four benchmarks which better describe the problem. Second, we discover that images in high-expertise domains are often equipped with short text descriptions. We present JECL, which leverages this noisy text description as a source of weak supervision. It simultaneously learns clustering and joint representations for image-text pairs. We show that JECL outperforms existing multi-view methods on four benchmarks. The learned representations from JECL can be deployed on GraviTIE, an interactive data visualization platform that affords scalability, query, and reproducibility. It allows users to explore large heterogeneous image collections efficiently. This dissertation offers deep learning solutions to challenges arising from low accessibility to structured data in high-expertise domains. The presented analyses within the scientific community provide strategies for researchers to communicate complex ideas efficiently. The proposed methods allow experts to organize knowledge with ontologies and to explore large-scale heterogeneous image collections with more feasibility. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Yang_washington_0250E_24780.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/49347 | |
| dc.language.iso | en_US | |
| dc.rights | CC BY | |
| dc.subject | Deep Learning | |
| dc.subject | Distant Curation | |
| dc.subject | High Expertise | |
| dc.subject | Ontologies | |
| dc.subject | Artificial intelligence | |
| dc.subject.other | Electrical and computer engineering | |
| dc.title | Deep Learning Solutions for High Expertise Domains | |
| dc.type | Thesis |
Files
Original bundle
- Name: Yang_washington_0250E_24780.pdf
- Size: 18.09 MB
- Format: Adobe Portable Document Format
