Automated Vulnerability Prediction in Software Systems and Lightweight Identification of Design Patterns in Source Code

dc.contributor.advisor: Asuncion, Hazeline
dc.contributor.author: Poozhithara, Jeffy Jahfar
dc.date.accessioned: 2021-08-26T18:02:49Z
dc.date.available: 2021-08-26T18:02:49Z
dc.date.issued: 2021-08-26
dc.date.submitted: 2021
dc.description: Thesis (Master's)--University of Washington, 2021
dc.description.abstract: Software development companies invest heavily in fixing security vulnerabilities in their products after code development. This calls for an automated mechanism to identify security vulnerabilities during and after software development. One approach is to incorporate possible solutions, such as security design patterns, during design. This reduces the system-wide architectural changes required and enables efficient documentation and maintenance of software systems. Further, identifying which design patterns already exist in source code can help maintenance engineers determine whether new requirements can be satisfied. Current techniques for design pattern identification require either manually labeling training datasets or manually specifying rules or queries for each pattern. As part of this research, we took a two-pronged approach: (1) Pre-implementation: predict vulnerabilities before any source code is written, to increase awareness of possible risks while developing the system. (2) Post-implementation: check the source code to identify any missing security patterns, based on the identified vulnerabilities. For the first approach, we created a Keyword Extraction-based Vulnerability Identification System (KEVIS) that uses natural language processing techniques to extract keywords and n-grams from software documentation to predict security vulnerabilities in software systems. We analyzed the correlation of certain keywords and n-grams with the occurrence of various security vulnerabilities, as well as the correlation between different vulnerabilities. Additionally, we analyzed the performance of classification algorithms (Logistic Regression, Support Vector Machines, K-Nearest Neighbors, Multi-layer Perceptron, and Random Forest) in the prediction. To enable the analysis, we also created a dataset by mapping over 200,000 vulnerability reports on the CVE website to the technical/functional documentation of 3602 products.
The preliminary analysis shows that the performance of KEVIS is comparable to or better than that of prediction using source code and of other static analysis methods. For the second approach, we introduced PatternScout, a technique for automatically generating SPARQL queries by parsing UML diagrams of design patterns, ensuring that pattern characteristics are matched. We discuss the key concepts and design of PatternScout. Our results indicate that PatternScout can automatically generate queries for the three types of design patterns (i.e., creational, behavioral, structural), with accuracy comparable to or better than that of existing techniques. Because the two approaches rely on different concepts, and for ease of explanation, the background, literature review, method, results, and discussion for each approach are presented separately in their own sections (Approach 1 - Automated Vulnerability Prediction in Software Systems, and Approach 2 - Lightweight Identification of Design Patterns in Source Code, respectively).
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Poozhithara_washington_0250O_23223.pdf
dc.identifier.uri: http://hdl.handle.net/1773/47191
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Cybersecurity
dc.subject: Keyword Extraction
dc.subject: Machine Learning
dc.subject: RDF
dc.subject: Semantic Web
dc.subject: SPARQL
dc.subject: Computer science
dc.subject.other: Computing and software systems
dc.title: Automated Vulnerability Prediction in Software Systems and Lightweight Identification of Design Patterns in Source Code
dc.type: Thesis

Files

Original bundle

Name: Poozhithara_washington_0250O_23223.pdf
Size: 2.33 MB
Format: Adobe Portable Document Format