Deep Learning Methods for Video-Based Human Activity Recognition in Industrial Settings
| dc.contributor.advisor | Banerjee, Ashis G | |
| dc.contributor.author | Parsa, Behnoosh | |
| dc.date.accessioned | 2021-03-19T22:56:35Z | |
| dc.date.issued | 2021-03-19 | |
| dc.date.submitted | 2020 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2020 | |
| dc.description.abstract | With the rising interest in assistive robots and smart surveillance systems, we need powerful perception mechanisms capable of describing the events in a scene. However, building accurate perception models is not trivial, since even a single perception task admits an unlimited number of possible scenarios. Developing analytically derived models for such systems seems too optimistic; hence, supervised learning, a sub-field of function approximation, has become very popular in robotic perception. Supervised learning is the task of learning a function that maps an input to an output based on example input-output pairs. Scene understanding is even more involved when it comes to solving Human Action Recognition (HAR) problems. In HAR, the task is to classify human activities from an image or to determine the atomic actions composing an activity in a video. In video-based HAR, there are exponentially many ways in which humans can perform the same task. Moreover, the variety in posture and the speed at which people perform activities makes solving HAR tasks even more challenging. Therefore, models should be designed to learn the common underlying spatial and temporal properties of human activity in order to generalize. This thesis is dedicated to designing perception models for recognizing human actions and determining the ergonomic risk associated with them. Specifically, Part I focuses on solving the Human Activity Segmentation (HAS) problem in long videos, that is, the task of semantically segmenting long videos into distinct actions in an offline framework. In Part II, we present our designs for solving online HAR problems, which recognize human activities in an observed batch of frames. Since the performance of computer vision algorithms also depends on the quality and relevance of the training data, in Part I we introduce a new dataset for an indoor object manipulation task, called the University of Washington Indoor Object Manipulation (UW-IOM) dataset. | |
| dc.embargo.lift | 2022-03-19T22:56:35Z | |
| dc.embargo.terms | Delay release for 1 year -- then make Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Parsa_washington_0250E_22404.pdf | |
| dc.identifier.uri | http://hdl.handle.net/1773/46847 | |
| dc.language.iso | en_US | |
| dc.rights | CC BY-NC | |
| dc.subject | Computer Vision | |
| dc.subject | Deep Learning | |
| dc.subject | Graph Convolutional Networks | |
| dc.subject | Human Activity Recognition | |
| dc.subject | Human Postural Assessment | |
| dc.subject | Video Semantic Segmentation | |
| dc.subject | Mechanical engineering | |
| dc.subject.other | Mechanical engineering | |
| dc.title | Deep Learning Methods for Video-Based Human Activity Recognition in Industrial Settings | |
| dc.type | Thesis |
Files
Original bundle (1 file)
- Parsa_washington_0250E_22404.pdf (16.87 MB, Adobe Portable Document Format)