Deep Learning Methods for Video-Based Human Activity Recognition in Industrial Settings

dc.contributor.advisor: Banerjee, Ashis G
dc.contributor.author: Parsa, Behnoosh
dc.date.accessioned: 2021-03-19T22:56:35Z
dc.date.issued: 2021-03-19
dc.date.submitted: 2020
dc.description: Thesis (Ph.D.)--University of Washington, 2020
dc.description.abstract: With the growing interest in assistive robots and smart surveillance systems, we need a powerful perception mechanism to describe the events in a scene. However, achieving accurate perception models is not trivial, since even a single perception task admits countless possible scenarios. Developing analytically derived models for such systems seems overly optimistic; hence, supervised learning, a sub-field of function approximation, has become very popular in robotic perception. Supervised learning is the task of learning a function that maps an input to an output based on example input-output pairs. Scene understanding is even more involved when it comes to solving Human Action Recognition (HAR) problems. In HAR, the task is to classify human activities from an image or to determine the atomic actions composing an activity in a video. In video-based HAR, there are exponentially many ways that humans can perform the same task. Moreover, the variety of postures and speeds at which people perform activities makes solving HAR tasks even more challenging. Therefore, models should be designed to learn the common underlying spatial and temporal properties of human activity in order to generalize. This thesis is dedicated to designing perception models for recognizing human actions and determining the ergonomic risk associated with them. Specifically, Part I focuses on solving the Human Activity Segmentation (HAS) problem in long videos, which is the task of semantically segmenting long videos into distinct actions in an offline framework. In Part II, we present our designs for solving online HAR problems, recognizing human activities in an observed batch of frames. Since the performance of computer vision algorithms also depends on the quality and relevance of the training data, in Part I we introduce a new dataset for an indoor object manipulation task, the University of Washington Indoor Object Manipulation (UW-IOM) dataset.
dc.embargo.lift: 2022-03-19T22:56:35Z
dc.embargo.terms: Delay release for 1 year -- then make Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Parsa_washington_0250E_22404.pdf
dc.identifier.uri: http://hdl.handle.net/1773/46847
dc.language.iso: en_US
dc.rights: CC BY-NC
dc.subject: Computer Vision
dc.subject: Deep Learning
dc.subject: Graph Convolutional Networks
dc.subject: Human Activity Recognition
dc.subject: Human Postural Assessment
dc.subject: Video Semantic Segmentation
dc.subject: Mechanical engineering
dc.subject.other: Mechanical engineering
dc.title: Deep Learning Methods for Video-Based Human Activity Recognition in Industrial Settings
dc.type: Thesis

Files

Name: Parsa_washington_0250E_22404.pdf
Size: 16.87 MB
Format: Adobe Portable Document Format