Towards Robust and Effective Human Pose Estimation and Generation
| dc.contributor.advisor | Hwang, Jenq-Neng | |
| dc.contributor.author | Jiang, Zhongyu | |
| dc.date.accessioned | 2025-05-12T22:47:47Z | |
| dc.date.available | 2025-05-12T22:47:47Z | |
| dc.date.issued | 2025-05-12 | |
| dc.date.submitted | 2025 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2025 | |
| dc.description.abstract | Human pose estimation (HPE) in both 2D and 3D remains a fundamental yet challenging problem in computer vision, with broad applications in action recognition, human-computer interaction, motion analysis, and object tracking. Despite recent advances, achieving robustness and efficiency in real-world and edge-device scenarios remains difficult. This dissertation presents a series of contributions toward making HPE more effective and robust. Specifically, we propose (1) a temporal-based 2D HPE method for golf swing analysis, (2) an optimization-driven pipeline for 3D HPE, and (3) a unified contrastive learning-based framework for 2D-3D pose representation. Furthermore, building upon HPE, we explore its potential in human motion generation. In particular, we introduce PackDiT, a novel diffusion-based framework for joint motion and text generation via mutual prompting. PackDiT effectively integrates text and motion generation by leveraging a unique training strategy with two DiT models (Text-DiT and Motion-DiT) with shared latent spaces, enabling text-to-motion, motion-to-text, and joint motion-text synthesis. Evaluated on the HumanML3D dataset, PackDiT outperforms state-of-the-art generative models across multiple tasks, demonstrating its capability as a unified framework for motion understanding and generation. The dissertation discusses challenges, limitations, and potential directions for advancing HPE and human motion generation in future research. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Jiang_washington_0250E_27962.pdf | |
| dc.identifier.uri | https://hdl.handle.net/1773/52980 | |
| dc.language.iso | en_US | |
| dc.rights | CC BY-NC | |
| dc.subject | Artificial Intelligence | |
| dc.subject | Human Pose Estimation | |
| dc.subject | Machine Learning | |
| dc.subject | Motion Generation | |
| dc.subject | Computer science | |
| dc.subject.other | Electrical and computer engineering | |
| dc.title | Towards Robust and Effective Human Pose Estimation and Generation | |
| dc.type | Thesis | |
Files
Original bundle
- Name: Jiang_washington_0250E_27962.pdf
- Size: 28.49 MB
- Format: Adobe Portable Document Format
