Photorealistic Virtual Try-on with Generative Models
| dc.contributor.advisor | Kemelmacher-Shlizerman, Ira | |
| dc.contributor.author | Zhu, Luyang | |
| dc.date.accessioned | 2024-09-09T23:06:37Z | |
| dc.date.available | 2024-09-09T23:06:37Z | |
| dc.date.issued | 2024-09-09 | |
| dc.date.submitted | 2024 | |
| dc.description | Thesis (Ph.D.)--University of Washington, 2024 | |
| dc.description.abstract | Virtual try-on (VTO) is revolutionizing the online apparel shopping experience, enabling customers to see how a particular fashion item would look on them. Despite significant progress, current VTO methods still encounter challenges such as accurately warping garments under large pose gaps and heavy occlusion, as well as preserving the body shape and identity of the person under the new garment. Additionally, most research focuses on upper-body VTO, whereas a full-body VTO that allows for garment mix-and-match is more desirable in real-world scenarios. In my thesis, I address the above challenges by developing generative models tailored for the VTO task. First, I propose TryOnDiffusion, the first method capable of try-on synthesis at 1024x1024 resolution for various body poses and shapes while preserving garment details. Previous methods either focus on garment detail preservation without effective pose and shape variation, or allow try-on with the desired shape and pose but lack garment details. In this project, I show that the underlying reason for this challenge is a widely used two-stage pipeline consisting of an explicit warping model and a blending GAN model. To solve this issue, I propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which warps the garment implicitly via cross attention and performs warping and blending in a single network pass. Next, I present M&M VTO, which extends TryOnDiffusion from upper-body VTO to full-body VTO, allowing users to mix and match multiple garments. To preserve the intricate garment details required by full-body VTO, I propose a single-stage diffusion model in pixel space that is trained progressively. To solve a common identity-loss problem in current VTO methods, I design a novel architecture named VTO UNet Diffusion Transformer (VTO-UDiT) that disentangles denoising from person-specific features, allowing for a highly effective finetuning strategy.
Furthermore, M&M VTO also supports garment layout editing via text inputs, enabled by finetuning multi-modal foundation models. Finally, I show how we can train generative models on synthetic datasets for 3D clothed human reconstruction, an important component towards VTO in the 3D world. I propose a method for reconstructing NBA players that takes as input a single photo of a clothed player in any basketball pose and outputs a high-resolution mesh and 3D pose for that player. Key to my approach is a deep neural skinning approach for creating poseable, skinned models of NBA players, and a large database of meshes derived from the video game. Although trained only on synthetic data, the proposed pipeline generalizes well to real-world images even under heavy occlusion. | |
| dc.embargo.terms | Open Access | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.other | Zhu_washington_0250E_26781.pdf | |
| dc.identifier.uri | https://hdl.handle.net/1773/51884 | |
| dc.language.iso | en_US | |
| dc.rights | CC BY-NC | |
| dc.subject | computer graphics | |
| dc.subject | computer vision | |
| dc.subject | Deep learning | |
| dc.subject | Diffusion models | |
| dc.subject | Generative Models | |
| dc.subject | Virtual Try-on | |
| dc.subject | Computer science | |
| dc.subject.other | Computer science and engineering | |
| dc.title | Photorealistic Virtual Try-on with Generative Models | |
| dc.type | Thesis |
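The abstract's key architectural idea, garment warping done implicitly by cross attention rather than by an explicit flow field, can be illustrated with a minimal sketch. This is not the thesis's Parallel-UNet code: the function name `implicit_warp`, the token shapes, and the use of plain single-head attention are all simplifying assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def implicit_warp(person_feats, garment_feats):
    """Cross attention between person and garment feature tokens.

    person_feats:  (Np, d) -- queries from the person branch
    garment_feats: (Ng, d) -- keys/values from the garment branch

    Each person token attends over all garment tokens, so garment
    detail is transferred ("warped") onto the person implicitly,
    with no explicit warp field predicted.
    """
    d = person_feats.shape[-1]
    scores = person_feats @ garment_feats.T / np.sqrt(d)  # (Np, Ng)
    attn = softmax(scores, axis=-1)                       # rows sum to 1
    return attn @ garment_feats                           # (Np, d)

# Toy usage with random features standing in for UNet activations.
rng = np.random.default_rng(0)
person = rng.normal(size=(16, 8))    # 16 person tokens, dim 8
garment = rng.normal(size=(32, 8))   # 32 garment tokens, dim 8
warped = implicit_warp(person, garment)
```

In the real architecture the attended features would feed back into the denoising UNet, so warping and blending happen in one network pass, as the abstract describes.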
Files
Original bundle (1 of 1)
- Name: Zhu_washington_0250E_26781.pdf
- Size: 91.69 MB
- Format: Adobe Portable Document Format