Generative Keyframing

dc.contributor.advisor: Seitz, Steven M.
dc.contributor.advisor: Curless, Brian
dc.contributor.author: Wang, Xiaojuan
dc.date.accessioned: 2026-02-05T19:34:18Z
dc.date.available: 2026-02-05T19:34:18Z
dc.date.issued: 2026-02-05
dc.date.submitted: 2025
dc.description: Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract: Keyframing is a fundamental element of animation creation and video editing. It involves defining specific frames, i.e., keyframes, that mark important moments of change and guide how the intermediate frames are filled in or interpolated. In early hand-drawn animation, a keyframe was a drawing created by a lead animator, with assistants manually drawing the in-between frames. With the advent of digital animation and video editing software, a keyframe became a set of parameters that define the state of the rendered character or object at a specific time, with in-between transitions produced by interpolating these parameters. However, such parametric approaches rely heavily on manually designed controls and artist-crafted heuristics, making it difficult for them to capture complex, nuanced, and realistic motions. Furthermore, they do not naturally generalize to real image and video domains. The rapid progress of visual generative models, which are trained on large collections of visual data and learn rich appearance and motion patterns, has made it possible to generate high-fidelity imagery and realistic motion. Building on these advances, this thesis investigates generative keyframing, a data-driven, non-parametric, image-based approach to the keyframing process. To this end, I present a series of works that collectively develop and explore this idea. I begin with the most basic aspect: using generative models to synthesize transitions directly from images, and even to fully generate in-between motions. I first present a GAN-based technique for smoothing jump cuts in talking-head videos, synthesizing seamless transitions between the cuts even in challenging cases involving large head movement. I then introduce a method for generating in-between videos with dynamic motion between more distant keyframes by adapting a pretrained large-scale image-to-video diffusion model with minimal fine-tuning effort.
Beyond automatically generating transitions between keyframes, I further explore multi-scale keyframing for achieving very deep zoom. Specifically, I introduce a multi-scale joint sampling diffusion approach for generating consistent images (keyframes) across different spatial scales while adhering to their respective input text prompts. This enables deep semantic zoom, and a continuous zoom video can be rendered from these images. When working with multiple keyframes, one important question is how they should be ordered in the final video. I address this in the context of dance video generation---specifically, music-synchronized, choreography-aware animal dance videos---where unordered keyframes representing distinct animal poses are arranged via graph optimization to satisfy a specified choreography pattern of beats that defines the long-range structure of a dance. Finally, I conclude with discussions and directions for future work.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Wang_washington_0250E_29032.pdf
dc.identifier.uri: https://hdl.handle.net/1773/55188
dc.language.iso: en_US
dc.rights: CC BY-NC-SA
dc.subject: Generative AI
dc.subject: Keyframing
dc.subject: Vision and Graphics
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: Generative Keyframing
dc.type: Thesis

Files

Original bundle

Name: Wang_washington_0250E_29032.pdf
Size: 99.22 MB
Format: Adobe Portable Document Format