Restoring Reality with Spatial Audio and Generative Models

dc.contributor.advisor: Seitz, Steven M.
dc.contributor.advisor: Kemelmacher-Shlizerman, Ira
dc.contributor.author: Jayaram, Vivek
dc.date.accessioned: 2025-05-12T22:46:24Z
dc.date.available: 2025-05-12T22:46:24Z
dc.date.issued: 2025-05-12
dc.date.submitted: 2025
dc.description: Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract: Every day, we encounter parts of our reality that are noisy, incomplete, or distorted—whether we're taking a phone call in a noisy cafe, viewing old black-and-white photos, or trying to better hear a specific instrument in a musical piece. The ability to restore and enhance these signals is essential for improving communication, preserving memories, and creating immersive experiences. This thesis explores new methodologies for signal restoration across both images and audio to address problems like denoising, source separation, inpainting, super-resolution, and colorization. We focus on two high-level approaches to effective signal restoration: leveraging spatial information to enhance audio experiences and employing generative models to solve challenging inverse problems. The first part of this work covers spatial audio processing, where we develop novel systems that use multiple microphones for speech enhancement, separation, and spatial audio rendering. We first present a method that uses a multi-microphone array to perform real-time source localization and separation with as many as four concurrent speakers. We then showcase custom binaural earbuds that can isolate the wearer's voice on phone calls in real time using a neural network running on a mobile phone. Finally, we use everyday recordings from those binaural earbuds to improve the rendering of sounds to the listener in a spatially consistent manner. The second part investigates the use of deep generative models as priors for signal reconstruction. By framing signal restoration tasks as probabilistic inference problems, we apply techniques such as Langevin dynamics and denoising diffusion to efficiently sample from posterior distributions. We first explore the use of score-based models and flow models for source separation of visual signals. We then extend this work to the audio domain by using autoregressive models for audio source separation and enhancement. Lastly, we present a fast method for solving noisy linear inverse problems using diffusion models. Together, these contributions demonstrate the power of both spatial audio processing and deep generative modeling for advancing signal restoration. By tackling practical challenges and pushing the boundaries of theoretical frameworks, this thesis paves the way for more robust communication technologies and immersive audio-visual experiences.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Jayaram_washington_0250E_27921.pdf
dc.identifier.uri: https://hdl.handle.net/1773/52955
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Audio Signal Processing
dc.subject: Generative Models
dc.subject: Machine Learning
dc.subject: Spatial Audio
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: Restoring Reality with Spatial Audio and Generative Models
dc.type: Thesis

Files

Original bundle

Name: Jayaram_washington_0250E_27921.pdf
Size: 71.41 MB
Format: Adobe Portable Document Format