Restoring Reality with Spatial Audio and Generative Models

dc.contributor.advisor: Seitz, Steven M.
dc.contributor.advisor: Kemelmacher-Shlizerman, Ira
dc.contributor.author: Jayaram, Vivek
dc.date.accessioned: 2025-05-12T22:46:24Z
dc.date.available: 2025-05-12T22:46:24Z
dc.date.issued: 2025-05-12
dc.date.submitted: 2025
dc.description: Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract: Every day, we encounter parts of our reality that are noisy, incomplete, or distorted—whether we're taking a phone call in a noisy cafe, viewing old black-and-white photos, or trying to better hear a specific instrument in a musical piece. The ability to restore and enhance these signals is essential for improving communication, preserving memories, and creating immersive experiences. This thesis explores new methodologies for signal restoration across both images and audio to address problems like denoising, source separation, inpainting, super-resolution, and colorization. We focus on two high-level approaches to effective signal restoration: leveraging spatial information to enhance audio experiences and employing generative models to solve challenging inverse problems. The first part of this work covers spatial audio processing, where we develop novel systems that use multiple microphones for speech enhancement, separation, and spatial audio rendering. We first present a method that uses a multi-microphone array to perform real-time source localization and separation with as many as four concurrent speakers. We then showcase custom binaural earbuds that can isolate the wearer's voice on phone calls in real time using a neural network running on a mobile phone. Finally, we use everyday recordings from those binaural earbuds to improve the rendering of sounds to the listener in a spatially consistent manner. The second part investigates the use of deep generative models as priors for signal reconstruction. By framing signal restoration tasks as probabilistic inference problems, we apply techniques such as Langevin dynamics and denoising diffusion to efficiently sample from posterior distributions. We first explore the use of score-based models and flow models for source separation of visual signals. We then extend this work to the audio domain by using autoregressive models for audio source separation and enhancement. Lastly, we present a fast method for solving noisy linear inverse problems using diffusion models. Together, these contributions demonstrate the power of both spatial audio processing and deep generative modeling for advancing signal restoration. By tackling practical challenges and pushing the boundaries of theoretical frameworks, this thesis paves the way for more robust communication technologies and immersive audio-visual experiences.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Jayaram_washington_0250E_27921.pdf
dc.identifier.uri: https://hdl.handle.net/1773/52955
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Audio Signal Processing
dc.subject: Generative Models
dc.subject: Machine Learning
dc.subject: Spatial Audio
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: Restoring Reality with Spatial Audio and Generative Models
dc.type: Thesis

Files

Original bundle

Name: Jayaram_washington_0250E_27921.pdf
Size: 71.41 MB
Format: Adobe Portable Document Format