Creating a Photorealistic World from Casual Lighting Capture
Abstract
Light is the primary medium through which we perceive and interact with the world. It plays a fundamental role in how we navigate, recognize, and interpret our surroundings. Throughout human history, we have sought to capture and record lighting effects, from the earliest forms of painting to the invention of photography and the development of video technology. These media have allowed us to document and study the world in increasing detail. Many psychological studies have shown that the human visual system excels at deducing depth, shape, and motion from lighting effects such as shading and shadows. This ability underscores the importance of accurately simulating these effects when creating a photorealistic world.

Creating photorealistic images and virtual environments has been a long-standing goal in computer graphics, driven by applications in virtual reality, film production, and architectural visualization. Achieving high levels of realism requires accurately simulating the interaction of light with objects and surfaces, as well as generating detailed highlights and realistic shadows. Casual capture, the everyday photos and videos taken with consumer devices, offers a rich source of data for understanding and replicating real-world lighting effects. In this thesis, I explore techniques for enhancing photorealistic image synthesis by leveraging casual lighting capture. The ultimate goal is to create highly realistic and immersive visual experiences from novel viewpoints, novel scene configurations, and novel illumination.

I begin with an overview of the psychophysics of light, focusing on human perception and the use of shading techniques in Western art, to provide a foundational understanding of how lighting effects influence visual perception. This knowledge is critical for the subsequent development of methods that accurately replicate the nuances of lighting and shading in synthetic imagery. The core contributions of this thesis are as follows.

First, I present People as Scene Probes, a method that infers depth, occlusion, lighting, and shadow information from video sequences captured from a single camera viewpoint. This technique enables realistic image composition by accurately modeling scene geometry and shading effects; a toy compositing sketch of this idea appears after the contribution overview below.

Second, I introduce Repopulating Street Scenes, a framework that uses scene properties learned from image collections to automatically reconfigure street scenes by populating, depopulating, or repopulating them with objects such as pedestrians or vehicles. It enables the realistic removal of existing objects along with their shadows, as well as the insertion of new objects with matched lighting and consistent cast shadows. This method enhances privacy and generates diverse training data for autonomous driving applications.

Next, I introduce SunStage, a lightweight capture setup that replicates the functionality of a light stage using only a smartphone camera and the sun as the light source. From a video of a person rotating in place under the sun, SunStage reconstructs a physical model of the subject and the scene lighting, enabling applications such as relighting the subject with realistic reflections and cast shadows. SunStage allows arbitrary lighting and reflectance control in the reconstructed physical space, which can be rendered to produce photorealistic results; a minimal shading sketch also follows below. I demonstrate several applications, including editing skin reflectance, relighting, and view synthesis.
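To make the compositing idea behind People as Scene Probes concrete, the sketch below blends an object into a frame using an estimated depth map (for occlusion) and a soft shadow map. It is a minimal illustration under assumed inputs; the function name, arguments, and the simple multiplicative shadow darkening are hypothetical simplifications, not the paper's actual pipeline.

    import numpy as np

    def composite_object(bg_rgb, bg_depth, obj_rgb, obj_alpha, obj_depth, shadow):
        # Hypothetical inputs: HxWx3 images; HxW depth, alpha, and shadow maps
        # (alpha and shadow in [0, 1]).
        # Hide the object wherever existing scene geometry is closer to the camera.
        visible = (obj_depth < bg_depth).astype(np.float64)
        alpha = (obj_alpha * visible)[..., None]
        # Darken the background under the object's estimated soft shadow.
        shaded_bg = bg_rgb * (1.0 - shadow)[..., None]
        # Standard "over" blend of the visible object onto the shaded background.
        return alpha * obj_rgb + (1.0 - alpha) * shaded_bg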
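Likewise, the essence of rendering under a reconstructed sun can be conveyed by a few lines of Lambertian shading. This is a minimal sketch assuming per-pixel albedo and normal maps have already been recovered; the names and the constant ambient term are illustrative assumptions, not SunStage's actual renderer.

    import numpy as np

    def relight_sun(albedo, normals, sun_dir, sun_rgb, ambient_rgb):
        # Normalize the sun direction (a unit vector pointing toward the sun).
        sun_dir = np.asarray(sun_dir, dtype=np.float64)
        sun_dir /= np.linalg.norm(sun_dir)
        # Lambertian n.l term, clamped so back-facing surfaces receive no sunlight.
        ndotl = np.clip(normals @ sun_dir, 0.0, None)[..., None]
        # Directional sunlight plus a constant ambient fill, applied to albedo.
        return np.clip(albedo * (ndotl * sun_rgb + ambient_rgb), 0.0, 1.0)

Rotating in place effectively sweeps sun_dir across the subject over the course of the video, which is what gives SunStage its light-stage-like coverage of lighting directions.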
Finally, I present Infinite Texture, a method for generating arbitrarily large texture images from a text prompt. The technique supports applications in 3D rendering and texture transfer, maintaining consistent shading and depth while requiring only a minimal dataset. I demonstrate its effectiveness in generating high-resolution, high-quality textures that can be seamlessly integrated into a variety of downstream tasks. This work represents significant progress toward generating high-quality graphics assets from natural language descriptions.
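As a toy illustration of the "small exemplar, unbounded canvas" idea only (Infinite Texture itself samples a text-to-image diffusion model), the sketch below mirror-tiles a generated patch to any requested size, with reflections hiding hard seams. The function and inputs are hypothetical.

    import numpy as np

    def tile_texture(patch, out_h, out_w):
        # Build a 2x2 mirrored super-tile; reflections keep seams continuous.
        row = np.concatenate([patch, patch[:, ::-1]], axis=1)
        tile = np.concatenate([row, row[::-1, :]], axis=0)
        # Repeat the super-tile enough times, then crop to the requested size.
        reps_y = -(-out_h // tile.shape[0])  # ceiling division
        reps_x = -(-out_w // tile.shape[1])
        reps = (reps_y, reps_x) + (1,) * (patch.ndim - 2)
        return np.tile(tile, reps)[:out_h, :out_w]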
Description
Thesis (Ph.D.)--University of Washington, 2024
