Exploring the World's Visual History
Martin-Brualla, Ricardo
Collectively, we take hundreds of millions of photos of people, objects, and places every day and share them in online services such as Facebook, Instagram, and Flickr. Together, these photos create a living visual record of the world that grows every day and covers the whole planet. For the first time in history, we have access to more than a decade's worth of such detailed visual information. In my thesis, I propose novel techniques to analyze and visualize the world's history through the lens of these Internet photos.

First, I propose a method to synthesize time-lapse videos of the world's most famous landmarks over several years from publicly available photos on the Internet. I processed a database containing 86 million photos and generated thousands of time-lapse videos that span up to a decade and are effectively some of the longest time series ever captured. The synthesized time-lapses show, for example, the retreat of glaciers, the floor-by-floor construction of skyscrapers, and seasonal changes in landscapes across the world. Furthermore, I extend the technique to create 3D time-lapses, in which a virtual camera moves continuously in time and space, creating compelling parallax effects.

Next, I propose 3D Wikipedia, a system that analyzes online text together with online photos to automatically create interactive visualizations of famous landmarks that effectively convey their history. The system mines text and image co-occurrences across the Internet to generate correspondences between objects described in the text and bounding boxes in the 3D model, enabling novel interactions for coordinated browsing of the reference text and the 3D model. Selecting a discovered object in the 3D visualization scrolls the text to where the object is mentioned, and clicking a discovered object in the text moves the camera to show the corresponding object in the 3D model.
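The coordinated-browsing interactions can be sketched as a two-way index over mined correspondences. This is a minimal illustration, assuming the text/3D correspondences have already been discovered; the object name, offsets, and coordinates below are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DiscoveredObject:
    """One mined text/3D correspondence (illustrative fields)."""
    name: str
    text_offset: int    # character index where the object is described
    bbox_center: tuple  # center of its 3D bounding box (x, y, z)

class CoordinatedBrowser:
    """Two-way lookup behind the coordinated-browsing interactions.

    A sketch only: the real system discovers these correspondences
    automatically from text and image co-occurrences on the Internet.
    """
    def __init__(self, objects):
        self._by_name = {o.name: o for o in objects}

    def on_select_in_3d(self, name):
        """Selecting an object in the 3D view -> scroll target in the text."""
        return self._by_name[name].text_offset

    def on_click_in_text(self, name):
        """Clicking an object mention in the text -> camera look-at point."""
        return self._by_name[name].bbox_center
```

For example, selecting a hypothetical "baldachin" object in the 3D view would return the text offset at which it is described, and clicking its mention in the text would return the 3D point for the camera to frame.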
In another mode, the text serves as a visual guide to the scene: the visualization highlights the described objects as the user reads the text.

Finally, I propose a method to help visualize and analyze the millions of visits to tourist sites by generating 3D reconstructions of large indoor spaces. Such spaces are a common failure case for Structure-from-Motion systems, which, due to sparse photo coverage, fail to generate a complete 3D model and instead break it up into small, disconnected pieces. I jointly analyze Internet photos together with an annotated floor plan of the landmark to recover a complete 3D model in which the disconnected pieces are localized into the map's reference frame. My approach is akin to solving a 3D jigsaw puzzle in which the position and orientation of the pieces are unknown. I extract position, orientation, and shape cues from the map and introduce a novel crowd-flow cue between pieces based on how people travel between rooms. The recovered complete 3D reconstructions make it possible to map tourists' paths through a site, enabling compelling visualizations of their visits and providing insights into tourists' spatiotemporal behaviors.
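The jigsaw-style localization can be sketched as scoring candidate rigid placements of each disconnected piece against the annotated floor plan. Below is a minimal 2D version using only a shape-overlap cue; the full method also combines position, orientation, and crowd-flow cues, and all names here are illustrative.

```python
import numpy as np

def placement_score(piece_xy, room_mask, translation, angle):
    """Score one candidate placement of a disconnected piece.

    Simplified to 2D: piece_xy is an (N, 2) array of reconstructed
    points projected onto the floor, and room_mask is a boolean
    occupancy grid of the floor plan (one unit per cell). The score
    is the fraction of transformed points that land in walkable cells.
    """
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])          # 2D rotation by `angle`
    pts = piece_xy @ R.T + translation       # rigid transform of the piece
    ij = np.floor(pts).astype(int)           # grid cells hit by each point
    h, w = room_mask.shape
    valid = (ij[:, 0] >= 0) & (ij[:, 0] < h) & (ij[:, 1] >= 0) & (ij[:, 1] < w)
    inside = room_mask[ij[valid, 0], ij[valid, 1]]
    return inside.sum() / len(pts)

def best_placement(piece_xy, room_mask, candidates):
    """Exhaustively pick the highest-scoring (translation, angle) pair."""
    return max(candidates,
               key=lambda ta: placement_score(piece_xy, room_mask, *ta))
```

A piece whose points all fall inside a room of the floor plan scores 1.0 under its correct placement, while placements that push it into walls or outside the map score lower, so exhaustive (or sampled) search over candidate placements recovers a plausible pose for each piece.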