MacKenzie, Don
Aemmer, Zack
2021-08-26
2021
Aemmer_washington_0250O_23138.pdf
http://hdl.handle.net/1773/47404
Thesis (Master's)--University of Washington, 2021

Household survey collection efforts provide immense value in the fields of transportation and urban planning, where public agencies, private companies, and researchers alike use the data to draw conclusions about populations. However, even the most well-funded surveying agencies rely on sampling methods to estimate the nature of the true population, and microdata from public censuses is frequently aggregated, or limited in volume and detail, to protect the privacy of respondents. With growing emphasis on microsimulation to predict population behavior in response to emerging transportation technologies, such as electric and autonomous vehicles or new micromobility and ridesharing services, population synthesis provides a means to scale this socioeconomic microdata into synthetic populations representing much larger areas. Despite their accuracy and widespread adoption, traditional synthesis algorithms for reweighting microdata samples scale poorly with the number of variables and geographic regions being modeled, and can fail to converge when smaller sample sizes are used. Several generative models have been proposed to address these shortcomings, but they lack key features such as sub-region modeling and the ability to generate individuals and households simultaneously. This work proposes a new approach to generating synthetic populations consisting of both individual- and household-level variables, which uses a Conditional Variational Autoencoder (CVAE) to learn a distribution of latent variables in the general population and then uses them to generate new samples. The accuracy and computational efficiency of this approach are benchmarked against a state-of-the-art open-source population synthesizer.
In addition, the CVAE model is tested on progressively smaller training sets to determine its ability to generate realistic populations from smaller surveys. Findings indicate that the CVAE model creates more accurate populations in less time than the traditional synthesizer on small- to medium-dimensional datasets (4-16 variables). The CVAE also performs well with relatively small (n=100) training samples, but tends to overfit at lower sample sizes, despite additional regularization of the model.

application/pdf
en-US
none
Conditional
Generative
Microsimulation
Population Synthesis
VAE
Variational Autoencoder
Transportation
Demography
Computer science
Civil engineering
Generative Population Synthesis for Joint Household and Individual Characteristics
Thesis