Scaling Econometrics: Text Processing, Distributed Computing, and Experimental Design
Abstract
This dissertation develops new methodological approaches to address three fundamental challenges in modern econometrics: computational scalability in choice models, experimental design in digital markets, and the integration of unstructured text data. The first chapter
addresses the computational challenges in estimating multinomial logistic regression models
with large choice sets. We introduce an iterative distributed computing estimator that
dramatically reduces computational burden while preserving statistical efficiency. This estimator, when initialized with a consistent preliminary estimate, achieves asymptotic efficiency
under a weak dominance condition. We develop a parametric bootstrap procedure for statistical inference and establish its consistency. Through extensive simulation studies, we
demonstrate that our method achieves substantial computational gains while maintaining
estimation accuracy, making it particularly valuable for applications in industrial organization and marketing where researchers face increasingly large choice sets. The second chapter
tackles the methodological challenges inherent in e-commerce pricing experiments. While
cluster randomization is necessary to prevent bias from spillover effects between substitute
products, it introduces additional variation that can compromise statistical power. We develop a comprehensive analytical framework for understanding and managing these variance
components. Our methodology makes several contributions: first, we provide a detailed
decomposition of variance components in cluster randomized experiments; second, we introduce a novel binned estimator specifically designed for the high-kurtosis data common
in e-commerce settings; and third, we evaluate various approaches to variance reduction including matched-pair designs, stratified randomization, and covariate adjustment. Through
simulation of e-commerce data, we demonstrate that our proposed methods can improve
power while maintaining robust inference. The binned estimator proves particularly effective, though we carefully describe the conditions under which it maintains unbiasedness. The
third chapter presents a methodological breakthrough in the integration of textual data into
econometric analysis. We develop a two-stage text regression methodology that leverages
recent advances in transformer-based language models to capture rich semantic information
and contextual nuances. The first stage employs state-of-the-art natural language processing
techniques to represent textual data in a lower-dimensional space while preserving semantic relationships. The second stage develops an econometric framework for estimating the
association between these text-derived features and economic outcomes. We demonstrate
the methodology’s effectiveness through an application to online economics forums, showing
substantial improvements in both predictive accuracy and interpretability compared to traditional bag-of-words approaches. This methodology opens new avenues for research across
various subfields of economics, from labor economics to finance, where textual data may
provide crucial insights into economic behavior and outcomes. Collectively, these chapters
advance the frontier of empirical methods in economics by developing scalable solutions for
modern data challenges. The methodological innovations presented here enable researchers
to handle larger datasets, conduct more precise experiments, and incorporate richer forms
of information into their analyses. While each chapter addresses a distinct challenge, they
are united by a common theme: expanding the scope of feasible empirical research through
methodological innovation. The tools and frameworks developed in this dissertation contribute
to the growing toolkit available to empirical researchers, particularly those working
with large-scale, complex, or unstructured data.
Description
Thesis (Ph.D.)--University of Washington, 2025
