Scaling Econometrics: Text Processing, Distributed Computing, and Experimental Design

Okar, Yigit

dc.contributor.advisor	Fan, Yanqin
dc.contributor.author	Okar, Yigit
dc.date.accessioned	2025-08-01T22:20:39Z
dc.date.available	2025-08-01T22:20:39Z
dc.date.issued	2025-08-01
dc.date.submitted	2025
dc.description	Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract	This dissertation develops new methodological approaches to address three fundamental challenges in modern econometrics: computational scalability in choice models, experimental design in digital markets, and the integration of unstructured text data. The first chapter addresses the computational challenges in estimating multinomial logistic regression models with large choice sets. We introduce an iterative distributed computing estimator that dramatically reduces computational burden while preserving statistical efficiency. This estimator, when initialized with a consistent preliminary estimate, achieves asymptotic efficiency under a weak dominance condition. We develop a parametric bootstrap procedure for statistical inference and establish its consistency. Through extensive simulation studies, we demonstrate that our method achieves substantial computational gains while maintaining estimation accuracy, making it particularly valuable for applications in industrial organization and marketing where researchers face increasingly large choice sets. The second chapter tackles the methodological challenges inherent in e-commerce pricing experiments. While cluster randomization is necessary to prevent bias from spillover effects between substitute products, it introduces additional variation that can compromise statistical power. We develop a comprehensive analytical framework for understanding and managing these variance components. Our methodology makes several contributions: first, we provide a detailed decomposition of variance components in cluster randomized experiments; second, we introduce a novel binned estimator specifically designed for the high-kurtosis data common in e-commerce settings; and third, we evaluate various approaches to variance reduction including matched-pair designs, stratified randomization, and covariate adjustment. 
Through simulation of e-commerce data, we demonstrate that our proposed methods can improve power while maintaining robust inference. The binned estimator proves particularly effective, though we carefully describe the conditions under which it maintains unbiasedness. The third chapter presents a methodological breakthrough in the integration of textual data into econometric analysis. We develop a two-stage text regression methodology that leverages recent advances in transformer-based language models to capture rich semantic information and contextual nuances. The first stage employs state-of-the-art natural language processing techniques to represent textual data in a lower-dimensional space while preserving semantic relationships. The second stage develops an econometric framework for estimating the association between these text-derived features and economic outcomes. We demonstrate the methodology’s effectiveness through an application to online economics forums, showing substantial improvements in both predictive accuracy and interpretability compared to traditional bag-of-words approaches. This methodology opens new avenues for research across various subfields of economics, from labor economics to finance, where textual data may provide crucial insights into economic behavior and outcomes. Collectively, these chapters advance the frontier of empirical methods in economics by developing scalable solutions for modern data challenges. The methodological innovations presented here enable researchers to handle larger datasets, conduct more precise experiments, and incorporate richer forms of information into their analyses. While each chapter addresses a distinct challenge, they are united by a common theme: expanding the scope of feasible empirical research through methodological innovation. 
The tools and frameworks developed in this dissertation contribute to the growing toolkit available to empirical researchers, particularly those working with large-scale, complex, or unstructured data.
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Okar_washington_0250E_27998.pdf
dc.identifier.uri	https://hdl.handle.net/1773/53526
dc.language.iso	en_US
dc.rights	none
dc.subject	Economics
dc.subject.other	Economics
dc.title	Scaling Econometrics: Text Processing, Distributed Computing, and Experimental Design
dc.type	Thesis

Files

Original bundle

Name: Okar_washington_0250E_27998.pdf
Size: 2.57 MB
Format: Adobe Portable Document Format

Collections

Economics