Scaling Econometrics: Text Processing, Distributed Computing, and Experimental Design

dc.contributor.advisor: Fan, Yanqin
dc.contributor.author: Okar, Yigit
dc.date.accessioned: 2025-08-01T22:20:39Z
dc.date.available: 2025-08-01T22:20:39Z
dc.date.issued: 2025-08-01
dc.date.submitted: 2025
dc.description: Thesis (Ph.D.)--University of Washington, 2025
dc.description.abstract: This dissertation develops new methodological approaches to address three fundamental challenges in modern econometrics: computational scalability in choice models, experimental design in digital markets, and the integration of unstructured text data. The first chapter addresses the computational challenges in estimating multinomial logistic regression models with large choice sets. We introduce an iterative distributed computing estimator that dramatically reduces computational burden while preserving statistical efficiency. This estimator, when initialized with a consistent preliminary estimate, achieves asymptotic efficiency under a weak dominance condition. We develop a parametric bootstrap procedure for statistical inference and establish its consistency. Through extensive simulation studies, we demonstrate that our method achieves substantial computational gains while maintaining estimation accuracy, making it particularly valuable for applications in industrial organization and marketing where researchers face increasingly large choice sets. The second chapter tackles the methodological challenges inherent in e-commerce pricing experiments. While cluster randomization is necessary to prevent bias from spillover effects between substitute products, it introduces additional variation that can compromise statistical power. We develop a comprehensive analytical framework for understanding and managing these variance components. Our methodology makes several contributions: first, we provide a detailed decomposition of variance components in cluster randomized experiments; second, we introduce a novel binned estimator specifically designed for the high-kurtosis data common in e-commerce settings; and third, we evaluate various approaches to variance reduction including matched-pair designs, stratified randomization, and covariate adjustment. 
Through simulation of e-commerce data, we demonstrate that our proposed methods can improve power while maintaining robust inference. The binned estimator proves particularly effective, though we carefully describe the conditions under which it maintains unbiasedness. The third chapter presents a methodological breakthrough in the integration of textual data into econometric analysis. We develop a two-stage text regression methodology that leverages recent advances in transformer-based language models to capture rich semantic information and contextual nuances. The first stage employs state-of-the-art natural language processing techniques to represent textual data in a lower-dimensional space while preserving semantic relationships. The second stage develops an econometric framework for estimating the association between these text-derived features and economic outcomes. We demonstrate the methodology’s effectiveness through an application to online economics forums, showing substantial improvements in both predictive accuracy and interpretability compared to traditional bag-of-words approaches. This methodology opens new avenues for research across various subfields of economics, from labor economics to finance, where textual data may provide crucial insights into economic behavior and outcomes. Collectively, these chapters advance the frontier of empirical methods in economics by developing scalable solutions for modern data challenges. The methodological innovations presented here enable researchers to handle larger datasets, conduct more precise experiments, and incorporate richer forms of information into their analyses. While each chapter addresses a distinct challenge, they are united by a common theme: expanding the scope of feasible empirical research through methodological innovation. 
The tools and frameworks developed in this dissertation contribute to the growing toolkit available to empirical researchers, particularly those working with large-scale, complex, or unstructured data.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Okar_washington_0250E_27998.pdf
dc.identifier.uri: https://hdl.handle.net/1773/53526
dc.language.iso: en_US
dc.rights: none
dc.subject: Economics
dc.subject.other: Economics
dc.title: Scaling Econometrics: Text Processing, Distributed Computing, and Experimental Design
dc.type: Thesis

Files

Original bundle

Name: Okar_washington_0250E_27998.pdf
Size: 2.57 MB
Format: Adobe Portable Document Format
