Essays on Machine Learning and Hedonic Models
Chapters 1 and 2: We survey and apply several techniques from the statistics and computer science literature to the problem of demand estimation. We derive novel asymptotic properties for several of these models. To improve out-of-sample prediction accuracy and obtain parametric rates of convergence, we propose a method of combining the underlying models via linear regression. We illustrate our method using a standard scanner panel data set to estimate promotional lift and find that our estimates are considerably more accurate in out-of-sample predictions of demand than some commonly used alternatives. While demand estimation is our motivating application, these methods are widely applicable to other microeconometric problems.

Chapter 3: We collect high-dimensional data and extract features from house descriptions and images to use as controls within a hedonic model that estimates the impact of fracking on house prices in Pennsylvania. Supplementing a structured dataset with high-dimensional unstructured data, in the form of descriptive words and images of homes, can help to reduce omitted variable bias. We construct curb appeal scores based on aesthetic features of home images. We then compare four models: OLS, LASSO-OLS, random forest, and gradient boosting. The ensemble tree models (random forest and gradient boosting) yield 10% improvements in prediction accuracy compared to LASSO and OLS. Our results imply that royalty payments exactly compensate for the negative environmental effects on homes within 1 km of fracking wells but increase the price of houses farther away by up to 5%.
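The model-combination step described for the demand-estimation chapters (combining underlying models via linear regression) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the simulated data, the two base learners (a straight-line fit and a k-nearest-neighbour average), the five-fold split, and all function names. The sketch regresses the outcome on out-of-fold predictions from the base models to obtain combination weights.

```python
import math
import random

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn(xs, ys, k=5):
    """k-nearest-neighbour average; returns a predictor."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def out_of_fold(fit, xs, ys, n_folds=5):
    """Predict each observation with a model trained on the other folds."""
    n = len(xs)
    preds = [0.0] * n
    for f in range(n_folds):
        train = [i for i in range(n) if i % n_folds != f]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(n):
            if i % n_folds == f:
                preds[i] = model(xs[i])
    return preds

def stack_weights(p1, p2, ys):
    """Regress y on two prediction vectors (no intercept) by solving
    the 2x2 normal equations directly."""
    a11 = sum(u * u for u in p1)
    a12 = sum(u * v for u, v in zip(p1, p2))
    a22 = sum(v * v for v in p2)
    b1 = sum(u * y for u, y in zip(p1, ys))
    b2 = sum(v * y for v, y in zip(p2, ys))
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

def mse(preds, ys):
    """Mean squared error of predictions against outcomes."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

# Simulated data (purely illustrative): a nonlinear signal plus noise.
random.seed(0)
xs = [random.uniform(0.0, 6.0) for _ in range(200)]
ys = [math.sin(x) + 0.5 * x + random.gauss(0.0, 0.3) for x in xs]

# Out-of-fold predictions from each base model.
p_lin = out_of_fold(fit_linear, xs, ys)
p_knn = out_of_fold(fit_knn, xs, ys)

# Linear-regression combination of the two sets of predictions.
w_lin, w_knn = stack_weights(p_lin, p_knn, ys)
p_stack = [w_lin * u + w_knn * v for u, v in zip(p_lin, p_knn)]

mse_lin, mse_knn, mse_stack = mse(p_lin, ys), mse(p_knn, ys), mse(p_stack, ys)
print(f"weights: linear={w_lin:.3f}, knn={w_knn:.3f}")
print(f"MSE: linear={mse_lin:.4f}, knn={mse_knn:.4f}, stacked={mse_stack:.4f}")
```

Because the weights are chosen by least squares on the out-of-fold predictions, the combined fit on that sample cannot be worse than either base model alone. Whether the gain carries over to new data has to be checked on a separate holdout set, which this sketch does not do.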
- Economics