Essays on Machine Learning in Applied Microeconomics
Hedonic models are commonly used to recover the implicit prices of house attributes and local non-market public goods. Yet they are plagued by omitted variable bias when variables correlated with the attribute in question are unobservable. The increasing availability of big data and of unstructured data in the form of text and images allows a more extensive set of variables relevant to consumers to be included in hedonic models. Unstructured data are high-dimensional and require machine learning methods that are robust to multicollinearity and irrelevant variables. These methods can also nest previous econometric methods. In this dissertation, I show that controlling for more home attributes significantly reduces bias when estimating willingness-to-pay for environmental and urban amenities. I first estimate the effects of air pollution on house prices in Pennsylvania. By incorporating a rich home transaction dataset collected from Zillow, I reduce bias by more than half. Using a similar dataset, I then estimate how minimum lot-size zoning affects home prices in Seattle. I nest a boundary discontinuity design within ensemble tree models such as random forest and gradient boosting and find that zoning is associated with a 5% increase in home prices, a number significantly smaller than the estimates obtained when limited data and standard linear models are used. Last, I demonstrate how features can be extracted from curbside view images using computer vision tools. These features can be used to model curb appeal, a home attribute that has never been included in hedonic models but is of importance to consumers. The combination of more data and machine learning tools leads to models that are more predictive and to a significant reduction in bias when estimating treatment effects.
- Economics