Causal Inference Using Educational Observational Data: Statistical Bias Reduction Methods and Multilevel Data Extensions
Author
Hernandez, Jose Manuel
Abstract
This study uses a data-driven simulation design, which deviates from the traditional model-based approaches most commonly adopted in quasi-experimental Monte Carlo (MC) simulation studies, to answer two main questions. First, the study explores the finite-sample properties of the quasi-experimental methods most widely used in education research to control for observable selection bias and compares them to traditional regression methods. Second, it provides insight into the effects of ignoring the multilevel structure of data commonly found in the field when using quasi-experimental methods. Specifically, treatment effects were estimated using (1) Ordinary Least Squares (OLS) multiple linear regression (treatment effects adjusted for mean differences on confounders), (2) Propensity Score Matching (PSM) using 1:n nearest-neighbor matching with replacement, (3) Inverse Probability Weighting (IPW) of the propensity score, and (4) propensity score subclassification. Five main factors were varied to simulate the data, all fully crossed: four sample sizes (600, 1000, 2000, and 5000); three levels of association among the simulated variables (low, moderate, high); two treatment exposure levels (25% and 50%); four treatment effect sizes using Cohen's d (none, low, moderate, and high); and five levels of the intraclass correlation coefficient (ICC: 0, .10, .20, .30, and .40). Each of the resulting 480 data conditions was analyzed with the four estimation methods, for a total of 1,920 condition-by-method combinations. Additionally, using data from the Education Longitudinal Study of 2002 (ELS:2002), an applied demonstration of the estimation methods was performed and compared to the simulation results. Findings indicate that, under certain conditions, all of the compared methods perform similarly and produce comparable treatment effect estimates. Additionally, ignoring the clustering of the data introduces bias in the smaller sample size conditions.
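For a concrete picture of the design, the sketch below (not the author's code) builds the fully crossed 480-cell simulation grid described above and illustrates a generic inverse-probability-weighted estimate of the average treatment effect. The function name ipw_ate, the toy data, and the logistic-regression propensity model are illustrative assumptions, not details taken from the study.

```python
# Illustrative sketch only: the fully crossed simulation grid from the abstract
# and a generic IPW (Hajek-style) estimator of the average treatment effect.
# Names (ipw_ate), the toy data, and the logistic-regression propensity model
# are assumptions for illustration, not the study's actual implementation.
from itertools import product

import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulation factors as listed in the abstract.
sample_sizes = [600, 1000, 2000, 5000]
association_levels = ["low", "moderate", "high"]
exposure_rates = [0.25, 0.50]
effect_sizes = ["none", "low", "moderate", "high"]
iccs = [0.0, 0.10, 0.20, 0.30, 0.40]

conditions = list(product(sample_sizes, association_levels,
                          exposure_rates, effect_sizes, iccs))
assert len(conditions) == 480  # x 4 estimation methods = 1,920 combinations


def ipw_ate(y, t, X):
    """IPW estimate of the average treatment effect.

    y: outcome vector, t: binary treatment indicator, X: confounder matrix.
    """
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    treated_mean = np.average(y, weights=t / ps)
    control_mean = np.average(y, weights=(1 - t) / (1 - ps))
    return treated_mean - control_mean


# Toy usage on synthetic single-level data (clustering/ICC deliberately ignored).
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))
y = 0.5 * t + X @ np.array([0.3, 0.2, 0.1]) + rng.normal(size=1000)
print(round(ipw_ate(y, t, X), 3))  # estimate of the simulated effect (0.5)
```

In the study's design, a grid like this would be paired with each of the four estimation methods to produce the 1,920 condition-by-method combinations summarized in the abstract.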
Collections
- Education - Seattle