Weighted likelihood estimation under two-phase sampling
Two-phase sampling is a sampling technique for cost reduction and improved efficiency of estimation, adopted in many epidemiological studies. In this dissertation, we study weighted likelihood estimation, a standard estimation method for this study design. Though sampling without replacement at the second phase induces dependence among observations, independence is often assumed in practice for theoretical convenience, which leads to overestimation of the asymptotic variance. The main contribution of this dissertation is to develop asymptotic theory for weighted likelihood estimation that takes into account the dependence among observations induced by the sampling scheme, both when the nuisance parameter is estimable at a regular (square-root-n) rate and when it is estimable only at a non-regular rate. To this end, we develop a set of empirical process tools, including a Glivenko-Cantelli theorem, a theorem for rates of convergence of M-estimators, and a Donsker theorem, for inverse probability weighted empirical processes under two-phase sampling with sampling without replacement at the second phase. For variance estimation, we propose two different bootstrap procedures. The first estimates the phase I and phase II variances separately, which allows us to evaluate how much information is lost by the two-phase design. The second accounts for the phase I and phase II variances at the same time and provides valid variance estimates even under model misspecification. We also develop a method, within-stratum centered calibration, that improves efficiency over the generally inefficient weighted likelihood estimators, and we study its theoretical properties.
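As a rough illustration of the sampling design and weighting described above, the following sketch simulates a phase I cohort, draws a stratified phase II subsample without replacement, and computes an inverse probability weighted estimate. It assumes a simple normal location model, for which the weighted likelihood estimator reduces to a weighted mean. The function name `two_phase_ipw_mean`, the simulated data, and the stratum sizes are hypothetical choices made for illustration and are not taken from the dissertation. The sketch does not cover the bootstrap variance procedures or the calibration method.

```python
import random


def two_phase_ipw_mean(y, strata, n_per_stratum, rng):
    """Hypothetical helper: IPW estimate of a mean under stratified
    phase II sampling without replacement.

    y              -- outcome for every phase I subject (used only if sampled)
    strata         -- stratum label for every phase I subject
    n_per_stratum  -- dict mapping stratum label to phase II sample size
    rng            -- random.Random instance
    """
    # Group phase I indices by stratum.
    by_stratum = {}
    for i, s in enumerate(strata):
        by_stratum.setdefault(s, []).append(i)

    weights = {}
    num = 0.0
    den = 0.0
    for s, idx in by_stratum.items():
        n_s = min(n_per_stratum[s], len(idx))
        # Sampling without replacement within the stratum.
        sampled = rng.sample(idx, n_s)
        # Inverse of the sampling probability n_s / N_s.
        w = len(idx) / n_s
        for i in sampled:
            weights[i] = w
            num += w * y[i]
            den += w
    # Weighted mean: maximizer of the weighted normal log-likelihood in mu.
    return num / den, weights


# Simulated phase I cohort (assumed setup, for illustration only).
rng = random.Random(0)
N = 2000
strata = [1 if rng.random() < 0.2 else 0 for _ in range(N)]
y = [rng.gauss(1.0 if s == 1 else 0.0, 1.0) for s in strata]

estimate, weights = two_phase_ipw_mean(y, strata, {0: 100, 1: 100}, rng)
full_cohort_mean = sum(y) / N
print(len(weights), round(sum(weights.values())))
```

Because every sampled subject in a stratum carries weight N_s / n_s, the weights sum to the phase I cohort size, so the phase II subsample stands in for the whole cohort. Note that the indicators of selection within a stratum are dependent under this scheme, which is the dependence the abstract refers to.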
- Biostatistics