Contributors: Harchaoui, Zaid; Pal, Soumik
Author: Liu, Lang
Date: 2023-01-21
Issued: 2022
File: Liu_washington_0250E_25117.pdf
URI: http://hdl.handle.net/1773/49764
Description: Thesis (Ph.D.)--University of Washington, 2022

Abstract:
Statistical divergences have been widely used in statistics and artificial intelligence to measure the dissimilarity between probability distributions, with applications ranging from generative modeling to statistical inference. Early work in statistics focused on discrete, low-dimensional probability distributions. We tackle problems emerging in modern applications of statistics and artificial intelligence in which the sample space is either discrete with a large alphabet (e.g., natural language processing) or continuous and high-dimensional (e.g., computer vision). This dissertation revisits statistical divergences in these modern applications and addresses challenges arising from the complex nature of the data.

Chapter 2 studies minimum Kullback-Leibler divergence estimation, which is equivalent to the widely used maximum likelihood estimation. While the classical asymptotic theory is well established in a rather general setting, high-dimensional problems reveal several of its limitations. We develop finite-sample bounds that characterize the asymptotic behavior in a non-asymptotic fashion, allowing the dimension to grow with the sample size. Unlike previous work that relies heavily on the strong convexity of the objective function, we only assume that the Hessian is bounded below at the optimum and allow it to gradually become degenerate. This is enabled by the notion of self-concordance, which originates from convex optimization.

Chapter 3 investigates the framework of divergence frontiers, a notion of trade-off curves built upon statistical divergences, for comparing generative models. These trade-off curves are analogous to operating characteristic curves in statistical decision theory. Owing to the complex and high-dimensional nature of the input space, an effective approach used by practitioners to estimate divergence frontiers involves a quantization step followed by an estimation step. We establish non-asymptotic bounds on the sample complexity of this estimator. We also show how smoothed distribution estimators, such as Good-Turing or Krichevsky-Trofimov, can overcome the missing-mass problem and lead to faster rates of convergence.

Chapters 4 and 5 explore the Schrödinger bridge problem, an information projection problem that projects a reference measure onto a linear subspace of probability distributions with respect to the Kullback-Leibler divergence. This problem is equivalent to the entropy-regularized optimal transport problem, which has recently attracted considerable attention from the statistics and machine learning communities. We develop limit laws and non-asymptotic bounds for its empirical estimators. Unlike unregularized optimal transport, our results enjoy a parametric rate of convergence that does not suffer from the curse of dimensionality. We also propose statistical tests of homogeneity and independence based on the Schrödinger bridge problem.

Format: application/pdf
Language: en-US
Rights: CC BY
Subject: Statistics
Title: Statistical Divergences for Learning and Inference: Limit Laws and Non-Asymptotic Bounds
Type: Thesis