Chen, Yen-Chi
McCormick, Tyler Harris
Wei, Zeyu
2024-09-09
2024
Wei_washington_0250E_26957.pdf
https://hdl.handle.net/1773/52189
Thesis (Ph.D.)--University of Washington, 2024

A graph, consisting of a set of vertices and a set of edges, is a natural tool for studying relations. From a geometric perspective, relations between data points reveal information about the underlying structure, and a graph as a geometric object can not only visualize but also mathematically characterize such geometric structures in the data. From a network perspective, graphs can also model connections between different units and have applications in various fields such as epidemiology, econometrics, sociology, biology, and astronomy.

We first take advantage of graphs from a geometric perspective and propose a data analysis framework that constructs weighted graphs, called skeletons, to encode the geometric structures in the data and uses the learned graphs to assist downstream analysis tasks such as clustering and regression. For clustering, we introduce a density-aided method that can detect clusters with irregular shapes in multivariate and even high-dimensional data. To bypass the curse of dimensionality, we propose surrogate density measures that are less dependent on the dimension and have intuitive geometric interpretations. The clustering framework constructs a concise graph representation of the given data as an intermediate step and can be thought of as a combination of prototype methods, density-based clustering, and hierarchical clustering. We show by theoretical analysis and empirical studies that skeleton clustering leads to reliable clusters in multivariate and high-dimensional scenarios.

For regression tasks, we propose a novel framework specialized for covariates concentrated around some low-dimensional geometric structures. The proposed framework first learns a graph representation of the covariates which encodes the geometric structures.
Then we apply nonparametric regression techniques to estimate the regression function on the skeleton graph, which, notably, bypasses the curse of dimensionality. We derive statistical and computational properties of the proposed regression framework and use simulations and real data examples to illustrate its effectiveness. Our framework can account for predictors for distinct geometric structures and is robust to additive noise and noisy observations.

Graphs are widely used to represent networks of connections and serve as a helpful tool in modeling real-world diffusion processes. Network diffusion models are used to study phenomena such as disease transmission, information spread, and technology adoption. However, small amounts of mismeasurement are extremely likely in the networks constructed to operationalize these models. We show that estimates of diffusions are highly non-robust to this measurement error. First, we show that even when measurement error is vanishingly small, such that the share of missed links is close to zero, forecasts about the extent of diffusion will greatly underestimate the truth. Second, a small mismeasurement in the identity of the initial seed generates a large shift in the locations of the expected diffusion path. We show that both of these results still hold when the vanishing measurement error is only local in nature. Such non-robustness in forecasting exists even under conditions where the basic reproductive number is consistently estimable. Possible solutions, such as estimating the measurement error or implementing widespread detection efforts, still face difficulties because the number of missed links is so small.
Finally, we conduct Monte Carlo simulations on simulated networks and on real networks from three settings: travel data from the COVID-19 pandemic in the western US, a mobile phone marketing campaign in rural India, and an insurance experiment in China.

application/pdf
en-US
CC BY
Diffusion; Networks; Nonparametric Statistics; Regression Analysis; Topological Data Analysis; Statistics; Artificial intelligence
Statistics
Statistical Learning and Modeling with Graphs and Networks
Thesis