Graph Estimation and Cluster Analysis in High Dimensions
Tan, Kean Ming
In many applications, it is of interest to uncover patterns from a high-dimensional data set in which the number of features, p, is larger than the number of observations, n. We consider the areas of graph estimation and cluster analysis, which are often used to construct gene expression networks and to partition the observations or features into subgroups, respectively. For graph estimation, we propose a framework to estimate graphical models with a few hub nodes that are densely connected to many other nodes. We apply our framework to three widely used probabilistic graphical models: the Gaussian graphical model, the covariance graph model, and the binary Ising model. For cluster analysis, we propose a novel methodology for partitioning both observations and features into groups simultaneously, which we refer to as sparse biclustering. We also propose a framework to account for the correlation among the observations and features when we perform sparse biclustering. In addition, we study the statistical properties of convex clustering, a recent proposal for cluster analysis, which involves solving a convex optimization problem.
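For orientation, the convex optimization problem behind convex clustering is commonly written in the following form. This is a sketch of the standard formulation from the literature, not a statement taken from this record; the symbols X_i (the i-th observation), U_i (its estimated cluster centroid), the tuning parameter \lambda, and the norm index q are notation assumed here for illustration.

```latex
\[
  \underset{U_1,\dots,U_n \in \mathbb{R}^p}{\text{minimize}}
  \quad
  \frac{1}{2} \sum_{i=1}^{n} \lVert X_i - U_i \rVert_2^2
  \;+\;
  \lambda \sum_{i < i'} \lVert U_i - U_{i'} \rVert_q ,
  \qquad \lambda \ge 0, \; q \in \{1, 2, \infty\}.
\]
```

Under this formulation, observations i and i' are assigned to the same cluster when their fitted centroids coincide, U_i = U_{i'}. With \lambda = 0 every observation is its own cluster, and as \lambda grows the penalty fuses centroids together until a single cluster remains.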
- Biostatistics