Statistical Prediction of HLA Alleles and Relatedness Analysis in Genome-Wide Association Studies
Genome-wide association studies (GWAS) have been used widely in the last decade to investigate the genetic basis of many complex diseases and traits. However, the significantly associated SNPs explain relatively little of the heritability of most common diseases and traits, and most of these SNPs confer small increments in risk or have small effect sizes. This phenomenon has led to the well-known problem of "missing heritability" of complex diseases and traits in GWAS. Discussions of "missing heritability" in GWAS require examination of the underlying assumption of linkage disequilibrium (LD) at the population level. The human leukocyte antigen (HLA) region is regarded as a high-LD region, but it is also known to be highly polymorphic. The study of HLA imputation could provide new insights into the "missing heritability", and a new method, "HIBAG", is proposed that makes predictions by averaging HLA type posterior probabilities over an ensemble of classifiers built on bootstrap samples. Other major analytical factors affecting the interpretation of GWAS are cryptic relatedness and population stratification. Principal component analysis (PCA) has been widely used to detect and correct for population structure; however, it is a model-free approach and can seem like a "black box". An interpretation of PCA based on identity-by-descent (IBD) measures is given, and an approximately linear transformation between the projection of individuals onto principal components and allele admixture fractions, assuming two or more ancestral populations, is revealed.
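The ensemble-averaging idea in the abstract (posterior probabilities averaged over classifiers built on bootstrap samples) can be illustrated with a minimal Python sketch. This is not the HIBAG implementation: the abstract does not specify the individual classifiers, so a smoothed counting classifier is used here as a stand-in, and the SNP patterns, allele labels, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy data: (SNP genotype pattern, HLA allele label).
TRAIN = (
    [("012", "A*01:01")] * 9
    + [("012", "A*02:01")] * 1
    + [("210", "A*02:01")] * 10
)

def train_classifier(samples, classes, alpha=1.0):
    """Stand-in classifier (an assumption, not HIBAG's): estimate
    P(class | pattern) by Laplace-smoothed counting on one sample set."""
    counts = defaultdict(Counter)
    for pattern, label in samples:
        counts[pattern][label] += 1

    def posterior(pattern):
        c = counts.get(pattern, Counter())
        total = sum(c.values()) + alpha * len(classes)
        return {k: (c[k] + alpha) / total for k in classes}

    return posterior

def bagged_posterior(train, pattern, n_classifiers=25, seed=0):
    """Average the posteriors of classifiers trained on bootstrap samples."""
    rng = random.Random(seed)
    classes = sorted({label for _, label in train})
    avg = {k: 0.0 for k in classes}
    for _ in range(n_classifiers):
        # Bootstrap: draw len(train) samples with replacement.
        boot = [rng.choice(train) for _ in train]
        post = train_classifier(boot, classes)(pattern)
        for k in classes:
            avg[k] += post[k] / n_classifiers
    return avg

def predict(train, pattern, **kwargs):
    """Return the class with the highest averaged posterior, plus the averages."""
    avg = bagged_posterior(train, pattern, **kwargs)
    return max(avg, key=avg.get), avg
```

Calling `predict(TRAIN, "012")` returns the predicted label together with the averaged posterior for each class; the averaged values sum to one because each bootstrap classifier's posterior does.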
- Biostatistics