Characterization and analysis of repetitive centromeres
Centromeres are specialized regions of eukaryotic chromosomes that ensure faithful transmission of genetic information at each cell division. The molecular architecture of centromeres is defined by evolutionarily dynamic protein and DNA components, which have been proposed to contribute to the origin of new species, while defects in centromeres have been linked to human disease. Centromeres are embedded in regions composed of large arrays of head-to-tail 'satellite' DNA elements, which are not amenable to many conventional genomic analyses. Here, I describe the development of methods for the analysis of repetitive genomic regions and apply these tools to study primate centromeres, which are composed of ~170-bp alpha-satellite units. Although centromeric DNA is known to be polymorphic in humans, comprehensive cataloguing of variants at centromeres has not been possible. To gain insight into centromeric genetic variation, I developed a method that uses single-molecule sequencing to analyze the characteristic sequence periodicities, called higher-order repeats, that arise in human centromeres. Applying this approach to catalogue inter-individual, population-scale, and disease-associated structural variation identified extensive centromeric polymorphism associated with binding sites for CENP-B, a sequence-specific DNA-binding protein. This work also defined a set of functionally important alpha-satellite dimeric units that are underrepresented in current centromere models and demonstrated aberrations in centromeric sequence in breast cancer. I suggest a role for CENP-B in the evolution and maintenance of higher-order periodicities in centromeric arrays. Although alpha-satellite is present at the centromeres of most primates, the precise mechanisms of evolution of centromeric DNA and the contribution of genetic sequence to the specification of centromere identity remain unresolved.
I examined centromere evolution in primates using a combination of data from different whole-genome sequencing methods. This approach demonstrated the presence of higher-order periodicities in all primates and identified an important role for CENP-B in shaping centromeric repeat organization. Further analysis of alpha-satellite uncovered interspecific variation in the presence of short inverted repeats, which may form hairpin and stem-loop structures. Based on these data, I propose a genetic mechanism for centromere specification that depends on the formation of cruciform or other non-B-form nucleic acid structures. Taken together, this work enables the cataloguing of variation in satellite DNA, defines important evolutionary transitions in primate centromeres, and advances a model for primate centromere evolution and a theory for centromere specification.
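
As an illustration of what a search for short inverted repeats might involve, the following is a minimal Python sketch and not the method used in this work. It scans a sequence for an arm that is followed, after a short loop, by its exact reverse complement, which is the basic sequence signature of a potential hairpin or stem-loop. The function names and the parameters `arm_len`, `min_loop`, and `max_loop` are hypothetical choices made for this example; a realistic analysis would likely tolerate mismatches in the arms and assess the stability of the predicted structure.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_inverted_repeats(seq: str, arm_len: int = 5,
                          min_loop: int = 3, max_loop: int = 10):
    """Find exact inverted repeats (illustrative parameters, not from the source).

    Returns a list of (left_start, right_start, left_arm) tuples in which the
    arm starting at left_start is followed, after a loop of min_loop..max_loop
    bases, by its reverse complement starting at right_start.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n - 2 * arm_len - min_loop + 1):
        arm = seq[i:i + arm_len]
        if "N" in arm:
            continue  # skip arms containing ambiguous bases
        target = reverse_complement(arm)
        for loop in range(min_loop, max_loop + 1):
            j = i + arm_len + loop
            if j + arm_len > n:
                break  # right arm would run off the end of the sequence
            if seq[j:j + arm_len] == target:
                hits.append((i, j, arm))
    return hits


# Toy sequence: a 5-bp arm (GCATG), a 4-bp loop (AAAA), and the
# reverse complement of the arm (CATGC), flanked by TT on each side.
toy = "TTGCATGAAAACATGCTT"
hits = find_inverted_repeats(toy)
```

For the toy sequence, the arm at position 2 pairs with its reverse complement at position 11; a homopolymer run such as poly-A yields no hits because its reverse complement is poly-T.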