Genetic etiologies of Autism Spectrum Disorder

Niklas Krumm

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

University of Washington, 2014

Reading Committee: Evan E. Eichler, Chair; Deborah A. Nickerson; Jay Shendure

Program Authorized to Offer Degree: Genome Sciences, University of Washington
ABSTRACT

Autism spectrum disorder (ASD) is a common, heritable neurodevelopmental disorder. In this thesis, I examine how different genetic etiologies, mutation types and specific genes contribute to the risk of ASD and how these factors can be used to expand our understanding of the neurobiological underpinnings of ASD. I develop a new bioinformatics method (CoNIFER: copy number inference from exome reads) to identify copy number variants (CNVs) using exome sequencing data, enabling much more sensitive identification of a previously under-ascertained class of small CNVs (<100 kbp in size). I estimate the precision of the algorithm using 366 exomes and show that this method can be used to reliably predict both de novo and inherited rare CNVs and can predict absolute copy number for loci with fewer than eight copies.

Next, I searched for disruptive, genic rare CNVs among 411 families with sporadic ASD from the Simons Simplex Collection (SSC) and identified additional small genic rare CNVs compared to high-density single nucleotide polymorphism (SNP) microarrays (~2X higher yield). I found that affected probands inherit more CNVs than their siblings (p=0.004; OR=1.19), and these CNVs affect more genes, are enriched for brain-expressed genes, and are transmitted preferentially from the mother. Finally, I found that the excess burden of inherited CNVs among probands is driven primarily by proband-sibling pairs with discordant social behavior phenotypes.

I then created a combined set of both inherited and de novo single nucleotide variants (SNVs) and CNVs across 2,377 SSC ASD families, including 1,786 families with both an affected and unaffected child. I compared the burden of inherited and de novo mutations between affected and unaffected siblings and found that private inherited truncating SNV mutations in conserved genes are significantly enriched in probands (OR=1.14, p<0.0002), an effect that became more pronounced with increasing gene conservation.
I quantified ASD risk for de novo and inherited CNVs and SNVs by using a conditional logistic regression model. Independent from de novo mutations, private truncating SNVs and rare inherited CNVs contribute an increase in risk of 1.11 (p=0.0002) and 1.23 (p=0.01), respectively. These results confirm a statistically independent role for inherited mutations in ASD risk and identify additional candidate genes (e.g., RIMS1, CUL7 and CSMD1) where inherited and de novo burden converge. 
TABLE of CONTENTS

I. INTRODUCTION
  1.1 SUMMARY
  1.2 AUTISM SPECTRUM DISORDER
  1.3 IDENTIFYING GENES AND PATHWAYS INVOLVED IN ASD AND ID
  1.4 AN INCREASE IN DE NOVO LOSS-OF-FUNCTION MUTATIONS
  1.5 CANDIDATE GENES YET FEW RECURRENT HITS
  1.6 LARGE-SCALE RESEQUENCING OF CANDIDATE GENES
  1.7 NOVEL CANDIDATES AND THEIR NEUROBIOLOGY
  1.8 PROTEIN INTERACTION NETWORKS CONVERGE ON COMMON PATHWAYS
  1.9 ASSAYING COPY NUMBER VARIATION
  1.10 THESIS GOALS
II. COPY NUMBER VARIATION DETECTION AND GENOTYPING FROM EXOME SEQUENCE DATA
  2.1 SUMMARY
  2.2 INTRODUCTION
  2.3 METHODS
  2.4 RESULTS
  2.5 DISCUSSION
III. TRANSMISSION DISEQUILIBRIUM OF SMALL CNVS IN SIMPLEX AUTISM
  3.1 SUMMARY
  3.2 INTRODUCTION
  3.3 METHODS
  3.4 RESULTS
  3.5 DISCUSSION
IV. INHERITED SNV MUTATIONS IN AUTISM SPECTRUM DISORDER
  4.1 SUMMARY
  4.2 INTRODUCTION
  4.3 METHODS
  4.4 RESULTS
  4.5 DISCUSSION
V. SUMMARY AND FUTURE DIRECTIONS
  5.1 SUMMARY OF RESULTS
  5.2 TOWARDS ASSAYING THE COMPLETE SET OF GENETIC VARIATION
  5.3 UNDERSTANDING NORMAL VARIATION AND PATHOGENIC VARIANTS:
  5.4 DEFINING SUBTYPES OF THE AUTISM SPECTRUM
  5.5 DEFINING A GRADIENT OF SIMPLEX AND MULTIPLEX AUTISM
  5.6 UNDERSTANDING COMPLEX GENETIC ETIOLOGIES AT A FAMILY LEVEL
  5.7 FUTURE DIRECTIONS
REFERENCES
WEB AND SOFTWARE RESOURCES
APPENDICES
LIST of FIGURES

Chapter 1
  Figure 1.1: Estimating the number of ASD/ID risk genes
  Figure 1.2: Location of de novo truncating mutations in ASD and ID genes
  Figure 1.3: Genes disrupted by de novo mutations form a connected network
  Figure 1.4: CHD8 and CTNNB1 putative regulatory network for head size

Chapter 2
  Figure 2.1: Method overview and CNV discovery
  Figure 2.2: CNP locus genotyping of RHD and C4A
  Figure 2.3: Genotyping accuracy across 62 CNP loci

Chapter 3
  Figure 3.1: Discovery and validation of previously undiscovered CNVs using exomes
  Figure 3.2: Increased inherited CNV burden in ASD probands for large and small CNVs
  Figure 3.3: Inherited CNV burden correlates with SRS phenotype
  Figure 3.4: Genes in proband-only CNVs from SRS-discordant quads are more likely brain-expressed
  Figure 3.5: A combined model of inherited and de novo mutations

Chapter 4
  Figure 4.1: Network of genes with recurrent de novo hits
  Figure 4.2: Transmission disequilibrium of SNVs in ASD
  Figure 4.3: Transmitted mutations and their effect on phenotype
  Figure 4.4: Convergence of de novo and inherited mutations on CSMD1
  Figure 4.5: Combined risk model for SNVs and CNVs, inherited and de novo

Chapter 5
  Figure 5.1: High genomic GC nucleotide content (green histogram) hinders whole-exome sequencing
  Figure 5.2: Number of trios sequenced and expected rate for ASD-implicated genes
LIST of TABLES

Chapter 1
  Table 1.1: Six recent family-based exome studies of ASD and ID
  Table 1.2: Recurrent disruptive mutations in ID and ASD

Chapter 2
  Table 2.1: Cohorts analyzed
  Table 2.2: Precision of exome-based CNV calls in HapMap samples
  Table 2.3: Precision of exome-based CNV calls in autism trios

Chapter 3
  Table 3.1: Summary of transmitted CNVs in 411 ASD quads
  Table 3.2: Summary of IQ and SRS burden
  Table 3.3: Selected inherited CNVs

Chapter 4
  Table 4.1: Genes with new recurrent de novo mutations
  Table 4.2: Summary of mutations in IGF-related ASD network
  Table 4.3: Converging evidence for RIMS1, CUL7 and CSMD1 from de novo and inherited mutations
  Table 4.4: Summary of logistic regression model results
I. Introduction

1.1 Summary

Autism spectrum disorder (ASD) and intellectual disability (ID) are neurodevelopmental disorders with large genetic components, but identification of pathogenic genes has proceeded slowly because hundreds of loci are involved. This introduction describes how new exome sequencing technology has identified novel rare variants and has found that sporadic cases of ASD/ID are enriched for disruptive de novo mutations. Targeted large-scale resequencing studies have confirmed the significance of specific loci, including chromodomain helicase DNA binding protein 8 (CHD8), sodium channel, voltage-gated, type II, alpha subunit (SCN2A), dual specificity tyrosine-phosphorylation-regulated kinase 1A (DYRK1A), and beta-catenin (CTNNB1). I review recent studies and suggest that they have led to a convergence on three functional pathways: (i) chromatin remodeling; (ii) Wnt signaling during development; and (iii) synaptic function.
The heritability and genetic etiology cannot be fully explained by de novo events, and I suggest that additional underlying genetic etiologies, mechanisms and effects also play a role in ASD. I describe three major topics of this thesis: (i) the development of new tools to more completely assay genetic variants using next-generation sequencing data (Chapter II), (ii) describing the role of inherited copy number variants (CNVs) in the risk of ASD (Chapter III), and (iii) exploring how multiple types of mutations, including inherited and de novo SNVs and CNVs, contribute to ASD (Chapter IV).

This introduction has been adapted from: Krumm, N., O'Roak, B. J., Shendure, J., & Eichler, E. E. (2014). A de novo convergence of autism genetics and molecular neuroscience. Trends in Neurosciences, 37(2), 95-105. doi:10.1016/j.tins.2013.11.005.
1.2 Autism spectrum disorder

Autism spectrum disorder (ASD) is a common (frequency 1/88 to 1/250 births) disorder that results in three distinct phenotypes which manifest early in development: 1) deficits in language ability, 2) impairment in reciprocal social interaction and communication, and 3) repetitive or stereotyped movements and interests. ASD encompasses a range of disorders, including Asperger's syndrome and Autistic Disorder itself. ASD and related disorders can commonly present with comorbidities of intellectual disability or developmental delay and epilepsy, but these are neither requirements for diagnosis nor observed in all cases.

The etiology of ASD has a strong genetic component, and the heritability of these disorders is widely estimated to be 80-90% based on studies comparing concordance in monozygotic and dizygotic twins (Lichtenstein et al. 2010; Bailey et al. 1995), although larger and more recent studies based on a broader set of familial relationships have shown significantly lower estimates of ~50% (Hallmayer et al. 2011). Additional evidence for a genetic etiology stems from increased proportions of ASD diagnoses in monogenic developmental disorders such as Fragile X and tuberous sclerosis, the identification of genes in families with Mendelian forms of autism, an increased rate of sibling recurrence (~8% for boys), and observation of a "Broad Autism Phenotype" in (unaffected) parents of children with ASD.

Understanding the genetic etiology of ASD has several important motivating factors. First, identification of specific genes (and mutations) underlying the pathogenesis of ASD will enable a new era of genetic diagnosis and classification of ASD. The observed ASD phenotypes are extremely heterogeneous, making diagnostic criteria based on behavior and phenotype necessarily broad as well. An understanding of the specific genes that cause ASD will enable a much more precise diagnosis, or refinement of an initial diagnosis.
Second, it is likely the genetic etiology of ASD encompasses two different broad mechanisms of genetic mutations: The first are so-called de novo mutations, which are 
present in offspring but not their parents and arise either in the parent's germline or early in development; the second type of mutation is an inherited (or transmitted) mutation from parents to offspring. Both types of mutations are likely to contribute to ASD, and an understanding of their importance and pathogenic potential will provide a crucial "big picture" understanding of how genetics affect ASD risk.

Finally, an understanding of both the genes and the broader genetic mechanisms in ASD is requisite for the identification and development of therapeutics. Specifically, therapeutic development will follow the ability to identify genetic subtypes of ASD (both defined by gene and genetic etiology or mechanism) and will build on an understanding of the function of individual genes identified. The often severe and debilitating nature of the ASD phenotype, combined with its high population prevalence, provides strong impetus to study and understand this disorder.

1.3 Identifying genes and pathways involved in ASD and ID

The identification of genes underlying ID and ASD has been most successful for syndromic Mendelian or monogenic disorders, for example, FMR1 (Fragile X syndrome, Fu et al. 1991), MECP2 (Rett syndrome, Amir et al. 1999) or UBE3A (Angelman syndrome, Matsuura et al. 1997). Together, however, these syndromes are estimated to account for less than 10% of ASD/ID, suggesting the presence of additional genes and etiologies. Initial population-based studies failed to identify single genes of major effect and few major common risk variants have been replicated, despite the strong observed heritability of these diseases (Steffenburg et al. 1989; Bailey et al. 1995; Hallmayer et al. 2011; Constantino et al. 2013). By contrast, targeted and genome-wide microarray studies revealed that large de novo CNVs were significantly enriched among probands when compared to unaffected siblings and/or controls (Marshall et al. 2008; Sebat et al. 2007; de Vries et al.
2005; Levy et al. 2011; Sanders et al. 2011; Cooper et al. 2011; Sharp et al. 2006), a finding that echoed the earlier discovery of large chromosomal aberrations in ASD and ID. Both initial and subsequent higher-resolution studies estimate that 8% of sporadic ASD cases carried a de novo CNV, as compared with only 2% of unaffected siblings (Levy et al. 2011; Sanders et al. 2011). Furthermore, among children 
with general developmental delay (DD) and ID, rare large de novo CNVs are thought to account for up to 15% of disease burden (Cooper et al. 2011). Although individually rare, some of these CNVs were in fact recurrent mutations, mediated by locus-specific genomic instability (Sharp et al. 2006), and many of these same recurrent CNVs observed initially in patients with ID (Sharp et al. 2008) or ASD (Miller et al. 2009) have been identified in adults with epilepsy (Helbig et al. 2009), bipolar disorder (Ben-Shachar et al. 2009), or schizophrenia (Stefansson et al. 2008; International Schizophrenia Consortium 2008), suggesting overlap in the genetic etiology of these disorders.

The discovery of an aggregate burden of large de novo CNVs and the identification of recurrent events signaled a new paradigm for ASD and ID genetics. While specific CNVs are individually rare, combined they account for a significant fraction of cases, indicating the presence of considerable locus heterogeneity of ASD and ID. The de novo nature of these CNVs, together with their absence in the general population, suggests they represent a class of highly deleterious and highly penetrant mutations. Their underlying genetic model does not explicitly fit a recessive model of disease, since CNVs are primarily present as hemizygous deletions or duplications. These mutations alter the dosage of genes but do not completely abolish their presence. Collectively, these observations support a complex disease/rare variant model for ASD, in which a proportion of etiologic risk is conferred by very rare variants and de novo mutations.

The commoditization of next-generation or "massively parallel" sequencing represents a turning point in human genetics and makes it possible to discover sequence-level variants across nearly all coding regions ("the exome") or the whole genome. These methods were first applied to confirm point mutations underlying Mendelian disorders (Ng et al.
2009c), and subsequent pilot studies demonstrated that family-based (trio) exome sequencing could discover pathogenic mutations in simplex ID (Vissers et al. 2010) or ASD (O'Roak et al. 2011). In the past year, this paradigm of de novo mutation discovery using exome sequencing of parent-child trios has been expanded to about one thousand ASD or ID families, resulting in the first detailed picture of how de novo coding mutations contribute to these disorders. 
In this introduction, I synthesize the results of recent large-scale exome sequencing studies of ASD and ID (O'Roak et al. 2012b; Neale et al. 2012; Iossifov et al. 2012; Sanders et al. 2012; de Ligt et al. 2012; Rauch et al. 2012) and summarize their implications for human neurodevelopmental genetics. There are three themes: 1) Exome sequencing of ASD/ID families has revealed a significant excess of de novo mutations in probands when compared to unaffected siblings and has identified novel candidate genes contributing to the neurological deficits. I note that the strongest effects are observed for de novo loss-of-function (or truncating) mutations, which prematurely truncate the protein due to frameshift and nonsense mutations. 2) Both CNV and exome sequencing data suggest that no single gene will account for more than 1% of autism cases; rather, rare mutations in hundreds of genes may contribute to ASD or ID. 3) Analyses of network connectivity further implicate potentially important neurodevelopmental and synaptic pathways in ASD and ID.

Study                       Disorder  Design  N      Synonymous (b)    Missense (c)      Nonsense, splice site, indels
Iossifov et al. 2012        ASD (a)   Quads   343    79:69             207:207           59:28
Sanders et al. 2012         ASD (a)   Quads   200    29:39             110:82            15:5 (c)
O'Roak et al. 2012b         ASD (a)   Quads   50     14:16             40:31             6:3
                                      Trios   159    54                115               29
Neale et al. 2012           ASD       Trios   175    50                101               18
de Ligt et al. 2012         ID        Trios   100    16                48                14
Rauch et al. 2012           ID        Trios   51     11                56                20
Sum                                           1,078  122:124           357:320           80:36
Odds ratio (ASD) (95% CI)                            0.98 (0.73-1.31)  1.29 (1.01-1.63)  2.41 (1.58-3.75)

(a) Trios/quads from the Simons Simplex Collection (SSC).
(b) Counts refer to the number of mutations in probands or, when separated by a colon, to probands and siblings (e.g., probands:siblings).
(c) Not including indels.

Table 1.1: Six recent family-based exome studies of ASD and ID
1.4 An increase in de novo loss-of-function mutations

Both de novo CNVs and single nucleotide variants (SNVs) can have, in principle, similarly disruptive effects on genes. Crucially, however, the detection of de novo SNVs yields gene-level specificity, thus allowing individual pathogenic genes and neurobiological pathways to be identified. Moreover, a small subset of the de novo mutations (~4% for unaffected and ~9% for affected children; Iossifov et al. 2012; Sanders et al. 2012) are disruptive (e.g., frameshift, premature stop codon, splice-donor defect) with respect to the protein's biological function. Recurrent mutations of this type for a specific gene can strengthen the probability that the de novo mutation relates to phenotype. Since de novo protein-altering SNVs are collectively more common mutation events (~1/generation) than large de novo CNVs (~0.02/generation), there is the exciting possibility that this type of mutation may explain a larger fraction of the genetic etiology of ASD.

In all, six recent exome sequencing studies of trios (mother, father and affected child) and quads (also includes an unaffected sibling) of sporadic ASD (Iossifov et al. 2012; Sanders et al. 2012; O'Roak et al. 2012b; Neale et al. 2012) or sporadic ID (de Ligt et al. 2012; Rauch et al. 2012), together comprising 1,078 families (Table 1.1), have been performed. Three of the ASD studies included unaffected siblings in order to compare mutation rates between affected probands and siblings. While these studies found a slightly elevated rate of mutation in probands versus their unaffected siblings (1.02 vs. 0.79 mutations per offspring), the type of mutation was critical: probands had two- to threefold more disruptive de novo mutations in comparison to their siblings, or to a random model of mutation (Sanders et al. 2012; Iossifov et al. 2012).
Overall, among the 593 ASD quads, there were 80 such mutations in probands with ASD, but only 36 in siblings (OR = 2.41, p < 1 x 10^-4, Fisher's exact test; Table 1.1). The reported enrichment of missense mutations in probands has been less robust, with study estimates for enrichment between 1- and 1.34-fold, but analysis of all quads does show weak statistical enrichment (OR = 1.29, p = 0.03). It is likely that some missense mutations are pathogenic while others are benign, a distinction likely dependent on the context of the mutation and affected proteins themselves.
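The odds ratios quoted above can be approximately reproduced from the pooled quad counts in Table 1.1. The sketch below is illustrative only: it assumes that each counted mutation falls in a different child, so that a count can be read as the number of children carrying at least one such mutation among the 593 probands or 593 siblings. The function names are hypothetical, and the Fisher's exact test is a plain standard-library implementation, not necessarily the procedure used in the cited studies.

```python
from math import comb

def odds_ratio(case_hits, case_total, control_hits, control_total):
    """Odds ratio for carrying a mutation, cases versus controls."""
    a, b = case_hits, case_total - case_hits
    c, d = control_hits, control_total - control_hits
    return (a * d) / (b * c)

def fisher_exact_two_sided(case_hits, case_total, control_hits, control_total):
    """Two-sided Fisher's exact p-value for a 2x2 table (exact integer math)."""
    hits = case_hits + control_hits
    total = case_total + control_total
    denom = comb(total, case_total)

    def prob(k):
        # Hypergeometric probability that k of all carriers are cases.
        return comb(hits, k) * comb(total - hits, case_total - k) / denom

    k_min = max(0, case_total - (total - hits))
    k_max = min(hits, case_total)
    p_obs = prob(case_hits)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(k_min, k_max + 1))
               if p <= p_obs * (1 + 1e-9))

QUADS = 593  # ASD quads pooled in Table 1.1 (343 + 200 + 50)
# Pooled proband:sibling counts from the "Sum" row of Table 1.1.
pooled = {"synonymous": (122, 124), "missense": (357, 320), "disruptive": (80, 36)}

for label, (probands, siblings) in pooled.items():
    ratio = odds_ratio(probands, QUADS, siblings, QUADS)
    p_value = fisher_exact_two_sided(probands, QUADS, siblings, QUADS)
    print(f"{label}: OR = {ratio:.2f}, p = {p_value:.2g}")
```

Under the one-mutation-per-child assumption, the three odds ratios round to 0.98, 1.29 and 2.41, matching the last row of Table 1.1. The p-values depend on that assumption and on the two-sided convention chosen here, so they should be read as approximate.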
Overall, these studies suggest that protein-truncating de novo SNVs contribute to the risk of ASD for about 10-15% of probands (Sanders et al. 2012; Iossifov et al. 2012), though this fraction is almost certainly a conservative estimate, as an unknown fraction of de novo events are still missed using current sequencing methods and bioinformatics tools. It is important to note that the six current exome studies focused primarily on a de novo mutation genetic model for the development of disease. Recent results highlight the effect of transmitted CNVs (Poultney et al. 2013; Krumm et al. 2013), as well as a renewed emphasis on the effect of common variation in ASD (Klei et al. 2012) based on study of data generated from the same samples.

1.5 Candidate genes yet few recurrent hits

Many strong neurobiological candidates have emerged from the genes disrupted by de novo mutations in these studies, including mutations in previously identified ASD/ID genes. Several mutations were identified in NRXN1 and NLGN1; both are central components of the neurexin-neuroligin synaptic cell adhesion complex (Kim et al. 2008). Numerous de novo mutations were identified in genes or loci linked to Mendelian disorders, many of which have features of ASD or ID. These loci and genes include MBD5 (mental retardation, autosomal dominant #1, OMIM 156200), CHD7 (CHARGE syndrome, OMIM 214800), PTEN (Cowden syndrome, OMIM 158350), DYRK1A (in Down syndrome critical region, OMIM 190685), TSC2 (tuberous sclerosis, OMIM 613254), SETBP1 (Schinzel-Giedion syndrome, OMIM 269150), and RPS6KA3 (Coffin-Lowry syndrome, OMIM 303600). Finally, mutations in several genes mapped to critical deletion regions or association intervals initially discovered by large CNVs, including mutations in SYNRG (17q12 deletion syndrome), POLRMT (19p13.3 deletion), and CTTNBP2, a potential candidate for the AUTS1 (7q31) deletion locus.

Recurrently mutated genes, however, were few.
In sum, only three genes (CHD8, SCN2A, and SYNGAP1) had two independent truncating de novo mutations in any single study, and no gene had more than three mutations. Models designed to estimate the significance of recurrent de novo mutations based on gene size and context found 
nominal significance for CHD8, NTNG1 (O'Roak et al. 2012b), and SCN2A (Sanders et al. 2012), but most genes could not be conclusively implicated. Notably, however, a review of case-control data by Neale and colleagues of 935 cases found three additional truncating CHD8 mutations and one splice-site mutation in SCN2A, further strengthening the initial disease associations of these genes (Neale et al. 2012). In addition, within a few weeks of these initial reports, a de novo translocation was discovered mapping to CHD8 (Talkowski et al. 2012).

1.6 Large-scale resequencing of candidate genes

Given the low rate of recurrence among genes with de novo mutations, estimates of overall locus heterogeneity for ASD have yielded between 300 and 1,000 genes that could confer increased ASD risk when subjected to de novo mutation (Figure 1.1). Even if exome sequencing prices continue to fall, the cost to confirm the association for a significant fraction of these genes remains impractically high, especially if thousands or tens of thousands of samples are required as has been suggested by CNV studies. Instead, targeted next-generation resequencing of candidate genes has proven to be instrumental in associating specific genes. In particular, de Ligt and colleagues resequenced five candidate genes in a confirmation series of 765 ID patients, identifying additional mutations in CTNNB1 and GATAD2B and markedly strengthening their association with ID. Similarly, we have successfully used a molecular inversion probe (MIP) assay to capture and sequence 44 candidate genes in 2,446 ASD probands (O'Roak et al. 2012a). MIP resequencing generates complete sequence across targeted regions, can be performed at high scale and low cost (under $1 per gene per sample), and delivers higher sensitivity for targeted loci than exome sequencing due to increased sequence coverage.
Altogether, this assay yielded 27 new de novo mutations across 16 genes; of these mutations, 17 were disruptive SNVs, a fraction higher than expected by chance. The discovery of these mutations confirmed the association with ASD for CHD8 and DYRK1A and provided significant statistical support for four novel genes: GRIN2B, TBR1, PTEN, and TBL1XR1. 
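The locus heterogeneity estimates cited in this section (Figure 1.1) rest on a "hidden species" style argument: the fewer genes that are hit more than once, relative to genes hit exactly once, the larger the pool of unseen risk genes must be. The sketch below illustrates that logic with the classical Chao1 estimator as a stand-in; it is not the adaptation used to produce Figure 1.1, whose details are not given in this section. The function name, the simplification that every recurrent gene was hit exactly twice, and the gene counts in the usage lines are all hypothetical.

```python
def chao1_gene_estimate(singleton_genes, recurrent_genes, pathogenic_fraction):
    """Chao1-style estimate of the total number of risk genes.

    singleton_genes:     genes with exactly one de novo mutation
    recurrent_genes:     genes with exactly two de novo mutations
                         (all are treated as pathogenic)
    pathogenic_fraction: assumed share of singleton hits that are pathogenic
    """
    if recurrent_genes <= 0:
        raise ValueError("at least one recurrently mutated gene is required")
    f1 = singleton_genes * pathogenic_fraction  # pathogenic singletons
    f2 = recurrent_genes                        # doubletons
    observed = f1 + f2
    # Chao1: observed classes plus an estimate of the classes not yet seen.
    return observed + (f1 * f1) / (2 * f2)

# Hypothetical counts, chosen only to show how the estimate responds to the
# assumed pathogenic fraction; they are not taken from the studies above.
for fraction in (0.15, 0.5, 1.0):
    estimate = chao1_gene_estimate(100, 5, fraction)
    print(f"pathogenic fraction {fraction:.2f}: ~{estimate:.1f} genes")
```

With these made-up inputs the estimate rises from about 42 genes at a 15% pathogenic fraction to about 1,100 when every singleton is counted, which mirrors the qualitative behaviour described for Figure 1.1: counting more singleton mutations as pathogenic sharply increases the inferred number of risk loci.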
 
Figure 1.1: Estimating the number of ASD/ID risk genes. We estimate the number of ASD and ID genes, using an adaptation of the "hidden species problem" based on the ratio of genes with multiple de novo mutations to all genes with de novo mutations. For each estimate, all genes with recurrent de novo mutations are considered pathogenic, as well as a defined fraction of mutations in genes observed just once (since all de novo mutations are unlikely to be pathogenic). Including more of these singleton mutations as pathogenic, as well as including a broader range of mutation type, exponentially increases the number of ASD and ID risk loci. Thus, considering a disease model in which 15% of all truncating de novo mutations are sufficient and pathogenic, only ~50 genes are expected to be similarly sufficient in their pathogenicity; however, by including missense mutations, the number of loci rises dramatically (to over 400 loci when 15% of de novo SNVs are considered pathogenic). Taken together, this model highlights the locus heterogeneity underlying the genetic etiology of ASD and ID and suggests that the etiology of a large proportion of ASD/ID cases may not be due to a single de novo mutation (truncating or missense); rather, these cases may be the result of a complex set of interactions between multiple mutations, including SNVs, indels, and CNVs. Shaded area indicates 95% confidence intervals around estimate.

In sum, when considering only protein-disruptive mutations from six exome sequencing studies (four ASD and two ID) and including the resequencing of some of these candidate genes, a set of 11 genes (Table 1.2) show enrichment in cases with ASD/ID and account for approximately 2.2% of all cases. We have summarized the distribution of mutations, as well as the prevalence and coding location of mutations found in exome sequence from 6,503 samples from the NHLBI Exome Sequencing Project (ESP) in Table 1.2.
For several of these genes with recurrent de novo hits in ASD probands (CHD8, GRIN2B, 
DYRK1A), no truncating variants were observed in the ESP. Furthermore, while control mutations are sometimes found in genes in high frequency (e.g., frameshift in SYNGAP1 at 3.2% frequency in controls), these mutations are found exclusively near the carboxy terminus of the protein and outside of functional protein domains and are unlikely to affect protein function (Figure 1.2). 
Gene      ID Cases  ASD Cases  Summary                          ESP Variants  ESP Frequency
CHD8      -         9/2,446    2 (O), 7 (O*) [+ 3 (N*)]         0             0/6503
SCN2A     3/151     2/593      1 (L), 2 (R), 2 (S) [+ 1 (N*)]   1             7/6503
SYNGAP1   3/151     -          1 (L), 2 (R)                     1             207/6503 (a)
GRIN2B    -         3/2446     1 (O), 2 (O*)                    0             0/6503
DYRK1A    -         3/2446     1 (I), 1 (O), 1 (O*)             0             0/6503
ZNF292    1/151     1/593      1 (L), 1 (N)                     1             2/6503
POGZ      -         2/593      1 (I), 1 (N)                     1             1/6503
KATNAL2   -         2/593      1 (O), 1 (S)                     1             1/6503
TBR1      -         2/2446     1 (O), 1 (O*)                    0             0/6503
CTNNB1    1/151     1/2446     1 (L), 1 (O*), [+ 1 (L*)]        0             0/6503
SETBP1    1/151     1/593      1 (O), 1 (R)                     3             58/6503 (a)
ADNP      -         2/2446     1 (O), 1 (O*)                    1             1/6500
LRP2      1/151     1/593      1 (I), 1 (L)                     6             53/6500
ARID1B    -         2/2446     1 (O), 1 (O*)                    5             314/6500

Table 1.2: Recurrent disruptive mutations in ID and ASD. Genes with two or more de novo truncating mutations observed in studies of ASD or ID. Summary indicates studies in which mutations were discovered. I, Iossifov et al. (Iossifov et al. 2012); L, de Ligt et al. (de Ligt et al. 2012); N, Neale et al. (Neale et al. 2012); O and O*, O'Roak et al. (O'Roak et al. 2012a; 2012b); R, Rauch et al. (Rauch et al. 2012); S, Sanders et al. (Sanders et al. 2012). Mutations found in secondary replication screens or case-control studies indicated in [brackets] with starred (*) reference. Truncating events found in the ESP database and their population frequencies are shown. (a) The truncating variants found in the EVS database in SYNGAP1 and SETBP1 genes fell at the extreme 3' end of the gene, suggesting that they do not adversely affect gene function.
Figure 1.2: Location of de novo truncating mutations in top five ASD and ID genes. Red markers indicate locations of de novo mutations in ASD and ID cases; green markers indicate locations of truncating mutations in ESP database of over 6,500 samples (see Table 1.2 for details). Mutation codes: S, Stop-gain; Fs, Frameshift; Ss, Splice-site mutation; ΔAA, amino-acid loss (non-frameshifting). Blocks indicate annotated protein domains from UniProt. Domain names, top to bottom: CD, chromodomain; DEX, Helicase ATP-binding; HELC, Helicase C-terminal; TM, transmembrane domain; IQ, IQ domain; PDZ, PDZ-binding motif; LOC, Bipartite nuclear localization signal; STK, Serine/Threonine protein kinase; PH, Pleckstrin homology domain; C2, C2 domain; SH3, SRC Homology 3 Domain.

1.7 Novel candidates and their neurobiology

Many of the top genes from recent exome studies are novel candidates for ASD and ID, including the strongest overall association: CHD8, an ATP-dependent chromodomain helicase that directly regulates CTNNB1 (Beta-catenin; Nishiyama et al. 2012) as well as the p53 pathway (Nishiyama et al. 2009). The CHD8 protein has known binding activity with another chromodomain helicase, CHD7 (Batsukh et al. 2010), which is the key protein in CHARGE syndrome, a rare syndrome with high ASD comorbidity (Betancur
2011). In addition to directly interacting, both are homologues of the Drosophila trithorax group protein kismet and are components of large chromatin remodeling complexes thought to be important in neural crest cell differentiation (Bajpai et al. 2010). Overall, eight de novo truncating mutations were observed across 2,597 cases in this gene; in contrast, no such mutations were observed in control siblings, or in over 6,500 exomes in the ESP. The frequency of mutations in this gene is the highest of all genes screened thus far and nearly matches that of CNVs at 16p11.2, which is the most frequent recurrently deleted (0.5%) or duplicated (0.3%) locus in sporadic ASD (Kumar et al. 2009; Walsh and Bracken 2011).

The second strongest overall association, with two truncating mutations in ASD cases and three such mutations in ID cases, is SCN2A, a gene previously associated with epilepsy and seizure disorders (Kamiya et al. 2004; Ogiwara et al. 2009). SCN2A encodes a voltage-gated sodium channel (type II, alpha 1; Nav1.2) that is expressed throughout the brain and is responsible for the generation and propagation of action potentials in neurons. The phenotype associated with this gene appears to be highly variable. Given the smaller number of ID cases, the prevalence of mutations in SCN2A is higher in ID; however, one of the ID cases also shows signs of autism. Lastly, only one of the five cases had a history of seizures (de Ligt et al. 2012), suggesting that mutations in SCN2A have highly variable phenotypic outcomes.

Another striking candidate is DYRK1A (dual-specificity tyrosine phosphorylation-regulated kinase 1A), for which three truncating mutations have been discovered in autism probands (O'Roak et al. 2012b; 2012a) and de novo structural variants in ID probands (van Bon et al. 2011; Moller et al. 2008). DYRK1A is a highly conserved gene whose dosage imbalance has been implicated in the cognitive deficits associated with Down syndrome.
The gene interacts with the SWI/SNF complex (Lepagnol-Bestel et al. 2009) and is considered a master regulator of brain growth, affecting diverse aspects of neurogenesis, including neuronal proliferation, morphogenesis, differentiation, and maturation (Mazur-Kolecka et al. 2012; Guedj et al. 2012; Yabut et al.). Mutations in the Drosophila ortholog (mnb) have been known for more than 20 years and result in a 
"minibrain phenotype" where optic lobes and central brain hemispheres are reduced (Tejedor et al. 1995). Similarly, heterozygous Dyrk1A knockout mice (+/-) show a region-specific reduction in brain volume as well as mental impairment (Fotaki et al. 2002; Song et al. 1996). Consistent with these models, all three human autism patients with loss-of-function mutations are cognitively impaired and microcephalic (z-score < -2).

Three truncating mutations each of GRIN2B and SYNGAP1, and two mutations of TBR1, highlight the importance of excitatory/glutamatergic signaling in both ASD and ID; these are perhaps some of the most conclusive previously implicated genes to date. GRIN2B (found in an ASD case with ID) forms a subunit of an NMDA receptor associated with learning and memory, and targeted sequencing has linked it to neurodevelopmental disorders beyond its discovery in ASD (Endele et al. 2010). This receptor participates in a larger postsynaptic complex with SYNGAP1, in which three mutations in ID patients have been observed in the present cohorts, as well as in multiple previous screens of ID (Vissers et al. 2010; Hamdan et al. 2009). Interestingly, while no mutations of SYNGAP1 were found in the Simons Simplex Collection (SSC) ASD cohorts, SYNGAP1 mutations were recently implicated in several cases of ID with ASD (Berryer et al. 2012). Finally, TBR1, together with the CASK protein, regulates transcription of GRIN2B (as well as several other candidate ASD genes, such as RELN and AUTS2) (Bedogni et al. 2010).

1.8 Protein interaction networks converge on common pathways
Knowledge of molecular-level interaction between proteins has enabled the development of transcriptional networks (Voineagu et al. 2011) and protein-protein interaction (PPI) networks enriched for mutation in ASD and ID cases.
These networks provide a powerful method to unify the landscape of mutations observed in genetically heterogeneous human disorders by leveraging regulatory interactions between genes and/or physical interactions between proteins. For example, Iossifov et al. found that 14/59 genes disrupted by de novo mutations were significantly enriched (p < 0.006) in a group of 842 genes previously defined (Darnell et al. 2011) as regulated by FMR1, the key protein disrupted in Fragile X syndrome, and noted that this was not true for mutations found in siblings (2/28 part of FMR1-regulated genes), the general population, nor for all missense 
variants (Iossifov et al. 2012). Neale et al. applied a similar analysis to previously identified ASD and ID risk genes, including a core set of 31 synaptic genes identified from previous proteomic studies, and found that genes with nonsynonymous de novo mutations had a significantly reduced network distance (i.e., they were more closely associated in the network) than a set of "comparator" genes derived from silent de novo mutations and sibling mutations (Neale et al. 2012). Lastly, we developed a network for interactions between proteins corresponding to genes with de novo mutations, revealing a single connected component for 39% (49/126) of genes with disruptive or likely disruptive missense de novo mutations (O'Roak et al. 2012b). Notably, in follow-up MIP resequencing, we targeted ~50% network and ~50% non-network genes and found that 94% (16/17) of the newly discovered truncating mutations fell within the network (or a similar, expanded 74-gene network), an observation unlikely to have occurred by chance (p = 0.0002). In contrast, the non-network genes had only six total mutations, only one of which was a truncating mutation.

I integrated the results from the six exome studies by forming PPI networks using experimentally verified interaction data from StringDB (Jensen et al. 2009) (Supplemental Methods). I found that the PPI network based on all truncating and missense mutations in probands was significantly more clustered, had more edges, and created larger connected components than randomly sampled or permuted networks (p ≤ 0.009 for all tests; Supplemental Methods); in contrast, neither the genes with mutations in siblings nor those with synonymous mutations (in either probands or siblings) showed any difference from the null distribution of networks (Table S1).
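The comparison against randomly sampled networks described above can be illustrated with a minimal sketch. It is not the analysis from the Supplemental Methods: the gene names (G0 to G39), the toy edge list, the choice of largest connected component as the only statistic, and uniform random sampling as the null model are all assumptions made for illustration.

```python
import random

def largest_component(genes, edges):
    """Size of the largest connected component among `genes`,
    using only edges whose two endpoints are both in `genes`."""
    genes = set(genes)
    adj = {g: set() for g in genes}
    for a, b in edges:
        if a in genes and b in genes:
            adj[a].add(b)
            adj[b].add(a)
    seen, best = set(), 0
    for g in genes:
        if g in seen:
            continue
        stack, size = [g], 0
        seen.add(g)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for nb in adj[node] - seen:
                seen.add(nb)
                stack.append(nb)
        best = max(best, size)
    return best

def permutation_p(observed_genes, all_genes, edges, n_perm=1000, seed=0):
    """Empirical p-value: fraction of random gene sets of the same size
    whose largest component is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = largest_component(observed_genes, edges)
    n = len(observed_genes)
    hits = sum(
        largest_component(rng.sample(all_genes, n), edges) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy network: one dense module plus a few unrelated edges.
all_genes = [f"G{i}" for i in range(40)]
edges = [("G0", "G1"), ("G1", "G2"), ("G2", "G3"), ("G3", "G4"),
         ("G4", "G5"), ("G0", "G5"),
         ("G10", "G11"), ("G20", "G21"), ("G30", "G31")]
mutated = ["G0", "G1", "G2", "G3", "G4", "G5"]

observed, p_value = permutation_p(mutated, all_genes, edges)
```

In this toy case the mutated genes form a single component that random gene sets of the same size essentially never reproduce, which is the kind of signal the permutation comparison is meant to detect.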
In order to summarize these PPI networks, I connected all truncating mutations as well as six genes with missense mutations with important roles in brain development (Figure 1.3; Supplemental Methods). The two largest connected components of this combined network encompass three broad functional pathways: the first connected component (13 proteins) forms a highly interconnected set of postsynaptic scaffolding proteins and receptors, including SYNGAP1, DLG4, GRIN2A/B, NLGN1 and NRXN1, while the second (9 proteins) contains both WNT-signaling functions of CTNNB1, DLL1, and 
TBL1XR1 and chromatin remodeling functions, anchored by the CHD8 protein. It is important to emphasize that while the nodes in the displayed network are partially based on a manually selected set of genes, the connected components formed are a strict subset of the unbiased PPI simulations described above and are larger than any connected component that can be formed using disruptive mutations found in siblings, synonymous changes, or randomly chosen genes. 
 Figure 1.3: Genes disrupted by de novo mutations in ASD and ID form a central connected network. Genes with de novo truncating mutations (red nodes) or selected missense mutations (blue nodes) in four ASD exome studies and two ID exome studies are connected using experimentally derived PPI data from StringDB. Only medium- and high-confidence experimental interactions are shown, though we note that these may not always represent local protein-protein interactions or interactions within the same subcellular compartment. Peripheral nodes (lighter shades) represent genes with additional truncating de novo mutations, which are separated from the central network by only a single node (white nodes; for this analysis we excluded SUMO1/SUMO2 and UBC, which are highly connected but nonspecific nodes).

Interestingly, Gilman et al., using a novel probabilistic framework (NETBAG) in conjunction with CNV data from SSC families, highlighted several genes and pathways that anticipated, and overlap remarkably with, those found in the present exome-based studies (Gilman et al. 2011). In particular, their model showed enrichment of the 
canonical WNT pathway, postsynaptic complexes, and dendritic spine development (e.g., DLG4, SYNGAP1) and several proteins involved in chromatin remodeling, including BAZ1B and SMARCA2, both of which interact with the central nodes of the chromatin remodeling network (Figure 1.3).   The WNT pathway and chromatin remodeling modules of the network are linked by interaction between CHD8 and CTNNB1/Beta-catenin. Both of these proteins play important roles in neural development and growth: Beta-catenin, via downstream WNT pathways, influences neuronal migration, polarity and synaptogenesis (Salinas and Zou 2008), and constitutive overexpression of beta-catenin in mice results in macrocephaly (Chenn and Walsh 2003). CHD8 negatively regulates beta-catenin via direct binding and, furthermore, downregulates beta-catenin responsive genes by recruitment to their promoter regions (Thompson et al. 2008). Strikingly, ASD cases with truncating mutations in CHD8 have significant macrocephaly (O'Roak et al. 2012a), while all three cases with truncating mutations in beta-catenin have microcephaly (de Ligt et al. 2012; Bernier et al. 2014). These reciprocal phenotypes suggest that CHD8 and beta-catenin form a regulatory network that controls head size by altering neuronal migration and growth during development (Figure 1.4). Other proteins with de novo mutations in this network include TBL1XR1, which binds beta-catenin (Cadigan 2008), and DLL1, which is expressed in neural progenitor cells and part of the Delta/Notch signaling pathway (Barton and Fendrik 2013). 
 Figure 1.4: CHD8 and Beta-catenin/CTNNB1 putative regulatory network for head size. a) Truncating mutations in CTNNB1 (red arrows) and CHD8 (blue arrows) are found in patients with small and large head circumference, respectively. The gray histogram represents the background distribution of age- and sex-corrected head circumference Z-scores for 2,446 probands from the SSC. (The exact head circumference for one case [marked with *] with clinically reported microcephaly could not be determined, so the Z-score was estimated at -2.0, the clinical threshold for microcephaly.) b) A putative regulatory model of head growth in which CHD8 negatively regulates CTNNB1 (Thompson et al. 2008); CTNNB1 promotes head growth, and constitutive overexpression of CTNNB1 in mice results in macrocephaly (Chenn and Walsh 2003).

Convergence on a second common pathway, chromatin remodeling, has primarily been driven by overlap between genetic syndromes and de novo mutations in sporadic ASD and ID. As discussed, CHD8 possesses ATP-dependent chromatin remodeling activity and directly interacts with CHD7 (Batsukh et al. 2010), which is responsible for CHARGE syndrome, a complex syndrome in which up to two-thirds of patients have been found to have ASD (Betancur 2011). Several de novo missense mutations in ASD cases have been noted in genes encoding chromodomain helicase proteins, including CHD7 and CHD3, and a de novo frameshift in CHD2 was found by Rauch et al. in an ID case (Rauch et al. 2012). A second syndrome, Coffin-Siris syndrome (OMIM 135900), characterized by ID and severe speech delays, was recently attributed to truncating 
mutations or disruptive CNVs in ARID1B (encoding a subunit of the SWI/SNF chromatin remodeling complex; Santen et al. 2012), and one de novo frameshift of ARID1B was found in a sporadic ASD case (O'Roak et al. 2012b). Additional disruptive de novo mutations recognized in ASH1L, KDM6B, and MLL5 suggest that the chromatin remodeling activity of these proteins may be an underlying pathway implicated in ASD and ID (Iossifov et al. 2012). Finally, we note that mutations in KANSL1 (formerly KIAA1267), a histone acetyltransferase with similar p53 regulatory activity to CHD8, were recently found to underlie 17q21.31 microdeletion syndrome, in which ID is a characteristic feature (Koolen et al. 2012). No mutations in KANSL1 have been found in ASD cases, though this is likely due to the exclusion of known clinical syndromes from these cohorts.

In addition to these newly proposed pathways, de novo mutations also highlight the importance of genes with roles in synaptic function and localization, a pathway previously suspected to be disrupted in ASD (Glessner et al. 2009). Many of these genes with de novo mutations form a closely related network of postsynaptic proteins, including the GTPase activating protein SYNGAP1, NMDA receptor subunits GRIN2B and GRIN2A, the scaffolding proteins DLG4 and CASK (mutations of which underlie CASK syndrome, OMIM 300749), and NRXN1, which has been previously associated with ASD (Kim et al. 2008). In conjunction with TBR1, CASK also transcriptionally activates several known neurodevelopmental genes, such as RELN, a gene with critical roles in neuronal development, synaptogenesis, and plasticity (Wang et al. 2004). Finally, this pathway is closely linked to SHANK3, a previously identified ASD protein with up to 1% mutation frequency in ASD cases (Durand et al. 2007; Moessner et al. 2007), although no mutations in this gene have been identified in the six studies presented here. 
While the reasons for this are not fully clear, it is likely that the high GC content of the gene impedes current short-read sequencing platforms.

Interaction networks (Figure 1.3) can also suggest novel targets for mutation screens or functional studies. For example, while DLGAP1 plays a central role in connecting the "Synaptic Function" component to beta-catenin, no mutations have been observed in 
DLGAP1. Similarly, SMARCA4 connects BRWD1 to the in-network ADNP protein. These proteins, as well as other "nearby" proteins suggested by PPI networks, can provide novel targets for mutation screens and deeper functional/pathway study. It is likely that sequencing studies of patients will identify novel candidates for PPI networks, creating an iterative process in which networks and genetics inform each other.

Despite their widespread role in the current study of ASD and ID, PPI networks have several important limitations. First, protein interactions are difficult to assay experimentally and are often not assayed at a proteome-wide scale, resulting in false negatives and false positives in databases. In addition, the extent to which the temporal and spatial nature of interactions is captured is also limited, and in our network we do not distinguish between different interaction types (regulatory or physical) or cellular compartments. For example, while CASK binds NRXN1 presynaptically (Fairless et al. 2008), binding to the transcription factor TBR1 is in the nucleus (Hsueh et al. 2000). Second, although our PPI network only uses experimentally verified interactions, the impact and weight of interactions can vary considerably for different nodes, especially for "hub" nodes, which can interact with hundreds of other proteins. Finally, current PPI networks do not take into account the functional impact of mutations on the proteins or the interactions themselves.

1.9 Assaying copy number variation
Recent advances in technology have made possible genome-wide discovery of rare CNVs and the estimation of copy number for copy number polymorphisms (CNPs). Most commonly, array comparative genomic hybridization (array-CGH; Pinkel et al. 1998) or single nucleotide polymorphism (SNP) microarray platforms (Peiffer et al. 2006) have been used to interrogate the copy number of thousands to millions of positions within the genome. 
Briefly, these technologies work by hybridizing fluorescent-labeled DNA to targeted oligonucleotide probes on a glass slide and use a scanner to measure the hybridization intensity of each fragment. Copy number variation increases or decreases this intensity, either relative to a control sample (for array-CGH) or in absolute units (for 
SNP microarray platforms); a CNV can be inferred ("called") from the presence of multiple nearby probes with increases or decreases in signal.

However, microarray-based technology has several significant limitations. First, the minimum size of a detectable CNV is limited by the number of probes printed on the microarray and further by the fact that generally at least 20 probes are required to distinguish a true CNV from random noise. Thus, even with high-resolution microarrays of 1 million or more probes, studies have typically examined CNVs greater than 50 or 100 kbp in size. A second limitation of microarray platforms is their poor performance within segmental duplications in the human genome (Bailey et al. 2002; Sudmant et al. 2010). Thus, the majority of studies to date have excluded the analysis of duplicated and CNP loci.

The advent of whole-genome sequencing using "next-generation" short-read technology has resulted in the development of several additional methods for CNV/CNP discovery and genotyping. Two broad approaches exist. The first comprises read-depth methods, which leverage the relatively uniform distribution of sequenced reads throughout the genome (the "read-depth") to estimate copy number; these methods can estimate the copy number of genomic segments as small as 1 kbp (Alkan et al. 2009; Yoon et al. 2009; Chiang et al. 2008). The second class of CNV discovery tools is based on the "paired-end" alignment of sequences from the two ends of a fragment of DNA of known approximate size: when the ends of the fragment align to distant genomic loci, a deletion can be inferred, and similar rules can be used to infer insertions and (unique to this method) inversions (Hormozdiari et al. 2009; Korbel et al. 2009; 2007). Paired-end approaches are able to find very small CNVs less than 1 kbp in size, though they can suffer from a high false positive rate and require a large amount of sequence data as well as computational resources.
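As a rough check on the microarray size limit discussed earlier in this section, the following back-of-the-envelope sketch estimates the smallest callable CNV from probe count alone. It assumes evenly spaced probes and a genome size of about 3.1 Gbp; both are simplifying assumptions, since real arrays concentrate probes in selected regions.

```python
def min_detectable_cnv_bp(genome_size_bp, n_probes, probes_required=20):
    """Approximate smallest callable CNV, assuming evenly spaced probes."""
    spacing_bp = genome_size_bp / n_probes  # average distance between probes
    return spacing_bp * probes_required

# Assumed values: ~3.1 Gbp genome, 1 million probes, 20 probes per call.
size_bp = min_detectable_cnv_bp(3.1e9, 1_000_000)
print(f"{size_bp / 1e3:.0f} kbp")
```

Under these assumptions the estimate is about 62 kbp, which is in line with the 50 to 100 kbp range that studies have typically examined.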
1.10 Thesis goals
This introduction and review of published literature has emphasized the role of de novo variation in ASD and ID and has cast light on how genes recurrently affected by de novo loss-of-function mutation in probands are part of functionally important and connected pathways. However, the rarity and effect size (versus unaffected siblings) of de novo 
mutations in ASD cannot fully explain the overall heritability of the disorder, suggesting that additional underlying genetic etiologies, mechanisms and effects also play a role in ASD. Furthermore, given the extensive heterogeneity of both genotypes and phenotypes seen in ASD, it is likely that multiple genetic etiologies underlie ASD risk in families, and that any explanation of the genetics of ASD is incomplete if based on de novo mutations alone.

Therefore, the first aim of this dissertation is to fully explore the spectrum of mutations that contribute to ASD risk. To do this, I develop novel methods that leverage exome sequencing data to assay small CNVs, especially those under 100 kbp that have been missed using traditional assay methods such as array-CGH. Chapter one describes CoNIFER (copy number inference from exome reads), a novel computational method that uses exome read-depth data to infer copy number. First, I establish the sensitivity and precision of the method by comparison to standard high-resolution microarrays and by forward validation of novel events. I demonstrate that exome-based detection of rare CNVs has up to 10-fold higher sensitivity for small events less than 10 kbp. In addition, I show that exome-based copy number correlates with true copy number for regions with 0-10 copies, suggesting that exome-based methods can be used for quantitative assessment of copy number. Knowledge of these smaller CNVs, many of which affect only a few exons of a single gene, fills out the spectrum of mutations, from the smallest single base-pair mutations through small and large CNVs, and provides a more complete picture of the genes and pathways involved in ASD.

The second aim of this dissertation is to understand the role of rare inherited mutations in ASD. In chapter two, I apply CoNIFER to exome data from 411 ASD families and examine the pattern of inherited variation between the ASD-affected probands and their unaffected siblings. 
I show that rare inherited CNVs confer increased risk for ASD, even in the context of "sporadic" autism. These inherited variants are correlated with specific ASD-related phenotypes, including IQ and social ability. Finally, inherited variants found to be associated with ASD also are more likely to affect highly brain-expressed genes and are more likely to be part of existing disease and disorder pathways.  
In chapter three, I expand the hypothesis that inherited events contribute to ASD risk to SNVs. Using an expanded resource of nearly 2,400 families, I demonstrate that ultra-rare SNVs carry pathogenic risk for ASD similar to that seen in CNVs and find that this risk is specifically related to loss-of-function SNVs in genes that do not tolerate functional (i.e., deleterious) variation in control populations. These SNVs also mirror CNVs in that they are correlated with specific phenotypes, in particular the IQ of ASD individuals.

This dissertation describes the integration of multiple genetic etiologies (de novo and inherited) and types of mutations in the context of simplex ASD. First in chapter two, and more fully in chapter three, I utilize a logistic regression model of ASD risk to directly compare the effects of de novo and inherited variation, for both CNVs and SNVs. These models suggest that all four combinations contribute an independent statistical component of risk for the development of ASD.

Finally, I also examine how multiple types and classes of mutation converge on specific genes and interactions and suggest new ASD candidate genes as well as an integrated etiology for risk of autism (Chapter III). In particular, convergence of de novo SNVs and inherited CNVs suggests that CSMD1, a complement control protein, may be an ASD risk factor. This gene is particularly interesting in the context of a neurodevelopmental disorder in that it displays strong and specific pan-brain expression, participates in dendritic spine restructuring, and has been implicated in other disorders with neurological basis, such as schizophrenia. I also examine how multiple mutations within one individual may lead to ASD. One case, described in chapter three, has mutations in two previously identified ASD genes that are part of the same complex (NLGN2 and NRXN3). 
Underscoring the importance of the entire spectrum of genetic mutations, this interaction was discovered only by examination of inherited and de novo mutations, as the NRXN3 mutation was an inherited CNV and the NLGN2 mutation was a de novo SNV.  
 
II. Copy number variation detection and genotyping from exome sequence data

2.1 Summary
While exome sequencing is readily amenable to single nucleotide variant discovery, the sparse and non-uniform nature of the exome capture reaction has hindered exome-based detection and characterization of genic copy number variation. We developed a novel method using singular value decomposition (SVD) normalization to discover rare genic copy number variants (CNVs) as well as genotype copy number polymorphic (CNP) loci with high sensitivity and specificity from exome sequencing data. We estimate the precision of our algorithm using 122 trios (366 exomes) and show that this method can be used to reliably predict both de novo and inherited rare CNVs involving three or more consecutive exons. We demonstrate that exome-based genotyping of CNPs strongly correlates with whole-genome data (median r² = 0.91), especially for loci with fewer than eight copies, and can estimate the absolute copy number of multi-allelic genes with high accuracy (78% call level). The resulting user-friendly computational pipeline, CoNIFER (copy number inference from exome reads), can reliably be used to discover disruptive genic CNVs missed by standard approaches and should have broad application in human genetic studies of disease.

This chapter has been published: Krumm, N., Sudmant, P. H., Ko, A., O'Roak, B. J., Malig, M., Coe, B. P., et al. (2012). Copy number variation detection and genotyping from exome sequence data. Genome Research, 22(8), 1525-1532. doi:10.1101/gr.138115.112
2.2 Introduction
Targeted capture and sequencing of coding exons ("exome sequencing") has revealed common single-nucleotide polymorphisms (SNPs), rare sequence variants, short indels, and breakpoints of structural variation (Ng et al. 2009b; for review see Bamshad et al. 2011), but has been largely refractory to the discovery of copy number variants (CNVs). In contrast to whole-genome sequencing data, exome capture and sequencing results in non-uniform read-depth between captured regions and strong systematic biases between batches of samples. These biases, as well as the sparse nature of the capture, make exome sequencing unsuitable for "traditional" CNV detection algorithms, such as raw read-depth (Alkan et al. 2009; Yoon et al. 2009), read-pair alignment (Hormozdiari et al. 2009) or split-read mapping (Karakoc et al. 2011). In this study, we combine read-depth data from exome sequencing with singular value decomposition (SVD) methods to discover rare CNVs and genotype known copy number polymorphic (CNP) regions from eight HapMap samples and 122 autism spectrum disorder (ASD) mother-father-proband trios sequenced as part of a separate study designed primarily to discover de novo SNPs and indels (O'Roak et al. 2011). We validated the discovered events using orthogonal datasets, including whole-genome sequencing and tiling array comparative genomic hybridization (array-CGH) data for HapMap samples, and SNP array and quantitative PCR for events discovered in the autism trios. In light of the tens of thousands of exomes already sequenced, we believe this method will have widespread application for the discovery and association of both rare and common copy number variation in disease, and will complement existing methods to discover single-nucleotide variation from exome-sequencing data.
2.3 Methods

Samples and datasets
We used exome sequencing data from eight HapMap individuals (NA12878, NA15510, NA18507, NA18517, NA18555, NA18956, NA19129, and NA19240; available in the NCBI Sequence Read Archive under accession SRA039053 or SRP007298) and exomes from 122 mother-father-proband ASD trios (for 366 total individuals). In addition, we utilized exome data from 533 individuals from the NHLBI Exome Sequencing Project 
(ESP) as a means to derive accurate estimates of the distribution of sequence coverage at each exon. Underlying exome sequence data are available from the Sequence Read Archive for the HapMap exomes (SRA039053) and from the dbGaP exchange area (for ASD exomes: phs000482.v1.p1; for ESP exomes: phs000279.v1.p1, phs000290.v1.p1, phs000291.v1.p1, phs000281.v1.p1, phs000254.v1.p1, with additional cohorts pending; more information at http://evs.gs.washington.edu/EVS/). All exomes were captured using either the Roche NimbleGen EZ Exome SeqCap Version 2 (for ESP samples and ASD trios) or Version 1 (for HapMap samples) in-solution exome capture kits (44 Mbp captured, including 36 Mbp exon target). Short-read sequencing was performed using either an Illumina HiSeq2000 platform or an Illumina GAII, with a mix of 50 bp and 76 bp paired-end reads (Table 2.1; see Supplementary Note for additional details).
Cohort | # Samples | Capture version | Passed QC | Average number of mapped reads | Average number of mappings
HapMap | 8 | Roche NimbleGen EZ Exome SeqCap Version 1 | 8 | 138,593,483 | 158,568,475
Autism trios | 122 probands and 244 parents | Roche NimbleGen EZ Exome SeqCap Version 2 | 366 | 119,461,629 | 143,574,053
NHLBI Exome Sequence Project | 613 | Roche NimbleGen EZ Exome SeqCap Version 2 | 533 | 127,125,719 | 152,787,950

Table 2.1: Cohorts analyzed

Mapping
Sequence reads were divided into non-overlapping 36 bp constituents and mapped to exons and the 300 bp flanking sequence of the repeat-masked hg19 reference sequence using mrsFAST (Hach et al. 2010), allowing for up to two mismatches per 36 bp. We calculated RPKM (reads per thousand bases per million reads sequenced; Mortazavi et al. 2008) values for 194,080 exome capture targets (see Supplementary Note) and excluded from further analysis 3,964 probes with a median RPKM of less than one, as these probes were likely failed or improperly targeted.
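The RPKM calculation and the median-based probe filter described above can be sketched as follows. This is a minimal illustration with invented toy counts, not the CoNIFER source code; the function names `rpkm` and `passing_targets` are hypothetical.

```python
import numpy as np

def rpkm(read_counts, target_lengths_bp, total_mapped_reads):
    """Reads per thousand bases of target per million mapped reads."""
    counts = np.asarray(read_counts, dtype=float)
    lengths_kb = np.asarray(target_lengths_bp, dtype=float) / 1e3
    return counts / lengths_kb / (total_mapped_reads / 1e6)

def passing_targets(rpkm_matrix, min_median=1.0):
    """Boolean mask of targets (rows) whose median RPKM across samples
    is at least `min_median`; targets below it are treated as failed."""
    return np.median(rpkm_matrix, axis=1) >= min_median

# Toy data (invented): 3 targets x 2 samples.
counts = np.array([[500, 400],
                   [2, 0],
                   [1000, 1200]])
lengths_bp = np.array([200, 150, 400])
total_reads = [100_000_000, 120_000_000]

matrix = np.column_stack(
    [rpkm(counts[:, j], lengths_bp, total_reads[j]) for j in range(2)]
)
keep = passing_targets(matrix)  # second target falls below a median RPKM of 1
```

Filtering on the median rather than the mean keeps a target from being rescued by a few unusually deep samples.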
Singular value decomposition
RPKM values were transformed into standardized z-scores (termed Z-RPKM values) based on the mean and standard deviation across all analyzed exomes and organized into an exon-by-sample matrix ($X$). Using SVD, we decomposed $X$ into three matrices:

$$X = U S V^{T}$$

In order to remove the strongest $k$ components, we set the singular values $S_1, \ldots, S_k$ to zero to form $S'$, and then recalculated the matrix as the product $U S' V^{T}$ (Fig. 2.1). We termed these final values SVD-ZRPKM values; each represents the normalized relative copy number of an exon in a sample.
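The normalization above can be sketched in a few lines of NumPy. This is an illustrative sketch on simulated data rather than the CoNIFER implementation; the function names, the simulated rank-1 batch effect, and the use of the population standard deviation are assumptions.

```python
import numpy as np

def zrpkm(rpkm_matrix):
    """Standardize each exon (row) across samples (columns)."""
    mean = rpkm_matrix.mean(axis=1, keepdims=True)
    sd = rpkm_matrix.std(axis=1, keepdims=True)
    return (rpkm_matrix - mean) / sd

def remove_top_components(x, k):
    """Zero the k largest singular values of x and rebuild U S' V^T."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    s_prime = s.copy()
    s_prime[:k] = 0.0            # NumPy returns singular values in descending order
    return (u * s_prime) @ vt    # same as u @ np.diag(s_prime) @ vt

# Simulated data: 50 exons x 10 samples with a shared rank-1 systematic bias.
rng = np.random.default_rng(0)
batch_effect = np.outer(rng.normal(size=50), rng.normal(size=10))
rpkm_toy = 100.0 + 5.0 * batch_effect + rng.normal(scale=0.5, size=(50, 10))

x = zrpkm(rpkm_toy)
cleaned = remove_top_components(x, k=1)  # SVD-ZRPKM-like values
```

After removing one component, the largest remaining singular value of `cleaned` equals the second singular value of `x`, which is a quick way to confirm that only the intended components were dropped.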
Validation
For this study, we specifically selected samples that had been subjected to extensive prior experimental validation. Copy number variation of the eight HapMap samples was previously assessed by whole-genome shotgun sequencing and targeted clone sequencing (Kidd et al. 2008) and with data from the 1000 Genomes Project (Sudmant et al. 2010). Accurate estimates of copy number for duplicated loci were determined experimentally by single-channel array-CGH and qPCR (Sudmant et al. 2010; Campbell et al. 2011). CNV data for the 366 autism exomes were obtained by SNP microarray (Illumina 1M) and by targeted array-CGH as described previously (O'Roak et al. 2011; Sanders et al. 2011).
CoNIFER implementation
We implemented our algorithm as a collection of Python programs under the name CoNIFER (copy number inference from exome reads), available at http://conifer.sourceforge.net. CoNIFER accepts BAM alignment files or files of RPKM values from samples and outputs a number of charts (e.g., scree plots), a text file containing calls, and images corresponding to each call. Additionally, the raw SVD-ZRPKM values can be saved, facilitating genotyping of CNP loci and further analysis. The computational resources required to run CoNIFER are lightweight. BAM-format files can be converted into read-depth files in approximately 20 to 30 minutes; then, given read-depth or read-count values for targeted exons or probes, CoNIFER and the SVD normalization can be run with minimal hardware requirements (e.g., 500 samples processed in less than one hour using 4 GB or less of memory).
2.4 Results
Our method exploits differences in sequence read-depth from exome datasets to predict copy number variation (Fig. 2.1). We focused on characterizing two distinct classes of genetic variation: rare CNVs and CNPs. The former are individually rare in populations (less than 1% frequency) and are predominantly found in unique regions of the genome. In contrast, CNPs are common, both between individuals and between populations, and are frequently associated with segmental duplications (Girirajan et al. 2010). The absolute copy number of multi-allelic CNPs embedded in segmental duplications ranges widely from zero to more than 40 copies, and this variation is typically referred to as multi-copy or multi-allelic (Sudmant et al. 2010). Crucially, as the total copy number of CNPs is estimated as the sum of both haplotypes (i.e., the copy number is not phased), independent re-assortment of parental haplotypes obfuscates the pattern of inheritance for CNPs between parents and offspring. Moreover, our approach utilizes relative read-depth values for each exon; for exons with highly diverse copy number across a population, the population standard deviation will be high as well, thus shrinking the range of relative values observed at that exon. In effect, this makes a threshold-based discovery algorithm less sensitive for CNPs and exons of high copy number diversity, but does not impact genotyping of these CNPs and exons when their location is known. Because of these fundamental differences, we chose to pursue the characterization of CNVs and CNPs differently: for CNVs, discovery within the exome data was unbiased by location, whereas for CNPs, we used a priori information regarding the location of copy number variable loci. Furthermore, when estimating the precision for rare inherited CNVs, we excluded calls within segmental duplications or CNP loci, as the inheritance pattern for these loci cannot be determined without phased copy number information.
For discovery of rare CNVs (Fig. 2.1, Fig. S1), we removed between 12 and 15 (k) singular values, a number we empirically adjusted based on the inflection point of the scree plot (Fig. S2). We set discovery thresholds at -1.5 or +1.5 SVD-ZRPKM for rare deletions and duplications, respectively, and required at least three exome probes to exceed the threshold (Supplementary Note). For genotyping CNP regions in the genome, we opted to remove only five components in order to prevent the SVD algorithm from 
removing bona fide signal from highly CNP loci. The genotype value was calculated by determining the average of the SVD-transformed ZRPKM values for the exons/targets in the region of interest. As the output from our algorithm provides a relative value, we estimated absolute copy number from the SVD-ZRPKM values via two methods: 1) by using population frequency information of copy number states (Campbell et al., 2011) and 2) by creating a standard curve using copy number estimated from whole-genome sequencing data of matched HapMap samples (Sudmant et al. 2010; Supplementary Note).

An overview of our method is presented in Figure 2.1. Briefly, sequence reads from each exome were mapped to exons using mrsFAST (Hach et al. 2010), which allows reads within a set edit distance to map to multiple locations. Similar to RNAseq data analysis, we calculated RPKM values and transformed these into standardized z-scores (termed Z-RPKM values) based on the mean and standard deviation across all analyzed exomes. These were subsequently organized into an exon-by-sample matrix (X). Next, we implemented an SVD algorithm to overcome the systematic biases that pervade exome capture reactions. Since "singular values" can be used to examine the relative amount of contributed variance from each component, we used a plot of these singular values, known as a "scree plot", to identify this experimental noise. Our analysis reveals that the first 10-15 components disproportionately contribute to the variance of the data (Fig. S2). Given that we expect biological variation, in the form of rare CNVs as well as common CNPs, to be a minor contributor to the overall variance of the exon-by-sample matrix X, we formulated the basis of our algorithm by eliminating these strongest components. We selected the number of components for elimination based on the inflection point of the scree plot.
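The normalization steps described above (RPKM, per-exon standardization, removal of the strongest singular components) can be illustrated with a minimal NumPy sketch. This is not the CoNIFER source code: the function names `rpkm`, `zrpkm` and `remove_top_components` are hypothetical, the input is assumed to be an exon-by-sample matrix of read counts, and standardization uses the mean as stated in the text (the Figure 2.1 caption mentions the median instead).

```python
import numpy as np

def rpkm(read_counts, exon_lengths_bp):
    """Reads per kilobase of target per million mapped reads.

    read_counts: (n_exons, n_samples) array of mapped read counts.
    exon_lengths_bp: (n_exons,) array of exon/target lengths in bp.
    """
    counts = np.asarray(read_counts, dtype=float)
    lengths_kb = np.asarray(exon_lengths_bp, dtype=float)[:, None] / 1e3
    total_millions = counts.sum(axis=0, keepdims=True) / 1e6
    return counts / lengths_kb / total_millions

def zrpkm(rpkm_matrix):
    """Standardize each exon (row) across all samples (assumed: mean/SD)."""
    center = rpkm_matrix.mean(axis=1, keepdims=True)
    spread = rpkm_matrix.std(axis=1, keepdims=True)
    spread[spread == 0] = 1.0  # guard against exons with no variation
    return (rpkm_matrix - center) / spread

def remove_top_components(x, k):
    """Zero the k largest singular values of x and rebuild the matrix."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    s = s.copy()
    s[:k] = 0.0
    return (u * s) @ vt
```

In this sketch `k` plays the role of the number of removed components, which the text chooses empirically from the scree plot; the singular values returned by `np.linalg.svd(x, compute_uv=False)` are what such a plot would display.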
Figure 2.1: Method overview and CNV discovery (a) Exome sequencing reads from FASTQ files were divided into non-overlapping 36 bp constituents and (b) aligned to targeted regions, allowing for up to two mismatches per 36 bp alignment. (c) For each exon or targeted region, we calculated RPKM values and then transformed these into "ZRPKM" values based on the median and standard deviation of each exon across all samples. (d) ZRPKM values were input into the SVD transformation, where we removed the first 12-15 singular values. Finally, a centrally weighted 15-exon average was passed over the SVD-ZRPKM values in order to reduce false positives, and a ±1.5 SVD-ZRPKM threshold was used to discover CNVs. Final image (e) shows ZRPKM values from 1,000 consecutive exons on chromosome 16, plotted for 533 ESP exome background samples (black traces) and NA18507 (pink trace). Blue bar corresponds to a rare duplication in NA18507 at the METTL9/OTOA locus at chr16p12.2 that was validated by SNP microarray CNV analysis.

The number of components selected for removal is an important parameter in our algorithm and warrants further consideration. Removing too few components leaves the algorithm at risk for residual systematic bias; conversely, removing too many components will begin to remove bona fide signal from exomes, especially at large, common segmental duplications within which a large proportion of analyzed exomes contribute strongly altered read-depth signal. However, individually rare CNVs do not contribute significantly to the overall variance of the sample-by-exon matrix, thus making
it unlikely that removing the first 12-15 components of the SVD results in loss of signal for these rare events. The SVD method depends on concurrently analyzing many samples, so that systematic noise becomes evident and can subsequently be removed. For the eight HapMap samples, we included an additional 533 ESP samples and removed 12 components. For analysis of the ASD trios, we combined the 122 trios (366 samples) with 366 randomly selected samples from the ESP dataset and removed 15 components. In our comparison of mrsFAST and BWA mappings, we used 492 ESP samples (for which BWA mappings were available) and the eight HapMap samples. Overall variance was lower in the BWA-based mappings, thus only six components needed to be removed during the SVD normalization.
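The discovery thresholds given earlier in this section (±1.5 SVD-ZRPKM, with at least three probes exceeding the threshold) can be sketched as a simple run-finding pass over one sample's values. This is a simplified illustration only: the function name `call_cnvs` is hypothetical, the input is assumed to be an ordered list of per-exon SVD-ZRPKM values, and the centrally weighted 15-exon smoothing step is omitted because its weights are not specified here.

```python
def call_cnvs(svd_zrpkm, threshold=1.5, min_exons=3):
    """Return (start, end, kind) for runs of consecutive exons beyond the threshold.

    start is inclusive and end is exclusive; kind is "dup" or "del".
    """
    calls = []
    run_start, run_kind = None, None
    # A trailing 0.0 sentinel closes any run that reaches the last exon.
    for i, value in enumerate(list(svd_zrpkm) + [0.0]):
        if value >= threshold:
            kind = "dup"
        elif value <= -threshold:
            kind = "del"
        else:
            kind = None
        if kind != run_kind:
            if run_kind is not None and i - run_start >= min_exons:
                calls.append((run_start, i, run_kind))
            run_start, run_kind = i, kind
    return calls
```

For example, `call_cnvs([0.1, 1.8, 2.1, 1.6, 0.0])` reports a single three-exon duplication spanning indices 1 to 4 (end exclusive), whereas a two-exon excursion would be ignored at the default `min_exons=3`.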
Rare CNV discovery

To discover rare CNVs, we initially restricted our analysis to events where there was a change in copy number state for three or more consecutive exons. In order to assess the precision of our method, we intersected our exome-based deletion and duplication calls from five of the HapMap control individual genomes that had been previously analyzed using high-resolution array-CGH (2010). Of the 32 detected events (Table S1), seven were rare CNVs and 25 were CNPs; after intersecting with the reference set and requiring 10% reciprocal overlap, our method yielded 6/7 (86%) precision for rare CNVs and 16/25 (64%) for CNPs (Table 2.2). We also estimated sensitivity from this comparison: starting with 486 high-resolution array-CGH calls from the five HapMap samples that overlapped at least three exome probes, we restricted calls to those in unique/diploid regions of the genome (i.e., outside of segmental duplications, duplicated genes, and regions of somatic variation such as the HLA locus; Fig. S4). In this set of 41 calls (Table S2, lines annotated "Rare" and "CNP"), our algorithm identified 5/5 rare CNVs, but only 3/36 (8%) of CNPs (example calls in Fig. S7).
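The 10% reciprocal overlap criterion used in this comparison can be made concrete with a short sketch. It assumes half-open integer coordinates on the same chromosome; the function names are hypothetical and are not taken from the pipeline used in this work.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions between intervals a and b."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))

def is_concordant(call, reference, min_fraction=0.10):
    """True if the call and the reference event overlap reciprocally."""
    return reciprocal_overlap(*call, *reference) >= min_fraction
```

Requiring the minimum of the two fractions means that a small call nested inside a much larger reference event is not counted as concordant unless it also covers a sufficient fraction of the reference.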
                            Rare CNVs     Common CNPs    Total
≥ 10% Reciprocal Overlap    6/7 (86%)     16/25 (64%)    22/32 (69%)
Any Overlap                 7/7 (100%)    19/25 (76%)    26/32 (81%)
No Overlap                  ---           6/25 (24%)     6/32 (19%)

Table 2.2: Precision of exome-based CNV calls in HapMap samples

The relative paucity of rare CNVs in the HapMap cohort prompted us to estimate the precision of our method for rare CNVs using a larger set of 122 ASD trios. In our initial analysis, we applied the same filters for unique/diploid calls as above (Fig. S3), resulting in 191 calls among 97 probands. We identified eight putative de novo events (6.6% incidence; Table S3). For six of these, we were able to validate the event using available Illumina SNP microarray data as well as targeted array-CGH experimental data (Sanders et al. 2011). We could not confirm two de novo events, both of which were single-exon duplications of FAF2 (data not shown). Next, we considered inherited events using our exome read-depth analysis and found that 128 events in the probands were inherited from either the mother or the father (Table S3a). For 117/128 (91.4%) of these events, the SVD-ZRPKM values of both the proband and the parent exceeded the detection threshold (±1.5). However, for 11/128 (8.6%) of these calls, the SVD-ZRPKM values between proband and parents were just below the deletion or duplication threshold required for calling, and the inheritance status was determined by manual inspection.

Finally, inspection of the SVD-ZRPKM values for the remaining 55 events (14 loci; Table S3b) revealed that these events strongly resembled CNP sites or mapped to genes for which processed pseudogenes exist. An example of such a locus can be seen at the DAZL gene (Fig. S8: sample 13517.p1; chr3:15,636,820-17,640,105). The lack of phased copy number information precludes these loci from inheritance-based precision analysis (as the independent assortment of haplotypes can alter copy number in the offspring) and we thus excluded them from the precision analysis of rare CNV events. In total, we found
128 rare inherited events and validated 6/8 rare de novo events (excluding processed pseudogenes and duplicated exons), leading to a precision for the discovery of rare CNVs in the autism cohort of 134/136 (98.5%; Table 2.3). Example calls from this experiment are shown in Figure S8.

                     Total    Validated
de novo CNVs         8        6 validated by SNP microarray
Inherited CNVs       127      116 have call made in parents (exceeded threshold)
                              11 manually inspected SVD-ZRPKM values revealed inheritance
Overall Precision             133/135 (98.5%)

Table 2.3: Precision of exome-based CNV calls in autism trios

We also gauged the accuracy of the exome-based CNV discovery against previously generated Illumina SNP microarray experiments (Sanders et al. 2011). SNP microarray data was available for 108 of 128 predicted inherited events and all eight predicted de novo events. We found 70% (76/108) of the inherited and 75% (6/8) of the de novo CNV events were confirmed by the SNP microarray (Table S3a). Given the high concordance rate of exome-based events within trios (>98%), the lower overlap vis-à-vis SNP microarray experiments likely reflects platform-specific differences in resolution and sensitivity and not an increased false-positive rate in the exome data.
Genotyping copy number polymorphic variants

We took two approaches in assessing our method's ability to determine the copy number of CNPs: 1) a relative correlation approach between the continuous SVD-ZRPKM values and whole-genome-sequence derived copy number estimates, and 2) an unsupervised clustering approach of exome-based genotype values in order to derive absolute copy number states for CNP loci.
Figure 2.2: CNP locus genotyping of RHD and C4A (a) SVD-transformed values for exons for the Rhesus deletion factor locus (RHD/RHCE) show distinct copy number states across both paralogous genes. (b) Histogram of average SVD-ZRPKM values for ESP dataset (533 individuals) and seven HapMap samples. Clustering was performed using an unsupervised algorithm (Supplementary Note). (c) Correlation between SVD-ZRPKM genotype values (y-axis) and absolute copy number estimate (x-axis) based on whole-genome read-depth for seven HapMap samples and experimentally validated by array-CGH. (d-f) Similar to above, for C4A locus.

For the first approach, we selected 62 previously identified CNP loci and genes (Sudmant et al. 2010; Table S4) and calculated the copy number of each locus based on whole-genome read-depth data using previously described methodology, which has been experimentally validated using single-channel array-CGH intensity data. For each locus, we correlated the estimated whole-genome copy number with the average of SVD-ZRPKM values for the exons in the locus (Fig. 2.2). The median r² value between exome-based and whole-genome-based genotyping at each locus was 0.91 (Fig. 2.3a; Table S4), indicating a high degree of reliability between exome and whole-genome copy number estimation for CNP loci. Furthermore, after stratifying the results by the median copy number of each locus, we found that for loci with median copy number of eight or less, 32 of 39 loci (82%) were highly correlated (r² value ≥ 0.9), but for loci with median copy
number greater than eight, the median locus r² was only 0.32.

Secondly, we assessed the accuracy of our approach in determining the absolute copy number of common CNPs. We leveraged available genotype information for seven of the HapMap samples in this study across 43 autosomal CNP loci previously studied by Campbell and colleagues (Table S5; Campbell et al. 2011). For each locus, we again used the locus-average of SVD-ZRPKM values and clustered these genotype values using an unsupervised clustering algorithm (Supplementary Note). Each cluster was then assigned the most likely copy number based on the most common copy number state previously identified. Using this unsupervised method, we correctly predicted absolute copy number for 235/301 (78%) calls (Table S5) with an overall absolute genotype correlation across all 43 CNP loci of r² = 0.74.
Figure 2.3: Genotyping accuracy across 62 CNP loci (a) Distribution of correlation coefficients of SVD-ZRPKM to whole-genome copy number estimate across 62 CNP loci for seven HapMap samples, split by the median copy number of each locus. (b) Correlation between SVD-ZRPKM score and relative (by median and standard deviation) whole-genome copy number estimate for 39 loci with ≤ 8 copies; and (c) for 23 loci with > 8 copies. Whole-genome read-depth copy number estimates for these specific sites and genomes were orthogonally validated using single-channel intensity data from previous array-CGH experiments.
Our algorithm uses relative read-depth values (introduced both by the Z-transformation and the SVD algorithm itself) in order to overcome significant batch biases in exomes, thus sacrificing the genome-wide linear model of read-depth and copy number exploited by whole-genome structural variation discovery algorithms. Nonetheless, the two approaches presented above can be used to "anchor" the relative SVD-ZRPKM values to absolute copy number. First, the strong r² correlation for many loci can be exploited as a "standard curve" for each locus, and the absolute copy number for exome samples can be estimated. Alternatively, SVD-ZRPKM values can be clustered (Supplementary Note) into copy number groups, thus facilitating absolute copy number estimates without the use of whole-genome data.
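The "standard curve" idea can be sketched as an ordinary least-squares line fitted per locus, using samples whose copy number is known from whole-genome data. This is an illustrative simplification under stated assumptions: a linear relationship at the locus, hypothetical function names, and rounding to the nearest non-negative integer; the exact fitting procedure used in this work is described in the Supplementary Note and may differ.

```python
def fit_standard_curve(svd_zrpkm_values, known_copy_numbers):
    """Least-squares line mapping locus-average SVD-ZRPKM to copy number."""
    n = len(svd_zrpkm_values)
    mean_x = sum(svd_zrpkm_values) / n
    mean_y = sum(known_copy_numbers) / n
    sxx = sum((x - mean_x) ** 2 for x in svd_zrpkm_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(svd_zrpkm_values, known_copy_numbers))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_copy_number(value, slope, intercept):
    """Apply the curve to a new sample and round to an integer copy number."""
    return max(0, round(slope * value + intercept))
```

In such a scheme the reference samples would be the HapMap genomes with whole-genome copy number estimates, and the fitted line would then be applied to exome-only samples at the same locus.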
Comparison with other methods

As we generated our read-depth estimates from mrsFAST-based alignments, we were interested to see how our method would perform using BWA-based alignments. The BWA alignments were generated using commonly used parameters and filtering steps suitable for SNP-centric analyses, including removal of reads with multiple mappings (Supplementary Note). We calculated RPKM values from these BWA alignments for the HapMap samples and a subset of the ESP exomes. We observed that signal for rare deletions and duplications in the HapMap samples was attenuated (Fig. S5), and that the median signal-to-noise ratio for the seven rare deletions and duplications was 58% lower for the BWA-based mappings (Table S6; Supplementary Note). In addition, we genotyped 47/62 loci in Table S4 and found a striking difference in the correlation between BWA-based mappings (median r² = 0.36) and mrsFAST-based mappings (median r² = 0.92). The remaining 15/62 loci did not have any probes with adequate BWA read-depth, making them intractable and false negatives by this approach. The difference in correlation with mrsFAST mappings was most notable for loci with copy numbers ranging between 7 and 12 (Fig. S6b). These data highlight the importance of considering reads with multiple mappings, especially for loci with increased copy number (e.g., the LRRC37A3 locus; Fig. S6c). These differences, however, do not solely reflect differences between the alignment algorithms, but rather differences across the entire alignment and post-processing pipeline.
Finally, we compared our algorithm to ExomeCNV (Sathirapongsasuti et al. 2011), which is designed to detect copy number aberration in the context of cancer using closely matched tumor-normal pairs of exomes. Nevertheless, we were interested to see if ExomeCNV could be used to detect germline variation. We analyzed (using default settings; see Supplementary Note) four HapMap exomes with NA19240 as the reference and compared the results to a validated call set from these genomes (2010). Overall, ExomeCNV predicted 450 CNVs, of which only 63 (14%) had more than 10% reciprocal overlap with the validated call set. In contrast, our algorithm identified 24 calls among these four samples, of which 21 (88%) overlapped the validated call set. We note that ExomeCNV uses uncalibrated read-depth to estimate copy number, and, depending upon batch effects, this can result in the algorithm reporting a significant fraction of the exome as non-diploid (Fig. S9). Furthermore, similar to the BWA-based alignments (see above), ExomeCNV has limited dynamic range in CNP loci and duplicated genes: the average r² correlation across tested CNPs was 0.57 (compared to our algorithm, r² = 0.92; Fig. S10).

2.5 Discussion

We have outlined a method for making read-depth data from exomes amenable to rare CNV discovery, as well as copy number genotyping of CNP loci. We used SVD normalization to overcome a host of coverage biases introduced by the capture and sequencing of exomes. Our method allows for differing sample preparations and capture reactions to be integrated into the same experiment, provided each "batch" is sufficiently large (n ≥ 8). This includes correct normalization of the X chromosome, such that deletions and duplications can be assayed regardless of the sample's sex.
Additionally, our method can integrate exomes captured with different exome capture target designs: the eight HapMap exomes were captured using the Roche NimbleGen SeqCap EZ Version 1, while all other exomes in our experiments were captured using the SeqCap EZ Version 2 capture kit.

Remarkably, we find that sufficient dynamic range response remains to accurately predict the copy number of duplicated genes up to eight copies. The upper limit of this response is likely an effect of the stoichiometry of the exome-capture reaction and we suggest that this may
be improved simply by adjusting the concentrations and targets of exome-capture platforms. Another important consideration in interpreting exome read-depth data is the presence of polymorphic processed pseudogenes. In our study of autism trios, we found that 14% (26/191) of events correspond to changes in the copy number of processed pseudogenes residing elsewhere in the genome, often in segmental duplications. Such events have been difficult or impossible to discover using traditional SNP microarray approaches, as the probes for these assays often do not explicitly target the coding exons themselves. While such events may be easily inferred based on the absence of intronic sequence, a comprehensive catalog of polymorphic processed pseudogenes will improve detection of bona fide exonic deletions and duplications.

We envision a number of algorithmic improvements. Although using mrsFAST mappings both increases the signal-to-noise ratio for rare CNVs and improves genotyping accuracy for CNPs, these mappings often cannot distinguish between paralogous genes. By restricting the RPKM calculation to exons and regions that contain paralog-specific single nucleotide variants (Sudmant et al. 2010), we hope to be able to extend our method to genotype duplicated genes in a paralog-specific manner. We also expect to lower the minimum number of exons required to detect a CNV. We applied our method to genotyping single exons (such as the third exon of GHR; Santos et al. 2004) and found the SVD-ZRPKM values robustly distinguished different copy number classes. By developing a discovery set of copy number polymorphic exons, genes, and loci, as well as their copy number states in populations, future disease-association studies will be better informed. Finally, though array-based technologies have described many CNP-disease associations (Girirajan et al.
2010), discovery of loci has been limited to those with low median copy number, and our approach here will be able to examine CNP loci with higher copy number. Using our approach with large clinical cohorts currently undergoing exome sequencing, we expect to find new disease associations with rare CNVs, CNP loci, and paralog-specific copy number of known CNP loci.  
ACKNOWLEDGEMENTS  We thank S. Ng, S. McGee, and T. Brown for helpful comments in the preparation of this manuscript, M. State and the Simons Simplex Collection Genetics Consortium for providing Illumina genotyping data, and A. Schachtel for suggesting the CoNIFER name. This work was supported by NIH grants HD065285 (E.E.E.), HHSN273200800010C (D.A.N.), and HL102926 (D.A.N.) and the Simons Foundation Autism Research Initiative (E.E.E.). E.E.E. is an Investigator of the Howard Hughes Medical Institute.  
III. Transmission disequilibrium of small CNVs in simplex autism

3.1 Summary

We searched for disruptive, genic rare CNVs among 411 families with sporadic autism spectrum disorder (ASD) from the Simons Simplex Collection using available exome sequence and CoNIFER (copy number inference from exome reads). Our approach yielded increased sensitivity for smaller genic rare CNVs compared to high-density SNP microarrays (~2X higher yield), especially for CNVs smaller than 20 kbp. We find that affected probands inherit more CNVs than their siblings (453 vs. 394, p=0.004; OR=1.19), and these affect more genes (921 vs. 726, p=0.02; OR=1.30). These smaller CNVs (median size 18 kbp) are transmitted preferentially from the mother (136 maternal vs. 100 paternal, p = 0.02), although this bias occurs irrespective of affected status. The excess burden of inherited CNVs among probands is driven primarily by sib-pairs with discordant social behavior phenotypes (p < 0.0002, measured by SRS score), in contrast to families where the phenotypes are more closely matched or less extreme (p > 0.5). Finally, we found strong enrichment for brain-expressed genes unique to probands, especially in the discordant SRS group (p = 0.0035). In a combined risk model, our set of inherited CNVs, de novo CNVs and de novo SNVs all independently contributed to the risk of autism (p < 0.05). Taken together, these results suggest that small transmitted rare CNVs play a role in the etiology of simplex autism. Importantly, the small size of these variants aids in the identification of specific genes as additional risk factors associated with ASD.

This chapter has been published: Krumm, N., O'Roak, B. J., Karakoc, E., Mohajeri, K., Nelson, B., Vives, L., et al. (2013). Transmission Disequilibrium of Small CNVs in Simplex Autism. American Journal of Human Genetics, 93(4), 595-606. doi:10.1016/j.ajhg.2013.07.024
3.2 Introduction

Discovering the mutations and the genes responsible for autism spectrum disorder (ASD) requires an assessment of the full spectrum of genetic variation within families including both de novo and inherited events. There is compelling evidence that a diverse range of de novo mutations play an important role, including copy number variants (CNVs; Levy et al. 2011; Sanders et al. 2011; Sebat et al. 2007; Glessner et al. 2009; Pinto et al. 2010), single nucleotide variants (SNVs) and insertions and deletions (indels) (Iossifov et al. 2012; O'Roak et al. 2012b; Sanders et al. 2012; O'Roak et al. 2011). However, taken together, de novo variation does not fully explain the genetic etiology of ASD: only ~8% of probands carry a de novo CNV and ~10-20% carry a pathogenic de novo SNV or indel. Many of these mutations likely play a pathogenic role in the development of ASD, especially in the context of sporadic (or "simplex") ASD. However, the heritability of ASD is estimated to be between 50% and 90% (Bailey et al. 1995; Hallmayer et al. 2011), much higher than the fraction of disease explained to date, suggesting that additional genetic factors contribute to the etiology of ASD.

The prevalence of rare CNVs smaller than 50 kbp has been underestimated in previous surveys using oligonucleotide microarrays (Levy et al. 2011; Sanders et al. 2011) and their role in ASD has yet to be explored. Such pathogenic events could in principle provide as much specificity as exonic de novo mutations with respect to genes and informative protein networks. Several recent methods based on exome sequencing read-depth data have enabled the discovery of small genic CNVs previously missed by microarray (Krumm et al. 2012; Fromer et al. 2012). In this study, we tested the hypothesis that small genic inherited CNVs also contribute to the genetic etiology of sporadic autism.
Several lines of evidence are potentially supportive of this hypothesis, including increased prevalence of the broader autism phenotype (BAP) in parents of affected children (Losh et al. 2008; Davidson et al. 2012), trends for higher burden of extremely rare singly-transmitted CNVs in simplex families (Levy et al. 2011), and enrichment for large CNVs in cases versus unrelated controls (Pinto et al. 2010). In contrast, other previous studies that examined inherited CNVs in ASD found no significant excess of inherited burden in probands with ASD, although these studies were
mainly designed to detect de novo CNVs (Sanders et al. 2011).

Here, we present evidence for transmission distortion for smaller CNVs (median size ~18 kbp) by investigating families where both affected and unaffected siblings have been exome-sequenced. The availability of whole-exome sequence data for our samples has the advantage of increased sensitivity for small, genic CNVs affecting two or more exons, as well as allowing us to integrate both rare SNVs and CNVs to develop a model to explain the genetic architecture of ASD.

3.3 Methods

CNV Detection from exome sequence data

We analyzed exome sequencing data from families ascertained as part of the Simons Simplex Collection (Fischbach and Lord 2010). Underlying FASTQ sequence data were obtained from 391 published ASD quads (O'Roak et al. 2012b; Sanders et al. 2012; Iossifov et al. 2012) and we generated sequence data for an additional 19 unaffected siblings from published trios (O'Roak et al. 2011). The data set includes sequence data (median coverage >50x) from 411 families where a proband, unaffected sibling, mother and father (termed a quad) had all been sequenced, for a total of 1,644 samples (see Tables S1 and S2 for details). Sequence reads were split into 36mers and mapped using the mrsFAST alignment program (Hach et al. 2010) to the Nimblegen EZ-SeqCap v2 targets (including 300 bp around each target and allowing two mismatches per 36mer). We used CoNIFER (Krumm et al. 2012) to calculate exon-level coverage and remove systematic bias between samples and targets. Using a custom pipeline (Figure S1 and Supplemental Methods), we 1) segmented our CoNIFER SVD-ZRPKM values using the DNACopy algorithm (Venkatraman and Olshen 2007), 2) minimized false negatives by a quad-based genotyping method, 3) clustered CNVs into overlapping CNVRs, and 4) removed CNVs found in duplicated or repetitive genomic space.
We limited our final call set to inherited CNVs (i.e., transmitted CNVs) that were present in 10 or fewer families (or approximately 1% population frequency), and we excluded CNVs which primarily fell within duplicated or highly polymorphic regions of the genome. We considered a CNV "rare" if it occurred in 10 or fewer families and a CNV "private" if it was observed only in
one family. Lastly, we did not include CNVs on the X chromosome in any analysis, and all de novo CNVs were excluded from burden analyses except where noted. Throughout this paper, we define ?CNV burden? as the number of rare CNVs per individual. 
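The frequency definitions above ("rare": 10 or fewer families; "private": one family) and the per-individual CNV burden can be illustrated with a small sketch. The record layout `(cnvr_id, family_id, individual_id)` and the function names are assumptions made for this example and do not reflect the actual pipeline's data structures.

```python
from collections import Counter, defaultdict

def classify_by_family_count(cnv_calls, rare_max_families=10):
    """Label each CNV region as "private", "rare" or "common".

    cnv_calls: iterable of (cnvr_id, family_id, individual_id) tuples.
    A private CNV is a special case of a rare CNV (seen in one family).
    """
    families_per_cnvr = defaultdict(set)
    for cnvr_id, family_id, _ in cnv_calls:
        families_per_cnvr[cnvr_id].add(family_id)
    labels = {}
    for cnvr_id, families in families_per_cnvr.items():
        if len(families) == 1:
            labels[cnvr_id] = "private"
        elif len(families) <= rare_max_families:
            labels[cnvr_id] = "rare"
        else:
            labels[cnvr_id] = "common"
    return labels

def cnv_burden(cnv_calls, labels):
    """Count rare (including private) CNVs carried by each individual."""
    burden = Counter()
    for cnvr_id, _, individual_id in cnv_calls:
        if labels[cnvr_id] in ("rare", "private"):
            burden[individual_id] += 1
    return burden
```

Exclusions described in the text (X-chromosome CNVs, de novo CNVs, duplicated or highly polymorphic regions) would be applied to `cnv_calls` before this step and are not modeled here.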
Array comparative genomic hybridization

We designed a customized CGH microarray (Agilent SurePrint G3 4x180k CGH microarray; probe density ranging from 125 bp⁻¹ to 5 kbp⁻¹ depending on the size of the event to be validated) and selected 161 CNVs from a subset of 80 samples, stratified by proband/sibling (36 probands and 44 siblings), and by dataset (26 from Iossifov et al., 22 from O'Roak et al., 32 from Sanders et al.; Table S1 and S5). Minimum deletion and duplication thresholds for validation were determined by ROC curve analysis of known positive and negative control CNVs (Figure S3).
Phenotypic measures and models

The Social Responsiveness Scale (SRS) was used as a quantitative measure of social deficits (Constantino and Gruber). We had complete phenotype information (SRS for both proband and sibling, and full-scale IQ for proband) for 389 families in this study (Table S7) based on data from the SSC. The probands in this study had a median SRS t-score of 82, significantly higher (i.e., more severely affected) than the median SRS score of our unaffected siblings (45; p < 0.00001, two-tailed paired t-test). We defined mild, moderate and severely affected individuals based on published thresholds (Constantino and Gruber).
Expression analysis

Gene expression data was from the Human U133A/GNF1H Gene Atlas (GEO: GSE1133), comprising 79 human tissues, including 18 nervous system tissues (Su 2004). Expression values were averaged across multiple probes when available. We defined a gene to be expressed in a given tissue if it ranked in the top 5% of all genes for that tissue. To measure enrichment, we compared the fraction of genes unique to either siblings or probands expressed in each tissue, and empirical p-values were calculated by shuffling proband/sibling labels 20,000 times and recomputing tissue-level expression enrichment. We FDR-corrected for 79 tests (i.e., for tissues) and statistical significance was assessed at q < 0.05.
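The label-shuffling procedure for a single tissue can be sketched as follows. This is a simplified illustration, not the analysis code: it assumes each gene carries one label ("proband" or "sibling") and a flag for whether it is expressed in the tissue, it shuffles labels at the gene level, it uses the difference in expressed fractions as the test statistic, and it adds one to numerator and denominator so that the empirical p-value is never zero. All names are hypothetical, and the FDR correction across the 79 tissues is not shown.

```python
import random

def enrichment_difference(labels, expressed):
    """Fraction expressed among proband genes minus fraction among sibling genes."""
    pro = [e for l, e in zip(labels, expressed) if l == "proband"]
    sib = [e for l, e in zip(labels, expressed) if l == "sibling"]
    return sum(pro) / len(pro) - sum(sib) / len(sib)

def permutation_p_value(labels, expressed, n_permutations=20000, seed=0):
    """One-sided empirical p-value from shuffling proband/sibling labels."""
    rng = random.Random(seed)
    observed = enrichment_difference(labels, expressed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if enrichment_difference(shuffled, expressed) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)
```

With 20,000 permutations, the smallest p-value this estimator can report is 1/20,001, which bounds the resolution of the per-tissue test before multiple-testing correction.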
 
Combined mutation model We generated a list of truncating de novo SNV mutations (nonsense, frameshift or splice mutations) discovered in our 411 quads from published lists (Iossifov et al. 2012; O'Roak et al. 2012b; 2011; Sanders et al. 2012). Both de novo and inherited CNV burden was derived from this work (Table S3 and S4). We used a logistic regression model, which transforms the binary outcome (i.e., affected vs. unaffected) such that linear predictors can be used. The model shown in Figure 3.5 is summarized as: logit[P(Affected=1)] ~ intercept + (de novo CNV burden) * (inherited CNV burden) * (de novo SNV burden).  
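To make the structure of the model formula concrete, the sketch below expands three burden predictors into main effects and interaction terms and converts a linear predictor into a probability with the logistic function. It assumes that `*` in the formula denotes full factorial crossing in the style of R model formulas, which is an interpretation rather than something stated above; the predictor names, coefficient values and function names are purely illustrative, and no fitted estimates from this work are reproduced.

```python
import math
from itertools import combinations

def interaction_terms(predictors):
    """Expand predictors into all main effects and products (assumed a*b*c crossing)."""
    names = sorted(predictors)
    terms = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            value = 1.0
            for name in combo:
                value *= predictors[name]
            terms[":".join(combo)] = value
    return terms

def predicted_probability(predictors, coefficients, intercept):
    """Inverse logit of the linear predictor; missing coefficients count as 0."""
    terms = interaction_terms(predictors)
    logit = intercept + sum(coefficients.get(name, 0.0) * value
                            for name, value in terms.items())
    return 1.0 / (1.0 + math.exp(-logit))
```

For three predictors this expansion yields seven terms plus the intercept; in practice the coefficients would be estimated by maximum likelihood in a statistics package rather than supplied by hand.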
Data availability

The CoNIFER output files for 1,644 samples are deposited in the National Database of Autism Research (NDAR) under the NDAR Collection ID 1878 and the title of this manuscript.

3.4 Results

Samples and CNV discovery

We discovered a total of 847 transmitted, exonic, rare, autosomal CNVs (Table 3.1). This included 453 CNVs transmitted to probands and 394 transmitted to unaffected siblings. Overall, the median estimated CNV size was 18.1 kbp (range 150 bp to 5.18 Mbp, or 2 to 320 exons). The median size of inherited CNVs was slightly larger in probands (19.4 kbp) when compared to unaffected siblings (16.6 kbp) but this difference was not statistically significant. As expected, duplications outnumbered deletions (519 vs. 328; p < 1×10⁻¹⁰, binomial two-tailed test) and duplications were significantly larger than deletions (two-sided Mann-Whitney-U test, p < 1×10⁻¹⁶). The excess of duplications depended upon the size of event. For example, rare CNVs involving 20 or more exons were overwhelmingly duplications (139 duplications vs. 25 deletions), while small events were not significantly different (73 duplications and 93 deletions for 2-exon CNVs). This difference is observed irrespective of disease status (Figure S4).
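The two-tailed binomial comparison of duplication and deletion counts can be sketched with exact integer arithmetic. The sketch assumes a null hypothesis of equal probability for duplications and deletions (p = 0.5), under which the two tails are symmetric; the function name is hypothetical and the statistical software actually used for this test is not stated here.

```python
from fractions import Fraction
from math import comb

def binomial_two_tailed_p(successes, trials):
    """Exact two-tailed binomial p-value against a null of p = 0.5."""
    extreme = max(successes, trials - successes)
    # Upper tail count; doubling it gives both tails because the null is symmetric.
    tail = sum(comb(trials, k) for k in range(extreme, trials + 1))
    p = Fraction(2 * tail, 2 ** trials)
    return float(min(p, 1))
```

Calling `binomial_two_tailed_p(519, 519 + 328)` corresponds to the overall duplication versus deletion comparison, and `binomial_two_tailed_p(73, 73 + 93)` to the 2-exon events.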
Category                        CNVs  Dups  Dels  Median Size (est)  % of Samples  CNVs >500 kbp  CNVRs
All Proband CNVs                453   277   176   19.4 kbp           64%           21             390
All Sibling CNVs                394   242   152   16.6 kbp           60%           16             345
Father → Both                   199   130   69    16.7 kbp           41%           7              94
Father → Proband Only           100   67    33    25.0 kbp           19%           7              93
Father → Sibling Only           82    52    30    15.4 kbp           18%           2              80
Mother → Both                   233   127   106   15.0 kbp           48%           10             118
Mother → Proband Only           136   82    54    24.9 kbp           26%           5              128
Mother → Sibling Only           97    61    36    21.7 kbp           21%           6              94
Either Parent → Proband Only    236   149   87    25.0 kbp           39%           12             211
Either Parent → Sibling Only    179   113   66    19.3 kbp           36%           8              168
Mother → Either Offspring       466   270   196   17.8 kbp           86%           21             313
Father → Either Offspring       381   249   132   18.6 kbp           72%           16             252
Totals                          847   519   328   18.1 kbp           62%           37             525

Table 3.1: Summary of transmitted CNVs in 411 ASD quads

Validation using single nucleotide polymorphism (SNP) microarray and targeted array-CGH

We assessed the specificity of our call set by comparing our larger calls to Illumina 1M/Duo SNP microarray data and then selecting a subset of 80 samples for validation of smaller CNVs by array comparative genomic hybridization. These 80 samples carried a total of 161 exome-based CNV calls of which 69 (43%) were confirmed by SNP microarray (Figure 3.1a). Using a customized microarray design (Methods), we were able to test 86/92 of the remaining calls and confirmed an additional 65 events (nearly a twofold increased yield of CNVs) (Table S5). Of the 27 events which were not validated by array-CGH, 14 (or 9% of all 161 calls) were found to be specifically part of processed pseudogenes (i.e., retro-transcribed mRNA), which masquerade as duplications in exome-based discovery of CNVs, indicating that these events, while not genomic CNVs, are in fact true duplications of these genes or exons.
Thus, we estimate an overall false positive rate (FPR) of 4%-8% (7/155 tested, or 13/161 in total; Figure 3.1a), dependent on the number of probes (or exons) in each call: for calls with fewer than 10 exons, the false positive rate was ~7% (6/104), while only one call with 10 or more exons did not validate (2% or 1/51). There was no difference in the FPR between probands and siblings (3/68 [4.2%] for probands, 4/80 [4.5%] for siblings; Table S6).
 
Figure 3.1: Discovery and validation of previously undiscovered CNVs using exomes. (a) Fraction of CNVs previously identified using Illumina 1M SNP microarray (gray, "known true positives"), the fraction of previously undiscovered CNVs identified and confirmed by targeted array-CGH in this study (green, "previously undiscovered CNVs"), confirmed processed pseudogenes (hatched green) and the overall false positive rate for unconfirmed CNVs (gray). (b) The majority (73%, 152/207) of all previously undiscovered calls (green) discovered using exomes were smaller than 20 kbp. (c-d, f) Three examples of previously undiscovered CNVs in this study. Top: CoNIFER output and normalized coverage at each exon. Middle: targeted array-CGH at CNV locus, with threshold for deletion/duplication (dotted red line) as determined by ROC-curve analysis of known CNVs (Supplemental methods). Bottom: Illumina 1M SNP microarray data for locus, showing poor probe coverage (c and d only). (e) Exome-based CNV discovery affords high exon-level specificity, as indicated by duplication of NETO1 exons (?, CoNIFER call). Previous work (Sanders et al., 2011) had discovered this CNV (*), but the (incorrect) breakpoints did not extend into NETO1.

We also assessed the sensitivity (or false negative rate, FNR) of our calls versus the previously identified CNVs from SNP microarray data. We found that our pipeline identified 72% (FNR of 0.28) of all known CNVs intersecting at least two exons and
supported by 10 SNP microarray probes. False negative CNVs corresponded to samples with reduced mapped sequence coverage (Figure S2). For example, the Iossifov dataset (Iossifov et al. 2012) had an approximately twofold higher FNR, likely due to the lower overall sequence coverage in these exomes (a known factor in exome-based CNV discovery; Krumm et al. 2012; Fromer et al. 2012). In addition, the FNR for calls affecting only two exons was significantly higher than that for calls with 3 or more exons (Table S6). We found no differences in the mapped coverage, estimated false positive rates or false negative rate among siblings and probands (p > 0.3, Fisher's two-sided exact test; Table S6).
Increased inherited CNV burden among autism probands. We compared the burden of inherited CNVs in the 411 probands and their siblings in terms of the total number of CNVs and the total number of genes "hit". We find that probands inherit more CNVs than siblings (453 vs. 394; Figure 3.2a) and these harbor more genes (921 vs. 726; Figure 3.2b). These comparisons are significant when using a paired t-test of proband-sibling pairs (p = 0.02 for genes and p = 0.004 for CNVs, two-tailed paired t-test) and when comparing the summed values for probands and siblings in aggregate (p < 1 × 10^-6 for genes and p = 0.046 for CNVs, binomial two-tailed test). In order to ensure that these results were not driven by a few outlier families, we bootstrapped our data and calculated the confidence intervals for the proband-to-sibling burden (Figure S5). For CNVs, we found a median burden increase of 1.19 (95% CI: 1.09-1.29) and for genes a burden increase of 1.30 (95% CI: 1.10-1.52) across 10,000 bootstrap replicates, thereby rejecting the null hypothesis that probands have no increased inherited CNV burden in comparison to their siblings (Figure S5). Proband CNV burden was elevated over siblings across all size ranges, although individual quintile bins did not independently achieve statistical significance, due to their smaller size (Figure 3.2c). We find no significant enrichment of burden in either the smallest or the largest CNVs (chi-square test, χ² = 1.18, p = 0.95, df = 5), suggesting that the burden was not exclusively the result of either small or large CNVs.
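The family-level bootstrap described above can be sketched as follows. This is a minimal illustration, not the analysis code used in this study: the per-family counts in `example_pairs` are synthetic placeholders, and the function name, the percentile method for the interval and the choice to resample whole families are assumptions made for the example.

```python
import random
from statistics import median

def bootstrap_burden_ratio(pairs, n_boot=10_000, seed=0):
    """Resample families with replacement and return (median, 2.5th, 97.5th
    percentile) of the proband-to-sibling CNV burden ratio.

    pairs: list of (proband_cnv_count, sibling_cnv_count), one tuple per family.
    """
    rng = random.Random(seed)
    n = len(pairs)
    ratios = []
    for _ in range(n_boot):
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        proband_total = sum(p for p, _ in sample)
        sibling_total = sum(s for _, s in sample)
        if sibling_total:                   # skip degenerate resamples
            ratios.append(proband_total / sibling_total)
    ratios.sort()
    lower = ratios[int(0.025 * len(ratios))]
    upper = ratios[int(0.975 * len(ratios)) - 1]
    return median(ratios), lower, upper

# Synthetic counts for 411 hypothetical families (NOT the study data).
example_pairs = [(1, 1)] * 200 + [(2, 1)] * 100 + [(0, 1)] * 50 + [(1, 0)] * 61
med, lo, hi = bootstrap_burden_ratio(example_pairs, n_boot=2_000)
print(f"median ratio {med:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```

Resampling whole families keeps each proband paired with their own sibling, so a handful of outlier families cannot dominate every replicate; an interval that excludes 1.0 argues against the null of equal burden.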
Figure 3.2: Increased inherited CNV burden in ASD probands for large and small CNVs. (a) Total number of rare (observed in fewer than 10 families) inherited CNVs (≥ 2 exons) for 411 ASD probands (Pro) and their unaffected siblings (Sib). (b) Total number of affected genes in rare inherited CNVs. P-values are two-tailed paired t-tests between proband and sibling counts. (c) Burden of inherited CNVs across six size categories.

Previous work has indicated that private or ultra-rare CNVs may be more likely to be pathogenic than simply "rare" (e.g., < 1% frequency) CNVs (Levy et al. 2011). We therefore examined if the inherited burden in probands was due to private CNVs in a small subset of the 411 families. We examined 271 private CNVs in probands and 245 private CNVs in siblings, but found no enrichment of private burden when compared to rare CNVs (p = 0.74, Fisher's exact test; Figure S6a), nor did we find enrichment for the number of affected genes (p = 0.46, Fisher's exact test; Figure S6b). (Note: the burden was in fact slightly increased when considering all rare events.) We searched for additional factors which could underlie the proband-sibling burden differential. We found no significant differences in CNV burden dependent on the sex of the proband or the sibling, the concordance of their sexes, or the birth order of proband and sibling (p > 0.5, Fisher's
exact test; Table S8). However, we note that the highest overall CNV burden was found in families with one affected proband and at least three unaffected siblings. In fact, there was a linear increase in burden between probands and siblings across increasing family size, culminating in a 1.38x higher burden of CNVs in probands with three or more unaffected siblings.

Finally, we analyzed our dataset for parent-of-origin effects, and found a greater number of maternally transmitted CNVs (136 maternal vs 100 paternal, binomial two-tailed p-value = 0.02), but this effect was not significantly enriched in probands versus siblings (Fisher's exact test odds ratio = 1.14, two-tailed p = 0.49). Nonetheless, when we considered a null hypothesis in which a given transmitted CNV was equally likely to be transmitted to the proband only, the sibling only, or both (each with 1/3 probability), we found strong evidence that CNVs were not transmitted in equal fashion (Table 3.1; chi-square test with equal expected proportions, p = 0.0058, χ² = 16.4, df = 5), and that CNVs transmitted from the mother to the proband only were significantly more common than other transmissions.
CNV burden-phenotype correlation. We assessed whether the increased inherited CNV burden would segregate with markers of ASD phenotypic severity using phenotype data from the SSC. First, we utilized the Social Responsiveness Scale (SRS), a standardized parent- or teacher-completed questionnaire which measures the severity of autism symptoms in social settings (but is not a diagnostic indicator of ASD and was not used in ascertainment of the SSC). We partitioned our 411 families into two groups based on the SRS t-score: 1) We defined "Discordant SRS quads" as those where the proband was severely affected (SRS t > 75) and the sibling mildly affected (SRS t < 60), and 2) "Concordant SRS quads" as all others (Figure S7). The concordant group encompassed a range of moderately affected probands as well as some moderately affected siblings (Figure S7). There were a total of 276 discordant SRS proband-sib pairs and 115 concordant pairs based on this definition. We found a striking split between the discordant and concordant proband-sibling pairs: the increased CNV and gene burden was almost completely driven by the discordant pairs (Figure 3.3a; p < 0.0002 for CNVs, p < 0.02 for genes, two-tailed paired t-test), and there
was virtually no difference at a group or family level for concordant SRS pairs overall (1.04x, p > 0.5). Moreover, the burden ratio between probands and siblings was increased in the discordant group (for CNVs: 1.27x; for genes: 1.41x) over the ratio for the full set of 411 quads. Finally, we found that offspring (probands and siblings) with SRS scores ≥ 60 ("moderate" and "severe" range) had higher CNV burden than did all offspring with SRS score < 60 (361 CNVs in 390 mildly affected offspring [0.92 per individual] vs. 436 CNVs in 388 moderately/severely affected offspring [1.12 per individual]; two-tailed independent t-test p < 0.0094). There was no statistically significant difference in burden between probands and siblings within each group (i.e., SRS < 60 or ≥ 60); however, the relatively low number of "affected" siblings and "unaffected" probands hampers these comparisons.
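The SRS-based partition of quads defined above can be sketched as a small classification step followed by a per-group burden ratio. This is an illustrative sketch only: the thresholds come from the text, while the function names, the record layout and the example records are hypothetical.

```python
def classify_quad(proband_srs_t: float, sibling_srs_t: float) -> str:
    """Label a quad using the thresholds in the text: the proband is severely
    affected (SRS t > 75) and the sibling mildly affected (SRS t < 60)."""
    if proband_srs_t > 75 and sibling_srs_t < 60:
        return "discordant"
    return "concordant"

def burden_ratio_by_group(quads):
    """quads: iterable of (proband_srs_t, sibling_srs_t, proband_cnvs, sibling_cnvs).
    Returns {group: proband-to-sibling CNV ratio} for groups with sibling CNVs."""
    totals = {}
    for pro_t, sib_t, pro_cnvs, sib_cnvs in quads:
        group = totals.setdefault(classify_quad(pro_t, sib_t), [0, 0])
        group[0] += pro_cnvs
        group[1] += sib_cnvs
    return {name: pro / sib for name, (pro, sib) in totals.items() if sib}

# Hypothetical records, for illustration only (NOT the study data).
example_quads = [(82, 45, 2, 1), (90, 50, 1, 1), (70, 48, 1, 1), (80, 66, 0, 1)]
ratios = burden_ratio_by_group(example_quads)
print(ratios)
```

Quads with a missing SRS score for either child would need to be excluded before classification; that handling is left out of the sketch.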
Figure 3.3: Inherited CNV burden correlates with SRS phenotype. The Social Responsiveness Scale measures autism features in social settings via parent report on 65 items. (a) We classified proband-sibling pairs with severely affected probands but mildly or unaffected siblings as "Discordant SRS" quads (276 quads), and all other quads as "Concordant SRS" quads (115 quads). Strikingly, the discordant SRS quads fully recapitulated the inherited CNV transmission bias, whereas the concordant SRS quads did not show a differential burden. (b) CNV burden was independent of full scale IQ (FSIQ), and probands with either low FSIQ (≤70) or high FSIQ had more CNVs than did their siblings. P values refer to two-tailed paired t-tests between probands and siblings.
[Figure 3.3 graphic. Panel (a): transmitted CNVs for probands (Pro) and siblings (Sib) in discordant and concordant SRS quads. Panel (b): transmitted CNVs for probands and siblings grouped by proband IQ. Legend: inherited by both, proband CNVs, sibling CNVs.]
Group                                  Proband FSIQ  Proband CNVs  Sibling CNVs  Ratio  Two-tailed t-test (probands vs. sibs)
All Quads                              ≤ 70          157           126           1.25   p = 0.014
                                       71-85          89            70           1.27   p = 0.029
                                       ≥ 86          184           166           1.11   NS
Discordant SRS Quads                   ≤ 70          138           104           1.32   p = 0.004
(Proband SRS > 75, Sibling SRS < 60)   71-85          62            44           1.40   p = 0.012
                                       ≥ 86          113           101           1.12   NS
Concordant SRS Quads                   ≤ 70           19            22           0.86   NS
                                       71-85          27            26           1.04   NS
                                       ≥ 86           71            65           1.09   NS

Table 3.2: Summary of IQ and SRS burden. P-values represent two-tailed paired t-tests between probands and siblings in each group.

Next, we considered if the full-scale IQ (IQ) of the probands was affected by inherited CNVs. Since IQ scores were only available for probands (Table S7), we grouped quads into three groups: IQ ≤ 70 ("low", consistent with a diagnosis of intellectual disability), between 71 and 85 ("intermediate"), and greater than 85 ("high"). The CNV burden was significantly greater for probands in the "low" and "intermediate" proband IQ bins (1.25-1.27x burden, Table 3.2). Probands with "high" IQ did not show statistically significant enrichment over siblings, although a trend was still apparent (1.11x, Table 3.2). When we examined the effect of SRS and IQ together (Table 3.2, Table S8 and Figure S8), we found that the burden differential was strongest for the most severely affected probands (those with IQ ≤ 85 and part of discordant SRS quads), reaching 1.32-1.40x for CNVs (p = 0.004). However, there was no significant difference in burden between probands and siblings in SRS concordant quads, even with "low" IQ probands (0.86x-1.09x, p > 0.5; Table 3.2), indicating that the inherited burden may be most closely aligned with SRS score, and not IQ (however, we caution that there were only 22 quads total in this group).
Enrichment for brain-expressed genes in inherited CNVs: We observed a trend for more of the proband-only genes to be highly expressed in brain-related tissues (19/317 or 6% proband vs. 6/224 or 2.7% for sibling only; Table S8; see Methods). The effect becomes most pronounced when considering discordant SRS quads 
(15/256 genes [5.9%] in probands and 2/170 [1.2%] in siblings; p = 0.007; Figure 3.4). When we considered all genes highly expressed in at least one brain-related tissue, we found that significantly more probands (57/411, or 13.9%) had a CNV affecting such a gene than did their siblings (33/411, or 8.0%; OR = 1.85, p = 0.009, Fisher's exact test). These results suggest that a fraction of proband-specific genes are expressed in the nervous system tissues, and that this fraction is higher in proband-only genes than in sibling-only genes. While we caution that expression does not definitively imply pathogenicity, many of these genes and their biological pathways may be of interest for further study, both in these particular individuals and for ASD genetics in general.
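The proband-versus-sibling comparison of carriers above can be sketched as a 2x2 table with a sample odds ratio and a two-sided Fisher's exact test. The sketch below uses only the carrier counts quoted in the text; the function name and the "sum all tables no more likely than the observed one" definition of the two-sided p-value are assumptions made for this illustration and may differ in detail from the software used in this study.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int):
    """2x2 table [[a, b], [c, d]] -> (sample odds ratio, two-sided p-value).
    Assumes b and c are nonzero so the sample odds ratio is defined."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_value = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return (a * d) / (b * c), min(1.0, p_value)

# Carriers / non-carriers among 411 probands and 411 siblings (counts from the text).
odds_ratio, p_value = fisher_exact_two_sided(57, 411 - 57, 33, 411 - 33)
print(f"OR = {odds_ratio:.2f}, p = {p_value:.3f}")
```

The sample odds ratio here is the simple cross-product (57 × 378) / (354 × 33); a conditional maximum-likelihood estimate, as reported by some statistics packages, can differ slightly.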
Figure 3.4: Genes in proband-only CNVs from SRS-discordant quads are more likely brain-expressed. We defined a gene to be expressed in a tissue if it ranked in the top 5% of all genes in that tissue, and calculated the fold enrichment of proband and sibling genes expressed in each tissue. Tissues part of brain structures had the strongest proband enrichment (black bars), as did a computed average of expression across 18 brain regions ("Brain average") in comparison to the average expression across other regions. However, the particular brain tissues with the strongest apparent enrichment should not be considered as independently enriched, as expression values for individual genes between brain regions are highly correlated. Stars indicate a FDR-corrected p-value < 0.05. See Figure S9 for results from all 411 quads.

We compared the genes detected in the CNVs in this study to a set of 1,560 genes that have been previously observed in autism/ASD, intellectual disability or schizophrenia (Table S10; Figure S10). Among SRS-discordant quads, we found significant enrichment
[Figure 3.4 graphic: ratio of highly expressed genes in 79 human tissues found in inherited CNVs. Bars are grouped as brain-related tissues (18), non-brain tissues (61; 43 additional tissues are listed in Table S9) and brain/non-brain computed averages. * P ≤ 0.05 (FDR/q-value adjusted).]
of autism genes among proband CNVs compared to unaffected siblings (66 vs 35 genes, p = 0.006, two-tailed paired t-test; Table 3.2a and Table S8). In contrast, there was no enrichment among "concordant SRS" proband-sibling pairs for previously observed genes (in fact, siblings had more genes: 17 vs. 24, p = 0.069). Overall, 16% of probands (44/276) in the SRS discordant group had a CNV in a previously observed gene, while only 10% of probands (12/115) in the concordant group had such an event. Intersecting the brain-expressed genes and previously observed disease genes, we found that 13 genes matched both criteria, corresponding to 1.7% of all proband genes, and only two genes (0.3%) in siblings (Figure S10). The 13 convergent proband-only genes were found exclusively in discordant SRS families, indicating that these genes may be associated with more severe phenotypes (Table S11).

3.5 Discussion
In this study, we report a significant CNV transmission bias for autism, finding an enrichment of inherited CNVs in sporadic cases versus their unaffected siblings. The targeted nature of exome sequencing enabled us to explore a smaller CNV landscape largely inaccessible by high-density SNP microarray data (Pinto et al. 2010; Sanders et al. 2011; Levy et al. 2011). We estimate that the use of exome data increased our power to detect gene-disruptive CNVs smaller than 20 kbp by ~2.25-fold. These CNVs provide potential insight into the pathophysiology of inherited CNVs in sporadic autism. We find that the CNV burden is more strongly correlated with measures of ASD phenotypes (such as the SRS score) as opposed to IQ; for proband-sibling pairs with concordant SRS scores, IQ was not dependent on the proband's CNV burden. Genes already associated with autism and/or highly expressed in the brain are more likely to be disrupted. Private CNVs (seen once) were no more likely to be found in probands than simply rare variants (seen fewer than 10 times in our families).
Burden was consistent across all sizes of CNVs, and we did not find any enrichment for either small or large events. Mothers are significantly more likely to be carriers of transmitted CNVs than fathers irrespective of disease status of the child. This finding is consistent with our recent analysis of "secondary" CNVs being transmitted from mothers to children with developmental delay and multiple CNVs (Girirajan et al. 2012). We also noted that the transmission bias
becomes more significant in probands from ASD families with many siblings as opposed to fewer individuals. Although this observation is inconsistent with the assumption that probands in larger families with many unaffected siblings are more likely to have an underlying sporadic genetic etiology, it likely reflects an ascertainment bias in selecting the "least affected" sibling in a large family as the "designated sibling" for the purposes of forming a quad (Sanders et al. 2012). This suggests that a significant fraction of the underlying genetic etiology in the SSC may be inherited, a notion that has been examined previously (Davidson et al. 2012).

Our study benefited from the quad-based design of the SSC (Fischbach and Lord 2010), which provided a robust genetic control for each ASD proband, as well as the detailed phenotypic information available, which sharpened the contrasts between severely affected and less affected probands and their siblings, some of which showed subtle signs of the Broader Autism Phenotype (BAP; Davidson et al. 2012). Most of our observations were strengthened in or restricted to "SRS discordant" quads, where the proband was severely affected in terms of the SRS scale, but the sibling was unaffected. Approximately 67% (276/411) of the quads in this study were categorized as SRS discordant, and these quads explained virtually the entire overall CNV burden, encompassed the majority of brain-expressed genes and strengthened the association with previously implicated disease genes. This effect may be driven by inherent ambiguity in the simplex and multiplex classification scheme, which is not truly binary but rather a continuous probability based on the number of unaffected siblings in the family (the more unaffected siblings, the greater the likelihood that the family is simplex). In essence, by focusing on the SRS discordant quads only, we have enriched not only for a more severe proband phenotype, but also a truly "simplex"
genetic etiology (as opposed to an environmental and/or stochastic one), thus enhancing the observed transmission disequilibrium of CNVs.

Our results should be viewed carefully in the context of previous studies. Notably, two recent studies (Levy et al. 2011; Sanders et al. 2011) failed to find statistically significant enrichment of inherited CNVs in sporadic autism probands compared to their siblings.
These studies, which also analyzed families from the SSC, used high-density microarray platforms to discover CNVs in genome-wide fashion. It is possible that the increased sensitivity of our exome-based method for genic events, which are most strongly implicated by both de novo CNV and de novo SNV studies, revealed the difference in burden between probands and siblings. Additionally, our study found that the differential burden was dependent on the SRS score and not IQ, a factor which has not been previously examined in the context of ASD and CNVs. In contrast, our results are in good agreement with those of the case-control study by Pinto and colleagues (Pinto et al. 2010), who found an overall case/control ratio of 1.19 for genic CNVs and no enrichment for "ultra-rare" CNVs seen only once in their cases (although this study was largely limited to CNVs larger than 50 kbp).

The smaller size of the CNVs provides increased specificity to define individual genes when compared to previous studies focused on large CNVs, which typically encompass dozens of genes. The deletion or duplication of a subset of exons can have, in principle, the same impact on gene function as disruptive point mutations. Accordingly, several genes in our brain-expressed/SRS-discordant set of CNVs have been previously identified as part of severe neurological disorders. Among these was a CNV affecting DDHD2, an intracellular phospholipase which plays an essential role in synaptic function, and which has recently been implicated in a recessive form of complex hereditary spastic paraplegia (HSP [MIM 615033]), a syndrome characterized by early-onset intellectual disability and spastic paraplegia (Schuurs-Hoeijmakers et al. 2012). We did not observe any CNVs in this gene in 2,972 control exomes.
Similarly, another proband (and 0/2,972 controls) carried an inherited CNV affecting only the PACS2 gene (MIM 610423), one of six genes in the critical region of 14q32 deletion syndrome, which is characterized by intellectual disability and mild facial dysmorphology (Holder et al. 2012). Lastly, in two families, we identified a previously unidentified small (~5 kbp), 2-exon deletion of the ZNF396 gene (Figure 3.1c), which was identified as a candidate gene for Alopecia with Mental Retardation syndrome (MIM 613930) by microsatellite linkage analysis (in fact, ZNF396 was the closest gene to the linkage peak; Wali et al. 2007). The frequency of this deletion in our control set was 3/2,972 (0.1%). Although these
identified genes and CNVs may play an important role in the pathogenesis of ASD on the basis of their previously identified roles in Mendelian disorders, we would like to emphasize that their individual rarity and overall small effect prevent them from being conclusively identified as having Mendelian effects.

Other genes disrupted by CNVs have functional roles in neural function, brain development or neurobehavioral phenotypes in model organisms (Table 3.3). For example, we identified two independent disruptions of the ORC3 gene (MIM 604972; one shown in Figure 3.1d), which encodes a protein of the Origin Recognition Complex. The complex regulates dendritic spines and dendrite arborization in post-mitotic neurons, and has been implicated in olfactory learning and memory in Drosophila (Huang 2005). Notable also was a CNV affecting CPLX1 (MIM 605032; Figure 3.1f), specific to the SNARE neuronal vesicle exocytosis pathway in neurons, as well as CNVs affecting neural receptors such as HTR3E (MIM 610123; a subunit of the ionotropic serotonin receptor) and NETO1 (MIM 607973; Figure 3.1e), a key component of the NMDA-receptor complex and critical for synaptic plasticity and learning in mice (however, this CNV was transmitted to both proband and sibling; Ng et al. 2009a). Previous work has implicated the ubiquitin processing pathway (Glessner et al. 2009), and we found a rare CNV in UCHL1 (MIM 191342), a ubiquitin-adduct processing enzyme which has strong and specific brain expression. Knockout mice show specific neurodegenerative phenotypes (Wada et al. 1999), and recent work has shown UCHL1 to regulate the NCAM1 neural cell adhesion molecule (MIM 116930; Wobst et al.
2012). Finally, we found several interesting genes on the basis of brain expression pattern, including 1) an inherited deletion of the IQSEC1/BRAG2 gene (MIM 610166), which is strongly expressed in the prefrontal cortex and involved in clathrin-mediated endocytosis of AMPA receptors critical to long-term potentiation in mice (Scholz et al. 2010), 2) a duplication of the ZNF251/ZNF517 cluster on 8q24.3, whose genes have tissue-specific expression that is highest in the fetal brain and cerebellum (Peter Lorenz 2010), and 3) a duplication of AQP4 (MIM 600308), the primary water transporter in brain glial cells, which is expressed especially in the amygdala and prefrontal cortex and has been implicated in epilepsy (Binder et al. 2012).
     
Table 3.3: Selected inherited CNVs    
Sample    Chr  Start (hg19)  End (hg19)   Size (kbp)  State  # exons  Freq. in 411 quads  Genes in transmitted CNV
12647.p1  1    32,084,793    32,110,465   25.7        Dup    12       2                   HCRTR1, PEF1
11872.p1  1    65,730,593    65,831,879   101.3       Dup    4        1                   DNAJC6
12719.p1  1    146,715,494   146,767,190  51.7        Del    23       1                   CHD1L
12997.p1  2    230,632,269   230,724,290  92          Dup    39       1                   TRIP12
12394.p1  2    241,538,067   241,709,123  171.1       Dup    42       3                   KIF1A, GPR35, AQP12B, AQP12A, CAPN10
12534.p1  3    12,940,888    12,978,197   37.3        Del    13       2                   IQSEC1
13099.p1  3    97,486,951    97,634,880   147.9       Del    19       1                   ARL6
12645.p1  4    818,279       845,762      27.5        Dup    5        1                   CPLX1, GAK
11773.p1  4    2,641,461     2,835,561    194.1       Dup    34       1                   TNIP2, FAM193A, SH3BP2
11066.p1  4    41,258,993    41,259,143   150 bp      Dup    2        3                   UCHL1
13385.p1  4    169,083,678   169,086,477  2.8         Del    3        1                   ANXA10
13293.p1  5    619,104       644,540      25.4        Dup    9        2                   CEP72
12758.p1  6    24,454,242    24,523,153   68.9        Dup    20       1                   ALDH5A1, GPLD1
11551.p1  6    88,315,634    88,318,947   3.3         Del    3        1                   ORC3
11459.p1  6    88,317,390    88,366,700   49.3        Del    10       1                   ORC3
13412.p1  7    33,102,179    33,185,976   83.8        Dup    7        3                   RP9, BBS9, NT5C3
11722.p1  7    48,308,576    48,416,169   107.6       Del    20       1                   ABCA13
11716.p1  8    38,090,512    38,117,639   27.1        Del    16       1                   DDHD2
13412.p1  8    86,351,940    86,575,726   223.8       Dup    14       1                   CA3, CA2, REXO1L1
12534.p1  8    145,947,028   146,033,780  86.8        Dup    18       1                   ZNF251, ZNF34, ZNF517, RPL8
11356.p1  9    139,634,401   139,651,044  16.6        Dup    16       1                   LCN6, LCN10, LCN8
13162.p1  10   5,203,384     5,260,723    57.3        Del    12       1                   AKR1C4, AKR1CL1
13843.p1  11   43,772,460    43,775,671   3.2         Del    2        1                   HSD17B12
11241.p1  12   120,875,929   120,884,632  8.7         Dup    7        2                   GATC, COX6A1, TRIAP1
12396.p1  14   105,836,177   105,861,009  24.8        Dup    17       1                   PACS2
11479.p1  15   43,696,610    43,701,294   4.7         Dup    5        1                   TP53BP1, TUBGCP4
13843.p1  15   55,475,512    55,497,903   22.4        Dup    6        2                   RAB27A, RSL24D1
12837.p1  15   57,730,197    57,754,090   23.9        Dup    7        2                   CGNL1
13543.p1  15   91,488,121    91,520,001   31.9        Del    25       2                   RCCD1, PRC1, UNC45A
13215.p1  16   15,596,178    15,609,285   13.1        Del    6        1                   C16orf45
14201.p1  16   68,710,287    68,713,877   3.6         Dup    5        2                   CDH3
12100.p1  16   70,714,696    70,714,928   232 bp      Dup    2        3                   MTSS1L
12373.p1  16   81,314,461    81,396,216   81.8        Dup    10       1                   GAN, BCMO1
12697.p1  18   24,436,174    24,628,467   192.3       Dup    10       1                   CHST9, AQP4, CHST9-AS1
12869.p1  18   72,229,281    72,251,798   22.5        Dup    8        1                   CNDP1
11356.p1  18   77,470,345    77,891,075   420.7       Dup    28       2                   KCNG2, RBFA, CTDP1, ADNP2, TXNL4A, PQLC1
13296.p1  19   6,681,951     6,686,913    5           Dup    8        1                   C3
11298.p1  19   18,704,375    18,704,917   542 bp      Dup    2        1                   CRLF1
13815.p1  19   57,835,049    57,932,849   97.8        Del    15       1                   ZNF547, ZNF304, ZNF17, ZNF548, ZNF543
13396.p1  21   19,628,825    19,632,603   3.8         Del    3        1                   CHODL
13327.p1  21   35,742,777    35,899,047   156.3       Dup    8        2                   KCNE2, RCAN1, KCNE1, FAM165B
 
Figure 3.5: A combined model of inherited and de novo mutations reveals independent risk for both. A logistic regression model estimates the odds ratio for each inherited CNV (blue), de novo CNV (red) or disruptive de novo SNV (gray; nonsense, splice and indels only) in probands and siblings. Odds ratios and burden (proband vs. sibling ratio) are given in the accompanying table, revealing independent risk for each type of mutation. The line width for each type of mutation in the figure indicates if a bias has been observed for new mutations arising on the maternal or paternal haplotypes (see also: for SNVs, O'Roak et al. 2012 and for CNVs, Hehir-Kwa et al. 2011).

Since the patients and families have been analyzed for both de novo CNVs and SNVs, we can develop a model to assess the relative contribution of each class of genetic variant to autism. First, we confirmed that inherited CNVs were enriched in the set of probands without other known de novo CNVs or SNVs (368 inherited CNVs in probands vs. 327 CNVs in siblings of 336 quads; p < 0.03, two-tailed paired t-test). Second, we developed a logistic regression model, in which the binary outcome of either proband or sibling is predicted by the count of disruptive de novo SNVs, de novo CNVs and the count of our rare transmitted CNVs. We performed regressions on both the set of all 411 quads, as well as the set of 276 proband-sibling pairs with discordant SRS scores. The results (Figure 3.5 and Table S12) reveal a strong effect for disruptive (nonsense, splice and frameshift) de novo SNVs (OR 4.30, p < 0.001) and CNVs (OR 6.65, p < 0.02), and also confirmed a statistically independent effect for transmitted CNVs (OR 1.16, p < 0.04); again, in this model, the effect was primarily driven by discordant SRS quads.
Although the strength of de novo SNVs strongly outweighs the pathogenic effect of inherited CNVs, our model predicts that inherited CNVs contribute significantly to sporadic disease, especially in the case of discordant SRS pairs (where the OR increased to 1.26, p < 0.015). We did not find any significant interactions between our predictors, reflecting
the relative infrequency of co-occurring CNVs and de novo SNVs but also the limited sample size. It is also possible that rare, disruptive inherited SNVs statistically interact with other classes of mutation, but we did not take these into account in building our model. Taken together, our model suggests that disruptive de novo SNVs and both inherited and de novo CNVs contribute independently to the risk of autism. We believe that future studies of ASD and other complex neurological disorders will make significant strides in understanding the genetic underpinnings of disease, especially if an integrated approach is applied that considers all disruptive mutations (inherited and de novo, CNV and SNV, small and large).
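The logistic regression described above, in which proband or sibling status is predicted from per-individual mutation counts and each coefficient is exponentiated to give an odds ratio, can be sketched as follows. This is a simplified illustration under stated assumptions, not the model fitted in this study: the grouped data are synthetic, each synthetic individual carries at most one class of mutation (so the fitted odds ratios can be checked by hand), and the plain gradient-ascent fitter and all names are hypothetical.

```python
from math import exp

# Synthetic grouped data (NOT the study data):
# (de novo SNV count, de novo CNV count, inherited CNV count, is_proband, n individuals)
GROUPS = [
    (0, 0, 0, 1, 100), (0, 0, 0, 0, 100),
    (1, 0, 0, 1, 40),  (1, 0, 0, 0, 10),
    (0, 1, 0, 1, 30),  (0, 1, 0, 0, 5),
    (0, 0, 1, 1, 60),  (0, 0, 1, 0, 50),
]

def fit_logistic(groups, n_iter=5_000, learning_rate=1.0):
    """Maximize the mean log-likelihood by gradient ascent.
    Returns [intercept, b_dn_snv, b_dn_cnv, b_inh_cnv]."""
    beta = [0.0] * 4
    total = sum(g[4] for g in groups)
    for _ in range(n_iter):
        grad = [0.0] * 4
        for x1, x2, x3, y, weight in groups:
            x = (1.0, x1, x2, x3)
            z = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + exp(-z))       # predicted P(proband)
            for j in range(4):
                grad[j] += weight * (y - p) * x[j]
        beta = [b + learning_rate * g / total for b, g in zip(beta, grad)]
    return beta

beta = fit_logistic(GROUPS)
odds_ratios = [exp(b) for b in beta[1:]]    # one odds ratio per mutation class
print([round(value, 2) for value in odds_ratios])
```

Because the synthetic mutation classes never co-occur, each fitted odds ratio equals the proband-to-sibling odds in that group divided by the odds in the mutation-free group. With real data, where classes can co-occur, the coefficients are instead estimated jointly, which is what allows the effects to be described as statistically independent.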
ACKNOWLEDGEMENTS: We thank the National Heart, Lung, and Blood Institute, NIH Grand Opportunity (GO) Exome Sequencing Project and its ongoing studies, which produced and provided exome variant calls for comparison: the Lung GO Sequencing Project (HL-102923), the Women's Health Initiative Sequencing Project (HL-102924), the Broad GO Sequencing Project (HL-102925), the Seattle GO Sequencing Project (HL-102926), and the Heart GO Sequencing Project (HL-103010). We are grateful to all of the families at the participating Simons Simplex Collection (SSC) sites, as well as the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, E. Hanson, D. Grice, A. Klin, R. Kochel, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren and E. Wijsman). We also acknowledge M. State and the Simons Simplex Collection Genetics Consortium for providing Illumina genotyping data, T. Lehner and the Autism Sequencing Consortium for providing an opportunity for pre-publication data exchange among the participating groups. We are grateful for helpful discussion and manuscript preparation help from T. Brown and P. Sudmant, as well as all members of the Eichler lab. We appreciate obtaining access to phenotypic data on SFARI Base. This work was supported by the Simons Foundation Autism Research Initiative (SFARI 137578 and 191889; E.E.E., J.S. and R.B.) and NIH HD065285 (E.E.E. and J.S.). E.B. is an Alfred P. Sloan Research Fellow. E.E.E. is an Investigator of the Howard Hughes Medical Institute.
IV. Inherited SNV mutations in Autism Spectrum Disorder Towards a holistic model of genetic variation in ASD   4.1 Summary 
We describe the creation of a combined dataset and resource of inherited and de novo SNVs and CNVs across 2,377 Simons Simplex Collection (SSC) families. The dataset includes 1,786 parent-child-unaffected sibling "quads", which enable comparison of the burden of inherited and de novo mutations between affected and unaffected siblings in simplex autism families. We find that private inherited truncating SNV mutations in conserved genes are significantly enriched in probands (OR = 1.14, p < 0.0002, two-tailed paired t-test), and we observe that this effect becomes more pronounced with increasing gene-level conservation (assessed via the RVIS score). Likewise, we confirm previous reports of transmission disequilibrium for inherited CNVs. Transmission disequilibrium of SNVs was strongest in probands with diagnoses of Autism Disorder or Pervasive Developmental Disorder, and we did not observe a significant enrichment in probands with Asperger's Disorder; similar results were observed when stratifying by IQ. We quantified ASD risk for de novo and inherited CNVs and SNVs by using a conditional logistic regression model, and found that inherited private truncating SNVs and rare inherited CNVs contribute an independent increase in risk of 1.11 (p = 0.0002) and 1.23 (p = 0.01), respectively. Our results confirm a statistically independent role for inherited mutations in ASD risk and identify additional candidate genes (e.g., RIMS1, CUL7 and CSMD1) where inherited and de novo burden converge.

This chapter is in preparation for publication.
4.2 Introduction
Autism spectrum disorder (ASD) is a common neurodevelopmental disorder diagnosed in approximately 1/88 children and manifests as deficits in social behavior and language development, as well as restricted or stereotyped interests. Several large studies have confirmed initial observations that ASD is highly heritable, and consensus estimates suggest that ~50-60% of ASD etiologies are genetic (Hallmayer et al. 2011; Bailey et al. 1995; Constantino et al. 2013; Steffenburg et al. 1989). In particular, de novo mutations have been implicated as the underlying genetic cause in cases, and these mutations have provided a rich source for understanding the pathogenic genes and neurobiological mechanisms of ASD. However, de novo mutations are rare, and their overall contribution is estimated to be 25-35% (Krumm et al. 2014; Ronemus et al. 2014), shy of the overall heritability estimate, suggesting that other genetic etiologies contribute to ASD.

Previous reports have suggested additional genetic models for ASD, in which rare inherited copy number variants (CNVs) are disproportionately inherited by affected probands rather than by their unaffected siblings (Krumm et al. 2013; Poultney et al. 2013; Pinto et al. 2010). These studies describe a transmission disequilibrium for mutations of very low population frequency, suggesting that the pathogenic CNVs for ASD are of relatively young age and under strong purifying selection. In the present study, we hypothesize that inherited single nucleotide variants (SNVs) also contribute to a model of rare inherited variation underlying the genetic etiology of autism. We leverage a family-based study design of simplex autism, in which one offspring carries a diagnosis of autism and an unaffected offspring acts as a genetic control, to discover specific ASD risk genes and integrate inherited factors with de novo factors in a general ASD risk model.
This study, in conjunction with the National Database for Autism Research and the Simons Simplex Consortium, also makes available a resource of uniformly processed raw exome sequence (bam) files as well as raw variant (vcf) files from multiple variant calling tools. We have generated both inherited and de novo SNVs and CNVs across 2,377 families, including 1,786 quads. We envision that these data will become a resource for
further study and investigation in the autism research community. The complete set of raw underlying data, variant calls and methods is available through the National Database for Autism Research (NDAR) under the Study ID 334 (see Web Resources for link).

4.3 Methods

Dataset
We analyzed exome data from 2,377 ASD families after quality control of 2,391 families from the Simons Simplex Collection (Fischbach and Lord 2010), including 1,786 quads and 591 trios (total n=8,917 exomes). A subset of these families (n=752) was sequenced as part of three previous publications (Sanders et al. 2012; Iossifov et al. 2012; O'Roak et al. 2012b). We note that Iossifov and colleagues report on aspects of sequence generation and de novo rates (Iossifov et al., in preparation, Nature, 2014). This study was approved by the institutional review board of the University of Washington.
Alignment and SNV discovery
We aligned exome data to the GRCh37 reference genome using BWA-MEM (Li 2013; v0.7.5a) and post-processed alignments using the GATK "best practices" pipeline, including indel realignment and BQSR. Exome data were matched to existing SNP barcodes (generated from Illumina 1M/1MDuo SNP microarrays and/or 96-SNP fingerprints collected by the Rutgers sample distribution center) in order to eliminate sample identity/paternity mix-ups. We called SNVs and indels with both GATK HaplotypeCaller (McKenna et al. 2010; v2.7-4) and FreeBayes (Garrison and Marth 2012; v0.99) to within 20 bp of the exon targets; calls were annotated using SnpEFF and merged into union and intersection sets. Allele frequency was estimated by counting non-reference alleles across all parents (n=4,754 parents). We define all stop-gained, frameshift and splice-site variants as "Likely Gene Disrupting" or LGD variants.

For de novo events, we applied a minimum read depth of six alternate alleles in offspring and a depth of >10 reference reads in parents, including no more than two low-quality bases of the de novo allele. To exclude common artifacts, we only accepted unique de
novo sites across all families. Inherited events were derived from the intersection set of both algorithms, with depth and quality filters set to DP > 20 and QUAL > 50.
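The filters described above can be expressed as two small predicates. The following is a minimal Python sketch, not the pipeline used in this study: the `Call` record and its field names are hypothetical, the thresholds are taken from the text, and the criterion on low-quality bases of the de novo allele is omitted because its exact definition is not given here.

```python
from dataclasses import dataclass

# Both callers must support an inherited call (the intersection set).
CALLERS = frozenset({"gatk", "freebayes"})


@dataclass(frozen=True)
class Call:
    """One candidate variant call; all field names are illustrative."""
    callers: frozenset             # callers that reported the site
    depth: int                     # total read depth (DP)
    qual: float                    # variant quality (QUAL)
    child_alt_reads: int = 0       # alternate-allele reads in the offspring
    min_parent_ref_reads: int = 0  # lower reference-read count of the two parents
    families_with_site: int = 1    # number of families carrying this exact site


def passes_inherited(call: Call) -> bool:
    """Intersection of both callers, DP > 20 and QUAL > 50."""
    return CALLERS <= call.callers and call.depth > 20 and call.qual > 50


def passes_de_novo(call: Call) -> bool:
    """At least six alternate reads in the child, >10 reference reads in
    each parent, and a site that is unique across all families."""
    return (
        call.child_alt_reads >= 6
        and call.min_parent_ref_reads > 10
        and call.families_with_site == 1
    )
```

Keeping the two predicates separate mirrors the text, in which de novo and inherited events are filtered by different criteria.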
CNV discovery
We used the CoNIFER (Krumm et al. 2012) and XHMM (Fromer et al. 2012) algorithms to discover copy number variation from exome data at high exonic resolution. Calls from each algorithm were reconciled, merged, and genotyped within each family to determine inheritance patterns. In order to maximize both the sensitivity and the precision of the callset, we used targeted in silico genotype information based on available SNP microarray data of each call (n=1,266 families; CRLMM algorithm (Scharpf et al. 2011), see supplementary methods). In order to focus our analysis on those CNVs most likely relevant to ASD pathogenesis, we restricted our analysis to rare CNVs found at less than 0.8% frequency (< 10 events/1,266 families) and outside of repetitive genomic elements.

4.4 Results

Discovery and validation of inherited and de novo SNVs
Starting from raw sequence data, we reprocessed 8,917 exomes from the Simons Simplex Collection in order to standardize their analysis and allow comparison among the entire data set of 2,377 families. Our pipeline entailed remapping using BWA-MEM and variant calling using both GATK HaplotypeCaller and FreeBayes, which amounted to over 1,000,000 CPU-hours of computation using Amazon Web Services. Using FreeBayes and GATK, we found a median of 26,920 transmitted variants per family (95% CI 23,394-31,401). Overall, 81% of all transmitted variants were found by both FreeBayes and GATK, 12% by FreeBayes alone and 7% by GATK alone. Of all transmitted mutations in the intersection set, an average of 341 (95% CI: 133-632) sites per family were novel and not observed in dbSNP (v137); 98.6% of sites were in dbSNP with a mean concordance rate of 99.7% (for the union of transmitted variants, 93.4% of variants were found in dbSNP and 99.5% were concordant; for events discovered only by GATK, 76% were found in dbSNP and these were on average 96.7% concordant; for events discovered only by FreeBayes, 64% were found in dbSNP and these were 99.4% concordant).
For intersecting variants, the Ti/Tv ratio was 2.94 (95% CI 2.79-3.03) for all sites, 2.95
(2.83-3.04) for dbSNP sites and 1.94 (1.05-2.75) for novel sites. The median Ti/Tv ratio for transmitted variants found only by GATK was 1.34 overall and 1.54 for dbSNP sites; for variants found only by FreeBayes, the Ti/Tv ratio was 2.15 overall and 2.35 for dbSNP sites.
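The Ti/Tv ratio used above as a quality metric is the number of transitions (purine-to-purine or pyrimidine-to-pyrimidine changes) divided by the number of transversions. A minimal Python sketch follows; the function names and the input format, a list of (ref, alt) base pairs, are assumptions for illustration and do not reflect the tooling used in this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """A<->G and C<->T are transitions; every other SNV is a transversion."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snvs) -> float:
    """snvs: iterable of (ref, alt) single-base pairs with ref != alt."""
    snvs = list(snvs)
    ti = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")


# Three transitions and one transversion give a ratio of 3.0.
example = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
print(ti_tv_ratio(example))
```

Computing the ratio separately for each call subset (intersection, caller-specific, dbSNP, novel) yields the kind of comparison reported in the text.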
New de novo mutations (SNVs and CNVs)
Our analysis benefited from the use of newer bioinformatics tools, allowing us to discover 1,560 de novo mutations not previously detected. We tested a small subset of these (n=75) and validated a set of 23 new likely gene disrupting (LGD) mutations and 26 new missense de novo mutations in probands (Table 4.1 and Table S1). Notably, these validated mutations established ASH1L as a newly recurrently truncated gene and added a new LGD mutation to GIGYF1, for a total of three LGD de novo mutations observed. In addition, the new mutations established recurrent hits in seven new genes, including GIGYF2 (see below), ATP1B1 (a gene with strong brain expression), SSPO (a brain-secreted protein involved in axon growth), and JAKMIP1 (previously implicated in ASD and with strong and specific brain expression).

Gene      Mutations (this work)     Previously identified mutations   Status                        Siblings
ASH1L     1 LGD                     1 LGD                             Newly recurrent LGD
GIGYF1    1 LGD                     2 LGD                             Multiple recurrent LGD
GIGYF2    1 Ms                      1 LGD                             Newly recurrent
ATP1B1    1 Ms                      1 LGD                             Newly recurrent
SSPO      1 LGD                     1 Ms                              Newly recurrent w/ new LGD
JAKMIP1   1 Ms (NV: 12607.p1)       1 LGD                             Newly recurrent               1 Ms, 1 LGD
RNF213    1 Ms                      1 LGD                             Newly recurrent               2 Ms
UBR5      1 Ms (NV: 11102.p1)       1 LGD                             Newly recurrent
ZBTB45    1 LGD                     1 Ms                              Newly recurrent w/ new LGD

Table 4.1: Genes with new recurrent de novo mutations. All mutations validated, except those marked NV (no validation attempted).
We used an intersection of both CoNIFER and XHMM to discover small, exonic CNVs and validated these using custom array-CGH and targeted in silico genotyping using Illumina SNP microarray data. Of 52 tested CNVs, we validated 21 new (previously unvalidated or undetected) de novo CNVs; several of these affected genes recurrently hit by de novo SNVs, including DSCAM, CHD2, and TNRC6B.

Investigation of the newly recurrent genes found that three genes from this list (GIGYF1, GIGYF2 and TNRC6B) and three additional genes with single de novo mutations (GRB10, RBM12, and ZNF598) are closely linked with one another in protein-protein interaction data (Figure 4.1; Table 4.2). Gene ontology annotation of the genes in this network suggests involvement of the IGF (Insulin Growth Factor) signaling pathway (GIGYF1, GIGYF2, GRB10; accession GO:0048009), which has been previously implicated in the development of ASD (Bozdagi et al. 2013). Furthermore, GIGYF2 and ZNF598 form part of the m4EHP mRNA binding complex and have widespread translational repression roles, especially in the brain and lungs (Masahiro Morita 2012).
 Figure 4.1: Network of genes with recurrent de novo hits, based on new de novo mutations identified in this study. Red Stars: de novo LGD mutations (Frameshift, Stop-gained, Splice-site); Blue stars: de novo missense mutations; Purple star: CNV deletion (see Figure S1)  
[Figure 4.1 labels. Network nodes: GIGYF1, GRB10, GIGYF2, TNRC6B, ZNF598, RBM12; annotation: part of IGF signalling pathway; legend: de novo missense SNV, de novo CNV deletion, de novo LGD SNV.]
Gene     De novo mutations in probands   ESP6500 rare LGD SNVs allele count^   Brain expression     Part of IGF pathway
GIGYF1   Fs, Fs*, SS                     3                                     +++ (Cerebellar)     Yes
GIGYF2   Stop, Ms*                       0                                                          Yes
GRB10    Ms*                             1                                     +                    Yes
TNRC6B   Fs, Stop, CNV del*              101&                                  +++ (Cerebellar)
ZNF598   Stop                            4                                     ++ (Cerebellar)
RBM12    Ms                              0

Table 4.2: Summary of mutations in IGF-related ASD network.
* Mutations newly identified in this study
^ Must have minimum read depth of 10 + bi-allelic
& 99/101 are a single 5' frameshift variant (AA position 4949/5502)

Transmission disequilibrium of SNVs between probands and siblings
We tested for transmission disequilibrium between probands and siblings in three ways: i) by a Fisher's exact test, ii) by paired Student's t-test or Mann-Whitney U test, and iii) by logistic regression (where the dependent variable was whether the variant was found in a proband or a sibling). We used only variants called by both the FreeBayes and GATK variant callers in order to minimize false positives.

We found no statistically significant overall burden when considering all rare or private protein-altering mutations (LGD + missense) together, even when considering additional hypotheses based on highly brain expressed genes, mutations with high CADD scores, or mutations in genes with de novo mutations in other probands (p > 0.05 in all comparisons). In contrast, we found that private LGD mutations in genes that are intolerant of deleterious mutations (defined as an RVIS score (Petrovski et al. 2013) in the lower 50% of all scores) were significantly enriched overall in probands (OR=1.14, p < 0.0002, Fisher's exact test) and at a family level (p < 0.0001, two-tailed paired t-test; Figure 4.2A). These effects persisted even for all LGD mutations (regardless of frequency) in genes with RVIS scores <50% (OR=1.06, p=0.03, Fisher's exact test; p=0.02, two-tailed paired t-test). Furthermore, the RVIS score was a significant predictor of proband
or sibling inheritance in a logistic regression model built on all LGD mutations (p=0.028, OR=1.01 per RVIS percentage point). As suggested by this model, the burden of private LGD mutations continues to increase in genes with progressively lower RVIS scores (Figure 4.2B). At the extreme, the burden between probands and siblings in genes with the lowest 1% of all RVIS scores reaches an odds ratio of 1.4, although this comparison is not yet statistically significant (due to the small number of mutations present in this bin).
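To make test (i) concrete, the following minimal Python sketch computes an odds ratio and a two-sided Fisher's exact p-value for a 2x2 table using only the standard library. The table layout (probands versus siblings, mutation carriers versus non-carriers) is an assumption for illustration; the exact contingency table used in this analysis is not specified here, and the example counts are invented.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test: sum the hypergeometric probabilities
    of all tables (with the same margins) no more likely than the observed."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Invented counts: rows = probands, siblings; columns = carriers, non-carriers.
table = (3, 1, 1, 3)
print(odds_ratio(*table), fisher_exact_two_sided(*table))
```

The small tolerance factor guards against floating-point ties when comparing table probabilities.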
 Figure 4.2 Transmission disequilibrium of SNVs in ASD A) Private LGD (red bars) inherited SNVs in genes which are not tolerant to functional variation were significantly enriched in probands. The analysis examines only SNVs in genes with an RVIS score in the lower 50%. Non-private rare variants, or missense (gray bars) inherited SNVs are not enriched in probands. B) The RVIS score is a critical determinant for enrichment in probands: Burden was highest (reaching OR=1.4) for private inherited LGD SNVs amongst genes with the lowest RVIS scores.   We looked for a relationship between the set of private LGD mutations in RVIS-restricted genes and the phenotype of probands in the SSC (Figure 4.3). First, we examined how the overall clinical diagnosis impacted burden: for the 1,575 probands 
with a diagnosis of "autism" or "pervasive developmental disorder", the odds ratios were 1.14 and 1.18 (p=0.001 and 0.05), respectively; in contrast, probands with a diagnosis of "Asperger's" (n=205) had a lower odds ratio of 1.05 (p > 0.7; Figure 4.3A). Consistent with this, we found that probands with full-scale IQ lower than 70 had an odds ratio of 1.18 (p = 0.014, n=530), whereas those with IQ above 100 had a lower, non-significant odds ratio of 1.06 (n=454; Figure 4.3B). In contrast to IQ, there was no difference in transmission disequilibrium between quads in which probands and siblings had highly differential Social Responsiveness Scale scores ("discordant SRS" quads) and those with less extreme scores (data not shown).
Transmission disequilibrium of CNVs between probands and siblings
There were 2,891 total autosomal CNVs detected in this study, with child-specific event counts of 854 in probands and 743 in siblings (ratio=1.25, p=0.006, binomial two-sided test) and with 47.4% of probands and 44% of siblings having a CNV. Overall, proband CNVs (median=40.6 kbp) were slightly larger than sibling events (median=38.4 kbp), but the difference was not statistically significant (p = 0.09, Wilcoxon). The overall ratio of duplications to deletions was 1.6, consistent with previous results for a smaller SSC dataset (Krumm et al. 2013). Lastly, the number of CNVs >500 kbp identified in probands (n=85, median size=1,211 kbp) was 2.3-fold higher than in siblings (n=37, median size=889 kbp).
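The two-sided binomial test applied to the child-specific CNV counts asks whether 854 proband events out of 854 + 743 total events is compatible with an equal split. A minimal Python sketch of an exact test under the null hypothesis p = 0.5 follows; the function name is hypothetical, and the sketch only illustrates the type of test named in the text rather than the software actually used.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes in n trials under p = 0.5.
    The null distribution is symmetric, so the two-sided p-value is twice
    the tail probability of the more extreme side."""
    k = max(k, n - k)
    tail = sum(comb(n, i) for i in range(k, n + 1))
    # Integer arithmetic avoids underflow; the final division yields a float.
    return min(1.0, 2 * tail / 2 ** n)


# Counts from the text: 854 child-specific CNVs in probands, 743 in siblings.
p = binomial_two_sided_p(854, 854 + 743)
print(f"p = {p:.4f}")
```

With these counts the result should be close to the reported p = 0.006.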
Phenotypic measures and autism CNVs
A previous publication identified a significant CNV burden in proband-sibling pairs that are discordant for their Social Responsiveness Scale (SRS) scores (where discordant is defined as a proband with SRS > 75 and an unaffected sibling with a score < 50) but not in those that are concordant (Krumm et al. 2013). Here, we confirm these results: we find that probands in discordant pairs have a significantly higher burden of CNVs (OR=1.16, p=0.008; Figure 4.3C). In contrast, probands were not enriched for transmitted CNVs when their SRS scores were concordant or unremarkable in comparison to their siblings (OR=1.02, p > 0.1; Figure 4.3C). When examining IQ, we find that probands with low IQ (FSIQ < 70) are enriched for inherited CNVs in comparison to their siblings (OR=1.16, p=0.04; Figure 4.3D), but that probands with higher IQs (>70) are not enriched. This is in agreement with our previous report, where we found that probands with low IQ (either
<70 or <85) were enriched for inherited CNVs versus their siblings, while those with higher IQ (>85) did not have a higher burden.  
Figure 4.3. Transmitted mutations and their effect on phenotype. Clockwise from top left: (a) Private inherited LGD SNVs enriched in probands with Autism and Pervasive Developmental Disorder (PDD) diagnoses, but not Asperger's Syndrome (AS). (b) Private inherited LGD SNVs primarily enriched in cases with lower IQ than average (<100). (c) We observe transmission disequilibrium of rare inherited CNVs in SRS Discordant families (Proband SRS score > 75, Sibling < 50), but not in families where the SRS score is mild or more balanced between proband and sibling. (d) Rare inherited CNVs are enriched in probands (versus their siblings) with IQ lower than 70, but the effect is not significant in probands with IQ > 70. All tests and reported p-values are paired t-tests based on proband-sibling pairs.
[Figure 4.3 panel labels. Probands (n=1,786) versus siblings (n=1,786). Inherited SNVs (y-axis: private LGD SNVs): by clinical impression, Autism OR=1.15, p = 0.001; PDD OR=1.18, p = 0.05; AS OR=1.04, NS; by full-scale IQ, <70 OR=1.18, p = 0.014; 70-100 OR=1.18, p = 0.002; 100+ OR=1.06, NS. Inherited CNVs (y-axis: rare transmitted CNVs): SRS discordant (proband SRS > 75 and sibling SRS < 50) OR=1.16, p = 0.008; SRS concordant OR=1.02, NS; IQ < 70 OR=1.16, p = 0.04; IQ > 70 OR=1.07, NS.]
Integration of mutational spectrum suggests new ASD candidate genes
We jointly examined SNVs and CNVs at a gene level in order to suggest new ASD candidate genes (Table 4.3). Events were tabulated based on mutation type (SNV/CNV) and inheritance class as presented throughout this manuscript. In particular, we counted all de novo CNVs and LGD or missense SNV events, private LGD-inherited SNVs in genes with an RVIS score < 50%, and rare inherited CNVs in which at least one gene had an RVIS score <50%. From these values, we calculated p-values for de novo SNVs (as in O'Roak et al. 2012a) and inherited SNVs and CNVs (using a binomial test). Genes were ranked based on the Fisher's combined p-value heuristic. Finally, in order to remove common "false-positive" genes, we restricted our analysis to genes with low RVIS scores (<10%) or those with no events in siblings.

Gene    Mutations                                                      RVIS     Notes/Function
RIMS1   2 de novo LGD                                                  3.3%     Strong and specific brain expression;
        2 private inherited LGD                                                 previous candidate in ASD studies
        6 rare inherited LGD
        3 rare inherited LGD in siblings (2 shared with probands)
CUL7    2 de novo missense SNV                                         3.4%     Neuronal dendrite patterning function
        2 private inherited LGD SNVs
CSMD1   3 de novo missense SNV                                         <0.5%    Strong and specific brain expression;
        4 private inherited LGD SNVs                                            previous assoc. w/ SCZ
        5 inherited CNVs (4 focal to CSMD1)

Table 4.3: Converging evidence for RIMS1, CUL7 and CSMD1 from de novo and inherited mutations

The combined gene-level table identifies several new candidate genes. In particular, the three highest ranked genes (RIMS1, CUL7, and CSMD1) each display brain-specific expression patterns or have identified neural functions. The highest ranked gene, RIMS1, has two de novo LGD mutations and two private LGD-inherited mutations in probands. Additionally, there were six LGD non-private inherited mutations in probands (two of which are shared with siblings). RIMS1 has been previously suggested as an ASD candidate by Iossifov and colleagues (Iossifov et al. 2012), and has been also observed by
Matthew State and colleagues (personal communication); it displays brain-specific expression, and disruption of the gene in mice leads to increased post-synaptic density and impaired learning.

CUL7 has two de novo and two LGD-inherited mutations in probands (none in siblings); functionally, it is an E3 ligase with high cerebellar brain expression and a selective role in neural dendrite patterning and growth (Litterman et al. 2011; CUL7 is also the causative gene in 3M syndrome [OMIM:273750], which curiously is not associated with abnormal mental development).

Finally, CSMD1 appears as a strong ASD risk factor candidate. Among the 1,386 quad families, we found multiple types of mutations and events in CSMD1: three de novo missense mutations, one shared inherited LGD SNV, and four rare inherited focal CNVs (one shared with siblings). Overall, there are eight events in probands and two in siblings. In addition, there were three additional LGD-inherited SNVs in probands within the trios. CSMD1 has the fourth-lowest RVIS score (0.02 percentile) of all genes, suggesting it is highly intolerant to functional mutation; this is borne out by examination of mutations in the ESP6500, where CSMD1 has only six LGD mutations. Comparison of the CSMD1-focal inherited CNVs seen in probands with the events seen in the Database of Genomic Variants (DGV) suggests that they are private to the ASD families (i.e., not observed in DGV). Furthermore, the ASD-specific events occur at the exon-dense 5'-end of CSMD1, a region nearly devoid of exonic CNVs in the DGV (Figure S#). Functionally, CSMD1 exhibits strong and specific brain expression; it functions within the complement control pathway, which has been implicated in synaptic pruning. CSMD1 in particular has been associated with schizophrenia (Håvik et al. 2011), and damaging variants of the gene segregated in two ASD families with distantly related probands (Cukier et al. 2014).
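The gene ranking earlier in this section relies on Fisher's method for combining p-values: for k independent p-values, the statistic X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. Because the degrees of freedom are even, the tail probability has a closed form and needs no statistics library. The sketch below is a minimal Python illustration; the gene labels and per-gene p-values are placeholders, not results from this study.

```python
from math import exp, log


def fisher_combined_p(p_values) -> float:
    """Combine independent p-values (each in (0, 1]) with Fisher's method.
    For 2k degrees of freedom, P(chi2 >= x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!"""
    p_values = list(p_values)
    half = -sum(log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, len(p_values)):
        term *= half / i
        total += term
    return min(1.0, exp(-half) * total)


# Placeholder per-gene p-values: (de novo SNV, inherited SNV, inherited CNV).
genes = {
    "GENE_A": (0.01, 0.20, 0.05),
    "GENE_B": (0.30, 0.40, 0.50),
}
ranking = sorted(genes, key=lambda g: fisher_combined_p(genes[g]))
print(ranking)
```

Fisher's method assumes the combined tests are independent, which is why the text describes its use here as a ranking heuristic.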
 Figure 4.4: Convergence of de novo and inherited mutations on CSMD1. From top: (a) RefSeq gene model of CSMD1 (RVIS score < 1%), (b) three de novo missense mutations in probands, (c) four inherited LGD SNVs in probands, (d) five inherited CNVs, four are focal to CSMD1 alone, (e) of all mutations, only two are shared with siblings (and none are specific to siblings), (f) Expression profile of CSMD1 shows strong brain tissue expression (data from GTEx consortium).   
Integration of ASD risk across SNVs and CNVs
We quantified the risk for ASD of de novo and inherited CNVs and SNVs by using a conditional logistic regression model (methods; Figure 4.5 and Table 4.4). In this model, the binary outcome of ASD proband or unaffected sibling is predicted by four independent counts: 1) the number of de novo CNVs, 2) the number of LGD de novo SNVs, 3) the set of rare inherited CNVs and 4) the set of private LGD-inherited SNVs in genes in the lower 50% of RVIS scores. Additionally, we accounted for familial stratification effects by adding a family-level stratum to the model. Using data from the 1,786 quads, we found robust effects for de novo events: each de novo CNV increased the risk for ASD by 2.05-fold, while each de novo SNV increased risk by 1.72-fold (p = 0.0004 and p < 1 x 10^-7, respectively; Table 4.4). In addition, the results from
this analysis confirm a statistically independent role for inherited mutations in ASD risk: rare inherited CNVs contribute an increase in risk of 1.23 (p = 0.01), and private LGD SNVs have an odds ratio of 1.11 (p=0.0002). These results suggest that each of the four domains of mutations modeled contributes additively to the risk of ASD, and that they may do so in a statistically independent manner.

We examined the "differential" for each category of mutation in the model, calculated as the difference between the percentage of probands and the percentage of siblings with at least one mutation of that type, in order to estimate the proportion of ASD which might be attributable to each type of mutation individually. The strongest differential was seen for de novo SNVs (6.6% differential), suggesting this class of mutations accounts for a large portion of ASD heritability. Although inherited LGD SNVs had a low differential in the model (0.1%), we note that the RVIS score of the mutations plays a crucial role: when examining only inherited LGD SNVs with an RVIS score of 10 or lower, the differential jumps to 2.7% (probands=50.5%, siblings=47.9%). Furthermore, even stronger differentials were observed when examining SRS discordant quads only (3.7%), while SRS concordant quads had only a 1.6% differential in inherited LGD SNVs (with RVIS < 10).
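For intuition about the family-stratified model, note that with exactly one proband and one sibling per stratum, the conditional likelihood contribution of a family reduces to a logistic function of the within-family difference in mutation counts, with no intercept. The following minimal Python sketch fits that simplified case for a single covariate by Newton-Raphson. It is an illustration under stated assumptions, not the four-covariate model fitted in this study, and the toy data are invented.

```python
from math import exp


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + exp(-z))


def fit_paired_conditional_logit(diffs, iterations: int = 25) -> float:
    """diffs: per-family (proband count - sibling count) for one covariate.
    Maximizes sum(log sigmoid(beta * d)) and returns beta, the log odds
    ratio per additional event."""
    beta = 0.0
    for _ in range(iterations):
        grad = sum(d * (1.0 - sigmoid(beta * d)) for d in diffs)
        hess = -sum(d * d * sigmoid(beta * d) * (1.0 - sigmoid(beta * d)) for d in diffs)
        if hess == 0.0:  # no informative (non-zero difference) families
            break
        beta -= grad / hess
    return beta


# Invented data: 30 families where only the proband carries an event,
# 20 where only the sibling does, and 50 uninformative families.
diffs = [1] * 30 + [-1] * 20 + [0] * 50
odds_ratio = exp(fit_paired_conditional_logit(diffs))
print(round(odds_ratio, 3))
```

Families with identical counts in both children contribute nothing to the estimate, which is how the stratified design controls for shared familial background.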
 Figure 4.5 Combined risk model for SNVs and CNVs, inherited and de novo Integrative risk model for ASD, based on de novo and inherited events, and covering both SNVs and CNVs. The model used is a stratified logistic regression model, which uses proband-sibling pairs to estimate the odds ratio (i.e., risk of ASD) for each type of event.  
[Figure 4.5 labels. Pedigree: father, mother, proband, sibling. Odds ratios and p-values: de novo SNV (disruptive) 1.72, p < 1 x 10^-5; de novo CNV 2.05, p = 0.0004; inherited CNVs (rare) 1.23, p = 0.01; inherited SNVs (rare) 1.11, p = 0.0002.]
P ( ASD ) ~  B0 + B1(de novo SNVs) + B2(de novo CNVs) +
                    B3(Inherited CNVs) + B4(Inherited SNVs) +
                    Strata(family)
                 % Probands   % Siblings   % Differential   P-value   Odds Ratio
Inherited CNVs   26.1         23.7         2.4%             0.01045   1.23
Inherited SNVs   92.1         91.2         0.1%             0.00024   1.11
De novo CNVs     3.8          1.8          2.0%             0.00039   2.05
De novo SNVs     15.3         8.7          6.6%             0.00000   1.72

Table 4.4: Summary of logistic regression model results

Our statistical risk model did not uncover any statistically significant interactions between the main effects, reflecting the relative rarity of each effect type in each individual. In addition, we found no evidence in the data for the presence of non-linear, exponential risk based on the summed number of mutations (methods).

4.5 Discussion
We present the complete ascertainment of inherited and de novo mutations in 2,388 families from the Simons Simplex Collection (SSC) of autism. Using a complete "ground-up" reanalysis of the data, and multiple variant discovery tools, we developed a resource of raw data and genetic variants for use throughout the community. Together with the extensive phenotype information present in the SSC, we believe that this resource will enable new and innovative research on the genetic basis and impact of autism.

In the present analysis, we have explored the effect of rare inherited variation on the risk of autism. Our results extend previous work in understanding the role of rare inherited CNVs in ASD (Krumm et al. 2014; Poultney et al. 2013; Pinto et al. 2010) by providing crucial evidence that rare inherited SNVs are also a risk factor for simplex ASD. In particular, we find that private inherited SNVs which likely disrupt the protein product are enriched in probands and, crucially, that those SNVs which disrupt genes intolerant to functional variation in control populations (measured by the RVIS score in the ESP6500 data set) are most enriched in ASD. Disruptive mutations in these genes face
strong selective pressure, suggesting that they may have significant phenotypic consequences.

We have also used the set of 1,786 quads in this analysis to confirm the results of previous work (Krumm et al. 2013), which examined the effect of inherited CNVs in simplex ASD cases on their phenotypes. In particular, we confirm that the SRS score, and especially the differential in scores between probands and their siblings (defined as discordant or concordant SRS scores), is an important discriminant in CNV burden. Furthermore, we also confirm that probands with lower IQ scores are enriched for inherited CNVs, but those with higher IQ are not enriched in comparison to their siblings. These results suggest that more severely impacted ASD cases (as measured by SRS or IQ) are also enriched for inherited CNVs.

Exome sequence data provide the basis for detailed, gene-level examination of variants. In this study, we leverage exome sequence data to discover SNVs and CNVs, and use the convergence of inherited and de novo events to identify new ASD risk factors. We hypothesize that rare inherited mutations can highlight genes in one of two ways. First, they can narrow the focus onto those genes with identified de novo mutations but no or few recurrent de novo mutations. In this study, we identify several genes, such as RIMS1 and CSMD1, for which a combination of inherited and de novo mutations of both SNVs and CNVs paints a strong picture of ASD risk.

In a second approach, we find that examination of multiple mutations within each proband reveals a "multiple hit" model for ASD. Critically, this study is the first to examine the complete genetic picture at an individual level in the context of autism. In particular, we found an inherited two-exon intragenic deletion of NRXN3 and a de novo missense mutation of NLGN2 in 13367.p1. Both of these genes have been identified as ASD risk factors, but crucially, they are also protein-protein interacting partners.
The neuroligin-neurexin interaction has long been hypothesized to be a key underlying pathway in ASD pathology (for review see Abrahams and Geschwind 2008), but to our knowledge this is the first identification of a case with mutations in both binding partners. 
Finally, our ground-up reanalysis resulted in the identification of several new de novo mutations in genes, and added to the extensive existing work on de novo mutations in the SSC. The additional 49 validated de novo mutations discovered in the present analysis added several new genes to the "recurrently hit" list of genes (Table S2). Several of these "newly recurrent" genes form a network of protein-protein interactions (Figure 4.1), suggesting an underlying neurobiological pathway for ASD. Interestingly, three of these genes (GIGYF1, GIGYF2, and GRB10) all participate in IGF-pathway signaling, dysregulation of which has been previously suggested as an underlying neurobiological cause of ASD (Chen et al. 2014).
V. Summary and Future Directions

5.1 Summary of results
This thesis describes the development of a method (CoNIFER) to find a new class of genetic variation (small CNVs) and its use in more comprehensively assaying genetic variation in simplex autism families. Using exome sequence data from nearly 1,800 quads and nearly 600 trios in conjunction with CoNIFER and newer bioinformatics methods, I discovered new de novo variants and, for the first time, uniformly assayed both inherited CNVs and SNVs. Integration of the spectrum of genetic variants yielded new insight into their relative contribution to simplex ASD risk and highlighted how convergence of multiple mutation types can identify new ASD risk genes. Although these results are in strong agreement with the highly heritable nature of ASD established through twin and sibling studies, and exome sequencing has identified over a dozen ASD-related genes, clear and specific genotype-phenotype correlations have not yet been established. Here, I summarize and highlight some of the conclusions from this thesis and contextualize them in light of remaining challenges and questions.

5.2 Towards assaying the complete set of genetic variation
The importance of de novo and ultra-rare transmitted variants in the etiology of ASD makes the sensitivity and specificity of variant detection and discovery algorithms a critical factor in our understanding of ASD genetics. In this thesis, I took two approaches to increasing sensitivity and/or specificity of the variants discovered.

In chapter one, I describe CoNIFER, a new algorithm that extends the use of exome sequence data and leverages it to find genic CNVs. In particular, the targeted nature of the exome sequence data to the exons provides a powerful way to assay very small CNVs that disrupt genes. These CNVs are often too small to be detected using standard genome-wide microarrays, suggesting they have not been previously accounted for in our understanding of ASD.
In chapter two, I found that CoNIFER was able to identify over 40% of novel, validated gene-containing CNVs, which previously had been missed by 
high-density Illumina 1M/1M Duo SNP microarrays (with 1.1 and 1.3 million probes, respectively). Many of these CNVs disrupted individual genes, which were enriched for brain expression or previous associations with neurodevelopmental disorders. In addition, the high sensitivity afforded by CoNIFER sharpened the contrast between probands and unaffected siblings, and I was able to demonstrate a significantly increased burden of CNVs in the affected probands. This differential burden was especially strong for proband-sibling pairs who were strongly discordant for social phenotype (via the SRS score). Finally, the results suggested that inherited CNVs in probands were preferentially maternally inherited.

In chapter three, I use multiple tools (i.e., FreeBayes and GATK, CoNIFER and XHMM) to establish both a highly specific set of variants (based on the intersection of each pair) and a highly sensitive set of variants (based on the union). The use of multiple algorithms resulted in an increased yield of loss-of-function de novo variants, added two new genes (ASH1L and IRF2BPL) to the list of recurrently hit genes with multiple observed de novo likely gene disrupting (LGD) mutations, and implicated a new functional network of IGF-related proteins. Furthermore, the highly specific intersection set of these tools provided the basis for comparing rates and burden between probands and unaffected siblings.

However, exome sequencing is affected by amplification, sequencing and enrichment biases, which are especially dependent on the %GC content of targets. These biases reduce coverage or prevent sequencing altogether of regions with high %GC nucleotide content. As can be seen in Figure 5.1, GC bias can prevent adequate coverage of entire domains and exons in genes, in turn preventing variant discovery. Importantly, these regions harbor previously identified autism and neurodevelopmental genes, such as SHANK3.
De novo mutations of SHANK3 were expected to account for up to 1% of autism cases, yet no de novo mutations have been observed so far using exome sequencing (in contrast, several mutations in SHANK2 have been observed). These results suggest that a large fraction of ASD-related mutations and genes are not assayed using current data. Several methods are available to assay the GC-rich portion of the exome, including molecular inversion probe (MIP)-based resequencing and whole-genome sequencing, although both are still prone to some bias. Future studies will need to not 
only critically evaluate which portion of the genome/exome is missed but will need to systematically overcome this bias.  
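
The GC dependence described above can be screened for directly. The following is a minimal sketch, not part of the pipeline used in this thesis: it computes the GC fraction of each capture target and flags targets whose mean depth falls below a minimum. The function names, the toy sequences and depths, and both thresholds (`min_depth=8`, `gc_cutoff=0.65`) are illustrative assumptions.

```python
def gc_fraction(seq):
    """Return the fraction of G/C bases among unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    gc = sum(seq.count(b) for b in "GC")
    at = sum(seq.count(b) for b in "AT")
    total = gc + at
    return gc / total if total else 0.0


def flag_low_coverage_targets(targets, min_depth=8, gc_cutoff=0.65):
    """Yield (name, gc, depth, gc_rich) for targets with depth below min_depth.

    `targets` maps a target name to (sequence, mean_depth). Both thresholds
    are hypothetical and would need tuning to the capture kit and caller.
    """
    for name, (seq, depth) in targets.items():
        if depth < min_depth:
            gc = gc_fraction(seq)
            yield name, gc, depth, gc >= gc_cutoff


# Toy input: sequences and depths are made up for illustration only.
example_targets = {
    "exon_A": ("GCGGCCGCGGGCCCGCGGCG", 2.0),   # GC-rich, poorly covered
    "exon_B": ("ATGCATTAGCTAATCGTATA", 45.0),  # balanced, well covered
    "exon_C": ("ATATTTAAATATTTATAAAT", 5.0),   # AT-rich, poorly covered
}

flagged = list(flag_low_coverage_targets(example_targets))
for name, gc, depth, gc_rich in flagged:
    print(f"{name}\tGC={gc:.2f}\tdepth={depth:.1f}\tGC-rich={gc_rich}")
```

Separating "low coverage" from "low coverage and GC-rich" in this way gives a rough indication of how much of the missed exome is plausibly attributable to GC bias rather than to other capture failures.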
Figure 5.1: High genomic GC nucleotide content (green histogram) hinders whole-exome sequencing for some genes, such as SHANK3 (right). Individual coding exons are shown in blue with non-genic sequences removed. Black lines indicate mean sequence depth for a single ASD trio and dark gray intervals indicate maximum and minimum depth across the family. The red dashed line indicates the minimum threshold required for accurate variant detection.

Whole-genome sequencing (WGS), in particular, is likely to provide the underlying basis for a "complete" view of genetic variation, even if results are initially "masked" to the genic portion of the genome. In addition to being much less prone to GC bias, the uniform coverage of WGS across all portions of the genome enables algorithms that capitalize on paired-end sequence information. Such algorithms are well suited for identifying fine or complex structural variation, such as very small CNVs (affecting only one or two exons), which still elude exome-based methods (as a single exon or "probe" is nearly always insufficient to identify a CNV).

Moreover, WGS addresses two important remaining spaces of the human genome that have not been fully assayed to date: first, non-genic space and regulatory elements, and second, genes or regions that are part of segmentally duplicated portions. Both are described more fully below.

Mutations in non-genic regulatory elements of the genome are potentially a pervasive component of ASD genetic etiology, especially in the context of a susceptible genetic
background. Such regulatory backgrounds may in fact be relatively more common than the mutations seen in exons and genes. There is little present understanding of the expected penetrance of such regulatory mutations, and it is possible that the penetrance of ASD-implicated regulatory mutations mirrors that of the genes they regulate; that is, regulatory mutations of highly penetrant genes such as CHD8 may be equally penetrant (e.g., as in mutations affecting CFTR expression; see Rowntree and Harris 2003), while other mutations may be of significantly lower penetrance (Walsh et al. 2008). Crucially, however, these mutations and backgrounds are still expected to be very rare and to fall below the frequency threshold of genome-wide association studies.

Another possibility is that regulatory mutations include a subset of highly penetrant de novo mutations with properties similar to those of loss-of-function mutations seen in genes. Yet, given the difficulty in identifying important regulatory elements in the genome, our limited understanding of their effects, and the enormous genomic space within which these mutations can fall, such mutations may be extraordinarily difficult to identify in a de novo paradigm. Instead, such mutations may be more readily identifiable using large families with recurrent autism and no strongly identifiable genic mutations.

Finally, studies of mutations in ASD have systematically ignored or filtered genes in segmental duplications, due in part to the ambiguity and difficulty involved in accurately mapping sequence data to incomplete or truncated haplotypes. However, segmental duplications may be very important to a complete understanding of ASD, as they contain over 1,000 genes and are enriched for neurodevelopmental and brain-related functions (Sudmant et al. 2010). These duplications represent the youngest structural changes to the human genome, and many are specific to the Homo lineage.
Furthermore, while segmental duplications are known to be a critical driving factor in non-allelic homologous recombination (NAHR), it is not yet understood how the breakpoints of NAHR events may affect the genic content of segmental duplications. Study of these regions has not been comprehensively possible using either array-based or exome sequence data; however, WGS data and analysis may reveal that these regions can contribute additional genetic risk for ASD. 
5.3 Understanding normal variation and pathogenic variants

The extreme locus and variant heterogeneity of ASD has made statistical association and identification of genes and variants difficult. To combat this heterogeneity, many strategies rely on either reducing the genetic space searched or adding additional abstraction to the genetic data examined. In this thesis, a critical insight was the use of the RVIS score in chapter four. The RVIS score was able to stratify private, LGD-inherited SNVs into two groups: a group of variants that affected genes where functional variation was commonly observed, and a group of variants that truncated genes which were intolerant to functional variation. This broad distinction at the gene level, together with the resulting reduction of the genetic/genomic space (i.e., of the number of genes studied), was critical in establishing private, LGD-inherited SNVs as a risk factor for ASD as well as in suggesting new candidate ASD genes.

However, gene-based scores such as the RVIS score may only be a stepping stone to understanding genetic variants at a variant level. Currently, the agreement between variant pathogenicity scores (such as PolyPhen2, SIFT, MutationTaster, etc.) is low, and these scores poorly stratify risk versus non-risk variants in ASD. Although some methods, such as the CADD score (Kircher et al. 2014), are designed to aggregate multiple variant-level scores and their genomic context, I found that the CADD score did not substantially add to the identification of SNVs with ASD risk (notably, the CADD model does not include the RVIS score).

In order to improve our understanding of ultra-rare variants and their functional impact, additional work in three domains is needed. First, while the NHLBI's ESP 6500 project has provided an excellent base for understanding rare variation, cohorts of 50,000+ are likely needed to understand the spectrum of ultra-rare variation in human populations.
In addition, trio- or family-based studies of control individuals are needed in order to understand the specific pattern of de novo variation.  
Second, a critical next step in understanding functional variation in humans is improving the definition and identification of gene and transcript models. Over 90% of transcripts undergo alternative splicing, and over 60% are tissue-regulated (Wang et al. 2008). Crucially, these gene and transcript models are the first step in the classification of mutations as non-genic, synonymous, nonsynonymous, or likely gene disrupting (LGD). In many cases, multiple transcripts exist for a single locus/gene, resulting in multiple annotations for just one SNV. In this case, many publications (including this one) take either the "most severe" effect as the effect of that SNV, a practice that biases reporting toward more severe effects, or the effect on the "longest transcript", which likely overestimates the number of mutations that fall in coding regions.

I analyzed the frequency of variant annotation disagreement for variants on currently known transcripts using a set of 2,529 de novo missense, stop-gained and frameshift variants from published sources, taking the "most severe" functional annotation for each variant (O'Roak et al. 2012b; Iossifov et al. 2012; Sanders et al. 2012). Of these, 151 (6%) were also annotated as intronic (i.e., not protein-altering) on another transcript, and this fraction was slightly higher for frameshift and nonsense mutations (32/418, 7.7%) than for missense mutations (119/2,111, 5.6%), suggesting a measurable bias due to the selection of the "most severe" effect at each variant.

Long-read technology and RNAseq have the potential to dramatically improve our understanding of transcript models and increase our confidence in functional variant annotation (for reviews of this topic, see Garber et al. 2011 and Mutz et al. 2013). These improvements will not only create a canonical reference set of transcripts for each gene but also have the potential to create tissue- or cell-type-specific transcript sets (Lonsdale et al. 2013).
This information will make it possible to annotate variants to a much smaller subset of transcripts, improving the specificity of the variant effects. Finally, larger transcript-level datasets will also improve our understanding of splice-site variation. Current annotation tools crudely annotate a variant as a "splice-site disrupting variant" based on proximity to exons alone; future models of variant effects and transcripts will undoubtedly improve on this.
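
The effect of the "most severe" convention can be made concrete. The sketch below is a simplified illustration, not the annotation pipeline used in this work: it collapses per-transcript annotations to the most severe one, flags variants that also receive an intronic (non-protein-altering) annotation on another transcript, and recomputes the disagreement fractions from the counts reported above. The severity ranking, the function names and the toy variant are assumptions.

```python
# Hypothetical severity ranking (higher = more severe); real annotation
# tools define their own, more detailed consequence orderings.
SEVERITY = {
    "intronic": 0,
    "synonymous": 1,
    "missense": 2,
    "frameshift": 3,
    "stop_gained": 3,
}


def most_severe(effects):
    """Return the most severe effect among per-transcript annotations."""
    return max(effects, key=lambda e: SEVERITY[e])


def is_discordant(effects):
    """True if the most severe call is protein-altering while another
    transcript annotates the same variant as intronic."""
    protein_altering = SEVERITY[most_severe(effects)] >= SEVERITY["missense"]
    return protein_altering and "intronic" in effects


# Toy variant annotated against three transcripts of one gene.
toy = ["intronic", "missense", "missense"]
print(most_severe(toy), is_discordant(toy))

# Counts taken from the text: (also annotated intronic, total variants).
counts = {
    "frameshift/nonsense": (32, 418),
    "missense": (119, 2111),
}
both = tuple(sum(x) for x in zip(*counts.values()))
counts["all"] = both
for label, (k, n) in counts.items():
    print(f"{label}: {k}/{n} = {100 * k / n:.1f}%")
```

Restricting `effects` to transcripts expressed in a tissue of interest, once such transcript sets exist, would shrink the set over which the maximum is taken and so reduce this source of discordance.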
Finally, the wealth of information available at variant, gene, and gene-network levels continues to be a challenge to the interpretation of genomic variants. Future interpretation schemes will need to integrate information across levels, including, for example, the impact of an SNV on a gene at an amino acid level, the transcripts and expression levels of that gene within specific tissues of interest, and how that gene is connected within functional pathways and networks. Encouragingly, such improvements can both provide a wealth of value for new experiments and dramatically improve the value of existing data, such as the exome data from the SSC.

5.4 Defining subtypes of ASD

Exome sequencing provides a unique opportunity to understand ASD not only from a genetic viewpoint but also from a phenotypic point of view. This "genotype-first" approach, in which "genetic subtypes" of ASD are defined by mutations in genes or pathways, promises to create meaningful distinctions within the ASD spectrum for both researchers and patients (Stessman et al. 2014). By assaying and stratifying patients based on the genetic data, the "search space" for possible associated phenotypes is greatly reduced, and individual patterns (not significant at a study level) become more pronounced. One example of such an ASD "genetic subtype" is defined by CHD8 mutations, which are associated with larger head circumference, gastrointestinal problems and distinctive facial features. Notably, however, mutations in CHD8 are not specifically associated with low IQ, and nearly half of patients with disruptive de novo mutations in CHD8 have normal or above-normal IQ. Follow-up study of CHD8 mutations in zebrafish has confirmed these phenotypic effects (including slower gastrointestinal motility and wider-spaced eye cusps, a proxy for brain or head size; Bernier et al. 2014).

The ability to find genetically and phenotypically similar subgroups within the larger ASD spectrum may also lead to the identification of genes that are "specific" to the social and language deficits of ASD, rather than those that broadly impair IQ or cortical functioning. In chapters three and four, I explored how rare inherited CNVs were enriched in probands, and specifically in those probands with the poorest (i.e., highest)
scores on the Social Responsiveness Scale (SRS), which tracks a child's ability to communicate and interact socially. The families with the greatest spread in SRS score between the affected proband and his/her unaffected sibling also showed the greatest proband enrichment for inherited CNVs. In contrast, I saw no additional enrichment of rare inherited CNVs amongst probands with lower IQs, suggesting that many rare inherited CNVs may be specific to ASD-related phenotypes. Interestingly, when I examined the relationship between private inherited LGD SNVs (in chapter four) and these two phenotypic measures, I found a different relationship: probands with lower IQ were enriched for private LGD SNVs.

The difference in phenotypic effect for inherited CNVs and inherited SNVs may be explained in several ways. First, it is possible that CNV duplications, which raise a gene to three copies as opposed to reducing copy number (as CNV deletion or LGD mutation of a gene does), are more likely to result in social impairment, perhaps because the extra copy would dysregulate pathways rather than abrogate their function. A second possibility is that a subset of private inherited LGD SNVs has a higher phenotypic impact than do CNVs; in this scenario, the observed inherited SNVs would be expected to be very young mutations, which are quickly purged from populations. A final possibility is the reverse: it is in fact CNVs that are more phenotypically impactful (and thus more likely to have severe IQ-reducing effects), and they are purged so quickly from populations that only relatively more benign CNVs remain as transmitted CNVs (in this scenario, the highly impactful CNVs are observed simply as de novo CNVs).

5.5 Defining a gradient of simplex and multiplex autism

The study of simplex autism families, together with a quad-based approach that includes an unaffected sibling, has been instrumental in understanding both the effect of de novo mutations in ASD and the relative risk of inherited genetic mutations for ASD. However, the ascertainment of simplex autism families is complicated by the effects of stoppage (i.e., families are more likely to stop having children when one child is diagnosed with ASD) and by the fact that family sizes are often too small to rule out an inherited phenotype. A better understanding of the impact and phenotypic effects of inherited and de novo mutations will be gained by examination of larger families,
including those with multiple affected offspring. In particular, a community-based or longitudinal study of all ASD diagnoses is a logical next step in understanding the balance of influence from inherited and de novo genetic etiologies in the risk of ASD.

5.6 Understanding complex genetic etiologies at a family level

The exome sequencing and targeted resequencing studies of the past three years have created a wealth of new information about specific, highly penetrant genes, such as CHD8 and SCN2A, and their role and risk in ASD. These studies have established that de novo mutation and rare variants play a dominant role in the genetic etiology of ASD and have identified mutations in highly penetrant genes that are likely causative. In this thesis, I have examined the role of inherited mutations in the context of ASD genetic etiology and identified inherited ASD risk factors and genes. Taken together, however, the gene candidates identified from studies of both de novo and inherited variants can explain only a small fraction of the overall heritable risk for ASD.

Identification of additional specific genes implicated in ASD can be achieved through the study of much larger cohorts (Figure 5.2) in order to establish genome-wide significance. However, these estimates assume a very high and constant penetrance and ASD risk for all de novo mutations in genes. Critically, if bona fide ASD risk genes are not fully penetrant, then even larger cohorts of samples will be required for statistical association and discovery of genes. In fact, the fraction of genes with multiple de novo hits that are not fully penetrant can be estimated by examining how many of these genes also have inherited LGD mutations in siblings.
From the data generated in the reanalysis of the SSC exomes, 40 of the 128 genes (31%) with two or more de novo mutations in probands (and none in siblings) also have at least one LGD SNV mutation or rare CNV inherited in siblings, suggesting that these genes have reduced penetrance for ASD. Thus, this analysis suggests that very large cohorts will be required for the large-scale identification of single ASD risk genes using a de novo sequencing approach.
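
The cohort-size argument can be illustrated with a simple recurrence model. The following is a rough sketch under stated assumptions, not the model behind Figure 5.2: de novo LGD hits in a gene are treated as Poisson with mean 2 × N × μ under the null (N trios, per-gene, per-chromosome LGD mutation rate μ), and a risk gene is modeled by multiplying that mean by a fold enrichment. The values of μ and of the enrichment, and all function names, are hypothetical; only the significance threshold (2.6 × 10⁻⁶, from the caption to Figure 5.2) and the 40 of 128 count come from the text.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam); adequate for the small lam used here."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def min_hits_for_significance(lam_null, alpha):
    """Smallest k such that P(X >= k | null) < alpha."""
    k = 1
    while poisson_sf(k, lam_null) >= alpha:
        k += 1
    return k


def power(n_trios, mu, fold, alpha=2.6e-6):
    """Return (required hits, probability a risk gene reaches significance).

    mu   : hypothetical per-gene, per-chromosome de novo LGD rate
    fold : hypothetical enrichment of de novo LGD hits in cases
    """
    lam_null = 2 * n_trios * mu
    k = min_hits_for_significance(lam_null, alpha)
    return k, poisson_sf(k, lam_null * fold)


# 0.05 / 2.6e-6 is roughly 19,000, so the caption's threshold is consistent
# with a Bonferroni correction over about that many genes (an inference,
# not a number stated in the text).
print(f"implied number of tests: {0.05 / 2.6e-6:,.0f}")

# Fraction of recurrently hit genes that also carry inherited events in siblings.
print(f"40/128 = {40 / 128:.1%}")

for n in (2000, 5000, 10000):
    k, pw = power(n, mu=2e-6, fold=50)
    print(f"N={n:>6}: need >= {k} hits, power = {pw:.2f}")
```

Lowering `fold`, which is one crude way to represent reduced penetrance, lowers the power at every cohort size; this is the qualitative point made above about incompletely penetrant genes requiring larger cohorts.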
Figure 5.2: Expected hit rate (or sensitivity) of true positive genes discovered using trio sequencing studies (under a family-wise error rate of 5%; that is, each gene passes exome-wide significance of 2.6 × 10⁻⁶). We estimate the power of trio sequencing to detect statistically significant associations for disease-associated genes, under the assumption that 10% or 20% of singleton mutations could be fully penetrant.

A critical component of identifying additional ASD genes and the underlying neurobiological pathways will be comprehensive analysis of mutations at an individual level. Such an approach will leverage the existing genotype data (including "one-off" de novo mutations in genes as well as inherited CNVs and SNVs) along with pathway and protein network metadata about the genes and pathways affected. Identification of interactions between genes with inherited and de novo mutations will be a powerful method to implicate new candidate genes and may be able to increase the fraction of cases for which a plausible genetic etiology can be identified.

As an example of such individual-level pathway identification, in chapter four I identified an ASD proband with a de novo missense mutation in NLGN2, in addition to an inherited 2-exon intragenic deletion of NRXN3. These two genes are part of the neuroligin-neurexin interaction pathways, which directly mediate the trans-synaptic interface of neurons. While both genes have been implicated in ASD, this specific case illustrates how an incompletely penetrant inherited mutation and a de novo missense mutation can act in a
"synergistically" pathogenic manner. Using this framework, in conjunction with additional information about cellular- and tissue-level co-expression, will undoubtedly reveal additional ASD risk gene combinations and pathways.

5.7 Future directions

New sequencing technology and the establishment of large, well-phenotyped family-based cohorts, such as the SSC, have enabled the systematic discovery of mutations that underlie the genetic etiology of ASD and ID. The fraction of explained genetic etiology is a measurable indicator of progress. In 2005, ~10% of the genetic etiology of autism was understood. Within seven years, advances in genomics technology facilitated the rapid discovery of de novo SNVs and CNVs, leading to the discovery of disruptive genetic variants that may account for another ~25% of cases. Although the extent of locus heterogeneity in ASD and ID was initially underestimated, the development of exome sequencing and low-cost/high-throughput MIP-based resequencing has strongly implicated two dozen novel genes accounting for >3% of disease.

Many of these genes may in fact define distinct clinical "subtypes" of ASD upon detailed examination of patients with a common genetic etiology, consistent with the hypothesis that autism is an umbrella term covering many different and distinct "autisms". This is reminiscent of the work with CNVs, where the identification of recurrent mutations and patient follow-up led to the identification of novel syndromes and subtypes from idiopathic cases of disease (Sharp et al. 2006). There is already compelling evidence for this based on an assessment of multiple patients with ADNP (Helsmoortel et al. 2014), DYRK1A (Courcet et al. 2012) and CHD8 mutations (Bernier et al. 2014); the latter two appear to define microcephalic and macrocephalic subtypes, respectively. Alternatively, the "genotype-first"
approach may also reveal phenotypic variability of genic mutations across a diverse array of neuropsychiatric and neurodevelopmental disorders. Similar to the CNVs of 16p11.2 and 15q13.3, which are associated with several disorders, there is evidence for this already for mutations associated with SETBP1 (Hoischen et al. 2010) and SCN2A, resulting in very different outcomes. Establishment of cohorts with different 
types of mutations and careful study of their phenotypes and comorbidities may reveal specific protein domains and mutation types associated with different diseases.

The knowledge of specific genes, loci, and pathways now spurs the development of functional experiments. These include using novel methods with induced pluripotent stem cells to assay specific mutations in a patient with Timothy syndrome (Yazawa et al. 2011; Paşca et al. 2011), as well as established model systems, such as mouse and zebrafish models, to explore the roles of CHD8 (Bernier et al. 2014), DYRK1A (Ahn et al. 2006) and PTEN (Backman et al. 2001) in brain volume.

Improving knowledge of ASD genetic and neurobiological etiologies will aid in the diagnosis of ASD/ID subtypes, allowing for specific recruitment for clinical trials and the development of targeted therapeutics for each subtype. This model is akin to the heterogeneity seen in other broad categories of human disorders and disease and has proven to be successful in many cases (e.g., specific therapeutics for a particular mutation in cystic fibrosis or specific forms of cancer). Integrating the genetics, neurology, and pathophysiology of these disorders holds considerable promise not only for our understanding of the biology of the human brain but also for potential treatments.
REFERENCES

Abrahams BS, Geschwind DH. 2008. Advances in autism genetics: on the threshold of a new neurobiology. Nature Reviews Neuroscience 9: 341–355.
Ahn K-J, Jeong HK, Choi H-S, Ryoo S-R, Kim YJ, Goo J-S, Choi S-Y, Han J-S, Ha I, Song W-J. 2006. DYRK1A BAC transgenic mice show altered synaptic plasticity with learning and memory defects. Neurobiol Dis 22: 463–472.
Alkan C, Kidd JM, Marques-Bonet T, Aksay G, Antonacci F, Hormozdiari F, Kitzman JO, Baker C, Malig M, Mutlu O, et al. 2009. Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet 41: 1061–1067.
Amir RE, Van den Veyver IB, Wan M, Tran CQ, Francke U, Zoghbi HY. 1999. Rett syndrome is caused by mutations in X-linked MECP2, encoding methyl-CpG-binding protein 2. Nat Genet 23: 185–188.
Backman SA, Stambolic V, Suzuki A, Haight J, Elia A, Pretorius J, Tsao MS, Shannon P, Bolon B, Ivy GO, et al. 2001. Deletion of Pten in mouse brain causes seizures, ataxia and defects in soma size resembling Lhermitte-Duclos disease. Nat Genet 29: 396–403.
Bailey A, Le Couteur A, Gottesman I, Bolton P, Simonoff E, Yuzda E, Rutter M. 1995. Autism as a strongly genetic disorder: evidence from a British twin study. Psychol Med 25: 63–77.
Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE. 2002. Recent segmental duplications in the human genome. Science 297: 1003–1007.
Bajpai R, Chen DA, Rada-Iglesias A, Zhang J, Xiong Y, Helms J, Chang C-P, Zhao Y, Swigut T, Wysocka J. 2010. CHD7 cooperates with PBAF to control multipotent neural crest formation. Nature 463: 958–962.
Bamshad MJ, Ng SB, Bigham AW, Tabor HK, Emond MJ, Nickerson DA, Shendure J. 2011. Exome sequencing as a tool for Mendelian disease gene discovery. Nature Reviews Genetics 12: 745–755. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=21946919&retmode=ref&cmd=prlinks.
Barton A, Fendrik AJ. 2013. Sustained vs. oscillating expressions of Ngn2, Dll1 and Hes1: A model of neural differentiation of embryonic telencephalon. Journal of Theoretical Biology 328: 1–8.
Batsukh T, Pieper L, Koszucka AM, Velsen von N, Hoyer-Fender S, Elbracht M, Bergman JEH, Hoefsloot LH, Pauli S. 2010. CHD8 interacts with CHD7, a protein which is mutated in CHARGE syndrome. Human Molecular Genetics 19: 2858–2866.
Bedogni F, Hodge RD, Elsen GE, Nelson BR, Daza RAM, Beyer RP, Bammler TK, Rubenstein JLR, Hevner RF. 2010. Tbr1 regulates regional and laminar identity of postmitotic neurons in developing neocortex. Proceedings of the National Academy of Sciences 107: 13129–13134.
Ben-Shachar S, Lanpher B, German JR, Qasaymeh M, Potocki L, Nagamani SCS, Franco LM, Malphrus A, Bottenfield GW, Spence JE, et al. 2009. Microdeletion 15q13.3: a locus with incomplete penetrance for autism, mental retardation, and psychiatric disorders. J Med Genet 46: 382–388.
Bernier R, Golzio C, Xiong B, Stessman HA, Coe BP, Penn O, Witherspoon K, Gerdts J, Baker C, Vulto-van Silfhout AT, et al. 2014. Disruptive CHD8 Mutations Define a Subtype of Autism Early in Development. Cell 158: 263–276.
Berryer MH, Hamdan FF, Klitten LL, Moller RS, Carmant L, Schwartzentruber J, Patry L, Dobrzeniecka S, Rochefort D, Neugnot-Cerioli M, et al. 2012. Mutations in SYNGAP1 Cause Intellectual Disability, Autism, and a Specific Form of Epilepsy by Inducing Haploinsufficiency. Human Mutation 34: 385–394.
Betancur C. 2011. Etiological heterogeneity in autism spectrum disorders: more than 100 genetic and genomic disorders and still counting. Brain Res 1380: 42–77.
Binder DK, Nagelhus EA, Ottersen OP. 2012. Aquaporin-4 and epilepsy eds. C. Steinhäuser and D. Boison. FEBS J 60: 1203–1214.
Bozdagi O, Tavassoli T, Buxbaum JD. 2013. Insulin-like growth factor-1 rescues synaptic and motor deficits in a mouse model of autism and developmental delay. Mol Autism 4: 9.
Cadigan KM. 2008. Wnt/β-Catenin Signaling: Turning the Switch. Developmental Cell 14: 322–323.
Campbell CD, Sampas N, Tsalenko A, Sudmant PH, Kidd JM, Malig M, Vu TH, Vives L, Tsang P, Bruhn L. 2011. Population-Genetic Properties of Differentiated Human Copy-Number Polymorphisms. The American Journal of Human Genetics 88: 317–332.
Chen J, Alberts I, Li X. 2014. Dysregulation of the IGF-I/PI3K/AKT/mTOR signaling pathway in autism spectrum disorders. International Journal of Developmental Neuroscience 35: 35–41.
Chenn A, Walsh CA. 2003. Increased neuronal production, enlarged forebrains and cytoarchitectural distortions in beta-catenin overexpressing transgenic mice. Cereb Cortex 13: 599–606.
Chiang DY, Getz G, Jaffe DB, O'Kelly MJT, Zhao X, Carter SL, Russ C, Nusbaum C, Meyerson M, Lander ES. 2008. High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat Methods 6: 99–103.
Constantino JN, Gruber CP. Social Responsiveness Scale. Western Psychological Services, Los Angeles http://portal.wpspublish.com/portal/page?_pageid=53,70492&_dad=portal&_schema=PORTAL.
Constantino JN, Todorov A, Hilton C, Law P, Zhang Y, Molloy E, Fitzgerald R, Geschwind D. 2013. Autism recurrence in half siblings: strong support for genetic mechanisms of transmission in ASD. Mol Psychiatry 18: 137–138.
Cooper GM, Coe BP, Girirajan S, Rosenfeld JA, Vu TH, Baker C, Williams C, Stalker H, Hamid R, Hannig V, et al. 2011. A copy number variation morbidity map of developmental delay. Nat Genet 43: 838–846.
Courcet J-B, Faivre L, Malzac P, Masurel-Paulet A, Lopez E, Callier P, Lambert L, Lemesle M, Thevenon J, Gigot N, et al. 2012. The DYRK1A gene is a cause of syndromic intellectual disability with severe microcephaly and epilepsy. J Med Genet 49: 731–736.
Cukier HN, Dueker ND, Slifer SH, Lee JM, Whitehead PL, Lalanne E, Leyva N, Konidari I, Gentry RC, Hulme WF, et al. 2014. Exome sequencing of extended families with autism reveals genes shared across neurodevelopmental and neuropsychiatric disorders. Mol Autism 5: 1.
Darnell JC, Van Driesche SJ, Zhang C, Hung KYS, Mele A, Fraser CE, Stone EF, Chen C, Fak JJ, Chi SW, et al. 2011. FMRP stalls ribosomal translocation on mRNAs linked to synaptic function and autism. Cell 146: 247–261.
Davidson J, Goin-Kochel RP, Green-Snyder LA, Hundley RJ, Warren Z, Peters SU. 2012. Expression of the Broad Autism Phenotype in Simplex Autism Families from the Simons Simplex Collection. J Autism Dev Disord.
de Ligt J, Willemsen MH, van Bon BWM, Kleefstra T, Yntema HG, Kroes T, Vulto-van Silfhout AT, Koolen DA, de Vries P, Gilissen C, et al. 2012. 
Diagnostic exome sequencing in persons with severe intellectual disability. N Engl J Med 367: 1921?1929. de Vries BBA, Pfundt R, Leisink M, Koolen DA, Vissers LELM, Janssen IM, Reijmersdal SV, Nillesen WM, Huys EHLPG, Leeuw N de, et al. 2005. Diagnostic genome profiling in mental retardation. Am J Hum Genet 77: 606?616. Durand CM, Betancur C, Boeckers TM, Bockmann J, Chaste P, Fauchereau F, Nygren G, Rastam M, Gillberg IC, Anckars?ter H, et al. 2007. Mutations in the gene encoding 
 97 
the synaptic scaffolding protein SHANK3 are associated with autism spectrum disorders. Nat Genet 39: 25?27. Endele S, Rosenberger G, Geider K, Popp B, Tamer C, Stefanova I, Milh M, Kort?m F, Fritsch A, Pientka FK, et al. 2010. Mutations in GRIN2A and GRIN2B encoding regulatory subunits of NMDA receptors cause variable neurodevelopmental phenotypes. Nat Genet 42: 1021?1026. Fairless R, Masius H, Rohlmann A, Heupel K, Ahmad M, Reissner C, Dresbach T, Missler M. 2008. Polarized Targeting of Neurexins to Synapses Is Regulated by their C-Terminal Sequences. J Neurosci 28: 12969?12981. Fischbach GD, Lord C. 2010. The Simons Simplex Collection: A Resource for Identification of Autism Genetic Risk Factors. Neuron 68: 192?195. Fotaki V, Dierssen M, Alc?ntara S, Mart?nez S, Mart? E, Casas C, Visa J, Soriano E, Estivill X, Arbon?s ML. 2002. Dyrk1A haploinsufficiency affects viability and causes developmental delay and abnormal brain morphology in mice. Mol Cell Biol 22: 6636?6647. Fromer M, Moran JL, Chambert K, Banks E, Bergen SE, Ruderfer DM, Handsaker RE, McCarroll SA, O?Donovan MC, Owen MJ, et al. 2012. Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. Am J Hum Genet 91: 597?607. Fu Y-H, Kuhl DPA, Pizzuti A, Pieretti M, Sutcliffe JS, Richards S, Verkert AJMH, Holden JJA, Fenwick RG Jr., Warren ST, et al. 1991. Variation of the CGG repeat at the fragile X site results in genetic instability: Resolution of the Sherman paradox. Cell Reports 67: 1047?1058. Garber M, Grabherr MG, Guttman M, Trapnell C. 2011. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods 8: 469?477. Garrison E, Marth G. 2012. Haplotype-based variant detection from short-read sequencing. arXiv q-bio.GN. Gilman SR, Iossifov I, Levy D, Ronemus M, Wigler M, Vitkup D. 2011. Rare De Novo Variants Associated with Autism Implicate a Large Functional Network of Genes Involved in Formation and Function of Synapses. 
Neuron 70: 898?907. Girirajan S, Campbell CD, Eichler EE. 2010. Human Copy Number Variation and Complex Genetic Disease. Annu Rev Genet. Girirajan S, Rosenfeld JA, Coe BP, Parikh S, Friedman N, Goldstein A, Filipink RA, McConnell JS, Angle B, Meschino WS, et al. 2012. Phenotypic heterogeneity of genomic disorders and rare copy-number variants. N Engl J Med 367: 1321?1331. 
 98 
Glessner JT, Wang K, Cai G, Korvatska O, Kim CE, Wood S, Zhang H, Estes A, Brune CW, Bradfield JP, et al. 2009. Autism genome-wide copy number variation reveals ubiquitin and neuronal genes. Nature 459: 569?573. Guedj F, Pereira PL, Najas S, Barallobre M-J, Chabert C, Souchet B, Sebrie C, Verney C, Herault Y, Arbones M, et al. 2012. DYRK1A: A master regulatory protein controlling brain growth. Neurobiol Dis 46: 190?203. Hach F, Hormozdiari F, Alkan C, Hormozdiari F, Birol I, Eichler EE, Sahinalp SC. 2010. mrsFAST: a cache-oblivious algorithm for short-read mapping. Nat Methods 7: 576?577. Hallmayer J, Cleveland S, Torres A, Phillips J, Cohen B, Torigoe T, Miller J, Fedele A, Collins J, Smith K, et al. 2011. Genetic heritability and shared environmental factors among twin pairs with autism. Arch Gen Psychiatry 68: 1095?1102. Hamdan FF, Gauthier J, Spiegelman D, Noreau A, Yang Y, Pellerin S, Dobrzeniecka S, C?t? M, Perreau-Linck E, Perreault-Linck E, et al. 2009. Mutations in SYNGAP1 in autosomal nonsyndromic mental retardation. N Engl J Med 360: 599?605. H?vik B, Le Hellard S, Rietschel M, Lyb?k H, Djurovic S, Mattheisen M, M?hleisen TW, Degenhardt F, Priebe L, Maier W, et al. 2011. The complement control-related genes CSMD1 and CSMD2 associate to schizophrenia. Biol Psychiatry 70: 35?42. Helbig I, Mefford HC, Sharp AJ, Guipponi M, Fichera M, Franke A, Muhle H, de Kovel C, Baker C, Spiczak von S, et al. 2009. 15q13.3 microdeletions increase risk of idiopathic generalized epilepsy. Nat Genet 41: 160?162. Helsmoortel, C., Vulto-van Silfhout, A. T., Coe, B. P., Vandeweyer, G., Rooms, L., van den Ende, J., et al. (2014). A SWI/SNF-related autism syndrome caused by de novo mutations in ADNP. Nature Genet, 46: 380?384. Hoischen A, van Bon BWM, Gilissen C, Arts P, van Lier B, Steehouwer M, de Vries P, de Reuver R, Wieskamp N, Mortier G, et al. 2010. De novo mutations of SETBP1 cause Schinzel-Giedion syndrome. Nat Genet 42: 483?485. 
Holder JL Jr., Lotze TE, Bacino C, Cheung SW. 2012. A child with an inherited 0.31%Mb microdeletion of chromosome 14q32.33: Further delineation of a critical region for the 14q32 deletion syndrome. Am J Med Genet 158A: 1962?1966. Hormozdiari F, Alkan C, Eichler EE, Sahinalp SC. 2009. Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Research 19: 1270?1278. Hsueh Y-P, Wang T-F, Yang F-C, Sheng M. 2000. Nuclear translocation and transcription regulation by the membrane-associated guanylate kinase CASK/LIN-2. Nature 404: 298?302. 
 99 
Huang Z. 2005. The origin recognition core complex regulates dendrite and spine development in postmitotic neurons. The Journal of Cell Biology 170: 527?535. International Schizophrenia Consortium. 2008. Rare chromosomal deletions and duplications increase risk of schizophrenia. Nature 455: 237?241. Iossifov I, Ronemus M, Levy D, Wang Z, Hakker I, Rosenbaum J, Yamrom B, Lee Y-H, Narzisi G, Leotta A, et al. 2012. De novo gene disruptions in children on the autistic spectrum. Neuron 74: 285?299. Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, et al. 2009. STRING 8--a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Research 37: D412?6. Kamiya K, Kaneda M, Sugawara T, Mazaki E, Okamura N, Montal M, Makita N, Tanaka M, Fukushima K, Fujiwara T, et al. 2004. A nonsense mutation of the sodium channel gene SCN2A in a patient with intractable epilepsy and mental decline. J Neurosci 24: 2690?2698. Karakoc E, Alkan C, O'Roak BJ, Dennis MY, Vives L, Mark K, Rieder MJ, Nickerson DA, Eichler EE. 2011. Detection of structural variants and indels within exome data. Nat Methods 9: 176?178. Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, Teague B, Alkan C, Antonacci F, et al. 2008. Mapping and sequencing of structural variation from eight human genomes. Nature 453: 56?64. Kim H-G, Kishikawa S, Higgins AW, Seong I-S, Donovan DJ, Shen Y, Lally E, Weiss LA, Najm J, Kutsche K, et al. 2008. Disruption of Neurexin 1 Associated with Autism Spectrum Disorder. The American Journal of Human Genetics 82: 199?207. Klei L, Sanders SJ, Murtha MT, Hus V, Lowe JK, Willsey AJ, Moreno-De-Luca D, Yu TW, Fombonne E, Geschwind D, et al. 2012. Common genetic variants, acting additively, are a major source of risk for autism. Mol Autism 3: 9. Koolen DA, Kramer JM, Neveling K, Nillesen WM, Moore-Barton HL, Elmslie FV, Toutain A, Amiel J, Malan V, Tsai AC-H, et al. 2012. 
Mutations in the chromatin modifier gene KANSL1 cause the 17q21.31 microdeletion syndrome. Nat Genet 44: 639?641. Korbel JO, Abyzov A, Mu XJ, Carriero N, Cayting P, Zhang Z, Snyder M, Gerstein MB. 2009. PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Genome Biology 10: R23. Korbel JO, Urban AE, Gruber F, Du J, Royce TE, Starr P, Zhong G, Emanuel B, Weissman SM, Snyder M, et al. 2007. Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome. 
 100 
Proceedings of the National Academy of Sciences of the United States of America 104: 10110. Krumm N, O'Roak BJ, Karakoc E, Mohajeri K, Nelson B, Vives L, Jacquemont S, Munson J, Bernier R, Eichler EE. 2013. Transmission Disequilibrium of Small CNVs in Simplex Autism. Am J Hum Genet 93: 595?606. Krumm N, O'Roak BJ, Shendure J, Eichler EE. 2014. A de novo convergence of autism genetics and molecular neuroscience. Trends Neurosci 37: 95?105. Krumm N, Sudmant PH, Ko A, O'Roak BJ, Malig M, Coe BP, NHLBI Exome Sequencing Project, Quinlan AR, Nickerson DA, Eichler EE. 2012. Copy number variation detection and genotyping from exome sequence data. Genome Research 22: 1525?1532. Kumar RA, Marshall CR, Badner JA, Babatz TD, Mukamel Z, Aldinger KA, Sudi J, Brune CW, Goh G, KaraMohamed S, et al. 2009. Association and Mutation Analyses of 16p11.2 Autism Candidate Genes. PLoS ONE 4: e4582. Lepagnol-Bestel A-M, Zvara A, Maussion G, Quignon F, Ngimbous B, Ramoz N, Imbeaud S, Loe-Mie Y, Benihoud K, Agier N, et al. 2009. DYRK1A interacts with the REST/NRSF-SWI/SNF chromatin remodelling complex to deregulate gene clusters involved in the neuronal phenotypic traits of Down syndrome. Human Molecular Genetics 18: 1405?1414. Levy D, Ronemus M, Yamrom B, Lee Y-H, Leotta A, Kendall J, Marks S, Lakshmi B, Pai D, Ye K, et al. 2011. Rare De Novo and Transmitted Copy-Number Variation in Autistic Spectrum Disorders. Neuron 70: 886?897. Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Lichtenstein P, Carlstrom E, Rastam M, Gillberg C, Anckarsater H. 2010. The Genetics of Autism Spectrum Disorders and Related Neuropsychiatric Disorders in Childhood. American Journal of Psychiatry 167: 1357?1363. Litterman N, Ikeuchi Y, Gallardo G, O'Connell BC, Sowa ME, Gygi SP, Harper JW, Bonni A. 2011. An OBSL1-Cul7Fbxw8 ubiquitin ligase signaling mechanism regulates Golgi morphology and dendrite patterning. ed. P. Scheiffele. PLoS Biol 9: e1001060. 
Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N, et al. 2013. The Genotype-Tissue Expression (GTEx) project. Nat Genet 45: 580?585. Losh M, Childress D, Lam K, Piven J. 2008. Defining key features of the broad autism phenotype: a comparison across parents of multiple- and single-incidence autism families. Am J Med Genet B Neuropsychiatr Genet 147B: 424?433. 
 101 
Marshall CR, Noor A, Vincent JB, Lionel AC, Feuk L, Skaug J, Shago M, Moessner R, Pinto D, Ren Y, et al. 2008. Structural variation of chromosomes in autism spectrum disorder. Am J Hum Genet 82: 477?488. Morita, M., Ler, L. W., Fabian, M. R., Siddiqui, N., Mullin, M., Henderson, V. C., et al. (2012). A novel 4EHP-GIGYF2 translational repressor complex is essential for mammalian development. Molecular and Cellular Biology, 32(17), 3585?3593. doi:10.1128/MCB.00455-12. Matsuura T, Sutcliffe JS, Fang P, Galjaard RJ, Jiang YH, Benton CS, Rommens JM, Beaudet AL. 1997. De novo truncating mutations in E6-AP ubiquitin-protein ligase gene (UBE3A) in Angelman syndrome. Nat Genet 15: 74?77. Mazur-Kolecka B, Golabek A, Kida E, Rabe A, Hwang Y-W, Adayev T, Wegiel J, Flory M, Kaczmarski W, Marchi E, et al. 2012. Effect of DYRK1A activity inhibition on development of neuronal progenitors isolated from Ts65Dn mice. J Neurosci Res 90: 999?1010. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20: 1297?1303. Miller DT, Shen Y, Weiss LA, Korn J, Anselm I, Bridgemohan C, Cox GF, Dickinson H, Gentile J, Harris DJ, et al. 2009. Microdeletion/duplication at 15q13.2q13.3 among individuals with features of autism and other neuropsychiatric disorders. J Med Genet 46: 242?248. Moessner R, Marshall CR, Sutcliffe JS, Skaug J, Pinto D, Vincent J, Zwaigenbaum L, Fernandez B, Roberts W, Szatmari P, et al. 2007. Contribution of SHANK3 mutations to autism spectrum disorder. Am J Hum Genet 81: 1289?1297. Moller RS, K?bart S, Hoeltzenbein M, Heye B, Vogel I, Hansen CP, Menzel C, Ullmann R, Tommerup N, Ropers H-H, et al. 2008. Truncation of the Down syndrome candidate gene DYRK1A in two unrelated patients with microcephaly. Am J Hum Genet 82: 1165?1170. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. 
2008. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621?628. Mutz K-O, Heilkenbrinker A, L?nne M, Walter J-G, Stahl F. 2013. Transcriptome analysis using next-generation sequencing. Curr Opin Biotechnol 24: 22?30. Neale BM, Kou Y, Liu L, Ma?ayan A, Samocha KE, Sabo A, Lin C-F, Stevens C, Wang L-S, Makarov V, et al. 2012. Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature 485: 242?245. Ng D, Pitcher GM, Szilard RK, Serti? A, Kanisek M, Clapcote SJ, Lipina T, Kalia LV, 
 102 
Joo D, McKerlie C, et al. 2009a. Neto1 Is a Novel CUB-Domain NMDA Receptor?Interacting Protein Required for Synaptic Plasticity and Learning. PLoS Biol 7: e1000041. Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, Huff CD, Shannon PT, Jabs EW, Nickerson DA, et al. 2009b. Exome sequencing identifies the cause of a mendelian disorder. Nat Genet 42: 30?35. Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, Shaffer T, Wong M, Bhattacharjee A, Eichler EE, et al. 2009c. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461: 272?276. Nishiyama M, Oshikawa K, Tsukada Y-I, Nakagawa T, Iemura S-I, Natsume T, Fan Y, Kikuchi A, Skoultchi AI, Nakayama KI. 2009. CHD8 suppresses p53-mediated apoptosis through histone H1 recruitment during early embryogenesis. Nat Cell Biol 11: 172?182. Nishiyama M, Skoultchi AI, Nakayama KI. 2012. Histone H1 recruitment by CHD8 is essential for suppression of the Wnt-?-catenin signaling pathway. Mol Cell Biol 32: 501?512. O'Roak BJ, Deriziotis P, Lee C, Vives L, Schwartz JJ, Girirajan S, Karakoc E, Mackenzie AP, Ng SB, Baker C, et al. 2011. Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat Genet 43: 585?589. O'Roak BJ, Vives L, Fu W, Egertson JD, Stanaway IB, Phelps IG, Carvill G, Kumar A, Lee C, Ankenman K, et al. 2012a. Multiplex targeted sequencing identifies recurrently mutated genes in autism spectrum disorders. Science 338: 1619?1622. O'Roak BJ, Vives L, Girirajan S, Karakoc E, Krumm N, Coe BP, Levy R, Ko A, Lee C, Smith JD, et al. 2012b. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485: 246?250. Ogiwara I, Ito K, Sawaishi Y, Osaka H, Mazaki E, Inoue I, Montal M, Hashikawa T, Shike T, Fujiwara T, et al. 2009. De novo mutations of voltage-gated sodium channel alphaII gene SCN2A in intractable epilepsies. Neurology 73: 1046?1053. 
Pa?ca SP, Portmann T, Voineagu I, Yazawa M, Shcheglovitov A, Pasca AM, Cord B, Palmer TD, Chikahisa S, Nishino S, et al. 2011. Using iPSC-derived neurons to uncover cellular phenotypes associated with Timothy syndrome. Nat Med 17: 1657?1662. Peiffer DA, Le JM, Steemers FJ, Chang W, Jenniges T, Garcia F, Haden K, Li J, Shaw CA, Belmont J, et al. 2006. High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Research 16: 1136?1148. Lorenz, P., Dietmann, S., Wilhelm, T., Koczan, D., Autran, S., Gad, S., et al. (2010). The 
 103 
ancient mammalian KRAB zinc finger gene cluster on human chromosome 8q24.3 illustrates principles of C2H2 zinc finger evolution associated with unique expression profiles in human tissues. BMC Genomics, 11(1), 206. doi:10.1186/1471-2164-11-206 Petrovski S, Wang Q, Heinzen EL, Allen AS, Goldstein DB. 2013. Genic Intolerance to Functional Variation and the Interpretation of Personal Genomes. PLoS Genetics 9: e1003709. Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, Collins C, Kuo WL, Chen C, Zhai Y, et al. 1998. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet 20: 207?211. Pinto D, Pagnamenta AT, Klei L, Anney R, Merico D, Regan R, Conroy J, Magalhaes TR, Correia C, Abrahams BS, et al. 2010. Functional impact of global rare copy number variation in autism spectrum disorders. Nature 466: 368?372. Poultney CS, Goldberg AP, Drapeau E, Kou Y, Harony-Nicolas H, Kajiwara Y, De Rubeis S, Durand S, Stevens C, Rehnstr?m K, et al. 2013. Identification of Small Exonic CNV from Whole-Exome Sequence Data and Application to Autism Spectrum Disorder. The American Journal of Human Genetics 93: 607?619. Rauch A, Wieczorek D, Graf E, Wieland T, Endele S, Schwarzmayr T, Albrecht B, Bartholdi D, Beygo J, Di Donato N, et al. 2012. Range of genetic mutations associated with severe non-syndromic sporadic intellectual disability: an exome sequencing study. Lancet 380: 1674?1682. Ronemus M, Iossifov I, Levy D, Wigler M. 2014. The role of de novo mutations in the genetics of autism spectrum disorders. Nature Reviews Genetics 15: 133?141. Rowntree RK, Harris A. 2003. The Phenotypic Consequences of CFTR Mutations. Ann Human Genet 67: 471?485. Salinas PC, Zou Y. 2008. Wnt Signaling in Neural Circuit Assembly. Annu Rev Neurosci 31: 339?358. Sanders SJ, Ercan-Sencicek AG, Hus V, Luo R, Murtha MT, Moreno-De-Luca D, Chu SH, Moreau MP, Gupta AR, Thomson SA, et al. 2011. 
Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron 70: 863?885. Sanders SJ, Murtha MT, Gupta AR, Murdoch JD, Raubeson MJ, Willsey AJ, Ercan-Sencicek AG, DiLullo NM, Parikshak NN, Stein JL, et al. 2012. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 485: 237?241. Santen GWE, Aten E, Sun Y, Almomani R, Gilissen C, Nielsen M, Kant SG, Snoeck IN, Peeters EAJ, Hilhorst-Hofstee Y, et al. 2012. Mutations in SWI/SNF chromatin 
 104 
remodeling complex gene ARID1B cause Coffin-Siris syndrome. Nat Genet 44: 379?380. Santos Dos C, Essioux L, Teinturier C, Tauber M, Goffin V, Bougn?res P. 2004. A common polymorphism of the growth hormone receptor is associated with increased responsiveness to growth hormone. Nat Genet 36: 720?724. Sathirapongsasuti JF, Lee H, Horst BAJ, Brunner G, Cochran AJ, Binder S, Quackenbush J, Nelson SF. 2011. Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV. Bioinformatics 27: 2648?2654. Scharpf RB, Irizarry RA, Ritchie ME, Carvalho B, Ruczinski I. 2011. Using the R Package crlmm for Genotyping and Copy Number Estimation. J Stat Softw 40: 1?32. Scholz R, Berberich S, Rathgeber L, Kolleker A, K?hr G, Kornau H-C. 2010. AMPA Receptor Signaling through BRAG2 and Arf6 Critical for Long-Term Synaptic Depression. Neuron 66: 768?780. Schuurs-Hoeijmakers JHM, Geraghty MT, Kamsteeg E-J, Ben-Salem S, de Bot ST, Nijhof B, van de Vondervoort IIGM, van der Graaf M, Nobau AC, Otte-H?ller I, et al. 2012. Mutations in DDHD2, Encoding an Intracellular Phospholipase A1, Cause a Recessive Form of Complex Hereditary Spastic Paraplegia. The American Journal of Human Genetics 91: 1073?1081. Sebat J, Lakshmi B, Malhotra D, Troge J, Lese-Martin C, Walsh T, Yamrom B, Yoon S, Krasnitz A, Kendall J, et al. 2007. Strong Association of De Novo Copy Number Mutations with Autism. Science 316: 445?449. Sharp AJ, Hansen S, Selzer RR, Cheng Z, Regan R, Hurst JA, Stewart H, Price SM, Blair E, Hennekam RC, et al. 2006. Discovery of previously unidentified genomic disorders from the duplication architecture of the human genome. Nat Genet 38: 1038?1042. Sharp AJ, Mefford HC, Li K, Baker C, Skinner C, Stevenson RE, Schroer RJ, Novara F, De Gregori M, Ciccone R, et al. 2008. A recurrent 15q13.3 microdeletion syndrome associated with mental retardation and seizures. Nat Genet 40: 322?328. 
Song WJ, Sternberg LR, Kasten-Sport?s C, Keuren ML, Chung SH, Slack AC, Miller DE, Glover TW, Chiang PW, Lou L, et al. 1996. Isolation of human and murine homologues of the Drosophila minibrain gene: human homologue maps to 21q22.2 in the Down syndrome "critical region". Genomics 38: 331?339. Stefansson H, Rujescu D, Cichon S, Pietil?inen OPH, Ingason A, Steinberg S, Fossdal R, Sigurdsson E, Sigmundsson T, Buizer-Voskamp JE, et al. 2008. Large recurrent microdeletions associated with schizophrenia. Nature 455: 232?236. Steffenburg S, Gillberg C, Hellgren L, Andersson L, Gillberg IC, Jakobsson G, Bohman M. 1989. A twin study of autism in Denmark, Finland, Iceland, Norway and Sweden. 
 105 
J Child Psychol Psychiatry 30: 405?416. Stessman HA, Bernier R, Eichler EE. 2014. A genotype-first approach to defining the subtypes of a complex disease. Cell 156: 872?877. Su AI. 2004. A gene atlas of the mouse and human protein-encoding transcriptomes. Proceedings of the National Academy of Sciences 101: 6062?6067. Sudmant PH, Kitzman JO, Antonacci F, Alkan C, Malig M, Tsalenko A, Sampas N, Bruhn L, Shendure J, 1000 Genomes Project, et al. 2010. Diversity of human copy number variation and multicopy genes. Science 330: 641?646. Talkowski ME, Rosenfeld JA, Blumenthal I, Pillalamarri V, Chiang C, Heilbut A, Ernst C, Hanscom C, Rossin E, Lindgren AM, et al. 2012. Sequencing chromosomal abnormalities reveals neurodevelopmental loci that confer risk across diagnostic boundaries. Cell 149: 525?537. Tejedor F, Zhu XR, Kaltenbach E, Ackermann A, Baumann A, Canal I, Heisenberg M, Fischbach KF, Pongs O. 1995. minibrain: a new protein kinase family involved in postembryonic neurogenesis in Drosophila. Neuron 14: 287?301. Thompson BA, Tremblay V, Lin G, Bochar DA. 2008. CHD8 is an ATP-dependent chromatin remodeling factor that regulates beta-catenin target genes. Mol Cell Biol 28: 3894?3904. van Bon BWM, Hoischen A, Hehir-Kwa J, de Brouwer APM, Ruivenkamp C, Gijsbers ACJ, Marcelis CL, de Leeuw N, Veltman JA, Brunner HG, et al. 2011. Intragenic deletion in DYRK1A leads to mental retardation and primary microcephaly. Clin Genet 79: 296?299. Venkatraman ES, Olshen AB. 2007. A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 23: 657?663. Vissers LELM, de Ligt J, Gilissen C, Janssen I, Steehouwer M, de Vries P, van Lier B, Arts P, Wieskamp N, del Rosario M, et al. 2010. A de novo paradigm for mental retardation. Nat Genet 42: 1109?1112. Voineagu I, Wang X, Johnston P, Lowe JK, Tian Y, Horvath S, Mill J, Cantor RM, Blencowe BJ, Geschwind DH. 2011. Transcriptomic analysis of autistic brain reveals convergent molecular pathology. 
Nature 474: 380?384. Wada K, Saigoh K, Wang Y-L, Suh J-G, Yamanishi T, Sakai Y, Kiyosawa H, Harada T, Ichihara N, Wakana S, et al. 1999. Intragenic deletion in the gene encoding ubiquitin carboxy-terminal hydrolase in gad mice. Nat Genet 23: 47?51. Wali A, Ali G, John P, Lee K, Chishti MS, Leal SM, Ahmad W. 2007. Mapping of a Gene for Alopecia with Mental Retardation Syndrome (APMR3) on Chromosome 18q11.2-q12.2. Ann Human Genet 71: 570?577. 
 106 
Walsh CA, Morrow EM, Rubenstein JLR. 2008. Autism and Brain Development. Cell 135: 396?400. Walsh KM, Bracken MB. 2011. Copy number variation in the dosage-sensitive 16p11.2 interval accounts for only a small proportion of autism incidence: A systematic review and meta-analysis. Genet Med 13: 377?384. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB. 2008. Alternative isoform regulation in human tissue transcriptomes. Nature 456: 470?476. Wang T-F, Ding C-N, Wang G-S, Luo S-C, Lin Y-L, Ruan Y, Hevner R, Rubenstein JLR, Hsueh Y-P. 2004. Identification of Tbr-1/CASK complex target genes in neurons. J Neurochem 91: 1483?1492. Wobst H, F?rster S, Laurini C, Sekulla A, Dreiseidler M, H?hfeld J, Schmitz B, Diestel S. 2012. UCHL1 regulates ubiquitination and recycling of the neural cell adhesion molecule NCAM. FEBS J 279: 4398?4409. Yabut O, Domogauer J, D'Arcangelo G. Dyrk1A Overexpression Inhibits Proliferation and Induces Premature Neuronal Differentiation of Neural Progenitor Cells. jneurosciorg. Yazawa M, Hsueh B, Jia X, Pasca AM, Bernstein JA, Hallmayer J, Dolmetsch RE. 2011. Using induced pluripotent stem cells to investigate cardiac phenotypes in Timothy syndrome. Nature 471: 230?234. Yoon S, Xuan Z, Makarov V, Ye K, Sebat J. 2009. Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Research 19: 1586?1592.  
 107 
Web and Software Resources

CoNIFER (Copy Number Inference from Exome Reads): source code, tutorials and samples can be downloaded from http://conifer.sf.net; additional pipeline implementations are available at https://github.com/nkrumm/conifer-tools

MIM (Mendelian Inheritance in Man) identifiers can be accessed via OMIM (Online Mendelian Inheritance in Man): http://omim.org

mrsFAST: source code and binaries are available at http://mrsfast.sf.net

NDAR (National Database of Autism Research): http://ndar.nih.gov

Data from Chapter 4 are available at: https://ndar.nih.gov/study.html?id=334

Appendix A (Chapter 1)

Details of StringDB network generation: To create the PPI network in Figure 3, we started with the de novo mutations published in each of the six exome studies [1-6] and limited these to events found in probands and intersecting exons or canonical splice sites. The network in Figure 3 was created using all genes with de novo truncating variants (defined as nonsense variants, frameshifting variants or variants likely affecting mRNA splicing) as well as six additional genes (DLG4, GRIN2A, CASK, PSEN1, CHD7, NLGN1) in which only missense variants have been observed thus far, but which have important neurobiological roles and/or disease association. In all, we included 158 genes, of which 157 could be identified in the StringDB database (LTN1 was not found in any human interactions in StringDB).

Data from the StringDB interaction database version 9.05 [7] were used to create the edges of the PPI network. We strictly limited our interactions to human (organism ID 9606) interactions that were based on experimental evidence and had an overall combined interaction confidence score of 400 or more. We did not include interactions based solely on any of the other StringDB interaction types, such as in silico text-mining, co-expression, etc. Overall, we included 85,678 interactions and 12,113 nodes in our analysis.

To create the network displayed in Figure 3, we took these steps:

1. Intersected the 157 identified genes, based on the criteria above, with the human, experimentally validated StringDB interactions. These form the central red (truncating mutations) and blue (selected missense mutations) nodes in Figure 3, and are connected using thick black lines.

2. Found the two largest connected components. We observed that these were connected via the DLGAP1 protein (see main text for discussion) and added this node as an unfilled (white) node with dashed lines.

3. In addition, we surmised that our set of truncating mutations was likely incomplete, and that many ASD/ID genes may be excluded from the central network simply because rare variants in these genes have not yet been discovered. Thus, we "grew" or expanded the network by allowing genes with truncating mutations to be included as "peripheral" nodes if they were within a distance of two (i.e., one intervening node) of the central network. These nodes are drawn in a lighter shade of red and have finely dashed edges. For this analysis, we excluded three proteins (SUMO1, SUMO2 and UBC) which had highly non-specific interactions in StringDB (SUMOylation and ubiquitination).

4. We indicated which mutations have only been observed in studies of ID by using half-filled circles. The reciprocal (ASD-only) situation is not indicated because nearly ten-fold more ASD exomes than ID exomes have been sequenced.

5. Lastly, we scaled the sizes of the nodes based on the number of times mutations in cases had been observed in each gene (including the mutations from the MIP resequencing data).

Estimating PPI significance: To test whether the PPI network of de novo mutations found in the six reviewed exome studies was significantly distinct from randomly formed networks of similar size, we performed two simulation studies. These two simulations were based on random sampling from the complete set of known PPI interactions (i.e., from StringDB) or on random permutation of the existing network. Both simulations were designed to take into account the highly variable degree distribution of interaction networks; that is, some nodes are highly connected "hub nodes" while other proteins are scantly connected, if at all. The results of the simulations are summarized in Table S1, and each simulation is described in more detail below.

Stratified node resampling: For each iteration of the simulation, we randomly selected a stratified (based on the degree distribution of the nodes with mutations) set of nodes from the complete StringDB interaction network (limited to interactions with "experimental" evidence and a minimum interaction score of 400). This ensured that the nodes we picked were similar in connectivity and that the representation of "hub" nodes and "outlier" nodes was equivalent to that of the actual network. A new PPI graph was generated from each set of stratified random nodes, and the structural characteristics of these graphs were compiled into a null distribution. We primarily examined the average clustering coefficient and the total number of edges of the permuted graphs and compared these to the characteristics of the actual PPI networks. P-values were derived using the empirical distributions from 10,000 iterations of the simulation.

Edge swapping simulation: In this simulation, we did not alter the set of nodes included in each PPI network, but instead permuted the edges found within the PPI network, thus preserving the degree distribution of the network. Specifically, in each iteration of the simulation, a random sample of edges (where the number of sampled edges was equal to the total number of edges in the PPI network) was swapped, each with another eligible edge, so that edges u-v and x-y are replaced by edges u-x and v-y:

    u --- v              u     v
              becomes    |     |
    x --- y              x     y

After randomly swapping edges, we re-computed the average clustering coefficient, the size of the largest connected component and the number of edges for the subgraph of the genes (nodes) with observed mutations, and computed the empirical p-value as above. Due to the increased complexity and running time of this simulation, we performed only 1,000 iterations.

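The two null models described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the analysis: the function names are hypothetical, the degree strata are log2 bins (the text does not specify how degrees were binned), only the induced edge count is shown as a test statistic, and the empirical p-value uses a common +1 correction that the text does not mention.

```python
import random
from collections import Counter


def degrees(edges):
    """Count the degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def double_edge_swap(edges, n_swaps, rng, max_tries=None):
    """Degree-preserving rewiring: edges u-v and x-y become u-x and v-y."""
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    if max_tries is None:
        max_tries = 100 * n_swaps
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        u, v = edge_list[i]
        x, y = edge_list[j]
        if len({u, v, x, y}) < 4:
            continue  # shared endpoint: skip (self-loop or no-op swap)
        new_a, new_b = frozenset((u, x)), frozenset((v, y))
        if new_a in present or new_b in present:
            continue  # would duplicate an existing edge
        present -= {frozenset((u, v)), frozenset((x, y))}
        present |= {new_a, new_b}
        edge_list[i], edge_list[j] = (u, x), (v, y)
        done += 1
    return edge_list


def stratified_sample(deg, targets, rng):
    """Draw one random node per target from the same degree stratum.

    Strata are log2 degree bins (an assumption made for this sketch).
    Every target must have at least one edge in the network.
    """
    bins = {}
    for node, d in deg.items():
        bins.setdefault(d.bit_length(), []).append(node)
    needed = Counter(deg[t].bit_length() for t in targets)
    picked = []
    for b, k in needed.items():
        picked.extend(rng.sample(bins[b], k))
    return picked


def induced_edge_count(edges, nodes):
    """Number of edges with both endpoints in the given node set."""
    nodes = set(nodes)
    return sum(1 for u, v in edges if u in nodes and v in nodes)


def empirical_p(observed, null_values):
    """One-sided empirical p-value with a +1 correction (an assumption)."""
    hits = sum(1 for x in null_values if x >= observed)
    return (hits + 1) / (len(null_values) + 1)
```

In use, the observed statistic (for example, the induced edge count among mutated genes) would be compared against the values obtained from repeated calls to stratified_sample or double_edge_swap, and empirical_p would summarize the comparison.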
Table S1: Summary table of PPI network simulations  
Top row p-values are from the stratified node resampling simulation.
Bottom row p-values are from the edge-swap simulation.
Nominally significant (p < 0.05) values are highlighted in bold.
Details of Hidden Species simulation in Figure 1: In order to estimate the number of genes implicated in ASD under a de novo/rare variant model, we used mutations in probands from the four ASD exome studies and a reformulation of the "unseen species problem" (see [8] for review; [9] for application to de novo CNVs discovered in autism), where genes with severe de novo SNPs in probands are considered "observed species" and are binned by their frequency of appearance (i.e., "singletons", "doubletons", etc.). For each category (truncating, truncating+missense), we find the distribution of the number of recurrently mutated genes (i.e., the bins and bin counts of a histogram function). All genes with more than one mutation are included, as is a fraction of the "singleton" mutations (those with only one observed mutation in the four studies). The recurrence counts are shown below:

Table S2: Recurrence of de novo mutations in 4 ASD studies
Given these frequencies and frequency counts, we estimated the total number of genes implicated in autism (the total number of species) using the Chao and Lee estimator implemented in the R package SPECIES [10]. The "Percentage of de novo singleton events considered pathogenic" refers to the fraction of the singletons (recurrence = 1) included in the frequency counts.
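To make the estimation step concrete, the following is a minimal sketch of a coverage-based richness estimate in the form usually attributed to Chao and Lee (1992). It is our Python restatement for illustration only and is not the SPECIES implementation used for the analysis; the function names and the toy frequency counts are hypothetical and are not the values from Table S2.

```python
def chao_lee_estimate(freq_counts):
    """Coverage-based estimate of the total number of species (genes).

    freq_counts maps a recurrence k to the number of genes observed
    exactly k times, e.g. {1: singletons, 2: doubletons, ...}.
    """
    n = sum(k * f for k, f in freq_counts.items())   # total mutation events
    d = sum(freq_counts.values())                    # distinct genes observed
    if n == 0:
        raise ValueError("no observations")
    f1 = freq_counts.get(1, 0)
    coverage = 1.0 - f1 / n                          # estimated sample coverage
    if coverage <= 0:
        raise ValueError("only singletons observed; estimate is undefined")
    n0 = d / coverage                                # estimate assuming equal rates
    pair_sum = sum(k * (k - 1) * f for k, f in freq_counts.items())
    # Squared coefficient of variation of per-gene rates, floored at zero.
    gamma_sq = max(n0 * pair_sum / (n * (n - 1)) - 1.0, 0.0)
    return n0 + n * (1.0 - coverage) / coverage * gamma_sq

def with_singleton_fraction(freq_counts, fraction):
    """Keep only a fraction of the singletons (assumed pathogenic)."""
    counts = dict(freq_counts)
    counts[1] = round(counts.get(1, 0) * fraction)
    return counts

# Toy counts only: 10 singleton, 3 doubleton and 1 tripleton gene.
toy_counts = {1: 10, 2: 3, 3: 1}
estimate = chao_lee_estimate(toy_counts)
halved = chao_lee_estimate(with_singleton_fraction(toy_counts, 0.5))
```

Varying `fraction` mirrors the "percentage of de novo singleton events considered pathogenic": fewer retained singletons raise the estimated coverage and lower the estimated number of unseen genes.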
References:
1. O'Roak, B.J. et al. (2012) Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485, 246-250
2. Sanders, S.J. et al. (2012) De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 485, 237-241
3. Iossifov, I. et al. (2012) De novo gene disruptions in children on the autistic spectrum. Neuron 74, 285-299
4. Neale, B.M. et al. (2012) Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature 485, 242-245
5. Rauch, A. et al. (2012) Range of genetic mutations associated with severe non-syndromic sporadic intellectual disability: an exome sequencing study. Lancet 380, 1674-1682
6. de Ligt, J. et al. (2012) Diagnostic exome sequencing in persons with severe intellectual disability. N Engl J Med 367, 1921-1929
7. Franceschini, A. et al. (2012) STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Research 41, D808-D815
8. Bunge, J. and Fitzpatrick, M. (1993) Estimating the Number of Species: A Review. Journal of the American Statistical Association 88, 364-374
9. Sanders, S.J. et al. (2011) Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron 70, 863-885
10. Wang, J.-P. (2011) SPECIES: An R Package for Species Richness Estimation. Journal of Statistical Software. [Online]. Available: http://www.jstatsoft.org/. [Accessed: 30-Aug-2011]
Appendix B (Chapter 2)

Library construction and exome capture: All exome samples were prepared by subjecting 2 µg of genomic DNA to a series of shotgun library construction steps, including fragmentation through acoustic sonication (Covaris), end-polishing and A-tailing, ligation of sequencing adaptors, and PCR amplification. Following library construction, 1 µg of shotgun library was hybridized to biotinylated capture probes for 72 hours and then recovered via streptavidin beads. Unbound DNA was washed away, and the captured DNA was PCR amplified for sequencing.

Sequence data processing and alignment: Raw sequenced reads (from FASTQ files) were first split into 36 bp chunks (to avoid interference from indels) and mapped using the mrsFAST (v.2.3.0.2) aligner. Up to two mismatches were allowed per read. To reduce computational overhead, we created a concatenated exome index, consisting of the targeted exons (see below) plus 300 bp of flanking sequence from the hg19 (NCBI build 37) human reference genome, masked with RepeatMasker and Tandem Repeats Finder. After mapping to this concatenated "exome", we translated mapped coordinates back to hg19 genome coordinates for further processing.

Exome probe definitions: For the mrsFAST-based alignments, we developed a probe set (i.e., target regions) by intersecting the target definitions of the Roche Nimblegen EZ Exome SeqCap Version 2 exome capture kit (from http://www.nimblegen.com/downloads/annotation/ez_exome_v2/SeqCapEZ_Exome_v2.0_Design_Annotation_files.zip) with RefSeq exons (excluding UTR regions). In addition, we included 4,857 non-exonic targeted regions from the SeqCap Version 2 target definition list. This resulted in 194,080
target probes (available at http://conifer.sourceforge.net).

Initial exon-level normalization: We calculated RPKM values for the 194,080 target probes individually. The RPKM normalization is given by

$$\mathrm{RPKM} = \frac{10^{9} \times \text{Read Starts}}{\text{Total Mapped Reads} \times \text{Target Size (bp)}}$$

where the number of Read Starts is defined as the number of reads starting within the target boundaries, and the Total Mapped Reads corresponds to the number of unique reads which had at least one mapping. This initial RPKM normalization step adjusts our read-depth estimates for target (exon) size as well as for the overall sequencing coverage in the experiment. To reduce erroneous signal from failed or improperly targeted probes, we excluded 3,964 targets which had a median RPKM < 1 in the 533 ESP samples.

Next, to control for probe-to-probe differences in capture efficiency, we standardized the RPKM values using a z-transformation. The median and standard deviation of each exon were derived from the RPKM values of the 533 ESP exomes. The formula for the zRPKM value is:

$$\mathrm{zRPKM}_{exon,sample} = \frac{\mathrm{RPKM}_{exon,sample} - \mathrm{Median}_{exon}}{\mathrm{StdDev}_{exon}}$$

Removing systematic bias between batches: A previous analysis of exome read-depth values from ~1,700 ESP exomes using principal components analysis (PCA) revealed several strong components, some of which were attributed to "batch" effects (unpublished, Sara Ng and Jay Shendure). We hypothesized that these strong components do not correspond to biological signal, but rather to differences in capture protocol, efficiency and sequencing bias. Using singular value decomposition, a mathematical analog of
PCA, we decompose the exon-by-sample data matrix (X) into three matrices:

$$X = U S V^{T}$$

In order to remove the strongest k components, we set the singular values $S_1, \ldots, S_k$ to zero to form $S'$, and then recalculate X as the product of $U$, $S'$ and $V^{T}$. For computational efficiency, each chromosome is normalized individually across the population. We used an implementation of SVD from the SciPy library for the Python programming language.

Discovery of rare CNVs: For discovery of rare CNVs, we removed between 12 and 15 (k) singular values, a number which we empirically adjusted based on the inflection point of the "scree plot" (Fig S2), as well as by manual inspection of the final normalized data. To reduce the false positive rate of discovery for rare CNVs, we applied a 15-exon centrally-weighted moving average across exons. We set discovery thresholds at -1.5 or +1.5 for rare deletions and duplications, respectively, and required at least three exome probes to exceed the threshold. To account for the fact that smoothing shrinks the apparent size of discovered events, regions which exceeded this threshold were slightly expanded until the sample's smoothed value crossed within two standard deviations surrounding the population mean of the smoothed values (Fig S1b).

Sample-level quality control: We excluded ESP exomes from the final background distribution if our algorithm predicted more than 10 calls, as we noted that these samples had a greatly increased total call count (up to 111 calls/sample), and that the calls were largely false positives. This resulted in the exclusion of 80 of the 613 initial ESP exomes (87% pass rate) from the background distribution, leaving our final set of 533 exomes. No exomes from the HapMap cohort (range: 1-7 calls per
individual) or the autism cohort (range: 0-14 calls per individual) were excluded.

Genotyping CNPs: For genotyping copy number polymorphic (CNP) regions of the genome, as well as for assessing the copy number of multi-copy genes, we developed a slightly modified approach. Starting from zRPKM values, we again applied the SVD transformation, but opted to remove only five components, in order to prevent the SVD algorithm from removing bona fide signal from the regions of interest. We genotyped each individual by averaging the resulting values across the locus, yielding the "SVD-ZRPKM value".

Whole Genome Copy Number Correlations: To estimate the absolute copy number at CNP loci, read-depth from independent whole-genome sequencing (as previously described in (Sudmant et al., 2010)) was used. Briefly, regions of known copy number were used to create a copy-number standard curve, and the absolute copy number of tiling 1 kb windows across the genome was estimated. For genotyping, the median of the 1 kb window estimates was used.

Because we wanted to assess the correlation between exome-based and whole-genome-based methods, we only included loci in the final set if the whole-genome copy number estimate indicated that the locus was polymorphic among the seven HapMap samples tested. We defined a locus to be polymorphic if the absolute range of copy numbers amongst the HapMap samples was greater than 1. Finally, we defined the median copy number of each locus as the median of the absolute copy number estimates among the seven HapMap samples.

Absolute copy number estimation using population frequency information: To convert relative SVD-ZRPKM values into absolute copy numbers, we used an unsupervised clustering algorithm to cluster SVD-ZRPKM genotype values, and
then leveraged genotypes from 43 CNPs in a large set of HapMap samples from (Campbell et al., 2011) to match clusters to absolute copy number.

Unsupervised clustering was done using a mean-shift algorithm implemented in the Python package scikits.learn. The mean-shift algorithm is similar to k-means clustering, but does not require a priori information regarding the number of clusters. After clustering, we automatically merged clusters together if their centers were not spaced linearly on the x-axis, as we found that this marginally improved the clustering for some loci. Finally, we fit the most common copy-number state(s) for each locus from (Campbell et al., 2011) to the largest cluster(s) identified by the exome-based SVD-ZRPKM values by maximizing the r^2 value between the two vectors (one from each data source) of copy-number states. In other words, we attempted to match the frequencies of each copy number state identified by (Campbell et al., 2011) to consecutive clusters identified by our clustering method. To determine the absolute copy number genotype of a CNP locus for a HapMap sample, we simply determined to which cluster the sample belonged and assigned the matched absolute copy number for that cluster.

Sensitivity call set for HapMap samples: To assess sensitivity, we started with CNV calls from the discovery experiment of Conrad and colleagues (Conrad et al., 2010) as a gold standard. This list initially contained 6,919 calls for the 5 overlapping HapMap samples in our set. Of these, 486 overlapped at least 3 exome probes (as required by our discovery algorithm). Because segmental duplications are prone to array-CGH reference and detection bias, we removed 416 calls for which 50% of the underlying exome probes were in segmental duplications. Finally, we removed 20 calls found in somatically rearranged regions:

chr2:89156874-89630175  Ig light chain kappa
chr6:32386993-32787910  HLA
chr6:31226231-31328167  HLA
chr14:105994256-107283087  Ig heavy chain
chr22:22380820-23265082  Ig light chain lambda
chr7:141975722-142519580  T-cell receptor beta subunit

This resulted in 50 calls. For each call, we reviewed several data sources: 1) Illumina 1M or 650Y (for NA15510) SNP array LogR intensities and B-allele frequency, 2) whole-genome copy number estimates (from (Sudmant et al., 2010), but not available for NA15510), 3) fosmid-based calls from (Kidd et al., 2008) and 4) SVD-ZRPKM signal across ESP and HapMap samples. We manually curated the 50 calls into four categories: rare CNVs (5 events), CNPs or CNP-like events (36 events), events in high-diversity regions of the genome (6 events; primarily olfactory receptors and zinc-finger genes), and false positives in the Conrad et al. set (3 events). False positives had no corroborating evidence in any other data set, and were not counted towards the sensitivity estimates.

Discovery of rare CNVs in ASD trios: Combining the input set of 366 ASD cohort individuals (122 probands) with 366 randomly picked ESP samples, and removing 15 components, our algorithm made a total of 1,043 calls among the 366 individuals in the ASD cohort (with 369 calls in probands), with each sample having between 0 and 14 calls; overall, 340 individuals had at least one call. Merging all overlapping calls in the ASD cohort resulted in 282 CNVRs.

As the exome capture reaction targets many genes present in duplicated regions of the genome, and as many exons share homologous sequence, a significant proportion of our calls in probands reflect changes in the copy number of these genes arising from independent assortment of parental haplotypes. Starting with the 369 calls made in the 117 probands, we filtered calls to enrich for "rare"
CNVs. Calls which had greater than 50% reciprocal overlap with segmental duplications (as determined by the fraction of underlying exome probes within the call also in segmental duplications) were removed (163/369, or 44%). Next, we calculated the median copy number of calls based on whole-genome read-depth copy-number estimates from ~660 genomes (Sudmant et al., 2010), and additionally filtered 13 calls (3.5%) with more than three copies population-wide (as events stemming from these segmentally-duplicated or higher-copy regions of the genome are likely due to the independent assortment of parental haplotypes, and are not "true" rare CNVs). Additionally, we manually curated the calls to remove calls within regions undergoing somatic rearrangement (two calls; one at the IGH locus and one in the HLA locus), and merged adjacent or overlapping calls. These steps left 191 calls among 97 probands, and these calls were primarily found in non-duplicated genes and diploid regions of the genome. We categorized each call into one of three bins: de novo, inherited or copy-number polymorphic (Table S3).

Description of CNVs in ASD trios: We found eight putative de novo events (Table S2). For six of these, we were able to corroborate the event using available Illumina SNP microarray data as well as targeted array-CGH experimental data (Sanders et al. 2011, O'Roak et al., submitted). The other two de novo events were each driven by increases in SVD-ZRPKM values for the eighth exon of the FAF2 gene. Although we were able to confirm an excess of reads mapping to this exon by manual inspection of the mapped reads (from both mrsFAST and BWA alignments), we were not able to experimentally validate these duplications using a quantitative real-time PCR assay targeting the eighth exon itself (data not shown).

Next, we looked for inherited events using our exome read-depth analysis and found that 128 events in the probands were inherited from either the proband's mother or father.
For 117 of these events, the SVD-ZRPKM values of both the proband and the parent exceeded the detection threshold (±1.5); however, for 11 of these calls, the SVD-ZRPKM value of the proband or parent was just below the deletion or duplication threshold required for calling, and inheritance status was determined by manual inspection.

Inspection of the SVD-ZRPKM values for the remaining 55 events (14 loci; see Table S#) revealed that these events strongly resemble copy-number polymorphic sites or contain processed pseudogenes. Such events are likely due to increases or decreases in copy number from the independent assortment of parental alleles; furthermore, changes in processed pseudogenes at other genomic loci can change the apparent copy number of annotated genes in the exome-capture reaction.

As we had explicitly attempted to filter out such sites, we investigated these sites further. The observed signals in five of the loci, PRKRA (18 events), RNF145 (3 events), CDC27 (2 events), HNRNPA1 (1 event) and TDG (1 event), are likely driven by processed pseudogenes. Most of the remaining loci (DAZL, BTNL8/BTNL3, CLPS, OR4, SIGLEC14, and KRT34) were previously identified as copy-number polymorphic by Conrad et al. (2010). Finally, the SVD-ZRPKM values of the last event, a duplication of exons in KRT8 and KRT18, are in line with the signature of CNPs or highly duplicated exons.

Comparison of mrsFAST- and BWA-based read-depth estimation: BWA-based mappings were generated using the default settings for BWA (0.5.6) and post-processed with a pipeline developed specifically for SNP and single nucleotide variant (SNV) discovery. Reads which had more than one high-quality mapping were removed from the alignment, and a minimum mapping quality (MAPQ) of 30 was required of all reads. RPKM values were generated from the BWA alignments using the same method as for the mrsFAST-based alignments.
We calculated RPKM values for the same 194,080 intervals used elsewhere in this report, and again excluded targets with a median RPKM < 1, a total of 7,117 probes in this experiment. 
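As an illustration of the normalization applied to both mapping strategies, the sketch below computes an RPKM value for one target, applies the median RPKM < 1 filter, and standardizes a value against a reference set. It is a minimal example rather than the pipeline itself: the function names and toy numbers are hypothetical, and the use of the sample (n-1) standard deviation is an assumption.

```python
import statistics

def rpkm(read_starts, total_mapped_reads, target_size_bp):
    """RPKM = 1e9 * read starts / (total mapped reads * target size in bp)."""
    return 1e9 * read_starts / (total_mapped_reads * target_size_bp)

def passing_targets(rpkm_by_target, min_median=1.0):
    """Drop targets whose median RPKM across samples is below min_median."""
    return {
        target: values
        for target, values in rpkm_by_target.items()
        if statistics.median(values) >= min_median
    }

def zrpkm(value, reference_values):
    """Standardize one RPKM value against a target's reference distribution.

    Uses the median as the center, as in the text, and the sample standard
    deviation as the scale (the n-1 denominator is an assumption).
    """
    center = statistics.median(reference_values)
    scale = statistics.stdev(reference_values)
    return (value - center) / scale

# Toy values only: 50 read starts on a 500 bp target, 1 million mapped reads.
example_rpkm = rpkm(50, 1_000_000, 500)
kept = passing_targets({"exon_a": [0.2, 0.5, 3.0], "exon_b": [5.0, 6.0, 7.0]})
example_z = zrpkm(14.0, [8.0, 10.0, 12.0])
```

In this toy case `exon_a` is dropped because its median RPKM (0.5) falls below 1, while a value of 14 against a reference median of 10 and standard deviation of 2 gives a zRPKM of 2.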
To make up the sample set for the comparison experiment, we combined 492 ESP samples, for which we had both mrsFAST- and BWA-based mapping information, with the 8 HapMap samples. We noticed that the overall variance (as determined by the scree plot) in the BWA-based mapping was lower, and opted to remove only 6 components of variance. For the mrsFAST-based mappings, we removed the usual first 12 components. All other processing steps were done in the same fashion as elsewhere in this paper.

The signal-to-noise ratio for calls was calculated using the formula

$$\mathrm{SNR} = \frac{|\mu_{call}|}{\sigma_{chromosome}}$$

where $\mu_{call}$ is the mean of the SVD-ZRPKM values for the exons within a call, and $\sigma_{chromosome}$ is the standard deviation of all the SVD-ZRPKM values of the call's chromosome. We calculated the SNR for the seven rare validated calls from Table S1 for both mrsFAST-based and BWA-based SVD-ZRPKM values (Table S6). Six of seven rare CNVs showed improved SNR using the mrsFAST-based mappings, with a median improvement of 58% over BWA (mean 38% improvement).

Comparison to ExomeCNV algorithm: We compared our algorithm to the previously published ExomeCNV (Sathirapongsasuti et al., 2011) in order to better understand the strengths and weaknesses of each. ExomeCNV is designed to detect copy number aberration in the context of cancer, a special case of copy number variation which requires additional parameters to be defined (e.g., the rate of admixture/contamination of tumor and normal), and which must be able to handle samples for which a large fraction of the genome is not diploid. Accordingly, ExomeCNV is designed around
a digital comparative hybridization algorithm, which requires that the test and reference be as closely matched as possible (e.g., tumor-normal pairs of exomes from the same capture and sequencing run), and includes many features to better characterize cancer exomes. In contrast, ours is designed to discover genic deletions and duplications of exonic regions independently in each sample by first eliminating systematic noise using singular value decomposition.

We compared the ability of both algorithms to detect germline variation in DNA samples extensively analyzed and validated as part of other studies. To assess the sensitivity and specificity of both algorithms, we used the five HapMap samples for which exome sequence data had been generated and for which high-density microarray analyses had been performed previously (Conrad et al., 2010). We set NA19240 as the reference sample, and used ExomeCNV to call CNVs on the remaining four samples (NA12878, NA15510, NA18517, and NA19129). Similar to the authors' own use of the Novoalign alignment package, we used the available BWA alignments for this comparison, and used the same 194,080 probes to generate an interval coverage file using the GATK (version 1.3.8) software package. We left all ExomeCNV parameters at their default values: sensitivity and specificity were set at 0.9999 for exons (maximizing specificity) and 0.99 for calls ("auc" option), and the admixture rate was set at a conservative 0.5 (although we did not expect any biological admixture, we found that keeping this setting reduced the number of false positive calls).

Among the four test samples, ExomeCNV predicted 450 CNVs, of which only 63 (14%) overlapped with calls in the Conrad et al. call set by more than 10% reciprocal overlap. In contrast, our algorithm found 24 calls among these four samples, of which 21 (87.5%) overlapped the Conrad et al. set.
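The reciprocal-overlap criterion used for this comparison can be sketched as follows. This is an illustrative example only: it assumes overlap is measured in base pairs between half-open intervals on the same chromosome, and the function names and toy coordinates are hypothetical.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions between intervals a and b.

    Intervals are half-open [start, end) and assumed to lie on the
    same chromosome; the result is 0.0 when they do not overlap.
    """
    shared = min(a_end, b_end) - max(a_start, b_start)
    if shared <= 0:
        return 0.0
    return min(shared / (a_end - a_start), shared / (b_end - b_start))

def fraction_supported(calls, reference_calls, min_overlap=0.10):
    """Fraction of calls exceeding min_overlap with any reference call."""
    supported = sum(
        1
        for call in calls
        if any(reciprocal_overlap(*call, *ref) > min_overlap for ref in reference_calls)
    )
    return supported / len(calls)

# Toy coordinates only (not calls from either data set).
toy_calls = [(0, 100), (500, 600), (1000, 1100)]
toy_reference = [(50, 150), (590, 2000)]
toy_fraction = fraction_supported(toy_calls, toy_reference)
```

Requiring the overlap to be reciprocal prevents a small call from being counted as supported merely because it falls inside a much larger reference event, or vice versa.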
While both programs were able to find all five rare CNVs (Table S3), we note that ExomeCNV predicted 16 CNVs larger than 500 kb which did not have any overlap with the high-resolution Conrad et al. set of calls. This low specificity would make it very difficult to find "true positives" in the ExomeCNV output, even when filtering for large CNVs only.

Using exon-level log-ratio output from ExomeCNV, we next compared how sensitive it was to changes in the copy number of duplicated genes. Across the 62 CNP loci genotyped by our algorithm (Table S4), ExomeCNV was able to generate LogR values for 51 loci (82%). Example correlations and a comparison between ExomeCNV and our algorithm are shown for four loci in Figure S8a. When comparing the log-ratio values to the whole-genome estimate for each locus, the median r^2 across these loci was 0.57 (cf. r^2 = 0.92 for the algorithm in this work). As with the BWA alignment comparison, the genotyping dynamic range of ExomeCNV was severely limited, and the LogR values from ExomeCNV correlated only poorly with the corresponding whole-genome estimates of absolute copy number for loci with median copy number greater than seven (Figure S8c).

Finally, although the authors of ExomeCNV recognize that their algorithm depends on sample-to-sample consistency, large cohorts of tens to hundreds of exomes cannot be expected to maintain such consistency. Crucially, our algorithm allows for the comparison of samples from different cohorts, and even from different iterations of the exome capture reaction itself. To demonstrate this, we examined two ESP samples from two different experimental cohorts (but stemming from the same study, and using the same capture kit version, library preparation steps and sequencing machines). The output from ExomeCNV for chromosome 20 is shown in the top left panel of Figure S7. When we counted the fraction of exome probes which ExomeCNV predicted as copy-number variant, we found that a biologically implausible 96.6% of the exome was detected as changed from diploid copy number (Figure S7, top right panel).
In contrast, when we picked an ESP sample from the same experimental batch (and which was closely matched based on the variance we observed using the SVD) as the reference, ExomeCNV reported only 0.4% of exome probes as non-diploid (Figure S7, bottom panel). When we applied our algorithm (this work) at a very sensitive setting (±1 SVD-ZRPKM threshold) to the same samples, we found that only 0.06% and 0.15% of the exons were altered from diploid. This comparison highlights the strength of singular value decomposition for eliminating batch effects and systematic noise that may arise from exome capture experiments.

References:
Campbell, C. D., Sampas, N., Tsalenko, A., Sudmant, P. H., Kidd, J. M., Malig, M., Vu, T. H., et al. (2011). Population-Genetic Properties of Differentiated Human Copy-Number Polymorphisms. The American Journal of Human Genetics, 88(3), 317-332. doi:10.1016/j.ajhg.2011.02.004
Conrad, D. F., Pinto, D., Redon, R., Feuk, L., Gokcumen, O., Zhang, Y., Aerts, J., et al. (2010). Origins and functional impact of copy number variation in the human genome. Nature, 464(7289), 704-712. doi:10.1038/nature08516
Kidd, J. M., Cooper, G. M., Donahue, W. F., Hayden, H. S., Sampas, N., Graves, T., Hansen, N., et al. (2008). Mapping and sequencing of structural variation from eight human genomes. Nature, 453(7191), 56-64. doi:10.1038/nature06862
Sathirapongsasuti, J. F., Lee, H., Horst, B. A. J., Brunner, G., Cochran, A. J., Binder, S., Quackenbush, J., et al. (2011). Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV. Bioinformatics, 27(19), 2648-2654. doi:10.1093/bioinformatics/btr462
Sudmant, P. H., Kitzman, J. O., Antonacci, F., Alkan, C., Malig, M., Tsalenko, A., Sampas, N., et al. (2010). Diversity of human copy number variation and multicopy genes. Science, 330(6004), 641-646. doi:10.1126/science.1197005
Figure S1: Threshold call overview
[Figure labels: (A) +1.5 duplication threshold; (B) extension boundary; 2 SD range across population; (C) final call]
S1: Threshold algorithm
To discover rare CNVs, we identified smoothed SVD-ZRPKM values which crossed a threshold (A) of +1.5 or -1.5 for duplications and deletions, respectively. To account for the fact that smoothing shrinks the apparent size of the call, we extended calls such that the final call (C) better represented the extent of the actual CNV. To do this, we extended calls from the initial supra-threshold event until the smoothed SVD-ZRPKM values dipped within ±2 standard deviations of the population median (red highlight) of the SVD-ZRPKM values (marked in the figure by line [B] and by black circles).
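The threshold-and-extend procedure in this caption can be sketched in a few lines. This is a simplified illustration, not the released implementation: it assumes the values have already been smoothed, that the population center and standard deviation are supplied per probe, and that at least three consecutive probes must exceed the threshold; the name `find_calls` and the toy values are hypothetical.

```python
def find_calls(smoothed, pop_center, pop_sd, threshold=1.5, min_probes=3, n_sd=2.0):
    """Return calls as (start, end, kind) with half-open probe indices.

    A call is seeded by at least `min_probes` consecutive probes beyond
    +threshold (duplication) or -threshold (deletion), then extended
    outward while the sample stays more than `n_sd` standard deviations
    away from the population center.
    """
    calls = []
    n = len(smoothed)
    for sign, kind in ((1, "dup"), (-1, "del")):
        i = 0
        while i < n:
            if sign * smoothed[i] < threshold:
                i += 1
                continue
            j = i
            while j < n and sign * smoothed[j] >= threshold:
                j += 1  # walk to the end of the supra-threshold run
            if j - i >= min_probes:
                start, end = i, j
                # Extend left, then right, up to the 2 SD boundary.
                while start > 0 and sign * (smoothed[start - 1] - pop_center[start - 1]) > n_sd * pop_sd[start - 1]:
                    start -= 1
                while end < n and sign * (smoothed[end] - pop_center[end]) > n_sd * pop_sd[end]:
                    end += 1
                calls.append((start, end, kind))
            i = j
    return calls

# Toy smoothed values for ten probes (hypothetical).
values = [0.0, 0.1, 0.9, 1.6, 1.8, 1.7, 1.0, 0.2, 0.0, 0.0]
center = [0.0] * 10
sd = [0.3] * 10
toy_calls = find_calls(values, center, sd)
```

In the toy example the seed covers probes 3 to 5; the call is then extended by one probe on each side, where the sample value is still more than two standard deviations above the population center.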
Figure S2: Scree plot
This scree plot shows the first 40 singular values (S_n) from the HapMap (blue) and ASD trio (green) samples. The relative contributed variance of each singular value is proportional to its strength indicated on the y-axis.
[Plot axes: x, Number (n); y, Singular Value (S_n). Legend text not recoverable.]
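The relationship between the scree plot and the normalization can be illustrated with a short sketch: the singular values plotted above come from the decomposition of the exon-by-sample matrix, and the k strongest are set to zero before the matrix is rebuilt. This is a minimal example under stated assumptions rather than the code used in this work; it uses `numpy.linalg.svd`, and the function name and toy matrix are hypothetical.

```python
import numpy as np

def remove_top_components(X, k):
    """Zero the k largest singular values of X and rebuild the matrix.

    Returns the normalized matrix and the original singular values
    (sorted in decreasing order), which are what a scree plot shows.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_prime = s.copy()
    s_prime[:k] = 0.0
    return (U * s_prime) @ Vt, s

# Toy exon-by-sample matrix made of a single systematic (rank-1) component.
exon_effect = np.arange(1.0, 6.0)      # 5 exons
sample_effect = np.arange(1.0, 5.0)    # 4 samples
X = np.outer(exon_effect, sample_effect)

cleaned, singular_values = remove_top_components(X, k=1)
unchanged, _ = remove_top_components(X, k=0)
```

Since the toy matrix contains only one component, removing it leaves a matrix of (numerically) zeros; with real data, k would be chosen near the inflection point of the scree plot.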
Figure S3: Filtering of calls in 122 ASD trios
Starting with 369 detected calls in 122 ASD probands, we applied a set of filters to restrict calls to unique/diploid regions of the genome in order to estimate the precision of our method. Calls which had greater than 50% reciprocal overlap with segmental duplications (as determined by the fraction of underlying exome probes within the call also in segmental duplications) were removed (163/369, or 44%). Next, we calculated the median copy number of calls based on whole-genome read-depth copy-number estimates from ~660 genomes (Sudmant et al., 2010), and additionally filtered 13 calls (3.5%) with more than three copies population-wide (as events stemming from these segmentally-duplicated or higher-copy regions of the genome are likely due to the independent assortment of parental haplotypes, and are not "true" rare CNVs). Additionally, we manually curated the calls to remove calls within regions undergoing somatic rearrangement (two calls; one at the IGH locus and one in the HLA locus), and merged adjacent or overlapping calls. Of the 191 remaining calls, we classified 128 as inherited (see text for details), eight as de novo, and 55 as copy-number polymorphic. We estimated the precision of our method for the detection of rare CNVs from our accuracy with de novo CNVs (6/8) and the number of inherited events (128 in total), resulting in a 98.5% precision.
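The 98.5% figure can be reproduced with a short calculation. The sketch below assumes, as our reading of the caption, that all 128 inherited events are counted as true positives (because they are supported by signal in a parent) and that the two unvalidated de novo events are counted as false positives; the variable names are illustrative.

```python
# Counts taken from the caption above.
de_novo_called = 8
de_novo_corroborated = 6
inherited_events = 128

# Assumption: inherited events count as true positives.
true_positives = de_novo_corroborated + inherited_events
rare_calls = de_novo_called + inherited_events
precision = true_positives / rare_calls

print(f"{true_positives}/{rare_calls} = {precision:.1%}")
```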
Figure S4: Filtering of calls from Conrad et al. (2010) array-CGH experiment
We estimated the sensitivity of our method using array comparative genomic hybridization calls from Conrad et al. (2010) as a gold standard. Starting with calls from the 42-million probe CNV discovery experiment in (Conrad et al., 2010), there were 486 calls with at least three exome probes in the five HapMap samples for which we had exome sequences. Calls which had greater than 50% reciprocal overlap with segmental duplications (as determined by the fraction of exome probes within the call also in segmental duplications) were removed; additionally, we removed 20 calls in somatically rearranged regions. We manually inspected the remaining 50 calls (Table S2) to assess the sensitivity of the method. Five events were rare, and all five were detected using the ±1.5 SVD-ZRPKM threshold. There were 36 CNPs, of which only three crossed the threshold for rare CNVs. Six of the remaining events were located in high-diversity regions of the genome. Finally, we noted that three of the events were very likely false positives in the Conrad dataset, as they were not corroborated by Illumina 1M SNP microarray data, nor were they found by a fosmid mapping approach (Kidd et al., 2008).
Figure S5: BWA and mrsFAST comparison, genome view
Visual comparison of BWA- and mrsFAST-based mappings on a stretch of chromosome 16. We found that across the seven validated rare CNVs from Table S1, the SVD-ZRPKM values derived from BWA mappings had a 57% lower signal-to-noise ratio, as noted by the decreased signal of NA18517 at the METTL9/OTOA locus for BWA-based mappings. (Y-axes have different scales to account for the lower standard deviation seen in the BWA-based SVD-ZRPKM values.)
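The signal-to-noise ratio compared in this figure can be computed as in the sketch below, a minimal illustration in which the function name and toy values are hypothetical and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
import statistics

def call_snr(chromosome_values, start, end):
    """|mean of the call's values| / SD of all values on the chromosome.

    `chromosome_values` holds the SVD-ZRPKM values of every exon on the
    chromosome for one sample; the call spans indices [start, end).
    """
    call_mean = statistics.mean(chromosome_values[start:end])
    noise = statistics.pstdev(chromosome_values)
    return abs(call_mean) / noise

# Toy chromosome of four exons (hypothetical values).
toy_chromosome = [2.0, 2.0, -2.0, -2.0]
duplication_snr = call_snr(toy_chromosome, 0, 2)
deletion_snr = call_snr(toy_chromosome, 2, 4)
```

Taking the absolute value of the mean makes the measure comparable between deletions and duplications, which is why both toy calls give the same ratio.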
Figure S6a: BWA and mrsFAST comparison, genotyping accuracy
[Plot axes: x, r^2 between exome signal and whole-genome estimate of copy number; y, # of loci. Series: BWA-based mappings; mrsFAST-based mappings (marked loci = unmappable for BWA)]
Comparison of correlation coefficients of SVD-ZRPKM to whole-genome copy number estimate across 62 CNP loci between BWA- and mrsFAST-based mapping strategies. The median r^2 for the BWA-based experiment is 0.62 (green bars), while for mrsFAST the median r^2 is 0.92 (blue bars). Moreover, for 15 loci, the BWA-based mappings did not have sufficient read coverage in the loci to be genotyped, making them intractable to BWA-based read-depth genotyping.
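The per-locus r^2 values summarized in this figure are squared Pearson correlations between the exome-based genotype value and the whole-genome copy number estimate. A minimal sketch follows; the function name, locus names and toy values are hypothetical and are not data from this study.

```python
import statistics

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined when one variable is constant")
    return sxy * sxy / (sxx * syy)

# Toy loci: (SVD-ZRPKM genotype values, whole-genome copy number estimates).
toy_loci = {
    "locus_1": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),
    "locus_2": ([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]),
}
per_locus = {name: r_squared(x, y) for name, (x, y) in toy_loci.items()}
median_r2 = statistics.median(per_locus.values())
```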
Figure S6b: BWA and mrsFAST comparison, by median copy number
BWA Genotyping vs. mrsFAST: median R^2 coefficient by median copy number of locus (per whole-genome)

Median copy number   mrsFAST        BWA2           BWA
1 to 4               0.9443786471   0.5402885327   0.8739312056
4 to 6               0.9577126587   0.5310800182   0.8917799914
6 to 7               0.9464910938   0.6836105026   0.8280063547
7 to 12              0.7872028165   0.1420914257   0.1072175278
12 +                 0.3287128512   0.0290366009   0.088258786

[Plot axes: x, Median Copy Number of Locus (per whole-genome); y, Median R^2 Coefficient (0 to 1). Series: mrsFAST, BWA2]
Comparison of BWA-based and mrsFAST-based alignments for genotyping of 62 loci, binned by median copy number of each locus. We calculated the median copy number of the 62 loci based on whole-genome read-depth copy-number estimates from ~660 genomes. We note that mrsFAST-based mapping significantly improves the correlation between the SVD-ZRPKM genotyping scores and whole-genome absolute copy number, especially for loci with a median copy number between 7 and 12.
Figure S6c: BWA and mrsFAST comparison ? LRRC37A3 locus
Figure S6c:
Example CNP locus (LRRC37A3) representative of difficulty for BWA-based genotyping 
of loci with median population copy number greater than seven.
(top left): Histogram showing SVD-ZRPKM genotype values of 8 HapMap samples 
(indicated by horizontal lines) and 492 ESP samples. Annotated numbers on the 
histogram indicate the absolute copy number, as estimated from whole genome 
sequencing of HapMap samples. 
(top right): Correlation between SVD-ZRPKM values and whole-genome derived 
absolute copy number for 7 HapMap samples. The poor resolution of BWA-based 
mappings for this locus contribute to a poor correlation and low accuracy.
(bottom left, right): the same locus for mrsFAST-based mappings. Both the histogram 
and the scatter plot show markedly increased resolution for distinguishing copy number 
states and improved SVD-ZRPKM to absolute copy-number correlation.
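The r² values reported for each locus summarize how well the exome-derived genotype value tracks the whole-genome copy number estimate. A minimal sketch of that calculation is given below, assuming r² is the squared Pearson correlation coefficient (the source does not state the estimator). The function name and the sample values are hypothetical and are not taken from the figure.

```python
def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r2 is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical values: whole-genome copy number estimates and the
    # corresponding SVD-ZRPKM genotype values for seven samples.
    copy_number = [8, 9, 10, 10, 11, 12, 13]
    svd_zrpkm = [-1.9, -1.1, -0.2, 0.1, 0.8, 1.7, 2.6]
    print(round(pearson_r2(copy_number, svd_zrpkm), 3))
```

A locus where copy-number states are well separated yields a value near 1, while a locus with poorly resolved states yields a value near 0.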
[Figure S6c panels: BWA mappings; mrsFAST mappings. Axis labels: SVD-ZRPKM Genotype Value; Whole-Genome Copy Number Estimate.]
Figure(s) S7
Key to panel tracks:
Panel title: sample, chr: start - stop (hg19)
A) CNV call from Conrad et al. (2010)
B) SNP-array data (black lines - LogR; blue dots - B-allele frequency)
C) Whole-genome read depth from Sudmant et al. (2010) (see key at right; copy number 0 to 10)
D) Exome-based CNV call
E) SVD-ZRPKM values (blue line: sample with call; black lines: 533 ESP samples)
F) RefSeq genes
Panel labels: Rare Duplication (four panels), CNP (two panels), False Positive (three panels)
A) Coordinates of genomic interval shown
B) Samples highlighted (blue - parents, red - proband)
C) Call in proband
D) Segmental duplications
E) Threshold used to call deletion/duplications
F) Genes
Figure(s) S8


Figure S9: ExomeCNV results for two references from different cohorts
[Figure S9 panels. Panel labels: Test/Reference from different batches (ExomeCNV); Test/Reference from same batch (ExomeCNV). Sample pairs: Test ESP_3470 with Reference ESP_3247; Test ESP_3240 with Reference ESP_3247. Main plots: Exon LogR vs. genomic coordinate (Chromosome 20). Bar charts: copy number distribution across exome (CN 1 to CN 5).]
Top: Comparison of two ESP exomes from differing cohorts. Plot shows ExomeCNV 
LogR output for chromosome 20 and colored bars indicate location of altered copy 
number. A biologically implausible fraction of the exome (96.6%) is marked as non-
diploid (bar chart, top right).
Bottom: Using the SVD algorithm, we matched the same reference (ESP_3247) to a 
sample from the same cohort/experimental batch. Accordingly, ExomeCNV was less 
influenced by systematic noise stemming from the exome capture, and marked a much 
more realistic 99.6% of the exome as diploid.
Figure S10: ExomeCNV and CoNIFER genotyping comparison summary
[Figure S10 panels. A: four loci (PDPR, CCL4, LRRC37A, NBPF3), ExomeCNV vs. This work / CoNIFER; r² annotations in extraction order: 0.99, 0.96, 0.56, 0.98, 0.02, 0.83, 0.001, 0.40. B: number of loci vs. r² between exome signal and whole-genome estimate of copy number; series: This work / CoNIFER (with false negatives for ExomeCNV marked), ExomeCNV (Sathirapongsasuti et al.). C: median r² value (0 to 1.00) vs. median copy number of locus per whole-genome read-depth analysis (1 to 4, 4 to 6, 6 to 7, 7 to 12, 12 +); series: This work / CoNIFER, ExomeCNV.]
a) Comparison of genotyping correlation between ExomeCNV LogR value (y-axis; top 
row) and SVD-ZRPKM value (y-axis, bottom row) vs. absolute copy number established 
by whole-genome read-depth (x-axis, both rows; Sudmant et al., 2010) for four selected 
loci. b) Distribution of r² values across 62 genotyped CNP loci: green bars represent 
ExomeCNV results (median r² = 0.57); dark blue bars are the same loci assayed using 
this work's algorithm (median r² = 0.92), while light blue bars represent loci which could 
not be assayed using ExomeCNV (11 loci). c) Median r² correlations for ExomeCNV and 
our algorithm, binned by the median copy number of each CNP locus.
Table S1: Precision of HapMap Calls
Sample  hg19 Coordinates (chr - start - stop)  Call Type  Reciprocal Overlap (%)  Annotation
NA15510 9 108,380,239 109,692,970 duplication 0% Rare
NA19240 12 133,659,688 133,727,740 duplication 15% Rare
NA15510 7 99,507,187 99,627,998 duplication 51% Rare
NA18517 16 21,396,577 21,756,357 duplication 55% Rare
NA15510 3 19,492,646 21,465,556 duplication 56% Rare
NA19129 6 29,910,533 30,043,566 deletion 83% Rare
NA18517 4 68,788,472 69,057,034 duplication 85% Rare
NA15510 1 155,227,075 155,264,543 duplication 95% Rare
NA19240 17 20,355,663 20,356,920 duplication 3% CNP
NA12878 19 46,543,414 46,663,725 deletion 5% CNP
NA19240 15 20,649,180 20,667,952 duplication 5% CNP
NA19240 17 34,416,017 34,432,705 duplication 6% CNP
NA19240 17 34,624,770 34,641,537 duplication 6% CNP
NA19240 17 34,523,196 34,539,971 duplication 6% CNP
NA19240 17 18,392,254 18,395,859 duplication 7% CNP
NA19129 22 21,063,583 21,067,678 duplication 7% CNP
NA19240 15 43,891,024 43,903,182 deletion 9% CNP
NA18517 5 69,716,795 69,741,679 duplication 10% CNP
NA19240 17 25,970,550 25,976,008 duplication 10% CNP
NA19129 8 6,835,291 7,092,762 duplication 17% CNP
NA12878 3 195,452,668 195,474,251 deletion 22% CNP
NA18517 5 70,297,524 70,357,135 duplication 22% CNP
NA19240 17 39,535,207 39,551,622 duplication 24% CNP
NA18517 5 68,862,415 68,886,194 duplication 25% CNP
NA12878 22 22,328,728 23,249,131 deletion 30% CNP
NA18517 6 35,748,887 35,787,224 duplication 31% CNP
NA19240 6 29,643,162 30,041,255 duplication 32% CNP
NA12878 17 34,624,258 34,748,588 deletion 36% CNP
NA18517 2 89,049,497 89,292,723 duplication 37% CNP
NA12878 17 34,499,213 34,583,568 deletion 41% CNP
NA15510 19 33,481,419 33,503,630 duplication 43% CNP
NA19240 14 106,385,019 107,114,522 duplication 47% CNP
NA19129 19 41,313,388 41,388,115 duplication 47% CNP
NA15510 7 100,319,584 100,336,236 deletion 50% CNP
NA19240 10 124,362,280 124,377,856 duplication 50% CNP
NA19240 22 22,676,730 22,937,826 duplication 50% CNP
NA19240 7 100,319,584 100,336,236 deletion 50% CNP
NA15510 15 22,077,863 22,739,445 duplication 50% CNP
NA19240 16 72,097,154 72,109,118 duplication 52% CNP
NA19240 1 110,173,564 110,260,048 duplication 56% CNP
NA19240 15 22,082,360 22,490,341 duplication 56% CNP
NA15510 22 23,010,695 23,155,078 duplication 57% CNP
NA19129 2 89,246,519 90,260,548 deletion 58% CNP
NA15510 5 180,335,585 180,431,762 deletion 59% CNP
NA12878 22 24,325,587 24,431,968 duplication 62% CNP
NA15510 14 106,235,274 106,347,756 deletion 62% CNP
NA15510 19 41,313,713 41,388,115 deletion 64% CNP
NA18517 16 72,092,152 72,120,701 duplication 68% CNP
NA19240 19 33,481,419 33,517,538 duplication 70% CNP
NA18517 19 43,514,149 43,857,918 duplication 74% CNP
NA19129 22 21,827,328 21,905,265 duplication 76% CNP
NA19240 3 196,528,797 196,547,438 duplication 76% CNP
NA19240 14 20,215,586 20,404,761 duplication 77% CNP
NA18517 14 106,235,274 106,332,116 duplication 81% CNP
NA19240 22 23,029,156 23,235,998 duplication 83% CNP
NA19129 1 155,178,595 155,218,094 duplication 92% CNP
NA19129 10 124,340,381 124,358,613 duplication 95% CNP
NA12878 5 180,377,236 180,430,876 duplication 98% CNP
NA18517 1 161,479,609 161,643,863 deletion 100% CNP
NA12878 1 25,627,436 25,633,220 deletion - CNP
NA12878 4 69,403,342 69,434,202 deletion - CNP
NA12878 7 141,754,553 141,759,766 deletion - CNP
NA12878 11 60,971,578 61,018,753 deletion - CNP
NA12878 14 106,233,850 106,332,116 deletion - CNP
NA12878 17 18,390,960 18,396,173 deletion - CNP
NA12878 17 20,354,793 20,361,697 deletion - CNP
NA12878 17 25,969,263 25,976,008 deletion - CNP
NA12878 17 34,398,310 34,495,524 deletion - CNP
NA15510 1 25,617,131 25,643,570 deletion - CNP
NA15510 1 25,712,201 25,729,237 deletion - CNP
NA15510 19 39,329,036 39,334,554 deletion - CNP
NA18517 22 24,373,137 24,431,968 deletion - CNP
NA12878 9 39,102,493 39,149,974 duplication - CNP
NA15510 3 196,509,517 196,539,722 duplication - CNP
NA18517 19 43,236,935 43,439,921 duplication - CNP
NA19129 22 24,572,078 24,581,897 duplication - CNP
NA19240 6 31,969,272 31,970,032 duplication 3% HLA locus
NA19240 6 31,999,819 32,003,054 duplication 13% HLA locus
NA12878 6 31,948,780 31,961,254 deletion 19% HLA locus
NA12878 6 31,978,929 31,996,054 deletion 26% HLA locus
NA19240 6 31,982,819 31,996,054 duplication 27% HLA locus
NA19240 6 31,949,884 31,961,254 duplication 31% HLA locus
NA19129 6 31,948,780 31,961,254 deletion - HLA locus
NA19129 6 31,969,272 31,970,317 deletion - HLA locus
NA19129 6 31,982,622 31,995,709 deletion - HLA locus
NA19129 6 31,999,327 32,003,054 deletion - HLA locus
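The "Reciprocal Overlap (%)" column in Table S1 can be illustrated with a short sketch. It assumes the common convention that reciprocal overlap is the smaller of the two fractions obtained by dividing the shared length by each interval's own length; the source does not spell out the formula, so this definition, the function name, and the treatment of coordinates as simple start/stop positions are assumptions.

```python
def reciprocal_overlap(a_start, a_stop, b_start, b_stop):
    """Reciprocal overlap of two intervals on the same chromosome, in percent.

    Assumed definition: the shared length divided by the length of each
    interval, taking the smaller of the two fractions.
    """
    len_a = a_stop - a_start
    len_b = b_stop - b_start
    if len_a <= 0 or len_b <= 0:
        raise ValueError("intervals must have positive length")
    shared = min(a_stop, b_stop) - max(a_start, b_start)
    if shared <= 0:
        return 0.0  # intervals do not overlap
    return 100.0 * min(shared / len_a, shared / len_b)


if __name__ == "__main__":
    # Hypothetical coordinates: two 100 bp intervals offset by 50 bp.
    print(reciprocal_overlap(100, 200, 150, 250))  # 50.0
```

Under this definition a small call nested inside a much larger reference interval scores low, because the overlap covers only a small fraction of the larger interval.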
Table S2: Sensitivity vs. Conrad et al. aCGH Calls
Conrad CNVR  Sample  hg19 Coordinates (chr - start - stop)  Call Type  # exome probes  Genes  Annotation  Discovered?
CNVR1313.1 NA15510 3 19,535,653 20,638,501 duplication 47 KCNH8, EFHB, RAB5A, KAT2B, SGOL1 Rare Yes
CNVR1952.1 NA18517 4 68,788,730 69,016,101 duplication 17 TMPRSS11A Rare Yes
CNVR3507.1 NA15510 7 99,564,133 99,625,411 duplication 6 AZGP1, ZKSCAN1 Rare Yes
CNVR5791.2 NA19240 12 133,717,202 133,779,425 duplication 8 ZNF140, ZNF10, ZNF268 Rare Yes
CNVR6668.5 NA18517 16 21,523,044 21,946,347 duplication 38 METTL9, IGSF6, OTOA Rare Yes
CNVR4393.2 NA12878 9 91,963,403 92,343,382 duplication 27 Conrad False Positive - -
CNVR6152.2 NA15510 14 50,101,898 50,942,527 duplication 161 Conrad False Positive - -
CNVR6631.1 NA19129 16 8,939,807 8,987,025 duplication 4 Conrad False Positive - -
CNVR2861.1 NA18517 6 35,754,791 35,766,680 duplication 4 CLPS CNP Yes
CNVR3509.1 NA19240 7 100,327,862 100,337,886 deletion 6 EPO CNP Yes
CNVR8114.1 NA12878 22 24,344,211 24,404,564 duplication 7 GSTT1 CNP Yes
CNVR339.1 NA15510 1 144,950,054 145,080,140 duplication 7 PDE4DIP CNP No
CNVR339.1 NA12878 1 144,948,283 145,080,140 duplication 7 PDE4DIP CNP No
CNVR339.1 NA18517 1 144,956,694 145,083,994 duplication 4 PDE4DIP CNP No
CNVR759.1 NA19129 2 38,956,285 38,972,493 deletion 5 GALM CNP No
CNVR759.1 NA12878 2 38,955,877 38,972,493 deletion 5 GALM CNP No
CNVR759.1 NA19240 2 38,955,946 38,972,830 deletion 5 GALM CNP No
CNVR759.1 NA18517 2 38,956,285 38,972,937 deletion 5 GALM CNP No
CNVR2719.1 NA18517 5 180,374,610 180,431,110 duplication 11 BTNL8/3 CNP No
CNVR2719.1 NA12878 5 180,376,223 180,430,715 duplication 8 BTNL8/3 CNP No
CNVR2728.1 NA19240 6 257,100 382,983 duplication 8 DUSP22 CNP No
CNVR2728.1 NA12878 6 255,650 382,508 duplication 8 DUSP22 CNP No
CNVR2728.1 NA19129 6 254,458 382,453 duplication 8 DUSP22 CNP No
CNVR2728.1 NA18517 6 257,309 384,408 duplication 8 DUSP22 CNP No
CNVR2861.1 NA19129 6 35,754,996 35,766,570 duplication 4 CLPS CNP No
CNVR2861.1 NA19240 6 35,754,736 35,766,415 duplication 5 CLPS CNP No
CNVR2861.1 NA12878 6 35,754,591 35,766,415 duplication 5 CLPS CNP No
CNVR4912.1 NA18517 10 124,360,402 124,376,587 deletion 8 DMBT1 CNP No
CNVR4912.1 NA12878 10 124,360,512 124,376,427 deletion 7 DMBT1 CNP No
CNVR4912.2 NA18517 10 124,342,529 124,351,752 duplication 6 DMBT1 CNP No
CNVR4912.3 NA12878 10 124,342,337 124,360,459 duplication 14 DMBT1 CNP No
CNVR4912.3 NA19240 10 124,342,556 124,360,682 duplication 15 DMBT1 CNP No
CNVR4912.3 NA19129 10 124,341,208 124,358,673 duplication 15 DMBT1 CNP No
CNVR5179.1 NA19129 11 55,366,154 55,452,992 deletion 6 OR4 High Diversity No
CNVR5179.1 NA12878 11 55,365,742 55,453,061 deletion 6 OR4 High Diversity No
CNVR6072.4 NA19240 14 20,177,270 20,422,582 duplication 6 OR4 High Diversity No
CNVR6072.5 NA18517 14 20,289,680 20,424,616 deletion 4 OR4 High Diversity No
CNVR7095.1 NA18517 17 39,382,871 39,395,430 deletion 3 KRTAP High Diversity No
CNVR7097.1 NA19240 17 39,507,055 39,525,624 deletion 6 KRT34 CNP No
CNVR7098.1 NA19240 17 39,532,301 39,539,205 duplication 7 KRT34 CNP No
CNVR7673.1 NA12878 19 46,622,831 46,628,261 deletion 3 IGFL3 CNP No
CNVR7702.1 NA19129 19 52,131,804 52,148,913 deletion 8 SIGLEC14 CNP No
CNVR7708.1 NA19240 19 53,322,989 53,361,358 duplication 3 ZNF468 High Diversity No
CNVR7763.1 NA19129 20 1,552,963 1,595,689 deletion 4 SIRBP1 CNP No
CNVR7763.1 NA18517 20 1,552,963 1,595,689 deletion 4 SIRBP1 CNP No
CNVR7763.1 NA19240 20 1,552,963 1,595,689 deletion 4 SIRBP1 CNP No
CNVR8114.1 NA19129 22 24,344,595 24,404,495 duplication 7 GSTT1 CNP No
CNVR8114.7 NA19240 22 24,364,568 24,404,701 duplication 7 GSTT1 CNP No
CNVR8114.7 NA15510 22 24,371,095 24,404,715 duplication 7 GSTT1 CNP No
CNVR4841.3 NA19129 10 89,189,706 89,275,944 duplication 4 CNP No
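As an illustration of how the "Discovered?" column of Table S2 translates into a sensitivity estimate per annotation class, a minimal sketch follows. The function name is hypothetical, the records are a handful of illustrative (Annotation, Discovered?) pairs in the table's format rather than the full table, and excluding calls marked "-" (those flagged as Conrad false positives) from the denominator is an assumption, not a documented step.

```python
from collections import defaultdict


def sensitivity_by_annotation(records):
    """Fraction of reference calls discovered, per annotation class.

    `records` is an iterable of (annotation, discovered) pairs, where
    `discovered` is "Yes", "No", or "-" (not evaluated).
    """
    found = defaultdict(int)
    total = defaultdict(int)
    for annotation, discovered in records:
        if discovered not in ("Yes", "No"):
            continue  # assumption: "-" rows do not count toward sensitivity
        total[annotation] += 1
        if discovered == "Yes":
            found[annotation] += 1
    return {annotation: found[annotation] / total[annotation]
            for annotation in total}


if __name__ == "__main__":
    # Illustrative records only; not the complete contents of Table S2.
    example = [
        ("Rare", "Yes"),
        ("Rare", "Yes"),
        ("CNP", "Yes"),
        ("CNP", "No"),
        ("CNP", "No"),
        ("Conrad False Positive", "-"),
    ]
    for annotation, value in sensitivity_by_annotation(example).items():
        print(annotation, round(value, 2))
```

Reporting sensitivity separately for rare calls and CNPs keeps the two classes from being averaged together, which matters when one class is far more numerous than the other.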
Table S3(a): Rare CNVs in 122 ASD probands
Sample  hg19 Coordinates (chr - start - stop)  Inheritance  SNP Microarray
11696.p1 chr3 37,170,553 37,494,050 de novo Validated
11711.p1 chr5 175,913,355 175,956,388 de novo Did Not Validate
11218.p1 chr5 175,913,355 175,956,645 de novo Did Not Validate
12581.p1 chr9 140,671,069 141,015,333 de novo Validated
13726.p1 chr11 56,510,303 60,235,941 de novo Validated
11928.p1 chr15 30,919,023 32,404,100 de novo Validated
13335.p1 chr16 29,475,783 30,204,395 de novo Validated
13815.p1 chr16 75,681,780 76,532,591 de novo Validated
13844.p1 chr1 47,501,688 47,546,339 Matching event in parent No Data
12810.p1 chr1 86,965,336 87,043,755 Matching event in parent Validated
11715.p1 chr1 185,089,514 185,137,530 Matching event in parent Validated
12667.p1 chr1 185,092,988 185,144,245 Matching event in parent Validated
11895.p1 chr1 206,241,532 206,557,431 Matching event in parent Validated
12130.p1 chr1 206,241,532 206,557,431 Matching event in parent Validated
11707.p1 chr1 207,307,748 207,640,257 Matching event in parent Validated
11064.p1 chr2 33,622,199 36,691,798 Matching event in parent Validated
11472.p1 chr2 44,508,525 44,549,039 Matching event in parent Validated
11895.p1 chr2 86,276,282 86,677,085 Matching event in parent Validated
11023.p1 chr2 198,285,151 198,593,302 Matching event in parent Validated
11023.p1 chr2 209,027,927 209,104,727 Matching event in parent Validated
11526.p1 chr3 47,539,775 47,619,418 Matching event in parent Did Not Validate
13741.p1 chr3 100,277,248 100,455,560 Matching event in parent No Data
11722.p1 chr3 100,287,665 100,451,516 Matching event in parent Validated
11303.p1 chr3 100,295,768 100,447,702 Matching event in parent Validated
13335.p1 chr3 141,712,379 142,090,170 Matching event in parent Validated
11262.p1 chr3 151,461,880 152,018,156 Matching event in parent Validated
12565.p1 chr3 151,461,880 152,018,156 Matching event in parent Validated
11193.p1 chr3 196,454,793 196,626,933 Matching event in parent Did Not Validate
11224.p1 chr4 5,699,319 5,795,444 Matching event in parent Validated
11190.p1 chr4 107,845,110 108,935,744 Matching event in parent Validated
11056.p1 chr5 32,093,012 32,235,235 Matching event in parent Validated
11788.p1 chr5 32,093,012 32,235,235 Matching event in parent Validated
11480.p1 chr5 32,097,384 32,242,233 Matching event in parent Validated
13733.p1 chr5 80,659,606 80,742,741 Manual inspection of parental SVD-ZRPKM values No Data
11469.p1 chr5 112,899,555 113,740,553 Matching event in parent Validated
12130.p1 chr5 112,902,788 113,740,553 Matching event in parent Validated
13102.p1 chr5 138,456,723 138,699,611 Manual inspection of parental SVD-ZRPKM values No Data
11303.p1 chr5 138,643,104 138,700,432 Matching event in parent Did Not Validate
11198.p1 chr5 156,378,522 156,482,544 Manual inspection of parental SVD-ZRPKM values Did Not Validate
11480.p1 chr6 25,923,922 26,368,495 Matching event in parent Validated
13207.p1 chr6 49,421,297 49,459,988 Manual inspection of parental SVD-ZRPKM values Did Not Validate
13822.p1 chr6 55,142,177 55,300,561 Manual inspection of parental SVD-ZRPKM values No Data
11425.p1 chr6 56,882,004 56,993,638 Manual inspection of parental SVD-ZRPKM values Validated
11459.p1 chr6 88,311,501 88,374,577 Matching event in parent Validated
12212.p1 chr6 107,420,452 107,824,999 Matching event in parent Validated
11518.p1 chr6 168,317,768 168,442,831 Matching event in parent Validated
13726.p1 chr6 168,319,414 168,443,396 Matching event in parent No Data
12933.p1 chr6 168,319,414 168,711,126 Matching event in parent Validated
11472.p1 chr6 168,319,414 168,711,964 Matching event in parent Validated
11722.p1 chr6 168,323,535 168,439,409 Matching event in parent Validated
12667.p1 chr6 168,323,535 168,442,831 Matching event in parent Validated
11863.p1 chr6 168,323,535 168,458,019 Matching event in parent Validated
13557.p1 chr6 168,325,684 168,711,126 Matching event in parent Validated
11398.p1 chr7 11,101,590 12,620,846 Matching event in parent Validated
11696.p1 chr7 16,834,559 17,838,777 Matching event in parent Validated
12667.p1 chr7 33,066,428 33,297,022 Manual inspection of parental SVD-ZRPKM values Validated
11722.p1 chr7 48,285,108 48,431,736 Manual inspection of parental SVD-ZRPKM values Validated
13822.p1 chr7 142,659,290 142,960,678 Matching event in parent No Data
11526.p1 chr7 142,659,290 142,961,260 Matching event in parent Validated
11218.p1 chr7 142,723,286 142,960,678 Matching event in parent Validated
11843.p1 chr7 152,740,571 154,664,403 Matching event in parent Did Not Validate
11141.p1 chr8 13,071,835 15,480,758 Matching event in parent Validated
12130.p1 chr8 15,601,046 16,032,809 Manual inspection of parental SVD-ZRPKM values Validated
11556.p1 chr8 15,601,046 16,035,497 Matching event in parent Validated
11753.p1 chr8 29,197,614 29,953,044 Matching event in parent Did Not Validate
12933.p1 chr8 29,197,614 29,959,489 Matching event in parent Did Not Validate
13222.p1 chr8 29,197,614 29,959,489 Matching event in parent Did Not Validate
11660.p1 chr8 29,202,886 29,959,489 Matching event in parent Did Not Validate
11722.p1 chr8 29,940,362 30,335,353 Matching event in parent Validated
11303.p1 chr8 98,725,889 98,973,758 Matching event in parent Did Not Validate
11638.p1 chr8 98,731,276 98,863,702 Matching event in parent Did Not Validate
11479.p1 chr8 98,731,276 98,943,750 Matching event in parent Did Not Validate
11023.p1 chr8 98,735,106 98,900,470 Matching event in parent Did Not Validate
11414.p1 chr8 98,735,106 98,954,127 Matching event in parent Did Not Validate
11827.p1 chr8 98,735,106 98,954,127 Matching event in parent Did Not Validate
13741.p1 chr8 102,678,816 103,662,619 Matching event in parent No Data
12378.p1 chr9 134,360,072 134,458,089 Matching event in parent Validated
11218.p1 chr9 139,327,606 139,354,326 Matching event in parent Did Not Validate
13844.p1 chr10 1,598,506 3,824,406 Matching event in parent No Data
13031.p1 chr10 75,005,679 75,034,352 Matching event in parent Did Not Validate
12378.p1 chr10 82,040,435 82,122,829 Matching event in parent Validated
12130.p1 chr10 132,965,059 133,761,295 Matching event in parent Did Not Validate
12118.p1 chr10 133,106,473 134,523,960 Matching event in parent Validated
11498.p1 chr10 135,233,529 135,368,588 Matching event in parent Validated
11148.p1 chr10 135,233,529 135,372,455 Matching event in parent Validated
11707.p1 chr10 135,340,899 135,372,455 Matching event in parent Validated
11498.p1 chr10 135,370,262 135,372,455 Matching event in parent Validated
11964.p1 chr11 14,856,527 14,989,400 Matching event in parent Validated
12430.p1 chr11 31,128,044 31,451,948 Matching event in parent Validated
11141.p1 chr11 84,822,704 85,366,752 Matching event in parent Did Not Validate
13532.p1 chr11 95,555,662 95,724,887 Matching event in parent Did Not Validate
11452.p1 chr11 95,560,949 95,724,887 Matching event in parent Did Not Validate
13008.p1 chr12 306,542 922,980 Matching event in parent Validated
11526.p1 chr12 15,035,072 15,090,986 Manual inspection of parental SVD-ZRPKM values Validated
13793.p1 chr12 19,440,406 19,626,289 Matching event in parent No Data
13741.p1 chr12 50,536,856 50,642,534 Matching event in parent No Data
12581.p1 chr12 112,167,609 112,323,840 Matching event in parent Validated
11193.p1 chr13 21,720,943 21,950,794 Matching event in parent Did Not Validate
11083.p1 chr13 50,118,872 50,237,331 Matching event in parent Validated
11257.p1 chr13 115,004,824 115,048,418 Matching event in parent Validated
13530.p1 chr14 67,940,136 68,276,006 Matching event in parent Validated
13533.p1 chr14 74,512,762 74,551,696 Matching event in parent Validated
11479.p1 chr15 43,692,241 43,708,007 Matching event in parent Did Not Validate
13812.p1 chr15 57,555,309 57,815,839 Matching event in parent No Data
13415.p1 chr15 57,555,309 57,816,949 Matching event in parent Validated
11556.p1 chr15 89,760,350 89,817,535 Matching event in parent Validated
12430.p1 chr15 100,269,327 100,537,794 Matching event in parent Did Not Validate
13812.p1 chr16 3,656,474 3,725,398 Matching event in parent No Data
11834.p1 chr16 21,763,689 22,538,986 Matching event in parent Validated
11526.p1 chr16 75,481,455 75,600,805 Matching event in parent Validated
11184.p1 chr16 81,171,041 81,194,510 Matching event in parent Validated
11964.p1 chr16 84,402,221 84,474,564 Matching event in parent Validated
13335.p1 chr17 644,540 708,487 Matching event in parent Validated
11707.p1 chr17 3,981,176 4,434,078 Matching event in parent Validated
13844.p1 chr17 4,192,524 4,391,189 Matching event in parent No Data
13733.p1 chr17 10,322,203 10,358,127 Matching event in parent No Data
13409.p1 chr17 72,322,488 72,733,256 Matching event in parent Validated
12667.p1 chr18 39,613,789 40,503,728 Matching event in parent Validated
13668.p1 chr18 47,813,109 48,252,545 Matching event in parent No Data
13494.p1 chr18 76,873,240 77,132,882 Matching event in parent Validated
13812.p1 chr19 4,212,596 4,233,304 Matching event in parent No Data
11141.p1 chr19 4,212,596 4,249,325 Manual inspection of parental SVD-ZRPKM values Did Not Validate
11205.p1 chr19 4,215,974 4,233,304 Matching event in parent Did Not Validate
13530.p1 chr19 4,215,974 4,234,821 Matching event in parent Did Not Validate
12118.p1 chr19 11,319,586 11,363,226 Matching event in parent Did Not Validate
11193.p1 chr19 17,440,933 17,452,512 Matching event in parent Did Not Validate
13116.p1 chr19 45,822,778 45,909,976 Matching event in parent Validated
13815.p1 chr19 57,792,174 57,967,854 Matching event in parent No Data
11013.p1 chr20 6,100,050 8,352,097 Matching event in parent Validated
13844.p1 chr21 35,446,000 36,079,687 Matching event in parent No Data
13593.p1 chr21 37,635,843 37,710,244 Matching event in parent Did Not Validate
12810.p1 chr22 32,495,169 32,788,346 Matching event in parent Validated
11947.p1 chr22 40,711,286 41,077,932 Matching event in parent Validated
11653.p1 chr22 41,568,502 41,634,889 Matching event in parent Validated
11043.p1 chr23 6,451,785 8,434,424 Matching event in parent No Data
12810.p1 chr23 135,487,850 135,594,172 Matching event in parent No Data
Table S3(b): CNPs in 122 ASD probands
Sample  hg19 Coordinates (chr - start - stop)  Gene  Processed Pseudogene  Type of Event  Validation
13812.p1 chr6 32,609,086 32,632,844 HLA NA
11523.p1 chr14 106,329,088 106,376,628 IgHeavy NA
12114.p1 chr2 179,255,799 179,315,757 PRKA Y CNP NA
11141.p1 chr2 179,296,823 179,315,757 PRKA Y CNP NA
11190.p1 chr2 179,296,823 179,315,757 PRKA Y CNP NA
12744.p1 chr2 179,296,823 179,315,757 PRKA Y CNP NA
11788.p1 chr2 179,296,823 179,318,347 PRKA Y CNP NA
11707.p1 chr2 179,300,871 179,312,313 PRKA Y CNP NA
11346.p1 chr2 179,300,871 179,315,170 PRKA Y CNP NA
11414.p1 chr2 179,300,871 179,315,170 PRKA Y CNP NA
11452.p1 chr2 179,300,871 179,315,170 PRKA Y CNP NA
11009.p1 chr2 179,300,871 179,315,757 PRKA Y CNP NA
11504.p1 chr2 179,300,871 179,315,757 PRKA Y CNP NA
11571.p1 chr2 179,300,871 179,315,757 PRKA Y CNP NA
11834.p1 chr2 179,300,871 179,315,757 PRKA Y CNP NA
12810.p1 chr2 179,300,871 179,315,757 PRKA Y CNP NA
13726.p1 chr2 179,300,871 179,318,347 PRKA Y CNP NA
11013.p1 chr2 179,300,871 179,320,878 PRKA Y CNP NA
11843.p1 chr2 179,306,336 179,312,313 PRKA Y CNP NA
11587.p1 chr2 179,306,336 179,315,170 PRKA Y CNP NA
11722.p1 chr3 16,635,161 16,640,105 DAZL CNP NA
13741.p1 chr3 16,636,039 16,640,105 DAZL CNP NA
11471.p1 chr3 16,636,820 16,639,048 DAZL CNP NA
13517.p1 chr3 16,636,820 16,640,105 DAZL CNP NA
11193.p1 chr5 158,523,981 158,634,904 RNF145 Y CNP NA
13812.p1 chr5 158,595,880 158,609,059 RNF145 Y CNP NA
11459.p1 chr5 158,600,990 158,697,453 RNF145 Y CNP NA
13409.p1 chr5 180,218,633 180,430,876 BTNL8 CNP NA
11964.p1 chr5 180,375,919 180,420,160 BTNL8 CNP NA
11224.p1 chr5 180,375,919 180,430,876 BTNL8 CNP NA
11753.p1 chr5 180,375,919 180,430,876 BTNL8 CNP NA
12212.p1 chr5 180,375,919 180,430,876 BTNL8 CNP NA
11064.p1 chr5 180,375,919 180,431,443 BTNL8 CNP NA
11518.p1 chr5 180,375,919 180,431,443 BTNL8 CNP NA
11599.p1 chr5 180,375,919 180,431,443 BTNL8 CNP NA
12249.p1 chr5 180,375,919 180,431,443 BTNL8 CNP NA
13793.p1 chr5 180,375,919 180,431,443 BTNL8 CNP NA
12667.p1 chr5 180,376,238 180,430,876 BTNL8 CNP NA
13335.p1 chr6 35,745,235 35,787,224 CLPS CNP NA
11498.p1 chr11 55,339,603 55,419,315 OR4 CNP NA
13741.p1 chr11 55,339,603 55,419,315 OR4 CNP NA
11471.p1 chr11 55,339,603 55,433,572 OR4 CNP NA
11587.p1 chr11 55,339,603 55,433,572 OR4 CNP NA
11013.p1 chr11 55,370,916 55,419,315 OR4 CNP NA
11205.p1 chr11 55,370,916 55,419,315 OR4 CNP NA
11291.p1 chr11 55,370,916 55,419,315 OR4 CNP NA
11753.p1 chr11 55,370,916 55,419,315 OR4 CNP NA
13409.p1 chr12 54,639,898 54,718,965 HNRNPA1 Y CNP NA
12249.p1 chr12 104,376,576 104,387,282 TDG Y CNP NA
13409.p1 chr13 27,679,867 27,847,631 RPL21 Y CNP NA
11638.p1 chr17 45,201,251 45,297,419 CDC27 Y CNP NA
13409.p1 chr17 45,201,251 45,297,419 CDC27 Y CNP NA
11653.p1 chr19 52,132,290 52,149,893 SIGLEC14 CNP NA
12641.p1 chr19 52,133,551 52,149,313 SIGLEC14 CNP NA
13409.p1 chr12 53,291,211 53,410,394 KRT8/KRT18 CNP NA
11257.p1 chr14 20,248,481 20,529,142 OR4 CNP NA
11947.p1 chr17 39,502,370 39,553,791 KRT33/KRT34 CNP NA
Table S4: Genotyping Correlation with Whole-Genome Absolute Copy Number
Location  Genes  mrsFAST r²  BWA r²  Median Copy Number
chr1:104230039-104238912 AMY1A 0.98 8.11
chr1:110222301-110242933 GSTM2,GSTM1 0.99 0.86 3.12
chr1:144951760-145076079 PDE4DIP 0.99 0.61 6.66
chr1:145209110-145285912 NOTCH2NL 0.92 0.17 8.73
chr1:145293370-145368682 NBPF10 0.03 0.14 258.86
chr1:196788860-196801319 CFHR1 0.88 0.64 2.66
chr1:196825137-196896065 CFHR4 0.54 0.50 2.53
chr1:202415009-202496465 PPP1R12B 0.99 0.05 2.03
chr1:21766630-21811393 NBPF3 0.40 0.01 13.85
chr1:25598980-25656936 RHD 0.98 0.87 4.01
chr11:55403116-55451172 OR4P4,OR4S2,OR4C6 0.94 0.88 1.04
chr11:61008668-61018915 PGA5 0.98 0.21 5.98
chr12:11505418-11542473 PRB1 0.46 0.83 4.39
chr14:20202606-20420924 OR4Q3,OR4M1,OR4N2,OR4K2,OR4K5 0.96 0.93 3.75
chr14:74035771-74042359 ACOT2 0.98 0.08 3.00
chr15:22304656-22588026 OR4N4 0.97 0.61 4.23
chr15:30605924-30675622 CHRFAM7A 0.84 0.07 3.92
chr16:14766404-14788526 PLA2G10 0.22 0.11 8.57
chr16:15068832-15131552 PDXDC1 0.93 0.36 4.73
chr16:22524883-22547861 LOC100132247 0.08 48.81
chr16:32684848-32688053 TP53TG3B,TP53TG3 0.89 8.28
chr16:70148739-70196427 PDPR 0.96 0.91 4.73
chr17:18362101-18425291 LGALS9C 0.96 0.80 6.74
chr17:20353175-20370848 LGALS9B 0.93 0.04 6.57
chr17:34431219-34433014 CCL4 0.98 0.01 5.45
chr17:34522268-34524156 CCL3L1 0.95 6.88
chr17:34746118-34808091 TBC1D3H,TBC1D3G,TBC1D3C 0.75 47.70
chr17:36337711-36348666 TBC1D3 0.92 49.01
chr17:39506594-39525574 KRT33A,KRT33B 0.67 0.61 2.19
chr17:39531902-39536694 KRT34 0.60 0.58 2.46
chr17:39738532-39743147 KRT14 0.92 0.05 4.35
chr17:44165239-44800231 KIAA1267,LRRC37A,ARL17A,LRRC37A2,NSF 0.97 0.35 3.95
chr17:45608443-45700642 NPEPPS 0.79 0.07 8.42
chr17:62850487-62914903 LRRC37A3 0.83 0.24 10.79
chr19:49535129-49536495 CGB2 0.43 0.01 19.19
chr19:54799854-54804238 LILRA3 0.85 0.76 7.05
chr2:97779232-97915915 ANKRD36 0.06 0.11 18.23
chr22:16256331-16287937 POTEH 0.58 18.93
chr22:23043312-23249272 IGLL5 0.90 0.10 2.13
chr22:24376138-24384284 GSTT1 0.98 0.44 1.09
chr22:25677318-25911586 LRP5L 0.99 0.93 2.96
chr3:197879236-197907728 FAM157A 0.03 0.36 8.71
chr3:75786028-75834255 ZNF717 0.10 0.03 39.36
chr4:69366860-69554789 UGT2B17,UGT2B15 0.91 0.00 2.72
chr4:70127619-70235027 UGT2B28 0.91 0.94 5.50
chr5:180377202-180416706 BTNL3 0.99 0.97 2.36
chr5:68821588-68854548 OCLN 0.92 3.19
chr5:69316084-69343660 SERF1A 0.77 4.01
chr5:69345316-69374572 SMN1 0.95 3.84
chr5:795743-825341 ZDHHC11 0.92 0.78 3.98
chr6:257332-380527 DUSP22 0.99 0.99 4.01
chr6:32455238-32493130 HLA-DRB5 0.92 0.09 1.41
chr7:101986192-101996889 SPDYE6 0.02 38.21
chr7:102114556-102332921 POLR2J,SPDYE2,POLR2J3,RASA4,UPK3BL,POLR2J2 0.95 0.12 10.58
chr7:143223558-143541003 CTAGE15P,FAM115C 0.98 3.66
chr7:144052488-144077725 ARHGEF5 0.95 0.45 5.29
chr7:43980493-44058748 UBE2D4,SPDYE1 0.05 0.06 9.20
chr8:11946846-11973025 ZNF705D 0.18 9.32
chr9:141106636-141134172 FAM157B 0.00 0.61 9.16
chr9:14510-29739 WASH1 0.26 21.12
chr9:33795558-33799229 PRSS3 0.96 0.33 5.75
chr9:67926760-67969840 ANKRD20A1 0.66 28.15
Table S5: Accuracy of absolute copy number prediction
Location                 Genes                       # correct HapMap genotypes (of 7; from Campbell et al., 2011)
chr1:25592663-25663607   RHD                         7
chr1:110222301-110242933 GSTM1,GSTM2                 7
chr1:144959523-145081011 PDE4DIP                     5
chr1:196738897-196801697 CFHR3,CFHR1                 4
chr1:196825137-196896065 CFHR4                       5
chr1:202389905-202402001 PPP1R12B                    7
chr1:202415009-202496465 PPP1R12B                    2
chr2:89160037-89262733   Ig Light chain locus 4
chr3:100547342-100670846 ABI3BP                      6
chr3:151511518-151550270 AADAC                       7
chr3:189364074-189538586 TP63                        5
chr4:68793517-68833125   TMPRSS11A                   6
chr4:69386965-69483317   UGT2B17,UGT2B15             7
chr4:70127619-70235027   UGT2B28                     5
chr4:144921494-145040886 GYPA,GYPB                   7
chr5:795743-825341       ZDHHC11                     5
chr5:32107113-32169449   PDZD2,GOLPH3                7
chr5:68821588-68854548   OCLN                        5
chr5:69316084-69343660   SERF1A                      4
chr5:69345316-69374572   SMN1                        6
chr5:180377202-180416706 BTNL3                       6
chr6:257332-380527       DUSP22                      5
chr6:32455238-32493130   HLA-DRB5                    6
chr7:143223558-143541003 FAM115C,CTAGE15P            4
chr9:115383227-115585827 KIAA1958,C9orf80,SNX30      7
chr10:51008386-51114434  PARG                        7
chr10:135232058-135377386 MTG1,CYP2E1,SYCE1           7
chr11:55403116-55451172  OR4P4,OR4S2,OR4C6           3
chr12:11505418-11542473  PRB1                        5
chr14:20202606-20420924  OR4Q3,OR4M1,OR4N2,OR4K2,OR4K 6
chr14:88400031-88414591  GALC                        6
chr15:22304656-22588026  OR4N4                       4
chr15:30605924-30675622  CHRFAM7A                    7
chr16:70148739-70196427  PDPR                        5
chr17:18362101-18425291  LGALS9C                     5
chr17:34416411-34496071  CCL3,CCL4,TBC1D3B           2
chr17:39506594-39525574  KRT33A,KRT33B               6
chr17:39531902-39536694  KRT34                       7
chr19:54724572-54740148  LILRB3                      4
chr22:22754320-23038160  PRAME,GGTLC2                6
chr22:23043312-23249272  IGLL5                       6
chr22:24347958-24395540  GSTT1,LOC391322             5
Table S6: Signal-to-Noise ratios for mrsFAST and BWA calls
Sample   Chrom  Start      Stop       mrsFAST Signal  mrsFAST StdDev  mrsFAST SNR  BWA Signal  BWA StdDev  BWA SNR  mrsFAST SNR improvement
NA18517  16     21396577   21756357   1.927           0.183           10.516       0.373       0.059       6.314    67%
NA19240  12     133659688  133727740  0.881           0.138           6.383        0.212       0.052       4.046    58%
NA15510  7      99507187   99627998   1.670           0.160           10.463       0.284       0.048       5.916    77%
NA19129  6      29910533   30043566   0.894           0.173           5.183        0.377       0.056       6.748    -23%
NA18517  4      68788472   69057034   1.646           0.152           10.841       0.470       0.055       8.520    27%
NA15510  3      19492646   21465556   1.565           0.169           9.240        0.246       0.042       5.833    58%
NA15510  1      155227075  155264543  1.487           0.157           9.496        0.459       0.051       9.025    5%
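The columns of Table S6 appear to be related by simple arithmetic, which the following minimal sketch makes explicit. It assumes (this is inferred from the tabulated values, not stated in the text) that SNR is the signal divided by the standard deviation and that the improvement column is the ratio of the two SNR values minus one. The function names are hypothetical.

```python
def snr(signal, std_dev):
    """Signal-to-noise ratio: absolute signal divided by its standard deviation."""
    return abs(signal) / std_dev


def relative_improvement(snr_a, snr_b):
    """Fractional gain of snr_a over snr_b (0.67 means 67% higher)."""
    return snr_a / snr_b - 1.0


# First row of Table S6 (NA18517, chr16), using the rounded table values.
# The SNR recomputed from rounded inputs differs slightly from the tabulated
# SNR, presumably because the table was computed from unrounded values.
mrsfast_snr = snr(1.927, 0.183)
bwa_snr = snr(0.373, 0.059)

# Improvement computed from the tabulated SNR columns.
improvement = relative_improvement(10.516, 6.314)
print(f"mrsFAST SNR {mrsfast_snr:.2f}, BWA SNR {bwa_snr:.2f}, "
      f"improvement {improvement:.0%}")
```

Under this reading, a negative entry (such as the chr6 call in NA19129) simply means the BWA-based signal had the higher SNR at that locus.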
Appendix C (Chapter 3)

Exome-based CNV calling

Exome data preparation and alignment

Previously generated FASTQ data from four exome sequencing studies (Iossifov et al., 2012; O'Roak et al., 2012; Sanders et al., 2012) were used in this study. In addition, we generated sequence for the unaffected sibling in 20 published trios (O'Roak et al., 2011), for a complete set of 412 quads (one family was excluded during a QC step; see "Coverage and quality control", below). Reads were split into consecutive 36mers, up to two per read, and mapped using the single-end mode of mrsFAST (Hach et al., 2010), allowing up to two mismatches per 36mer. We aligned reads to a concatenated hg19 reference genome, which included exome-capture targets based on the Nimblegen EZ Exome v2.0 platform (194,080 targets) as well as 300 bp up- and downstream of each target.

CoNIFER-based CNV discovery from exome read-depth

Using CoNIFER v0.2.2 (http://conifer.sf.net; Krumm et al., 2012), we processed each of the three datasets separately. RPKM values were calculated for the 194,080 probes and exons targeted by the Nimblegen EZ Exome v2.0 exome sequence enrichment platform. We set the --svd option to 12 (for samples in the Iossifov dataset) or 15 (for the O'Roak and Sanders datasets) and used default CoNIFER settings for all other options. After CoNIFER analysis, the raw SVD-ZRPKM values were exported (using the export command) for downstream analysis.

Coverage, quality control and data uniformity

We examined the overall mapped coverage after mrsFAST alignment in all samples, as this can affect the sensitivity of exome-based CNV detection (Figure S2). There was no significant difference in coverage between proband-sibling pairs (paired two-tailed t-test, t = 1.68, p = 0.09). However, we noted that samples from the Iossifov dataset had lower overall coverage than samples from the other two datasets, likely due to the more aggressive multiplexing strategy used by Iossifov and colleagues in sequencing their families.

We used the --write_sd option in CoNIFER to calculate the per-sample standard deviation of SVD-ZRPKM values after processing (listed in Table S1). Abnormally high values would indicate an excessively noisy sample, or a sample for which exon read-depth values could not be adequately normalized with CoNIFER. Fortunately, no samples were excluded using this metric (all std. dev. ≤ 0.61). We did note what appeared to be significant contamination between reads in family 12154 and removed this family from all further analysis, resulting in 411 families.

CNV segmentation and single sample calling

All four members of each quad were analyzed for CNVs in order to minimize false negatives in probands and siblings. We used DNACopy and CGHcall to segment and assign deletion or duplication probabilities to SVD-ZRPKM values. To prevent excessively strong SVD-ZRPKM signals from interfering with the models used by CGHcall to assign copy number, we clipped the signal at +/- 3 for each exon. Parameters for DNACopy were as follows: alpha = 0.01, using the undo.splits="sdundo" option with undo.SD = 2. Default options for CGHcall were used, and we allowed only "deletion" and "duplication" as called states. Using these parameters, we obtained 32,672 raw segments called as either "deleted" or "duplicated".

After segmentation, we excluded 48 calls that clearly failed to segment properly. These calls were larger than 250 exons and had an absolute median signal of less than 0.2. We visually examined these calls and confirmed that they were clear false positives generated by CGHcall. These calls were removed to reduce interference with downstream CNVR clustering and analysis, leaving 32,624 calls.

Unified probe set for three datasets

Because we analyzed each of the three datasets (described above) separately, and the CoNIFER algorithm dynamically excludes poorly performing probes, the final datasets each had a slightly different total number of probes. We combined these into a unified probe set by merging non-overlapping probes (i.e., those that did not overlap in all three datasets) with their closest neighbor by genomic distance. The probes from each dataset had greater than 99% overlap with the other two datasets, as well as with the unified probe set (O'Roak: 99.7%, Sanders: 99.7%, Iossifov: 99.1%). However, after unification, 92 calls were eliminated from one or more datasets because the unified probe set did not support our minimum size threshold of two exons for these calls.

Clustering CNVs and frequency estimates

Next, we grouped individual CNV calls into CNV regions (CNVRs) using pairwise distances between all CNVs based on a modified reciprocal overlap (RO) heuristic. This function calculates the RO between two CNVs based on the minimum fraction of overlapping probes and weights this percentage by the total number of non-overlapping probes on each end. In this way, the function takes into account the uncertainty in breakpoints and RO for two small CNVs, while allowing two large overlapping CNVs to be counted as distinct entities. The function is given by:
Where RO is the modified reciprocal overlap, C is the number of overlapping probes between two CNVs A and B, la and lb are the lengths (in number of probes) of A and B, and LO and RO correspond to the number of "left overhanging" and "right overhanging" probes on either side of the common probes. We empirically picked g = 0.9.

CNVs were clustered into CNVRs by weighted hierarchical clustering (WPGMA, weighted pair group method with arithmetic mean) based on the modified RO calculated above for each pair of CNVs. Each resulting tree connects all CNVs based on their similarity under the modified RO function (above), and the distance between leaves or clusters measures how likely the CNVs are to be part of the same CNVR.

To generate CNVRs and frequency counts, we flattened the trees based on the cophenetic distance (i.e., the "height" of each branch of the dendrogram) at an empirical value of 0.85. Using this clustering method, we initially generated 3,473 CNVRs. Of these, 332 had more than 15 calls, and we removed these CNVRs from subsequent analysis of rare CNVs on the basis that they correspond to CNPs, which we analyzed separately. The remaining 3,141 CNVRs (containing 7,424 calls) were carried forward to the next step.

Quad-based genotyping method

We reduced the rate of false-positive de novo calls in probands (and, conversely, the false-negative rate of inherited CNVs) by applying a novel algorithm designed specifically to find inherited CNVs within a family, under the assumption that a false negative for an inherited call is much more likely than a true-positive de novo event. Using the Continuous Mutual Information (CMI) metric, which assesses how "informative" a set of observations X is for a set Y, we made pairwise comparisons of SVD-ZRPKM values at each CNV locus within each family. Specifically, for each rare call (termed the "index call" here), the CMI between all pairs of samples in the quad was calculated:

for each quad member a:
  for each other quad member b:

Where X and Y are vectors containing SVD-ZRPKM values for the first sample and the second sample, including the values of the 30-100 probes surrounding the call on either side (specifically, 30 flanking probes for events ≤ 3 exons; 50 flanking probes for events of 4-150 exons; 100 flanking probes for events ≥ 150 exons). To incorporate our high prior probability on the location of the index call, we weighted the values X and Y by a vector W, which was 10 for the probes in the index call and 1 elsewhere. Then, f(x,y) is the joint probability density of the two vectors, calculated by estimating the Gaussian kernel density for both vectors; similarly, f(x) and f(y) are the marginal densities, estimated by the Gaussian kernel density for X and Y separately. To calculate the CMI value, we used the scipy.quadpack library and integrated the CMI function across the interval (-100, 100) for all values of x; the limits of y were determined by a step function (if x < 0: y in (-100, 0]; if x ≥ 0: y in (0, 100)), which prevented reciprocal calls or reciprocal signals from contributing positively to the CMI.

We calculated the pairwise CMI for all pairs (excluding self-self) of members in each quad and averaged the values for each pair of pairs (e.g., the CMI of mother-proband was averaged with that of proband-mother). In some cases, the numerical integration did not converge for a particular ordered pair (X;Y); in these cases, we used the CMI from the direction that was successfully calculated. In no case did the integration fail for both I(X;Y) and I(Y;X). As our focus was to genotype and impute rare CNVs only, and the computational run time for this step grows as O(N^2), we limited the calculation of the CMI values to CNVR clusters with 15 or fewer calls.
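The following is a minimal sketch of the sign-restricted CMI described above, not the original implementation. It assumes the integrand has the standard mutual-information form, f(x,y) log[f(x,y) / (f(x) f(y))], which matches the symbols defined in the text. It also swaps the quadrature-based integration for a simple grid sum over a narrower interval (an assumption made for speed), and the function name, grid limits and grid size are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde


def same_sign_cmi(x, y, weights=None, lo=-6.0, hi=6.0, n=121):
    """Grid approximation of mutual information between x and y, counting
    only regions where both signals have the same sign."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Weighted Gaussian kernel density estimates: joint and both marginals.
    joint = gaussian_kde(np.vstack([x, y]), weights=weights)
    marg_x = gaussian_kde(x, weights=weights)
    marg_y = gaussian_kde(y, weights=weights)

    grid = np.linspace(lo, hi, n)
    step = grid[1] - grid[0]
    gx, gy = np.meshgrid(grid, grid, indexing="ij")

    fxy = joint(np.vstack([gx.ravel(), gy.ravel()])).reshape(n, n)
    fx_fy = marg_x(grid)[:, None] * marg_y(grid)[None, :]

    # Step-function limits from the text: x < 0 pairs with y <= 0,
    # x >= 0 pairs with y > 0, so reciprocal signals contribute nothing.
    same_sign = ((gx < 0) & (gy <= 0)) | ((gx >= 0) & (gy > 0))
    valid = same_sign & (fxy > 0) & (fx_fy > 0)

    integrand = np.zeros_like(fxy)
    integrand[valid] = fxy[valid] * np.log(fxy[valid] / fx_fy[valid])
    return float(integrand.sum() * step * step)


# Hypothetical usage: 30 flanking probes on each side of a 10-probe index
# call, with the index-call probes weighted 10 and all others weighted 1.
w = np.r_[np.ones(30), np.full(10, 10.0), np.ones(30)]
```

Averaging same_sign_cmi(x, y, w) with same_sign_cmi(y, x, w) would mirror the pairwise averaging described in the text. The grid sum is coarser than adaptive quadrature, so absolute values from this sketch should not be compared against the thresholds reported in this appendix.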
Depending on the individual carrying the index call, we used two different CMI cutoff values to determine whether the pairwise CMI value indicated inheritance: if the parent carried the index call, we required a CMI threshold of 1.3; conversely, if the offspring carried the index call, we required only 1.0 (this discrepancy takes into account the high likelihood that a call is inherited if seen in an offspring, and the lower likelihood that a parental call is passed on). Additionally, we required that the CMI score be outside of the 0.005 percentile of a fitted gamma distribution of CMI values between 25 randomly chosen unrelated samples at that locus. If these conditions were met, we imputed the CNV in the members of the quad for which no call had been made, based on the boundaries of the index call (i.e., the prior).

We calculated pairwise CMI values for the calls in the 3,141 CNVR clusters with 15 or fewer calls. This step inferred a total of 903 calls passing the CMI cutoff(s), of which 204 calls (23%) were part of rare CNVR clusters (<15 calls) and 699 joined the existing CNP CNVR clusters.

After adding the newly found calls, we re-clustered all calls into CNVRs and recomputed the final frequencies for each, as described above ("Clustering CNVs and frequency estimates"). This resulted in 3,102 CNVRs containing 7,628 calls, which were carried into our final filtering and classification step.

Filtering segmentally duplicated regions and processed pseudogenes

Our primary interest in rare and private CNVs prompted us to exclude CNVs that lie primarily in segmental duplications or otherwise multi-copy regions of the genome. These regions are polymorphic, and the independent assortment of parental alleles prevents accurate assessment of inheritance patterns in them. We excluded CNVs that had more than 50% of their probes within segmental duplications or duplicated regions of the genome (defined using previous methods from 1000 Genomes whole-genome depth-of-coverage analysis, where >80% of 34 unrelated genomes had a copy number of three or greater in 500 bp repeat-masked windows across the genome). Excluding calls that overlapped at least 50% with these regions resulted in the exclusion of 2,353 calls (45% of all calls), corresponding to 609 CNVRs.
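The segmental duplication filter above can be illustrated with a short sketch. The data layout (probes as chromosome/start/stop tuples, regions as a per-chromosome list of intervals), the function names and the example coordinates are all hypothetical. The sketch treats a probe as "within" a region if the two intervals overlap at all, which is an assumption, and it uses the "more than 50%" reading of the threshold.

```python
def overlaps(a_start, a_stop, b_start, b_stop):
    """True if two closed intervals share at least one base."""
    return a_start <= b_stop and b_start <= a_stop


def fraction_of_probes_in_regions(probes, regions):
    """probes: list of (chrom, start, stop) tuples for one CNV call.
    regions: dict mapping chrom -> list of (start, stop) duplicated intervals."""
    if not probes:
        return 0.0
    hits = sum(
        1
        for chrom, start, stop in probes
        if any(overlaps(start, stop, r0, r1) for r0, r1 in regions.get(chrom, []))
    )
    return hits / len(probes)


def keep_call(probes, regions, max_fraction=0.5):
    """Keep a call unless more than max_fraction of its probes are duplicated."""
    return fraction_of_probes_in_regions(probes, regions) <= max_fraction


# Hypothetical example: a four-probe call next to one duplicated interval.
duplicated = {"chr1": [(1000, 2000)]}
call = [("chr1", 900, 1100), ("chr1", 1500, 1600),
        ("chr1", 2500, 2600), ("chr1", 3000, 3100)]
print(keep_call(call, duplicated))  # two of four probes overlap, so the call is kept
```

For genome-scale data an interval tree or sorted-interval sweep would be the usual replacement for the nested scan shown here.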
Next, we excluded calls that were likely due solely to the insertion of processed pseudogenes. CoNIFER and most exome-based read-depth methods are sensitive to copy number changes specifically of exons, which can be the result of retro-insertion of processed mRNA transcripts (see Krumm et al., 2012 for more details). Processed pseudogenes can be polymorphic or fixed among individuals; furthermore, inserted copies may be present in variable copy number or absent altogether within the reference genome. To eliminate CNV calls in our dataset due to polymorphic or novel insertions of processed pseudogenes, we referenced the following sources to define likely processed pseudogenes: (1) a list of commonly polymorphic processed pseudogenes generated using SPLIT-READ (Karakoc et al., 2011) from 20 control exomes and (2) 225 autism trios (data not reported here). We excluded calls from our call list for which ≥90% of the probes corresponded to a gene that had been observed at least once in the 225 trios. This excluded a total of 1,098/7,628 calls (17%, in line with previous estimates from Krumm et al., 2012).

Final filtering and call set generation

Our final set of calls was created by requiring an absolute median SVD-ZRPKM score (i.e., signal strength) of ≥ 0.5 for calls with 5 or more probes, ≥ 1.0 for calls 3-5 probes in length, and ≥ 1.0 for calls 2 probes in length. We excluded any calls on the X or Y chromosomes for all analyses in this work.

Validation of CNV calls

Targeted array CGH microarray design

To validate the small rare CNVs discovered using the discovery pipeline, we designed a custom Agilent SurePrint G3 4x180k CGH microarray. As the variable spacing of the exome probes prevents precise knowledge of CNV breakpoints, we used a variable-density array design with a ≥3 exon overhang based on the exome-based breakpoints (with min/max limits of 5 kbp and 50 kbp) where possible. Probe density within the CNV call ranged from one probe per 125 bp for calls smaller than 10 kbp to one probe per 5 kbp for large calls up to 500 kbp, in order to ensure at least 10 probes per call. Due to the high density of probes required for validation of small CNVs, some of the probes were of lower quality (based on the manufacturer's quality score), and their performance was accordingly lower.

Array CGH methods

Test DNA from each sample and reference DNA (from HapMap sample NA18507) were labeled with Cy3 and Cy5 dye using a NimbleGen array labeling kit according to the manufacturer's instructions. Five micrograms of labeled test and reference DNA were hybridized to the microarray slide for 24 hours using Agilent reagents and washed according to the manufacturer's directions. Slides were scanned using an Agilent Microarray Scanner and analyzed using Agilent Feature Extraction v10.5.1.1.

Data processing and array quality control

Array intensity ratios were log-transformed and assessed for quality control. Arrays with a per-sample standard deviation of LogR values > 0.5 were repeated. To reduce systematic and batch noise between probes and samples, we employed a normalization strategy similar to the CoNIFER pipeline and used SVD to remove the three strongest components of variance.

Receiver operating characteristic (ROC) determination of array CGH thresholds

We determined minimum logR thresholds for the validation arrays by leveraging the logR values across the 60 previously identified CNVs (from Sanders et al., 2011), each found in at least one of our validation samples. We calculated ROC curves (Figure S3) for duplications (39 calls) and deletions (21 calls), using the samples without the previously identified CNVs as the "true negatives". Next, we individually picked the optimal operating point (OOP) for deletions (median LogR OOP <= -0.178) and duplications (median LogR OOP >= 0.24), such that we maximally discerned our known true positives from true negatives. Both OOPs had an FPR of ~1% and a recall rate >90%, indicating that our array was highly specific and sensitive to true events. These logR cutoff values were used in assessing whether novel CNVs were true positives: if the mean LogR across all probes in the call interval was greater than the duplication threshold (or lower than the deletion threshold), we considered the call validated.

Estimating false positive rate

We started with 161 exome-based calls (Table S5) among 80 randomly selected probands and siblings. Of these 161 calls, 69 could be confirmed by a CNV already reported by Sanders and colleagues. To validate the remaining calls, we used array CGH data from our customized microarray, the OOPs determined above, and the mean of all the array CGH probes between the exome-based CNV start and stop. The OOP thresholds were exceeded in 61 calls, and based on inheritance across multiple  and a combination of available raw Illumina 1M data, we scored four additional calls as validated (Table S5). Six calls did not have sufficient array probe coverage (our upper estimate of the reported FPR includes these calls as false positives; the lower estimate excludes these six from all calculations).

We found that 14 of the unconfirmed calls were rare processed pseudogenes specific to the family or samples tested. To find these, we mapped exome sequencing reads from each sample to a customized reference sequence composed of mRNA sequences extracted from RefSeq. If more reads mapped across the exon-exon junctions within the CNV call in the sample tested than across the same junctions in other samples, we considered the elevation of exome read-depth signal to be due to a processed pseudogene insertion, rather than a true genomic CNV.

Additional analyses

Bootstrap permutations of burden analysis

We tested the robustness of the overall effect of burden by a bootstrap method, in which we calculated the CNV burden ratio (for CNVs and genes) of 10,000 randomly sampled (with replacement) sets of families from the overall set of 411. The resulting distributions of total CNV counts and total gene counts, and the distribution of burden for both, are shown in Figure S5 (compare to actual results for all 411 families in Figure 2a and 2b). The empirical 95% confidence intervals for both the burden of CNVs (CI: 1.09-1.29) and genic burden (CI: 1.10-1.52) reject the null hypothesis (at alpha = 0.05) of no differential CNV burden between probands and siblings.

Phenotypes and regression models

Full-scale IQ, SRS t-scores, and individual components of the SRS were downloaded from the SSC database (release 14 of the SSC). For all analyses we excluded the entire family if any values were missing for either the proband or sibling.

To clarify how we classified SRS discordant and concordant families, we provide the following table:
  
Gene expression data and enrichment analysis

We used publicly available gene expression data from the Human U133A/GNF1H Gene Atlas (GEO: GSE1133), comprising 79 human tissues, including 18 nervous system tissues (Su, 2004). We associated the microarray probe IDs with HUGO gene names and averaged expression across multiple probes in the same gene. For each tissue, genes were sorted by expression, and we considered the top 5% of each category to be "highly expressed". To calculate enrichment, we took the unique sets of genes disrupted in probands and those disrupted in siblings and intersected each with the set of highly expressed genes in each category. The ratio of counts between these two intersections constituted the fold enrichment for each category.

To correct for the 79 multiple comparisons, we employed a permutation and false discovery rate (FDR) strategy. First, we derived a null distribution of enrichment between probands and siblings by shuffling the proband-only and sibling-only sets of genes and recomputing the enrichment. Next, an empirical p-value was derived by scoring the actual enrichment value against the null distribution for each tissue. Using the FDR method described in Storey & Tibshirani (2003) and the R package qvalue, we calculated q values for each tissue and assessed statistical significance at q < 0.05. To calculate the brain and non-brain averages, we averaged gene expression across all 18 brain and nervous system tissues and across the 61 non-brain tissues. These two categories were corrected for two comparisons each.

Previously associated genes

To establish the list of genes previously associated with autism/ASD/intellectual disability/schizophrenia, we attempted to identify all genes that were "causal" or associated with developmental delay, intellectual disabilities and schizophrenia. We conducted searches of OMIM with the following terms: "mental retardation", "intellectual disabilities", "autism" and "schizophrenia". We also included genes from the Simons SFARI autism candidate genes with "association scores" ranging from 1 to 4 (n=155 genes) (https://gene.sfari.org/autdb/submitsearch?selfld_0=GENES_GENE_SYMBOL&selfldv_0=&numOfFields=1&userAction=viewall&tableName=AUT_HG&submit2=View+All#GS).

Additional control exomes

The set of 2,972 exomes used to assess the population frequency of ultra-rare CNVs was taken from the National Heart, Lung, and Blood Institute's Exome Sequencing Project (ESP). These exomes were processed in bulk using CoNIFER (with 21 components removed), and locus-specific population frequencies were determined by manual inspection of outlier samples for each locus.

Combined mutation model

We used published lists of de novo SNV and indel mutations from the three studies (Iossifov et al., 2012; O'Roak et al., 2012; Sanders et al., 2012). In our combined model (see discussion), we counted only disruptive SNVs and indels (i.e., nonsense, splice, and frameshifting), as these have been shown to be most enriched in probands. Inherited and de novo CNV counts were derived from this work (Table S7). We used a logistic regression model, which transforms a binary outcome (i.e., affected vs. unaffected) such that linear predictors can be used. The model as shown in Figure 5 is:
Code availability

CoNIFER and CNV calling

CoNIFER can be downloaded from http://conifer.sf.net. Version 0.2.2 was used in this work. The custom pipeline for CNV calling as described in this work is available at http://conifer.sf.net, although the authors cannot guarantee or provide any technical support for this.

Supplement References

Hach, F., Hormozdiari, F., Alkan, C., Hormozdiari, F., Birol, I., Eichler, E. E., & Sahinalp, S. C. (2010). mrsFAST: a cache-oblivious algorithm for short-read mapping. Nature Methods, 7(8), 576–577. doi:10.1038/nmeth0810-576

Iossifov, I., Ronemus, M., Levy, D., Wang, Z., Hakker, I., Rosenbaum, J., et al. (2012). De novo gene disruptions in children on the autistic spectrum. Neuron, 74(2), 285–299. doi:10.1016/j.neuron.2012.04.009

Krumm, N., Sudmant, P. H., Ko, A., O'Roak, B. J., Malig, M., Coe, B. P., et al. (2012). Copy number variation detection and genotyping from exome sequence data. Genome Research. doi:10.1101/gr.138115.112

O'Roak, B. J., Deriziotis, P., Lee, C., Vives, L., Schwartz, J. J., Girirajan, S., et al. (2011). Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nature Genetics, 43(6), 585–589. doi:10.1038/ng.835

O'Roak, B. J., Vives, L., Girirajan, S., Karakoc, E., Krumm, N., Coe, B. P., et al. (2012). Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature. doi:10.1038/nature10989

Sanders, S. J., Murtha, M. T., Gupta, A. R., Murdoch, J. D., Raubeson, M. J., Willsey, A. J., et al. (2012). De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature, 485(7397), 237–241. doi:10.1038/nature10945

Storey, J. D., & Tibshirani, R. (2003). Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences, 100(16), 9440–9445. doi:10.1073/pnas.1530509100

Su, A. I. (2004). A gene atlas of the mouse and human protein-encoding transcriptomes. Proceedings of the National Academy of Sciences, 101(16), 6062–6067. doi:10.1073/pnas.0400782101
Figure S1: Flow chart for inherited CNV detection. See Methods and Supplemental Methods for details.
Figure S1: CNV Calling Flowchart
Figure S2: Mapped coverage between probands/siblings and by  data source
Figure S2: Mapped Coverage between probands/siblings and by data source. 
X-axis: total mapped 36mer reads (×10^8) by the mrsFAST alignment program to the 
human exome. (a) Histograms of Probands (left) and Siblings (center) and overlap 
(right) show no significant difference in coverage levels (paired t-test p = 0.09). 
(b) Same as in (a), but by dataset, revealing that the Iossifov dataset had lower 
coverage than the O'Roak or Sanders datasets.
Figure S3: Array-CGH validation ROC curves
Figure S3: Receiver-Operator Curve determining deletion and duplication 
thresholds in array-CGH validation. ROC curves based on 60 true-positive deletions (a) 
and duplications (b) from Sanders et al., 2011 in these samples. Arrows indicate the 
chosen optimal operating point (OOP), which was used as the threshold for 
validation of unknown calls.
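One way to pick an operating point from known positives and negatives is sketched below. The source does not state the criterion used to define the "optimal" point, so this sketch substitutes Youden's J statistic (true-positive rate minus false-positive rate) as an assumption; the function name and the example logR values are hypothetical.

```python
def pick_operating_point(positives, negatives):
    """Scan candidate thresholds and return (threshold, TPR, FPR) for the
    threshold maximizing TPR - FPR. A value counts as called when it is
    greater than or equal to the threshold (duplication direction)."""
    best = None
    for t in sorted(set(positives) | set(negatives)):
        tpr = sum(v >= t for v in positives) / len(positives)
        fpr = sum(v >= t for v in negatives) / len(negatives)
        score = tpr - fpr
        if best is None or score > best[0]:
            best = (score, t, tpr, fpr)
    return best[1], best[2], best[3]


# Hypothetical median logR values for known duplications and for samples
# without the CNV at the same loci.
known_dups = [0.30, 0.40, 0.50, 0.25]
non_carriers = [0.00, 0.05, -0.10, 0.10, 0.26]
threshold, tpr, fpr = pick_operating_point(known_dups, non_carriers)
print(threshold, tpr, fpr)
```

For deletions the same function can be applied to negated logR values, with the returned threshold negated again afterwards.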
Figure S4: CNV Size, inheritance, and copy number
Figure S4: CNV size and copy number. Inherited CNVs in probands and siblings, 
binned by size in exons (a) or estimated genomic size (b). As expected, larger 
CNVs are more likely to be duplications, an effect we found true for both probands 
and siblings. 
Figure S5: Bootstrap results
Figure S5: Results of bootstrap permutation test. We bootstrapped our set of 
inherited CNVs (sampling CNVs by family, with replacement), and calculated the 
total CNV counts (a) for probands (green) and siblings (blue) and CNV burden (b) 
between probands and siblings (dark blue: inner 95% of empirical distribution). In 
(c) and (d), the results when counting total number of genes and genic burden.
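The family-level bootstrap summarized in Figure S5 can be sketched as follows. This is an illustration under stated assumptions rather than the original code: each family is represented by a hypothetical (proband count, sibling count) pair, the burden ratio is taken as total proband count over total sibling count, and the interval is a simple percentile interval.

```python
import random


def bootstrap_burden_ci(families, n_boot=10000, alpha=0.05, seed=0):
    """families: list of (proband_count, sibling_count) pairs, one per family.
    Resamples whole families with replacement and returns the empirical
    (lower, upper) percentile interval of the proband/sibling burden ratio."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        sample = rng.choices(families, k=len(families))
        proband_total = sum(p for p, _ in sample)
        sibling_total = sum(s for _, s in sample)
        if sibling_total > 0:  # skip resamples with an undefined ratio
            ratios.append(proband_total / sibling_total)
    ratios.sort()
    lower = ratios[int((alpha / 2) * len(ratios))]
    upper = ratios[int((1 - alpha / 2) * len(ratios)) - 1]
    return lower, upper


# Hypothetical per-family inherited CNV counts (proband, sibling).
toy_families = [(3, 2), (1, 1), (2, 1), (0, 1), (4, 3), (2, 2)]
print(bootstrap_burden_ci(toy_families, n_boot=2000))
```

An interval that excludes 1.0 corresponds to rejecting the null hypothesis of equal burden at the chosen alpha, which is how the intervals reported for the 411 families are read in the text.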
Figure S6: Rare vs. Private burden in 411 quads
Figure S6 panel labels: Siblings, Probands; Genes affected, CNVs; Transmitted CNVs, Transmitted Genes; annotations: p=0.029, ns, p=0.004, ns.
Figure S6: Rare vs. Private burden in 411 quads. There was no increased burden 
for CNVs (a) observed only once in 411 families, or for genes in those CNVs (b).
Figure S7: Phenotypes in 411 probands and siblings
Figure S7 panel labels: Discordant SRS; All quads.
Figure S7: Phenotypes (SRS and IQ) in probands and siblings. (a) Distribution of SRS 
t-scores in probands (blue) and siblings (green). Higher scores are more affected, and 
SRS t-scores greater than 75 are considered "severely affected". (b) Heatmap plot of 
SRS values for probands (x-axis) and their siblings (y-axis). In almost all cases, the 
probands have higher SRS scores, but the difference in SRS score between probands 
and siblings varies widely among all pairs. We designated the pairs with the most 
extreme differences of SRS score between them as "Discordant SRS" pairs (indicated 
by arrow and dashed orange box, lower right). (c) All of the SRS discordant pairs had 
SRS differences > 25 (by definition, as we required these pairs to have a proband SRS 
	(d). Scatter plot showing both proband SRS score and 
proband IQ score. Dashed blue line indicates cutoff for High and Low IQ in our com-


Figure S8: Burden contrasts including SRS and IQ
Figure S8 legend: CNVs inherited by Probands, by Siblings, by Both.
Figure S8. Burden between SRS and IQ in probands and siblings. (a) Genic burden 
and (b) CNV burden for proband-sibling pairs where the proband has low IQ (< 70) for 

	
shown in (c) and (d). P-value bars drawn if two-tailed paired t-test p value is less than 
0.05. 
Figure S9: Enrichment of brain expressed genes in SRS discordant quads (A) and all 
quads (B)
Figure S9. Enrichment of brain expressed genes in probands vs. siblings. Bars (y-axis) 
represent the ratio of enrichment between probands and siblings for genes highly expressed in 
each tissue (defined as top 5%, see Methods). Black bars: tissue is part of brain or nervous 
system; white bars: non-brain or nervous system tissues; hatched bars: computed 
averages. Asterisk indicates significance using an FDR-based multiple testing correction 
(q-value < 0.05). (a) Probands from SRS discordant quads only show greater enrichment for 
brain-expressed genes than do all quads (b).
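The fold enrichment plotted in Figure S9, and the permutation step used to score it, can be sketched as below. The gene sets and function names are hypothetical, the proband and sibling sets are assumed to be disjoint (proband-only and sibling-only genes), and the add-one form of the empirical p-value is a common convention adopted here as an assumption rather than something the text specifies.

```python
import random


def fold_enrichment(proband_genes, sibling_genes, highly_expressed):
    """Ratio of highly expressed genes hit in probands versus siblings."""
    p = len(proband_genes & highly_expressed)
    s = len(sibling_genes & highly_expressed)
    if s == 0:
        return float("inf") if p > 0 else float("nan")
    return p / s


def permutation_p_value(proband_genes, sibling_genes, highly_expressed,
                        n_perm=1000, seed=0):
    """Shuffle gene labels between the two groups (keeping group sizes) and
    count how often the permuted enrichment reaches the observed value."""
    rng = random.Random(seed)
    observed = fold_enrichment(proband_genes, sibling_genes, highly_expressed)
    pooled = sorted(proband_genes) + sorted(sibling_genes)
    n_pro = len(proband_genes)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_pro, perm_sib = set(pooled[:n_pro]), set(pooled[n_pro:])
        if fold_enrichment(perm_pro, perm_sib, highly_expressed) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical gene sets for a single tissue.
pro = {"A", "B", "C", "D"}
sib = {"E", "F", "G"}
top5 = {"A", "B", "E", "X"}
print(fold_enrichment(pro, sib, top5), permutation_p_value(pro, sib, top5))
```

Repeating this per tissue yields one empirical p-value per tissue, which is the input to the q-value calculation described in the Methods.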
Figure S10: Intersection between brain-expressed genes and previously associated 
genes in proband CNVs, but not sibling CNVs
Figure S10 panel labels: Brain Expression; Previously associated w. ID/ASD/SCZ; Probands (discordant SRS); Probands (all); Siblings (discordant SRS); Siblings (all).
Figure S10. Intersection of brain-expressed and disease genes in probands.
We intersected the sets of genes found in probands (top row) and siblings (bottom row) that 
were either brain expressed (teal circles) or had previously been observed in 
ASD/schizophrenia/ID (yellow circles). Probands, especially those in SRS discordant 
pairs, had a higher fraction of intersecting genes (13 genes, Table S11) than other groups 
or their siblings, suggesting that these genes may be top candidates for follow-up study in 
the pathogenesis of ASD.
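The intersection described in the Figure S10 caption amounts to a set operation, sketched below. The gene labels are placeholders rather than genes from Table S11, the function name is hypothetical, and taking the fraction relative to all CNV genes in the group is an assumption, since the caption does not state the denominator.

```python
def intersecting_fraction(cnv_genes, brain_expressed, previously_associated):
    """Return the genes present in all three sets and their share of the CNV genes."""
    cnv = set(cnv_genes)
    both = cnv & set(brain_expressed) & set(previously_associated)
    fraction = len(both) / len(cnv) if cnv else 0.0
    return both, fraction

# Placeholder gene labels, for illustration only.
toy_cnv_genes = ["G1", "G2", "G3", "G4"]
toy_brain = {"G1", "G2", "G9"}
toy_disease = {"G1", "G2", "G4"}
shared, share = intersecting_fraction(toy_cnv_genes, toy_brain, toy_disease)
print(sorted(shared), share)
```

Running the same function separately on proband and sibling gene lists would give the per-group fractions that the caption compares.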
Sample Source SNV data available Part of Sanders 2011 (Illumina SNP) Part of QC analysis CoNIFER Std Dev11000.fa Sanders et al TRUE TRUE FALSE 0.3011000.mo Sanders et al TRUE TRUE FALSE 0.3011000.p1 Sanders et al TRUE TRUE FALSE 0.2811000.s1 Sanders et al TRUE TRUE FALSE 0.3711008.fa Sanders et al TRUE TRUE FALSE 0.2711008.mo Sanders et al TRUE TRUE FALSE 0.2711008.p1 Sanders et al TRUE TRUE FALSE 0.2711008.s1 Sanders et al TRUE TRUE FALSE 0.2811010.fa Sanders et al TRUE TRUE FALSE 0.2811010.mo Sanders et al TRUE TRUE FALSE 0.2911010.p1 Sanders et al TRUE TRUE FALSE 0.2811010.s1 Sanders et al TRUE TRUE FALSE 0.3011013.fa O?Roak et al TRUE TRUE FALSE 0.3811013.mo O?Roak et al TRUE TRUE FALSE 0.4211013.p1 O?Roak et al TRUE TRUE FALSE 0.3911013.s1 O?Roak et al TRUE TRUE TRUE 0.3611014.fa Sanders et al TRUE TRUE FALSE 0.2411014.mo Sanders et al TRUE TRUE FALSE 0.2611014.p1 Sanders et al TRUE TRUE FALSE 0.2511014.s1 Sanders et al TRUE TRUE FALSE 0.2611029.fa O?Roak et al TRUE TRUE FALSE 0.3911029.mo O?Roak et al TRUE TRUE FALSE 0.4011029.p1 O?Roak et al TRUE TRUE TRUE 0.3811029.s1 O?Roak et al TRUE TRUE TRUE 0.3911045.fa Sanders et al TRUE TRUE FALSE 0.3211045.mo Sanders et al TRUE TRUE FALSE 0.2911045.p1 Sanders et al TRUE TRUE TRUE 0.3411045.s1 Sanders et al TRUE TRUE FALSE 0.2811057.fa Sanders et al TRUE TRUE FALSE 0.2411057.mo Sanders et al TRUE TRUE FALSE 0.2511057.p1 Sanders et al TRUE TRUE FALSE 0.2411057.s1 Sanders et al TRUE TRUE FALSE 0.2311060.fa Sanders et al TRUE TRUE FALSE 0.3111060.mo Sanders et al TRUE TRUE FALSE 0.3211060.p1 Sanders et al TRUE TRUE FALSE 0.3811060.s2 Sanders et al TRUE FALSE FALSE 0.3511066.fa Sanders et al TRUE TRUE FALSE 0.3711066.mo Sanders et al TRUE TRUE FALSE 0.3011066.p1 Sanders et al TRUE TRUE FALSE 0.2311066.s2 Sanders et al TRUE FALSE FALSE 0.3411067.fa Sanders et al TRUE TRUE FALSE 0.2611067.mo Sanders et al TRUE TRUE FALSE 0.2111067.p1 Sanders et al TRUE TRUE FALSE 0.2311067.s1 Sanders et al TRUE TRUE FALSE 0.2411074.fa 
Sanders et al TRUE TRUE FALSE 0.2411074.mo Sanders et al TRUE TRUE FALSE 0.2411074.p1 Sanders et al TRUE TRUE FALSE 0.2611074.s1 Sanders et al TRUE TRUE FALSE 0.2711075.fa Sanders et al TRUE TRUE FALSE 0.3111075.mo Sanders et al TRUE TRUE FALSE 0.2011075.p1 Sanders et al TRUE TRUE FALSE 0.2411075.s1 Sanders et al TRUE TRUE FALSE 0.2311077.fa Sanders et al TRUE TRUE FALSE 0.2611077.mo Sanders et al TRUE TRUE FALSE 0.2111077.p1 Sanders et al TRUE TRUE FALSE 0.2211077.s1 Sanders et al TRUE TRUE FALSE 0.2711079.fa Sanders et al TRUE TRUE FALSE 0.2911079.mo Sanders et al TRUE TRUE FALSE 0.3011079.p1 Sanders et al TRUE TRUE FALSE 0.2211079.s1 Sanders et al TRUE TRUE FALSE 0.4311085.fa Sanders et al TRUE TRUE FALSE 0.3011085.mo Sanders et al TRUE TRUE FALSE 0.3011085.p1 Sanders et al TRUE TRUE FALSE 0.2711085.s1 Sanders et al TRUE TRUE FALSE 0.3011089.fa Sanders et al TRUE TRUE FALSE 0.2411089.mo Sanders et al TRUE TRUE FALSE 0.2311089.p1 Sanders et al TRUE TRUE FALSE 0.2311089.s1 Sanders et al TRUE TRUE FALSE 0.2511090.fa Sanders et al TRUE TRUE FALSE 0.2611090.mo Sanders et al TRUE TRUE FALSE 0.3111090.p1 Sanders et al TRUE TRUE TRUE 0.2911090.s1 Sanders et al TRUE TRUE FALSE 0.3911092.fa Sanders et al TRUE TRUE FALSE 0.2711092.mo Sanders et al TRUE TRUE FALSE 0.3011092.p1 Sanders et al TRUE TRUE FALSE 0.2911092.s1 Sanders et al TRUE TRUE FALSE 0.2911094.fa Sanders et al TRUE TRUE FALSE 0.2411094.mo Sanders et al TRUE TRUE FALSE 0.2211094.p1 Sanders et al TRUE TRUE FALSE 0.3911094.s1 Sanders et al TRUE TRUE FALSE 0.2411107.fa Sanders et al TRUE TRUE FALSE 0.2411107.mo Sanders et al TRUE TRUE FALSE 0.2411107.p1 Sanders et al TRUE TRUE FALSE 0.2711107.s1 Sanders et al TRUE TRUE FALSE 0.2511108.fa Sanders et al TRUE TRUE FALSE 0.32
11108.mo Sanders et al TRUE TRUE FALSE 0.3211108.p1 Sanders et al TRUE TRUE FALSE 0.3011108.s1 Sanders et al TRUE TRUE FALSE 0.3811114.fa Sanders et al TRUE TRUE FALSE 0.2811114.mo Sanders et al TRUE TRUE FALSE 0.3011114.p1 Sanders et al TRUE TRUE FALSE 0.2911114.s1 Sanders et al TRUE TRUE FALSE 0.2811115.fa Sanders et al TRUE TRUE FALSE 0.2911115.mo Sanders et al TRUE TRUE FALSE 0.4011115.p1 Sanders et al TRUE TRUE FALSE 0.3111115.s1 Sanders et al TRUE TRUE TRUE 0.3211117.fa Sanders et al TRUE TRUE FALSE 0.2411117.mo Sanders et al TRUE TRUE FALSE 0.3111117.p1 Sanders et al TRUE TRUE FALSE 0.2411117.s1 Sanders et al TRUE TRUE FALSE 0.3511118.fa Sanders et al TRUE TRUE FALSE 0.4611118.mo Sanders et al TRUE TRUE FALSE 0.3111118.p1 Sanders et al TRUE TRUE TRUE 0.3011118.s1 Sanders et al TRUE TRUE FALSE 0.3511132.fa Sanders et al TRUE TRUE FALSE 0.2511132.mo Sanders et al TRUE TRUE FALSE 0.2311132.p1 Sanders et al TRUE TRUE FALSE 0.2111132.s1 Sanders et al TRUE TRUE FALSE 0.3111146.fa Sanders et al TRUE TRUE FALSE 0.2511146.mo Sanders et al TRUE TRUE FALSE 0.2711146.p1 Sanders et al TRUE TRUE FALSE 0.3511146.s1 Sanders et al TRUE TRUE FALSE 0.3611154.fa Sanders et al TRUE TRUE FALSE 0.2911154.mo Sanders et al TRUE TRUE FALSE 0.2911154.p1 Sanders et al TRUE TRUE FALSE 0.2911154.s1 Sanders et al TRUE TRUE FALSE 0.4211172.fa O?Roak et al TRUE TRUE FALSE 0.3711172.mo O?Roak et al TRUE TRUE FALSE 0.4411172.p1 O?Roak et al TRUE TRUE FALSE 0.3911172.s1 O?Roak et al TRUE TRUE FALSE 0.3611180.fa Sanders et al TRUE FALSE FALSE 0.3111180.mo Sanders et al TRUE FALSE FALSE 0.3311180.p1 Sanders et al TRUE FALSE FALSE 0.2811180.s1 Sanders et al TRUE FALSE FALSE 0.3311190.fa O?Roak et al TRUE TRUE FALSE 0.5511190.mo O?Roak et al TRUE TRUE FALSE 0.4611190.p1 O?Roak et al TRUE TRUE FALSE 0.3611190.s1 O?Roak et al TRUE TRUE FALSE 0.4111196.fa Sanders et al TRUE TRUE FALSE 0.2211196.mo Sanders et al TRUE TRUE FALSE 0.2411196.p1 Sanders et al TRUE TRUE FALSE 0.2411196.s1 Sanders et al TRUE 
TRUE TRUE 0.2311203.fa Sanders et al TRUE TRUE FALSE 0.2311203.mo Sanders et al TRUE TRUE FALSE 0.2211203.p1 Sanders et al TRUE TRUE FALSE 0.2111203.s1 Sanders et al TRUE TRUE FALSE 0.2511219.fa Sanders et al TRUE TRUE FALSE 0.2711219.mo Sanders et al TRUE TRUE FALSE 0.2611219.p1 Sanders et al TRUE TRUE FALSE 0.2911219.s1 Sanders et al TRUE TRUE FALSE 0.2311220.fa Sanders et al TRUE TRUE FALSE 0.2711220.mo Sanders et al TRUE TRUE FALSE 0.2711220.p1 Sanders et al TRUE TRUE FALSE 0.2611220.s1 Sanders et al TRUE TRUE FALSE 0.2811229.fa O?Roak et al TRUE TRUE FALSE 0.4511229.mo O?Roak et al TRUE TRUE FALSE 0.4411229.p1 O?Roak et al TRUE TRUE TRUE 0.4211229.s1 O?Roak et al TRUE TRUE TRUE 0.3911241.fa Sanders et al TRUE TRUE FALSE 0.2811241.mo Sanders et al TRUE TRUE FALSE 0.2911241.p1 Sanders et al TRUE TRUE TRUE 0.2811241.s1 Sanders et al TRUE TRUE FALSE 0.2811242.fa Sanders et al TRUE TRUE FALSE 0.2711242.mo Sanders et al TRUE TRUE FALSE 0.2411242.p1 Sanders et al TRUE TRUE FALSE 0.2611242.s1 Sanders et al TRUE TRUE FALSE 0.2611247.fa Sanders et al TRUE TRUE FALSE 0.2611247.mo Sanders et al TRUE TRUE FALSE 0.2311247.p1 Sanders et al TRUE TRUE FALSE 0.2911247.s1 Sanders et al TRUE TRUE FALSE 0.2111252.fa Sanders et al TRUE TRUE FALSE 0.3111252.mo Sanders et al TRUE TRUE FALSE 0.3211252.p1 Sanders et al TRUE TRUE TRUE 0.2611252.s1 Sanders et al TRUE TRUE FALSE 0.3111265.fa Sanders et al TRUE TRUE FALSE 0.3111265.mo Sanders et al TRUE TRUE FALSE 0.3111265.p1 Sanders et al TRUE TRUE FALSE 0.3011265.s1 Sanders et al TRUE TRUE FALSE 0.3811267.fa Sanders et al TRUE TRUE FALSE 0.2911267.mo Sanders et al TRUE TRUE FALSE 0.3111267.p1 Sanders et al TRUE TRUE FALSE 0.3511267.s1 Sanders et al TRUE TRUE TRUE 0.3211282.fa Sanders et al TRUE TRUE FALSE 0.3111282.mo Sanders et al TRUE TRUE FALSE 0.31
11282.p1 Sanders et al TRUE TRUE TRUE 0.3211282.s1 Sanders et al TRUE TRUE TRUE 0.3511285.fa Sanders et al TRUE TRUE FALSE 0.2711285.mo Sanders et al TRUE TRUE FALSE 0.2911285.p1 Sanders et al TRUE TRUE FALSE 0.2911285.s1 Sanders et al TRUE TRUE FALSE 0.2811290.fa Sanders et al TRUE TRUE FALSE 0.2911290.mo Sanders et al TRUE TRUE FALSE 0.3011290.p1 Sanders et al TRUE TRUE FALSE 0.2411290.s1 Sanders et al TRUE TRUE FALSE 0.3611291.fa O?Roak et al TRUE TRUE FALSE 0.3911291.mo O?Roak et al TRUE TRUE FALSE 0.4111291.p1 O?Roak et al TRUE TRUE FALSE 0.4111291.s1 O?Roak et al TRUE TRUE FALSE 0.3711298.fa Sanders et al TRUE TRUE FALSE 0.3511298.mo Sanders et al TRUE TRUE FALSE 0.3211298.p1 Sanders et al TRUE TRUE FALSE 0.2911298.s1 Sanders et al TRUE TRUE FALSE 0.3111301.fa Sanders et al TRUE TRUE FALSE 0.2711301.mo Sanders et al TRUE TRUE FALSE 0.2311301.p1 Sanders et al TRUE TRUE FALSE 0.2611301.s1 Sanders et al TRUE TRUE FALSE 0.2611304.fa Sanders et al TRUE TRUE FALSE 0.3011304.mo Sanders et al TRUE TRUE FALSE 0.3211304.p1 Sanders et al TRUE TRUE FALSE 0.3111304.s1 Sanders et al TRUE TRUE TRUE 0.2611316.fa Sanders et al TRUE TRUE FALSE 0.2811316.mo Sanders et al TRUE TRUE FALSE 0.2911316.p1 Sanders et al TRUE TRUE FALSE 0.3011316.s1 Sanders et al TRUE TRUE FALSE 0.2811336.fa Sanders et al TRUE TRUE FALSE 0.2711336.mo Sanders et al TRUE TRUE FALSE 0.2211336.p1 Sanders et al TRUE TRUE FALSE 0.2311336.s1 Sanders et al TRUE TRUE FALSE 0.2311353.fa Sanders et al TRUE TRUE FALSE 0.3111353.mo Sanders et al TRUE TRUE FALSE 0.4011353.p1 Sanders et al TRUE TRUE FALSE 0.3011353.s1 Sanders et al TRUE TRUE FALSE 0.3611356.fa Sanders et al TRUE TRUE FALSE 0.2111356.mo Sanders et al TRUE TRUE FALSE 0.2111356.p1 Sanders et al TRUE TRUE FALSE 0.2211356.s1 Sanders et al TRUE TRUE FALSE 0.2511364.fa O?Roak et al TRUE TRUE FALSE 0.3811364.mo O?Roak et al TRUE TRUE FALSE 0.4111364.p1 O?Roak et al TRUE TRUE FALSE 0.3911364.s1 O?Roak et al TRUE TRUE FALSE 0.3911382.fa Sanders et al TRUE TRUE 
FALSE 0.2311382.mo Sanders et al TRUE TRUE FALSE 0.2711382.p1 Sanders et al TRUE TRUE FALSE 0.2411382.s1 Sanders et al TRUE TRUE FALSE 0.2611390.fa O?Roak et al TRUE TRUE FALSE 0.3211390.mo O?Roak et al TRUE TRUE FALSE 0.3611390.p1 O?Roak et al TRUE TRUE FALSE 0.3511390.s1 O?Roak et al TRUE TRUE FALSE 0.3811411.fa Sanders et al TRUE FALSE FALSE 0.2211411.mo Sanders et al TRUE FALSE FALSE 0.2511411.p1 Sanders et al TRUE FALSE FALSE 0.2111411.s1 Sanders et al TRUE FALSE FALSE 0.2611412.fa Sanders et al TRUE TRUE FALSE 0.2611412.mo Sanders et al TRUE TRUE FALSE 0.2811412.p1 Sanders et al TRUE TRUE TRUE 0.2711412.s1 Sanders et al TRUE TRUE TRUE 0.3011429.fa Sanders et al TRUE TRUE FALSE 0.2711429.mo Sanders et al TRUE TRUE FALSE 0.3011429.p1 Sanders et al TRUE TRUE FALSE 0.2711429.s1 Sanders et al TRUE TRUE FALSE 0.2611433.fa Sanders et al TRUE TRUE FALSE 0.3011433.mo Sanders et al TRUE TRUE FALSE 0.3011433.p1 Sanders et al TRUE TRUE FALSE 0.3511433.s1 Sanders et al TRUE TRUE TRUE 0.4111437.fa Sanders et al TRUE TRUE FALSE 0.3211437.mo Sanders et al TRUE TRUE FALSE 0.3511437.p1 Sanders et al TRUE TRUE FALSE 0.3511437.s1 Sanders et al TRUE TRUE FALSE 0.4611452.fa O?Roak et al TRUE TRUE FALSE 0.3911452.mo O?Roak et al TRUE TRUE FALSE 0.4411452.p1 O?Roak et al TRUE TRUE FALSE 0.4711452.s1 O?Roak et al TRUE TRUE FALSE 0.3711456.fa Sanders et al TRUE TRUE FALSE 0.2911456.mo Sanders et al TRUE TRUE FALSE 0.3111456.p1 Sanders et al TRUE TRUE FALSE 0.2911456.s1 Sanders et al TRUE TRUE TRUE 0.2911459.fa O?Roak et al TRUE TRUE FALSE 0.3811459.mo O?Roak et al TRUE TRUE FALSE 0.4111459.p1 O?Roak et al TRUE TRUE FALSE 0.4411459.s1 O?Roak et al TRUE TRUE FALSE 0.3611462.fa Sanders et al TRUE TRUE FALSE 0.2411462.mo Sanders et al TRUE TRUE FALSE 0.2611462.p1 Sanders et al TRUE TRUE FALSE 0.22
11462.s1 Sanders et al TRUE TRUE FALSE 0.2611469.fa O?Roak et al TRUE TRUE FALSE 0.4211469.mo O?Roak et al TRUE TRUE FALSE 0.4611469.p1 O?Roak et al TRUE TRUE FALSE 0.4411469.s1 O?Roak et al TRUE TRUE FALSE 0.4311472.fa O?Roak et al TRUE TRUE FALSE 0.4811472.mo O?Roak et al TRUE TRUE FALSE 0.4211472.p1 O?Roak et al TRUE TRUE FALSE 0.4411472.s1 O?Roak et al TRUE TRUE FALSE 0.4411474.fa Sanders et al TRUE TRUE FALSE 0.3411474.mo Sanders et al TRUE TRUE FALSE 0.3111474.p1 Sanders et al TRUE TRUE FALSE 0.2611474.s1 Sanders et al TRUE TRUE FALSE 0.2411479.fa O?Roak et al TRUE TRUE FALSE 0.3711479.mo O?Roak et al TRUE TRUE FALSE 0.4811479.p1 O?Roak et al TRUE TRUE FALSE 0.3911479.s1 O?Roak et al TRUE TRUE FALSE 0.3811484.fa Sanders et al TRUE TRUE FALSE 0.2511484.mo Sanders et al TRUE TRUE FALSE 0.2511484.p1 Sanders et al TRUE TRUE FALSE 0.2611484.s1 Sanders et al TRUE TRUE TRUE 0.2411490.fa Sanders et al TRUE TRUE FALSE 0.2911490.mo Sanders et al TRUE TRUE FALSE 0.3011490.p1 Sanders et al TRUE TRUE FALSE 0.2511490.s1 Sanders et al TRUE TRUE FALSE 0.4311491.fa O?Roak et al TRUE TRUE FALSE 0.3411491.mo O?Roak et al TRUE TRUE FALSE 0.3711491.p1 O?Roak et al TRUE TRUE FALSE 0.3811491.s1 O?Roak et al TRUE TRUE FALSE 0.3511501.fa Sanders et al TRUE TRUE FALSE 0.2811501.mo Sanders et al TRUE TRUE FALSE 0.2711501.p1 Sanders et al TRUE TRUE FALSE 0.2711501.s1 Sanders et al TRUE TRUE FALSE 0.2611509.fa Sanders et al TRUE TRUE FALSE 0.2611509.mo Sanders et al TRUE TRUE FALSE 0.2511509.p1 Sanders et al TRUE TRUE FALSE 0.2511509.s1 Sanders et al TRUE TRUE FALSE 0.2611519.fa Sanders et al TRUE TRUE FALSE 0.2711519.mo Sanders et al TRUE TRUE FALSE 0.2611519.p1 Sanders et al TRUE TRUE TRUE 0.2911519.s1 Sanders et al TRUE TRUE FALSE 0.3411524.fa Sanders et al TRUE TRUE FALSE 0.2911524.mo Sanders et al TRUE TRUE FALSE 0.2911524.p1 Sanders et al TRUE TRUE FALSE 0.2611524.s1 Sanders et al TRUE TRUE FALSE 0.2711532.fa Sanders et al TRUE TRUE FALSE 0.3011532.mo Sanders et al TRUE TRUE FALSE 
0.3011532.p1 Sanders et al TRUE TRUE TRUE 0.3011532.s1 Sanders et al TRUE TRUE FALSE 0.2611551.fa Sanders et al TRUE TRUE FALSE 0.2411551.mo Sanders et al TRUE TRUE FALSE 0.2811551.p1 Sanders et al TRUE TRUE TRUE 0.2511551.s1 Sanders et al TRUE TRUE FALSE 0.2911561.fa Sanders et al TRUE TRUE FALSE 0.2611561.mo Sanders et al TRUE TRUE FALSE 0.2911561.p1 Sanders et al TRUE TRUE FALSE 0.2311561.s1 Sanders et al TRUE TRUE FALSE 0.3511569.fa O?Roak et al TRUE TRUE FALSE 0.3011569.mo O?Roak et al TRUE TRUE FALSE 0.3811569.p1 O?Roak et al TRUE TRUE TRUE 0.3911569.s1 O?Roak et al TRUE TRUE FALSE 0.3511571.fa O?Roak et al TRUE TRUE FALSE 0.4111571.mo O?Roak et al TRUE TRUE FALSE 0.4211571.p1 O?Roak et al TRUE TRUE FALSE 0.4111571.s1 O?Roak et al TRUE TRUE FALSE 0.3811581.fa Sanders et al TRUE TRUE FALSE 0.2511581.mo Sanders et al TRUE TRUE FALSE 0.2811581.p1 Sanders et al TRUE TRUE FALSE 0.2711581.s1 Sanders et al TRUE TRUE FALSE 0.2611610.fa O?Roak et al TRUE TRUE FALSE 0.3811610.mo O?Roak et al TRUE TRUE FALSE 0.4111610.p1 O?Roak et al TRUE TRUE FALSE 0.3511610.s1 O?Roak et al TRUE TRUE FALSE 0.3811611.fa Sanders et al TRUE TRUE FALSE 0.2911611.mo Sanders et al TRUE TRUE FALSE 0.2911611.p1 Sanders et al TRUE TRUE FALSE 0.2711611.s1 Sanders et al TRUE FALSE FALSE 0.3211622.fa Sanders et al TRUE TRUE FALSE 0.2711622.mo Sanders et al TRUE TRUE FALSE 0.2511622.p1 Sanders et al TRUE TRUE FALSE 0.3011622.s1 Sanders et al TRUE TRUE FALSE 0.2411629.fa O?Roak et al TRUE TRUE FALSE 0.3011629.mo O?Roak et al TRUE TRUE FALSE 0.4011629.p1 O?Roak et al TRUE TRUE FALSE 0.3511629.s1 O?Roak et al TRUE TRUE FALSE 0.3811638.fa O?Roak et al TRUE TRUE FALSE 0.4811638.mo O?Roak et al TRUE TRUE FALSE 0.3711638.p1 O?Roak et al TRUE TRUE FALSE 0.3711638.s1 O?Roak et al TRUE TRUE FALSE 0.39
11641.fa Sanders et al TRUE TRUE FALSE 0.2711641.mo Sanders et al TRUE TRUE FALSE 0.2411641.p1 Sanders et al TRUE TRUE FALSE 0.3311641.s1 Sanders et al TRUE TRUE FALSE 0.2411654.fa Sanders et al TRUE TRUE FALSE 0.3511654.mo Sanders et al TRUE TRUE FALSE 0.3011654.p1 Sanders et al TRUE TRUE FALSE 0.3211654.s1 Sanders et al TRUE TRUE FALSE 0.2711659.fa O?Roak et al TRUE TRUE FALSE 0.3711659.mo O?Roak et al TRUE TRUE FALSE 0.3811659.p1 O?Roak et al TRUE TRUE FALSE 0.4011659.s1 O?Roak et al TRUE TRUE TRUE 0.3211667.fa Sanders et al TRUE TRUE FALSE 0.3411667.mo Sanders et al TRUE TRUE FALSE 0.3311667.p1 Sanders et al TRUE TRUE TRUE 0.3411667.s1 Sanders et al TRUE TRUE TRUE 0.2711676.fa Sanders et al TRUE TRUE FALSE 0.2211676.mo Sanders et al TRUE TRUE FALSE 0.2211676.p1 Sanders et al TRUE TRUE FALSE 0.2211676.s1 Sanders et al TRUE TRUE FALSE 0.2611691.fa O?Roak et al TRUE TRUE FALSE 0.3611691.mo O?Roak et al TRUE TRUE FALSE 0.4011691.p1 O?Roak et al TRUE TRUE FALSE 0.3711691.s1 O?Roak et al TRUE TRUE FALSE 0.3711696.fa O?Roak et al TRUE TRUE FALSE 0.4711696.mo O?Roak et al TRUE TRUE FALSE 0.3511696.p1 O?Roak et al TRUE TRUE FALSE 0.4411696.s1 O?Roak et al TRUE TRUE TRUE 0.3911700.fa Sanders et al TRUE TRUE FALSE 0.2611700.mo Sanders et al TRUE TRUE FALSE 0.2711700.p1 Sanders et al TRUE TRUE FALSE 0.2611700.s1 Sanders et al TRUE TRUE FALSE 0.2711711.fa O?Roak et al TRUE TRUE FALSE 0.4011711.mo O?Roak et al TRUE TRUE FALSE 0.3911711.p1 O?Roak et al TRUE TRUE FALSE 0.4011711.s1 O?Roak et al TRUE TRUE FALSE 0.3811715.fa O?Roak et al TRUE TRUE FALSE 0.4011715.mo O?Roak et al TRUE TRUE FALSE 0.3611715.p1 O?Roak et al TRUE TRUE FALSE 0.4211715.s1 O?Roak et al TRUE TRUE TRUE 0.3811716.fa Sanders et al TRUE TRUE FALSE 0.2711716.mo Sanders et al TRUE TRUE FALSE 0.2611716.p1 Sanders et al TRUE TRUE FALSE 0.2611716.s1 Sanders et al TRUE TRUE FALSE 0.2811720.fa Sanders et al TRUE TRUE FALSE 0.2711720.mo Sanders et al TRUE TRUE FALSE 0.4211720.p1 Sanders et al TRUE TRUE FALSE 
0.3311720.s1 Sanders et al TRUE TRUE FALSE 0.3611722.fa O?Roak et al TRUE TRUE FALSE 0.3911722.mo O?Roak et al TRUE TRUE FALSE 0.3611722.p1 O?Roak et al TRUE TRUE FALSE 0.4011722.s1 O?Roak et al TRUE TRUE FALSE 0.3711724.fa Sanders et al TRUE TRUE FALSE 0.4711724.mo Sanders et al TRUE TRUE FALSE 0.3811724.p1 Sanders et al TRUE TRUE FALSE 0.3611724.s1 Sanders et al TRUE TRUE FALSE 0.2911740.fa Sanders et al TRUE TRUE FALSE 0.2511740.mo Sanders et al TRUE TRUE FALSE 0.2611740.p1 Sanders et al TRUE TRUE FALSE 0.2511740.s1 Sanders et al TRUE TRUE FALSE 0.2711766.fa Sanders et al TRUE TRUE FALSE 0.2511766.mo Sanders et al TRUE TRUE FALSE 0.2711766.p1 Sanders et al TRUE TRUE FALSE 0.2911766.s1 Sanders et al TRUE TRUE FALSE 0.2811773.fa O?Roak et al TRUE TRUE FALSE 0.4411773.mo O?Roak et al TRUE TRUE FALSE 0.3911773.p1 O?Roak et al TRUE TRUE FALSE 0.3311773.s1 O?Roak et al TRUE TRUE FALSE 0.3411788.fa O?Roak et al TRUE TRUE FALSE 0.3811788.mo O?Roak et al TRUE TRUE FALSE 0.3711788.p1 O?Roak et al TRUE TRUE FALSE 0.4211788.s1 O?Roak et al TRUE TRUE TRUE 0.3711797.fa Sanders et al TRUE TRUE FALSE 0.2811797.mo Sanders et al TRUE TRUE FALSE 0.2711797.p1 Sanders et al TRUE TRUE FALSE 0.2611797.s1 Sanders et al TRUE TRUE FALSE 0.2611808.fa Sanders et al TRUE TRUE FALSE 0.3511808.mo Sanders et al TRUE TRUE FALSE 0.3711808.p1 Sanders et al TRUE TRUE FALSE 0.3411808.s1 Sanders et al TRUE TRUE FALSE 0.4111809.fa Sanders et al TRUE TRUE FALSE 0.3711809.mo Sanders et al TRUE TRUE FALSE 0.3111809.p1 Sanders et al TRUE TRUE FALSE 0.2911809.s1 Sanders et al TRUE TRUE FALSE 0.3411810.fa Sanders et al TRUE TRUE FALSE 0.3111810.mo Sanders et al TRUE TRUE FALSE 0.3011810.p1 Sanders et al TRUE TRUE FALSE 0.2411810.s1 Sanders et al TRUE TRUE FALSE 0.4411824.fa Sanders et al TRUE TRUE FALSE 0.30
11824.mo Sanders et al TRUE TRUE FALSE 0.3111824.p1 Sanders et al TRUE TRUE FALSE 0.3011824.s1 Sanders et al TRUE FALSE FALSE 0.3911828.fa Sanders et al TRUE TRUE FALSE 0.2711828.mo Sanders et al TRUE TRUE FALSE 0.2711828.p1 Sanders et al TRUE TRUE TRUE 0.2611828.s1 Sanders et al TRUE TRUE FALSE 0.2711872.fa O?Roak et al TRUE TRUE FALSE 0.3611872.mo O?Roak et al TRUE TRUE FALSE 0.3511872.p1 O?Roak et al TRUE TRUE TRUE 0.3511872.s1 O?Roak et al TRUE TRUE FALSE 0.3311892.fa Sanders et al TRUE TRUE FALSE 0.3011892.mo Sanders et al TRUE TRUE FALSE 0.3111892.p1 Sanders et al TRUE TRUE FALSE 0.3411892.s1 Sanders et al TRUE TRUE FALSE 0.2711894.fa Iossifov et al TRUE TRUE FALSE 0.4411894.mo Iossifov et al TRUE TRUE FALSE 0.5411894.p1 Iossifov et al TRUE TRUE FALSE 0.5411894.s1 Iossifov et al TRUE TRUE FALSE 0.5511895.fa O?Roak et al TRUE TRUE FALSE 0.4011895.mo O?Roak et al TRUE TRUE FALSE 0.4111895.p1 O?Roak et al TRUE TRUE FALSE 0.4811895.s1 O?Roak et al TRUE TRUE TRUE 0.3411905.fa Sanders et al TRUE TRUE FALSE 0.3211905.mo Sanders et al TRUE TRUE FALSE 0.3311905.p1 Sanders et al TRUE TRUE FALSE 0.3811905.s1 Sanders et al TRUE TRUE FALSE 0.3911942.fa O?Roak et al TRUE TRUE FALSE 0.3411942.mo O?Roak et al TRUE TRUE FALSE 0.4011942.p1 O?Roak et al TRUE TRUE TRUE 0.3411942.s1 O?Roak et al TRUE TRUE FALSE 0.3711959.fa O?Roak et al TRUE TRUE FALSE 0.3811959.mo O?Roak et al TRUE TRUE FALSE 0.3911959.p1 O?Roak et al TRUE TRUE FALSE 0.3411959.s1 O?Roak et al TRUE TRUE FALSE 0.3711962.fa Sanders et al TRUE TRUE FALSE 0.2811962.mo Sanders et al TRUE TRUE FALSE 0.2711962.p1 Sanders et al TRUE TRUE FALSE 0.2611962.s1 Sanders et al TRUE TRUE FALSE 0.2711964.fa O?Roak et al TRUE TRUE FALSE 0.4611964.mo O?Roak et al TRUE TRUE FALSE 0.4111964.p1 O?Roak et al TRUE TRUE FALSE 0.4311964.s1 O?Roak et al TRUE TRUE FALSE 0.3712011.fa O?Roak et al TRUE TRUE FALSE 0.3312011.mo O?Roak et al TRUE TRUE FALSE 0.4012011.p1 O?Roak et al TRUE TRUE FALSE 0.3912011.s1 O?Roak et al TRUE TRUE TRUE 
0.3812051.fa Iossifov et al TRUE TRUE FALSE 0.5012051.mo Iossifov et al TRUE TRUE FALSE 0.5612051.p1 Iossifov et al TRUE TRUE FALSE 0.5512051.s1 Iossifov et al TRUE TRUE FALSE 0.5812100.fa Sanders et al TRUE TRUE FALSE 0.3212100.mo Sanders et al TRUE TRUE FALSE 0.3012100.p1 Sanders et al TRUE TRUE FALSE 0.3212100.s1 Sanders et al TRUE TRUE FALSE 0.4012106.fa O?Roak et al TRUE TRUE FALSE 0.3912106.mo O?Roak et al TRUE TRUE FALSE 0.3312106.p1 O?Roak et al TRUE TRUE FALSE 0.3312106.s1 O?Roak et al TRUE TRUE TRUE 0.3812152.fa O?Roak et al TRUE TRUE FALSE 0.3712152.mo O?Roak et al TRUE TRUE FALSE 0.3712152.p1 O?Roak et al TRUE TRUE FALSE 0.3312152.s1 O?Roak et al TRUE TRUE FALSE 0.3712153.fa O?Roak et al TRUE TRUE FALSE 0.3812153.mo O?Roak et al TRUE TRUE FALSE 0.4212153.p1 O?Roak et al TRUE TRUE FALSE 0.3712153.s1 O?Roak et al TRUE TRUE FALSE 0.3112161.fa O?Roak et al TRUE TRUE FALSE 0.3112161.mo O?Roak et al TRUE TRUE FALSE 0.3512161.p1 O?Roak et al TRUE TRUE FALSE 0.4412161.s1 O?Roak et al TRUE TRUE TRUE 0.3712162.fa Sanders et al TRUE TRUE FALSE 0.2912162.mo Sanders et al TRUE TRUE FALSE 0.3312162.p1 Sanders et al TRUE TRUE FALSE 0.3012162.s1 Sanders et al TRUE TRUE FALSE 0.2612175.fa Sanders et al TRUE TRUE FALSE 0.3112175.mo Sanders et al TRUE TRUE FALSE 0.2812175.p1 Sanders et al TRUE TRUE FALSE 0.3012175.s1 Sanders et al TRUE TRUE FALSE 0.2712187.fa Sanders et al TRUE TRUE FALSE 0.3212187.mo Sanders et al TRUE TRUE FALSE 0.3412187.p1 Sanders et al TRUE TRUE FALSE 0.3212187.s1 Sanders et al TRUE TRUE FALSE 0.2812224.fa Sanders et al TRUE TRUE FALSE 0.2912224.mo Sanders et al TRUE TRUE FALSE 0.2912224.p1 Sanders et al TRUE TRUE FALSE 0.2812224.s1 Sanders et al TRUE TRUE FALSE 0.2712228.fa Sanders et al TRUE TRUE FALSE 0.2812228.mo Sanders et al TRUE TRUE FALSE 0.30
12228.p1 Sanders et al TRUE TRUE TRUE 0.2512228.s1 Sanders et al TRUE TRUE FALSE 0.2912233.fa O?Roak et al TRUE TRUE FALSE 0.4012233.mo O?Roak et al TRUE TRUE FALSE 0.3912233.p1 O?Roak et al TRUE TRUE FALSE 0.3612233.s1 O?Roak et al TRUE TRUE FALSE 0.3412235.fa Sanders et al TRUE TRUE FALSE 0.2412235.mo Sanders et al TRUE TRUE FALSE 0.2812235.p1 Sanders et al TRUE TRUE FALSE 0.2612235.s1 Sanders et al TRUE TRUE FALSE 0.2212241.fa Sanders et al TRUE TRUE FALSE 0.3212241.mo Sanders et al TRUE TRUE FALSE 0.3112241.p1 Sanders et al TRUE TRUE FALSE 0.3012241.s1 Sanders et al TRUE TRUE FALSE 0.2912243.fa Iossifov et al TRUE TRUE FALSE 0.5512243.mo Iossifov et al TRUE TRUE FALSE 0.6012243.p1 Iossifov et al TRUE TRUE FALSE 0.5712243.s1 Iossifov et al TRUE TRUE FALSE 0.5912252.fa Iossifov et al TRUE TRUE FALSE 0.5012252.mo Iossifov et al TRUE TRUE FALSE 0.5012252.p1 Iossifov et al TRUE TRUE FALSE 0.5812252.s1 Iossifov et al TRUE TRUE TRUE 0.5212285.fa O?Roak et al TRUE FALSE FALSE 0.4012285.mo O?Roak et al TRUE FALSE FALSE 0.3612285.p1 O?Roak et al TRUE FALSE FALSE 0.4912285.s1 O?Roak et al TRUE FALSE FALSE 0.3012295.fa Sanders et al TRUE TRUE FALSE 0.3712295.mo Sanders et al TRUE TRUE FALSE 0.3512295.p1 Sanders et al TRUE TRUE FALSE 0.3112295.s1 Sanders et al TRUE TRUE FALSE 0.3512297.fa Sanders et al TRUE TRUE FALSE 0.2512297.mo Sanders et al TRUE TRUE FALSE 0.2512297.p1 Sanders et al TRUE TRUE FALSE 0.2112297.s1 Sanders et al TRUE TRUE FALSE 0.2312301.fa Iossifov et al TRUE TRUE FALSE 0.5512301.mo Iossifov et al TRUE TRUE FALSE 0.5512301.p1 Iossifov et al TRUE TRUE FALSE 0.4812301.s1 Iossifov et al TRUE TRUE FALSE 0.4812303.fa Sanders et al TRUE TRUE FALSE 0.3512303.mo Sanders et al TRUE TRUE FALSE 0.2312303.p1 Sanders et al TRUE TRUE FALSE 0.2412303.s1 Sanders et al TRUE TRUE FALSE 0.2612304.fa O?Roak et al TRUE TRUE FALSE 0.3712304.mo O?Roak et al TRUE TRUE FALSE 0.4212304.p1 O?Roak et al TRUE TRUE TRUE 0.4012304.s1 O?Roak et al TRUE TRUE FALSE 0.3512308.fa Sanders et 
al TRUE TRUE FALSE 0.2912308.mo Sanders et al TRUE TRUE FALSE 0.3012308.p1 Sanders et al TRUE TRUE FALSE 0.3112308.s1 Sanders et al TRUE TRUE FALSE 0.4012313.fa Iossifov et al TRUE TRUE FALSE 0.4812313.mo Iossifov et al TRUE TRUE FALSE 0.5012313.p1 Iossifov et al TRUE TRUE FALSE 0.5312313.s1 Iossifov et al TRUE TRUE FALSE 0.5812317.fa Sanders et al TRUE TRUE FALSE 0.2312317.mo Sanders et al TRUE TRUE FALSE 0.2712317.p1 Sanders et al TRUE TRUE FALSE 0.2312317.s1 Sanders et al TRUE TRUE TRUE 0.2512321.fa Iossifov et al TRUE TRUE FALSE 0.4912321.mo Iossifov et al TRUE TRUE FALSE 0.5012321.p1 Iossifov et al TRUE TRUE FALSE 0.5112321.s1 Iossifov et al TRUE TRUE FALSE 0.5412327.fa Sanders et al TRUE TRUE FALSE 0.3612327.mo Sanders et al TRUE TRUE FALSE 0.3312327.p1 Sanders et al TRUE TRUE FALSE 0.3112327.s1 Sanders et al TRUE TRUE FALSE 0.5012334.fa Iossifov et al TRUE TRUE FALSE 0.4112334.mo Iossifov et al TRUE TRUE FALSE 0.4812334.p1 Iossifov et al TRUE TRUE FALSE 0.4012334.s1 Iossifov et al TRUE TRUE FALSE 0.4112340.fa Sanders et al TRUE TRUE FALSE 0.3812340.mo Sanders et al TRUE TRUE FALSE 0.3812340.p1 Sanders et al TRUE TRUE FALSE 0.4312340.s1 Sanders et al TRUE TRUE FALSE 0.3412343.fa Sanders et al TRUE TRUE FALSE 0.3512343.mo Sanders et al TRUE TRUE FALSE 0.3912343.p1 Sanders et al TRUE TRUE FALSE 0.4012343.s1 Sanders et al TRUE TRUE FALSE 0.3712345.fa Sanders et al TRUE TRUE FALSE 0.2412345.mo Sanders et al TRUE TRUE FALSE 0.2312345.p1 Sanders et al TRUE TRUE FALSE 0.2612345.s1 Sanders et al TRUE TRUE FALSE 0.2412358.fa O?Roak et al TRUE TRUE FALSE 0.4612358.mo O?Roak et al TRUE TRUE FALSE 0.3912358.p1 O?Roak et al TRUE TRUE FALSE 0.4312358.s1 O?Roak et al TRUE TRUE FALSE 0.3812360.fa Iossifov et al TRUE TRUE FALSE 0.6012360.mo Iossifov et al TRUE TRUE FALSE 0.5412360.p1 Iossifov et al TRUE TRUE FALSE 0.57
12360.s1 Iossifov et al TRUE TRUE FALSE 0.6112368.fa Sanders et al TRUE TRUE FALSE 0.3112368.mo Sanders et al TRUE TRUE FALSE 0.2912368.p1 Sanders et al TRUE TRUE FALSE 0.3312368.s1 Sanders et al TRUE TRUE FALSE 0.2812370.fa Sanders et al TRUE TRUE FALSE 0.2712370.mo Sanders et al TRUE TRUE FALSE 0.2912370.p1 Sanders et al TRUE TRUE TRUE 0.3012370.s1 Sanders et al TRUE TRUE TRUE 0.2712373.fa O?Roak et al TRUE TRUE FALSE 0.3512373.mo O?Roak et al TRUE TRUE FALSE 0.3312373.p1 O?Roak et al TRUE TRUE FALSE 0.4412373.s1 O?Roak et al TRUE TRUE FALSE 0.3612375.fa Sanders et al TRUE TRUE FALSE 0.2812375.mo Sanders et al TRUE TRUE FALSE 0.3112375.p1 Sanders et al TRUE TRUE FALSE 0.3012375.s1 Sanders et al TRUE TRUE FALSE 0.2812383.fa Sanders et al TRUE TRUE FALSE 0.3012383.mo Sanders et al TRUE TRUE FALSE 0.3012383.p1 Sanders et al TRUE TRUE TRUE 0.3212383.s1 Sanders et al TRUE TRUE TRUE 0.2912390.fa O?Roak et al TRUE TRUE FALSE 0.3912390.mo O?Roak et al TRUE TRUE FALSE 0.3012390.p1 O?Roak et al TRUE TRUE FALSE 0.3712390.s1 O?Roak et al TRUE TRUE TRUE 0.3912394.fa Iossifov et al TRUE FALSE FALSE 0.5012394.mo Iossifov et al TRUE FALSE FALSE 0.5012394.p1 Iossifov et al TRUE FALSE FALSE 0.4512394.s1 Iossifov et al TRUE FALSE FALSE 0.4712396.fa Iossifov et al TRUE TRUE FALSE 0.6012396.mo Iossifov et al TRUE TRUE FALSE 0.5012396.p1 Iossifov et al TRUE TRUE FALSE 0.4512396.s1 Iossifov et al TRUE TRUE FALSE 0.4712403.fa Sanders et al TRUE TRUE FALSE 0.4012403.mo Sanders et al TRUE TRUE FALSE 0.3612403.p1 Sanders et al TRUE TRUE FALSE 0.3712403.s1 Sanders et al TRUE TRUE FALSE 0.3512409.fa Iossifov et al TRUE TRUE FALSE 0.4412409.mo Iossifov et al TRUE TRUE FALSE 0.4512409.p1 Iossifov et al TRUE TRUE FALSE 0.4212409.s1 Iossifov et al TRUE TRUE TRUE 0.4812412.fa Iossifov et al TRUE FALSE FALSE 0.4912412.mo Iossifov et al TRUE FALSE FALSE 0.5012412.p1 Iossifov et al TRUE FALSE FALSE 0.4612412.s1 Iossifov et al TRUE FALSE FALSE 0.4612420.fa Iossifov et al TRUE TRUE FALSE 0.5012420.mo 
Iossifov et al TRUE TRUE FALSE 0.56
12420.p1 Iossifov et al TRUE TRUE FALSE 0.44
12420.s1 Iossifov et al TRUE TRUE FALSE 0.47
12424.fa Iossifov et al TRUE TRUE FALSE 0.46
12424.mo Iossifov et al TRUE TRUE FALSE 0.38
12424.p1 Iossifov et al TRUE TRUE TRUE 0.46
12424.s1 Iossifov et al TRUE TRUE FALSE 0.48
12438.fa Iossifov et al TRUE TRUE FALSE 0.49
12438.mo Iossifov et al TRUE TRUE FALSE 0.50
12438.p1 Iossifov et al TRUE TRUE FALSE 0.51
12438.s1 Iossifov et al TRUE TRUE FALSE 0.53
12441.fa Iossifov et al TRUE TRUE FALSE 0.38
12441.mo Iossifov et al TRUE TRUE FALSE 0.41
12441.p1 Iossifov et al TRUE TRUE TRUE 0.39
12441.s1 Iossifov et al TRUE TRUE FALSE 0.44
12445.fa Iossifov et al TRUE TRUE FALSE 0.45
12445.mo Iossifov et al TRUE TRUE FALSE 0.53
12445.p1 Iossifov et al TRUE TRUE FALSE 0.41
12445.s1 Iossifov et al TRUE TRUE FALSE 0.42
12460.fa Iossifov et al TRUE TRUE FALSE 0.53
12460.mo Iossifov et al TRUE TRUE FALSE 0.42
12460.p1 Iossifov et al TRUE TRUE FALSE 0.46
12460.s1 Iossifov et al TRUE TRUE FALSE 0.50
12462.fa Iossifov et al TRUE TRUE FALSE 0.61
12462.mo Iossifov et al TRUE TRUE FALSE 0.55
12462.p1 Iossifov et al TRUE TRUE FALSE 0.50
12462.s1 Iossifov et al TRUE TRUE FALSE 0.59
12463.fa Iossifov et al TRUE TRUE FALSE 0.51
12463.mo Iossifov et al TRUE TRUE FALSE 0.46
12463.p1 Iossifov et al TRUE TRUE FALSE 0.59
12463.s1 Iossifov et al TRUE TRUE FALSE 0.58
12467.fa Iossifov et al TRUE FALSE FALSE 0.48
12467.mo Iossifov et al TRUE FALSE FALSE 0.50
12467.p1 Iossifov et al TRUE FALSE FALSE 0.47
12467.s1 Iossifov et al TRUE FALSE FALSE 0.42
12473.fa Iossifov et al TRUE FALSE FALSE 0.33
12473.mo Iossifov et al TRUE FALSE FALSE 0.45
12473.p1 Iossifov et al TRUE FALSE FALSE 0.30
12473.s1 Iossifov et al TRUE FALSE FALSE 0.30
12480.fa Iossifov et al FALSE TRUE FALSE 0.46
12480.mo Iossifov et al FALSE TRUE FALSE 0.53
12480.p1 Iossifov et al FALSE TRUE FALSE 0.43
12480.s1 Iossifov et al FALSE FALSE FALSE 0.41
12481.fa Iossifov et al TRUE TRUE FALSE 0.43
12481.mo Iossifov et al TRUE TRUE FALSE 0.49
12481.p1 Iossifov et al TRUE TRUE TRUE 0.48
12481.s1 Iossifov et al TRUE TRUE FALSE 0.45
12498.fa Iossifov et al TRUE TRUE FALSE 0.38
12498.mo Iossifov et al TRUE TRUE FALSE 0.40
12498.p1 Iossifov et al TRUE TRUE FALSE 0.44
12498.s1 Iossifov et al TRUE TRUE FALSE 0.40
12507.fa Sanders et al TRUE TRUE FALSE 0.28
12507.mo Sanders et al TRUE TRUE FALSE 0.27
12507.p1 Sanders et al TRUE TRUE FALSE 0.29
12507.s1 Sanders et al TRUE TRUE FALSE 0.34
12510.fa Iossifov et al TRUE TRUE FALSE 0.42
12510.mo Iossifov et al TRUE TRUE FALSE 0.49
12510.p1 Iossifov et al TRUE TRUE TRUE 0.49
12510.s1 Iossifov et al TRUE TRUE TRUE 0.53
12512.fa Sanders et al TRUE TRUE FALSE 0.34
12512.mo Sanders et al TRUE TRUE FALSE 0.32
12512.p1 Sanders et al TRUE TRUE FALSE 0.40
12512.s1 Sanders et al TRUE TRUE FALSE 0.35
12515.fa Iossifov et al TRUE TRUE FALSE 0.50
12515.mo Iossifov et al TRUE TRUE FALSE 0.54
12515.p1 Iossifov et al TRUE TRUE FALSE 0.49
12515.s1 Iossifov et al TRUE TRUE FALSE 0.51
12518.fa Iossifov et al TRUE TRUE FALSE 0.47
12518.mo Iossifov et al TRUE TRUE FALSE 0.55
12518.p1 Iossifov et al TRUE TRUE FALSE 0.46
12518.s1 Iossifov et al TRUE TRUE FALSE 0.48
12522.fa Sanders et al TRUE FALSE FALSE 0.19
12522.mo Sanders et al TRUE FALSE FALSE 0.19
12522.p1 Sanders et al TRUE FALSE FALSE 0.24
12522.s1 Sanders et al TRUE FALSE FALSE 0.24
12523.fa Iossifov et al TRUE TRUE FALSE 0.54
12523.mo Iossifov et al TRUE TRUE FALSE 0.56
12523.p1 Iossifov et al TRUE TRUE FALSE 0.44
12523.s1 Iossifov et al TRUE TRUE FALSE 0.58
12524.fa Sanders et al TRUE TRUE FALSE 0.29
12524.mo Sanders et al TRUE TRUE FALSE 0.29
12524.p1 Sanders et al TRUE TRUE FALSE 0.30
12524.s1 Sanders et al TRUE TRUE FALSE 0.28
12526.fa Iossifov et al TRUE FALSE FALSE 0.59
12526.mo Iossifov et al TRUE FALSE FALSE 0.60
12526.p1 Iossifov et al TRUE FALSE FALSE 0.58
12526.s1 Iossifov et al TRUE FALSE FALSE 0.59
12534.fa Sanders et al TRUE TRUE FALSE 0.36
12534.mo Sanders et al TRUE TRUE FALSE 0.38
12534.p1 Sanders et al TRUE TRUE FALSE 0.37
12534.s1 Sanders et al TRUE TRUE FALSE 0.34
12536.fa Sanders et al TRUE TRUE FALSE 0.36
12536.mo Sanders et al TRUE TRUE FALSE 0.37
12536.p1 Sanders et al TRUE TRUE FALSE 0.35
12536.s1 Sanders et al TRUE TRUE FALSE 0.45
12552.fa Sanders et al TRUE TRUE FALSE 0.36
12552.mo Sanders et al TRUE TRUE FALSE 0.38
12552.p1 Sanders et al TRUE TRUE FALSE 0.35
12552.s1 Sanders et al TRUE TRUE TRUE 0.40
12561.fa Sanders et al TRUE TRUE FALSE 0.27
12561.mo Sanders et al TRUE TRUE FALSE 0.21
12561.p1 Sanders et al TRUE TRUE FALSE 0.21
12561.s1 Sanders et al TRUE TRUE TRUE 0.22
12578.fa O'Roak et al TRUE TRUE FALSE 0.37
12578.mo O'Roak et al TRUE TRUE FALSE 0.38
12578.p1 O'Roak et al TRUE TRUE FALSE 0.35
12578.s1 O'Roak et al TRUE TRUE FALSE 0.36
12579.fa Iossifov et al TRUE TRUE FALSE 0.58
12579.mo Iossifov et al TRUE TRUE FALSE 0.52
12579.p1 Iossifov et al TRUE TRUE FALSE 0.55
12579.s1 Iossifov et al TRUE TRUE FALSE 0.60
12581.fa O'Roak et al TRUE TRUE FALSE 0.45
12581.mo O'Roak et al TRUE TRUE FALSE 0.46
12581.p1 O'Roak et al TRUE TRUE FALSE 0.44
12581.s1 O'Roak et al TRUE TRUE FALSE 0.38
12582.fa Iossifov et al TRUE TRUE FALSE 0.60
12582.mo Iossifov et al TRUE TRUE FALSE 0.59
12582.p1 Iossifov et al TRUE TRUE TRUE 0.55
12582.s1 Iossifov et al TRUE TRUE FALSE 0.53
12588.fa Iossifov et al TRUE TRUE FALSE 0.54
12588.mo Iossifov et al TRUE TRUE FALSE 0.53
12588.p1 Iossifov et al TRUE TRUE FALSE 0.57
12588.s1 Iossifov et al TRUE TRUE TRUE 0.57
12616.fa Sanders et al TRUE TRUE FALSE 0.23
12616.mo Sanders et al TRUE TRUE FALSE 0.21
12616.p1 Sanders et al TRUE TRUE FALSE 0.22
12616.s1 Sanders et al TRUE TRUE FALSE 0.22
12618.fa Iossifov et al TRUE TRUE FALSE 0.48
12618.mo Iossifov et al TRUE TRUE FALSE 0.54
12618.p1 Iossifov et al TRUE TRUE TRUE 0.55
12618.s1 Iossifov et al TRUE TRUE FALSE 0.55
12620.fa Iossifov et al TRUE FALSE FALSE 0.54
12620.mo Iossifov et al TRUE FALSE FALSE 0.54
12620.p1 Iossifov et al TRUE FALSE FALSE 0.51
12620.s1 Iossifov et al TRUE FALSE FALSE 0.49
12626.fa Iossifov et al TRUE TRUE FALSE 0.50
12626.mo Iossifov et al TRUE TRUE FALSE 0.52
12626.p1 Iossifov et al TRUE TRUE FALSE 0.52
12626.s1 Iossifov et al TRUE TRUE FALSE 0.52
12628.fa Iossifov et al TRUE TRUE FALSE 0.52
12628.mo Iossifov et al TRUE TRUE FALSE 0.49
12628.p1 Iossifov et al TRUE TRUE FALSE 0.51
12628.s1 Iossifov et al TRUE TRUE FALSE 0.55
12630.fa O'Roak et al TRUE TRUE FALSE 0.39
12630.mo O'Roak et al TRUE TRUE FALSE 0.39
12630.p1 O'Roak et al TRUE TRUE FALSE 0.40
12630.s1 O'Roak et al TRUE TRUE FALSE 0.37
12631.fa Iossifov et al TRUE TRUE FALSE 0.50
12631.mo Iossifov et al TRUE TRUE FALSE 0.52
12631.p1 Iossifov et al TRUE TRUE FALSE 0.53
12631.s1 Iossifov et al TRUE TRUE TRUE 0.52
12633.fa Iossifov et al TRUE FALSE FALSE 0.47
12633.mo Iossifov et al TRUE FALSE FALSE 0.48
12633.p1 Iossifov et al TRUE FALSE FALSE 0.45
12633.s1 Iossifov et al TRUE FALSE FALSE 0.47
12637.fa Iossifov et al TRUE TRUE FALSE 0.40
12637.mo Iossifov et al TRUE TRUE FALSE 0.41
12637.p1 Iossifov et al TRUE TRUE TRUE 0.40
12637.s1 Iossifov et al TRUE TRUE FALSE 0.43
12638.fa Iossifov et al TRUE TRUE FALSE 0.45
12638.mo Iossifov et al TRUE TRUE FALSE 0.45
12638.p1 Iossifov et al TRUE TRUE FALSE 0.43
12638.s1 Iossifov et al TRUE TRUE FALSE 0.47
12642.fa Iossifov et al TRUE TRUE FALSE 0.56
12642.mo Iossifov et al TRUE TRUE FALSE 0.54
12642.p1 Iossifov et al TRUE TRUE FALSE 0.57
12642.s1 Iossifov et al TRUE TRUE FALSE 0.53
12644.fa Iossifov et al TRUE TRUE FALSE 0.54
12644.mo Iossifov et al TRUE TRUE FALSE 0.51
12644.p1 Iossifov et al TRUE TRUE FALSE 0.50
12644.s1 Iossifov et al TRUE TRUE FALSE 0.49
12645.fa Iossifov et al TRUE TRUE FALSE 0.41
12645.mo Iossifov et al TRUE TRUE FALSE 0.43
12645.p1 Iossifov et al TRUE TRUE FALSE 0.41
12645.s1 Iossifov et al TRUE TRUE FALSE 0.42
12647.fa Sanders et al TRUE TRUE FALSE 0.31
12647.mo Sanders et al TRUE TRUE FALSE 0.26
12647.p1 Sanders et al TRUE TRUE FALSE 0.40
12647.s1 Sanders et al TRUE TRUE FALSE 0.39
12650.fa Sanders et al TRUE TRUE FALSE 0.24
12650.mo Sanders et al TRUE TRUE FALSE 0.21
12650.p1 Sanders et al TRUE TRUE FALSE 0.20
12650.s1 Sanders et al TRUE TRUE TRUE 0.26
12651.fa Sanders et al TRUE TRUE FALSE 0.31
12651.mo Sanders et al TRUE TRUE FALSE 0.30
12651.p1 Sanders et al TRUE TRUE FALSE 0.37
12651.s1 Sanders et al TRUE TRUE FALSE 0.28
12652.fa Iossifov et al TRUE TRUE FALSE 0.50
12652.mo Iossifov et al TRUE TRUE FALSE 0.55
12652.p1 Iossifov et al TRUE TRUE FALSE 0.54
12652.s1 Iossifov et al TRUE TRUE FALSE 0.54
12653.fa Iossifov et al TRUE TRUE FALSE 0.53
12653.mo Iossifov et al TRUE TRUE FALSE 0.49
12653.p1 Iossifov et al TRUE TRUE FALSE 0.49
12653.s1 Iossifov et al TRUE TRUE FALSE 0.53
12655.fa Iossifov et al TRUE TRUE FALSE 0.50
12655.mo Iossifov et al TRUE TRUE FALSE 0.58
12655.p1 Iossifov et al TRUE TRUE TRUE 0.50
12655.s1 Iossifov et al TRUE TRUE TRUE 0.46
12656.fa Sanders et al TRUE TRUE FALSE 0.37
12656.mo Sanders et al TRUE TRUE FALSE 0.35
12656.p1 Sanders et al TRUE TRUE FALSE 0.34
12656.s1 Sanders et al TRUE TRUE TRUE 0.33
12657.fa Sanders et al TRUE TRUE FALSE 0.31
12657.mo Sanders et al TRUE TRUE FALSE 0.35
12657.p1 Sanders et al TRUE TRUE FALSE 0.34
12657.s1 Sanders et al TRUE TRUE FALSE 0.40
12664.fa Iossifov et al TRUE TRUE FALSE 0.50
12664.mo Iossifov et al TRUE TRUE FALSE 0.52
12664.p1 Iossifov et al TRUE TRUE FALSE 0.46
12664.s1 Iossifov et al TRUE TRUE FALSE 0.56
12680.fa Sanders et al TRUE FALSE FALSE 0.33
12680.mo Sanders et al TRUE FALSE FALSE 0.32
12680.p1 Sanders et al TRUE FALSE FALSE 0.31
12680.s1 Sanders et al TRUE FALSE FALSE 0.44
12683.fa Iossifov et al TRUE FALSE FALSE 0.52
12683.mo Iossifov et al TRUE FALSE FALSE 0.54
12683.p1 Iossifov et al TRUE FALSE FALSE 0.58
12683.s1 Iossifov et al TRUE FALSE FALSE 0.53
12685.fa Sanders et al TRUE TRUE FALSE 0.32
12685.mo Sanders et al TRUE TRUE FALSE 0.29
12685.p1 Sanders et al TRUE TRUE FALSE 0.29
12685.s1 Sanders et al TRUE TRUE FALSE 0.32
12688.fa Iossifov et al TRUE FALSE FALSE 0.60
12688.mo Iossifov et al TRUE FALSE FALSE 0.55
12688.p1 Iossifov et al TRUE FALSE FALSE 0.57
12688.s1 Iossifov et al TRUE FALSE FALSE 0.56
12690.fa Sanders et al TRUE TRUE FALSE 0.21
12690.mo Sanders et al TRUE TRUE FALSE 0.23
12690.p1 Sanders et al TRUE TRUE FALSE 0.21
12690.s1 Sanders et al TRUE TRUE FALSE 0.22
12691.fa Iossifov et al TRUE TRUE FALSE 0.46
12691.mo Iossifov et al TRUE TRUE FALSE 0.48
12691.p1 Iossifov et al TRUE TRUE FALSE 0.48
12691.s1 Iossifov et al TRUE TRUE FALSE 0.49
12697.fa Iossifov et al TRUE FALSE FALSE 0.50
12697.mo Iossifov et al TRUE FALSE FALSE 0.50
12697.p1 Iossifov et al TRUE FALSE FALSE 0.46
12697.s1 Iossifov et al TRUE FALSE FALSE 0.52
12703.fa O'Roak et al TRUE TRUE FALSE 0.50
12703.mo O'Roak et al TRUE TRUE FALSE 0.40
12703.p1 O'Roak et al TRUE TRUE FALSE 0.39
12703.s1 O'Roak et al TRUE TRUE FALSE 0.31
12705.fa Iossifov et al TRUE FALSE FALSE 0.57
12705.mo Iossifov et al TRUE FALSE FALSE 0.55
12705.p1 Iossifov et al TRUE FALSE FALSE 0.57
12705.s1 Iossifov et al TRUE FALSE FALSE 0.55
12708.fa Iossifov et al TRUE TRUE FALSE 0.48
12708.mo Iossifov et al TRUE TRUE FALSE 0.50
12708.p1 Iossifov et al TRUE TRUE FALSE 0.47
12708.s1 Iossifov et al TRUE TRUE FALSE 0.53
12716.fa Iossifov et al TRUE FALSE FALSE 0.48
12716.mo Iossifov et al TRUE FALSE FALSE 0.43
12716.p1 Iossifov et al TRUE FALSE FALSE 0.43
12716.s1 Iossifov et al TRUE FALSE FALSE 0.43
12719.fa Iossifov et al TRUE FALSE FALSE 0.45
12719.mo Iossifov et al TRUE FALSE FALSE 0.46
12719.p1 Iossifov et al TRUE FALSE FALSE 0.45
12719.s1 Iossifov et al TRUE FALSE FALSE 0.46
12720.fa Iossifov et al TRUE FALSE FALSE 0.45
12720.mo Iossifov et al TRUE FALSE FALSE 0.46
12720.p1 Iossifov et al TRUE FALSE FALSE 0.44
12720.s1 Iossifov et al TRUE FALSE FALSE 0.49
12723.fa Iossifov et al TRUE TRUE FALSE 0.46
12723.mo Iossifov et al TRUE TRUE FALSE 0.43
12723.p1 Iossifov et al TRUE TRUE FALSE 0.43
12723.s1 Iossifov et al TRUE TRUE FALSE 0.44
12724.fa Iossifov et al TRUE FALSE FALSE 0.45
12724.mo Iossifov et al TRUE FALSE FALSE 0.42
12724.p1 Iossifov et al TRUE FALSE FALSE 0.44
12724.s1 Iossifov et al TRUE FALSE FALSE 0.41
12727.fa Iossifov et al TRUE FALSE FALSE 0.42
12727.mo Iossifov et al TRUE FALSE FALSE 0.41
12727.p1 Iossifov et al TRUE FALSE FALSE 0.43
12727.s1 Iossifov et al TRUE FALSE FALSE 0.43
12729.fa Sanders et al TRUE TRUE FALSE 0.34
12729.mo Sanders et al TRUE TRUE FALSE 0.37
12729.p1 Sanders et al TRUE TRUE FALSE 0.34
12729.s1 Sanders et al TRUE TRUE FALSE 0.27
12733.fa Iossifov et al TRUE FALSE FALSE 0.45
12733.mo Iossifov et al TRUE FALSE FALSE 0.50
12733.p1 Iossifov et al TRUE FALSE FALSE 0.45
12733.s1 Iossifov et al TRUE FALSE FALSE 0.47
12735.fa Iossifov et al TRUE TRUE FALSE 0.41
12735.mo Iossifov et al TRUE TRUE FALSE 0.42
12735.p1 Iossifov et al TRUE TRUE FALSE 0.41
12735.s1 Iossifov et al TRUE TRUE FALSE 0.41
12736.fa Sanders et al TRUE TRUE FALSE 0.31
12736.mo Sanders et al TRUE TRUE FALSE 0.31
12736.p1 Sanders et al TRUE TRUE FALSE 0.24
12736.s1 Sanders et al TRUE TRUE FALSE 0.41
12739.fa Iossifov et al TRUE TRUE FALSE 0.55
12739.mo Iossifov et al TRUE TRUE FALSE 0.58
12739.p1 Iossifov et al TRUE TRUE FALSE 0.57
12739.s1 Iossifov et al TRUE TRUE FALSE 0.59
12741.fa O'Roak et al TRUE TRUE FALSE 0.38
12741.mo O'Roak et al TRUE TRUE FALSE 0.38
12741.p1 O'Roak et al TRUE TRUE FALSE 0.40
12741.s1 O'Roak et al TRUE TRUE TRUE 0.40
12743.fa Iossifov et al TRUE TRUE FALSE 0.47
12743.mo Iossifov et al TRUE TRUE FALSE 0.49
12743.p1 Iossifov et al TRUE TRUE FALSE 0.47
12743.s1 Iossifov et al TRUE TRUE FALSE 0.49
12748.fa Iossifov et al TRUE TRUE FALSE 0.51
12748.mo Iossifov et al TRUE TRUE FALSE 0.53
12748.p1 Iossifov et al TRUE TRUE FALSE 0.55
12748.s1 Iossifov et al TRUE TRUE FALSE 0.57
12758.fa Iossifov et al TRUE TRUE FALSE 0.55
12758.mo Iossifov et al TRUE TRUE FALSE 0.54
12758.p1 Iossifov et al TRUE TRUE FALSE 0.54
12758.s1 Iossifov et al TRUE TRUE FALSE 0.55
12759.fa Iossifov et al TRUE TRUE FALSE 0.51
12759.mo Iossifov et al TRUE TRUE FALSE 0.53
12759.p1 Iossifov et al TRUE TRUE FALSE 0.50
12759.s1 Iossifov et al TRUE TRUE FALSE 0.54
12763.fa Sanders et al TRUE FALSE FALSE 0.30
12763.mo Sanders et al TRUE FALSE FALSE 0.32
12763.p1 Sanders et al TRUE FALSE FALSE 0.35
12763.s1 Sanders et al TRUE FALSE FALSE 0.37
12764.fa Iossifov et al TRUE TRUE FALSE 0.52
12764.mo Iossifov et al TRUE TRUE FALSE 0.52
12764.p1 Iossifov et al TRUE TRUE FALSE 0.51
12764.s1 Iossifov et al TRUE TRUE FALSE 0.55
12770.fa Iossifov et al TRUE FALSE FALSE 0.52
12770.mo Iossifov et al TRUE FALSE FALSE 0.56
12770.p1 Iossifov et al TRUE FALSE FALSE 0.54
12770.s1 Iossifov et al TRUE FALSE FALSE 0.57
12780.fa Sanders et al TRUE TRUE FALSE 0.24
12780.mo Sanders et al TRUE TRUE FALSE 0.22
12780.p1 Sanders et al TRUE TRUE FALSE 0.23
12780.s1 Sanders et al TRUE TRUE FALSE 0.21
12790.fa Sanders et al TRUE TRUE FALSE 0.38
12790.mo Sanders et al TRUE TRUE FALSE 0.32
12790.p1 Sanders et al TRUE TRUE FALSE 0.31
12790.s1 Sanders et al TRUE TRUE FALSE 0.31
12802.fa Sanders et al TRUE FALSE FALSE 0.37
12802.mo Sanders et al TRUE FALSE FALSE 0.35
12802.p1 Sanders et al TRUE FALSE FALSE 0.35
12802.s1 Sanders et al TRUE FALSE FALSE 0.33
12810.fa O'Roak et al TRUE TRUE FALSE 0.49
12810.mo O'Roak et al TRUE TRUE FALSE 0.39
12810.p1 O'Roak et al TRUE TRUE TRUE 0.42
12810.s1 O'Roak et al TRUE TRUE FALSE 0.33
12826.fa Iossifov et al TRUE FALSE FALSE 0.40
12826.mo Iossifov et al TRUE FALSE FALSE 0.44
12826.p1 Iossifov et al TRUE FALSE FALSE 0.41
12826.s1 Iossifov et al TRUE FALSE FALSE 0.41
12829.fa Iossifov et al TRUE TRUE FALSE 0.40
12829.mo Iossifov et al TRUE TRUE FALSE 0.42
12829.p1 Iossifov et al TRUE TRUE FALSE 0.39
12829.s1 Iossifov et al TRUE TRUE TRUE 0.41
12833.fa Iossifov et al TRUE FALSE FALSE 0.46
12833.mo Iossifov et al TRUE FALSE FALSE 0.45
12833.p1 Iossifov et al TRUE FALSE FALSE 0.49
12833.s1 Iossifov et al TRUE FALSE FALSE 0.45
12836.fa Iossifov et al TRUE TRUE FALSE 0.42
12836.mo Iossifov et al TRUE TRUE FALSE 0.43
12836.p1 Iossifov et al TRUE TRUE TRUE 0.43
12836.s1 Iossifov et al TRUE TRUE TRUE 0.43
12837.fa Iossifov et al TRUE TRUE FALSE 0.45
12837.mo Iossifov et al TRUE TRUE FALSE 0.45
12837.p1 Iossifov et al TRUE TRUE FALSE 0.45
12837.s1 Iossifov et al TRUE TRUE TRUE 0.45
12838.fa Iossifov et al TRUE TRUE FALSE 0.44
12838.mo Iossifov et al TRUE TRUE FALSE 0.44
12838.p1 Iossifov et al TRUE TRUE FALSE 0.46
12838.s1 Iossifov et al TRUE TRUE TRUE 0.44
12840.fa Iossifov et al TRUE FALSE FALSE 0.39
12840.mo Iossifov et al TRUE FALSE FALSE 0.38
12840.p1 Iossifov et al TRUE FALSE FALSE 0.39
12840.s1 Iossifov et al TRUE FALSE FALSE 0.42
12843.fa Iossifov et al FALSE TRUE FALSE 0.42
12843.mo Iossifov et al FALSE TRUE FALSE 0.40
12843.p1 Iossifov et al FALSE TRUE FALSE 0.38
12843.s1 Iossifov et al FALSE FALSE FALSE 0.41
12851.fa Iossifov et al TRUE TRUE FALSE 0.41
12851.mo Iossifov et al TRUE TRUE FALSE 0.40
12851.p1 Iossifov et al TRUE TRUE FALSE 0.40
12851.s1 Iossifov et al TRUE TRUE TRUE 0.41
12852.fa Iossifov et al TRUE TRUE FALSE 0.39
12852.mo Iossifov et al TRUE TRUE FALSE 0.42
12852.p1 Iossifov et al TRUE TRUE FALSE 0.40
12852.s1 Iossifov et al TRUE TRUE FALSE 0.40
12869.fa Sanders et al TRUE TRUE FALSE 0.35
12869.mo Sanders et al TRUE TRUE FALSE 0.43
12869.p1 Sanders et al TRUE TRUE FALSE 0.40
12869.s1 Sanders et al TRUE FALSE FALSE 0.32
12905.fa O'Roak et al TRUE FALSE FALSE 0.38
12905.mo O'Roak et al TRUE FALSE FALSE 0.33
12905.p1 O'Roak et al TRUE FALSE FALSE 0.34
12905.s1 O'Roak et al TRUE FALSE FALSE 0.39
12906.fa Sanders et al TRUE FALSE FALSE 0.35
12906.mo Sanders et al TRUE FALSE FALSE 0.29
12906.p1 Sanders et al TRUE FALSE FALSE 0.28
12906.s1 Sanders et al TRUE FALSE FALSE 0.29
12937.fa Iossifov et al TRUE TRUE FALSE 0.48
12937.mo Iossifov et al TRUE TRUE FALSE 0.51
12937.p1 Iossifov et al TRUE TRUE FALSE 0.50
12937.s1 Iossifov et al TRUE TRUE FALSE 0.42
12958.fa Sanders et al TRUE FALSE FALSE 0.37
12958.mo Sanders et al TRUE FALSE FALSE 0.36
12958.p1 Sanders et al TRUE FALSE FALSE 0.44
12958.s1 Sanders et al TRUE FALSE FALSE 0.36
12962.fa Iossifov et al TRUE TRUE FALSE 0.43
12962.mo Iossifov et al TRUE TRUE FALSE 0.51
12962.p1 Iossifov et al TRUE TRUE FALSE 0.50
12962.s1 Iossifov et al TRUE TRUE FALSE 0.47
12975.fa Iossifov et al TRUE TRUE FALSE 0.41
12975.mo Iossifov et al TRUE TRUE FALSE 0.56
12975.p1 Iossifov et al TRUE TRUE FALSE 0.50
12975.s1 Iossifov et al TRUE TRUE FALSE 0.52
12984.fa Sanders et al TRUE TRUE FALSE 0.32
12984.mo Sanders et al TRUE TRUE FALSE 0.34
12984.p1 Sanders et al TRUE TRUE FALSE 0.31
12984.s1 Sanders et al TRUE TRUE FALSE 0.40
12997.fa Iossifov et al TRUE TRUE FALSE 0.43
12997.mo Iossifov et al TRUE TRUE FALSE 0.50
12997.p1 Iossifov et al TRUE TRUE TRUE 0.48
12997.s1 Iossifov et al TRUE TRUE TRUE 0.57
13000.fa Sanders et al TRUE FALSE FALSE 0.28
13000.mo Sanders et al TRUE FALSE FALSE 0.28
13000.p1 Sanders et al TRUE FALSE FALSE 0.29
13000.s1 Sanders et al TRUE FALSE FALSE 0.30
13016.fa Iossifov et al TRUE TRUE FALSE 0.47
13016.mo Iossifov et al TRUE TRUE FALSE 0.54
13016.p1 Iossifov et al TRUE TRUE FALSE 0.52
13016.s1 Iossifov et al TRUE TRUE FALSE 0.60
13018.fa Iossifov et al TRUE TRUE FALSE 0.58
13018.mo Iossifov et al TRUE TRUE FALSE 0.61
13018.p1 Iossifov et al TRUE TRUE TRUE 0.48
13018.s1 Iossifov et al TRUE TRUE FALSE 0.44
13048.fa O'Roak et al TRUE TRUE FALSE 0.46
13048.mo O'Roak et al TRUE TRUE FALSE 0.40
13048.p1 O'Roak et al TRUE TRUE FALSE 0.40
13048.s1 O'Roak et al TRUE TRUE FALSE 0.33
13063.fa Sanders et al TRUE TRUE FALSE 0.28
13063.mo Sanders et al TRUE TRUE FALSE 0.28
13063.p1 Sanders et al TRUE TRUE FALSE 0.29
13063.s1 Sanders et al TRUE TRUE FALSE 0.27
13073.fa Sanders et al TRUE TRUE FALSE 0.28
13073.mo Sanders et al TRUE TRUE FALSE 0.30
13073.p1 Sanders et al TRUE TRUE FALSE 0.32
13073.s1 Sanders et al TRUE TRUE FALSE 0.33
13094.fa Iossifov et al TRUE FALSE FALSE 0.46
13094.mo Iossifov et al TRUE FALSE FALSE 0.48
13094.p1 Iossifov et al TRUE FALSE FALSE 0.47
13094.s1 Iossifov et al TRUE FALSE FALSE 0.45
13096.fa Iossifov et al TRUE TRUE FALSE 0.51
13096.mo Iossifov et al TRUE TRUE FALSE 0.52
13096.p1 Iossifov et al TRUE TRUE FALSE 0.44
13096.s1 Iossifov et al TRUE FALSE FALSE 0.49
13097.fa Iossifov et al TRUE TRUE FALSE 0.43
13097.mo Iossifov et al TRUE TRUE FALSE 0.41
13097.p1 Iossifov et al TRUE TRUE TRUE 0.50
13097.s1 Iossifov et al TRUE TRUE FALSE 0.51
13099.fa Iossifov et al TRUE FALSE FALSE 0.54
13099.mo Iossifov et al TRUE FALSE FALSE 0.58
13099.p1 Iossifov et al TRUE FALSE FALSE 0.50
13099.s1 Iossifov et al TRUE FALSE FALSE 0.44
13101.fa Iossifov et al FALSE FALSE FALSE 0.56
13101.mo Iossifov et al FALSE FALSE FALSE 0.56
13101.p1 Iossifov et al FALSE FALSE FALSE 0.36
13101.s1 Iossifov et al FALSE FALSE FALSE 0.61
13104.fa Iossifov et al TRUE TRUE FALSE 0.51
13104.mo Iossifov et al TRUE TRUE FALSE 0.48
13104.p1 Iossifov et al TRUE TRUE FALSE 0.40
13104.s1 Iossifov et al TRUE TRUE FALSE 0.54
13116.fa O'Roak et al TRUE FALSE FALSE 0.50
13116.mo O'Roak et al TRUE FALSE FALSE 0.39
13116.p1 O'Roak et al TRUE FALSE FALSE 0.49
13116.s1 O'Roak et al TRUE FALSE FALSE 0.30
13120.fa Iossifov et al TRUE TRUE FALSE 0.49
13120.mo Iossifov et al TRUE TRUE FALSE 0.47
13120.p1 Iossifov et al TRUE TRUE FALSE 0.53
13120.s1 Iossifov et al TRUE TRUE FALSE 0.56
13125.fa Iossifov et al TRUE FALSE FALSE 0.54
13125.mo Iossifov et al TRUE FALSE FALSE 0.44
13125.p1 Iossifov et al TRUE FALSE FALSE 0.52
13125.s1 Iossifov et al TRUE FALSE FALSE 0.40
13129.fa Iossifov et al TRUE FALSE FALSE 0.58
13129.mo Iossifov et al TRUE FALSE FALSE 0.52
13129.p1 Iossifov et al TRUE FALSE FALSE 0.39
13129.s1 Iossifov et al TRUE FALSE FALSE 0.48
13131.fa Iossifov et al TRUE FALSE FALSE 0.48
13131.mo Iossifov et al TRUE FALSE FALSE 0.46
13131.p1 Iossifov et al TRUE FALSE FALSE 0.46
13131.s1 Iossifov et al TRUE FALSE FALSE 0.47
13139.fa Iossifov et al TRUE FALSE FALSE 0.45
13139.mo Iossifov et al TRUE FALSE FALSE 0.44
13139.p1 Iossifov et al TRUE FALSE FALSE 0.43
13139.s1 Iossifov et al TRUE FALSE FALSE 0.46
13144.fa Iossifov et al TRUE TRUE FALSE 0.47
13144.mo Iossifov et al TRUE TRUE FALSE 0.46
13144.p1 Iossifov et al TRUE TRUE FALSE 0.45
13144.s1 Iossifov et al TRUE TRUE FALSE 0.47
13146.fa Iossifov et al TRUE TRUE FALSE 0.46
13146.mo Iossifov et al TRUE TRUE FALSE 0.47
13146.p1 Iossifov et al TRUE TRUE FALSE 0.46
13146.s1 Iossifov et al TRUE FALSE FALSE 0.47
13148.fa Iossifov et al TRUE FALSE FALSE 0.44
13148.mo Iossifov et al TRUE FALSE FALSE 0.45
13148.p1 Iossifov et al TRUE FALSE FALSE 0.46
13148.s1 Iossifov et al TRUE FALSE FALSE 0.42
13152.fa Iossifov et al TRUE TRUE FALSE 0.51
13152.mo Iossifov et al TRUE TRUE FALSE 0.50
13152.p1 Iossifov et al TRUE TRUE FALSE 0.52
13152.s1 Iossifov et al TRUE FALSE FALSE 0.51
13153.fa Iossifov et al TRUE TRUE FALSE 0.44
13153.mo Iossifov et al TRUE TRUE FALSE 0.43
13153.p1 Iossifov et al TRUE TRUE FALSE 0.42
13153.s1 Iossifov et al TRUE TRUE FALSE 0.42
13154.fa Sanders et al TRUE FALSE FALSE 0.23
13154.mo Sanders et al TRUE FALSE FALSE 0.23
13154.p1 Sanders et al TRUE FALSE FALSE 0.22
13154.s1 Sanders et al TRUE FALSE FALSE 0.23
13159.fa Iossifov et al TRUE TRUE FALSE 0.43
13159.mo Iossifov et al TRUE TRUE FALSE 0.44
13159.p1 Iossifov et al TRUE TRUE FALSE 0.43
13159.s1 Iossifov et al TRUE TRUE FALSE 0.43
13162.fa Iossifov et al TRUE TRUE FALSE 0.44
13162.mo Iossifov et al TRUE TRUE FALSE 0.42
13162.p1 Iossifov et al TRUE TRUE FALSE 0.42
13162.s1 Iossifov et al TRUE TRUE TRUE 0.43
13165.fa Iossifov et al TRUE FALSE FALSE 0.43
13165.mo Iossifov et al TRUE FALSE FALSE 0.44
13165.p1 Iossifov et al TRUE FALSE FALSE 0.43
13165.s1 Iossifov et al TRUE FALSE FALSE 0.46
13166.fa Iossifov et al TRUE TRUE FALSE 0.43
13166.mo Iossifov et al TRUE TRUE FALSE 0.42
13166.p1 Iossifov et al TRUE TRUE FALSE 0.42
13166.s1 Iossifov et al TRUE TRUE FALSE 0.42
13168.fa Iossifov et al TRUE TRUE FALSE 0.48
13168.mo Iossifov et al TRUE TRUE FALSE 0.46
13168.p1 Iossifov et al TRUE TRUE FALSE 0.48
13168.s1 Iossifov et al TRUE TRUE FALSE 0.49
13169.fa O'Roak et al TRUE TRUE FALSE 0.37
13169.mo O'Roak et al TRUE TRUE FALSE 0.40
13169.p1 O'Roak et al TRUE TRUE FALSE 0.41
13169.s1 O'Roak et al TRUE TRUE FALSE 0.35
13171.fa Sanders et al TRUE TRUE FALSE 0.37
13171.mo Sanders et al TRUE TRUE FALSE 0.37
13171.p1 Sanders et al TRUE TRUE FALSE 0.45
13171.s1 Sanders et al TRUE TRUE FALSE 0.35
13174.fa Iossifov et al TRUE TRUE FALSE 0.50
13174.mo Iossifov et al TRUE TRUE FALSE 0.50
13174.p1 Iossifov et al TRUE TRUE FALSE 0.50
13174.s1 Iossifov et al TRUE TRUE FALSE 0.51
13176.fa Iossifov et al TRUE FALSE FALSE 0.45
13176.mo Iossifov et al TRUE FALSE FALSE 0.43
13176.p1 Iossifov et al TRUE FALSE FALSE 0.45
13176.s1 Iossifov et al TRUE FALSE FALSE 0.45
13183.fa Iossifov et al TRUE TRUE FALSE 0.54
13183.mo Iossifov et al TRUE TRUE FALSE 0.53
13183.p1 Iossifov et al TRUE TRUE FALSE 0.51
13183.s1 Iossifov et al TRUE TRUE FALSE 0.51
13187.fa Iossifov et al TRUE TRUE FALSE 0.47
13187.mo Iossifov et al TRUE TRUE FALSE 0.47
13187.p1 Iossifov et al TRUE TRUE FALSE 0.48
13187.s1 Iossifov et al TRUE TRUE FALSE 0.51
13188.fa O'Roak et al TRUE FALSE FALSE 0.31
13188.mo O'Roak et al TRUE FALSE FALSE 0.41
13188.p1 O'Roak et al TRUE FALSE FALSE 0.29
13188.s1 O'Roak et al TRUE FALSE FALSE 0.39
13193.fa Iossifov et al TRUE TRUE FALSE 0.42
13193.mo Iossifov et al TRUE TRUE FALSE 0.49
13193.p1 Iossifov et al TRUE TRUE FALSE 0.53
13193.s1 Iossifov et al TRUE TRUE FALSE 0.47
13195.fa Sanders et al TRUE TRUE FALSE 0.40
13195.mo Sanders et al TRUE TRUE FALSE 0.36
13195.p1 Sanders et al TRUE TRUE FALSE 0.33
13195.s1 Sanders et al TRUE TRUE FALSE 0.42
13196.fa Iossifov et al TRUE TRUE FALSE 0.46
13196.mo Iossifov et al TRUE TRUE FALSE 0.47
13196.p1 Iossifov et al TRUE TRUE FALSE 0.43
13196.s1 Iossifov et al TRUE TRUE FALSE 0.52
13197.fa Iossifov et al TRUE FALSE FALSE 0.47
13197.mo Iossifov et al TRUE FALSE FALSE 0.47
13197.p1 Iossifov et al TRUE FALSE FALSE 0.49
13197.s1 Iossifov et al TRUE FALSE FALSE 0.49
13215.fa Iossifov et al TRUE FALSE FALSE 0.57
13215.mo Iossifov et al TRUE FALSE FALSE 0.53
13215.p1 Iossifov et al TRUE FALSE FALSE 0.46
13215.s1 Iossifov et al TRUE FALSE FALSE 0.57
13216.fa Iossifov et al TRUE TRUE FALSE 0.27
13216.mo Iossifov et al TRUE TRUE FALSE 0.34
13216.p1 Iossifov et al TRUE TRUE FALSE 0.27
13216.s1 Iossifov et al TRUE TRUE FALSE 0.27
13218.fa Iossifov et al TRUE TRUE FALSE 0.49
13218.mo Iossifov et al TRUE TRUE FALSE 0.47
13218.p1 Iossifov et al TRUE TRUE FALSE 0.41
13218.s1 Iossifov et al TRUE FALSE FALSE 0.52
13227.fa Iossifov et al TRUE FALSE FALSE 0.38
13227.mo Iossifov et al TRUE FALSE FALSE 0.42
13227.p1 Iossifov et al TRUE FALSE FALSE 0.37
13227.s1 Iossifov et al TRUE FALSE FALSE 0.40
13239.fa Iossifov et al TRUE FALSE FALSE 0.36
13239.mo Iossifov et al TRUE FALSE FALSE 0.40
13239.p1 Iossifov et al TRUE FALSE FALSE 0.34
13239.s1 Iossifov et al TRUE FALSE FALSE 0.39
13258.fa Iossifov et al TRUE FALSE FALSE 0.42
13258.mo Iossifov et al TRUE FALSE FALSE 0.42
13258.p1 Iossifov et al TRUE FALSE FALSE 0.42
13258.s1 Iossifov et al TRUE FALSE FALSE 0.40
13263.fa Iossifov et al TRUE FALSE FALSE 0.47
13263.mo Iossifov et al TRUE FALSE FALSE 0.48
13263.p1 Iossifov et al TRUE FALSE FALSE 0.46
13263.s1 Iossifov et al TRUE FALSE FALSE 0.48
13266.fa Iossifov et al TRUE TRUE FALSE 0.46
13266.mo Iossifov et al TRUE TRUE FALSE 0.52
13266.p1 Iossifov et al TRUE TRUE FALSE 0.48
13266.s1 Iossifov et al TRUE TRUE FALSE 0.47
13269.fa Iossifov et al TRUE FALSE FALSE 0.50
13269.mo Iossifov et al TRUE FALSE FALSE 0.45
13269.p1 Iossifov et al TRUE FALSE FALSE 0.49
13269.s1 Iossifov et al TRUE FALSE FALSE 0.50
13271.fa Sanders et al TRUE FALSE FALSE 0.30
13271.mo Sanders et al TRUE FALSE FALSE 0.31
13271.p1 Sanders et al TRUE FALSE FALSE 0.31
13271.s1 Sanders et al TRUE FALSE FALSE 0.27
13293.fa Iossifov et al TRUE FALSE FALSE 0.47
13293.mo Iossifov et al TRUE FALSE FALSE 0.51
13293.p1 Iossifov et al TRUE FALSE FALSE 0.50
13293.s1 Iossifov et al TRUE FALSE FALSE 0.56
13296.fa Iossifov et al FALSE TRUE FALSE 0.47
13296.mo Iossifov et al FALSE TRUE FALSE 0.47
13296.p1 Iossifov et al FALSE TRUE FALSE 0.46
13296.s1 Iossifov et al FALSE TRUE TRUE 0.47
13307.fa Iossifov et al TRUE FALSE FALSE 0.33
13307.mo Iossifov et al TRUE FALSE FALSE 0.33
13307.p1 Iossifov et al TRUE FALSE FALSE 0.34
13307.s1 Iossifov et al TRUE FALSE FALSE 0.34
13309.fa Iossifov et al TRUE FALSE FALSE 0.44
13309.mo Iossifov et al TRUE FALSE FALSE 0.44
13309.p1 Iossifov et al TRUE FALSE FALSE 0.45
13309.s1 Iossifov et al TRUE FALSE FALSE 0.43
13312.fa Iossifov et al TRUE FALSE FALSE 0.53
13312.mo Iossifov et al TRUE FALSE FALSE 0.50
13312.p1 Iossifov et al TRUE FALSE FALSE 0.45
13312.s1 Iossifov et al TRUE FALSE FALSE 0.50
13315.fa Iossifov et al TRUE FALSE FALSE 0.33
13315.mo Iossifov et al TRUE FALSE FALSE 0.31
13315.p1 Iossifov et al TRUE FALSE FALSE 0.35
13315.s1 Iossifov et al TRUE FALSE FALSE 0.32
13322.fa Sanders et al TRUE TRUE FALSE 0.27
13322.mo Sanders et al TRUE TRUE FALSE 0.30
13322.p1 Sanders et al TRUE TRUE FALSE 0.33
13322.s1 Sanders et al TRUE TRUE FALSE 0.26
13327.fa Iossifov et al TRUE TRUE FALSE 0.30
13327.mo Iossifov et al TRUE TRUE FALSE 0.34
13327.p1 Iossifov et al TRUE TRUE FALSE 0.50
13327.s1 Iossifov et al TRUE TRUE TRUE 0.45
13328.fa Iossifov et al TRUE FALSE FALSE 0.37
13328.mo Iossifov et al TRUE FALSE FALSE 0.38
13328.p1 Iossifov et al TRUE FALSE FALSE 0.40
13328.s1 Iossifov et al TRUE FALSE FALSE 0.49
13330.fa Iossifov et al TRUE FALSE FALSE 0.50
13330.mo Iossifov et al TRUE FALSE FALSE 0.39
13330.p1 Iossifov et al TRUE FALSE FALSE 0.55
13330.s1 Iossifov et al TRUE FALSE FALSE 0.56
13335.fa O'Roak et al TRUE FALSE FALSE 0.48
13335.mo O'Roak et al TRUE FALSE FALSE 0.43
13335.p1 O'Roak et al TRUE FALSE FALSE 0.57
13335.s1 O'Roak et al TRUE FALSE FALSE 0.47
13338.fa Iossifov et al TRUE FALSE FALSE 0.41
13338.mo Iossifov et al TRUE FALSE FALSE 0.45
13338.p1 Iossifov et al TRUE FALSE FALSE 0.54
13338.s1 Iossifov et al TRUE FALSE FALSE 0.51
13346.fa O'Roak et al TRUE FALSE FALSE 0.38
13346.mo O'Roak et al TRUE FALSE FALSE 0.37
13346.p1 O'Roak et al TRUE FALSE FALSE 0.38
13346.s1 O'Roak et al TRUE FALSE FALSE 0.34
13349.fa Iossifov et al TRUE FALSE FALSE 0.50
13349.mo Iossifov et al TRUE FALSE FALSE 0.40
13349.p1 Iossifov et al TRUE FALSE FALSE 0.38
13349.s1 Iossifov et al TRUE FALSE FALSE 0.36
13355.fa Sanders et al TRUE FALSE FALSE 0.28
13355.mo Sanders et al TRUE FALSE FALSE 0.28
13355.p1 Sanders et al TRUE FALSE FALSE 0.29
13355.s1 Sanders et al TRUE FALSE FALSE 0.26
13366.fa Iossifov et al TRUE FALSE FALSE 0.48
13366.mo Iossifov et al TRUE FALSE FALSE 0.43
13366.p1 Iossifov et al TRUE FALSE FALSE 0.37
13366.s1 Iossifov et al TRUE FALSE FALSE 0.52
13374.fa Sanders et al TRUE FALSE FALSE 0.33
13374.mo Sanders et al TRUE FALSE FALSE 0.38
13374.p1 Sanders et al TRUE FALSE FALSE 0.38
13374.s1 Sanders et al TRUE FALSE FALSE 0.34
13385.fa Sanders et al TRUE FALSE FALSE 0.25
13385.mo Sanders et al TRUE FALSE FALSE 0.24
13385.p1 Sanders et al TRUE FALSE FALSE 0.22
13385.s1 Sanders et al TRUE FALSE FALSE 0.24
13387.fa Iossifov et al TRUE FALSE FALSE 0.45
13387.mo Iossifov et al TRUE FALSE FALSE 0.46
13387.p1 Iossifov et al TRUE FALSE FALSE 0.54
13387.s1 Iossifov et al TRUE FALSE FALSE 0.51
13393.fa Sanders et al TRUE FALSE FALSE 0.28
13393.mo Sanders et al TRUE FALSE FALSE 0.31
13393.p1 Sanders et al TRUE FALSE FALSE 0.30
13393.s1 Sanders et al TRUE FALSE FALSE 0.27
13396.fa Iossifov et al TRUE FALSE FALSE 0.46
13396.mo Iossifov et al TRUE FALSE FALSE 0.45
13396.p1 Iossifov et al TRUE FALSE FALSE 0.46
13396.s1 Iossifov et al TRUE FALSE FALSE 0.45
13398.fa Iossifov et al TRUE FALSE FALSE 0.42
13398.mo Iossifov et al TRUE FALSE FALSE 0.44
13398.p1 Iossifov et al TRUE FALSE FALSE 0.44
13398.s1 Iossifov et al TRUE FALSE FALSE 0.43
13412.fa Iossifov et al TRUE FALSE FALSE 0.48
13412.mo Iossifov et al TRUE FALSE FALSE 0.49
13412.p1 Iossifov et al TRUE FALSE FALSE 0.47
13412.s1 Iossifov et al TRUE FALSE FALSE 0.44
13418.fa Iossifov et al TRUE FALSE FALSE 0.45
13418.mo Iossifov et al TRUE FALSE FALSE 0.43
13418.p1 Iossifov et al TRUE FALSE FALSE 0.45
13418.s1 Iossifov et al TRUE FALSE FALSE 0.43
13424.fa Iossifov et al TRUE FALSE FALSE 0.44
13424.mo Iossifov et al TRUE FALSE FALSE 0.42
13424.p1 Iossifov et al TRUE FALSE FALSE 0.41
13424.s1 Iossifov et al TRUE FALSE FALSE 0.44
13439.fa Iossifov et al TRUE FALSE FALSE 0.43
13439.mo Iossifov et al TRUE FALSE FALSE 0.45
13439.p1 Iossifov et al TRUE FALSE FALSE 0.45
13439.s1 Iossifov et al TRUE FALSE FALSE 0.44
13443.fa Iossifov et al TRUE FALSE FALSE 0.44
13443.mo Iossifov et al TRUE FALSE FALSE 0.44
13443.p1 Iossifov et al TRUE FALSE FALSE 0.45
13443.s1 Iossifov et al TRUE FALSE FALSE 0.45
13444.fa Iossifov et al TRUE FALSE FALSE 0.34
13444.mo Iossifov et al TRUE FALSE FALSE 0.36
13444.p1 Iossifov et al TRUE FALSE FALSE 0.35
13444.s1 Iossifov et al TRUE FALSE FALSE 0.36
13447.fa O'Roak et al TRUE FALSE FALSE 0.38
13447.mo O'Roak et al TRUE FALSE FALSE 0.34
13447.p1 O'Roak et al TRUE FALSE FALSE 0.38
13447.s1 O'Roak et al TRUE FALSE FALSE 0.32
13462.fa Iossifov et al TRUE FALSE FALSE 0.34
13462.mo Iossifov et al TRUE FALSE FALSE 0.36
13462.p1 Iossifov et al TRUE FALSE FALSE 0.35
13462.s1 Iossifov et al TRUE FALSE FALSE 0.36
13465.fa Iossifov et al TRUE FALSE FALSE 0.47
13465.mo Iossifov et al TRUE FALSE FALSE 0.44
13465.p1 Iossifov et al TRUE FALSE FALSE 0.44
13465.s1 Iossifov et al TRUE FALSE FALSE 0.42
13486.fa Iossifov et al TRUE FALSE FALSE 0.47
13486.mo Iossifov et al TRUE FALSE FALSE 0.51
13486.p1 Iossifov et al TRUE FALSE FALSE 0.50
13486.s1 Iossifov et al TRUE FALSE FALSE 0.53
13487.fa Iossifov et al TRUE FALSE FALSE 0.38
13487.mo Iossifov et al TRUE FALSE FALSE 0.40
13487.p1 Iossifov et al TRUE FALSE FALSE 0.39
13487.s1 Iossifov et al TRUE FALSE FALSE 0.40
13493.fa Iossifov et al TRUE FALSE FALSE 0.42
13493.mo Iossifov et al TRUE FALSE FALSE 0.39
13493.p1 Iossifov et al TRUE FALSE FALSE 0.38
13493.s1 Iossifov et al TRUE FALSE FALSE 0.38
13496.fa Iossifov et al TRUE FALSE FALSE 0.41
13496.mo Iossifov et al TRUE FALSE FALSE 0.40
13496.p1 Iossifov et al TRUE FALSE FALSE 0.41
13496.s1 Iossifov et al TRUE FALSE FALSE 0.43
13502.fa Iossifov et al TRUE FALSE FALSE 0.34
13502.mo Iossifov et al TRUE FALSE FALSE 0.33
13502.p1 Iossifov et al TRUE FALSE FALSE 0.34
13502.s1 Iossifov et al TRUE FALSE FALSE 0.33
13504.fa Iossifov et al TRUE FALSE FALSE 0.38
13504.mo Iossifov et al TRUE FALSE FALSE 0.37
13504.p1 Iossifov et al TRUE FALSE FALSE 0.35
13504.s1 Iossifov et al TRUE FALSE FALSE 0.37
13505.fa Iossifov et al TRUE FALSE FALSE 0.35
13505.mo Iossifov et al TRUE FALSE FALSE 0.33
13505.p1 Iossifov et al TRUE FALSE FALSE 0.33
13505.s1 Iossifov et al TRUE FALSE FALSE 0.34
13507.fa Iossifov et al TRUE FALSE FALSE 0.35
13507.mo Iossifov et al TRUE FALSE FALSE 0.36
13507.p1 Iossifov et al TRUE FALSE FALSE 0.35
13507.s1 Iossifov et al TRUE FALSE FALSE 0.36
13508.fa Iossifov et al TRUE FALSE FALSE 0.34
13508.mo Iossifov et al TRUE FALSE FALSE 0.39
13508.p1 Iossifov et al TRUE FALSE FALSE 0.38
13508.s1 Iossifov et al TRUE FALSE FALSE 0.34
13509.fa Sanders et al TRUE FALSE FALSE 0.34
13509.mo Sanders et al TRUE FALSE FALSE 0.34
13509.p1 Sanders et al TRUE FALSE FALSE 0.33
13509.s1 Sanders et al TRUE FALSE FALSE 0.35
13512.fa Iossifov et al TRUE FALSE FALSE 0.38
13512.mo Iossifov et al TRUE FALSE FALSE 0.39
13512.p1 Iossifov et al TRUE FALSE FALSE 0.39
13512.s1 Iossifov et al TRUE FALSE FALSE 0.39
13513.fa Iossifov et al TRUE FALSE FALSE 0.45
13513.mo Iossifov et al TRUE FALSE FALSE 0.44
13513.p1 Iossifov et al TRUE FALSE FALSE 0.46
13513.s1 Iossifov et al TRUE FALSE FALSE 0.43
13533.fa O'Roak et al TRUE FALSE FALSE 0.48
13533.mo O'Roak et al TRUE FALSE FALSE 0.42
13533.p1 O'Roak et al TRUE FALSE FALSE 0.42
13533.s1 O'Roak et al TRUE FALSE FALSE 0.34
13543.fa Sanders et al TRUE FALSE FALSE 0.31
13543.mo Sanders et al TRUE FALSE FALSE 0.27
13543.p1 Sanders et al TRUE FALSE FALSE 0.29
13543.s1 Sanders et al TRUE FALSE FALSE 0.30
13589.fa Iossifov et al TRUE FALSE FALSE 0.40
13589.mo Iossifov et al TRUE FALSE FALSE 0.37
13589.p1 Iossifov et al TRUE FALSE FALSE 0.42
13589.s1 Iossifov et al TRUE FALSE FALSE 0.40
13590.fa Iossifov et al TRUE FALSE FALSE 0.44
13590.mo Iossifov et al TRUE FALSE FALSE 0.43
13590.p1 Iossifov et al TRUE FALSE FALSE 0.41
13590.s1 Iossifov et al TRUE FALSE FALSE 0.41
13593.fa O'Roak et al TRUE FALSE FALSE 0.57
13593.mo O'Roak et al TRUE FALSE FALSE 0.49
13593.p1 O'Roak et al TRUE FALSE FALSE 0.44
13593.s1 O'Roak et al TRUE FALSE FALSE 0.31
13599.fa Iossifov et al TRUE FALSE FALSE 0.47
13599.mo Iossifov et al TRUE FALSE FALSE 0.48
13599.p1 Iossifov et al TRUE FALSE FALSE 0.40
13599.s1 Iossifov et al TRUE FALSE FALSE 0.46
13601.fa Iossifov et al TRUE FALSE FALSE 0.42
13601.mo Iossifov et al TRUE FALSE FALSE 0.40
13601.p1 Iossifov et al TRUE FALSE FALSE 0.44
13601.s1 Iossifov et al TRUE FALSE FALSE 0.43
13606.fa O'Roak et al TRUE FALSE FALSE 0.33
13606.mo O'Roak et al TRUE FALSE FALSE 0.37
13606.p1 O'Roak et al TRUE FALSE FALSE 0.36
13606.s1 O'Roak et al TRUE FALSE FALSE 0.32
13608.fa Sanders et al TRUE FALSE FALSE 0.25
13608.mo Sanders et al TRUE FALSE FALSE 0.24
13608.p1 Sanders et al TRUE FALSE FALSE 0.24
13608.s1 Sanders et al TRUE FALSE FALSE 0.26
13618.fa Sanders et al TRUE FALSE FALSE 0.27
13618.mo Sanders et al TRUE FALSE FALSE 0.28
13618.p1 Sanders et al TRUE FALSE FALSE 0.33
13618.s1 Sanders et al TRUE FALSE FALSE 0.28
13621.fa Sanders et al TRUE FALSE FALSE 0.33
13621.mo Sanders et al TRUE FALSE FALSE 0.41
13621.p1 Sanders et al TRUE FALSE FALSE 0.32
13621.s1 Sanders et al TRUE FALSE FALSE 0.30
13625.fa Sanders et al TRUE FALSE FALSE 0.29
13625.mo Sanders et al TRUE FALSE FALSE 0.29
13625.p1 Sanders et al TRUE FALSE FALSE 0.29
13625.s1 Sanders et al TRUE FALSE FALSE 0.29
13629.fa O'Roak et al TRUE FALSE FALSE 0.33
13629.mo O'Roak et al TRUE FALSE FALSE 0.37
13629.p1 O'Roak et al TRUE FALSE FALSE 0.38
13629.s1 O'Roak et al TRUE FALSE FALSE 0.32
13660.fa Sanders et al TRUE FALSE FALSE 0.29
13660.mo Sanders et al TRUE FALSE FALSE 0.29
13660.p1 Sanders et al TRUE FALSE FALSE 0.32
13660.s1 Sanders et al TRUE FALSE FALSE 0.33
13684.fa Iossifov et al TRUE FALSE FALSE 0.44
13684.mo Iossifov et al TRUE FALSE FALSE 0.47
13684.p1 Iossifov et al TRUE FALSE FALSE 0.45
13684.s1 Iossifov et al TRUE FALSE FALSE 0.45
13689.fa Iossifov et al TRUE FALSE FALSE 0.43
13689.mo Iossifov et al TRUE FALSE FALSE 0.44
13689.p1 Iossifov et al TRUE FALSE FALSE 0.42
13689.s1 Iossifov et al TRUE FALSE FALSE 0.44
13695.fa Iossifov et al FALSE FALSE FALSE 0.48
13695.mo Iossifov et al FALSE FALSE FALSE 0.46
13695.p1 Iossifov et al FALSE FALSE FALSE 0.46
13695.s1 Iossifov et al FALSE FALSE FALSE 0.45
13698.fa Iossifov et al TRUE FALSE FALSE 0.40
13698.mo Iossifov et al TRUE FALSE FALSE 0.45
13698.p1 Iossifov et al TRUE FALSE FALSE 0.49
13698.s1 Iossifov et al TRUE FALSE FALSE 0.42
13726.fa O'Roak et al TRUE FALSE FALSE 0.53
13726.mo O'Roak et al TRUE FALSE FALSE 0.50
13726.p1 O'Roak et al TRUE FALSE FALSE 0.41
13726.s1 O'Roak et al TRUE FALSE FALSE 0.32
13730.fa Sanders et al TRUE FALSE FALSE 0.28
13730.mo Sanders et al TRUE FALSE FALSE 0.28
13730.p1 Sanders et al TRUE FALSE FALSE 0.33
13730.s1 Sanders et al TRUE FALSE FALSE 0.29
13739.fa Sanders et al TRUE FALSE FALSE 0.30
13739.mo Sanders et al TRUE FALSE FALSE 0.29
13739.p1 Sanders et al TRUE FALSE FALSE 0.28
13739.s1 Sanders et al TRUE FALSE FALSE 0.29
13752.fa Sanders et al TRUE FALSE FALSE 0.24
13752.mo Sanders et al TRUE FALSE FALSE 0.24
13752.p1 Sanders et al TRUE FALSE FALSE 0.24
13752.s1 Sanders et al TRUE FALSE FALSE 0.24
13774.fa Sanders et al TRUE FALSE FALSE 0.29
13774.mo Sanders et al TRUE FALSE FALSE 0.32
13774.p1 Sanders et al TRUE FALSE FALSE 0.27
13774.s1 Sanders et al TRUE FALSE FALSE 0.30
13793.fa O'Roak et al TRUE FALSE FALSE 0.49
13793.mo O'Roak et al TRUE FALSE FALSE 0.52
13793.p1 O'Roak et al TRUE FALSE FALSE 0.39
13793.s1 O'Roak et al TRUE FALSE FALSE 0.36
13795.fa Sanders et al TRUE FALSE FALSE 0.28
13795.mo Sanders et al TRUE FALSE FALSE 0.29
13795.p1 Sanders et al TRUE FALSE FALSE 0.27
13795.s1 Sanders et al TRUE FALSE FALSE 0.30
13798.fa O'Roak et al TRUE FALSE FALSE 0.41
13798.mo O'Roak et al TRUE FALSE FALSE 0.37
13798.p1 O'Roak et al TRUE FALSE FALSE 0.36
13798.s1 O'Roak et al TRUE FALSE FALSE 0.39
13808.fa Sanders et al TRUE FALSE FALSE 0.21
13808.mo Sanders et al TRUE FALSE FALSE 0.21
13808.p1 Sanders et al TRUE FALSE FALSE 0.20
13808.s1 Sanders et al TRUE FALSE FALSE 0.25
13809.fa Sanders et al TRUE FALSE FALSE 0.27
13809.mo Sanders et al TRUE FALSE FALSE 0.27
13809.p1 Sanders et al TRUE FALSE FALSE 0.27
13809.s1 Sanders et al TRUE FALSE FALSE 0.27
13815.fa O'Roak et al TRUE FALSE FALSE 0.54
13815.mo O'Roak et al TRUE FALSE FALSE 0.40
13815.p1 O'Roak et al TRUE FALSE FALSE 0.37
13815.s1 O'Roak et al TRUE FALSE FALSE 0.48
13821.fa Sanders et al TRUE FALSE FALSE 0.28
13821.mo Sanders et al TRUE FALSE FALSE 0.23
13821.p1 Sanders et al TRUE FALSE FALSE 0.26
13821.s1 Sanders et al TRUE FALSE FALSE 0.26
13825.fa Sanders et al TRUE FALSE FALSE 0.29
13825.mo Sanders et al TRUE FALSE FALSE 0.29
13825.p1 Sanders et al TRUE FALSE FALSE 0.31
13825.s1 Sanders et al TRUE FALSE FALSE 0.31
13832.fa Sanders et al TRUE FALSE FALSE 0.28
13832.mo Sanders et al TRUE FALSE FALSE 0.29
13832.p1 Sanders et al TRUE FALSE FALSE 0.27
13832.s1 Sanders et al TRUE FALSE FALSE 0.28
13835.fa O'Roak et al FALSE FALSE FALSE 0.33
13835.mo O'Roak et al FALSE FALSE FALSE 0.37
13835.p1 O'Roak et al FALSE FALSE FALSE 0.36
13835.s1 O'Roak et al FALSE FALSE FALSE 0.38
13840.fa Sanders et al TRUE FALSE FALSE 0.25
13840.mo Sanders et al TRUE FALSE FALSE 0.27
13840.p1 Sanders et al TRUE FALSE FALSE 0.26
13840.s1 Sanders et al TRUE FALSE FALSE 0.24
13843.fa Sanders et al TRUE FALSE FALSE 0.22
13843.mo Sanders et al TRUE FALSE FALSE 0.24
13843.p1 Sanders et al TRUE FALSE FALSE 0.24
13843.s1 Sanders et al TRUE FALSE FALSE 0.24
13876.fa Sanders et al TRUE FALSE FALSE 0.21
13876.mo Sanders et al TRUE FALSE FALSE 0.24
13876.p1 Sanders et al TRUE FALSE FALSE 0.2413876.s1 Sanders et al TRUE FALSE FALSE 0.3013887.fa Sanders et al TRUE FALSE FALSE 0.2913887.mo Sanders et al TRUE FALSE FALSE 0.2713887.p1 Sanders et al TRUE FALSE FALSE 0.2913887.s1 Sanders et al TRUE FALSE FALSE 0.3013890.fa O?Roak et al TRUE FALSE FALSE 0.3413890.mo O?Roak et al TRUE FALSE FALSE 0.3413890.p1 O?Roak et al TRUE FALSE FALSE 0.4513890.s1 O?Roak et al TRUE FALSE FALSE 0.3713912.fa Sanders et al TRUE FALSE FALSE 0.2413912.mo Sanders et al TRUE FALSE FALSE 0.2413912.p1 Sanders et al TRUE FALSE FALSE 0.2713912.s1 Sanders et al TRUE FALSE FALSE 0.2313922.fa Sanders et al TRUE FALSE FALSE 0.2413922.mo Sanders et al TRUE FALSE FALSE 0.2513922.p1 Sanders et al TRUE FALSE FALSE 0.2613922.s1 Sanders et al TRUE FALSE FALSE 0.2613926.fa O?Roak et al TRUE FALSE FALSE 0.3713926.mo O?Roak et al TRUE FALSE FALSE 0.3813926.p1 O?Roak et al TRUE FALSE FALSE 0.3713926.s1 O?Roak et al TRUE FALSE FALSE 0.3813992.fa Sanders et al TRUE FALSE FALSE 0.2313992.mo Sanders et al TRUE FALSE FALSE 0.2713992.p1 Sanders et al TRUE FALSE FALSE 0.2413992.s1 Sanders et al TRUE FALSE FALSE 0.2614009.fa Sanders et al TRUE FALSE FALSE 0.2714009.mo Sanders et al TRUE FALSE FALSE 0.2314009.p1 Sanders et al TRUE FALSE FALSE 0.2514009.s1 Sanders et al TRUE FALSE FALSE 0.2414011.fa O?Roak et al FALSE FALSE FALSE 0.4014011.mo O?Roak et al FALSE FALSE FALSE 0.3514011.p1 O?Roak et al FALSE FALSE FALSE 0.3014011.s1 O?Roak et al FALSE FALSE FALSE 0.3314110.fa Sanders et al TRUE FALSE FALSE 0.2914110.mo Sanders et al TRUE FALSE FALSE 0.3014110.p1 Sanders et al TRUE FALSE FALSE 0.3114110.s1 Sanders et al TRUE FALSE FALSE 0.3614167.fa Sanders et al TRUE FALSE FALSE 0.3014167.mo Sanders et al TRUE FALSE FALSE 0.3114167.p1 Sanders et al TRUE FALSE FALSE 0.3014167.s1 Sanders et al TRUE FALSE FALSE 0.3014201.fa O?Roak et al FALSE FALSE FALSE 0.3114201.mo O?Roak et al FALSE FALSE FALSE 0.3314201.p1 O?Roak et al FALSE FALSE FALSE 0.3314201.s1 O?Roak et al FALSE 
FALSE FALSE 0.37
Study | Quads (before QC) | Reads (36mers) mapped with mrsFAST to exome targets (per sample): Median | Reads (36mers) mapped with mrsFAST to exome targets (per sample): Minimum | CoNIFER Standard Deviation: Average | Accession Codes
O'Roak et al. (2012) | 70 (70) | 137,828,338 | 54,895,318 | 0.39 | dbGAP: phs000482.v1.p1 or NDAR: NDARCOL0001878
Iossifov et al. (2012) | 165 (165) | 111,781,453 | 51,482,969 | 0.47 | NDAR: NDARCOL0001936
Sanders et al. (2012) | 176 (177) | 160,294,934 | 48,431,443 | 0.29 | SRA: SRP010920.1
All probands | 411 (412) | 142,557,586 | 48,431,443 | 0.38 |
All siblings | 411 (412) | 138,779,930 | 51,482,969 | 0.38 |
All mothers | 411 (412) | 134,381,120 | 51,504,433 | 0.38 |
All fathers | 411 (412) | 140,013,304 | 51,366,451 | 0.38 |
Discordant SRS | 276 | 134,837,454 | 51,482,969 | 0.38 |
Concordant SRS | 115 | 146,397,189 | 48,431,443 | 0.37 |
All | 411 (412) | 138,205,341 | 48,431,443 | 0.38 |
cnvrID | callID | familyID | rel | Chromosome | Start (hg19) | Stop (hg19) | length (bp) | length (exons) | state | Frequency in 411 families | ESP Frequency | Inheritance | Genes | SRS discordant | Previously seen genes | de novo SNV summary in sample
16 | 5636 | 13698 | p1 | 1 | 2106662 | 2234542 | 127880 | 13 | dup | 1 | ≤ 0.1% | mo_to_both | C1orf86, PRKCZ, SKI | FALSE | PRKCZ |
16 | 8285 | 13698 | s1 | 1 | 2116021 | 2118961 | 2940 | 5 | dup | 1 | ≤ 0.1% | mo_to_both | PRKCZ | FALSE | PRKCZ |
19 | 19 | 11571 | p1 | 1 | 2529645 | 2538508 | 8863 | 7 | del | 1 | ≤ 0.1% | mo_to_p1 | MMEL1 | TRUE | |
21 | 1749 | 12383 | p1 | 1 | 3413218 | 3417328 | 4110 | 8 | dup | 3 | ≤ 0.5% | fa_to_p1 | MEGF6 | FALSE | |
20 | 1548 | 11336 | p1 | 1 | 3424358 | 3432091 | 7733 | 9 | del | 2 | ≤ 0.1% | mo_to_p1 | MEGF6 | TRUE | | SLC26A5 [missense]
34 | 105 | 13798 | p1 | 1 | 9063358 | 9171548 | 108190 | 27 | dup | 1 | ≤ 0.1% | fa_to_p1 | SLC2A7, SLC2A5, GPR157 | FALSE | |
39 | 5495 | 12618 | p1 | 1 | 11134287 | 11155938 | 21651 | 16 | del | 1 | ≤ 0.1% | mo_to_both | EXOSC10 | TRUE | |
39 | 5496 | 12618 | s1 | 1 | 11134287 | 11155938 | 21651 | 16 | del | 1 | ≤ 0.1% | mo_to_both | EXOSC10 | TRUE | |
45 | 5483 | 12498 | s1 | 1 | 12020702 | 12030873 | 10171 | 8 | dup | 1 | ≤ 0.1% | mo_to_both | PLOD1 | TRUE | PLOD1 | GPR82 [missense]
45 | 8286 | 12498 | p1 | 1 | 12020702 | 12027148 | 6446 | 7 | dup | 1 | ≤ 0.1% | mo_to_both | PLOD1 | TRUE | PLOD1 | CPA4 [missense]
87 | 1822 | 13730 | p1 | 1 | 32084793 | 32146668 | 61875 | 46 | dup | 1 | ≤ 0.1% | mo_to_p1 | HCRTR1, COL16A1, PEF1 | FALSE | | DICER1 [missense]
86 | 7578 | 12647 | p1 | 1 | 32084793 | 32110465 | 25672 | 12 | dup | 2 | ≤ 0.1% | mo_to_p1 | HCRTR1, PEF1 | TRUE | | SLC30A5 [missense]
100 | 1561 | 11433 | s1 | 1 | 40204572 | 40312969 | 108397 | 21 | dup | 1 | ≤ 0.1% | fa_to_both | PPIE, TRIT1, BMP8B | TRUE | |
100 | 1560 | 11433 | p1 | 1 | 40205836 | 40313332 | 107496 | 21 | dup | 1 | ≤ 0.1% | fa_to_both | PPIE, TRIT1, BMP8B | TRUE | |
106 | 1581 | 11667 | s1 | 1 | 42693553 | 42744343 | 50790 | 3 | dup | 1 | ≤ 0.1% | fa_to_both | FOXJ3 | TRUE | | CDH3 [missense]
112 | 1522 | 11203 | s1 | 1 | 47512138 | 47515846 | 3708 | 4 | dup | 7 | ≤ 2% | mo_to_s1 | CYP4X1 | TRUE | |
112 | 5458 | 12394 | p1 | 1 | 47512138 | 47515846 | 3708 | 4 | dup | 7 | ≤ 2% | fa_to_p1 | CYP4X1 | TRUE | |
112 | 5463 | 12396 | p1 | 1 | 47512138 | 47515846 | 3708 | 4 | dup | 7 | ≤ 2% | mo_to_both | CYP4X1 | TRUE | |
112 | 5464 | 12396 | s1 | 1 | 47512138 | 47515846 | 3708 | 4 | dup | 7 | ≤ 2% | mo_to_both | CYP4X1 | TRUE | | ZNF518A [missense]
112 | 5622 | 13590 | p1 | 1 | 47512138 | 47515846 | 3708 | 4 | dup | 7 | ≤ 2% | fa_to_p1 | CYP4X1 | TRUE | | MTHFS [frameshift], EFCAB5 [frameshift]
112 | 1831 | 13809 | p1 | 1 | 47512138 | 47515846 | 3708 | 4 | dup | 7 | ≤ 2% | mo_to_p1 | CYP4X1 | TRUE | |
115 | 1508 | 11146 | p1 | 1 | 48865022 | 48869554 | 4532 | 2 | del | 1 | ≤ 1% | mo_to_both | SPATA6 | TRUE | |
115 | 1509 | 11146 | s1 | 1 | 48865022 | 48869554 | 4532 | 2 | del | 1 | ≤ 1% | mo_to_both | SPATA6 | TRUE | | DIP2B [missense]
123 | 88 | 13346 | s1 | 1 | 57340621 | 57341882 | 1261 | 2 | del | 1 | ≤ 0.1% | mo_to_s1 | C8A | FALSE | |
125 | 40 | 11872 | p1 | 1 | 65730593 | 65831879 | 101286 | 4 | dup | 1 | ≤ 0.1% | fa_to_p1 | DNAJC6 | TRUE | | CACNA1D [missense], KATNAL2 [splice]
126 | 5555 | 13097 | p1 | 1 | 65849875 | 65855310 | 5435 | 6 | dup | 2 | ≤ 2% | mo_to_both | DNAJC6 | FALSE | |
126 | 5557 | 13097 | s1 | 1 | 65849875 | 65855310 | 5435 | 6 | dup | 2 | ≤ 2% | mo_to_both | DNAJC6 | FALSE | |
127 | 1499 | 11118 | p1 | 1 | 66837995 | 67000051 | 162056 | 2 | dup | 1 | ≤ 0.1% | fa_to_p1 | PDE4B, SGIP1 | TRUE | PDE4B |
133 | 1792 | 12906 | p1 | 1 | 74663962 | 74836076 | 172114 | 21 | dup | 1 | ≤ 0.1% | mo_to_p1 | FPGT-TNNI3K, FPGT | TRUE | | SLCO1C1 [missense], A2M [missense], MYO7B [missense]
134 | 5487 | 12518 | s1 | 1 | 84944964 | 84962053 | 17089 | 8 | dup | 1 | ≤ 0.1% | mo_to_s1 | RPF1 | FALSE | |
138 | 69 | 12810 | p1 | 1 | 87029343 | 87038403 | 9060 | 6 | del | 5 | ≤ 2% | mo_to_p1 | CLCA4 | TRUE | |
138 | 5561 | 13144 | p1 | 1 | 87029343 | 87040438 | 11095 | 7 | del | 5 | ≤ 2% | mo_to_p1 | CLCA4 | TRUE | |
138 | 5596 | 13387 | s1 | 1 | 87029343 | 87040438 | 11095 | 7 | del | 5 | ≤ 2% | mo_to_s1 | CLCA4 | TRUE | |
141 | 7590 | 11000 | p1 | 1 | 92262843 | 92573558 | 310715 | 37 | dup | 1 | ≤ 0.1% | fa_to_p1 | TGFBR3, BRDT, EPHX4, BTBD8 | FALSE | |
148 | 8296 | 13096 | s1 | 1 | 104068692 | 104094402 | 25710 | 14 | dup | 2 | ≤ 0.1% | mo_to_s1 | RNPC3 | TRUE | |
157 | 25 | 11610 | s1 | 1 | 111724808 | 111863088 | 138280 | 37 | dup | 1 | ≤ 0.1% | fa_to_s1 | CHIA, CEPT1, CHI3L2, DENND2D | FALSE | |
160 | 8299 | 13296 | s1 | 1 | 113245184 | 113264970 | 19786 | 13 | dup | 1 | ≤ 0.1% | fa_to_s1 | RHOC, PPM1J, FAM19A3 | TRUE | |
165 | 5628 | 13601 | p1 | 1 | 115316891 | 115323228 | 6337 | 5 | dup | 1 | ≤ 0.1% | fa_to_both | SIKE1 | TRUE | | AK1 [missense]
165 | 5629 | 13601 | s1 | 1 | 115316891 | 115322836 | 5945 | 4 | dup | 1 | ≤ 0.1% | fa_to_both | SIKE1 | TRUE | |
195 | 5571 | 13215 | s1 | 1 | 145415278 | 145923443 | 508165 | 156 | del | 1 | ≤ 0.1% | fa_to_s1 | RNF115, GPR89A, RBM8A, PIAS3, CD160, HFE2, ANKRD34A, LIX1L, POLR3GL, ANKRD35, ITGA10, PEX11B, NUDT17, TXNIP, GPR89C, PDZK1, POLR3C | TRUE | |
187 | 5513 | 12719 | p1 | 1 | 146715494 | 146767190 | 51696 | 23 | del | 1 | ≤ 0.1% | mo_to_p1 | CHD1L | TRUE | CHD1L |
204 | 1809 | 13355 | p1 | 1 | 150955581 | 150967175 | 11594 | 11 | del | 1 | ≤ 0.1% | fa_to_both | ANXA9 | TRUE | |
204 | 1810 | 13355 | s1 | 1 | 150955813 | 150967175 | 11362 | 10 | del | 1 | ≤ 0.1% | fa_to_both | ANXA9 | TRUE | |
206 | 5509 | 12691 | p1 | 1 | 151337018 | 151403317 | 66299 | 35 | dup | 1 | ≤ 0.1% | mo_to_p1 | SELENBP1, POGZ, PSMB4 | FALSE | |
222 | 5627 | 13599 | p1 | 1 | 156890595 | 156933386 | 42791 | 45 | dup | 1 | ≤ 0.1% | mo_to_p1 | ARHGEF11, LRRC71 | TRUE | | TMEM62 [missense]
225 | 1555 | 11411 | s1 | 1 | 158323778 | 158326686 | 2908 | 6 | del | 1 | ≤ 0.1% | fa_to_s1 | CD1E | TRUE | | DCLRE1B [missense]
252 | 55 | 12161 | s1 | 1 | 182550359 | 182555941 | 5582 | 4 | del | 1 | ≤ 0.1% | mo_to_s1 | RNASEL | FALSE | |
255 | 36 | 11715 | p1 | 1 | 185097800 | 185130057 | 32257 | 11 | dup | 3 | ≤ 1% | fa_to_both | TRMT1L, SWT1 | TRUE | | ASAH2 [frameshift]
255 | 37 | 11715 | s1 | 1 | 185097800 | 185130057 | 32257 | 11 | dup | 3 | ≤ 1% | fa_to_both | TRMT1L, SWT1 | TRUE | |
255 | 8312 | 12313 | p1 | 1 | 185097800 | 185130057 | 32257 | 11 | dup | 3 | ≤ 0.5% | mo_to_p1 | TRMT1L, SWT1 | TRUE | |
255 | 5623 | 13590 | p1 | 1 | 185097800 | 185130057 | 32257 | 11 | dup | 3 | ≤ 0.5% | mo_to_both | TRMT1L, SWT1 | TRUE | | MTHFS [frameshift], EFCAB5 [frameshift]
255 | 5624 | 13590 | s1 | 1 | 185106737 | 185121067 | 14330 | 9 | dup | 3 | ≤ 0.5% | mo_to_both | TRMT1L | TRUE | |
257 | 5488 | 12518 | s1 | 1 | 190195211 | 190250880 | 55669 | 4 | del | 2 | ≤ 0.1% | mo_to_s1 | FAM5C | FALSE | |
260 | 1863 | 14110 | p1 | 1 | 197479589 | 197482083 | 2494 | 3 | del | 1 | ≤ 1% | mo_to_p1 | DENND1B | FALSE | | PHF3 [missense]
277 | 1643 | 12175 | p1 | 1 | 213003583 | 213009491 | 5908 | 2 | del | 6 | ≤ 2% | fa_to_p1 | C1orf227 | TRUE | |
279 | 116 | 13890 | p1 | 1 | 220088790 | 220100447 | 11657 | 3 | dup | 1 | ≤ 0.1% | fa_to_both | RNU5F | TRUE | | DYRK1A [splice]
279 | 117 | 13890 | s1 | 1 | 220088790 | 220100447 | 11657 | 3 | dup | 1 | ≤ 0.1% | fa_to_both | RNU5F | TRUE | |
296 | 7648 | 12403 | p1 | 1 | 230313963 | 230339036 | 25073 | 2 | del | 1 | ≤ 0.1% | fa_to_p1 | GALNT2 | TRUE | |
295 | 1640 | 12162 | p1 | 1 | 230313963 | 232094638 | 1780675 | 160 | del | 1 | ≤ 0.1% | mo_to_p1 | C1orf124, PGBD5, COG2, EXOC8, EGLN1, C1orf131, C1orf198, TSNAX-DISC1, GALNT2, TTC13, GNPAT, FAM89A, ARV1, AGT, CAPN9, TRIM67 | TRUE | AGT, GNPAT |
308 | 1725 | 12308 | p1 | 1 | 245026918 | 245133745 | 106827 | 2 | dup | 1 | ≤ 0.1% | mo_to_p1 | HNRNPU | TRUE | |
309 | 1544 | 11301 | p1 | 1 | 245530135 | 245722314 | 192179 | 4 | del | 1 | ≤ 0.1% | fa_to_both | KIF26B | TRUE | | ZNF335 [missense], BRD1 [missense]
309 | 1545 | 11301 | s1 | 1 | 245530135 | 245722314 | 192179 | 4 | del | 1 | ≤ 0.1% | fa_to_both | KIF26B | TRUE | |
311 | 1597 | 11797 | s1 | 1 | 245912864 | 246805339 | 892475 | 27 | dup | 2 | ≤ 0.1% | mo_to_s1 | TFB2M, CNST, SMYD3 | TRUE | |
311 | 1523 | 11203 | s1 | 1 | 246490502 | 246811354 | 320852 | 21 | dup | 2 | ≤ 0.1% | mo_to_s1 | TFB2M, CNST, SMYD3 | TRUE | |
312 | 7653 | 11676 | s1 | 1 | 246927547 | 246930602 | 3055 | 3 | dup | 1 | ≤ 0.1% | mo_to_s1 | SCCPDH | TRUE | |
316 | 8327 | 12463 | p1 | 1 | 247599271 | 247769817 | 170546 | 14 | dup | 1 | ≤ 0.1% | fa_to_p1 | OR2B11, NLRP3, OR2G3, OR2G2, OR2C3, OR2W5, C1orf150 | TRUE | NLRP3 | UNC80 [nonsense]
318 | 8328 | 12463 | p1 | 1 | 248023918 | 248113098 | 89180 | 8 | dup | 1 | ≤ 0.5% | fa_to_p1 | OR2W3, OR2L13, TRIM58, OR2T8 | TRUE | | UNC80 [nonsense]
1605 | 5679 | 12837 | p1 | 2 | 1396883 | 1546246 | 149363 | 28 | dup | 2 | ≤ 0.5% | mo_to_p1 | TPO | TRUE | | SH3RF3 [missense]
1605 | 5672 | 12733 | s1 | 2 | 1426816 | 1520754 | 93938 | 19 | dup | 2 | ≤ 0.5% | mo_to_s1 | TPO | FALSE | | RANBP9 [missense]
1604 | 5709 | 13296 | s1 | 2 | 1437209 | 1479843 | 42634 | 8 | dup | 1 | ≤ 0.5% | mo_to_s1 | TPO | TRUE | |
1628 | 2030 | 12729 | p1 | 2 | 29117564 | 29164440 | 46876 | 15 | dup | 1 | ≤ 0.1% | mo_to_both | WDR43 | TRUE | |
1628 | 2032 | 12729 | s1 | 2 | 29117564 | 29169645 | 52081 | 18 | dup | 1 | ≤ 0.1% | mo_to_both | WDR43 | TRUE | |
1632 | 5701 | 13176 | s1 | 2 | 32631566 | 33246273 | 614707 | 87 | dup | 1 | ≤ 0.5% | mo_to_s1 | BIRC6, TTC27, LTBP1 | TRUE | |
1633 | 5674 | 12826 | p1 | 2 | 44071645 | 44073450 | 1805 | 2 | dup | 1 | ≤ 0.1% | mo_to_p1 | ABCG8 | TRUE | | TROVE2 [frameshift]
1634 | 1898 | 11118 | p1 | 2 | 44527109 | 44541090 | 13981 | 5 | dup | 2 | ≤ 0.1% | fa_to_both | SLC3A1 | TRUE | SLC3A1 |
1634 | 7946 | 11118 | s1 | 2 | 44527109 | 44541090 | 13981 | 5 | dup | 2 | ≤ 0.1% | fa_to_both | SLC3A1 | TRUE | SLC3A1 |
1634 | 146 | 11472 | p1 | 2 | 44527109 | 44541090 | 13981 | 5 | dup | 2 | ≤ 0.1% | fa_to_both | SLC3A1 | TRUE | SLC3A1 | KRT80 [missense], SP7 [missense]
1634 | 147 | 11472 | s1 | 2 | 44527109 | 44541090 | 13981 | 5 | dup | 2 | ≤ 0.1% | fa_to_both | SLC3A1 | TRUE | SLC3A1 | MACC1 [missense], ZYG11A [missense]
1635 | 2031 | 12729 | p1 | 2 | 45616448 | 45832580 | 216132 | 20 | dup | 1 | ≤ 0.5% | fa_to_both | SRBD1 | TRUE | |
1635 | 2033 | 12729 | s1 | 2 | 45616448 | 45879587 | 263139 | 21 | dup | 1 | ≤ 0.5% | fa_to_both | SRBD1, PRKCE | TRUE | |
1644 | 7950 | 11716 | p1 | 2 | 61558410 | 62998527 | 1440117 | 76 | dup | 1 | ≤ 0.1% | fa_to_p1 | XPO1, FAM161A, EHBP1, TMEM17, B3GNT2, COMMD1, CCT4, USP34 | TRUE | |
1652 | 1985 | 12228 | p1 | 2 | 74129486 | 74166149 | 36663 | 9 | dup | 4 | ≤ 0.1% | mo_to_p1 | ACTG2, DGUOK | TRUE | | CYP4F3 [missense]
1661 | 175 | 11895 | p1 | 2 | 86292396 | 86509365 | 216969 | 64 | dup | 2 | ≤ 0.1% | fa_to_p1 | REEP1, POLR1A, MRPL35, PTCD3, IMMT | FALSE | |
1666 | 5682 | 12851 | s1 | 2 | 96780544 | 97784254 | 1003710 | 255 | dup | 1 | ≤ 0.1% | mo_to_s1 | FER1L5, CIAO1, ANKRD36, SEMA4C, ANKRD39, SNRNP200, ASTL, ADRA2B, NEURL3, LMAN2L, DUSP2, TMEM127, ITPRIPL1, ANKRD23, FAHD2B, CNNM3, CNNM4, ARID5A, LOC285033, STARD7, NCAPH, KIAA1310, FAM178B | TRUE | |
1674 | 5735 | 13507 | p1 | 2 | 98192841 | 98275940 | 83099 | 18 | dup | 2 | ≤ 0.5% | fa_to_both | ANKRD36B, COX5B, ACTR1B | FALSE | |
1674 | 5736 | 13507 | s1 | 2 | 98192841 | 98274779 | 81938 | 15 | dup | 2 | ≤ 0.5% | fa_to_both | ANKRD36B, COX5B, ACTR1B | FALSE | | SCOC [frameshift], LAMA1 [missense]
1674 | 1930 | 11484 | p1 | 2 | 98262549 | 98275940 | 13391 | 12 | dup | 2 | ≤ 0.5% | mo_to_both | ACTR1B, COX5B | TRUE | | TMEM85 [missense], PPP2R1B [missense]
1674 | 1931 | 11484 | s1 | 2 | 98263529 | 98275940 | 12411 | 11 | dup | 2 | ≤ 0.5% | mo_to_both | ACTR1B, COX5B | TRUE | |
1675 | 5695 | 13166 | s1 | 2 | 98866780 | 98872637 | 5857 | 2 | del | 1 | ≤ 2% | fa_to_s1 | VWA3B | TRUE | | USP34 [missense]
1677 | 5740 | 13508 | s1 | 2 | 99858826 | 99912133 | 53307 | 10 | dup | 2 | ≤ 0.5% | mo_to_s1 | LYG1, LYG2 | FALSE | |
1677 | 2057 | 13795 | p1 | 2 | 99900855 | 99909103 | 8248 | 4 | dup | 2 | ≤ 0.5% | mo_to_both | LYG1 | TRUE | |
1680 | 7964 | 11045 | p1 | 2 | 102407181 | 102416105 | 8924 | 3 | dup | 1 | ≤ 0.1% | mo_to_p1 | MAP4K4 | FALSE | |
1688 | 5663 | 12691 | p1 | 2 | 111395540 | 111438066 | 42526 | 26 | dup | 1 | ≤ 0.1% | mo_to_p1 | BUB1 | FALSE | |
1688 | 5656 | 12579 | p1 | 2 | 111395540 | 113090065 | 1694525 | 167 | dup | 1 | ≤ 0.1% | fa_to_both | ZC3H8, BCL2L11, ACOXL, ZC3H6, MERTK, ANAPC1, BUB1, FBLN7, TMEM87B | TRUE | |
1687 | 5657 | 12579 | s1 | 2 | 111395540 | 113090065 | 1694525 | 167 | dup | 1 | ≤ 0.1% | fa_to_both | ZC3H8, BCL2L11, ACOXL, ZC3H6, MERTK, ANAPC1, BUB1, FBLN7, TMEM87B | TRUE | |
1692 | 2045 | 13355 | p1 | 2 | 113251719 | 113278002 | 26283 | 4 | dup | 1 | ≤ 0.1% | mo_to_p1 | TTL | TRUE | |
1693 | 2016 | 12552 | s1 | 2 | 113346442 | 113404739 | 58297 | 2 | dup | 1 | ≤ 2% | mo_to_s1 | SLC20A1, CHCHD5 | TRUE | |
1694 | 5654 | 12526 | s1 | 2 | 113537072 | 113541347 | 4275 | 4 | del | 1 | ≤ 0.1% | mo_to_s1 | IL1A | TRUE | IL1A |
1699 | 5716 | 13418 | p1 | 2 | 116535362 | 116599921 | 64559 | 12 | dup | 1 | ≤ 0.1% | fa_to_p1 | DPP10 | FALSE | DPP10 | CGNL1 [missense], DENND5B [missense], LRRC40 [missense]
1703 | 5749 | 13601 | p1 | 2 | 128414976 | 128945188 | 530212 | 107 | dup | 1 | ≤ 0.1% | fa_to_both | LIMS2, POLR2D, AMMECR1L, WDR33, SAP130, UGGT1 | TRUE | | AK1 [missense]
1703 | 5750 | 13601 | s1 | 2 | 128463896 | 128945188 | 481292 | 104 | dup | 1 | ≤ 0.1% | fa_to_both | POLR2D, AMMECR1L, SAP130, WDR33, UGGT1 | TRUE | |
1716 | 5675 | 12826 | p1 | 2 | 135206219 | 135215744 | 9525 | 3 | dup | 1 | ≤ 0.1% | mo_to_both | TMEM163, MGAT5 | TRUE | | TROVE2 [frameshift]
1716 | 8556 | 12826 | s1 | 2 | 135206219 | 135215744 | 9525 | 3 | dup | 1 | ≤ 0.1% | mo_to_both | TMEM163, MGAT5 | TRUE | |
1718 | 1938 | 11519 | s1 | 2 | 159663566 | 159922483 | 258917 | 3 | dup | 1 | ≤ 0.1% | mo_to_s1 | TANC1, DAPL1 | TRUE | TANC1 |
1719 | 164 | 11711 | p1 | 2 | 160585533 | 160605414 | 19881 | 4 | dup | 1 | ≤ 0.1% | mo_to_both | MARCH7 | TRUE | | KIAA0182 [missense]
1719 | 166 | 11711 | s1 | 2 | 160585533 | 160605414 | 19881 | 4 | dup | 1 | ≤ 0.1% | mo_to_both | MARCH7 | TRUE | |
1722 | 5707 | 13269 | s1 | 2 | 165381513 | 165600385 | 218872 | 17 | del | 1 | ≤ 0.1% | fa_to_s1 | GRB14, COBLL1 | FALSE | |
1743 | 5683 | 12851 | s1 | 2 | 192711168 | 193059250 | 348082 | 11 | del | 1 | ≤ 0.1% | mo_to_s1 | TMEFF2, SDPR | TRUE | |
1754 | 1946 | 11622 | p1 | 2 | 211299210 | 211542709 | 243499 | 48 | del | 1 | ≤ 0.1% | mo_to_both | LANCL1, CPS1 | TRUE | CPS1 |
1754 | 1947 | 11622 | s1 | 2 | 211299210 | 211542709 | 243499 | 48 | del | 1 | ≤ 0.1% | mo_to_both | LANCL1, CPS1 | TRUE | CPS1 |
1762 | 5644 | 12420 | p1 | 2 | 224758950 | 224831751 | 72801 | 12 | dup | 1 | ≤ 0.1% | fa_to_both | WDFY1, MRPL44 | FALSE | | HRH2 [missense], GOLGA4 [missense]
1762 | 5645 | 12420 | s1 | 2 | 224758950 | 224831751 | 72801 | 12 | dup | 1 | ≤ 0.1% | fa_to_both | WDFY1, MRPL44 | FALSE | | VPS18 [missense]
1764 | 5689 | 12997 | p1 | 2 | 230632269 | 230724290 | 92021 | 39 | dup | 1 | ≤ 0.1% | fa_to_p1 | TRIP12 | TRUE | |
1767 | 160 | 11659 | s1 | 2 | 232070951 | 232072965 | 2014 | 2 | del | 1 | ≤ 0.1% | mo_to_s1 | ARMC9 | FALSE | |
1770 | 1967 | 12175 | s1 | 2 | 234181611 | 234229469 | 47858 | 16 | dup | 1 | ≤ 0.1% | fa_to_s1 | ATG16L1, SAG | TRUE | |
1778 | 5746 | 13589 | p1 | 2 | 241439374 | 241514045 | 74671 | 19 | dup | 1 | ≤ 0.1% | fa_to_both | DUSP28, ANKMY1, RNPEPL1 | TRUE | | WDR55 [missense]
1778 | 5747 | 13589 | s1 | 2 | 241439868 | 241514045 | 74177 | 18 | dup | 1 | ≤ 0.1% | fa_to_both | DUSP28, ANKMY1, RNPEPL1 | TRUE | |
1780 | 5640 | 12394 | p1 | 2 | 241538067 | 241709123 | 171056 | 42 | dup | 3 | ≤ 0.5% | mo_to_p1 | KIF1A, GPR35, AQP12B, AQP12A, CAPN10 | TRUE | KIF1A |
1782 | 1894 | 11107 | p1 | 2 | 242371093 | 242375953 | 4860 | 4 | del | 1 | ≤ 0.1% | mo_to_both | FARP2 | TRUE | | TROAP [missense]
1782 | 1895 | 11107 | s1 | 2 | 242371093 | 242375953 | 4860 | 4 | del | 1 | ≤ 0.1% | mo_to_both | FARP2 | TRUE | |
2232 | 5765 | 12588 | p1 | 3 | 7594808 | 7782093 | 187285 | 6 | del | 1 | ≤ 0.1% | mo_to_both | GRM7 | TRUE | | TEKT1 [missense]
2232 | 5766 | 12588 | s1 | 3 | 7594808 | 7782093 | 187285 | 6 | del | 1 | ≤ 0.1% | mo_to_both | GRM7 | TRUE | | NBPF9 [nonsense]
2237 | 2179 | 11824 | p1 | 3 | 8578870 | 8672615 | 93745 | 11 | dup | 1 | ≤ 0.1% | fa_to_both | LMCD1, C3orf32 | FALSE | | EBAG9 [missense]
2237 | 2180 | 11824 | s1 | 3 | 8578870 | 8675624 | 96754 | 13 | dup | 1 | ≤ 0.1% | fa_to_both | LMCD1, C3orf32 | FALSE | |
2238 | 5784 | 12851 | p1 | 3 | 9867483 | 9874929 | 7446 | 4 | dup | 1 | ≤ 0.1% | mo_to_both | ARPC4-TTLL3 | TRUE | | SLC25A29 [missense], HMGXB3 [missense], UBE3C [missense]
2238 | 8377 | 12851 | s1 | 3 | 9867483 | 9871079 | 3596 | 3 | dup | 1 | ≤ 0.1% | mo_to_both | ARPC4-TTLL3 | TRUE | |
2242 | 2182 | 11905 | p1 | 3 | 12475396 | 12791331 | 315935 | 47 | dup | 2 | ≤ 0.1% | mo_to_p1 | PPARG, MKRN2, RAF1, TSEN2, TMEM40 | TRUE | |
2242 | 5761 | 12481 | p1 | 3 | 12632296 | 12791331 | 159035 | 22 | dup | 2 | ≤ 0.1% | mo_to_both | RAF1, TMEM40 | FALSE | |
2242 | 5762 | 12481 | s1 | 3 | 12632296 | 12791331 | 159035 | 22 | dup | 2 | ≤ 0.1% | mo_to_both | RAF1, TMEM40 | FALSE | |
2236 | 2239 | 12534 | p1 | 3 | 12940888 | 12978197 | 37309 | 13 | del | 2 | ≤ 0.1% | mo_to_p1 | IQSEC1 | TRUE | | MTMR9 [missense], MMP8 [missense]
2236 | 5778 | 12727 | p1 | 3 | 12940888 | 12983365 | 42477 | 14 | dup | 2 | ≤ 0.1% | mo_to_both | IQSEC1 | TRUE | | ZMAT5 [missense]
2236 | 5779 | 12727 | s1 | 3 | 12942481 | 12983365 | 40884 | 13 | dup | 2 | ≤ 0.1% | mo_to_both | IQSEC1 | TRUE | | TCF12 [frameshift]
2262 | 2258 | 13322 | s1 | 3 | 31917924 | 31921322 | 3398 | 2 | del | 1 | ≤ 0.1% | mo_to_s1 | OSBPL10 | TRUE | | HPS6 [missense]
2265 | 2163 | 11501 | s1 | 3 | 35833873 | 35835450 | 1577 | 2 | dup | 3 | ≤ 0.5% | fa_to_s1 | ARPP21 | FALSE | | ANPEP [missense]
2265 | 2232 | 12383 | s1 | 3 | 35833873 | 35835450 | 1577 | 2 | dup | 3 | ≤ 0.5% | fa_to_s1 | ARPP21 | FALSE | |
2265 | 2261 | 13393 | p1 | 3 | 35833873 | 35835450 | 1577 | 2 | dup | 3 | ≤ 0.5% | mo_to_both | ARPP21 | FALSE | |
2265 | 2262 | 13393 | s1 | 3 | 35833873 | 35835450 | 1577 | 2 | dup | 3 | ≤ 0.5% | mo_to_both | ARPP21 | FALSE | |
2266 | 5819 | 13512 | p1 | 3 | 37095341 | 37096656 | 1315 | 3 | del | 1 | ≤ 0.1% | fa_to_p1 | LRRFIP2 | TRUE | |
2301 | 8380 | 12683 | p1 | 3 | 57143564 | 57144339 | 775 | 2 | del | 1 | ≤ 0.1% | mo_to_p1 | IL17RD | FALSE | | KDM6B [frameshift]
2307 | 5788 | 13094 | p1 | 3 | 78648062 | 78796050 | 147988 | 27 | dup | 1 | ≤ 0.1% | fa_to_both | ROBO1 | FALSE | ROBO1 | WDFY3 [nonsense]
2307 | 5789 | 13094 | s1 | 3 | 78649262 | 78796050 | 146788 | 26 | dup | 1 | ≤ 0.1% | fa_to_both | ROBO1 | FALSE | ROBO1 | CNOT4 [missense]
2308 | 5755 | 12252 | p1 | 3 | 81627075 | 81640315 | 13240 | 4 | del | 1 | ≤ 0.1% | mo_to_both | GBE1 | FALSE | GBE1 |
2308 | 5756 | 12252 | s1 | 3 | 81627075 | 81640315 | 13240 | 4 | del | 1 | ≤ 0.1% | mo_to_both | GBE1 | FALSE | GBE1 |
2309 | 5791 | 13099 | p1 | 3 | 97486951 | 97634880 | 147929 | 19 | del | 1 | ≤ 0.1% | fa_to_p1 | None, ARL6 | TRUE | ARL6 |
2317 | 5796 | 13162 | p1 | 3 | 113588353 | 113619993 | 31640 | 5 | del | 2 | ≤ 0.1% | mo_to_both | GRAMD1C | TRUE | | RIMS1 [frameshift]
2317 | 5797 | 13162 | s1 | 3 | 113588353 | 113619993 | 31640 | 5 | del | 2 | ≤ 0.1% | mo_to_both | GRAMD1C | TRUE | |
2326 | 260 | 13890 | p1 | 3 | 129120435 | 129127701 | 7266 | 4 | del | 1 | ≤ 0.5% | mo_to_p1 | C3orf25 | TRUE | | DYRK1A [splice]
2329 | 2172 | 11676 | p1 | 3 | 132277814 | 132280061 | 2247 | 3 | del | 7 | ≤ 1% | mo_to_p1 | NPHP3-ACAD11 | TRUE | |
2329 | 237 | 12304 | p1 | 3 | 132277814 | 132280061 | 2247 | 3 | del | 7 | ≤ 1% | mo_to_p1 | NPHP3-ACAD11 | FALSE | | STIL [missense], PSEN1 [missense], PHF19 [missense]
2329 | 2266 | 13660 | s1 | 3 | 132277814 | 132280061 | 2247 | 3 | del | 7 | ≤ 1% | fa_to_s1 | NPHP3-ACAD11 | TRUE | | ZNF423 [missense]
2329 | 255 | 13793 | s1 | 3 | 132277814 | 132280061 | 2247 | 3 | del | 7 | ≤ 1% | mo_to_s1 | NPHP3-ACAD11 | TRUE | |
2331 | 2098 | 11067 | p1 | 3 | 137781657 | 137816689 | 35032 | 14 | dup | 2 | ≤ 0.1% | mo_to_both | DZIP1L | TRUE | |
2331 | 2099 | 11067 | s1 | 3 | 137781657 | 137816689 | 35032 | 14 | dup | 2 | ≤ 0.1% | mo_to_both | DZIP1L | TRUE | |
2331 | 230 | 12106 | s1 | 3 | 137781657 | 137803095 | 21438 | 9 | dup | 2 | ≤ 0.1% | mo_to_s1 | DZIP1L | TRUE | |
2335 | 2096 | 11057 | s1 | 3 | 141884463 | 142084208 | 199745 | 31 | dup | 6 | ≤ 0.5% | fa_to_s1 | GK5, XRN1 | FALSE | |
2335 | 2124 | 11108 | p1 | 3 | 141884463 | 142075961 | 191498 | 28 | dup | 6 | ≤ 0.5% | mo_to_both | GK5, XRN1 | FALSE | | KANK1 [missense]
2335 | 2126 | 11108 | s1 | 3 | 141884463 | 142084021 | 199558 | 30 | dup | 6 | ≤ 0.5% | mo_to_both | GK5, XRN1 | FALSE | | MAN2A1 [missense]
2335 | 2152 | 11304 | s1 | 3 | 141884463 | 142084021 | 199558 | 30 | dup | 6 | ≤ 0.5% | fa_to_s1 | GK5, XRN1 | TRUE | | KLC2 [missense]
2335 | 247 | 13335 | p1 | 3 | 141884463 | 142084208 | 199745 | 31 | dup | 6 | ≤ 0.5% | fa_to_both | GK5, XRN1 | FALSE | | ZNF420 [missense]
2335 | 248 | 13335 | s1 | 3 | 141884463 | 142084208 | 199745 | 31 | dup | 6 | ≤ 0.5% | fa_to_both | GK5, XRN1 | FALSE | |
2335 | 5768 | 12631 | p1 | 3 | 141889166 | 142075961 | 186795 | 27 | dup | 6 | ≤ 0.5% | mo_to_both | GK5, XRN1 | FALSE | |
2335 | 5769 | 12631 | s1 | 3 | 141889166 | 142084208 | 195042 | 30 | dup | 6 | ≤ 0.5% | mo_to_both | GK5, XRN1 | FALSE | | GAPVD1 [missense], YIF1A [missense]
2334 | 5800 | 13176 | s1 | 3 | 141896323 | 141901891 | 5568 | 3 | del | 2 | ≤ 0.1% | fa_to_both | GK5 | TRUE | |
2334 | 8388 | 13176 | p1 | 3 | 141896323 | 141901891 | 5568 | 3 | del | 2 | ≤ 0.1% | fa_to_both | GK5 | TRUE | | ZFYVE26 [frameshift]
2341 | 2218 | 12297 | s1 | 3 | 150280328 | 150286079 | 5751 | 6 | del | 1 | ≤ 0.1% | fa_to_s1 | EIF2A | TRUE | EIF2A | GNA14 [missense]
2342 | 8389 | 13487 | p1 | 3 | 155481305 | 155493637 | 12332 | 3 | dup | 2 | ≤ 0.5% | fa_to_p1 | C3orf33 | FALSE | | CYP4Z1 [missense]
2342 | 2279 | 13922 | s1 | 3 | 155481305 | 155493637 | 12332 | 3 | dup | 2 | ≤ 1% | fa_to_s1 | C3orf33 | TRUE | |
2356 | 224 | 11788 | p1 | 3 | 183822574 | 183823757 | 1183 | 3 | del | 2 | ≤ 0.1% | mo_to_p1 | HTR3E | TRUE | | YTHDC2 [missense]
2349 | 2103 | 11075 | p1 | 3 | 184104838 | 184289170 | 184332 | 7 | del | 1 | ≤ 0.1% | mo_to_both | EPHB3, CHRD | FALSE | |
2349 | 2104 | 11075 | s1 | 3 | 184104838 | 184107210 | 2372 | 6 | del | 1 | ≤ 0.1% | mo_to_both | CHRD | FALSE | | MED12L [missense], C4orf40 [missense]
2357 | 244 | 13169 | s1 | 3 | 194947437 | 195453443 | 506006 | 38 | dup | 1 | ≤ 0.1% | mo_to_s1 | C3orf21, APOD, MUC20, ACAP2, PPP1R2 | FALSE | |
2362 | 245 | 13169 | s1 | 3 | 197238765 | 197260432 | 21667 | 5 | dup | 1 | ≤ 0.1% | mo_to_s1 | BDH1 | FALSE | |
2347 | 2154 | 11336 | p1 | 3 | 197553748 | 197556544 | 2796 | 2 | del | 1 | ≤ 0.1% | mo_to_both | LRCH3 | TRUE | | SLC26A5 [missense]
2347 | 2155 | 11336 | s1 | 3 | 197553748 | 197556544 | 2796 | 2 | del | 1 | ≤ 0.1% | mo_to_both | LRCH3 | TRUE | | RGS7 [missense]
2364 | 2125 | 11108 | p1 | 3 | 197574279 | 197640913 | 66634 | 14 | dup | 1 | ≤ 0.1% | mo_to_p1 | IQCG, LRCH3 | FALSE | | KANK1 [missense]
2386 | 5834 | 12645 | p1 | 4 | 818279 | 845762 | 27483 | 5 | dup | 1 | ≤ 0.1% | mo_to_p1 | CPLX1, GAK | TRUE | CPLX1 | ARHGAP21 [missense], ANK2 [nonsense], LRRC31 [missense]
2392 | 2369 | 12340 | p1 | 4 | 1746674 | 1795770 | 49096 | 2 | dup | 1 | ≤ 0.1% | fa_to_p1 | TACC3, FGFR3 | TRUE | FGFR3 | PPM1D [nonsense], BCORL1 [missense]
2395 | 7474 | 11773 | p1 | 4 | 2641461 | 2835561 | 194100 | 34 | dup | 1 | ≤ 0.1% | mo_to_p1 | TNIP2, FAM193A, SH3BP2 | TRUE | TNIP2, SH3BP2 | KRBA1 [missense], CACNA1E [missense]
2398 | 2324 | 11219 | p1 | 4 | 2906490 | 3201666 | 295176 | 93 | dup | 1 | ≤ 0.1% | mo_to_both | ADD1, GRK4, C4orf10, MFSD10, HTT, NOP14 | FALSE | HTT |
2398 | 2325 | 11219 | s1 | 4 | 2906490 | 3201666 | 295176 | 93 | dup | 1 | ≤ 0.1% | mo_to_both | ADD1, GRK4, C4orf10, MFSD10, HTT, NOP14 | FALSE | HTT | LAMA4 [missense]
2399 | 2304 | 11060 | p1 | 4 | 3445765 | 3450003 | 4238 | 9 | del | 4 | ≤ 0.5% | fa_to_p1 | HGFAC | FALSE | |
2399 | 8152 | 11700 | p1 | 4 | 3446991 | 3449762 | 2771 | 4 | del | 4 | ≤ 0.5% | mo_to_p1 | HGFAC | FALSE | |
2403 | 278 | 12161 | p1 | 4 | 6594899 | 6613005 | 18106 | 10 | del | 1 | ≤ 0.1% | mo_to_both | MAN2B2 | FALSE | | UBR3 [frameshift], CARKD [nonsense]
2403 | 279 | 12161 | s1 | 4 | 6594899 | 6613005 | 18106 | 10 | del | 1 | ≤ 0.1% | mo_to_both | MAN2B2 | FALSE | |
2419 | 2310 | 11075 | p1 | 4 | 40762450 | 40778276 | 15826 | 5 | dup | 1 | ≤ 0.5% | mo_to_both | NSUN7 | FALSE | |
2419 | 2311 | 11075 | s1 | 4 | 40762450 | 40778276 | 15826 | 5 | dup | 1 | ≤ 0.5% | mo_to_both | NSUN7 | FALSE | | MED12L [missense], C4orf40 [missense]
2420 | 2305 | 11066 | p1 | 4 | 41258993 | 41259143 | 150 | 2 | dup | 3 | ≤ 0.1% | fa_to_p1 | UCHL1 | TRUE | UCHL1 | TCF7L1 [missense]
2423 | 8403 | 12962 | s1 | 4 | 47625587 | 47625965 | 378 | 2 | del | 1 | ≤ 0.1% | mo_to_s1 | CORIN | TRUE | |
2425 | 2379 | 12524 | p1 | 4 | 48165718 | 48178203 | 12485 | 6 | dup | 1 | ≤ 0.1% | fa_to_p1 | TEC | TRUE | |
2429 | 2302 | 11014 | p1 | 4 | 57361522 | 57522178 | 160656 | 9 | dup | 1 | ≤ 0.1% | mo_to_p1 | ARL9, SRP72, HOPX | FALSE | |
2445 | 5824 | 12420 | p1 | 4 | 81207478 | 82092958 | 885480 | 26 | dup | 1 | ≤ 0.1% | mo_to_both | FGF5, C4orf22, BMP3, PRKG2 | FALSE | | HRH2 [missense], GOLGA4 [missense]
2445 | 5825 | 12420 | s1 | 4 | 81207478 | 82092958 | 885480 | 26 | dup | 1 | ≤ 0.1% | mo_to_both | FGF5, C4orf22, BMP3, PRKG2 | FALSE | | VPS18 [missense]
2448 | 5821 | 12409 | p1 | 4 | 89978088 | 90035703 | 57615 | 2 | del | 1 | ≤ 0.1% | mo_to_both | TIGD2, FAM13A | TRUE | | LRP2 [nonsense]
2448 | 5822 | 12409 | s1 | 4 | 89978088 | 90035703 | 57615 | 2 | del | 1 | ≤ 0.1% | mo_to_both | TIGD2, FAM13A | TRUE | |
2450 | 292 | 13533 | p1 | 4 | 90855960 | 90874569 | 18609 | 3 | del | 2 | ≤ 0.1% | fa_to_p1 | MMRN1 | FALSE | |
2453 | 2341 | 11622 | p1 | 4 | 100738070 | 100782782 | 44712 | 5 | dup | 1 | ≤ 0.1% | fa_to_both | DAPP1 | TRUE | |
2453 | 2342 | 11622 | s1 | 4 | 100738070 | 100774505 | 36435 | 4 | dup | 1 | ≤ 0.1% | fa_to_both | DAPP1 | TRUE | |
2457 | 263 | 11190 | p1 | 4 | 108535404 | 108871579 | 336175 | 22 | dup | 1 | ≤ 0.1% | fa_to_both | PAPSS1, SGMS2, CYP2U1 | TRUE | |
2457 | 264 | 11190 | s1 | 4 | 108535404 | 108871579 | 336175 | 22 | dup | 1 | ≤ 0.1% | fa_to_both | PAPSS1, SGMS2, CYP2U1 | TRUE | |
2465 | 5839 | 12770 | p1 | 4 | 134071295 | 135122348 | 1051053 | 6 | dup | 1 | ≤ 0.1% | fa_to_both | PCDH10 | TRUE | PCDH10 |
2465 | 5841 | 12770 | s1 | 4 | 134071295 | 135122348 | 1051053 | 6 | dup | 1 | ≤ 0.1% | fa_to_both | PCDH10 | TRUE | PCDH10 |
2473 | 275 | 11959 | s1 | 4 | 151727422 | 152070713 | 343291 | 49 | dup | 1 | ≤ 0.1% | fa_to_s1 | SH3D19, RPS3A, LRBA | FALSE | |
2459 | 2373 | 12370 | p1 | 4 | 159590764 | 159616795 | 26031 | 9 | dup | 1 | ≤ 0.1% | mo_to_p1 | C4orf46, ETFDH | TRUE | |
2458 | 2390 | 13385 | p1 | 4 | 169083678 | 169086477 | 2799 | 3 | del | 1 | ≤ 0.5% | fa_to_p1 | ANXA10 | TRUE | |
2462 | 2348 | 12100 | p1 | 4 | 187192762 | 187476519 | 283757 | 15 | dup | 1 | ≤ 0.1% | fa_to_p1 | F11, MTNR1A | TRUE | |
2463 | 8168 | 11429 | s1 | 4 | 189018214 | 189068526 | 50312 | 12 | dup | 1 | ≤ 0.1% | fa_to_s1 | TRIML2, TRIML1 | FALSE | | IL6R [missense]
2483 | 341 | 12373 | p1 | 5 | 619104 | 801279 | 182175 | 15 | dup | 2 | ≤ 0.1% | mo_to_both | TPPP, ZDHHC11, CEP72 | TRUE | |
2483 | 342 | 12373 | s1 | 5 | 619104 | 678175 | 59071 | 14 | dup | 2 | ≤ 0.1% | mo_to_both | TPPP, CEP72 | TRUE | |
2483 | 8408 | 13293 | p1 | 5 | 619104 | 644540 | 25436 | 9 | dup | 2 | ≤ 0.1% | fa_to_p1 | CEP72 | TRUE | | DNAH10 [missense], KIAA0240 [missense], MUC4 [missense]
2490 | 5918 | 13396 | p1 | 5 | 9136615 | 9238002 | 101387 | 8 | del | 1 | ≤ 0.1% | mo_to_both | SEMA5A | TRUE | SEMA5A |
2490 | 5919 | 13396 | s1 | 5 | 9136615 | 9337924 | 201309 | 10 | del | 1 | ≤ 0.1% | mo_to_both | SEMA5A | TRUE | SEMA5A | CDC123 [missense]
2503 | 313 | 11659 | p1 | 5 | 40931165 | 40937792 | 6627 | 4 | del | 1 | ≤ 0.1% | fa_to_both | C7 | FALSE | |
2503 | 314 | 11659 | s1 | 5 | 40931165 | 40937792 | 6627 | 4 | del | 1 | ≤ 0.1% | fa_to_both | C7 | FALSE | |
2537 | 2442 | 11456 | s1 | 5 | 81283389 | 81354421 | 71032 | 2 | dup | 2 | ≤ 0.1% | mo_to_s1 | ATG10 | TRUE | | EPG5 [missense]
2537 | 5890 | 12851 | s1 | 5 | 81283389 | 81354421 | 71032 | 2 | dup | 2 | ≤ 0.1% | fa_to_s1 | ATG10 | TRUE | |
2545 | 5876 | 12588 | s1 | 5 | 110427986 | 110446977 | 18991 | 15 | dup | 6 | ≤ 0.5% | mo_to_s1 | WDR36 | TRUE | | NBPF9 [nonsense]
2545 | 2490 | 12370 | p1 | 5 | 110430617 | 110446977 | 16360 | 14 | dup | 6 | ≤ 0.5% | mo_to_both | WDR36 | TRUE | |
2545 | 2492 | 12370 | s1 | 5 | 110430617 | 110446977 | 16360 | 14 | dup | 6 | ≤ 0.5% | mo_to_both | WDR36 | TRUE | |
2545 | 2508 | 12690 | p1 | 5 | 110430617 | 110446977 | 16360 | 14 | dup | 6 | ≤ 0.5% | mo_to_p1 | WDR36 | TRUE | | HIST1H2AE [missense]
2545 | 5939 | 13684 | p1 | 5 | 110430617 | 110446977 | 16360 | 14 | dup | 6 | ≤ 0.5% | mo_to_p1 | WDR36 | TRUE | |
2545 | 2533 | 13876 | s1 | 5 | 110430617 | 110446977 | 16360 | 14 | dup | 6 | ≤ 0.5% | fa_to_s1 | WDR36 | FALSE | |
2545 | 2520 | 13322 | p1 | 5 | 110432776 | 110446977 | 14201 | 13 | dup | 6 | ≤ 0.5% | fa_to_p1 | WDR36 | TRUE | |
2546 | 303 | 11469 | p1 | 5 | 112915282 | 112929080 | 13798 | 6 | dup | 1 | ≤ 0.5% | fa_to_both | YTHDC2 | TRUE | |
2546 | 304 | 11469 | s1 | 5 | 112915282 | 112929080 | 13798 | 6 | dup | 1 | ≤ 0.5% | fa_to_both | YTHDC2 | TRUE | |
2549 | 5911 | 13312 | s1 | 5 | 121297727 | 121330365 | 32638 | 4 | del | 2 | ≤ 0.1% | mo_to_s1 | SRFBP1 | FALSE | | TRIM37 [missense]
2549 | 336 | 12161 | p1 | 5 | 121309890 | 121358102 | 48212 | 6 | dup | 2 | ≤ 0.1% | mo_to_both | SRFBP1 | FALSE | | UBR3 [frameshift], CARKD [nonsense]
2549 | 337 | 12161 | s1 | 5 | 121309890 | 121362821 | 52931 | 7 | dup | 2 | ≤ 0.1% | mo_to_both | SRFBP1 | FALSE | |
2550 | 2527 | 13825 | p1 | 5 | 122161744 | 122163341 | 1597 | 3 | del | 1 | ≤ 0.5% | fa_to_p1 | SNX2 | FALSE | |
2551 | 5916 | 13387 | s1 | 5 | 123973590 | 124036962 | 63372 | 8 | dup | 1 | ≤ 0.1% | mo_to_s1 | ZNF608 | TRUE | |
2566 | 346 | 12390 | s1 | 5 | 141312823 | 141314151 | 1328 | 3 | del | 1 | ≤ 0.1% | mo_to_s1 | KIAA0141 | TRUE | |
2568 | 5937 | 13601 | s1 | 5 | 147498548 | 147499699 | 1151 | 2 | dup | 1 | ≤ 0.1% | mo_to_s1 | SPINK5 | TRUE | |
2570 | 8183 | 13625 | p1 | 5 | 150175002 | 150277746 | 102744 | 4 | del | 3 | ≤ 0.1% | mo_to_p1 | C5orf62, ZNF300 | TRUE | |
2572 | 2454 | 11828 | p1 | 5 | 156456743 | 156479665 | 22922 | 6 | dup | 1 | ≤ 0.1% | fa_to_both | HAVCR1 | TRUE | |
2572 | 2455 | 11828 | s1 | 5 | 156456743 | 156479665 | 22922 | 6 | dup | 1 | ≤ 0.1% | fa_to_both | HAVCR1 | TRUE | |
2574 | 2540 | 13922 | p1 | 5 | 175992353 | 175995818 | 3465 | 3 | del | 1 | ≤ 0.1% | mo_to_both | CDHR2 | TRUE | |
2574 | 2541 | 13922 | s1 | 5 | 175992353 | 175995818 | 3465 | 3 | del | 1 | ≤ 0.1% | mo_to_both | CDHR2 | TRUE | |
2577 | 2514 | 12906 | s1 | 5 | 176314002 | 176318192 | 4190 | 10 | dup | 3 | ≤ 0.5% | mo_to_s1 | HK3 | TRUE | |
2577 | 5929 | 13504 | s1 | 5 | 176314451 | 176317942 | 3491 | 7 | dup | 3 | ≤ 0.5% | mo_to_both | HK3 | TRUE | |
2577 | 8436 | 13504 | p1 | 5 | 176314451 | 176317942 | 3491 | 7 | dup | 3 | ≤ 0.5% | mo_to_both | HK3 | TRUE | |
2611
2571
11115p1
62949
14730
17219
68072
10del
2? 0.1
%fa_to_p1
NQO2, SERP
INB6
TRUE
2617
2688
13876p1
620781
375208
46409
65034
2dup
1? 0.1
%mo_to_p1
CDKAL1
FALSE
TUBGCP5 [m
issense]
2618
5978
12758p1
624454
242245
23153
68911
20dup
1? 0.1
%mo_to_p1
ALDH5A1, GP
LD1
TRUEAL
DH5A1
2652
2584
11220s1
633384
873333
85087
214
2dup
2? 0.1
%mo_to_s1
CUTA
FALSE
2655
5986
13153p1
635959
435359
67885
8450
4dup
1? 0.1
%fa_to_p1
SLC26A8
TRUE
MAPK13 [mis
sense]
2657
5974
12743p1
641243
862413
18602
74740
9dup
1? 0.1
%mo_to_p1
TREM1, NCR
2
FALSE
2658
7509
13335s1
641709
533417
10227
694
2del
1? 0.1
%mo_to_s1
PGC
FALSE
2660
5960
12637s1
642883
807428
92011
8204
3del
1? 0.1
%fa_to_s1
PTCRA
FALSE
CDC7 [missen
se]
2664
5955
12467s1
643495
896435
01744
5848
6del
1? 0.1
%mo_to_bot
hXPO5
FALSE
2664
5953
12467p1
643496
565435
01744
5179
5del
1? 0.1
%mo_to_bot
hXPO5
FALSE
2666
2685
13825s1
643603
612436
04383
771
2dup
1? 0.1
%fa_to_s1
MAD2L1BP
FALSE
2672
2585
11220s1
647471
015475
67179
96164
12del
1? 0.1
%fa_to_s1
CD2AP
FALSE
2673
7512
11013s1
649426
794494
40571
13777
5dup
2? 2
%mo_to_s1
MUT, CENPQ
TRUE
2674
378
11229p1
651747
890517
52043
4153
3del
1? 0.1
%fa_to_p1
PKHD1
FALSE
2679
393
12106p1
656915
571569
19661
4090
2dup
2? 0.5
%fa_to_both
KIAA1586
TRUEKI
AA1586
2679
394
12106s1
656915
571569
19661
4090
2dup
2? 0.5
%fa_to_both
KIAA1586
TRUEKI
AA1586
2682
2661
12869p1
665523
270662
05303
682033
19dup
1? 0.1
%fa_to_p1
EYS
TRUE
PRCP [missen
se]
2683
2624
11828s1
665596
589656
22636
26047
4del
1? 0.1
%mo_to_s1
TRUE
2688
2612
11551p1
688315
634883
18947
3313
3del
1? 0.5
%mo_to_p1
ORC3
TRUE
2689
380
11459p1
688317
390883
66700
49310
10del
1? 0.1
%fa_to_p1
ORC3
TRUE
DEPDC7 [mis
sense]
2692
5987
13153s1
699998
868999
99771
903
2dup
3? 2
%fa_to_s1
CCNC
TRUE
2700
5941
12334p1
6118786
5671189
53774
167207
13dup
1? 0.1
%mo_to_bot
hC6orf20
4
FALSE
2700
5942
12334s1
6118786
5671192
15686
429119
14dup
1? 0.1
%mo_to_both
C6orf204
FALSE
CSNK1G3 [frameshift], PDS5A [nonsense]
2703
6010
13487p1
6123573
5561235
86483
12927
5del
3? 0.1
%mo_to_both
TRDN
FALSE
CYP4Z1 [missense]
2712
6021
13513p1
6139206
6421392
06967
325
2del
1? 0.1
%mo_to_both
ECT2L
FALSE
ZMYND11 [splice]
2712
6022
13513s1
6139206
6421392
06967
325
2del
1? 0.1
%mo_to_both
ECT2L
FALSE
FGD5 [missense]
2713
5989
13183p1
6142396
7841424
00040
3256
2del
1? 0.1
%mo_to_both
NMBR
TRUE
BCL11A [frameshift], CNOT6 [missense]
2713
5990
13183s1
6142396
7841424
00040
3256
2del
1? 0.1
%mo_to_both
NMBR
TRUE
2715
2614
11581p1
6146870
5991468
75741
5142
2del
3? 0.5
%fa_to_p1
RAB32
TRUE
2715
5962
12655s1
6146870
5991468
75741
5142
2del
3? 0.5
%mo_to_s1
RAB32
TRUE
2715
2677
13739p1
6146870
5991468
75741
5142
2del
3? 0.5
%mo_to_both
RAB32
TRUE
CD151 [missense]
2715
2678
13739s1
6146870
5991468
75741
5142
2del
3? 0.5
%mo_to_both
RAB32
TRUE
COL11A1 [missense]
2717
2608
11509s1
6151089
7731511
53341
63568
13del
1? 0.1
%mo_to_s1
PLEKHG1
FALSE
2718
6003
13396p1
6151865
7061518
69624
3918
2del
1? 0.1
%fa_to_both
C6orf97
TRUE
2718
6004
13396s1
6151865
7061518
69624
3918
2del
1? 0.1
%fa_to_both
C6orf97
TRUE
CDC123 [missense]
2722
5998
13327s1
6160575
8291605
77106
1277
2del
1? 0.1
%mo_to_s1
SLC22A1
TRUE
2723
2655
12650s1
6162683
5561628
64505
180949
2dup
1? 0.1
%mo_to_s1
PARK2
FALSE
PARK2
2726
2643
12317p1
6168187
9251682
27435
39510
3dup
3? 0.1
%fa_to_p1
C6orf124
FALSE
2730
2610
11519p1
6169617
9151696
46376
28461
19dup
1? 0.1
%mo_to_p1
THBS2
TRUE
SMC3 [missense], SUV420H1 [missense]
2732
6016
13504p1
6170844
3071708
92835
48528
18dup
1? 0.1
%mo_to_both
TBP, PDCD2, PSMB1
TRUE
TBP
2732
6017
13504s1
6170844
3071708
92835
48528
18dup
1? 0.1
%mo_to_both
TBP, PDCD2, PSMB1
TRUE
TBP
2739
6031
12334p1
74050
59741
19216
68619
8del
1? 0.1
%mo_to_p1
SDK1
FALSE
SDK1
2758
488
13335s1
77398
26374
95743
97480
20del
1? 0.1
%mo_to_s1
COL28A1
FALSE
2767
450
11696p1
716839
367173
82688
543321
22dup
1? 0.1
%fa_to_p1
AHR, AGR2, AGR3
FALSE
INCENP [missense]
2766
512
14201p1
716900
123169
13467
13344
5del
1? 0.1
%mo_to_p1
AGR3
TRUE
2768
2757
11437p1
719737
938197
39903
1965
2del
1? 0.1
%fa_to_p1
TWISTNB
FALSE
MADD [missense]
2769
2724
11117s1
720180
568201
99868
19300
3dup
1? 0.1
%mo_to_s1
MACC1
TRUE
2776
2853
13608p1
724904
960249
11688
6728
5dup
1? 0.1
%fa_to_both
OSBPL3
FALSE
CSDE1 [nonsense]
2776
2854
13608s1
724904
960249
11688
6728
5dup
1? 0.1
%fa_to_both
OSBPL3
FALSE
2778
2721
11107p1
726236
934262
40199
3265
3dup
1? 0.1
%mo_to_p1
HNRNPA2B1
TRUE
TROAP [missense]
2783
6104
13412p1
733102
179331
85976
83797
7dup
3? 0.5
%fa_to_p1
RP9, BBS9, NT5C3
TRUE
BBS9
2783
482
13116s1
733134
845331
85976
51131
6dup
3? 0.5
%mo_to_s1
RP9, BBS9
TRUE
BBS9
SRRM5 [missense]
2784
6043
12473p1
733945
225340
14396
69171
6dup
1? 0.1
%fa_to_p1
BMPER
TRUE
2788
2784
11740p1
735707
043357
33940
26897
4dup
1? 0.1
%mo_to_p1
HERPUD2
FALSE
EIF2C1 [missense]
2796
454
11722p1
748308
576484
16169
107593
20del
1? 0.1
%mo_to_p1
ABCA13
TRUE
ABCA13
APAF1 [missense]
2815
2780
11622s1
773631
154736
39072
7918
10del
1? 0.1
%fa_to_s1
LAT2
TRUE
2847
6032
12334p1
789556
556895
83631
27075
6del
1? 0.1
%mo_to_both
FALSE
2847
6033
12334s1
789556
556895
83631
27075
6del
1? 0.1
%mo_to_both
FALSE
CSNK1G3 [frameshift], PDS5A [nonsense]
2850
2813
12368s1
791737
807917
46526
8719
4dup
1? 0.1
%mo_to_both
AKAP9, CYP51A1
TRUE
AKAP9
2850
8228
12368p1
791737
807917
39473
1666
2dup
1? 0.1
%mo_to_both
AKAP9
TRUE
AKAP9
2839
465
12304p1
798628
206986
33339
5133
3dup
1? 0.1
%mo_to_both
SMURF1
FALSE
STIL [missense], PSEN1 [missense], PHF19 [missense]
2839
466
12304s1
798628
206986
33339
5133
3dup
1? 0.1
%mo_to_both
SMURF1
FALSE
2880
6040
12463p1
7100150
9691001
51881
912
2dup
1? 0.1
%fa_to_p1
AGFG2
TRUE
UNC80 [nonsense]
2837
492
13346p1
7107261
7751072
69629
7854
2del
1? 0.1
%mo_to_both
None, BCAP29
FALSE
KIAA0100 [nonsense]
2837
493
13346s1
7107261
7751072
69629
7854
2del
1? 0.1
%mo_to_both
None, BCAP29
FALSE
2841
433
11479p1
7111127
2931111
61503
34210
2del
3? 0.5
%fa_to_p1
IMMP2L
TRUE
IMMP2L
2841
495
13533s1
7111127
2931111
61503
34210
2del
3? 0.5
%fa_to_s1
IMMP2L
FALSE
IMMP2L
ENOX2 [missense]
2830
6066
12837p1
7133502
0771336
02491
100414
3del
1? 0.1
%mo_to_p1
EXOC4
TRUE
SH3RF3 [missense]
2870
6089
13327p1
7137206
6111375
97824
391213
29dup
1? 0.1
%fa_to_p1
CREB3L2, DGKI
TRUE
2835
2753
11336s1
7138391
3681383
94540
3172
2del
1? 0.1
%mo_to_s1
ATP6V0A4
TRUE
RGS7 [missense]
2842
428
11459s1
7140064
2071400
69482
5275
2dup
1? 0.1
%mo_to_s1
SLC37A3
TRUE
2863
6098
13338p1
7141952
0401424
80066
528026
43dup
5? 0.1
%mo_to_both
PRSS1, None, PRSS58
TRUE
2863
6101
13338s1
7141952
0401424
81378
529338
44dup
5? 0.1
%mo_to_both
PRSS1, None, PRSS58
TRUE
2868
6099
13338p1
7142569
4591426
59296
89837
51dup
1? 0.1
%mo_to_both
TRPV5, TRPV6, KEL, C7orf34
TRUE
2868
6102
13338s1
7142571
8281426
59296
87468
48dup
1? 0.1
%mo_to_both
TRPV5, TRPV6, KEL, C7orf34
TRUE
2838
470
12578p1
7150706
0171507
25697
19680
23dup
1? 0.1
%fa_to_both
ATG9B, NOS3, ABCB8
TRUE
NOS3
2838
472
12578s1
7150706
0171507
25697
19680
23dup
1? 0.1
%fa_to_both
ATG9B, NOS3, ABCB8
TRUE
NOS3
2872
2731
11146p1
7151833
9161518
53431
19515
15dup
1? 0.1
%fa_to_both
MLL3
TRUE
2872
2732
11146s1
7151833
9161518
53431
19515
15dup
1? 0.1
%fa_to_both
MLL3
TRUE
DIP2B [missense]
2873
6067
12837s1
7151833
9161520
27824
193908
58dup
1? 0.1
%mo_to_s1
MLL3
TRUE
2931
2917
11196p1
8190
8951
96362
5467
5dup
1? 2
%mo_to_both
ZNF596
TRUE
2931
2918
11196s1
8190
8953
82935
192040
8dup
1? 2
%mo_to_both
ZNF596, FBXO25
TRUE
2933
6133
12727s1
81824
73620
92905
268169
58dup
1? 0.1
%fa_to_s1
ARHGEF10, MYOM2
TRUE
TCF12 [frameshift]
2937
2928
11336p1
82796
10630
87753
291647
44dup
1? 0.1
%fa_to_both
CSMD1
TRUE
SLC26A5 [missense]
2937
2929
11336s1
82796
10630
87753
291647
44dup
1? 0.1
%fa_to_both
CSMD1
TRUE
RGS7 [missense]
2944
2935
11501p1
815480
588160
35497
554909
19del
4? 0.1
%mo_to_both
MSR1, TUSC3
FALSE
TUSC3
2944
2936
11501s1
815480
588160
35497
554909
19del
4? 0.1
%mo_to_both
MSR1, TUSC3
FALSE
TUSC3
ANPEP [missense]
2944
543
12810s1
815967
593160
21760
54167
7del
4? 2
%mo_to_s1
MSR1
TRUE
2944
6146
13196p1
815967
593160
21760
54167
7del
4? 2
%mo_to_p1
MSR1
TRUE
2955
8489
12631p1
827378
399273
80025
1626
2del
1? 0.1
%mo_to_both
EPHX2
FALSE
2955
8490
12631s1
827378
399273
80025
1626
2del
1? 0.1
%mo_to_both
EPHX2
FALSE
GAPVD1 [missense], YIF1A [missense]
2956
532
11722p1
829959
413300
40689
81276
14dup
1? 0.1
%fa_to_p1
MIR548O2
TRUE
APAF1 [missense]
2958
2940
11716p1
838090
512381
17639
27127
16del
1? 0.1
%fa_to_p1
DDHD2
TRUE
2957
8491
12733p1
842585
736425
87692
1956
2del
1? 0.5
%fa_to_both
CHRNB3
FALSE
BRD4 [3n-non-frameshifting]
2957
8492
12733s1
842585
736425
87692
1956
2del
1? 0.5
%fa_to_both
CHRNB3
FALSE
RANBP9 [missense]
2961
6152
13293s1
842938
260432
12038
273778
34dup
2? 0.1
%mo_to_s1
HGSNAT, FNTA, POTEA, SGK196
TRUE
HGSNAT
2967
530
11715s1
857078
801570
80828
2027
2dup
7? 1
%mo_to_s1
PLAG1
TRUE
2967
2948
12224p1
857078
801570
80828
2027
2dup
7? 1
%mo_to_both
PLAG1
TRUE
MPHOSPH8 [nonsense]
2967
2950
12224s1
857078
801570
80828
2027
2dup
7? 1
%mo_to_both
PLAG1
TRUE
2967
2954
12297s1
857078
801570
80828
2027
2dup
7? 1
%fa_to_both
PLAG1
TRUE
GNA14 [missense]
2967
8244
12297p1
857078
801570
80828
2027
2dup
7? 1
%fa_to_both
PLAG1
TRUE
2967
8493
13293p1
857078
801570
80828
2027
2dup
7? 1
%mo_to_both
PLAG1
TRUE
DNAH10 [missense], KIAA0240 [missense], MUC4 [missense]
2967
8494
13293s1
857078
801570
80828
2027
2dup
7? 1
%mo_to_both
PLAG1
TRUE
2967
6167
13599p1
857078
801570
80828
2027
2dup
7? 1
%fa_to_both
PLAG1
TRUE
TMEM62 [missense]
2967
6168
13599s1
857078
801570
80828
2027
2dup
7? 1
%fa_to_both
PLAG1
TRUE
2972
544
13048p1
867790
808677
91154
346
2del
1? 0.1
%fa_to_both
C8orf45
TRUE
2972
545
13048s1
867790
808677
91154
346
2del
1? 0.1
%fa_to_both
C8orf45
TRUE
2978
6157
13412p1
886351
940865
75726
223786
14dup
1? 0.1
%fa_to_p1
CA3, CA2, REXO1L1
TRUE
CA2
2981
6139
12840p1
8101642
5541016
61742
19188
4del
1? 0.1
%fa_to_both
SNX31
TRUE
ATP1B1 [nonsense], TM4SF19 [splice]
2981
6140
12840s1
8101642
5541016
61742
19188
4del
1? 0.1
%fa_to_both
SNX31
TRUE
2986
3002
14110s1
8120638
8041208
59320
220516
35del
1? 0.1
%fa_to_both
ENPP2, TAF2, DSCC1
FALSE
VPS53 [missense]
2986
3001
14110p1
8120756
5271208
62756
106229
31del
1? 0.1
%fa_to_both
TAF2, DSCC1
FALSE
PHF3 [missense]
2988
525
11629p1
8124975
5171249
98416
22899
10del
1? 0.1
%fa_to_both
FER1L6
TRUE
FBXO10 [missense]
2988
526
11629s1
8124975
5171249
98416
22899
10del
1? 0.1
%fa_to_both
FER1L6
TRUE
2992
516
11472p1
8128748
8391287
53204
4365
3dup
1? 0.1
%mo_to_p1
MYC
TRUE
KRT80 [missense], SP7 [missense]
3003
2922
11252p1
8144295
1421444
50815
155673
24dup
1? 0.1
%fa_to_both
ZFP41, GPIHBP1, TOP1MT, GLI4, ZNF696
FALSE
LCN10 [missense]
3003
2923
11252s1
8144295
1421444
50815
155673
24dup
1? 0.1
%fa_to_both
ZFP41, GPIHBP1, TOP1MT, GLI4, ZNF696
FALSE
NT5E [missense]
3002
2989
13825s1
8144391
6101444
00256
8646
6del
1? 0.1
%mo_to_s1
TOP1MT
FALSE
3004
2924
11252s1
8144511
5141445
48018
36504
6dup
1? 0.1
%fa_to_both
MAFA, ZC3H3
FALSE
NT5E [missense]
3019
2963
12534p1
8145947
0281460
33780
86752
18dup
1? 0.1
%mo_to_p1
ZNF251, ZNF34, ZNF517, RPL8
TRUE
MTMR9 [missense], MMP8 [missense]
3020
551
13815p1
8146029
0251460
29623
598
2del
1? 0.5
%mo_to_both
ZNF517
TRUE
3020
553
13815s1
8146029
0251460
29623
598
2del
1? 0.5
%mo_to_both
ZNF517
TRUE
3023
591
12106p1
9172
0803
40321
168241
19dup
4? 0.5
%fa_to_both
DOCK8, C9orf66, CBWD1
TRUE
DOCK8
3023
6239
13366s1
9214
5083
12166
97658
6dup
1? 0.1
%fa_to_s1
DOCK8, C9orf66
FALSE
DOCK8
3022
593
12106s1
9214
5083
40321
125813
14dup
4? 0.5
%fa_to_both
DOCK8, C9orf66
TRUE
DOCK8
3023
3046
11316p1
9271
6264
07069
135443
27dup
4? 0.5
%mo_to_p1
DOCK8
TRUE
DOCK8
3023
3048
11353p1
9286
4604
07069
120609
26dup
4? 0.5
%mo_to_p1
DOCK8
TRUE
DOCK8
3026
6194
12655p1
9368
0176
77009
308992
35dup
1? 0.5
%mo_to_p1
DOCK8, KANK1
TRUE
DOCK8, KANK1
EIF4A1 [missense]
3031
597
12161s1
95968
01860
15607
47589
4dup
1? 0.1
%mo_to_s1
RANBP6, KIAA2026
FALSE
3035
3103
12512s1
912693
996127
76026
82030
8del
1? 0.1
%mo_to_s1
TYRP1, C9orf150
FALSE
3036
6208
13094p1
914857
550148
68975
11425
4dup
1? 0.1
%fa_to_p1
FREM1
FALSE
WDFY3 [nonsense]
3037
6187
12480p1
915564
086156
23411
59325
6del
1? 0.1
%mo_to_p1
C9orf93
TRUE
3044
633
13926s1
927217
685272
29230
11545
5dup
1? 0.1
%fa_to_s1
TEK
TRUE
3047
6176
12252p1
933135
186332
61167
125981
11dup
1? 0.5
%fa_to_both
B4GALT1, None, SPINK4, BAG1
FALSE
B4GALT1
3047
6177
12252s1
933166
755332
61167
94412
10dup
1? 0.5
%fa_to_both
B4GALT1, None, SPINK4, BAG1
FALSE
B4GALT1
3050
6214
13129s1
933886
878339
48585
61707
29dup
1? 0.1
%mo_to_both
UBAP2, UBE2R2
TRUE
3050
8504
13129p1
933886
878339
48585
61707
29dup
1? 0.1
%mo_to_both
UBAP2, UBE2R2
TRUE
3052
6251
13589p1
934286
614342
90387
3773
2del
2? 0.1
%mo_to_both
KIF24
TRUE
WDR55 [missense]
3052
8506
13589s1
934286
614342
90387
3773
2del
2? 0.1
%mo_to_both
KIF24
TRUE
3054
606
12741p1
935228
011352
37823
9812
4del
1? 0.1
%mo_to_both
UNC13B
TRUE
UNC13B
EHD2 [missense]
3054
607
12741s1
935228
011352
37823
9812
4del
1? 0.1
%mo_to_both
UNC13B
TRUE
UNC13B
3057
3087
12308p1
935662
942356
64489
1547
4del
8? 1
%mo_to_p1
C9orf100
TRUE
3057
3108
12534p1
935662
942356
64489
1547
4del
8? 1
%fa_to_p1
C9orf100
TRUE
MTMR9 [missense], MMP8 [missense]
3057
599
12578p1
935662
942356
64489
1547
4del
8? 1
%fa_to_p1
C9orf100
TRUE
3057
6191
12637p1
935662
942356
64489
1547
4del
8? 1
%fa_to_p1
C9orf100
FALSE
RNPEPL1 [missense], RHOT2 [missense]
3057
6200
12829p1
935662
942356
64489
1547
4del
8? 1
%mo_to_both
C9orf100
FALSE
3057
6201
12829s1
935662
942356
64489
1547
4del
8? 1
%mo_to_both
C9orf100
FALSE
3057
8507
13099s1
935664
004356
64489
485
2del
8? 1
%fa_to_s1
C9orf100
TRUE
3073
572
11479s1
988923
343889
68114
44771
21dup
1? 0.1
%fa_to_s1
ZCCHC6
TRUE
3075
6183
12334p1
994984
801949
85771
970
2del
1? 0.5
%mo_to_p1
IARS
FALSE
3087
6231
13296p1
9115407
9291155
80095
172166
8dup
1? 1
%mo_to_p1
SNX30, KIAA1958, C9orf80
TRUE
3088
3098
12507p1
9115811
6411158
12152
511
2del
2? 0.5
%mo_to_both
ZFP37
FALSE
3088
3099
12507s1
9115811
6411158
18968
7327
3del
2? 0.5
%mo_to_both
ZFP37
FALSE
3088
623
13793p1
9115811
6411158
18968
7327
3del
2? 0.5
%mo_to_p1
ZFP37
TRUE
PCDHB4 [missense]
3090
618
13629p1
9116122
7861161
32422
9636
4dup
2? 0.5
%fa_to_both
BSPRY
TRUE
3090
621
13629s1
9116124
6481161
32422
7774
3dup
2? 0.5
%fa_to_both
BSPRY
TRUE
3093
3057
11437s1
9123898
0831239
21297
23214
17del
1? 0.1
%fa_to_s1
CNTRL
FALSE
3097
6232
13296p1
9125562
4011255
89066
26665
4dup
1? 0.1
%mo_to_both
OR1K1, PDCL
TRUE
3097
6233
13296s1
9125562
4011255
89066
26665
4dup
1? 0.1
%mo_to_both
OR1K1, PDCL
TRUE
3119
6240
13418p1
9136259
4171362
62421
3004
3del
1? 0.1
%mo_to_p1
C9orf96
FALSE
CGNL1 [missense], DENND5B [missense], LRRC40 [missense]
3127
3050
11356p1
9139634
4011396
51044
16643
16dup
1? 0.1
%mo_to_p1
LCN6, LCN10, LCN8
TRUE
NAPRT1 [splice], SV2B [missense]
3129
6216
13129s1
9139887
3761398
90995
3619
11dup
1? 0.1
%fa_to_s1
CLIC3, C9orf142
TRUE
3135
6182
12334p1
9140126
1541401
27856
1702
6dup
1? 0.1
%fa_to_p1
SLC34A3
FALSE
3138
3126
13808p1
9140458
8851404
59606
721
3del
1? 0.1
%mo_to_p1
WDR85
TRUE
3139
8279
12647s1
9140622
8001407
33293
110493
25dup
1? 0.1
%mo_to_both
EHMT1, MIR602
TRUE
EHMT1
TMEM218 [missense], DHCR7 [missense]
3139
3113
12647p1
9140646
7821407
33293
86511
22del
1? 0.1
%mo_to_both
EHMT1, MIR602
TRUE
EHMT1
SLC30A5 [missense]
3141
3129
13825s1
9141000
1501411
24256
124106
15dup
3? 0.1
%fa_to_s1
CACNA1B
FALSE
CACNA1B
323
662
11895s1
1022
5952
532470
306518
50dup
1? 0.1
%fa_to_s1
DIP2C, ZMYND11
FALSE
324
3213
11611p1
1085
8865
910210
51345
15dup
1? 0.1
%mo_to_both
LARP4B
TRUE
324
3214
11611s1
1085
8865
910210
51345
15dup
1? 0.1
%mo_to_both
LARP4B
TRUE
325
8330
13144s1
10312
45793
178030
53451
20dup
1? 0.1
%mo_to_s1
PFKP
TRUE
CACNA1H [frameshift]
326
3215
11622p1
10317
53943
176774
1380
2del
1? 0.5
%fa_to_p1
PFKP
TRUE
330
6333
13162p1
10520
33845
260723
57339
12del
1? 0.1
%mo_to_p1
AKR1C4, AKR1CL1
TRUE
RIMS1 [frameshift]
340
3249
12317p1
101827
640718
292287
15880
6dup
1? 0.1
%fa_to_both
SLC39A12
FALSE
340
3250
12317s1
101828
959518
331762
42167
3dup
1? 0.1
%fa_to_both
SLC39A12
FALSE
CHST2 [missense]
344
3150
11094p1
102768
722227
700858
13636
3del
3? 1
%fa_to_p1
PTCHD3
TRUE
CCDC14 [missense]
344
3245
12303p1
102768
722227
700858
13636
3del
3? 1
%mo_to_p1
PTCHD3
TRUE
NCAPD2 [missense]
346
8331
12313s1
102894
040828
945596
5188
3del
2? 0.1
%fa_to_s1
TRUE
PGM3 [missense]
346
684
13793s1
102894
040828
945596
5188
3del
2? 0.1
%mo_to_s1
TRUE
348
679
12741p1
103529
923835
305322
6084
4dup
1? 0.1
%mo_to_p1
CUL2
TRUE
EHD2 [missense]
370
6362
13465p1
104837
053251
890838
3520306
320dup
1? 0.1
%fa_to_both
PARG, AGAP7, ERCC6, CHAT, VSTM4, NCOA4, GDF2, ARHGAP22, C10orf71, MSMB, FAM21A, MAPK8, OGDHL, C10orf53, None, DRGX, FRMPD2P1, RBP3, FRMPD2, ZNF488, C10orf128, WDFY4, PTPN20B, GDF10, TIMM23
TRUE
CHAT, ERCC6
370
6365
13465s1
104837
053251
890838
3520306
320dup
1? 0.1
%fa_to_both
PARG, AGAP7, ERCC6, CHAT, VSTM4, NCOA4, GDF2, ARHGAP22, C10orf71, MSMB, FAM21A, MAPK8, OGDHL, C10orf53, None, DRGX, FRMPD2P1, RBP3, FRMPD2, ZNF488, C10orf128, WDFY4, PTPN20B, GDF10, TIMM23
TRUE
CHAT, ERCC6
374
656
11696p1
105452
789654
531395
3499
4del
5? 2
%mo_to_p1
MBL2
FALSE
INCENP [missense]
374
8347
12997p1
105452
789654
531395
3499
4del
5? 2
%fa_to_both
MBL2
TRUE
374
8348
12997s1
105452
789654
531395
3499
4del
5? 2
%fa_to_both
MBL2
TRUE
376
673
12304s1
105628
757156
424022
136451
4del
1? 0.1
%mo_to_s1
PCDH15
FALSE
PCDH15
379
8349
13097s1
106442
588864
430061
4173
2del
1? 0.5
%fa_to_s1
ZNF365
FALSE
380
658
11711p1
106828
037468
381542
101168
2del
1? 0.1
%fa_to_both
CTNNA3
TRUE
CTNNA3
KIAA0182 [missense]
380
659
11711s1
106828
037468
381542
101168
2del
1? 0.1
%fa_to_both
CTNNA3
TRUE
CTNNA3
382
3174
11267p1
107260
422972
645686
41457
16del
1? 0.1
%mo_to_both
SGPL1, PCBD1
TRUE
382
3175
11267s1
107260
422972
645686
41457
16del
1? 0.1
%mo_to_both
SGPL1, PCBD1
TRUE
384
3178
11298p1
107501
060975
016174
5565
5dup
1? 0.5
%mo_to_p1
C10orf103
TRUE
SLC6A13 [missense]
395
6270
12510p1
109070
355390
707143
3590
2dup
4? 2
%fa_to_both
ACTA2
TRUE
395
6271
12510s1
109070
355390
707143
3590
2dup
4? 2
%fa_to_both
ACTA2
TRUE
395
6281
12518p1
109070
355390
707143
3590
2dup
4? 2
%fa_to_both
ACTA2
FALSE
395
6286
12518s1
109070
355390
707143
3590
2dup
4? 2
%fa_to_both
ACTA2
FALSE
403
3194
11474p1
109553
713595
557560
20425
6dup
1? 0.1
%mo_to_both
LGI1
FALSE
LGI1
403
3195
11474s1
109553
713595
557560
20425
6dup
1? 0.1
%mo_to_both
LGI1
FALSE
LGI1
ZW10 [missense]
404
3167
11180s1
109646
654096
495201
28661
5del
1? 0.1
%mo_to_both
CYP2C18
FALSE
404
3166
11180p1
109648
015296
495201
15049
4del
1? 0.1
%mo_to_both
CYP2C18
FALSE
400
670
12285p1
1010159
4136101
596047
1911
2dup
5? 2
%mo_to_both
ABCC2
FALSE
EIF4G1 [missense]
400
671
12285s1
1010159
4136101
596047
1911
2dup
5? 2
%mo_to_both
ABCC2
FALSE
ZNF780A [missense]
400
3253
12383s1
1010159
4136101
596047
1911
2dup
5? 2
%mo_to_both
ABCC2
FALSE
401
652
11629p1
1010329
0993103
310617
19624
8dup
1? 0.5
%fa_to_both
BTRC
TRUE
FBXO10 [missense]
401
653
11629s1
1010329
0993103
310617
19624
8dup
1? 0.5
%fa_to_both
BTRC
TRUE
396
6295
12644p1
1011442
7976114
496847
68871
3dup
1? 0.1
%mo_to_both
VTI1A
TRUE
396
6296
12644s1
1011442
7976114
496847
68871
3dup
1? 0.1
%mo_to_both
VTI1A
TRUE
402
3201
11519p1
1012221
6817122
349014
132197
7del
1? 0.1
%mo_to_p1
PPAPDC1A
TRUE
SMC3 [missense], SUV420H1 [missense]
413
8359
13512p1
1012467
2277124
697675
25398
3del
1? 0.1
%mo_to_p1
C10orf88, FAM24A
TRUE
414
6375
13512p1
1012480
0724124
924572
123848
19dup
1? 0.1
%mo_to_p1
HMX3, HMX2, BUB3, ACADSB
TRUE
416
3209
11561p1
1013473
4122134
793304
59182
20dup
2? 0.5
%mo_to_both
C10orf93
FALSE
416
3210
11561s1
1013473
4122134
793304
59182
20dup
2? 0.5
%mo_to_both
C10orf93
FALSE
397
3291
14110s1
1013503
2334135
032594
260
2dup
1? 0.1
%mo_to_both
KNDC1
FALSE
VPS53 [missense]
420
6334
13162s1
1013516
8873135
179599
10726
9dup
1? 0.1
%mo_to_both
ECHS1, C10orf125
TRUE
420
8356
13162p1
1013517
6371135
179599
3228
3dup
1? 0.1
%mo_to_both
ECHS1
TRUE
RIMS1 [frameshift]
449
3356
11810p1
1119
3099
249057
55958
36dup
1? 0.5
%fa_to_both
ODF3, SIRT3, RIC8A, PSMD13, SCGB1C1, BET1L
FALSE
449
3358
11810s1
1119
3099
249057
55958
36dup
1? 0.5
%fa_to_both
ODF3, SIRT3, RIC8A, PSMD13, SCGB1C1, BET1L
FALSE
RSRC1 [nonsense]
450
3350
11766p1
1124
4040
244469
429
2dup
10? 2
%mo_to_p1
PSMD13
FALSE
KIAA0100 [missense]
464
3321
11180p1
11486
94664
904113
34647
2del
1? 0.5
%fa_to_both
OR51T1, OR51S1
FALSE
464
3322
11180s1
11486
94664
904113
34647
2del
1? 0.5
%fa_to_both
OR51T1, OR51S1
FALSE
465
6393
12683s1
11500
94415
021160
11719
7del
1? 0.1
%mo_to_s1
OR51L1, MMP26
FALSE
C6orf174 [3n-non-frameshifting]
467
697
11569p1
11586
21855
878932
16747
2del
4? 2
%fa_to_both
OR52E6, OR52E8
TRUE
TNKS [missense]
467
699
11569s1
11586
21855
878932
16747
2del
4? 2
%fa_to_both
OR52E6, OR52E8
TRUE
473
721
12581s1
11794
92647
961067
11803
2del
1? 0.1
%fa_to_s1
OR10A3, OR10A6
TRUE
MAML3 [missense]
476
714
11964p1
111488
058814
902314
21726
7dup
1? 0.1
%fa_to_p1
CYP2R1, PDE3B
TRUE
479
3345
11667p1
111872
736318
729842
2479
4del
1? 0.1
%mo_to_both
IGSF22
TRUE
479
3347
11667s1
111872
736318
729842
2479
4del
1? 0.1
%mo_to_both
IGSF22
TRUE
CDH3 [missense]
480
3408
12512s1
112080
522520
869299
64074
2del
1? 0.1
%fa_to_s1
NELL1
FALSE
481
6387
12523s1
112221
503822
833562
618524
42del
1? 0.1
%mo_to_both
ANO5, SLC17A6, GAS2, FANCF
FALSE
FANCF, SLC17A6
481
6386
12523p1
112223
280922
833562
600753
40del
1? 0.1
%mo_to_both
ANO5, SLC17A6, GAS2, FANCF
FALSE
FANCF, SLC17A6
G3BP2 [missense]
482
744
13926p1
113131
219331
541638
229445
16del
1? 0.1
%mo_to_both
IMMP1L, DNAJC24, DCDC1, ELP4
TRUE
FBXW9 [missense]
482
745
13926s1
113131
219331
541638
229445
16del
1? 0.1
%mo_to_both
IMMP1L, DNAJC24, DCDC1, ELP4
TRUE
483
6436
13296p1
113260
855732
623945
15388
10dup
1? 1
%mo_to_p1
EIF3M
TRUE
484
728
12810p1
113269
711032
781789
84679
8del
1? 0.1
%mo_to_p1
CCDC73
TRUE
485
6383
12445s1
113298
784533
113886
126041
30dup
1? 0.1
%fa_to_s1
TCP11L1, QSER1, DEPDC7, CSTF3
TRUE
490
3439
13843p1
114377
246043
775671
3211
2del
1? 1
%fa_to_p1
HSD17B12
TRUE
AGK [missense]
498
6411
13097s1
115543
264255
595630
162988
6del
1? 1
%mo_to_s1
OR5L1, OR5L2, OR5D18, OR5D13, OR4C6, OR5D14
FALSE
501
3327
11219p1
115725
638857
327905
71517
25dup
1? 0.1
%fa_to_p1
UBE2L6, None, TIMM10, SLC43A1, SMTNL1
FALSE
500
3393
12303p1
115924
490259
283330
38428
3dup
3? 0.1
%mo_to_p1
OR4D9, OR4D10, OR4D11
TRUE
NCAPD2 [missense]
503
6402
12836s1
115962
044759
623531
3084
4del
1? 0.5
%mo_to_both
TCN1
FALSE
503
6401
12836p1
115962
067559
622308
1633
2del
1? 0.5
%mo_to_both
TCN1
FALSE
TBC1D2B [missense]
511
6379
12334s1
116260
042362
607042
6619
11dup
1? 0.5
%mo_to_s1
WDR74
FALSE
CSNK1G3 [frameshift], PDS5A [nonsense]
512
710
11788s1
116305
763763
059115
1478
2dup
1? 0.5
%mo_to_s1
SLC22A10
TRUE
514
3451
14110s1
116456
905064
569229
179
2dup
1? 0.1
%fa_to_both
MAP4K2
FALSE
VPS53 [missense]
529
6415
13148s1
117695
473876
956547
1809
2del
1? 0.1
%mo_to_both
GDPD4
TRUE
KIAA1244 [missense]
529
8375
13148p1
117695
473876
956547
1809
2del
1? 0.1
%mo_to_both
GDPD4
TRUE
534
701
11610p1
118542
983285
468768
38936
10del
1? 0.5
%mo_to_both
SYTL2
FALSE
DNAH5 [frameshift], HDLBP [missense]
534
702
11610s1
118542
983285
468768
38936
10del
1? 0.5
%mo_to_both
SYTL2
FALSE
549
7715
11115s1
1110055
8409100
831720
273311
14dup
8? 0.1
%fa_to_s1
TRUE
549
7466
11229p1
1110055
8409100
859532
301123
24dup
8? 0.1
%fa_to_p1
FALSE
550
6461
13502p1
1110270
8028102
709441
1413
2del
1? 0.1
%mo_to_both
MMP3
TRUE
550
6462
13502s1
1110270
8028102
709441
1413
2del
1? 0.1
%mo_to_both
MMP3
TRUE
554
3346
11667s1
1110825
6646108
264105
7459
2dup
3? 0.5
%mo_to_both
C11orf65
TRUE
CDH3 [missense]
554
7717
11667p1
1110825
6646108
264105
7459
2dup
3? 0.5
%mo_to_both
C11orf65
TRUE
554
7467
11773s1
1110825
6646108
264105
7459
2dup
3? 0.5
%mo_to_s1
C11orf65
TRUE
CCDC15 [missense]
554
3433
13808s1
1110825
6646108
264105
7459
2dup
3? 0.5
%mo_to_s1
C11orf65
TRUE
558
725
12630s1
1111204
6324112
084584
38260
9del
2? 0.1
%fa_to_s1
BCO2
TRUE
559
6439
13307s1
1111366
9961113
725057
55096
24dup
1? 1
%mo_to_s1
USP28
TRUE
571
3357
11810p1
1112450
8437124
509763
1326
2del
1? 0.1
%mo_to_p1
SIAE
FALSE
576
3430
13730p1
1113415
1270134
215023
63753
21del
2? 0.5
%mo_to_p1
GLB1L2, GLB1L3
FALSE
DICER1 [missense]
576
6429
13239s1
1113416
2025134
214349
52324
16dup
2? 0.1
%fa_to_s1
GLB1L2, GLB1L3
FALSE
575
6406
12851p1
1113417
7017134
257553
80536
34dup
1? 0.1
%fa_to_both
GLB1L2, GLB1L3, B3GAT1
TRUE
B3GAT1
SLC25A29 [missense], HMGXB3 [missense], UBE3C [missense]
575
6407
12851s1
1113417
7017134
257553
80536
34dup
1? 0.1
%fa_to_both
GLB1L2, GLB1L3, B3GAT1
TRUE
B3GAT1
579
7722
11282p1
12465
10494
668159
17110
8dup
3? 0.5
%mo_to_both
RAD51AP1
TRUE
579
7723
11519p1
12465
10494
668159
17110
8dup
3? 0.5
%fa_to_p1
RAD51AP1
TRUE
SMC3 [missense], SUV420H1 [missense]
579
3495
11282s1
12465
54744
668159
12685
6dup
3? 0.5
%mo_to_both
RAD51AP1
TRUE
579
3621
13809p1
12465
54744
668159
12685
6dup
3? 0.5
%fa_to_p1
RAD51AP1
TRUE
596
3597
12802s1
12860
87078
689862
81155
14del
1? 0.1
%mo_to_both
CLEC4E, CLEC4D, CLEC6A
TRUE
596
7730
12802p1
12860
87078
689862
81155
14del
1? 0.1
%mo_to_both
CLEC4E, CLEC4D, CLEC6A
TRUE
602
3594
12736p1
121033
792810
342739
4811
3dup
1? 0.1
%mo_to_both
C12orf59
TRUE
602
7731
12736s1
121033
792810
342739
4811
3dup
1? 0.1
%mo_to_both
C12orf59
TRUE
614
6513
13125p1
122100
796121
377773
369812
40del
2? 0.1
%mo_to_both
LST-3TM12, SLCO1B1, SLCO1B3
TRUE
RAD21L1 [nonsense]
614
6514
13125s1
122100
796121
392123
384162
41del
2? 0.1
%mo_to_both
LST-3TM12, SLCO1B1, SLCO1B3
TRUE
619
8313
12997p1
122526
470525
267804
3099
2del
1? 0.1
%fa_to_both
CASC1
TRUE
619
8314
12997s1
122526
470525
267804
3099
2del
1? 0.1
%fa_to_both
CASC1
TRUE
628
3618
13739s1
123353
276638
712260
5179494
11dup
2? 0.5
%mo_to_s1
ALG10, SYT10, ALG10B
TRUE
COL11A1 [missense]
640
3463
11090p1
124968
898349
691056
2073
5dup
5? 2
%mo_to_p1
PRPH
TRUE
640
6506
12836s1
124968
898349
691056
2073
5dup
5? 2
%fa_to_s1
PRPH
FALSE
642
6494
12719s1
125047
532750
484373
9046
9dup
1? 0.1
%mo_to_s1
ACCN2, SMARCD1
TRUE
643
3540
11828p1
125120
323851
213562
10324
4dup
1? 0.1
%mo_to_p1
ATF1
TRUE
647
3473
11114p1
125277
486352
779369
4506
6dup
1? 1
%mo_to_both
KRT84
TRUE
SCN2A [nonsense]
647
7739
11114s1
125277
486352
779369
4506
6dup
1? 1
%mo_to_both
KRT84
TRUE
DZANK1 [missense]
658
3479
11180s1
125661
996256
620203
241
2dup
1? 0.1
%fa_to_s1
OBFC2B
FALSE
663
8395
12723p1
125734
581257
351246
5434
4dup
4? 1
%fa_to_both
RDH16
TRUE
663
8396
12723s1
125734
581257
351246
5434
4dup
4? 1
%fa_to_both
RDH16
TRUE
SERAC1 [missense]
663
807
13835p1
125734
581257
348948
3136
3del
4? 1
%mo_to_both
RDH16
TRUE
663
808
13835s1
125734
581257
351246
5434
4del
4? 1
%mo_to_both
RDH16
TRUE
671
6474
12334p1
126974
218869
744052
1864
2del
1? 0.1
%fa_to_p1
LYZ
FALSE
672
7751
12763p1
127543
609275
905377
469285
49dup
1? 0.1
%mo_to_p1
KCNC2, GLIPR1, KRR1, CAPS2, GLIPR1L2, GLIPR1L1
TRUE
674
6492
12716s1
128019
056180
211298
20737
9dup
1? 0.1
%mo_to_s1
PPP1R12A
TRUE
675
3599
12869p1
128063
263080
730784
98154
31del
2? 0.5
%mo_to_both
OTOGL
TRUE
PRCP [missense]
675
3601
12869s1
128063
263080
730335
97705
30del
2? 0.5
%mo_to_both
OTOGL
TRUE
675
802
13798p1
128063
263080
730784
98154
32del
2? 0.5
%fa_to_p1
OTOGL
FALSE
680
8398
13215p1
129670
482996
707232
2403
2dup
1? 0.1
%mo_to_p1
CDK17
TRUE
684
3629
13832s1
1210417
1606104
174142
2536
2del
1? 0.1
%fa_to_s1
NT5DC3
TRUE
688
774
11895s1
1210929
0781109
293251
2470
3del
1? 0.1
%mo_to_s1
DAO
FALSE
DAO
693
786
12581p1
1211218
2446112
308984
126538
28dup
2? 0.5
%mo_to_p1
ACAD10, MAPKAPK5, C12orf47, ALDH2
TRUE
697
3607
13322p1
1211429
6595114
404032
107437
22dup
1? 0.1
%mo_to_p1
RBM19
TRUE
703
3489
11241p1
1212087
5929120
884632
8703
7dup
2? 0.5
%mo_to_p1
GATC, COX6A1, TRIAP1
TRUE
C18orf26 [missense]
703
6505
12836p1
1212087
5929120
884632
8703
7dup
2? 0.1
%fa_to_p1
GATC, COX6A1, TRIAP1
FALSE
TBC1D2B [missense]
716
7488
12285p1
1213263
3328132
636946
3618
7dup
1? 0.1
%fa_to_p1
NOC4L
FALSE
EIF4G1 [missense]
720
3483
11196p1
1213361
8053133
781116
163063
16dup
1? 0.5
%mo_to_p1
ZNF140, ZNF268, ZNF10, ZNF84
TRUE
724
3678
11509s1
132042
549420
426320
826
2dup
5? 1
%fa_to_s1
ZMYM5
FALSE
724
7476
12011p1
132042
549420
426320
826
2dup
5? 1
%fa_to_both
ZMYM5
TRUE
FAM45A [missense]
724
7477
12011s1
132042
549420
426320
826
2dup
5? 1
%fa_to_both
ZMYM5
TRUE
724
3718
12561p1
132042
549420
426320
826
2dup
5? 1
%fa_to_p1
ZMYM5
FALSE
NLRX1 [missense], ADAM33 [nonsense]
724
3739
13322p1
132042
549420
426320
826
2dup
5? 1
%fa_to_p1
ZMYM5
TRUE
730
3683
11676p1
132130
321821
311944
8726
3dup
1? 0.1
%mo_to_p1
N6AMT2
TRUE
736
3656
11154s1
132527
607225
285660
9588
10dup
3? 0.1
%fa_to_s1
ATP12A
TRUE
742
6576
13590p1
133330
623734
405491
1099254
43dup
1? 0.1
%mo_to_both
KL, STARD13, RFC3, PDS5B
TRUE
MTHFS [frameshift], EFCAB5 [frameshift]
742
6577
13590s1
133330
930834
540262
1230954
45dup
1? 0.1
%mo_to_both
KL, STARD13, RFC3, PDS5B
TRUE
754
3749
13808p1
134908
476449
086310
1546
3del
4? 0.1
%fa_to_both
RCBTB2
TRUE
754
3750
13808s1
134908
476449
086310
1546
3del
4? 0.1
%fa_to_both
RCBTB2
TRUE
755
3666
11298s1
135012
359350
134220
10627
5dup
1? 0.5
%mo_to_s1
RCBTB1
TRUE
RCBTB1
759
3747
13774p1
137611
179976
123998
12199
3dup
9? 2
%mo_to_p1
COMMD6, UCHL3
TRUE
DNAH11 [missense]
764
3671
11412p1
139650
841196
515968
7557
4del
1? 0.5
%fa_to_both
UGGT2
FALSE
764
3672
11412s1
139650
841196
515968
7557
4del
1? 0.5
%fa_to_both
UGGT2
FALSE
C16orf62 [missense]
779
817
11190s1
1311117
6011111
319820
143809
21dup
1? 0.1
%mo_to_s1
RAB20, CARS2, None, CARKD
TRUE
785
3687
11808p1
1311397
9975113
980406
431
2dup
1? 0.1
%mo_to_p1
GRTP1
FALSE
786
6555
12829s1
1311413
8154114
175048
36894
8dup
1? 0.1
%mo_to_s1
DCUN1D2, TMCO3
FALSE
789
3732
13063p1
1311451
4708114
535461
20753
7dup
2? 0.5
%mo_to_both
GAS6, FAM70B
FALSE
TPK1 [missense]
789
3733
13063s1
1311451
4708114
531684
16976
6dup
2? 0.5
%mo_to_both
GAS6, FAM70B
FALSE
790
3660
11196p1
1311500
2273115
091756
89483
26dup
1? 0.5
%mo_to_both
UPF3A, CDC16, ZNF828
TRUE
UPF3A
790
3661
11196s1
1311500
7595115
091756
84161
23dup
1? 0.5
%mo_to_both
UPF3A, CDC16, ZNF828
TRUE
UPF3A
798
6586
12460p1
142121
573921
424416
208677
6dup
4? 0.1
%mo_to_p1
RNASE1, RNASE3, RNASE2, RNASE6, EDDM3A, EDDM3B
TRUE
794
3811
11766s1
142123
830921
270227
31918
3dup
1? 0.1
%fa_to_s1
RNASE1, EDDM3B, RNASE6
FALSE
SYTL3 [nonsense]
795
7780
11094p1
142135
984521
624184
264339
73dup
1? 0.1
%fa_to_p1
RNASE3, RNASE2, RNASE7, RNASE13, RNASE8, ZNF219, OR5AU1, TPPP2, NDRG2, SLC39A2, METTL17, ARHGEF40
TRUE
CCDC14 [missense]
811
3788
11242p1
142213
329622
410148
276852
19dup
1? 0.1
%fa_to_p1
None, OR4E2
TRUE
810
6604
13139p1
142218
023522
192868
12633
2del
1? 0.1
%fa_to_p1
None
FALSE
809
3851
13912s1
142289
120922
928469
37260
8del
5? 0.1
%mo_to_both
None
TRUE
845
6606
13153p1
146375
759563
860646
103051
10dup
2? 0.5
%fa_to_p1
RHOJ, GPHB5, PPP2R5E
TRUE
MAPK13 [missense]
846
3843
13195p1
146400
624664
066660
60414
2dup
1? 0.1
%fa_to_both
PPP2R5E, WDR89
TRUE
TSPYL5 [3n-non-frameshifting], SCARB2 [missense]
851
865
12304p1
146501
671565
019579
2864
3dup
3? 2
%fa_to_p1
C14orf50
FALSE
STIL [missense], PSEN1 [missense], PHF19 [missense]
851
877
12810p1
146501
671565
019579
2864
3dup
3? 2
%fa_to_p1
C14orf50
TRUE
854
8420
12582p1
146991
995769
969596
49639
10dup
1? 0.5
%mo_to_p1
SLC39A9
TRUE
EMID2 [missense]
856
888
13533p1
147452
228674
541720
19434
15del
1? 0.1
%fa_to_both
ALDH6A1, C14orf45
FALSE
856
891
13533s1
147452
228674
541720
19434
15del
1? 0.1
%fa_to_both
ALDH6A1, C14orf45
FALSE
ENOX2 [missense]
860
6596
12697p1
147837
414578
392304
18159
3del
1? 0.1
%mo_to_p1
ADCK1
TRUE
863
3814
11810p1
149076
758290
784457
16875
6dup
1? 0.1
%mo_to_p1
C14orf102
FALSE
870
3782
11118p1
149918
252899
183611
1083
2del
4? 2
%fa_to_p1
C14orf177
TRUE
870
3795
11356p1
149918
252899
183611
1083
2del
4? 2
%mo_to_p1
C14orf177
TRUE
NAPRT1 [splice], SV2B [missense]
870
879
13116p1
149918
252899
183611
1083
2del
4? 2
%fa_to_both
C14orf177
TRUE
870
881
13116s1
149918
252899
183611
1083
2del
4? 2
%fa_to_both
C14orf177
TRUE
SRRM5 [missense]
870
904
14201s1
149918
252899
183611
1083
2del
4? 2
%mo_to_s1
C14orf177
TRUE
871
6580
11894p1
1410040
2365100
405664
3299
4dup
3? 0.5
%mo_to_p1
EML1
FALSE
EML1
871
3826
12317s1
1410040
2365100
405664
3299
4dup
3? 0.5
%fa_to_s1
EML1
FALSE
EML1
CHST2 [missense]
874
6583
12396p1
1410583
6177105
861009
24832
17dup
1? 0.1
%fa_to_p1
PACS2
TRUE
888
6675
13493p1
152073
949623
445762
2706266
97del
6? 0.5
%fa_to_p1
None, CYFIP1, NIPA2, LOC727924, GOLGA6L1, OR4N4, POTEB, NIPA1, GOLGA8E, TUBGCP5
TRUE
NIPA2, NIPA1, CYFIP1, TUBGCP5
888
4002
13393p1
152273
953323
062319
322786
66dup
6? 0.5
%fa_to_both
NIPA2, NIPA1, CYFIP1, GOLGA6L1, TUBGCP5
FALSE
NIPA2, NIPA1, CYFIP1, TUBGCP5
888
6640
12735p1
152283
591523
062319
226404
62del
6? 0.5
%fa_to_p1
NIPA2, NIPA1, CYFIP1, TUBGCP5
TRUE
NIPA2, NIPA1, CYFIP1, TUBGCP5
888
4003
13393s1
152283
591523
062319
226404
62dup
6? 0.5
%fa_to_both
NIPA2, NIPA1, CYFIP1, TUBGCP5
FALSE
NIPA2, NIPA1, CYFIP1, TUBGCP5
8437
13493p1
152283
591523
062319
226404
62del
6? 0.5
%fa_to_p1
NIPA2, NIPA1, CYFIP1, TUBGCP5
TRUE
NIPA2, NIPA1, CYFIP1, TUBGCP5
898
6672
13487s1
153135
691031
369124
12214
6del
4? 0.5
%fa_to_s1
TRPM1
FALSE
897
8440
12833p1
153232
310032
404100
81000
3dup
2? 1
%fa_to_both
CHRNA7
TRUE
CHRNA7
897
8441
12833s1
153232
310032
404100
81000
3dup
2? 1
%fa_to_both
CHRNA7
TRUE
CHRNA7
905
936
13890s1
153698
388537
002162
18277
5del
1? 0.1
%fa_to_s1
C15orf41
TRUE
909
3954
12317s1
154099
326141
001314
8053
3del
1? 0.1
%mo_to_s1
RAD51
FALSE
RAD51
CHST2 [missense]
915
934
13629s1
154206
747342
147943
80470
75dup
1? 0.1
%mo_to_s1
JMJD7-PLA2
G4B, SPTBN5
, MAPKBP1
TRUE
918
4013
13825p1
154280
743442
836322
28888
5dup
1? 0.1
%mo_to_p1
LRRC57, SNA
P23
FALSE
920
6655
13148p1
154357
701143
585152
8141
5del
1? 0.1
%fa_to_both
TGM7
TRUE
920
6656
13148s1
154357
701143
584295
7284
4del
1? 0.1
%fa_to_both
TGM7
TRUE
KIAA1244 [mi
ssense]
921
6651
13018p1
154362
789343
644143
16250
9dup
1? 0.1
%mo_to_p1
ADAL
TRUE
PCOLCE [fram
eshift], 
MCOLN3 [mis
sense]
922
914
11479p1
154369
661043
701294
4684
5dup
1? 0.1
%mo_to_p1
TP53BP1, TU
BGCP4
TRUE
929
6664
13216s1
155073
127051
535109
803839
113dup
1? 0.1
%fa_to_s1
USP8, TNFAIP
8L3, CYP19A
1, USP50, AP
4E1, 
SPPL2A, TRP
M7
TRUEAP
4E1
932
6670
13330p1
155399
194654
025346
33400
12del
2? 0.1
%mo_to_p1
WDR72
FALSE
934
4015
13843p1
155547
551255
497903
22391
6dup
2? 0.1
%mo_to_p1
RAB27A, RSL
24D1
TRUERA
B27AA
GK [missense
]
936
3938
12295p1
155773
019757
754090
23893
7dup
2? 0.5
%mo_to_bot
hCGNL1
FALSE
936
6644
12837p1
155773
019757
754090
23893
7dup
2? 0.5
%fa_to_p1
CGNL1
TRUE
SH3RF3 [miss
ense]
936
3939
12295s1
155773
257457
746016
13442
5dup
2? 0.5
%mo_to_bot
hCGNL1
FALSE
963
906
11229s1
157642
660476
641077
214473
28dup
1? 0.1
%fa_to_s1
ETFA, C15orf
27, ISL2, SCA
PER
FALSE
974
3878
11241s1
158546
175485
488431
26677
10del
3? 0.1
%fa_to_s1
SLC28A1
TRUE
SNRNP200 [m
issense]
974
6628
12420p1
158546
175485
488431
26677
10del
3? 0.1
%mo_to_p1
SLC28A1
FALSE
HRH2 [missen
se], GOLGA4
 
[missense]
974
6633
12445p1
158546
175485
488431
26677
10del
3? 0.1
%mo_to_p1
SLC28A1
TRUE
976
3867
11092p1
158602
895186
129054
100103
7dup
1? 0.5
%mo_to_p1
AKAP13
TRUE
981
3977
12651s1
159148
614191
500955
14814
15del
2? 0.1
%fa_to_s1
RCCD1, UNC
45A
TRUE
981
4005
13543p1
159148
812191
520001
31880
25del
2? 0.1
%mo_to_p1
RCCD1, PRC
1, UNC45A
TRUE
983
6658
13197s1
159485
751094
888393
30883
6dup
1? 0.1
%mo_to_s1
MCTP2
TRUE
GMPPA [3n-n
on-
frameshifting]
984
911
11452s1
1510197
0191101
983865
13674
4dup
1? 0.1
%mo_to_s1
PCSK6
TRUE
985
3892
11356s1
1510222
4277102
501099
276822
15dup
2? 0.1
%mo_to_s1
OR4F6, TARS
L2, OR4F4, O
R4F15
TRUE
DLGAP3 [mis
sense], 
STEAP3 [miss
ense], DHX30
 
[missense]
985
3969
12375p1
1510222
4277102
261525
37248
10dup
2? 0.1
%fa_to_p1
TARSL2
TRUE
FAM205A [mi
ssense]
992
6741
12826p1
1645
4955
461578
6623
7del
1? 0.1
%mo_to_bot
hDECR2
TRUE
TROVE2 [fram
eshift]
992
6742
12826s1
1645
4955
461578
6623
7del
1? 0.1
%mo_to_bot
hDECR2
TRUE
1015
6809
13398s1
16301
45213
100547
86026
53dup
1? 0.1
%mo_to_bot
hHCFC1R
1, PAQR4, PK
MYT1, CCDC
64B, CLDN9,
 
MMP25, KRE
MEN2, TNFRS
F12A, THOC6
, CLDN6
TRUE
1015
6808
13398p1
16301
63253
100547
84222
50dup
1? 0.1
%mo_to_bot
hHCFC1R
1, PAQR4, PK
MYT1, CCDC
64B, CLDN9,
 
MMP25, KRE
MEN2, TNFRS
F12A, THOC6
, CLDN6
TRUE
POGZ [frames
hift], PYHIN1 
[missense], TT
N [missense]
1017
6736
12735p1
16310
70333
119356
12323
12del
1? 0.1
%fa_to_p1
MMP25, IL32
TRUE
1020
6825
13512p1
16439
03194
391505
1186
3dup
1? 0.1
%fa_to_p1
CORO7-PAM
16
TRUE
1022
6752
13018s1
16462
15614
642368
20807
10del
1? 0.1
%mo_to_bot
h
TRUE
1022
6751
13018p1
16462
49724
642368
17396
8del
1? 0.1
%mo_to_bot
h
TRUE
PCOLCE [fram
eshift], 
MCOLN3 [mis
sense]
1023
4043
11118p1
16487
13804
871598
218
2dup
2? 1
%mo_to_p1
GLYR1
TRUE
1029
6796
13327s1
16872
25868
997259
274673
53dup
1? 0.1
%mo_to_bot
hUSP7, M
ETTL22, TME
M186, ABAT, 
PMM2, 
CARHSP1
TRUEPM
M2, ABAT
1028
6794
13327p1
16872
88958
740000
11105
8dup
1? 0.1
%mo_to_bot
hMETTL2
2
TRUE
1039
6719
12652s1
161512
559115
168718
43127
24dup
3? 1
%fa_to_s1
NTAN1, PDXD
C1, RRN3
TRUE
CTSB [missen
se]
1039
6756
13125p1
161512
559115
135533
9942
12dup
3? 1
%fa_to_both
NTAN1, PDXD
C1
TRUE
RAD21L1 [no
nsense]
1039
6757
13125s1
161512
671715
135533
8816
11dup
3? 1
%fa_to_both
NTAN1, PDXD
C1
TRUE
1045
6772
13215p1
161559
617815
609285
13107
6del
1? 0.1
%fa_to_p1
C16orf45
TRUE
1046
6829
13689s1
161614
658016
276445
129865
36dup
1? 0.1
%mo_to_s1
ABCC6, ABC
C1
TRUE
ODZ4 [missen
se], FBN1 
[missense], IA
RS 
[nonsense]
1054
989
13593s1
161960
311119
663412
60301
19dup
1? 0.1
%fa_to_s1
C16orf62
FALSE
1057
6760
13139p1
162162
396521
742251
118286
30del
5? 1
%mo_to_bot
hMETTL9
, OTOA
FALSE
1057
6761
13139s1
162162
396521
763826
139861
35del
5? 1
%mo_to_bot
hMETTL9
, OTOA
FALSE
C7orf71 [miss
ense]
1057
6696
12441p1
162165
561021
763826
108216
29dup
5? 1
%mo_to_p1
METTL9, OTO
A
TRUE
1057
6727
12697p1
162165
561021
768598
112988
31del
5? 1
%fa_to_p1
METTL9, OTO
A
TRUE
1062
4115
12100s1
162307
936623
117808
38442
14dup
3? 0.1
%fa_to_s1
USP31
TRUE
1080
7516
11629p1
162967
504930
199897
524848
192dup
7? 0.1
%mo_to_p1
DOC2A, ASP
HD1, CORO1
A, TBX6, KIF2
2, CDIPT, 
QPRT, YPEL3
, PPP4C, MAP
K3, SPN, MVP
, 
FAM57B, ALD
OA, INO80E, 
SEZ6L2, TAO
K2, 
KCTD13, MAZ
, PRRT2, GDP
D3, C16orf92
, 
C16orf53, TM
EM219, C16o
rf54, HIRIP3
TRUEAL
DOA, MAPK3, SEZ6L2
FBXO10 [miss
ense]
1080
4203
13509p1
162967
504930
199408
524359
191dup
7? 0.1
%fa_to_p1
DOC2A, ASP
HD1, CORO1
A, TBX6, KIF2
2, CDIPT, 
QPRT, YPEL3
, PPP4C, MAP
K3, SPN, MVP
, 
FAM57B, ALD
OA, INO80E, 
SEZ6L2, TAO
K2, 
KCTD13, MAZ
, PRRT2, GDP
D3, C16orf92
, 
C16orf53, TM
EM219, C16o
rf54, HIRIP3
TRUEAL
DOA, MAPK3, SEZ6L2
1085
966
12152s1
163079
303830
849705
56667
5dup
1? 0.1
%mo_to_s1
ZNF629
TRUE
1088
943
11229p1
163147
717031
488897
11727
11dup
1? 0.5
%mo_to_bot
hTGFB1I1
, ARMC5
FALSE
1088
945
11229s1
163147
717031
487883
10713
10dup
1? 0.5
%mo_to_bot
hTGFB1I1
, ARMC5
FALSE
1098
6806
13366p1
165759
600557
604447
8442
9del
1? 0.1
%mo_to_bot
hGPR114
FALSE
1098
8471
13366s1
165759
600557
604447
8442
9del
1? 0.1
%mo_to_bot
hGPR114
FALSE
1101
8472
13227s1
166696
788966
972144
4255
4del
1? 0.1
%mo_to_s1
CES2, FAM96
B
TRUE
LCOR [missen
se]
1105
4217
13825p1
166822
467068
283349
58679
19dup
1? 0.1
%mo_to_bot
hESRP2, 
NFATC3, PLA
2G15
FALSE
1105
4218
13825s1
166822
467068
283349
58679
19dup
1? 0.1
%mo_to_bot
hESRP2, 
NFATC3, PLA
2G15
FALSE
1107
996
14201p1
166871
028768
713877
3590
5dup
2? 0.1
%mo_to_p1
CDH3
TRUE
1114
979
13169s1
167028
662370
296427
9804
10dup
2? 0.1
%fa_to_s1
AARS
FALSE
1109
4191
12790p1
167036
319470
405466
42272
16dup
1? 0.1
%fa_to_p1
DDX19A, DDX
19B
TRUE
1110
4113
12100p1
167071
469670
714928
232
2dup
3? 0.5
%fa_to_p1
MTSS1L
TRUE
1112
6715
12628s1
167199
710372
001906
4803
3del
2? 0.5
%fa_to_both
TRUE
CSDE1 [misse
nse]
1112
6714
12628p1
167200
103572
001906
871
2del
2? 0.5
%fa_to_both
TRUE
OTUD7A [3n-
non-
frameshifting]
1112
986
13447s1
167200
103572
001906
871
2del
2? 0.5
%fa_to_s1
FALSE
1125
4208
13621s1
167527
636775
690509
414142
55dup
1? 0.1
%fa_to_s1
KARS, TMEM
170A, BCAR1
, GABARAPL2
, 
TMEM231, AD
AT1, TERF2IP
, CFDP1, CHS
T5, 
CHST6
TRUECH
ST5
1128
4091
11581p1
167723
200477
769883
537879
36dup
1? 0.1
%mo_to_p1
ADAMTS18, M
ON1B, NUDT
7
TRUE
1129
6814
13493p1
167789
664477
918699
22055
4del
1? 0.1
%fa_to_both
VAT1L
TRUE
1129
6815
13493s1
167789
664477
918699
22055
4del
1? 0.1
%fa_to_both
VAT1L
TRUE
1131
4104
11797p1
167805
652378
062087
5564
2del
2? 0.5
%fa_to_both
CLEC3A
TRUE
1131
4105
11797s1
167805
652378
062087
5564
2del
2? 0.5
%fa_to_both
CLEC3A
TRUE
1131
6774
13216p1
167805
652378
064738
8215
3del
2? 0.5
%fa_to_both
CLEC3A
TRUE
ITGA7 [misse
nse]
1131
6776
13216s1
167805
652378
064738
8215
3del
2? 0.5
%fa_to_both
CLEC3A
TRUE
1135
970
12373p1
168131
446181
396216
81755
10dup
1? 0.1
%fa_to_p1
GAN, BCMO1
TRUEGA
N
1142
962
11964p1
168443
873384
459407
20674
9del
2? 0.1
%mo_to_p1
ATP2C2
TRUE
1143
7863
11412p1
168469
336584
812688
119323
15del
9? 2
%mo_to_p1
KLHL36, USP
10
FALSE
1160
1036
12578p1
17
6010
11981
5971
3dup
1? 0.1
%mo_to_p1
TRUE
1162
6906
13263s1
17
6010
636421
630411
38dup
2? 0.1
%mo_to_s1
RPH3AL, FAM
101B, C17orf
97, FAM57A, 
VPS53
FALSE
1161
6898
13165s1
1729
5626
729318
433692
48dup
1? 0.1
%fa_to_s1
RNMTL1, FAM
57A, GLOD4,
 GEMIN4, VPS
53, NXN
TRUE
1159
1061
13335p1
1767
3108
695309
22201
12del
2? 0.1
%fa_to_both
RNMTL1, GLO
D4
FALSE
ZNF420 [miss
ense]
1159
1062
13335s1
1767
3108
695309
22201
12del
2? 0.1
%fa_to_both
RNMTL1, GLO
D4
FALSE
1159
6959
13599s1
1767
3108
695309
22201
12del
2? 0.1
%mo_to_s1
RNMTL1, GLO
D4
TRUE
1166
6903
13196s1
17345
80253
486724
28699
9del
2? 0.1
%fa_to_s1
TRPV1, TRPV
3
TRUETR
PV3
1168
4278
11285s1
17351
86303
561469
42839
14del
2? 0.1
%mo_to_s1
CTNS, SHPK
FALSE
1168
4306
11532p1
17351
86303
561469
42839
14del
2? 0.1
%mo_to_p1
CTNS, SHPK
TRUE
1167
4390
13271p1
17355
07373
552225
1488
2del
1? 0.1
%mo_to_bot
hCTNS
TRUE
1178
4275
11267p1
17725
85237
259951
1428
7dup
1? 0.1
%mo_to_bot
hTMEM95
TRUE
1178
4276
11267s1
17725
85237
259951
1428
7dup
1? 0.1
%mo_to_bot
hTMEM95
TRUE
1199
4234
11067s1
171166
675611
713686
46930
10dup
1? 0.1
%mo_to_s1
DNAH9
TRUE
1202
6919
13328p1
171409
530515
458678
1363373
31dup
1? 0.5
%fa_to_p1
PMP22, CDR
T15, TEKT3, N
one, FAM18B
2-
CDRT4, HS3S
T3B1, COX10
FALSE
SMAD2 [miss
ense]
1207
6859
12579p1
171632
098216
351274
30292
18dup
1? 0.1
%fa_to_both
NCRNA00188
, TRPV2
TRUE
1207
6860
12579s1
171632
098216
347445
26463
17dup
1? 0.1
%fa_to_both
NCRNA00188
, TRPV2
TRUE
1218
4412
13843p1
172691
871526
920084
1369
2del
1? 0.1
%fa_to_both
SPAG5
TRUE
AGK [missens
e]
1218
4413
13843s1
172691
871526
920084
1369
2del
1? 0.1
%fa_to_both
SPAG5
TRUE
DOCK10 [mis
sense]
1225
4295
11484s1
172931
163429
324349
12715
3dup
1? 0.1
%fa_to_s1
RNF135
TRUE
1237
4269
11196p1
173367
937433
769305
89931
9del
4? 1
%mo_to_p1
SLFN11, SLF
N12, SLFN13
TRUE
1237
6948
13507s1
173367
937433
769305
89931
9del
4? 1
%mo_to_s1
SLFN11, SLF
N12, SLFN13
FALSE
SCOC [frame
shift], LAMA1
 
[missense]
1239
1006
11459p1
173418
206634
183809
1743
4del
1? 0.1
%fa_to_both
C17orf66
TRUE
DEPDC7 [mis
sense]
1239
1007
11459s1
173418
206634
183809
1743
4del
1? 0.1
%fa_to_both
C17orf66
TRUE
1264
4388
13195s1
173967
172339
680792
9069
9dup
1? 0.5
%fa_to_s1
KRT15, KRT1
9
TRUE
1300
1009
11479s1
175731
185257
430887
119035
11dup
1? 0.1
%mo_to_s1
GDPD1, YPEL
2
TRUE
1313
4256
11107p1
176707
511367
087403
12290
14del
2? 0.1
%mo_to_bot
hABCA6
TRUE
TROAP [miss
ense]
1313
4257
11107s1
176707
511367
087403
12290
14del
2? 0.1
%mo_to_bot
hABCA6
TRUE
1320
4352
12561p1
177287
445972
877385
2926
3del
1? 0.5
%fa_to_both
FADS6
FALSE
NLRX1 [misse
nse], 
ADAM33 [non
sense]
1320
4353
12561s1
177287
445972
877385
2926
3del
1? 0.5
%fa_to_both
FADS6
FALSE
ROBO3 [miss
ense]
1330
4376
13000s1
177462
141174
625793
4382
8dup
1? 0.1
%mo_to_s1
ST6GALNAC1
TRUE
1331
4396
13608p1
177486
511175
398785
533674
38del
1? 0.1
%fa_to_p1
SEPT9, NCRN
A00338, MGA
T5B, SEC14L
1
FALSE
CSDE1 [nons
ense]
1333
6962
13601p1
177619
857976
210508
11929
10dup
1? 0.1
%fa_to_both
BIRC5, AFMID
TRUE
AK1 [missens
e]
1333
6963
13601s1
177619
878376
202131
3348
7dup
1? 0.1
%fa_to_both
AFMID
TRUE
1336
4272
11220s1
177818
841378
188937
524
2del
1? 0.1
%fa_to_both
SGSH
FALSESG
SH
1336
7895
11220p1
177818
841378
188937
524
2del
1? 0.1
%fa_to_both
SGSH
FALSESG
SH
1337
6867
12691s1
177822
192878
323706
101778
36dup
1? 0.1
%mo_to_s1
SLC26A11, R
NF213
FALSE
PNPLA6 [mis
sense], 
SLC43A1 [mis
sense]
1345
6881
12937s1
177982
704879
827282
234
2del
1? 0.1
%fa_to_s1
ARHGDIA
TRUE
OR2T3 [misse
nse]
1351
6838
12424p1
178015
163180
153240
1609
3del
1? 0.1
%mo_to_p1
CCDC57
FALSE
1356
4423
11008p1
1858
0408
645097
64689
8dup
1? 0.1
%mo_to_bot
hCLUL1, 
CETN1
FALSE
KATNAL2 [sp
lice]
1356
4424
11008s1
1858
0408
645097
64689
8dup
1? 0.1
%mo_to_bot
hCLUL1, 
CETN1
FALSE
1357
4435
11252p1
1868
8573
697355
8782
7del
4? 0.1
%mo_to_p1
ENOSF1
FALSE
LCN10 [misse
nse]
1373
4442
11429p1
181853
134118
650574
119233
32dup
1? 0.1
%fa_to_both
ROCK1
FALSE
1373
4443
11429s1
181853
134118
650574
119233
32dup
1? 0.1
%fa_to_both
ROCK1
FALSE
IL6R [missens
e]
1375
6973
12394s1
182105
696921
059388
2419
3dup
1? 0.1
%fa_to_s1
RIOK3
TRUE
1379
6981
12697p1
182443
617424
628467
192293
10dup
1? 0.1
%fa_to_p1
CHST9, AQP4
, CHST9-AS1
TRUE
1380
6979
12523p1
182896
832929
049312
80983
25dup
1? 0.5
%fa_to_p1
DSG3, DSG4
FALSE
G3BP2 [misse
nse]
1384
1079
11942p1
183295
341932
954256
837
2del
2? 0.5
%mo_to_p1
ZNF396
FALSE
1384
6997
13443p1
183295
341932
954256
837
2del
2? 0.5
%mo_to_p1
ZNF396
TRUE
1385
4483
12656p1
183383
693033
848648
11718
4dup
1? 0.1
%mo_to_p1
MOCOS
TRUE
CPZ [missens
e]
1392
1082
12285s1
184700
871447
010141
1427
2dup
1? 0.1
%fa_to_s1
C18orf32
FALSE
ZNF780A [mis
sense]
1397
6975
12420s1
184780
134647
801814
468
3dup
1? 2
%fa_to_s1
MBD1
FALSEM
BD1
VPS18 [misse
nse]
1406
1087
13726p1
186421
120764
212140
933
2del
4? 0.1
%mo_to_bot
hCDH19
TRUE
1406
1088
13726s1
186421
120764
212140
933
2del
4? 0.1
%mo_to_bot
hCDH19
TRUE
MAPK8 [miss
ense]
1412
4484
12656p1
187041
626270
417855
1593
2dup
1? 0.1
%fa_to_both
NETO1
TRUE
CPZ [missens
e]
1412
4485
12656s1
187041
626270
417855
1593
2dup
1? 0.1
%fa_to_both
NETO1
TRUE
1414
4489
12869p1
187222
928172
251798
22517
8dup
1? 0.1
%fa_to_p1
CNDP1
TRUE
PRCP [missen
se]
1415
7002
13508p1
187481
716674
980858
163692
4dup
1? 0.1
%mo_to_bot
hGALR1, 
MBP
FALSE
1415
7003
13508s1
187496
811374
980858
12745
2dup
1? 0.1
%mo_to_bot
hGALR1
FALSE
1417
4438
11356p1
187747
034577
891075
420730
28dup
2? 0.1
%fa_to_p1
KCNG2, RBFA
, CTDP1, ADN
P2, TXNL4A, 
PQLC1
TRUECT
DP1N
APRT1 [splice
], SV2B 
[missense]
1417
4461
11810p1
187766
397578
005231
341256
23dup
2? 0.1
%fa_to_both
PARD6G, LOC
100130522, R
BFA, ADNP2,
 
TXNL4A, PQL
C1
FALSE
1417
4462
11810s1
187769
396878
005231
311263
21dup
2? 0.1
%fa_to_both
PARD6G, LOC
100130522, R
BFA, ADNP2,
 
TXNL4A, PQL
C1
FALSE
RSRC1 [nons
ense]
1423
1143
12358p1
19178
30261
784945
1919
2del
2? 0.5
%fa_to_p1
ATP8B3
TRUE
TSR2 [missen
se]
1424
1183
13815p1
19179
99451
802644
2699
4del
2? 0.5
%fa_to_p1
ATP8B3
TRUE
1429
4550
11298p1
19241
02552
418136
7881
5del
1? 0.5
%fa_to_both
TMPRSS9
TRUE
SLC6A13 [mis
sense]
1429
4551
11298s1
19241
02552
418136
7881
5del
1? 0.5
%fa_to_both
TMPRSS9
TRUE
1431
1132
12161s1
19380
59463
834981
29035
17dup
2? 0.1
%fa_to_both
ZFR2
FALSE
1431
1131
12161p1
19380
71693
833776
26607
15dup
2? 0.1
%fa_to_both
ZFR2
FALSE
UBR3 [frames
hift], CARKD 
[nonsense]
1431
7051
12837p1
19381
66713
825405
8734
7del
2? 0.1
%fa_to_both
ZFR2
TRUE
SH3RF3 [miss
ense]
1431
7052
12837s1
19381
66713
825405
8734
7del
2? 0.1
%fa_to_both
ZFR2
TRUE
1442
8514
13296p1
19668
19516
686913
4962
8dup
1? 0.1
%mo_to_p1
C3
TRUE
1448
7030
12626s1
19689
04916
991143
100652
27dup
6? 0.5
%fa_to_both
EMR1, EMR4
P
TRUE
1448
4659
12780p1
19689
04917
083755
193264
41dup
6? 0.5
%fa_to_both
EMR1, MBD3
L2, EMR4P, Z
NF557
TRUE
1448
4660
12780s1
19689
04916
991143
100652
27dup
6? 0.5
%fa_to_both
EMR1, EMR4
P
TRUE
RABEP1 [mis
sense]
1448
1098
11364s1
19689
64087
083755
187347
40dup
6? 0.5
%fa_to_s1
EMR1, MBD3
L2, EMR4P, Z
NF557
FALSE
1448
7020
12462s1
19689
64087
083755
187347
40dup
6? 0.5
%mo_to_s1
EMR1, MBD3
L2, EMR4P, Z
NF557
TRUE
EPB41L3 [non
sense]
1448
7028
12626p1
19689
64086
991143
94735
26dup
6? 0.5
%fa_to_both
EMR1, EMR4
P
TRUE
GALNT9 [mis
sense]
1448
7112
13504s1
19689
64087
083755
187347
40dup
6? 0.5
%fa_to_both
EMR1, MBD3
L2, EMR4P, Z
NF557
TRUE
1448
7110
13504p1
19689
71596
989173
92014
24dup
6? 0.5
%fa_to_both
EMR1, EMR4
P
TRUE
1447
7029
12626p1
19707
50857
083755
8670
6dup
4? 0.5
%fa_to_both
ZNF557
TRUE
GALNT9 [mis
sense]
1447
7031
12626s1
19707
50857
083755
8670
6dup
4? 0.5
%fa_to_both
ZNF557
TRUE
1447
4661
12780s1
19707
50857
083755
8670
6dup
4? 0.5
%fa_to_both
ZNF557
TRUE
RABEP1 [mis
sense]
1447
7111
13504p1
19708
13707
083755
2385
3dup
4? 0.5
%fa_to_both
ZNF557
TRUE
1453
4644
12616p1
19815
93298
168571
9242
9dup
1? 0.1
%fa_to_both
FBN3
FALSE
1453
4645
12616s1
19815
93298
168571
9242
9dup
1? 0.1
%fa_to_both
FBN3
FALSE
1465
7006
12252s1
191143
607311
448075
12002
4dup
1? 0.1
%mo_to_s1
RAB3D, TSPA
N16
FALSER
AB3D
1466
1108
11479p1
191154
171811
548945
7227
6dup
1? 0.1
%mo_to_p1
PRKCSH, CC
DC151
TRUE
1472
1125
12106p1
191414
268414
153601
10917
6dup
4? 1
%fa_to_p1
IL27RA
TRUE
1472
7116
13512s1
191414
319714
153601
10404
5dup
4? 1
%fa_to_s1
IL27RA
TRUE
1472
7101
13487s1
191415
326414
153601
337
2del
4? 1
%mo_to_s1
IL27RA
FALSE
1481
4549
11298p1
191870
437518
704917
542
2dup
1? 0.1
%mo_to_p1
CRLF1
TRUE
SLC6A13 [mis
sense]
1482
7054
12838s1
191903
003119
033553
3522
7dup
1? 0.1
%mo_to_s1
DDX49, COPE
TRUE
FAM49A [fram
eshift]
1519
4580
11724p1
194019
983640
225713
25877
4dup
1? 0.1
%fa_to_p1
CLC, LGALS1
4
TRUE
TUBA1A [mis
sense]
1528
7010
12394s1
194392
003243
922549
2517
5del
3? 0.1
%fa_to_s1
TEX101
TRUE
1528
4678
13509p1
194392
003243
922549
2517
5del
3? 0.1
%mo_to_bot
hTEX101
TRUE
1528
4679
13509s1
194392
003243
922549
2517
5del
3? 0.1
%mo_to_bot
hTEX101
TRUE
1530
1191
14201s1
194528
416145
296877
12716
7dup
1? 0.1
%fa_to_s1
CBLC
TRUE
1531
1091
11029p1
194549
410745
495658
1551
3del
1? 0.1
%mo_to_bot
hCLPTM1
TRUE
1531
1092
11029s1
194549
410745
495658
1551
3del
1? 0.1
%mo_to_bot
hCLPTM1
TRUE
1532
4510
11085s1
194584
879945
899694
50895
44dup
2? 0.5
%mo_to_s1
ERCC2, KLC3
, PPP1R13L
TRUEER
CC2P
OGK [missen
se]
1532
1158
13116p1
194584
879945
901576
52777
47dup
2? 0.5
%fa_to_both
ERCC2, KLC3
, PPP1R13L
TRUEER
CC2
1532
1160
13116s1
194585
070445
901576
50872
45dup
2? 0.5
%fa_to_both
ERCC2, KLC3
, PPP1R13L
TRUEER
CC2S
RRM5 [misse
nse]
1534
7057
12840p1
194611
905946
120111
1052
3del
1? 0.1
%mo_to_bot
hEML2
TRUE
ATP1B1 [nons
ense], 
TM4SF19 [sp
lice]
1534
7058
12840s1
194611
905946
120111
1052
3del
1? 0.1
%mo_to_bot
hEML2
TRUE
1544
4554
11304s1
194948
118049
497180
16000
11dup
1? 0.1
%fa_to_s1
GYS1, RUVBL
2
TRUE
KLC2 [missen
se]
1546
7060
12843p1
194963
625749
675365
39108
37dup
1? 0.5
%mo_to_p1
TRPM4, PPFI
A3, HRC
FALSE
1547
7087
13296s1
194967
117349
675365
4192
6del
1? 0.5
%mo_to_s1
TRPM4
TRUE
1557
7043
12705p1
195222
717652
543836
316660
29dup
4? 0.1
%fa_to_both
ZNF613, ZNF
615, ZNF614,
 FPR1, FPR2,
 FPR3, 
ZNF432, ZNF
350, ZNF649,
 ZNF577
FALSE
DIP2C [frame
shift]
1557
7044
12705s1
195227
191152
550121
278210
30dup
4? 0.1
%fa_to_both
ZNF613, ZNF
615, ZNF614,
 FPR2, FPR3,
 ZNF432, 
ZNF350, ZNF
649, ZNF577
FALSE
PLXNA4 [miss
ense]
1557
7082
13215s1
195227
191152
579425
307514
33dup
4? 0.1
%mo_to_s1
ZNF613, ZNF
615, ZNF614,
 FPR2, FPR3,
 ZNF432, 
ZNF350, ZNF
649, ZNF577
TRUE
1568
4697
14110p1
195462
990254
631752
1850
3del
1? 0.1
%mo_to_p1
PRPF31
FALSE
PHF3 [missen
se]
1588
4674
13171p1
195587
206055
879739
7679
10del
1? 0.1
%mo_to_p1
IL11
TRUE
1596
7074
13162s1
195728
605557
293489
7434
3dup
1? 0.1
%fa_to_s1
ZIM2
TRUE
1598
1184
13815p1
195783
504957
932849
97800
15del
1? 0.1
%mo_to_p1
ZNF547, ZNF
304, ZNF17, Z
NF548, ZNF5
43
TRUE
1601
7045
12716p1
195863
924458
652113
12869
2dup
1? 0.1
%fa_to_both
ZNF329
TRUE
1789
8293
12409s1
20373
47043
739325
4621
4dup
5? 0.5
%mo_to_s1
C20orf27
TRUE
1789
1219
13335s1
20373
47043
736254
1550
3dup
5? 0.5
%mo_to_s1
C20orf27
FALSE
1789
4741
12650p1
20373
50433
739325
4282
3dup
5? 0.5
%mo_to_bot
hC20orf2
7
FALSE
1789
4742
12650s1
20373
50433
739325
4282
3dup
5? 0.5
%mo_to_bot
hC20orf2
7
FALSE
1792
7131
13101s1
20383
52713
893281
58010
12dup
1? 0.1
%fa_to_s1
MAVS, PANK
2
TRUEPA
NK2
1792
7992
12383p1
20383
52713
838456
3185
2dup
1? 0.5
%fa_to_both
MAVS
FALSE
1791
7993
12383s1
20383
52713
838456
3185
2dup
1? 0.5
%fa_to_both
MAVS
FALSE
1797
1193
11013p1
20786
42397
990962
126723
15dup
1? 0.1
%mo_to_bot
hHAO1, T
MX4
TRUE
TMPRSS2 [m
issense]
1797
1194
11013s1
20786
42397
990962
126723
15dup
1? 0.1
%mo_to_bot
hHAO1, T
MX4
TRUE
1799
7136
13418p1
201385
668313
869158
12475
7dup
1? 0.1
%mo_to_bot
hSEL1L2
FALSE
CGNL1 [miss
ense], 
DENND5B [m
issense], 
LRRC40 [miss
ense]
1799
7137
13418s1
201385
816513
869158
10993
6dup
1? 0.1
%mo_to_bot
hSEL1L2
FALSE
1806
7143
13589s1
202546
987925
472161
2282
3del
1? 0.1
%fa_to_s1
NINL
TRUE
1808
1207
11872p1
203053
360930
538190
4581
4dup
1? 0.5
%mo_to_p1
PDRG1
TRUE
CACNA1D [m
issense], 
KATNAL2 [sp
lice]
1824
4726
11501p1
204435
100644
354275
3269
3del
6? 2
%mo_to_p1
SPINT4
FALSE
1824
1224
13926s1
204435
100644
354275
3269
3del
6? 2
%mo_to_s1
SPINT4
TRUE
1828
7995
11304p1
204522
860945
235221
6612
2dup
2? 0.1
%fa_to_p1
SLC13A3
TRUE
1828
7122
12473p1
204522
860945
235221
6612
2dup
2? 0.1
%mo_to_p1
SLC13A3
TRUE
1830
1222
13815s1
204812
445648
166726
42270
11del
2? 0.5
%fa_to_s1
PTGIS
TRUE
1842
4712
11196s1
206285
324562
904953
51708
15dup
1? 0.5
%fa_to_both
PCMTD2, MY
T1
TRUE
1851
7175
13396p1
211962
882519
632603
3778
3del
1? 0.1
%mo_to_p1
CHODL
TRUE
1854
4778
11581p1
212829
637128
317408
21037
7dup
1? 0.1
%fa_to_both
ADAMTS5
TRUE
1854
4779
11581s1
212829
637128
317408
21037
7dup
1? 0.1
%fa_to_both
ADAMTS5
TRUE
1864
4808
12507p1
213574
277735
899047
156270
8dup
2? 0.5
%mo_to_bot
hKCNE2, 
RCAN1, KCN
E1, FAM165B
FALSE
1864
7173
13327p1
213574
277735
899047
156270
8dup
2? 0.5
%fa_to_p1
KCNE2, RCA
N1, KCNE1, F
AM165B
TRUE
1864
4810
12507s1
213589
038135
899047
8666
4dup
2? 0.5
%mo_to_bot
hRCAN1
FALSE
1868
1233
13593p1
213764
931837
665869
16551
10dup
2? 0.1
%mo_to_bot
hDOPEY2
FALSE
RGS22 [misse
nse]
1868
1234
13593s1
213764
931837
665869
16551
10dup
2? 0.1
%mo_to_bot
hDOPEY2
FALSE
1875
4784
11716p1
214280
399742
818068
14071
9dup
1? 0.1
%fa_to_p1
MX1
TRUE
1880
4799
12308p1
214483
662144
837654
1033
2dup
2? 2
%mo_to_p1
SIK1
TRUE
1885
7149
12481p1
214572
491145
826648
101737
48dup
1? 0.1
%fa_to_p1
PFKL, C21orf
2, TRPM2
FALSE
1887
4824
13795p1
214631
898146
323450
4469
4del
1? 0.1
%mo_to_bot
hITGB2
TRUEITG
B2
1887
4825
13795s1
214631
898146
323450
4469
4del
1? 0.1
%mo_to_bot
hITGB2
TRUEITG
B2S
TX3 [missens
e]
1897
4768
11196p1
214801
927548
084239
64964
12dup
1? 0.1
%mo_to_bot
hPRMT2,
 S100B
TRUE
1898
7216
12997p1
221726
450817
288963
24455
3dup
5? 1
%mo_to_p1
XKR3
TRUE
1898
7230
13418p1
221726
450817
288963
24455
3dup
5? 1
%mo_to_p1
XKR3
FALSE
CGNL1 [miss
ense], 
DENND5B [m
issense], 
LRRC40 [miss
ense]
1910
7191
12481p1
222136
946321
473538
104075
17dup
1? 0.1
%fa_to_both
SLC7A4, P2R
X6
FALSE
1910
7192
12481s1
222136
946321
384640
15177
15dup
1? 0.1
%fa_to_both
SLC7A4, P2R
X6
FALSE
1917
8016
12680p1
222212
349222
127271
3779
2dup
1? 0.1
%fa_to_both
MAPK1
TRUEMA
PK1
1927
4841
11090p1
222340
163924
967945
1566306
242dup
1? 0.1
%mo_to_p1
GUSBP11, M
MP11, SLC2A
11, SPECC1L
, UPB1, 
CHCHD10, ZN
F70, VPREB3
, RAB36, CAB
IN1, 
RTDR1, DDTL
, C22orf13, D
ERL3, C22orf
15, BCR, 
SUSD2, SMA
RCB1, ADOR
A2A, LOC284
889, DDT, 
GGT5, SNRP
D3, GSTT2, G
STT1, C22orf
43, IGLL1
TRUEUP
B1, ADORA2A, SMARCB1
1930
8025
11090p1
222403
421724
325797
291580
80dup
1? 0.1
%mo_to_p1
GUSBP11, SM
ARCB1, MMP
11, LOC2848
89, 
DDTL, SLC2A
11, DDT, DER
L3, C22orf15,
 
CHCHD10, ZN
F70, VPREB3
, GSTT2
TRUESM
ARCB1
1931
4854
11242s1
222403
421724
037206
2989
6dup
1? 0.5
%fa_to_s1
GUSBP11
TRUE
1927
8024
11090p1
222443
196124
967945
535984
98dup
1? 0.1
%mo_to_p1
ADORA2A, SN
RPD3, CABIN
1, SPECC1L, 
C22orf13, SU
SD2, GGT5, U
PB1
TRUEUP
B1, ADORA2A
1938
7194
12498s1
222909
288829
095925
3037
2del
1? 0.1
%mo_to_bot
hCHEK2
TRUE
GPR82 [misse
nse]
1938
8522
12498p1
222909
288829
095925
3037
2del
1? 0.1
%mo_to_bot
hCHEK2
TRUE
CPA4 [missen
se]
1947
1263
12810p1
223254
573932
651316
105577
26dup
1? 0.1
%fa_to_p1
C22orf42, No
ne, RFPL2, SL
C5A4
TRUE
1950
1283
14011p1
223278
822632
794087
5861
5del
1? 0.1
%fa_to_both
C22orf28
TRUE
1950
1284
14011s1
223278
822632
794087
5861
5del
1? 0.1
%fa_to_both
C22orf28
TRUE
1952
1240
11364p1
223571
386935
742965
29096
13dup
1? 0.1
%mo_to_bot
hTOM1
FALSE
1952
1241
11364s1
223571
386935
743202
29333
14dup
1? 0.1
%mo_to_bot
hTOM1
FALSE
1954
4856
11247p1
223670
197536
907833
205858
29dup
1? 0.1
%fa_to_p1
TXN2, FOXRE
D2, MYH9, EI
F3D
TRUE
1958
4963
13992s1
223847
789538
483271
5376
5dup
1? 0.1
%fa_to_s1
SLC16A8, BA
IAP2L2
TRUE
1966
7215
12937s1
224075
486740
762526
7659
9del
1? 0.1
%fa_to_s1
ADSL
TRUEAD
SLO
R2T3 [missen
se]
1979
4893
11828p1
224278
116642
899582
118416
8dup
2? 0.1
%mo_to_p1
NFAM1, SERH
L
TRUE
1978
8034
11654p1
224295
619142
962344
6153
2dup
8? 2
%fa_to_both
SERHL2
TRUE
1978
1267
13116p1
224295
619142
962344
6153
2del
8? 2
%fa_to_both
SERHL2
TRUE
1978
4948
13625p1
224295
619142
962344
6153
2del
8? 2
%fa_to_both
SERHL2
TRUE
1978
4950
13625s1
224295
619142
962344
6153
2del
8? 2
%fa_to_both
SERHL2
TRUE
1984
4921
12375s1
224432
281444
377341
54527
19dup
1? 0.1
%fa_to_s1
PNPLA3, SAM
M50
TRUE
1987
1255
12011s1
224488
350344
893436
9933
2dup
1? 0.1
%fa_to_both
LDOC1L
TRUE
1987
7445
12011p1
224488
350344
893436
9933
2dup
1? 0.1
%fa_to_both
LDOC1L
TRUE
FAM45A [mis
sense]
1988
1273
13606s1
224512
112345
198044
76921
8del
1? 0.1
%fa_to_s1
PRR5-ARHGA
P8, PRR5
TRUE
ZFYVE26 [mis
sense]
1994
7186
12409p1
224728
716147
308084
20923
3del
1? 0.1
%mo_to_bot
hTBC1D2
2A
TRUE
LRP2 [nonsen
se]
1994
8318
12409s1
224728
716147
308084
20923
3del
1? 0.1
%mo_to_bot
hTBC1D2
2A
TRUE
1998
4887
11716p1
225101
620351
018700
2497
6del
2? 0.1
%fa_to_both
CHKB-CPT1B
, CPT1B
TRUE
1998
4888
11716s1
225101
620351
018700
2497
6del
2? 0.1
%fa_to_both
CHKB-CPT1B
, CPT1B
TRUE
| cnvrID | callID | familyID | rel | Chromosome | Start (hg19) | Stop (hg19) | length (bp) | length (exons) | state | Frequency in 411 quads | Genes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2363 | 2111 | 11079 | p1 | 3 | 195778812 | 197273314 | 1,494,502 | 161 | del | 1 | PCYT1A, FBXO45, LRRC33, WDR53, NCBP2, TM4SF19-TCTEX1D2, DLG1, MFI2, SENP5, RNF168, ZDHHC19, OSTalpha, TFRC, C3orf43, CEP19, PIGX, TCTEX1D2, PIGZ, UBXN7, PAK2, BDH1 |
| 1080 | 4033 | 11090 | p1 | 16 | 29675049 | 30199897 | 524,848 | 187 | del | 7 | DOC2A, ASPHD1, CORO1A, TBX6, KIF22, CDIPT, QPRT, YPEL3, PPP4C, MAPK3, SPN, MVP, FAM57B, ALDOA, INO80E, SEZ6L2, TAOK2, KCTD13, MAZ, PRRT2, GDPD3, C16orf92, C16orf53, TMEM219, C16orf54, HIRIP3 |
| 2810 | 2737 | 11154 | p1 | 7 | 72848337 | 74120764 | 1,272,427 | 209 | dup | 1 | STX1A, WBSCR27, WBSCR22, LAT2, LIMK1, WBSCR28, RFC2, FZD9, VPS37D, ABHD11, CLIP2, CLDN3, CLDN4, BCL7B, ELN, MLXIPL, DNAJC30, GTF2IRD1, BAZ1B, TBL2, EIF4H, GTF2I |
| 888 | 3883 | 11265 | p1 | 15 | 22845761 | 23062319 | 216,558 | 57 | del | 6 | NIPA2, NIPA1, CYFIP1, TUBGCP5 |
| 1251 | 4289 | 11353 | p1 | 17 | 34892950 | 36104875 | 1,211,925 | 159 | del | 1 | LHX1, DUSP14, MRM1, ACACA, DDX52, DHRS11, SYNRG, HNF1B, AATF, PIGW, TADA2A, GGNBP2 |
| 1080 | 4069 | 11433 | p1 | 16 | 29808213 | 30199897 | 391,684 | 175 | del | 7 | DOC2A, ASPHD1, CORO1A, TBX6, KIF22, CDIPT, YPEL3, PPP4C, MAPK3, MVP, FAM57B, ALDOA, INO80E, SEZ6L2, TAOK2, KCTD13, MAZ, PRRT2, GDPD3, C16orf92, C16orf53, TMEM219, HIRIP3 |
| 1180 | 4305 | 11532 | p1 | 17 | 6329934 | 7352107 | 1,022,173 | 366 | dup | 1 | CLEC10A, ACADVL, AIPL1, C17orf74, C17orf100, SPEM1, FAM64A, LOC100506713, EIF5A, GPS2, DLG4, CTDNEP1, MIR497HG, TEKT1, RNASEK-C17ORF49, GABARAP, DVL2, SLC16A11, SLC16A13, NEURL4, FGF11, CLDN7, C17orf81, NLGN2, C17orf61-PLSCR3, TNK1, MED31, SLC13A5, ACAP1, ASGR1, ASGR2, TMEM102, TMEM95, KIAA0753, PHF23, XAF1, PLSCR3, TXNDC17, SLC2A4, CHRNB1, KCTD11, PITPNM3, BCL6B, FBXO39, YBX2 |
| 1080 | 4114 | 12100 | p1 | 16 | 29675049 | 30199897 | 524,848 | 193 | del | 7 | DOC2A, ASPHD1, CORO1A, TBX6, KIF22, CDIPT, QPRT, YPEL3, PPP4C, MAPK3, SPN, MVP, FAM57B, ALDOA, INO80E, SEZ6L2, TAOK2, KCTD13, MAZ, PRRT2, GDPD3, C16orf92, C16orf53, TMEM219, C16orf54, HIRIP3 |
| 1057 | 4123 | 12162 | p1 | 16 | 21636251 | 21737979 | 101,728 | 26 | dup | 5 | METTL9, OTOA |
| 1967 | 4904 | 12224 | p1 | 22 | 40521821 | 40762526 | 240,705 | 38 | del | 1 | ADSL, TNRC6B |
| 1080 | 4144 | 12308 | p1 | 16 | 29675049 | 30199897 | 524,848 | 193 | del | 7 | DOC2A, ASPHD1, CORO1A, TBX6, KIF22, CDIPT, QPRT, YPEL3, PPP4C, MAPK3, SPN, MVP, FAM57B, ALDOA, INO80E, SEZ6L2, TAOK2, KCTD13, MAZ, PRRT2, GDPD3, C16orf92, C16orf53, TMEM219, C16orf54, HIRIP3 |
| 749 | 3705 | 12343 | p1 | 13 | 41301637 | 41910892 | 609,255 | 53 | dup | 1 | KBTBD7, KBTBD6, MTRF1, MRPS31, NAA16, MIR320D1, ELF1, SLC25A15, WBP4 |
| 1120 | 6725 | 12691 | p1 | 16 | 69971020 | 72923861 | 2,952,841 | 487 | del | 1 | WWP2, HP, SF3B3, DHX38, ZNF19, CLEC18A, CLEC18C, FUK, PMFBP1, HYDIN, TAT, ATXN1L, PDPR, MARVELD3, DDX19A, DDX19B, VAC14, ST3GAL2, None, MTSS1L, HPR, CALB2, KIAA0174, COG4, IL34, ZFHX3, ZNF23, PHLPP2, CHST4, AP1G1, FTSJD1, AARS, DHODH, TXNL4B |
| 1682 | 5673 | 12735 | p1 | 2 | 105438016 | 109293154 | 3,855,138 | 163 | del | 1 | POU3F3, SULT1C3, ST6GAL2, NCK2, MRPS9, UXS1, SULT1C2, SULT1C4, C2orf40, FHL2, LIMS1, C2orf49, GCC2, SLC5A7, GPR45, TGFBRAP1 |
| 1079 | 4186 | 12736 | p1 | 16 | 29464953 | 30212614 | 747,661 | 222 | dup | 3 | DOC2A, ASPHD1, CORO1A, TBX6, PRRT2, CDIPT, QPRT, YPEL3, PPP4C, SLX1B, MAPK3, SPN, BOLA2B, MVP, FAM57B, ALDOA, INO80E, SEZ6L2, TAOK2, KCTD13, SLX1A-SULT1A3, MAZ, KIF22, GDPD3, C16orf92, C16orf53, TMEM219, C16orf54, HIRIP3 |
| 568 | 6409 | 12975 | p1 | 11 | 121348826 | 122852379 | 1,503,553 | 86 | del | 1 | SORL1, CRTAM, UBASH3B, BSX, MIR100HG, C11orf63 |
| 1475 | 7065 | 13018 | p1 | 19 | 15052300 | 16038149 | 985,849 | 234 | dup | 1 | EPHX3, CYP4F8, PGLYRP2, CASP14, RASAL3, CYP4F3, CYP4F2, CCDC105, CYP4F12, CYP4F11, AKAP8L, OR1I1, ILVBL, SYDE1, BRD4, OR7C2, CYP4F22, NOTCH3, OR10H2, OR10H3, OR10H1, AKAP8, OR10H5, SLC1A6, WIZ |
| 143 | 84 | 13346 | p1 | 1 | 93545080 | 95712426 | 2,167,346 | 229 | del | 1 | TMED5, GCLM, RWDD3, CNN3, DR1, ALG14, MTF2, TMEM56-RWDD3, DNTTIP2, SLC44A3, ABCA4, F3, MIR760, TMEM56, ARHGAP29, CCDC18, ABCD3, BCAR3, FNBP1L |
| 150 | 85 | 13346 | p1 | 1 | 99358627 | 100983836 | 1,625,209 | 162 | del | 1 | CCDC76, LRRC39, AGL, HIAT1, CDC14A, PALMD, LPPR5, LPPR4, SASS6, SLC35A3, FRRS1, MIR548D1, DBT, RTCD1 |
| 161 | 86 | 13346 | p1 | 1 | 112795166 | 113471930 | 676,764 | 87 | del | 1 | WNT2B, CAPZA1, MOV10, FAM19A3, SLC16A1, PPM1J, RHOC, ST7L, CTTNBP2NL |
| 891 | 3997 | 13355 | p1 | 15 | 27017548 | 27188573 | 171,025 | 12 | dup | 1 | GABRA5, GABRB3 |
| 504 | 736 | 13726 | p1 | 11 | 56949367 | 60233630 | 3,284,263 | 348 | del | 1 | DTX4, OR5B21, LPXN, SSRP1, OR5B2, LRRC55, UBE2L6, OR1S1, OR1S2, SERPING1, PRG3, PRG2, MS4A7, P2RX3, MS4A5, MS4A2, MS4A3, MS4A1, YPEL4, OR5A1, MIR130A, OR10Q1, OR4D10, OR4D11, MS4A14, TIMM10, OR10V1, OR5B3, MED19, PATL1, MPEG1, None, OR4D6, OR10W1, MRPL16, OR4D9, OR5B12, OR5B17, MS4A6A, ZFP91-CNTF, TNKS1BP1, TMX2, ZDHHC5, MS4A4A, SLC43A3, SMTNL1, SLC43A1, TMX2-CTNND1, OR9Q2, OSBP, OR9Q1, APLNR, PLAC1L, GIF, CLP1, OR5A2, STX3, GLYATL1, TCN1, GLYATL2, GLYAT, OR5AN1, MS4A6E, FAM111A, FAM111B, RTN4RL2 |
| 1127 | 991 | 13815 | p1 | 16 | 76311560 | 76513435 | 201,875 | 12 | del | 1 | CNTNAP4 |
| 2962 | 2993 | 13876 | p1 | 8 | 36641928 | 43054712 | 6,412,784 | 598 | dup | 1 | IDO2, TM2D2, IDO1, STAR, BAG4, ASH2L, AP3M2, LETM2, IKBKB, ADAM18, ADAM32, C8orf4, HTRA4, HGSNAT, RNF170, ANK1, ZNF703, CHRNB3, DDHD2, PPAPDC1B, WHSC1L1, NKX6-3, GPR124, KCNU1, GOLGA7, MYST3, SGK196, POLB, FNTA, HOOK3, LSM1, PLAT, CHRNA6, BRF2, ADAM2, C8orf40, C8orf86, FGFR1, SLC20A2, THAP1, RAB11FIP1, SFRP1, GOT1L1, GINS4, ERLIN2, TACC1, DKK4, ADAM9, VDAC3, AGPAT6, ADRB3, EIF4EBP1, PLEKHA2, PROSC, ZMAT4 |
| 2964 | 2994 | 13876 | p1 | 8 | 43197329 | 49986892 | 6,789,563 | 142 | dup | 1 | None, CEBPD, PRKDC, KIAA0146, MCM4, SNAI2, UBE2V2, EFCAB1, POTEA, C8orf22 |
| 1780 | 1911 | 11241 | s1 | 2 | 241615961 | 241709123 | 93,162 | 40 | dup | 3 | AQP12B, AQP12A, KIF1A |
| 1195 | 6843 | 12480 | s1 | 17 | 10347920 | 10356645 | 8,725 | 16 | del | 1 | MYH4 |
| 114 | 93 | 13447 | s1 | 1 | 48688408 | 48705209 | 16,801 | 12 | dup | 1 | SLC5A9 |
| 3059 | 6253 | 13601 | s1 | 9 | 35649844 | 35754126 | 104,282 | 134 | del | 1 | CA9, CCDC107, MSMP, CREB3, TPM2, TLN1, C9orf100, SIT1, GBA2, RGP1 |
| 1643 | 208 | 13629 | s1 | 2 | 61505299 | 61528589 | 23,290 | 14 | dup | 1 | USP34 |
callID
sampleID
chr
start
stop
i1M_star
ti1M
_stop
probes
aCHG_m
eani1
M_PASS
aCGH_pa
ss
Notes
1749123
83.p1
chr1
3413218
3417328
260
.048704
False Pos
itive
5495126
18.p1
chr1
11134287
11155938
9-0.
762897
YES
1561114
33.s1
chr1
40204572
40312969
260
.442855
YES
1581116
67.s1
chr1
42693553
42744343
42693597
42837441
250
.404631Y
ES
YES
4011872
.p1
chr1
65730593
65831879
65666329
65823558
690
.466047Y
ES
YES
5555130
97.p1
chr1
65849875
65855310
430
.530438
YES
1499111
18.p1
chr1
66837995
67000051
30-0
.057537
False Pos
itive
6912810
.p1
chr1
87029343
87038403
87028669
87038695
72-0
.519149Y
ES
YES
8299132
96.s1
chr1
11324518
4113
264970
250
.355266
YES
5512161
.s1
chr1
18255035
9182
555941
18254901
9182
564062
43-0
.571282Y
ES
YES
3711715
.s1
chr1
18509780
0185
130057
450
.507937
YES
662 11895.s1 chr10 225952 532470 60 0.54439 YES
3250 12317.s1 chr10 18289595 18331762 18276761 18415963 38 0.401112 YES YES
8347 12997.p1 chr10 54527896 54531395 54524658 54537447 26 -0.577977 YES YES
8348 12997.s1 chr10 54527896 54531395 54524658 54536839 26 -0.2795 YES YES
3175 11267.s1 chr10 72604229 72645686 72601813 72824456 20 -0.629198 YES YES
6270 12510.p1 chr10 90703553 90707143 27 0.00287 PPG, ACTA2
6271 12510.s1 chr10 90703553 90707143 27 0.084178 PPG, ACTA2
3253 12383.s1 chr10 101594136 101596047 0 FP/Not Tested due to lack of aCGH probes
3201 11519.p1 chr10 122216817 122349014 121814553 122500375 1 -0.887886 YES YES
6334 13162.s1 chr10 135168873 135179599 0 FP/Not Tested due to lack of aCGH probes
697 11569.p1 chr11 5862185 5878932 23 -0.261248 YES
3345 11667.p1 chr11 18727363 18729842 19 -0.433224 YES
3347 11667.s1 chr11 18727363 18729842 19 -0.224613 YES
728 12810.p1 chr11 32697110 32781789 32699987 32815580 42 -0.566457 YES YES
6402 12836.s1 chr11 59620447 59623531 23 -0.879108 YES
6401 12836.p1 chr11 59620675 59622308 11 -0.839071 YES
710 11788.s1 chr11 63057637 63059115 12 0.456157 YES
7466 11229.p1 chr11 100558409 100859532 108 0.047092 PPG, ARHGAP42L
7715 11115.s1 chr11 100558409 100831720 94 -0.060677 PPG, ARHGAP42L
3346 11667.s1 chr11 108256646 108264105 58 0.369384 YES
7717 11667.p1 chr11 108256646 108264105 58 0.518831 YES
6407 12851.s1 chr11 134177017 134257553 134164618 134346119 39 0.494821 YES YES
7722 11282.p1 chr12 4651049 4668159 36 0.1066 PPG, RAD51AP1
7723 11519.p1 chr12 4651049 4668159 36 -0.003573 PPG, RAD51AP1
3495 11282.s1 chr12 4655474 4668159 29 0.03143 PPG, RAD51AP1
8313 12997.p1 chr12 25264705 25267804 23 -0.992749 YES
8314 12997.s1 chr12 25264705 25267804 23 -0.563687 YES
6506 12836.s1 chr12 49688983 49691056 17 0.150829 Confirmed w/ manual inspection
3463 11090.p1 chr12 49688983 49691056 17 0.274504 YES
3540 11828.p1 chr12 51203238 51213562 14 -0.075046 PPG, ATF1
774 11895.s1 chr12 109290781 109293251 19 -0.361741 YES
3489 11241.p1 chr12 120875929 120884632 65 0.508616 YES
6505 12836.p1 chr12 120875929 120884632 120873726 120888619 65 0.390249 YES YES
7477 12011.s1 chr13 20425494 20426320 20420175 20451410 6 0.482955 YES YES
3671 11412.p1 chr13 96508411 96515968 59 -0.863395 YES
3672 11412.s1 chr13 96508411 96515968 59 -0.582202 YES
6555 12829.s1 chr13 114138154 114175048 18 0.532308 YES
3661 11196.s1 chr13 115007595 115091756 170 0.201715 False Positive
865 12304.p1 chr14 65016715 65019579 22 0.321413 YES
877 12810.p1 chr14 65016715 65019579 22 0.389589 YES
8420 12582.p1 chr14 69919957 69969596 69922552 69978718 23 0.504025 YES YES
3782 11118.p1 chr14 99182528 99183611 99181386 99193587 9 -0.694542 YES YES
3826 12317.s1 chr14 100402365 100405664 100398318 100407548 25 0.274271 YES YES
3954 12317.s1 chr15 40993261 41001314 45 -0.759746 YES
6651 13018.p1 chr15 43627893 43644143 19 0.140645 PPG, ADAL
906 11229.s1 chr15 76426604 76641077 76394419 76640658 42 0.400188 YES YES
6751 13018.p1 chr16 4624972 4642368 4614859 4643452 21 -0.710982 YES YES
4043 11118.p1 chr16 4871380 4871598 2 -0.019551 FP/Not Tested due to lack of aCGH probes
6696 12441.p1 chr16 21655610 21763826 25 0.513189 YES
943 11229.p1 chr16 31477170 31488897 31478711 31489033 12 0.403372 YES YES
945 11229.s1 chr16 31477170 31487883 35183650 35284399 11 0.25197 YES YES
7863 11412.p1 chr16 84693365 84812688 0 FP/Not Tested due to lack of aCGH probes
1036 12578.p1 chr17 6010 11981 8547 34203 46 0.124371 YES
4306 11532.p1 chr17 3518630 3561469 3503527 3561396 68 -0.731878 YES YES
4276 11267.s1 chr17 7258523 7259951 10 0.094905 False Positive
4295 11484.s1 chr17 29311634 29324349 29301608 29319956 17 0.592237 YES YES
4353 12561.s1 chr17 72874459 72877385 24 -0.200099 YES
6838 12424.p1 chr17 80151631 80153240 13 -0.592442 YES
4435 11252.p1 chr18 688573 697355 691173 695030 70 -0.693998 YES YES
1079 11942.p1 chr18 32953419 32954256 6 -0.445434 YES
4485 12656.s1 chr18 70416262 70417855 70402753 70412150 11 0.121603 YES
1132 12161.s1 chr19 3805946 3834981 99 0.136737 Confirmed w/ manual inspection
1131 12161.p1 chr19 3807169 3833776 98 0.206042 Confirmed w/ manual inspection
7052 12837.s1 chr19 3816671 3825405 69 -0.449766 YES
7006 12252.s1 chr19 11436073 11448075 14 0.008268 PPG, RAB3D
7054 12838.s1 chr19 19030031 19033553 28 0.176093 Confirmed w/ manual inspection
1091 11029.p1 chr19 45494107 45495658 12 -0.270947 YES
1092 11029.s1 chr19 45494107 45495658 12 -0.421898 YES
4554 11304.s1 chr19 49481180 49497180 76 0.262305 YES
7087 13296.s1 chr19 49671173 49675365 32 -0.303428 YES
7074 13162.s1 chr19 57286055 57293489 58 0.0126 False Positive
5709 13296.s1 chr2 1437209 1479843 1426346 1520676 23 0.44676 YES YES
1898 11118.p1 chr2 44527109 44541090 44519142 44545576 18 0.414627 YES YES
1985 12228.p1 chr2 74129486 74166149 128 0.007104 False Positive
5682 12851.s1 chr2 96780544 97784254 97845100 98202258 84 0.401272 YES YES
5682 12851.s1 chr2 96780544 97784254 96731109 97577661 84 0.401272 YES YES
1931 11484.s1 chr2 98263529 98275940 18 0.240511 YES
7964 11045.p1 chr2 102407181 102416105 70 0.437141 YES
2016 12552.s1 chr2 113346442 113404739 30 0.008806 False Positive
5683 12851.s1 chr2 192711168 193059250 192711162 193252329 70 -0.634688 YES YES
5689 12997.p1 chr2 230632269 230724290 230630874 230712174 135 0.486573 YES YES
160 11659.s1 chr2 232070951 232072965 16 -0.557958 YES
8293 12409.s1 chr20 3734704 3739325 35 0.021807 PPG, C20orf27
4742 12650.s1 chr20 3735043 3739325 33 0.081261 PPG, C20orf27
7992 12383.p1 chr20 3835271 3838456 24 0.270166 YES
7993 12383.s1 chr20 3835271 3838456 3822148 3830474 24 0.113595 YES
1194 11013.s1 chr20 7864239 7990962 7549585 8317018 23 0.636108 YES YES
1207 11872.p1 chr20 30533609 30538190 36 0.2861 YES
4712 11196.s1 chr20 62853245 62904953 26 0.368368 YES
7149 12481.p1 chr21 45724911 45826648 22 0.512218 YES
7216 12997.p1 chr22 17264508 17288963 12 0.529071 YES
7191 12481.p1 chr22 21369463 21473538 21374550 21465780 2 -0.040647 YES
4841 11090.p1 chr22 23401639 24967945 24182500 24999104 401 0.385952 YES YES
4841 11090.p1 chr22 23401639 24967945 23648009 24163081 401 0.385952 YES YES
8025 11090.p1 chr22 24034217 24325797 24182500 24999104 9 0.526062 YES YES
8025 11090.p1 chr22 24034217 24325797 23648009 24163081 9 0.526062 YES YES
8024 11090.p1 chr22 24431961 24967945 24182500 24999104 9 0.462384 YES YES
8024 11090.p1 chr22 24431961 24967945 23648009 24163081 9 0.462384 YES YES
1263 12810.p1 chr22 32545739 32651316 32530256 32703072 75 0.289551 YES YES
4893 11828.p1 chr22 42781166 42899582 42788316 42896474 0 YES
1255 12011.s1 chr22 44883503 44893436 80 0.420071 YES
8318 12409.s1 chr22 47287161 47308084 47285396 47443386 10 -0.673402 YES YES
5766 12588.s1 chr3 7594808 7782093 7516145 7802705 1 -0.841887 YES YES
8377 12851.s1 chr3 9867483 9871079 0 FP/Not Tested due to lack of aCGH probes
5761 12481.p1 chr3 12632296 12791331 12633293 12806142 0 YES
2232 12383.s1 chr3 35833873 35835450 35811601 35938795 12 0.366609 YES YES
5756 12252.s1 chr3 81627075 81640315 81594581 81644911 18 -0.836112 YES YES
5797 13162.s1 chr3 113588353 113619993 113576614 113619872 15 -0.708712 YES YES
237 12304.p1 chr3 132277814 132280061 16 -0.510444 YES
230 12106.s1 chr3 137781657 137803095 9 0.462725 YES
2152 11304.s1 chr3 141884463 142084021 141821227 142085310 108 0.430274 YES YES
5769 12631.s1 chr3 141889166 142084208 141821227 142065934 108 0.44434 YES YES
278 12161.p1 chr4 6594899 6613005 6594947 6611813 25 -0.279563 YES YES
279 12161.s1 chr4 6594899 6613005 6594947 6611813 25 -0.454313 YES YES
5822 12409.s1 chr4 89978088 90035703 89967292 90149589 28 -0.69108 YES YES
2373 12370.p1 chr4 159590764 159616795 13 0.487415 YES
314 11659.s1 chr5 40931165 40937792 40927429 40943442 53 -0.685554 YES YES
2442 11456.s1 chr5 81283389 81354421 35 0.013013 PPG, ATG10
5890 12851.s1 chr5 81283389 81354421 35 0.001057 PPG, ATG10
5876 12588.s1 chr5 110427986 110446977 26 0.488264 YES
2490 12370.p1 chr5 110430617 110446977 110417428 110441533 22 0.550374 YES YES
2492 12370.s1 chr5 110430617 110446977 22 0.335151 YES
336 12161.p1 chr5 121309890 121358102 25 0.260764 YES
337 12161.s1 chr5 121309890 121362821 121297115 121366608 27 0.424875 YES YES
346 12390.s1 chr5 141312823 141314151 9 -0.277458 YES
2454 11828.p1 chr5 156456743 156479665 11 0.50319 YES
7512 11013.s1 chr6 49426794 49440571 41 0.415165 YES
378 11229.p1 chr6 51747890 51752043 33 -0.798963 YES
394 12106.s1 chr6 56915571 56919661 56917538 56954550 31 0.496498 YES YES
2612 11551.p1 chr6 88315634 88318947 26 -0.759539 YES
5962 12655.s1 chr6 146870599 146875741 146870930 146875322 40 -0.510074 YES YES
2655 12650.s1 chr6 162683556 162864505 162632171 163059714 0 YES
2610 11519.p1 chr6 169617915 169646376 169243539 169795975 0 YES
465 12304.p1 chr7 98628206 98633339 0 FP/Not Tested due to lack of aCGH probes
470 12578.p1 chr7 150706017 150725697 150705842 150723467 0 YES
6067 12837.s1 chr7 151833916 152027824 151826268 152055852 1 0.483768 YES YES
2918 11196.s1 chr8 190895 382935 116 0.250819 YES
8490 12631.s1 chr8 27378399 27380025 13 -0.520584 YES
530 11715.s1 chr8 57078801 57080828 57055054 57098250 16 0.498051 YES YES
2922 11252.p1 chr8 144295142 144450815 147 0.374638 YES
593 12106.s1 chr9 214508 340321 174447 359712 62 0.929548 YES YES
6194 12655.p1 chr9 368017 677009 788730 864122 63 0.485291 YES YES
6194 12655.p1 chr9 368017 677009 347559 699065 63 0.485291 YES YES
597 12161.s1 chr9 5968018 6015607 5966898 6135411 23 0.429288 YES YES
6177 12252.s1 chr9 33166755 33261167 33141752 33260632 49 0.519623 YES YES
607 12741.s1 chr9 35228011 35237823 35171905 35238095 77 -0.957764 YES YES
599 12578.p1 chr9 35662942 35664489 12 -0.196435 YES
6191 12637.p1 chr9 35662942 35664489 12 -0.457227 YES
6201 12829.s1 chr9 35662942 35664489 12 -0.39486 YES
6233 13296.s1 chr9 125562401 125589066 13 0.484615 YES
False Positive Rates

                    False Pos.  Validated  Total Tested  FPR   Chi Square/Fisher Exact  P value
Dataset                                                        χ^2 = 7.26               0.026
  Iossifov et al.   1           51         52            0.02
  O'Roak et al.     0           44         44            0.00
  Sanders et al.    6           53         59            0.10
Affected                                                       OR=0.882                 1
  Probands          3           68         71            0.04
  Siblings          4           80         84            0.05
Size                                                           OR=0.533                 0.611
  2 exons           2           26         28            0.07
  3+ exons          5           122        127           0.04
Del/Dup                                                        OR=0.275                 0.432
  Deletions         1           54         55            0.02
  Duplications      6           89         95            0.06
Total               7           148        155           0.05

False Negative Rate

                    Missed  Found  Total  FNR   Chi Square/Fisher Exact  P value
Dataset                                         χ^2 = 22.3               < 1*10^-4
  Iossifov et al.   76      111    187    0.41
  O'Roak et al.     21      77     98     0.21
  Sanders et al.    55      201    256    0.21
Offspring                                       OR=1.21                  0.355
  Probands          93      220    313    0.29
  Siblings          59      169    228    0.26
Size                                            OR=0.19                  < 1*10^-8
  2 exons           39      24     63     0.62
  3+ exons          113     365    478    0.24
Total               152     389    541    0.28
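The rates and test statistics in the validation summary above can be re-derived from the counts alone. The following is a minimal sketch, not the analysis code used in the thesis: the function names are hypothetical, the odds ratio is assumed to be the plain cross-product ratio, and the three-dataset comparison is assumed to be a Pearson chi-square test without continuity correction (2 degrees of freedom, where the upper-tail probability is exactly exp(-x/2)). Fisher exact P values are not computed here.

```python
import math

def rate(events, total):
    """Fraction of calls in a group that are errors (FPR or FNR)."""
    return events / total

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def pearson_chi2(rows):
    """Pearson chi-square statistic for an r x 2 table of counts."""
    col_totals = [sum(r[j] for r in rows) for j in range(2)]
    n = sum(col_totals)
    stat = 0.0
    for r in rows:
        row_total = sum(r)
        for j in range(2):
            expected = row_total * col_totals[j] / n
            stat += (r[j] - expected) ** 2 / expected
    return stat

def chi2_sf_df2(stat):
    """Upper-tail probability of a chi-square statistic with exactly 2 df."""
    return math.exp(-stat / 2)

# (false positives, validated) per dataset: Iossifov, O'Roak, Sanders
fp_by_dataset = [(1, 51), (0, 44), (6, 53)]
chi2_fp = pearson_chi2(fp_by_dataset)
p_fp = chi2_sf_df2(chi2_fp)

# Probands (3 FP, 68 validated) vs. siblings (4 FP, 80 validated)
or_affected_fp = odds_ratio(3, 68, 4, 80)

# Probands (93 missed, 220 found) vs. siblings (59 missed, 169 found)
or_offspring_fn = odds_ratio(93, 220, 59, 169)

overall_fpr = rate(7, 155)
overall_fnr = rate(152, 541)

print(f"chi2={chi2_fp:.2f} p={p_fp:.3f} OR_fp={or_affected_fp:.3f} "
      f"OR_fn={or_offspring_fn:.2f} FPR={overall_fpr:.2f} FNR={overall_fnr:.2f}")
```

Under these assumptions the sketch agrees with the tabulated dataset chi-square, the proband/sibling odds ratios, and the overall rates. The size odds ratio in the false-negative table (0.19) corresponds to the inverse orientation (found before missed), so the argument order would need to be swapped to reproduce it.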
Sample    SRS Score    FSIQ    SRS Discordant    Sex    # of rare CNVs    # of de novo SNVs    # of de novo CNVs
11000.p
1
70
65
FALSE
male
1
0
0
11000.s
1
48
FALSE
female
0
0
0
11008.p
1
65
129
FALSE
male
1
1
0
11008.s
1
FALSE
male
1
0
0
11010.p
1
76
65
TRUE
male
0
0
0
11010.s
1
40
TRUE
male
0
0
0
11013.p
1
90
132
TRUE
male
1
0
0
11013.s
1
47
TRUE
male
2
0
0
11014.p
1
75
148
FALSE
male
1
0
0
11014.s
1
48
FALSE
male
0
0
0
11029.p
1
90
29
TRUE
female
1
0
0
11029.s
1
41
TRUE
male
1
0
0
11045.p
1
60
79
FALSE
female
1
0
0
11045.s
1
53
FALSE
male
0
0
0
11057.p
1
64
89
FALSE
male
0
0
0
11057.s
1
34
FALSE
male
1
0
0
11060.p
1
66
100
FALSE
male
1
0
0
11060.s
2
40
FALSE
male
0
0
0
11066.p
1
82
88
TRUE
male
1
0
0
11066.s
2
37
TRUE
male
0
0
0
11067.p
1
85
130
TRUE
male
1
0
0
11067.s
1
40
TRUE
female
2
0
0
11074.p
1
63
68
FALSE
male
0
1
0
11074.s
1
35
FALSE
male
0
0
0
11075.p
1
67
39
FALSE
male
2
0
0
11075.s
1
36
FALSE
male
2
0
0
11077.p
1
79
33
TRUE
male
0
1
0
11077.s
1
37
TRUE
male
0
0
0
11079.p
1
83
53
TRUE
male
0
0
1
11079.s
1
42
TRUE
female
0
0
0
11085.p
1
82
61
TRUE
male
0
0
0
11085.s
1
39
TRUE
female
1
0
0
11089.p
1
84
40
TRUE
male
0
0
0
11089.s
1
57
TRUE
male
0
0
0
11090.p
1
83
56
TRUE
male
4
0
1
11090.s
1
42
TRUE
male
0
0
0
11092.p
1
76
109
TRUE
male
1
0
0
11092.s
1
35
TRUE
female
0
0
0
11094.p
1
76
87
TRUE
male
2
0
0
11094.s
1
46
TRUE
male
0
0
0
11107.p
1
90
30
TRUE
male
3
0
0
11107.s
1
39
TRUE
male
2
0
0
11108.p
1
75
104
FALSE
male
2
0
0
11108.s
1
51
FALSE
male
1
0
0
11114.p
1
90
40
TRUE
female
1
1
0
11114.s
1
39
TRUE
female
1
0
0
11115.p
1
90
37
TRUE
female
1
0
0
11115.s
1
50
TRUE
female
1
0
0
11117.p
1
89
121
TRUE
female
0
0
0
11117.s
1
41
TRUE
female
1
0
0
11118.p
1
80
93
TRUE
female
4
0
0
11118.s
1
51
TRUE
male
1
0
0
11132.p
1
90
47
FALSE
male
0
1
0
11132.s
1
64
FALSE
male
0
0
0
11146.p
1
90
85
TRUE
male
2
0
0
11146.s
1
40
TRUE
male
2
0
0
11154.p
1
90
90
TRUE
male
0
0
1
11154.s
1
45
TRUE
female
1
0
0
11172.p
1
71
63
FALSE
male
0
0
0
11172.s
1
40
FALSE
female
0
0
0
11180.p
1
27
FALSE
female
2
0
0
11180.s
1
43
FALSE
male
3
0
0
11190.p
1
80
69
TRUE
male
1
0
0
11190.s
1
55
TRUE
female
2
0
0
11196.p
1
80
112
TRUE
male
5
0
0
11196.s
1
53
TRUE
male
3
0
0
11203.p
1
84
86
TRUE
female
0
0
0
11203.s
1
53
TRUE
female
2
0
0
11219.p
1
51
99
FALSE
male
2
0
0
11219.s
1
36
FALSE
female
1
0
0
11220.p
1
75
80
FALSE
female
1
0
0
11220.s
1
41
FALSE
female
3
0
0
11229.p
1
73
63
FALSE
male
3
0
0
11229.s
1
48
FALSE
male
2
0
0
11241.p
1
90
76
TRUE
male
1
0
0
11241.s
1
38
TRUE
male
1
0
1
11242.p
1
79
94
TRUE
male
1
0
0
11242.s
1
48
TRUE
male
1
0
0
11247.p
1
87
128
TRUE
male
1
0
0
11247.s
1
56
TRUE
female
0
0
0
11252.p
1
86
78
FALSE
male
2
0
0
11252.s
1
62
FALSE
male
2
0
0
11265.p
1
90
106
TRUE
male
0
0
1
11265.s
1
47
TRUE
female
0
0
0
11267.p
1
90
85
TRUE
female
2
0
0
11267.s
1
54
TRUE
male
2
0
0
11282.p
1
89
75
TRUE
male
1
0
0
11282.s
1
42
TRUE
male
1
0
0
11285.p
1
13
FALSE
male
0
0
0
11285.s
1
44
FALSE
male
1
0
0
11290.p
1
65
119
FALSE
male
0
0
0
11290.s
1
39
FALSE
female
0
0
0
11291.p
1
68
86
FALSE
male
0
1
0
11291.s
1
51
FALSE
female
0
0
0
11298.p
1
90
141
TRUE
male
3
0
0
11298.s
1
37
TRUE
male
2
0
0
11301.p
1
82
101
TRUE
male
1
0
0
11301.s
1
44
TRUE
female
1
0
0
11304.p
1
90
53
TRUE
male
1
0
0
11304.s
1
51
TRUE
female
2
0
0
11316.p
1
90
47
TRUE
female
1
0
0
11316.s
1
54
TRUE
female
0
0
0
11336.p
1
81
123
TRUE
male
3
0
0
11336.s
1
45
TRUE
female
3
0
0
11353.p
1
90
79
TRUE
female
1
0
1
11353.s
1
40
TRUE
male
0
0
0
11356.p
1
90
72
TRUE
female
3
1
0
11356.s
1
42
TRUE
male
1
0
0
11364.p
1
72
106
FALSE
male
1
0
0
11364.s
1
40
FALSE
female
3
0
0
11382.p
1
90
76
FALSE
male
0
0
0
11382.s
1
63
FALSE
female
0
1
0
11390.p
1
90
66
TRUE
female
0
0
0
11390.s
1
45
TRUE
female
0
0
0
11411.p
1
90
61
TRUE
male
0
0
0
11411.s
1
41
TRUE
female
1
0
0
11412.p
1
70
107
FALSE
male
2
0
0
11412.s
1
41
FALSE
female
1
0
0
11429.p
1
72
99
FALSE
male
1
0
0
11429.s
1
FALSE
male
2
0
0
11433.p
1
89
78
TRUE
male
1
0
1
11433.s
1
45
TRUE
female
1
0
0
11437.p
1
75
82
FALSE
male
1
0
0
11437.s
1
35
FALSE
male
1
0
0
11452.p
1
90
80
TRUE
male
0
1
0
11452.s
1
35
TRUE
male
1
0
0
11456.p
1
79
75
TRUE
male
0
0
0
11456.s
1
54
TRUE
male
1
0
0
11459.p
1
90
80
TRUE
male
2
0
0
11459.s
1
42
TRUE
male
2
0
0
11462.p
1
74
114
FALSE
male
0
0
0
11462.s
1
50
FALSE
male
0
0
0
11469.p
1
79
109
TRUE
male
1
0
0
11469.s
1
49
TRUE
female
1
0
0
11472.p
1
90
30
TRUE
female
2
0
0
11472.s
1
41
TRUE
female
1
0
0
11474.p
1
89
116
FALSE
male
1
0
0
11474.s
1
FALSE
male
1
0
0
11479.p
1
79
133
TRUE
male
3
0
0
11479.s
1
46
TRUE
female
2
0
0
11484.p
1
76
106
TRUE
male
1
0
0
11484.s
1
43
TRUE
male
2
0
0
11490.p
1
83
84
TRUE
male
0
0
0
11490.s
1
50
TRUE
female
0
1
0
11491.p
1
74
53
FALSE
male
0
0
0
11491.s
1
42
FALSE
male
0
0
0
11501.p
1
64
78
FALSE
male
2
0
0
11501.s
1
48
FALSE
male
2
0
0
11509.p
1
90
80
FALSE
male
0
0
0
11509.s
1
FALSE
male
2
0
0
11519.p
1
78
50
TRUE
male
3
0
0
11519.s
1
47
TRUE
female
1
0
0
11524.p
1
74
113
FALSE
male
0
1
0
11524.s
1
40
FALSE
male
0
0
0
11532.p
1
77
59
TRUE
male
1
0
1
11532.s
1
39
TRUE
female
0
0
0
11551.p
1
90
98
TRUE
male
1
0
0
11551.s
1
39
TRUE
female
0
0
0
11561.p
1
71
109
FALSE
male
1
0
0
11561.s
1
42
FALSE
male
1
0
0
11569.p
1
90
59
TRUE
female
1
0
0
11569.s
1
45
TRUE
male
1
0
0
11571.p
1
90
100
TRUE
male
1
0
0
11571.s
1
51
TRUE
female
0
0
0
11581.p
1
78
64
TRUE
male
3
0
0
11581.s
1
50
TRUE
male
1
0
0
11610.p
1
82
127
FALSE
male
1
1
0
11610.s
1
60
FALSE
male
2
0
0
11611.p
1
90
32
TRUE
female
1
0
0
11611.s
1
34
TRUE
male
1
0
0
11622.p
1
90
97
TRUE
male
3
0
0
11622.s
1
50
TRUE
female
3
0
0
11629.p
1
90
50
TRUE
male
3
0
0
11629.s
1
47
TRUE
female
2
0
0
11638.p
1
90
55
TRUE
male
0
0
0
11638.s
1
46
TRUE
male
0
0
0
11641.p
1
80
93
TRUE
male
0
0
0
11641.s
1
41
TRUE
male
0
0
0
11654.p
1
89
40
TRUE
female
1
0
0
11654.s
1
57
TRUE
female
0
0
0
11659.p
1
74
88
FALSE
female
1
0
0
11659.s
1
55
FALSE
female
2
0
0
11667.p
1
78
53
TRUE
male
2
0
0
11667.s
1
45
TRUE
male
3
0
0
11676.p
1
90
78
TRUE
female
2
0
0
11676.s
1
40
TRUE
female
1
0
0
11691.p
1
90
59
TRUE
male
0
0
0
11691.s
1
47
TRUE
male
0
0
0
11696.p
1
71
95
FALSE
male
2
0
0
11696.s
1
56
FALSE
male
0
0
0
11700.p
1
65
88
FALSE
male
1
0
0
11700.s
1
41
FALSE
female
0
0
0
11711.p
1
88
94
TRUE
male
2
0
0
11711.s
1
48
TRUE
male
2
0
0
11715.p
1
90
96
TRUE
male
1
1
0
11715.s
1
40
TRUE
female
2
0
0
11716.p
1
90
49
TRUE
male
4
0
0
11716.s
1
42
TRUE
male
1
0
0
11720.p
1
59
68
FALSE
male
0
0
0
11720.s
1
39
FALSE
female
0
0
0
11722.p
1
81
97
TRUE
male
2
0
0
11722.s
1
42
TRUE
male
0
0
0
11724.p
1
84
59
TRUE
male
1
0
0
11724.s
1
35
TRUE
female
0
0
0
11740.p
1
75
98
FALSE
male
1
0
0
11740.s
1
42
FALSE
male
0
0
0
11766.p
1
57
104
FALSE
male
1
0
0
11766.s
1
42
FALSE
female
1
1
0
11773.p
1
90
43
TRUE
male
1
0
0
11773.s
1
42
TRUE
male
1
0
0
11788.p
1
84
84
TRUE
male
1
0
0
11788.s
1
41
TRUE
male
1
0
0
11797.p
1
76
118
TRUE
male
1
0
0
11797.s
1
51
TRUE
female
2
0
0
11808.p
1
71
84
FALSE
female
1
0
0
11808.s
1
36
FALSE
female
0
0
0
11809.p
1
89
92
TRUE
male
0
0
0
11809.s
1
53
TRUE
female
0
0
0
11810.p
1
90
92
FALSE
male
4
0
0
11810.s
1
86
FALSE
female
2
1
0
11824.p
1
79
81
FALSE
male
1
0
0
11824.s
1
68
FALSE
male
1
0
0
11828.p
1
88
74
TRUE
male
3
0
0
11828.s
1
41
TRUE
female
2
0
0
11872.p
1
88
62
TRUE
female
2
1
0
11872.s
1
49
TRUE
female
0
0
0
11892.p
1
70
55
FALSE
male
0
1
0
11892.s
1
41
FALSE
female
0
0
0
11894.p
1
70
114
FALSE
male
1
0
0
11894.s
1
40
FALSE
male
0
0
0
11895.p
1
70
86
FALSE
male
1
0
0
11895.s
1
47
FALSE
male
2
0
0
11905.p
1
90
53
TRUE
male
1
0
0
11905.s
1
34
TRUE
male
0
0
0
11942.p
1
65
50
FALSE
male
1
0
0
11942.s
1
36
FALSE
male
0
0
0
11959.p
1
70
64
FALSE
male
0
0
0
11959.s
1
44
FALSE
female
1
0
0
11962.p
1
75
81
FALSE
male
0
0
0
11962.s
1
49
FALSE
male
0
0
0
11964.p
1
90
40
TRUE
female
2
0
0
11964.s
1
44
TRUE
female
0
0
0
12011.p
1
82
82
TRUE
male
2
0
0
12011.s
1
43
TRUE
male
2
0
0
12051.p
1
90
62
TRUE
male
0
0
0
12051.s
1
39
TRUE
male
0
0
0
12100.p
1
90
71
TRUE
male
2
0
1
12100.s
1
38
TRUE
female
1
0
0
12106.p
1
88
112
TRUE
male
3
0
0
12106.s
1
56
TRUE
female
3
0
0
12152.p
1
87
114
TRUE
male
0
0
0
12152.s
1
42
TRUE
male
1
0
0
12153.p
1
67
60
FALSE
male
0
0
0
12153.s
1
38
FALSE
female
0
1
0
12161.p
1
55
106
FALSE
female
3
2
0
12161.s
1
44
FALSE
female
5
0
0
12162.p
1
77
67
TRUE
male
1
0
1
12162.s
1
35
TRUE
female
0
0
0
12175.p
1
83
71
TRUE
male
1
0
0
12175.s
1
41
TRUE
female
1
0
0
12187.p
1
67
109
FALSE
male
0
0
0
12187.s
1
42
FALSE
female
0
0
0
12224.p
1
89
80
TRUE
male
1
1
1
12224.s
1
41
TRUE
male
1
0
0
12228.p
1
83
108
TRUE
male
1
0
0
12228.s
1
54
TRUE
female
0
0
0
12233.p
1
51
106
FALSE
male
0
0
0
12233.s
1
36
FALSE
female
0
0
0
12235.p
1
76
79
TRUE
male
0
0
0
12235.s
1
36
TRUE
male
0
0
0
12241.p
1
90
72
TRUE
female
0
0
0
12241.s
1
41
TRUE
male
0
0
0
12243.p
1
83
97
TRUE
male
0
0
0
12243.s
1
44
TRUE
male
0
0
0
12252.p
1
74
87
FALSE
male
2
0
0
12252.s
1
41
FALSE
male
3
0
0
12285.p
1
75
75
FALSE
female
2
0
0
12285.s
1
47
FALSE
male
2
0
0
12295.p
1
90
104
FALSE
male
1
0
0
12295.s
1
FALSE
male
1
0
0
12297.p
1
89
97
TRUE
male
1
0
0
12297.s
1
45
TRUE
male
2
0
0
12301.p
1
119
FALSE
male
0
0
0
12301.s
1
59
FALSE
female
0
0
0
12303.p
1
90
79
TRUE
male
2
0
0
12303.s
1
41
TRUE
female
0
0
0
12304.p
1
65
83
FALSE
male
3
0
0
12304.s
1
39
FALSE
male
2
0
0
12308.p
1
90
105
TRUE
female
3
0
1
12308.s
1
39
TRUE
male
0
0
0
12313.p
1
90
115
TRUE
female
1
0
0
12313.s
1
38
TRUE
female
1
0
0
12317.p
1
53
91
FALSE
male
2
0
0
12317.s
1
46
FALSE
female
3
0
0
12321.p
1
80
96
TRUE
male
0
0
0
12321.s
1
44
TRUE
female
0
0
0
12327.p
1
82
97
TRUE
male
0
0
0
12327.s
1
53
TRUE
female
0
0
0
12334.p
1
61
84
FALSE
male
6
0
0
12334.s
1
48
FALSE
male
3
2
0
12340.p
1
90
26
TRUE
female
1
1
0
12340.s
1
39
TRUE
female
0
1
0
12343.p
1
30
FALSE
female
0
0
1
12343.s
1
39
FALSE
female
0
0
0
12345.p
1
82
91
TRUE
male
0
0
0
12345.s
1
42
TRUE
female
0
0
0
12358.p
1
90
36
TRUE
female
1
0
0
12358.s
1
39
TRUE
male
0
0
0
12360.p
1
90
104
TRUE
male
0
0
0
12360.s
1
41
TRUE
female
0
0
0
12368.p
1
81
47
TRUE
male
1
0
0
12368.s
1
39
TRUE
male
1
0
0
12370.p
1
80
130
TRUE
male
2
0
0
12370.s
1
59
TRUE
female
1
0
0
12373.p
1
90
TRUE
male
2
0
0
12373.s
1
52
TRUE
female
1
0
0
12375.p
1
89
94
TRUE
male
1
0
0
12375.s
1
46
TRUE
female
1
0
0
12383.p
1
80
101
FALSE
male
2
0
0
12383.s
1
72
FALSE
male
3
0
0
12390.p
1
90
86
TRUE
male
0
0
0
12390.s
1
36
TRUE
male
1
0
0
12394.p
1
77
88
TRUE
male
2
0
0
12394.s
1
36
TRUE
male
2
0
0
12396.p
1
90
99
TRUE
male
2
0
0
12396.s
1
51
TRUE
female
1
0
0
12403.p
1
90
104
TRUE
male
1
0
0
12403.s
1
39
TRUE
female
0
0
0
12409.p
1
77
107
TRUE
male
2
1
0
12409.s
1
44
TRUE
female
3
0
0
12412.p
1
90
98
TRUE
male
0
0
0
12412.s
1
55
TRUE
male
0
0
0
12420.p
1
48
131
FALSE
male
3
0
0
12420.s
1
35
FALSE
female
3
0
0
12424.p
1
63
69
FALSE
male
1
0
0
12424.s
1
45
FALSE
female
0
0
0
12438.p
1
87
75
TRUE
male
0
1
0
12438.s
1
48
TRUE
male
0
0
0
12441.p
1
90
28
TRUE
male
1
0
0
12441.s
1
45
TRUE
female
0
0
0
12445.p
1
81
104
TRUE
male
1
0
0
12445.s
1
34
TRUE
male
1
0
0
12460.p
1
83
68
TRUE
male
1
0
0
12460.s
1
42
TRUE
male
0
0
0
12462.p
1
76
112
TRUE
male
0
0
0
12462.s
1
36
TRUE
female
1
1
0
12463.p
1
83
84
TRUE
male
3
1
0
12463.s
1
45
TRUE
female
0
0
0
12467.p
1
70
102
FALSE
male
1
0
0
12467.s
1
47
FALSE
female
1
0
0
12473.p
1
88
49
TRUE
male
2
0
0
12473.s
1
37
TRUE
female
0
0
0
12480.p
1
90
86
TRUE
male
1
0
0
12480.s
1
36
TRUE
male
0
0
1
12481.p
1
63
42
FALSE
male
3
0
0
12481.s
1
34
FALSE
male
2
0
0
12498.p
1
90
65
TRUE
male
2
0
0
12498.s
1
38
TRUE
female
2
0
0
12507.p
1
89
80
FALSE
male
2
0
0
12507.s
1
FALSE
female
2
0
0
12510.p
1
90
52
TRUE
male
1
0
0
12510.s
1
45
TRUE
male
1
0
0
12512.p
1
75
85
FALSE
male
0
0
0
12512.s
1
38
FALSE
female
2
0
0
12515.p
1
80
104
TRUE
female
0
0
0
12515.s
1
46
TRUE
male
0
0
0
12518.p
1
62
88
FALSE
male
1
0
0
12518.s
1
54
FALSE
female
3
0
0
12522.p
1
81
68
TRUE
male
0
0
0
12522.s
1
41
TRUE
male
0
0
0
12523.p
1
49
91
FALSE
male
2
0
0
12523.s
1
56
FALSE
female
1
0
0
12524.p
1
80
146
TRUE
female
1
0
0
12524.s
1
50
TRUE
female
0
0
0
12526.p
1
82
103
TRUE
male
0
0
0
12526.s
1
36
TRUE
female
1
0
0
12534.p
1
90
81
TRUE
female
3
0
0
12534.s
1
42
TRUE
female
0
0
0
12536.p
1
87
85
FALSE
male
0
0
0
12536.s
1
75
FALSE
female
0
0
0
12552.p
1
90
104
TRUE
male
0
0
0
12552.s
1
54
TRUE
male
1
0
0
12561.p
1
73
102
FALSE
male
2
1
0
12561.s
1
50
FALSE
female
1
0
0
12578.p
1
79
81
TRUE
male
3
0
0
12578.s
1
42
TRUE
female
1
0
0
12579.p
1
90
33
TRUE
male
2
0
0
12579.s
1
43
TRUE
female
2
0
0
12581.p
1
90
34
TRUE
female
1
0
0
12581.s
1
37
TRUE
male
1
0
0
12582.p
1
90
57
TRUE
male
1
0
0
12582.s
1
41
TRUE
male
0
0
0
12588.p
1
85
106
TRUE
male
1
0
0
12588.s
1
47
TRUE
male
2
1
0
12616.p
1
63
111
FALSE
male
1
0
0
12616.s
1
38
FALSE
male
1
0
0
12618.p
1
90
106
TRUE
male
1
0
0
12618.s
1
42
TRUE
male
1
0
0
12620.p
1
72
83
FALSE
male
0
0
0
12620.s
1
46
FALSE
female
0
0
0
12626.p
1
87
92
TRUE
male
2
0
0
12626.s
1
45
TRUE
female
2
0
0
12628.p
1
90
118
TRUE
male
1
0
0
12628.s
1
54
TRUE
female
1
0
0
12630.p
1
90
129
TRUE
male
0
0
0
12630.s
1
41
TRUE
female
1
0
0
12631.p
1
78
80
FALSE
male
2
0
0
12631.s
1
FALSE
male
2
0
0
12633.p
1
64
80
FALSE
male
0
0
0
12633.s
1
37
FALSE
female
0
0
0
12637.p
1
72
105
FALSE
male
1
0
0
12637.s
1
40
FALSE
male
1
0
0
12638.p
1
59
73
FALSE
male
0
0
0
12638.s
1
42
FALSE
female
0
0
0
12642.p
1
88
99
TRUE
male
0
0
0
12642.s
1
43
TRUE
male
0
0
0
12644.p
1
90
106
TRUE
male
1
0
0
12644.s
1
39
TRUE
male
1
0
0
12645.p
1
90
86
TRUE
male
1
1
0
12645.s
1
36
TRUE
male
0
0
0
12647.p
1
88
72
TRUE
male
2
0
0
12647.s
1
40
TRUE
male
1
0
0
12650.p
1
79
104
FALSE
male
1
0
0
12650.s
1
63
FALSE
female
2
0
0
12651.p
1
85
24
TRUE
male
0
0
0
12651.s
1
38
TRUE
male
1
0
0
12652.p
1
90
73
TRUE
male
0
1
0
12652.s
1
45
TRUE
female
1
0
0
12653.p
1
85
77
TRUE
male
0
2
0
12653.s
1
38
TRUE
female
0
0
0
12655.p
1
76
48
TRUE
male
1
0
0
12655.s
1
35
TRUE
male
1
0
0
12656.p
1
83
40
TRUE
male
2
0
0
12656.s
1
35
TRUE
male
1
0
0
12657.p
1
66
80
FALSE
female
0
0
0
12657.s
1
39
FALSE
female
0
0
0
12664.p
1
89
93
TRUE
male
0
0
0
12664.s
1
35
TRUE
male
0
0
0
12680.p
1
77
47
TRUE
male
1
0
0
12680.s
1
47
TRUE
male
0
0
0
12683.p
1
66
89
FALSE
male
1
1
0
12683.s
1
36
FALSE
male
1
0
0
12685.p
1
87
100
TRUE
male
0
1
0
12685.s
1
44
TRUE
female
0
0
0
12688.p
1
71
116
FALSE
male
0
0
0
12688.s
1
36
FALSE
male
0
0
0
12690.p
1
90
110
TRUE
male
1
0
0
12690.s
1
40
TRUE
male
0
0
0
12691.p
1
73
34
FALSE
female
2
0
1
12691.s
1
FALSE
male
1
0
0
12697.p
1
80
85
TRUE
male
3
0
0
12697.s
1
46
TRUE
female
0
0
0
12703.p
1
72
59
FALSE
male
0
0
0
12703.s
1
38
FALSE
female
0
0
0
12705.p
1
75
99
FALSE
male
1
1
0
12705.s
1
38
FALSE
male
1
0
0
12708.p
1
90
79
TRUE
male
0
0
0
12708.s
1
41
TRUE
female
0
0
0
12716.p
1
88
112
TRUE
male
1
0
0
12716.s
1
56
TRUE
female
1
0
0
12719.p
1
90
46
TRUE
male
1
0
0
12719.s
1
48
TRUE
female
1
0
0
12720.p
1
59
69
FALSE
male
0
0
0
12720.s
1
49
FALSE
male
0
0
0
12723.p
1
82
48
TRUE
male
1
0
0
12723.s
1
57
TRUE
female
1
0
0
12724.p
1
87
83
TRUE
female
0
0
0
12724.s
1
49
TRUE
male
0
0
0
12727.p
1
78
90
TRUE
male
1
0
0
12727.s
1
58
TRUE
male
2
1
0
12729.p
1
90
74
TRUE
female
2
0
0
12729.s
1
37
TRUE
male
2
0
0
12733.p
1
66
90
FALSE
male
1
0
0
12733.s
1
45
FALSE
female
2
0
0
12735.p
1
90
55
TRUE
male
2
0
1
12735.s
1
39
TRUE
male
0
0
0
12736.p
1
85
101
TRUE
male
1
0
1
12736.s
1
53
TRUE
female
1
0
0
12739.p
1
82
98
TRUE
male
0
0
0
12739.s
1
41
TRUE
male
0
0
0
12741.p
1
86
74
TRUE
male
2
0
0
12741.s
1
38
TRUE
male
1
0
0
12743.p
1
78
116
FALSE
male
1
0
0
12743.s
1
62
FALSE
male
0
0
0
12748.p
1
90
92
TRUE
male
0
0
0
12748.s
1
43
TRUE
male
0
0
0
12758.p
1
80
74
TRUE
male
1
0
0
12758.s
1
40
TRUE
female
0
0
0
12759.p
1
62
105
FALSE
male
0
0
0
12759.s
1
39
FALSE
female
0
0
0
12763.p
1
90
37
TRUE
male
1
0
0
12763.s
1
42
TRUE
male
0
0
0
12764.p
1
90
94
TRUE
female
0
1
0
12764.s
1
40
TRUE
female
0
0
0
12770.p
1
80
122
TRUE
male
1
0
0
12770.s
1
47
TRUE
male
1
0
0
12780.p
1
79
115
TRUE
male
1
0
0
12780.s
1
49
TRUE
male
2
0
0
12790.p
1
84
31
TRUE
male
1
0
0
12790.s
1
41
TRUE
female
0
0
0
12802.p
1
90
TRUE
male
1
0
0
12802.s
1
35
TRUE
male
1
0
0
12810.p
1
90
63
TRUE
male
4
0
0
12810.s
1
38
TRUE
female
1
0
0
12826.p
1
90
66
TRUE
female
3
1
0
12826.s
1
42
TRUE
male
2
0
0
12829.p
1
72
133
FALSE
male
1
0
0
12829.s
1
70
FALSE
male
2
0
0
12833.p
1
76
65
TRUE
male
1
0
0
12833.s
1
49
TRUE
female
1
0
0
12836.p
1
65
127
FALSE
male
2
0
0
12836.s
1
56
FALSE
male
2
0
0
12837.p
1
86
89
TRUE
male
4
0
0
12837.s
1
57
TRUE
female
2
0
0
12838.p
1
80
78
TRUE
male
0
0
0
12838.s
1
41
TRUE
female
1
1
0
12840.p
1
90
54
TRUE
male
2
2
0
12840.s
1
39
TRUE
female
2
0
0
12843.p
1
72
93
FALSE
male
1
0
0
12843.s
1
45
FALSE
female
0
0
0
12851.p
1
90
37
TRUE
male
2
0
0
12851.s
1
41
TRUE
male
5
0
0
12852.p
1
78
77
TRUE
male
0
0
0
12852.s
1
44
TRUE
male
0
0
0
12869.p
1
90
31
TRUE
female
3
0
0
12869.s
1
36
TRUE
female
1
0
0
12905.p
1
90
66
TRUE
male
0
0
0
12905.s
1
41
TRUE
male
0
0
0
12906.p
1
90
61
TRUE
male
1
0
0
12906.s
1
40
TRUE
female
1
0
0
12937.p
1
90
35
TRUE
male
0
0
0
12937.s
1
46
TRUE
male
2
0
0
12958.p
1
57
88
FALSE
male
0
0
0
12958.s
1
43
FALSE
male
0
0
0
12962.p
1
79
79
TRUE
male
0
0
0
12962.s
1
37
TRUE
female
1
0
0
12975.p
1
68
89
FALSE
male
0
0
1
12975.s
1
41
FALSE
female
0
0
0
12984.p
1
82
100
TRUE
male
0
0
0
12984.s
1
46
TRUE
male
0
0
0
12997.p
1
81
97
TRUE
male
4
0
0
12997.s
1
40
TRUE
female
2
0
0
13000.p
1
79
38
TRUE
male
0
0
0
13000.s
1
37
TRUE
female
1
0
0
13016.p
1
52
98
FALSE
male
0
0
0
13016.s
1
40
FALSE
female
0
0
0
13018.p
1
76
78
TRUE
male
2
1
1
13018.s
1
41
TRUE
male
1
0
0
13048.p
1
90
35
TRUE
female
1
0
0
13048.s
1
40
TRUE
female
1
0
0
13063.p
1
74
49
FALSE
male
1
0
0
13063.s
1
41
FALSE
female
1
0
0
13073.p
1
90
43
TRUE
male
0
0
0
13073.s
1
34
TRUE
male
0
0
0
13094.p
1
67
71
FALSE
male
2
1
0
13094.s
1
45
FALSE
male
1
0
0
13096.p
1
85
107
TRUE
male
0
1
0
13096.s
1
39
TRUE
male
1
0
0
13097.p
1
73
34
FALSE
male
1
0
0
13097.s
1
40
FALSE
female
3
0
0
13099.p
1
86
92
TRUE
male
1
0
0
13099.s
1
38
TRUE
male
1
0
0
13101.p
1
90
94
TRUE
female
0
0
0
13101.s
1
51
TRUE
female
1
0
0
13104.p
1
87
97
TRUE
male
0
0
0
13104.s
1
44
TRUE
female
0
0
0
13116.p
1
90
35
TRUE
female
3
0
0
13116.s
1
42
TRUE
female
3
0
0
13120.p
1
90
113
TRUE
male
0
0
0
13120.s
1
38
TRUE
male
0
0
0
13125.p
1
86
97
TRUE
male
2
1
0
13125.s
1
58
TRUE
female
2
0
0
13129.p
1
78
31
TRUE
male
1
0
0
13129.s
1
35
TRUE
female
2
0
0
13131.p
1
80
75
TRUE
male
0
0
0
13131.s
1
45
TRUE
male
0
1
0
13139.p
1
73
127
FALSE
male
2
0
0
13139.s
1
39
FALSE
female
1
0
0
13144.p
1
77
112
TRUE
male
1
0
0
13144.s
1
53
TRUE
male
1
1
0
13146.p
1
86
106
TRUE
male
0
0
0
13146.s
1
37
TRUE
female
0
0
0
13148.p
1
78
52
TRUE
male
2
0
0
13148.s
1
51
TRUE
female
2
0
0
13152.p
1
90
77
TRUE
male
0
0
0
13152.s
1
41
TRUE
female
0
0
0
13153.p
1
76
75
TRUE
male
2
0
0
13153.s
1
34
TRUE
male
1
0
0
13154.p
1
80
40
TRUE
male
0
0
0
13154.s
1
36
TRUE
male
0
0
0
13159.p
1
75
79
FALSE
male
0
0
0
13159.s
1
48
FALSE
male
0
0
0
13162.p
1
90
74
TRUE
male
3
1
0
13162.s
1
36
TRUE
female
3
0
0
13165.p
1
90
51
TRUE
male
0
0
0
13165.s
1
49
TRUE
male
1
0
0
13166.p
1
90
46
TRUE
male
0
0
0
13166.s
1
39
TRUE
male
1
0
0
13168.p
1
70
104
FALSE
female
0
1
0
13168.s
1
45
FALSE
female
0
0
0
13169.p
1
49
45
FALSE
male
0
0
0
13169.s
1
40
FALSE
male
3
0
0
13171.p
1
90
51
TRUE
female
1
0
0
13171.s
1
40
TRUE
female
0
0
0
13174.p
1
86
106
TRUE
male
0
0
0
13174.s
1
38
TRUE
male
0
0
0
13176.p
1
90
137
TRUE
female
1
1
0
13176.s
1
42
TRUE
female
2
0
0
13183.p
1
90
60
TRUE
male
1
1
0
13183.s
1
42
TRUE
female
1
0
0
13187.p
1
75
84
FALSE
male
0
0
0
13187.s
1
39
FALSE
male
0
0
0
13188.p
1
82
78
FALSE
male
0
0
0
13188.s
1
FALSE
female
0
0
0
13193.p
1
90
81
TRUE
male
0
0
0
13193.s
1
39
TRUE
male
0
0
0
13195.p
1
89
76
TRUE
male
1
0
0
13195.s
1
52
TRUE
male
1
0
0
13196.p
1
90
92
TRUE
male
1
0
0
13196.s
1
55
TRUE
male
1
0
0
13197.p
1
79
65
TRUE
male
0
1
0
13197.s
1
42
TRUE
female
1
0
0
13215.p
1
90
74
TRUE
male
2
0
0
13215.s
1
51
TRUE
female
2
0
0
13216.p
1
90
105
TRUE
male
1
0
0
13216.s
1
50
TRUE
male
2
0
0
13218.p
1
66
98
FALSE
male
0
0
0
13218.s
1
37
FALSE
male
0
0
0
13227.p
1
86
104
TRUE
male
0
0
0
13227.s
1
39
TRUE
female
1
0
0
13239.p
1
64
83
FALSE
female
0
0
0
13239.s
1
41
FALSE
male
1
0
0
13258.p
1
90
28
TRUE
male
0
0
0
13258.s
1
50
TRUE
female
0
0
0
13263.p
1
73
81
FALSE
male
0
0
0
13263.s
1
37
FALSE
female
1
0
0
13266.p
1
90
87
FALSE
male
0
0
0
13266.s
1
68
FALSE
female
0
0
0
13269.p
1
69
97
FALSE
male
0
0
0
13269.s
1
40
FALSE
male
1
0
0
13271.p
1
90
60
TRUE
male
1
0
0
13271.s
1
47
TRUE
female
0
0
0
13293.p
1
90
83
TRUE
male
2
0
0
13293.s
1
41
TRUE
female
2
0
0
13296.p
1
87
30
TRUE
male
4
0
0
13296.s
1
40
TRUE
female
4
0
0
13307.p
1
90
30
TRUE
male
0
0
0
13307.s
1
45
TRUE
male
1
0
0
13309.p
1
89
79
TRUE
male
0
0
0
13309.s
1
38
TRUE
male
0
0
0
13312.p
1
75
119
FALSE
male
0
0
0
13312.s
1
45
FALSE
female
1
0
0
13315.p
1
90
118
TRUE
male
0
0
0
13315.s
1
37
TRUE
male
0
0
0
13322.p
1
77
59
TRUE
male
3
0
0
13322.s
1
36
TRUE
female
1
0
0
13327.p
1
90
103
TRUE
male
3
0
0
13327.s
1
45
TRUE
female
2
0
0
13328.p
1
61
100
FALSE
male
1
0
0
13328.s
1
45
FALSE
male
0
0
0
13330.p
1
60
101
FALSE
male
1
0
0
13330.s
1
58
FALSE
female
0
0
0
13335.p 1 18 FALSE female 2 0 0
13335.s 1 36 FALSE male 5 0 0
13338.p 1 90 105 TRUE male 2 0 0
13338.s 1 51 TRUE female 2 0 0
13346.p 1 84 59 FALSE female 1 1 3
13346.s 1 60 FALSE female 2 0 0
13349.p 1 76 87 TRUE male 0 1 0
13349.s 1 42 TRUE female 0 0 0
13355.p 1 90 30 TRUE male 2 0 1
13355.s 1 39 TRUE male 1 0 0
13366.p 1 74 124 FALSE male 1 0 0
13366.s 1 56 FALSE female 2 0 0
13374.p 1 90 19 TRUE male 0 0 0
13374.s 1 39 TRUE female 0 0 0
13385.p 1 90 16 TRUE male 1 0 0
13385.s 1 52 TRUE female 0 0 0
13387.p 1 90 95 TRUE male 0 0 0
13387.s 1 42 TRUE male 2 0 0
13393.p 1 59 77 FALSE female 2 0 0
13393.s 1 47 FALSE female 2 0 0
13396.p 1 83 103 TRUE male 3 0 0
13396.s 1 43 TRUE female 2 0 0
13398.p 1 90 75 TRUE male 1 1 0
13398.s 1 42 TRUE female 1 0 0
13412.p 1 90 33 TRUE male 2 0 0
13412.s 1 36 TRUE female 0 0 0
13418.p 1 61 135 FALSE male 4 0 0
13418.s 1 46 FALSE female 1 0 0
13424.p 1 74 114 FALSE male 0 0 0
13424.s 1 42 FALSE female 0 0 0
13439.p 1 87 82 TRUE male 0 1 0
13439.s 1 42 TRUE female 0 0 0
13443.p 1 76 102 TRUE male 1 0 0
13443.s 1 45 TRUE female 0 0 0
13444.p 1 85 76 TRUE male 0 0 0
13444.s 1 51 TRUE female 0 0 0
13447.p 1 90 44 FALSE female 0 1 0
13447.s 1 FALSE female 1 0 1
13462.p 1 87 70 TRUE male 0 0 0
13462.s 1 41 TRUE male 0 0 0
13465.p 1 90 121 TRUE male 1 0 0
13465.s 1 46 TRUE female 1 0 0
13486.p 1 83 93 TRUE male 0 0 0
13486.s 1 42 TRUE male 0 0 0
13487.p 1 56 96 FALSE male 2 0 0
13487.s 1 38 FALSE female 2 0 0
13493.p 1 77 104 TRUE male 2 0 0
13493.s 1 57 TRUE male 1 0 0
13496.p 1 77 107 TRUE male 0 0 0
13496.s 1 42 TRUE male 0 0 1
13502.p 1 90 55 TRUE male 1 0 0
13502.s 1 49 TRUE female 1 0 0
13504.p 1 77 64 TRUE male 4 0 0
13504.s 1 43 TRUE female 3 0 0
13505.p 1 60 112 FALSE male 0 0 0
13505.s 1 41 FALSE female 0 0 0
13507.p 1 46 101 FALSE male 1 0 0
13507.s 1 37 FALSE male 2 1 0
13508.p 1 63 96 FALSE male 1 0 0
13508.s 1 41 FALSE female 2 0 0
13509.p 1 89 70 TRUE female 2 0 0
13509.s 1 44 TRUE female 1 0 0
13512.p 1 83 110 TRUE male 4 0 0
13512.s 1 38 TRUE female 1 0 0
13513.p 1 67 88 FALSE male 1 1 0
13513.s 1 36 FALSE female 1 0 0
13533.p 1 71 47 FALSE male 2 0 0
13533.s 1 38 FALSE male 2 0 0
13543.p 1 82 42 TRUE male 1 0 0
13543.s 1 40 TRUE female 0 0 0
13589.p 1 78 56 TRUE male 2 0 0
13589.s 1 38 TRUE male 3 0 0
13590.p 1 90 86 TRUE male 3 2 0
13590.s 1 36 TRUE male 2 0 0
13593.p 1 84 39 FALSE male 1 0 0
13593.s 1 FALSE male 2 0 0
13599.p 1 79 65 TRUE male 2 0 0
13599.s 1 39 TRUE male 2 0 0
13601.p 1 85 78 TRUE female 3 0 0
13601.s 1 40 TRUE male 4 0 1
13606.p 1 90 54 TRUE male 0 1 0
13606.s 1 49 TRUE female 1 0 0
13608.p 1 90 42 FALSE female 2 1 0
13608.s 1 FALSE male 1 0 0
13618.p 1 90 44 FALSE female 0 0 0
13618.s 1 FALSE female 0 0 0
13621.p 1 90 56 TRUE female 0 0 0
13621.s 1 40 TRUE male 1 0 0
13625.p 1 85 94 TRUE male 2 0 0
13625.s 1 42 TRUE male 1 0 0
13629.p 1 82 52 TRUE male 1 0 0
13629.s 1 44 TRUE female 2 0 1
13660.p 1 80 53 TRUE male 0 0 0
13660.s 1 46 TRUE male 1 0 0
13684.p 1 76 106 TRUE male 1 0 0
13684.s 1 49 TRUE female 0 0 0
13689.p 1 79 97 TRUE male 0 0 0
13689.s 1 50 TRUE female 1 1 0
13695.p 1 88 54 TRUE male 0 0 0
13695.s 1 37 TRUE female 0 0 0
13698.p 1 89 98 FALSE male 1 0 0
13698.s 1 64 FALSE male 1 0 0
13726.p 1 80 61 TRUE male 1 0 1
13726.s 1 45 TRUE male 1 0 0
13730.p 1 68 94 FALSE female 2 0 0
13730.s 1 41 FALSE male 0 0 0
13739.p 1 90 25 TRUE female 1 0 0
13739.s 1 37 TRUE male 2 0 0
13752.p 1 76 98 TRUE female 0 0 0
13752.s 1 47 TRUE female 0 0 0
13774.p 1 90 33 TRUE female 1 0 0
13774.s 1 44 TRUE female 0 0 0
13793.p 1 87 52 TRUE male 1 0 0
13793.s 1 45 TRUE female 2 0 0
13795.p 1 90 34 TRUE female 2 0 0
13795.s 1 38 TRUE female 1 0 0
13798.p 1 68 103 FALSE female 2 0 0
13798.s 1 37 FALSE female 0 0 0
13808.p 1 79 40 TRUE male 2 0 0
13808.s 1 44 TRUE male 2 0 0
13809.p 1 77 61 TRUE male 2 0 0
13809.s 1 51 TRUE female 0 0 0
13815.p 1 82 51 TRUE male 3 0 1
13815.s 1 44 TRUE female 2 0 0
13821.p 1 67 92 FALSE female 0 0 0
13821.s 1 43 FALSE female 0 0 0
13825.p 1 90 58 FALSE female 3 0 0
13825.s 1 71 FALSE female 4 0 0
13832.p 1 90 25 TRUE male 0 0 0
13832.s 1 38 TRUE female 1 0 0
13835.p 1 90 119 TRUE female 1 0 0
13835.s 1 52 TRUE female 1 0 0
13840.p 1 0
13840.s 1 0
13843.p 1 90 66 TRUE female 3 0 0
13843.s 1 42 TRUE female 1 0 0
13876.p 1 82 40 FALSE male 1 0 2
13876.s 1 FALSE female 1 0 0
13887.p 1 67 81 FALSE female 0 0 0
13887.s 1 43 FALSE male 0 0 0
13890.p 1 82 37 TRUE female 2 1 0
13890.s 1 47 TRUE female 2 0 0
13912.p 1 83 95 TRUE female 0 0 0
13912.s 1 44 TRUE female 1 0 0
13922.p 1 90 100 TRUE female 1 0 0
13922.s 1 41 TRUE male 2 0 0
13926.p 1 79 62 TRUE female 1 0 0
13926.s 1 41 TRUE female 3 0 0
13992.p 1 80 106 TRUE female 0 0 0
13992.s 1 45 TRUE female 1 0 0
14009.p 1 85 98 TRUE female 0 0 0
14009.s 1 35 TRUE female 0 0 0
14011.p 1 90 53 TRUE female 1 0 0
14011.s 1 36 TRUE female 1 0 0
14110.p 1 69 77 FALSE male 3 0 0
14110.s 1 39 FALSE female 3 0 0
14167.p 1 76 132 TRUE male 0 0 0
14167.s 1 42 TRUE male 0 0 0
14201.p 1 90 43 TRUE male 2 0 0
14201.s 1 45 TRUE female 2 0 0
Comparison Name | Group | # of quads | Count | Enrichment | Binomial | Paired t-test | 95% CI (bootstrap)
Overall rare CNVs | Probands | 411 | 458 | 1.15 | 0.04 | 0.004 | 1.09 - 1.29
Overall rare CNVs | Siblings | | 397 | | | |
Overall genes in rare CNVs | Probands | 411 | 921 | 1.27 | < 0.00001 | 0.029 | 1.10 - 1.52
Overall genes in rare CNVs | Siblings | | 726 | | | |

Category | # of Quads | Probands Count | Siblings Count | Enrichment | Binomial | Paired t-test

# of CNVs
High FSIQ: Proband IQ score ≥ 70 | 267 | 273 | 236 | 1.16 | 0.11 | 0.018
Low FSIQ: Proband IQ score < 70 | 141 | 157 | 126 | 1.25 | 0.074 | 0.015
SRS discordant proband-sibling pairs | 276 | 316 | 251 | 1.26 | 0.0071 | 0.00015
SRS concordant proband-sibling pairs | 115 | 117 | 113 | 1.04 | 0.84 | 0.7
Discordant SRS, Low IQ | 109 | 138 | 104 | 1.33 | 0.034 | 0.0038
Concordant SRS, Low IQ | 22 | 19 | 22 | 0.86 | 0.76 | 0.53
Discordant SRS, High IQ | 167 | 175 | 145 | 1.21 | 0.1 | 0.016
Concordant SRS, High IQ | 93 | 98 | 91 | 1.08 | 0.66 | 0.46

Genes Affected
High IQ | 267 | 537 | 472 | 1.14 | 0.044 | 0.14
Low IQ | 141 | 384 | 252 | 1.52 | 1.90E-07 | 0.084
Discordant SRS | 276 | 707 | 510 | 1.39 | 1.80E-08 | 0.02
Concordant SRS | 115 | 222 | 219 | 1.01 | 0.92 | 0.9
Discordant SRS, Low IQ | 109 | 348 | 207 | 1.68 | 2.30E-09 | 0.061
Concordant SRS, Low IQ | 22 | 36 | 45 | 0.8 | 0.37 | 0.52
Discordant SRS, High IQ | 167 | 351 | 298 | 1.18 | 0.041 | 0.18
Concordant SRS, High IQ | 93 | 186 | 174 | 1.07 | 0.56 | 0.55

By CNV frequency
Private CNVs only | 411 | 271 | 245 | 1.11 | 0.27 | 0.1
All Rare CNVs | 411 | 453 | 394 | 1.15 | 0.046 | 0.0044

By expression profile
Genes with brain expression (average) | 411 | 19/317 | 6/224 | 2.24 | NT | NT
... and in discordant SRS quads only | 276 | 15/256 | 2/170 | #VALUE! | NT | NT
Genes with brain expression (any region) | 411 | 73 | 43 | 1.7 | NT | NT

By previous pathogenic gene association
Genes with previous association | 411 | 83 | 59 | 1.41 | NT | 0.049
... and in discordant SRS quads only | 276 | 66 | 35 | 1.89 | NT | 0.006
... and in concordant SRS quads only | 115 | 17 | 24 | 0.71 | NT | 0.069

By sex, family size and birth order
CNVs | # of Quads | Probands Count | Siblings Count | Enrichment | Binomial | Paired t-test
Male Pro | 335 | 358 | 313 | 1.14 | 0.089 | 0.012
Female Pro | 76 | 95 | 81 | 1.17 | 0.33 | 0.19
Male Sib | 191 | 204 | 186 | 1.1 | 0.39 | 0.2
Female Sib | 220 | 249 | 208 | 1.2 | 0.061 | 0.007
Sex Concord. | 211 | 217 | 199 | 1.09 | 0.4 | 0.22
Sex Discord. | 200 | 236 | 195 | 1.21 | 0.054 | 0.0049
Both Male | 163 | 163 | 152 | 1.07 | 0.57 | 0.37
Both Female | 48 | 54 | 47 | 1.15 | 0.55 | 0.39
Female Pro, Male Sib | 28 | 41 | 34 | 1.21 | 0.49 | 0.32
Male Pro, Female Sib | 172 | 195 | 161 | 1.21 | 0.08 | 0.0083
Proband Older | 224 | 232 | 209 | 1.11 | 0.29 | 0.12
Sibling Older | 171 | 203 | 166 | 1.22 | 0.061 | 0.0068
1 sibling | 247 | 264 | 242 | 1.09 | 0.35 | 0.15
2 siblings | 117 | 131 | 110 | 1.19 | 0.2 | 0.092
3+ siblings | 47 | 58 | 42 | 1.38 | 0.13 | 0.017
Comparison: 0.53 (chi-sq), 0.93, 0.53, 0.49, 0.82, 1, 0.52
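
The Enrichment and Binomial columns of the burden table above can be illustrated with a minimal sketch. It assumes that Enrichment is the proband count divided by the sibling count, and that the Binomial column is a two-sided exact binomial test of the proband count against an equal split between probands and siblings; the second assumption is not stated in the table itself. The function names `enrichment` and `binomial_two_sided` are hypothetical and used only for illustration.

```python
from math import comb


def enrichment(proband_count, sibling_count):
    """Ratio of proband to sibling counts (assumed definition of 'Enrichment')."""
    return proband_count / sibling_count


def binomial_two_sided(proband_count, sibling_count):
    """Two-sided exact binomial p-value under an assumed 50/50 null split.

    With a symmetric null (p = 0.5), the two-sided p-value is twice the
    upper tail probability of the larger count, capped at 1.
    """
    n = proband_count + sibling_count
    k = max(proband_count, sibling_count)
    # Exact integer sum of binomial coefficients for the upper tail.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1))
    return min(1.0, 2 * upper_tail / 2 ** n)


# Overall rare CNVs row: 458 events in probands, 397 in siblings.
ratio = enrichment(458, 397)
p_value = binomial_two_sided(458, 397)
print(round(ratio, 2), p_value)
```

For the overall rare CNV row, the ratio rounds to the tabulated 1.15; the p-value can be compared against the tabulated Binomial entry to check whether the assumed null matches the one used in the analysis.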
Tissue Name | Ratio | Brain? | Proband Fraction | Proband Count | Proband genes (top 5% in each tissue) | Sibling Fraction | Sibling Count | Sibling genes (top 5% in each tissue)
temporal lobe | 8.6328125 | TRUE | 0.05078125 | 13 | DOC2A, PGBD5, MAPK3, XKR3, CRLF1, UCHL1, DNAJC6, GAN, NIPA1, C16orf45, NDRG2, SEZ6L2, CNDP1 | 0.00588235 | 1 | TMEFF2
pons | 4.98046875 | TRUE | 0.05859375 | 15 | FAM57B, KCNE2, PGBD5, KIF1A, GSTT1, CA2, DNAJC6, SLFN12, UCHL1, TRIM58, C16orf45, NDRG2, AGT, SEZ6L2, CNDP1 | 0.01176471 | 2 | CORIN, SLC22A1
cerebellum | 4.6484375 | TRUE | 0.0546875 | 14 | AQP4, PGBD5, MTSS1L, UCHL1, DNAJC6, OR2W3, NFAM1, CPLX1, IQSEC1, C16orf53, NDRG2, AGT, SEZ6L2, CNDP1 | 0.01176471 | 2 | MAPKBP1, PDZK1
parietal lobe | 4.6484375 | TRUE | 0.0546875 | 14 | RNASE1, AQP4, PGBD5, MTSS1L, CA2, DNAJC6, UCHL1, ZNF517, IQSEC1, C16orf45, NDRG2, AGT, SEZ6L2, CNDP1 | 0.01176471 | 2 | TMEFF2, PPM1J
amygdala | 4.31640625 | TRUE | 0.05078125 | 13 | DOC2A, KCTD13, AQP4, PGBD5, CA2, DNAJC6, UCHL1, SGIP1, CPLX1, C16orf45, AGT, SEZ6L2, CNDP1 | 0.01176471 | 2 | TMEFF2, MAPKBP1
hypothalamus | 3.76302083 | TRUE | 0.06640625 | 17 | KCTD13, AQP4, PGBD5, GSTT2, GSTT1, RNASE6, CHODL, DNAJC6, ORC3, UCHL1, C3, CA2, C16orf45, NDRG2, AGT, SEZ6L2, CNDP1 | 0.01764706 | 3 | TMEFF2, MAPKBP1, TANC1
subthalamic nucleus | 3.76302083 | TRUE | 0.06640625 | 17 | AQP4, PGBD5, KIF1A, CRLF1, MTSS1L, UCHL1, DNAJC6, GAN, FGFR3, ATP8B3, C16orf45, CPLX1, AKR1C4, PDE4B, ZNF17, SEZ6L2, CNDP1 | 0.01764706 | 3 | OR4F15, TMEFF2, ZNF613
BM-CD34+ | 3.48632813 | FALSE | 0.08203125 | 21 | XPO1, TACC3, RNASE3, RNASE2, SNRPD3, CHD1L, RPL8, GLIPR1, PPP4C, RNASE6, VPREB3, CLC, RAD51AP1, CORO1A, TRIM58, IGLL1, MYC, ZNF84, ATF1, PRC1, UCHL3 | 0.02352941 | 4 | SLC20A1, KARS, MAVS, RUVBL2
medulla oblongata | 3.3203125 | TRUE | 0.078125 | 20 | KCTD13, ADORA2A, AQP4, PGBD5, GSTT2, CRLF1, CA2, DNAJC6, SLFN12, HCRTR1, CPLX1, RNASE1, UCHL1, C22orf43, C16orf45, ANXA10, NDRG2, SEZ6L2, CNDP1, KIF1A | 0.02352941 | 4 | TMEFF2, SLC22A1, KRT19, ZNF608
caudate nucleus | 3.15429688 | TRUE | 0.07421875 | 19 | KCTD13, ADORA2A, AQP4, PGBD5, CA2, DNAJC6, DDHD2, HMX3, UCHL1, CPLX1, PTCHD3, CEP72, RNASE1, ALDH5A1, NDRG2, AGT, SEZ6L2, CNDP1, BCR | 0.02352941 | 4 | TMEFF2, ZNF608, C20orf27, STARD7
prefrontal cortex | 3.09895833 | TRUE | 0.0546875 | 14 | KCTD13, AQP4, PGBD5, GSTT2, GSTT1, CRLF1, UCHL1, DNAJC6, IQSEC1, C16orf45, NDRG2, AGT, SEZ6L2, CNDP1 | 0.01764706 | 3 | MYOM2, TMEFF2, MAPKBP1
occipital lobe | 2.87760417 | TRUE | 0.05078125 | 13 | AQP4, PGBD5, CA2, DNAJC6, UCHL1, FGFR3, IQSEC1, SGIP1, C16orf45, NDRG2, AGT, SEZ6L2, CNDP1 | 0.01764706 | 3 | TMEFF2, MAPKBP1, ZNF608
PB-CD19+ Bcells | 2.87760417 | FALSE | 0.05078125 | 13 | TACC3, SNRPD3, RPL8, GLIPR1, PPP4C, RNASE6, C22orf13, CORO1A, RSL24D1, C1orf131, MYC, ATF1, VPREB3 | 0.01764706 | 3 | TXNIP, ANKRD39, ARID5A
adrenal gland | 2.82226563 | FALSE | 0.06640625 | 17 | RNASE1, METTL9, ZNF219, PGBD5, SIK1, GSTT2, GSTT1, QPRT, NQO2, MOCOS, ALDH2, TXN2, OR2W3, C3, ZNF70, VPREB3, ARHGEF40 | 0.02352941 | 4 | RAB20, ERCC2, HGSNAT, STARD7
whole brain | 2.5234375 | TRUE | 0.07421875 | 19 | DOC2A, KCTD13, AQP4, ASPHD1, SEZ6L2, GSTT2, CRLF1, CA2, DNAJC6, UCHL1, PGBD5, IQSEC1, CORO1A, SGIP1, C16orf45, NDRG2, AGT, COX6A1, CNDP1 | 0.02941176 | 5 | MYOM2, TMEFF2, MAPKBP1, RUVBL2, DUSP2
adipocyte | 2.49023438 | FALSE | 0.05859375 | 15 | HSD17B12, FAM89A, GSTT1, CRLF1, THBS2, PPARG, GLIPR1, NQO2, ETFDH, DDT, C3, GGT5, ALDOA, MYC, UCHL1 | 0.02352941 | 4 | TNFAIP8L3, CLIC3, PTGIS, LIX1L
lymph node | 2.21354167 | FALSE | 0.0390625 | 10 | ZNF140, TACC3, GLIPR1, RNASE6, CORO1A, IL32, GGT5, C3, MEGF6, VPREB3 | 0.01764706 | 3 | SLC20A1, ARID5A, DUSP2
fetal brain | 2.21354167 | TRUE | 0.0390625 | 10 | KCTD13, PGBD5, UCHL1, DNAJC6, ORC3, SGIP1, CORO1A, TP53BP1, C16orf45, SEZ6L2 | 0.01764706 | 3 | PNPLA3, TMEFF2, MAPKBP1
cerebellum peduncles | 1.9921875 | TRUE | 0.046875 | 12 | AQP4, PGBD5, KCNG2, UCHL1, DNAJC6, FGFR3, C16orf45, C16orf53, NDRG2, AGT, SEZ6L2, CNDP1 | 0.02352941 | 4 | MAPKBP1, TANC1, CNNM4, PDZK1
pancreatic islets | 1.9921875 | FALSE | 0.05859375 | 15 | RAB27A, GSTT2, GSTT1, UCHL1, KCNG2, RPL8, CHST9, HCRTR1, IL32, ACADSB, C3, ANXA10, MAPKAPK5, SHPK, SEZ6L2 | 0.02941176 | 5 | CBLC, KARS, KRT15, KRT19, CCNC
cingulate cortex | 1.9921875 | TRUE | 0.046875 | 12 | KCTD13, PGBD5, KIF1A, CA2, DNAJC6, UCHL1, SGIP1, IQSEC1, C16orf45, AGT, SEZ6L2, CNDP1 | 0.02352941 | 4 | MYOM2, ANKRD36, TMEFF2, SLC22A1
leukemia lymphoblastic(molt4) | 1.89732143 | FALSE | 0.078125 | 20 | QPRT, MYC, SNRPD3, CHD1L, BUB3, PPP4C, MAZ, CCT4, RAD51AP1, CORO1A, IL32, HNRNPA2B1, CEP72, RSL24D1, IGLL1, TACC3, MAPKAPK5, METTL17, PRC1, HIRIP3 | 0.04117647 | 7 | KARS, CD1E, RUVBL2, SMYD3, NCAPH, ADSL, NINL
testis seminiferous tubule | 1.859375 | FALSE | 0.0546875 | 14 | KCTD13, RNASE1, XKR3, UPB1, CHODL, TACC3, ORC3, ARL6, SLC26A8, CAPS2, IGLL1, PRC1, C9orf93, TRIP12 | 0.02941176 | 5 | TEX101, RUVBL2, PPM1J, GEMIN4, NCAPH
bone marrow | 1.80245536 | FALSE | 0.07421875 | 19 | ADORA2A, RNASE3, RNASE2, METTL9, CA2, PPP4C, MYH9, RNASE6, CLC, OR2W3, C22orf13, CORO1A, TACC3, TRIM58, RAB27A, IGLL1, MYC, PRC1, VPREB3 | 0.04117647 | 7 | OR4F15, LAT2, HK3, FPR2, ANKRD35, NCAPH, TXNIP
lymphoma Burkitts Raji | 1.77083333 | FALSE | 0.0625 | 16 | ADORA2A, XKR3, RPL8, QPRT, TACC3, DDT, TNIP2, CORO1A, IL32, RSL24D1, AKR1C4, ALDH5A1, MYC, UCHL1, PRC1, VPREB3 | 0.03529412 | 6 | LAT2, RUVBL2, ADRA2B, CCNC, PLAG1, DUSP2
liver | 1.77083333 | FALSE | 0.0625 | 16 | QPRT, SHPK, UPB1, CA2, PPP4C, MOCOS, ALDH2, DDT, C3, IL32, PQLC1, AKR1C4, ANXA10, AGT, CNDP1, NQO2 | 0.03529412 | 6 | RUVBL2, HFE2, PNPLA3, CES2, SLC22A1, ABCC6
spinal cord | 1.74316406 | TRUE | 0.08203125 | 21 | HSD17B12, RNASE1, AQP4, ASPHD1, RNASE6, MTSS1L, CA2, DNAJC6, DDHD2, ORC3, UCHL1, PGBD5, C3, C1orf198, AKR1C4, ANXA10, NDRG2, AGT, SEZ6L2, CNDP1, PACS2 | 0.04705882 | 8 | ARHGEF10, TNFAIP8L3, FNTA, GABARAPL2, TMEFF2, TANC1, SCCPDH, SEMA4C
lymphoma Burkitts Daudi | 1.74316406 | FALSE | 0.08203125 | 21 | XPO1, TACC3, PGBD5, SNRPD3, RPL8, BUB3, RAD51AP1, KCNG2, RBFA, CCT4, DDT, IGLL1, CORO1A, HNRNPA2B1, DGUOK, AKR1C4, ALDH5A1, MYC, RSL24D1, PRC1, VPREB3 | 0.04705882 | 8 | KARS, SLC20A1, RUVBL2, CCNC, ANKRD36, NCAPH, ADSL, DUSP2
ciliary ganglion | 1.7265625 | FALSE | 0.05078125 | 13 | LCN6, RNASE8, PRPH, ZNF300, CHST9, OR4D10, PTCHD3, CAPS2, ANXA10, C1orf131, ADAMTS18, ZNF70, UCHL1 | 0.02941176 | 5 | ALG10, PLAG1, LIX1L, USP50, OR10A6
globus pallidus | 1.7265625 | TRUE | 0.05078125 | 13 | AQP4, LCN6, MTSS1L, UCHL1, DNAJC6, LCN8, OR4E2, PGBD5, C16orf45, NDRG2, ZNF17, SEZ6L2, CNDP1 | 0.02941176 | 5 | C3orf33, PNPLA3, SLC22A1, ST6GALNAC1, USP50
leukemia promyelocytic(hl60) | 1.66015625 | FALSE | 0.078125 | 20 | RPL8, ADORA2A, TACC3, SNRPD3, EIF3M, BUB3, RBFA, ORC3, CCT4, RAD51AP1, CORO1A, SHPK, HNRNPA2B1, RSL24D1, ALDH5A1, MYC, UCHL1, PRC1, UCHL3, VPREB3 | 0.04705882 | 8 | KARS, RUVBL2, TANC1, EIF2A, SAMM50, NCAPH, ADSL, DUSP2
testis germ cell | 1.66015625 | FALSE | 0.05859375 | 15 | KCTD13, RNASE1, LCN6, XKR3, ACTG2, CHODL, TACC3, ORC3, CHST9, UPB1, PGBD5, SLC26A8, PRC1, EDDM3B, TRIP12 | 0.03529412 | 6 | RUVBL2, MAPKBP1, GEMIN4, TEX101, NCAPH, ABCC1
721 B lymphoblasts | 1.61272321 | FALSE | 0.06640625 | 17 | QPRT, SNRPD3, BUB3, RNASE6, UCHL1, PPP4C, TACC3, DDT, RAD51AP1, CORO1A, IL32, RSL24D1, MYC, MX1, ATF1, PRC1, HIRIP3 | 0.04117647 | 7 | KARS, ARID5A, RUVBL2, C15orf41, NCAPH, ANKRD39, DUSP2
BM-CD105+ endothelial | 1.59375 | FALSE | 0.09375 | 24 | RPL8, CRLF1, TRIM58, PRC1, VPREB3, XPO1, QPRT, CHD1L, PPP4C, RAD51AP1, C1orf131, MYC, SHPK, ARV1, UCHL3, SNRPD3, GSTT1, CA2, DNAJC6, TACC3, ATP8B3, CEP72, IGLL1, ATF1 | 0.05882353 | 10 | KARS, SLC20A1, C20orf27, RUVBL2, SAMM50, NCAPH, TTC27, PLAG1, C15orf41, DUSP2
thalamus | 1.4609375 | TRUE | 0.04296875 | 11 | RNASE1, PGBD5, GSTT1, CA2, DNAJC6, UCHL1, C3, C16orf45, AGT, SEZ6L2, CNDP1 | 0.02941176 | 5 | TMEFF2, TANC1, PPM1J, ST6GALNAC1, ZNF608
dorsal root ganglion | 1.43880208 | FALSE | 0.05078125 | 13 | FAM57B, MTNR1A, OR2G3, DNAJC6, OR4E2, PRPH, CLC, ATP8B3, DGKI, TBX6, ARL6, UCHL1, NIPA1 | 0.03529412 | 6 | ZNF613, OR4F6, TMEFF2, ZNF649, SPINK5, STARD7
kidney | 1.328125 | FALSE | 0.0625 | 16 | QPRT, GSTT1, RTDR1, CA2, SLC13A3, NQO2, ALDH2, DDT, NFAM1, CGNL1, CEP72, GGT5, UPB1, IL32, CDH3, AGT | 0.04705882 | 8 | PNPLA3, TEK, ATP6V0A4, CES2, ST6GALNAC1, ABCC6, KRT19, PDZK1
cardiac myocytes | 1.328125 | FALSE | 0.0703125 | 18 | TACC3, RNASE3, MYH9, GSTT2, CRLF1, PPP4C, GLIPR1, THBS2, VPREB3, TUBGCP5, BMPER, AKR1C4, ANXA10, MYC, F11, UCHL1, PRC1, GALNT2 | 0.05294118 | 9 | TMEM127, SLC20A1, CLIC3, PTGIS, KRT15, TEK, NCAPH, RHOC, C15orf41
testis interstitial | 1.328125 | FALSE | 0.0390625 | 10 | KCTD13, XKR3, CHODL, TACC3, ORC3, SLC26A8, IGLL1, PRC1, C9orf93, TRIP12 | 0.02941176 | 5 | TEX101, RUVBL2, PPM1J, GEMIN4, NCAPH
olfactory bulb | 1.328125 | FALSE | 0.0546875 | 14 | QPRT, SLC13A3, SIK1, RNASE6, CA2, EHBP1, UCHL1, OR2W3, C1orf198, KANK1, C3, AGT, ATF1, CNDP1 | 0.04117647 | 7 | ARHGEF10, ARID5A, TANC1, SAG, PIAS3, SEMA4C, STARD7
whole blood | 1.23325893 | FALSE | 0.05078125 | 13 | RAB27A, RNASE2, YPEL3, GLIPR1, PPP4C, MYH9, RNASE6, CLC, TACC3, CORO1A, IL32, RAF1, ARHGEF40 | 0.04117647 | 7 | TMEM127, ARID5A, LAT2, HK3, FPR2, PPP1R12A, TXNIP
BM-CD71+ early erythroid | 1.1953125 | FALSE | 0.0703125 | 18 | TRMT1L, TACC3, MAZ, SNRPD3, CA3, CA2, PPP4C, KIF22, DDT, RAD51AP1, C22orf13, RFPL2, TRIM58, OR2W3, PRC1, ADCK1, ATF1, PQLC1 | 0.05882353 | 10 | RIOK3, SLC20A1, C20orf27, GABARAPL2, RUVBL2, PRR5, ANKRD35, TERF2IP, NCAPH, ANKRD39
adrenal cortex | 1.1953125 | FALSE | 0.03515625 | 9 | QPRT, SIK1, RNASE6, NQO2, HCRTR1, ACADSB, GGT5, C3, ARHGEF40 | 0.02941176 | 5 | PNPLA3, RAB20, KRT15, SLC22A1, USP50
BM-CD33+ myeloid | 1.16210938 | FALSE | 0.0546875 | 14 | METTL9, RNASE3, RNASE2, SIK1, NLRP3, GLIPR1, PPP4C, RNASE6, ATP8B3, CORO1A, TACC3, RAB27A, ATF1, MON1B | 0.04705882 | 8 | TMEM127, C20orf27, LAT2, HK3, FPR2, CARS2, HGSNAT, DUSP2
trigeminal ganglion | 1.16210938 | FALSE | 0.0546875 | 14 | FAM57B, RNASE3, OR2G3, NIPA1, PRPH, ARL6, ATP8B3, RAD51AP1, ZFP37, ACADSB, ANXA10, EDDM3B, UCHL1, VPREB3 | 0.04705882 | 8 | OR4F15, CORIN, OR10A6, USP50, C3orf33, SLC22A1, NCAPH, ALG10
appendix | 1.13839286 | FALSE | 0.046875 | 12 | TACC3, ACTG2, CRLF1, CA3, RNASE6, SLFN12, CHST9, SLC39A9, RFPL2, C3, ANXA10, SLC39A2 | 0.04117647 | 7 | CORIN, SDPR, SNRNP200, FPR2, ATP6V0A4, CENPQ, MCTP2
testis Leydig cell | 1.10677083 | FALSE | 0.05859375 | 15 | KCTD13, LCN6, ACTG2, CHODL, TACC3, LCN8, ORC3, SLC26A8, UPB1, HCRTR1, PRC1, EDDM3B, C9orf93, TRIP12, HIRIP3 | 0.05294118 | 9 | OR4F15, RUVBL2, PPM1J, GEMIN4, SCCPDH, CORIN, NCAPH, TEX101, KRT19
leukemia chronic myelogenous(k562) | 1.10677083 | FALSE | 0.05859375 | 15 | QPRT, EIF3M, MAPKAPK5, RAB27A, TACC3, NQO2, ORC3, CCT4, RAD51AP1, XKR3, CEP72, RSL24D1, MYC, ARV1, PRC1 | 0.05294118 | 9 | KARS, SLC20A1, RUVBL2, TFB2M, SMYD3, ST6GALNAC1, NCAPH, ADSL, KRT19
smooth muscle | 1.10677083 | FALSE | 0.05859375 | 15 | SNRPD3, RPL8, GSTT1, GLIPR1, MYH9, THBS2, C3, IL32, DGUOK, ALDOA, ANXA10, MYC, UCHL1, PRC1, GALNT2 | 0.05294118 | 9 | SLC20A1, LTBP1, RUVBL2, ERCC2, PTGIS, GYS1, ANKRD23, RHOC, IL1A
PB-CD4+ Tcells | 1.04352679 | FALSE | 0.04296875 | 11 | RAB27A, TACC3, RPL8, BUB3, USP34, PPP4C, MYH9, CORO1A, IL32, MYC, ATF1 | 0.04117647 | 7 | SLC20A1, CLIC3, ARID5A, C9orf142, TXNIP, PLAG1, DUSP2
pituitary gland | 0.99609375 | FALSE | 0.03515625 | 9 | RAB27A, PGBD5, UCHL1, DNAJC6, ORC3, TP53BP1, C16orf53, ZNF10, SEZ6L2 | 0.03529412 | 6 | CBLC, USP8, CLIC3, LMAN2L, ERCC2, PLAG1
trachea | 0.99609375 | FALSE | 0.046875 | 12 | RNASE1, HMX3, SIK1, ACTG2, CA2, ZNF70, CHST9, C3, NFAM1, ATP2C2, CDH3, MYC | 0.04705882 | 8 | TANC1, TXNIP, ATP12A, KRT19, KRT15, HGSNAT, ALG10, ST6GALNAC1
tonsil | 0.99609375 | FALSE | 0.046875 | 12 | LCN6, RNASE6, TACC3, MX1, ZNF517, CORO1A, IL32, VPREB3, C3, CDH3, MYC, PRC1 | 0.04705882 | 8 | CBLC, CLIC3, LAT2, MAPKBP1, KRT15, NCAPH, SPINK5, KRT19
placenta | 0.99609375 | FALSE | 0.046875 | 12 | RNASE1, QPRT, MYH9, MMP11, ACTG2, LGALS14, PPP4C, CRLF1, PPARG, TRIM58, FAM89A, BCR | 0.04705882 | 8 | SLC20A1, CLIC3, LTBP1, CYP19A1, TEK, SEMA4C, BCAR1, KRT19
lung | 0.94075521 | FALSE | 0.06640625 | 17 | RNASE1, QPRT, CYFIP1, GSTT1, ACTG2, CRLF1, RPL8, PPP4C, MYH9, ALDH2, DDT, PPARG, CORO1A, IL32, RNASE6, C3, SIK1 | 0.07058824 | 12 | TMEM127, CLIC3, ARID5A, RUVBL2, HK3, TXNIP, ST6GALNAC1, TRPM4, NXN, BCAR1, KRT19, DUSP2
PB-BDCA4+ dentritic cells | 0.91308594 | FALSE | 0.04296875 | 11 | TACC3, SIK1, CHD1L, NLRP3, GLIPR1, PPP4C, RNASE6, MX1, CORO1A, ATF1, UCHL3 | 0.04705882 | 8 | TMEM127, SLC20A1, CLIC3, ARID5A, HK3, PPM1J, C9orf142, DUSP2
ovary | 0.91308594 | FALSE | 0.04296875 | 11 | XKR3, ACTG2, PPARG, OR2G3, OR2G2, KIAA1958, C3, OR4D10, CPLX1, AKR1C4, ZNF70 | 0.04705882 | 8 | OR10A3, OR4F6, ZNF615, PTGIS, SYT10, XRN1, ALG10, ASTL
fetal liver | 0.91308594 | FALSE | 0.04296875 | 11 | QPRT, TACC3, CLC, RAD51AP1, TRIM58, ACADSB, C3, ANXA10, PRC1, AGT, VPREB3 | 0.04705882 | 8 | MUT, CYP19A1, HFE2, PNPLA3, NCAPH, ABCC6, C15orf41, PDZK1
atrioventricular node | 0.88541667 | FALSE | 0.03125 | 8 | SPN, XKR3, OR4E2, SLFN12, OR4D10, DGKI, PTCHD3, CAPS2 | 0.03529412 | 6 | CLIC3, BAIAP2L2, STARD7, PNPLA3, TRPM7, C15orf41
heart | 0.88541667 | FALSE | 0.046875 | 12 | RNASE1, QPRT, GSTT1, ACTG2, RPL8, DDT, TXN2, CORO1A, IL32, CEP72, ALDOA, AGT | 0.05294118 | 9 | MYOM2, TMEM127, CNNM4, OR10A6, RUVBL2, FAM96B, GYS1, RHOC, PPP1R13L
salivary gland | 0.88541667 | FALSE | 0.046875 | 12 | SIK1, RNASE6, CA2, ZNF84, ATP8B3, HCRTR1, OR4D10, GGT5, C3, CDH3, EDDM3B, VPREB3 | 0.05294118 | 9 | CORIN, BAIAP2L2, PPM1J, ATP6V0A4, TRPM4, KRT15, XRN1, ALG10, KRT19
testis | 0.81163194 | FALSE | 0.04296875 | 11 | KCTD13, RNASE1, CAPS2, RTDR1, CHODL, PPP4C, TACC3, CEP72, SLC26A8, PRC1, HIRIP3 | 0.05294118 | 9 | CNNM4, RUVBL2, ERCC2, MAPKBP1, PPM1J, GEMIN4, TEX101, KRT15, NCAPH
PB-CD8+ Tcells | 0.81163194 | FALSE | 0.04296875 | 11 | RAB27A, BUB3, USP34, PPP4C, TACC3, ZNF84, CORO1A, IL32, MAPKAPK5, ATF1, MYC | 0.05294118 | 9 | SLC20A1, CLIC3, ARID5A, CD160, ANKRD36, C9orf142, TXNIP, PLAG1, DUSP2
bronchial epithelial cells | 0.77473958 | FALSE | 0.0546875 | 14 | SNRPD3, CYFIP1, RPL8, PPP4C, MYH9, CCT4, TMEM40, PPARG, UCHL3, RSL24D1, SIK1, CDH3, MYC, PRC1 | 0.07058824 | 12 | CBLC, KARS, SLC20A1, CLIC3, RUVBL2, TANC1, KRT15, RHOC, IL1A, NXN, PLAG1, KRT19
skin | 0.77473958 | FALSE | 0.0546875 | 14 | SIK1, SPN, XKR3, OR2G3, THBS2, ARL6, GAN, OR4D10, SLC5A4, CAPS2, ADAMTS18, ZNF70, FGFR3, ZNF396 | 0.07058824 | 12 | C3orf33, FAM57A, PNPLA3, USP50, ATP6V0A4, VPS53, KRT15, ZNF649, CORIN, SPINK5, PPP1R13L, KRT19
superior cervical ganglion | 0.77473958 | FALSE | 0.02734375 | 7 | SLFN12, RAD51AP1, PTCHD3, TRIM58, ANXA10, EDDM3B, UCHL1 | 0.03529412 | 6 | ZNF613, OR4F6, BAIAP2L2, FPR2, PTGIS, SLC22A1
thymus | 0.76622596 | FALSE | 0.05859375 | 15 | ADORA2A, TACC3, SNRPD3, RNASE6, PPP4C, MYH9, MX1, IGLL1, CORO1A, IL32, C3, CDH3, MYC, PRC1, HIRIP3 | 0.07647059 | 13 | CD1E, CLIC3, ARID5A, RUVBL2, KRT15, ANKRD36, ANKRD23, NCAPH, TXNIP, LMAN2L, NINL, KRT19, DUSP2
prostate | 0.76622596 | FALSE | 0.05859375 | 15 | RAB27A, CYFIP1, GSTT1, ACTG2, CA3, SEZ6L2, PPP4C, RPL8, CLC, ATP2C2, SIK1, CDH3, DDT, MEGF6, GSTT2 | 0.07647059 | 13 | CBLC, CD1E, CLIC3, RUVBL2, TMEFF2, TANC1, CARS2, KRT15, ZNF432, TRPM4, NXN, KRT19, DUSP2
pancreas | 0.74707031 | FALSE | 0.03515625 | 9 | RNASE1, HMX3, RPL8, PPARG, GAN, OR4D10, IL32, C3, MYC | 0.04705882 | 8 | CBLC, OR10A3, OR10A6, DZIP1L, ZNF613, KRT15, STARD7, KRT19
fetal lung | 0.71940104 | FALSE | 0.05078125 | 13 | RNASE1, SIK1, ACTG2, CRLF1, RNASE6, MYH9, IL32, C3, SNRPD3, CDH3, AGT, MEGF6, PRC1 | 0.07058824 | 12 | OR4F15, SDPR, CLIC3, RUVBL2, PLAG1, HFE2, TANC1, KRT15, TXNIP, LMAN2L, KRT19, PDZK1
uterus corpus | 0.6640625 | FALSE | 0.0390625 | 10 | CHST9, KCNE1, ACTG2, CA3, TEC, THBS2, ATP8B3, GGT5, C16orf45, ZNF268 | 0.05882353 | 10 | SNRNP200, TNFAIP8L3, SAG, PTGIS, ATP12A, ZNF557, SLC22A1, VPS53, PLAG1, KRT19
PB-CD14+ monocytes | 0.6640625 | FALSE | 0.03125 | 8 | RAB27A, RNASE2, NLRP3, GLIPR1, PPP4C, RNASE6, CORO1A, TACC3 | 0.04705882 | 8 | TMEM127, LAT2, EMR1, FPR2, CARS2, HK3, TXNIP, DUSP2
fetal thyroid | 0.6640625 | FALSE | 0.0390625 | 10 | GSTT1, CRLF1, CA3, PPARG, ATP8B3, C3, ZNF34, GGT5, AKR1C4, ARHGEF40 | 0.05882353 | 10 | RAB20, CLIC3, LMAN2L, TANC1, PNPLA3, HGSNAT, TXNIP, IL1A, PLAG1, KRT19
colorectal adenocarcinoma | 0.60369318 | FALSE | 0.0390625 | 10 | METTL9, BUB3, GSTT1, TACC3, RAD51AP1, CEP72, CDH3, ATF1, MYC, PRC1 | 0.06470588 | 11 | RAB20, SLC20A1, RUVBL2, ARHGDIA, TANC1, GEMIN4, KRT15, NCAPH, NXN, KRT19, DUSP2
tongue | 0.59765625 | FALSE | 0.03515625 | 9 | CRLF1, CA3, ZNF10, GAN, TRIM58, CAPS2, SLC39A2, MEGF6, FGFR3 | 0.05882353 | 10 | MYOM2, CLIC3, HFE2, GYS1, KRT15, CES2, IL1A, SPINK5, PPP1R13L, KRT19
skeletal muscle | 0.59027778 | FALSE | 0.03125 | 8 | IGLL1, XKR3, CA3, SLC2A11, HCRTR1, IL32, ALDOA, ZNF268 | 0.05294118 | 9 | MYOM2, HFE2, CD160, CYP19A1, GYS1, KRT15, ANKRD23, SLC22A1, NXN
thyroid | 0.58105469 | FALSE | 0.02734375 | 7 | GSTT1, CA3, PPARG, TRIM58, C16orf53, MTNR1A, ATF1 | 0.04705882 | 8 | MYOM2, CNNM4, RUVBL2, MAPKBP1, TANC1, CCNC, KRT19, CLIC3
uterus | 0.48697917 | FALSE | 0.04296875 | 11 | RNASE3, CYFIP1, GSTT2, ACTG2, MEGF6, THBS2, OR2G2, KANK1, AKR1C4, MTNR1A, ATF1 | 0.08823529 | 15 | SLC37A3, SEMA4C, TNFAIP8L3, FAM19A3, ARID5A, RUVBL2, TANC1, LIX1L, PTGIS, CORIN, PIAS3, PPP1R12A, ZNF649, NXN, KRT19
PB-CD56+ NKCells | 0.44270833 | FALSE | 0.03125 | 8 | RAB27A, GLIPR1, PPP4C, TACC3, CORO1A, IL32, PPP2R5E, ATF1 | 0.07058824 | 12 | MYOM2, TMEM127, SLC20A1, CLIC3, ARID5A, CD160, POLR3GL, PPP1R12A, HK3, TXNIP, ANKRD39, DUSP2
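
The Ratio column of the tissue table above appears to be the proband fraction divided by the sibling fraction, where each fraction is a gene count over a fixed gene total. The sketch below reproduces that arithmetic. The totals of 256 proband genes and 170 sibling genes are not stated in the table; they are inferred here from the tabulated fractions (for example 13 / 0.05078125 = 256) and should be treated as assumptions, as should the helper name `tissue_ratio`.

```python
# Gene totals inferred from the tabulated fractions (assumption, not stated
# in the table): proband fraction = count / 256, sibling fraction = count / 170.
PROBAND_GENE_TOTAL = 256
SIBLING_GENE_TOTAL = 170


def tissue_ratio(proband_count, sibling_count):
    """Proband fraction divided by sibling fraction for one tissue."""
    proband_fraction = proband_count / PROBAND_GENE_TOTAL
    sibling_fraction = sibling_count / SIBLING_GENE_TOTAL
    return proband_fraction / sibling_fraction


# Temporal lobe row: 13 proband genes and 1 sibling gene in the top 5%.
print(tissue_ratio(13, 1))
# Pons row: 15 proband genes and 2 sibling genes.
print(tissue_ratio(15, 2))
```

Under these assumptions the two calls agree with the tabulated ratios for temporal lobe and pons (8.6328125 and 4.98046875).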