Post doc Yunpeng Wang
Unraveling the missing heritability of psychiatric disorders with advanced statistical methods
Yunpeng Wang, PhD in Biostatistics
Psychiatric disorders such as schizophrenia (SCZ) are highly heritable, with estimated heritability of 50% to 80%; that is, a large proportion of the observed phenotypic variation is due to inherited genetic factors. Such evidence has encouraged scientists to devote their efforts to identifying the underlying causal genetic variants. After more than a decade of research with state-of-the-art methodologies, the general picture is that the genetic variants identified so far explain only a fraction of the heritability estimated by phenotype-centric (pedigree-based) methods. Further, psychiatric disorders, and complex diseases in general, are the consequence of a large number of variants, each contributing a small risk to the disease, along with non-genetic factors. Uncovering the missing architecture of complex diseases therefore requires very large samples (typically on the order of hundreds of thousands). On the other hand, improved statistical methods can also increase the yield of such efforts in a cost-effective way.
Currently I am developing new statistical methods with improved power compared to the classical genome-wide association study (GWAS) methodology. Our first strategy is to incorporate auxiliary prior information about genetic variants (mainly single nucleotide polymorphisms, SNPs) into a model under the empirical Bayes framework. By exploiting pleiotropy, the phenomenon in which common causal variants tagged by SNPs affect more than one disease or trait, we successfully identified a large number of genetic variants affecting SCZ, bipolar disorder (BD), blood pressure, Alzheimer’s disease [5, 6], etc. Moreover, within the same framework we extended the priors to include genomic annotation categories of SNPs, such as 5’UTR, 3’UTR, exon, and intron. A relative enrichment score can be constructed for each SNP by leveraging such information. We observed a significant improvement in power when SNPs are treated differentially on the basis of their enrichment scores.
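To illustrate why annotation-informed priors can change which SNPs are called, here is a minimal sketch of a two-group empirical Bayes calculation. It is not the model from the cited work: the function names, the per-category prior proportions, and the effect-size standard deviation are all hypothetical values chosen for illustration. The sketch computes the local false discovery rate (the posterior probability that a SNP is null) for the same z-score under different category-specific priors.

```python
import math


def normal_pdf(x, sd):
    """Density of a zero-mean normal distribution with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi1, effect_sd=3.0):
    """Posterior probability that a SNP is null, given its z-score.

    pi1       : assumed prior proportion of non-null SNPs in the category
    effect_sd : assumed standard deviation of non-null effects (hypothetical)
    """
    null_part = (1.0 - pi1) * normal_pdf(z, 1.0)
    # A non-null z-score is the effect plus unit-variance noise,
    # so its marginal variance is 1 + effect_sd**2.
    nonnull_part = pi1 * normal_pdf(z, math.sqrt(1.0 + effect_sd ** 2))
    return null_part / (null_part + nonnull_part)


# Hypothetical prior non-null proportions per annotation category.
category_prior = {"5'UTR": 0.05, "exon": 0.03, "intron": 0.01, "intergenic": 0.002}

z = 4.0  # the same association z-score evaluated in every category
fdr_by_category = {name: local_fdr(z, pi1) for name, pi1 in category_prior.items()}

for name, fdr in fdr_by_category.items():
    print(f"{name:>10s}: local fdr = {fdr:.3f}")
```

Under these assumed priors, the same z-score yields a lower local false discovery rate in a more enriched category, which is the intuition behind treating SNPs differentially by enrichment.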
The second strategy we employ is to accurately model the distribution of SNP effect sizes [8, 9]. Traditional statistical genetics models assume that the effects of genetic variants on phenotypes follow either a single normal distribution (the so-called infinitesimal model) or a mixture of a true null (no effect at all) and a non-null component. However, we observed that neither of these captures the behavior of GWAS summary statistics. We developed a scale mixture of normals model, which allows a small-effect component to be mixed with a large-effect component. We demonstrated that modeling the effect sizes accurately improves estimates of the polygenicity of diseases and traits, identifies variants with small effects, and thus uncovers more hidden heritability.
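A scale mixture of normals for z-scores can be sketched as a weighted sum of zero-mean normal densities with different variances. The example below is only an illustration under stated assumptions: the three components (null, small effect, large effect), their weights, and their effect standard deviations are hypothetical, and in practice such parameters would be estimated from GWAS summary statistics. The code computes, for a given z-score, the posterior probability that it arose from each component.

```python
import math


def normal_pdf(x, sd):
    """Density of a zero-mean normal distribution with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical mixture: (label, weight, standard deviation of the true effect).
# An effect sd of 0 is the true null; z-scores add unit-variance noise.
COMPONENTS = [
    ("null", 0.90, 0.0),
    ("small effect", 0.08, 1.0),
    ("large effect", 0.02, 4.0),
]


def mixture_density(z):
    """Marginal density of a z-score under the scale mixture."""
    return sum(w * normal_pdf(z, math.sqrt(1.0 + sd ** 2)) for _, w, sd in COMPONENTS)


def component_posteriors(z):
    """Posterior probability of each component given the observed z-score."""
    total = mixture_density(z)
    return {
        label: w * normal_pdf(z, math.sqrt(1.0 + sd ** 2)) / total
        for label, w, sd in COMPONENTS
    }


for z in (0.5, 3.0, 6.0):
    post = component_posteriors(z)
    summary = ", ".join(f"{label}: {p:.3f}" for label, p in post.items())
    print(f"z = {z}: {summary}")
```

Setting the weight of the small-effect component to zero recovers a two-group (null plus non-null) model, and using a single component recovers the infinitesimal model, so both traditional models are special cases of this form.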
We then combined the two strategies in a covariate-modulated mixture model framework to further improve the yield. Genome-wide SNPs are modeled differentially by the mixture model on the basis of their enrichment scores. We also proposed using the predicted replication probability as a metric for claiming discoveries. The results show that the significance threshold used in the classical GWAS method incurs a large loss in power for SNPs located in the most enriched category and generates a large number of false positives for SNPs located in the least enriched category.
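The idea of letting a covariate modulate the mixture can be sketched by making the prior non-null proportion a function of each SNP's enrichment score. The example below is a simplified illustration, not the published model: it assumes a logistic link with made-up coefficients, a single non-null component with a hypothetical effect standard deviation, and it uses the local false discovery rate as a stand-in decision metric rather than the predicted replication probability. It finds, by bisection, the z-score needed to reach a fixed local false discovery rate at several enrichment scores.

```python
import math


def normal_pdf(x, sd):
    """Density of a zero-mean normal distribution with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def nonnull_prior(score, intercept=-6.0, slope=1.5):
    """Prior non-null proportion as a logistic function of the enrichment score.

    The intercept and slope are hypothetical, chosen only for illustration.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def local_fdr(z, score, effect_sd=3.0):
    """Posterior null probability for a z-score at a given enrichment score."""
    pi1 = nonnull_prior(score)
    null_part = (1.0 - pi1) * normal_pdf(z, 1.0)
    nonnull_part = pi1 * normal_pdf(z, math.sqrt(1.0 + effect_sd ** 2))
    return null_part / (null_part + nonnull_part)


def z_threshold(score, target_fdr=0.05, lo=0.0, hi=20.0):
    """Smallest z (up to bisection tolerance) whose local fdr is at most target_fdr.

    Relies on the local fdr decreasing in z for z >= 0, which holds here
    because the non-null component has larger variance than the null.
    """
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if local_fdr(mid, score) > target_fdr:
            lo = mid
        else:
            hi = mid
    return hi


thresholds = {score: z_threshold(score) for score in (0.0, 1.0, 2.0, 3.0)}

for score, z in thresholds.items():
    print(f"enrichment score {score}: z threshold = {z:.2f}")
```

In this sketch the required z-score falls as the enrichment score rises, so a single fixed threshold applied to all SNPs cannot be equally calibrated across enrichment categories.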