Inferring population history in large-scale genomic studies

The ability to measure genetic variation at genome scale, reliably and inexpensively, in research settings has fueled investigation of the consequences of genotype and its impact on human evolution, history, health and disease. The complex patterns of genetic diversity observed in contemporary global populations are the product of many layers of demographic and evolutionary events acting on different timescales, including colonizations, migrations, admixture, bottlenecks, selection, population isolation and expansions. We are involved in several large-scale sequencing projects seeking to infer human demographic history:

  1. The impact of demography on the spectrum of rare variants in European- and African-Americans - NHLBI Exome Sequencing Project (Tennessen, 2012, Science; Fu, 2013, Nature)
  2. Local ancestry estimation, population structure and sequence diversity in admixed populations from the Americas (figure above) - the 1000 Genomes Project (McVean, 2012, Nature; Kenny, 2012, ASHG 3360T)
  3. Continental and sub-continental population structure in North America - NIA Human Health and Retirement Study
  4. Global genetic diversity in the patient population from East Harlem, New York - Biobank, Icahn School of Medicine at Mount Sinai
  5. Population structure and genetic diversity in South American populations (Muzzio, 2012, ASHG 3362W; Zakharia, 2012, ASHG 3380W; Moreno, 2012, ASHG 3390T)

Multi- and trans-ethnic medical genomics

As we sequence large numbers of ethnically diverse human genomes, we can begin to study the properties of human genetic variation in global populations. Strikingly, we find that the majority of genetic variants in the human genome are rare, population-private, and/or geographically restricted (Tennessen, 2012, Science; Fu, 2013, Nature). These observations call into question the existing practice in medical genomics of concentrating effort on a few “target populations” in the hope that variants found in them will generalize to other groups. If genetic variants that influence disease risk follow the general trend for most of the genome, we should expect many associations to be population-private, and broadening inclusion is urgent if most people are to benefit from investments in medical genomics. To improve our intuition about the scale and scope of multi- and trans-ethnic medical genomic studies, we focus on populations that exhibit a high degree of population structure, namely isolated and admixed populations, with the aim of bridging the gap between discovery and functional genomics:

  1. Obesity, dyslipidemia, diabetes and pigmentation in South Pacific Islanders (Kenny, 2010, HMG; Kenny, 2009, PNAS; Lowe, 2009, PLoS Genet; Gusev, 2012, Genetics; Gusev, 2011, AJHG; Burkhardt, 2009, ATVB; Kenny, 2012, Science)
  2. Autoimmune disorders in Ashkenazi Jews and Europeans (Kenny, 2012, PLoS Genet; Vacic, 2013, in review; Farco, 2013, PLoS Genet, accepted)
  3. Mapping biomedical traits and diseases from Electronic Medical Records in the patient population from East Harlem, New York - Biobank, Icahn School of Medicine at Mount Sinai

Improved statistical methods and genomic resources for population and medical genomics

We use and develop a range of statistical approaches for characterizing the scale and scope of human population structure and its implications for the design of trans- and multi-ethnic medical genomic studies. We currently focus on the following methods:

  1. Linear mixed-model (LMM) mapping methods that can account for both population stratification and relatedness in a unified framework (Kenny, 2010)
  2. A joint identity-by-descent (IBD) and LMM approach for mapping shared genetic segments to uncover recently arisen variation (Kenny, 2009, PNAS; Gusev, 2011, ASHG; Gusev, 2012, Genetics; Vacic, 2012, in review)
  3. Local ancestry estimation in admixed individuals (McVean, 2012, Nature; Kenny, 2012, ASHG 3360T; Maples, 2012, ASHG 3563F)
  4. Representational genome sequencing, or ‘genotype-by-sequencing’, as a cost-effective alternative to traditional genotyping arrays that is free of ascertainment bias, for GWAS in global populations currently underrepresented in standard reference panels (Cooke, 2012, ASHG Platform 293)