Population Genetics I (Introduction + Neutral Theory)

Post on 21-Dec-2021



Gurinder Singh “Mickey” Atwal Center for Quantitative Biology

23rd Oct 2015

Summary and definitions

PART 1

•  Basic definitions/concepts
•  Neutral theory of single loci

PART 2

•  Natural Selection
•  Haplotype analyses

DNA Sequence Variation : Single Nucleotide Polymorphisms

CAGCCAGACTGCCTTCCGGGTCACTGCCATGGAGGAGCCGCAGTCAGATCCTAGCGTCGAG

CCCCCTCTGAGTCAGGAAACATTTTCAGACCTATGGAAACTGTGAGTGGATCCATTGGAAGG

GCAGGCCACCACCCCGACCCCAACCCCAGCCCCCTAGCAGAGACCTGTGGGAAGCGAAAA

TTCATGGGACTGACTTTCTGCTCTTGTCTTTCAGACTTCCTGAAAACAACGTTCTGGTAAGGA

CAAGGGTTGGGCTGGGACCTGGAGGGCTGGGGGGGCTGGGGGGCTGGGACCTGGTCCTC

TGACTGCTCTTTTCACCCATCTACAGTCCCCCTTGCCGTCCCAAGCAATGGATGATTTGATGC

TGTCCCCGGACGATATTGAACAATGGTTCACTGAAGACCCAGGTCCAGATGAAGCTCCCAGA

ATGCCAGAGGCTGCTCCCCGCGTGGCCCCTGCACCAGCAGCTCCTACACCGGCGGCCCCT

GCACCAGCCCCCTCCTGGCCCCTGTCATCTTCTGTCCCTTCCCAGAAAACCTACCAGGGCA

GCTACGGTTTCCGTCTGGGCTTCTTGCATTCTGGGACAGCCAAGTCTGTGACTTGCACG

Part of human p53 gene (exons 2-4) • Chromosome 17

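[Figure: SNP positions highlighted on the sequence above, each with a pair of alleles such as C/T, G/C, C/A and G/A; exons and introns are distinguished.]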

Correlations in Genomic Studies

GENOTYPE, e.g. GCTCCCCGCGTGGCCCCTGCACC

PHENOTYPE, e.g. onset of cancer, apoptosis rates

1.  Correlations amongst alleles: many possible correlation statistics (D, D’, r², δ, Q)

2.  Genotype-phenotype correlations: many possible tests of association (χ², Fisher exact, Cochran-Armitage)

Goal of population genetics

•  Understand forces that produce and maintain inherited genetic variation

•  Forces
   –  Mutation
   –  Recombination
   –  Natural Selection
   –  Population Structure
   –  Random birth/death (drift)

Hardy Weinberg Law

•  Consider 2 alleles (A, a) with frequencies p and q
•  Allele frequency of A = p
•  Allele frequency of a = q = 1-p
•  Randomly-mating large diploid population with no mutation, migration, selection or drift

Genotype                    AA       Aa       aa
Hardy-Weinberg Frequency    $p^2$    $2pq$    $q^2$
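As a rough illustration of the Hardy-Weinberg frequencies, the sketch below computes the expected genotype frequencies p², 2pq and q² from an allele frequency and scores how far observed genotype counts lie from them. The function names and the genotype counts are hypothetical and chosen only for illustration. The Pearson chi-square statistic is one common way to measure the deviation, not something the slides prescribe.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies (AA, Aa, aa) when allele A has frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def allele_frequency(n_AA, n_Aa, n_aa):
    """Frequency of allele A from genotype counts (each individual carries 2 alleles)."""
    total = n_AA + n_Aa + n_aa
    return (2 * n_AA + n_Aa) / (2 * total)


def hw_chi_square(n_AA, n_Aa, n_aa):
    """Pearson chi-square statistic of observed counts against HW expectations."""
    n = n_AA + n_Aa + n_aa
    p = allele_frequency(n_AA, n_Aa, n_aa)
    expected = [n * f for f in hardy_weinberg(p)]
    observed = [n_AA, n_Aa, n_aa]
    # Skip genotype classes with zero expectation (p = 0 or p = 1).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    print(hardy_weinberg(0.5))        # (0.25, 0.5, 0.25)
    print(hw_chi_square(25, 50, 25))  # counts exactly at HW proportions: 0.0
    print(hw_chi_square(30, 40, 30))  # heterozygote deficit: 4.0
```

With two alleles the statistic has one degree of freedom, because the allele frequency is itself estimated from the counts.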

Hardy Weinberg Law

•  Only a few rounds of random mating are needed to reach HW equilibrium. (How many exactly for hermaphrodite and dioecious populations?)
•  Fast time scale
•  Deviation from HW equilibrium mainly due to
   –  Strong Selection
   –  Inbreeding
   –  Population Subdivision
   –  *Genotyping Errors*

Population Subdivision

Genotype     AA                            Aa                    aa
Frequency    $p^2(1-F_{ST}) + pF_{ST}$     $2pq(1-F_{ST})$       $q^2(1-F_{ST}) + qF_{ST}$

•  Wahlund effect
•  Effect gets bigger the more different the subpopulations
•  $0 < F_{ST} < 1$, degree of subdivision
•  Heterozygosity less than expected
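A small numerical sketch of the Wahlund effect, under the assumption of equally sized subpopulations that are each in Hardy-Weinberg equilibrium. It computes F_ST as 1 - H_S/H_T (average within-subpopulation heterozygosity over the heterozygosity expected from the pooled allele frequency) and checks that the pooled genotype frequencies match p²(1-F_ST)+pF_ST, 2pq(1-F_ST) and q²(1-F_ST)+qF_ST. The function names and the example frequencies 0.2 and 0.8 are hypothetical.

```python
def wahlund(subpop_freqs):
    """Return (p_bar, H_S, H_T, F_ST) for equally sized subpopulations in HW equilibrium."""
    k = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / k
    h_t = 2.0 * p_bar * (1.0 - p_bar)                        # expected if the pool mated at random
    if h_t == 0.0:
        raise ValueError("pooled population is monomorphic; F_ST is undefined")
    h_s = sum(2.0 * p * (1.0 - p) for p in subpop_freqs) / k  # actual pooled heterozygosity
    return p_bar, h_s, h_t, 1.0 - h_s / h_t


def subdivided_genotype_freqs(p, f_st):
    """Genotype frequencies (AA, Aa, aa) in the pooled population."""
    q = 1.0 - p
    return (p * p * (1.0 - f_st) + p * f_st,
            2.0 * p * q * (1.0 - f_st),
            q * q * (1.0 - f_st) + q * f_st)


if __name__ == "__main__":
    freqs = [0.2, 0.8]  # hypothetical allele frequencies in two subpopulations
    p_bar, h_s, h_t, f_st = wahlund(freqs)
    print("F_ST =", f_st)
    print("heterozygote deficit:", h_s, "<", h_t)
    print(subdivided_genotype_freqs(p_bar, f_st))
```

The more the subpopulation frequencies differ, the larger the variance in p and hence the larger F_ST, which is the second bullet above expressed numerically.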

Population Inbreeding

Genotype     AA                        Aa                  aa
Frequency    $p^2(1-F_I) + pF_I$       $2pq(1-F_I)$        $q^2(1-F_I) + qF_I$

•  Effect gets bigger the more related the population
•  $0 < F_I < 1$, inbreeding coefficient
•  $F_I$ = probability that 2 alleles in an individual are identical by descent
•  Heterozygosity less than expected
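Because the heterozygote frequency under inbreeding is 2pq(1-F_I), the relation can be inverted to estimate F_I from genotype counts as 1 - H_obs/(2pq). The sketch below does this. The function name and the counts are hypothetical, and this simple moment estimator is only one possible choice.

```python
def inbreeding_coefficient(n_AA, n_Aa, n_aa):
    """Estimate F_I = 1 - H_obs / (2pq) from genotype counts."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)        # frequency of allele A
    expected_het = 2.0 * p * (1.0 - p)     # HW expectation 2pq
    if expected_het == 0.0:
        raise ValueError("population is monomorphic; F_I is undefined")
    observed_het = n_Aa / n
    return 1.0 - observed_het / expected_het


if __name__ == "__main__":
    print(inbreeding_coefficient(25, 50, 25))  # HW proportions, no deficit
    print(inbreeding_coefficient(40, 20, 40))  # strong heterozygote deficit
```

A negative estimate indicates an excess of heterozygotes relative to Hardy-Weinberg, which lies outside the 0 < F_I < 1 range the slide considers.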

Neutral Drift

•  What happens when we consider a finite population size ?

•  Allele frequencies can change even if there is no natural selection.

Evolution of a neutral mutant allele

Wright-Fisher Process

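[Figure: Wright-Fisher process. N individuals (2N alleles) are followed generation by generation; a mutation creates a derived allele among the ancestral alleles, and the derived allele later goes extinct.]

Stochastic birth/death process (Moran model)

•  Overlapping generations
•  Distribution of time to replication

[Figure: Moran model, with individual death events marked along a time axis.]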
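A minimal sketch of the Wright-Fisher process for a single new neutral mutation, assuming N diploid individuals (2N gene copies), non-overlapping generations and no further mutation. Each generation, every one of the 2N copies picks its parent allele at random, which is binomial sampling at the current frequency. The function name, parameters and the seed are illustrative choices.

```python
import random


def wright_fisher(n_diploid, start_copies=1, max_generations=100_000, rng=None):
    """Return the number of derived-allele copies in each generation.

    The trajectory stops when the derived allele is lost (0 copies) or
    fixed (2N copies), or after max_generations.
    """
    rng = rng or random.Random()
    total = 2 * n_diploid
    copies = start_copies
    trajectory = [copies]
    for _ in range(max_generations):
        if copies == 0 or copies == total:
            break
        freq = copies / total
        # Binomial(2N, freq): each new gene copy is derived with probability freq.
        copies = sum(rng.random() < freq for _ in range(total))
        trajectory.append(copies)
    return trajectory


if __name__ == "__main__":
    traj = wright_fisher(n_diploid=50, rng=random.Random(2015))
    outcome = "fixed" if traj[-1] == 100 else "lost"
    print(f"derived allele {outcome} after {len(traj) - 1} generations")
```

Running it with different seeds shows the typical behaviour of the figure: most trajectories hit zero within a few generations.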

Evolution of a neutral mutant allele

[Figure: allele frequency (0 to 1) against time/generations in a population of N individuals; a mutation arises and the derived allele drifts to fixation.]

DIFFUSION: Kimura diffusion theory

Natural Selection is more effective in larger populations

Genetic Drift dominates in smaller populations

[Figure: population size N on the axis, with the Genetic Drift and Darwinian evolution regimes marked.]

Neutral drift

[Figure: allele frequency against generations/time; the time taken by an allele that reaches fixation is marked as ~4N generations.]

•  Most new mutations are eventually lost
•  Only a small fraction (1/2N) eventually fixate in the population
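The 1/2N fixation probability can be checked by brute force. The sketch below repeatedly starts a Wright-Fisher population with one derived copy among 2N and counts how often that copy takes over. It assumes strict neutrality and a constant diploid population size N; the function names, population size, replicate count and seed are hypothetical.

```python
import random


def fixes(n_diploid, rng):
    """Follow one new neutral mutation; return True if it reaches fixation."""
    total = 2 * n_diploid
    copies = 1
    while 0 < copies < total:
        freq = copies / total
        # One Wright-Fisher generation: binomial resampling of 2N gene copies.
        copies = sum(rng.random() < freq for _ in range(total))
    return copies == total


def fixation_fraction(n_diploid, replicates, seed=0):
    """Fraction of independent new mutations that eventually fix."""
    rng = random.Random(seed)
    return sum(fixes(n_diploid, rng) for _ in range(replicates)) / replicates


if __name__ == "__main__":
    n = 10
    estimate = fixation_fraction(n, replicates=4000, seed=1)
    print(f"simulated: {estimate:.4f}   expected 1/2N: {1 / (2 * n):.4f}")
```

The estimate fluctuates around 1/2N from run to run; more replicates tighten it.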

Neutral Molecular Evolution

r = u (substitution rate = mutation rate)

•  Rate of new fixations equals the mutation rate and does not depend on N
•  Implies substitution rate is constant
•  Gives a molecular clock for neutral molecular evolution
•  Molecular divergence between 2 species should be proportional to number of generations since their most recent common ancestor
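The independence from N follows from a short calculation, sketched here under the assumption that u is the neutral mutation rate per gene copy per generation in a diploid population of size N. The number of new mutations per generation grows with N, while the chance that any one of them fixes shrinks with N at the same rate.

```latex
\begin{align*}
\text{new neutral mutations per generation} &= 2N u \\
\text{probability that a given new mutation fixes} &= \frac{1}{2N} \\
r &= 2N u \times \frac{1}{2N} = u
\end{align*}
```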

Effective Population Size, Neff

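$$\frac{1}{N_{\mathrm{eff}}} = \frac{1}{T}\left(\frac{1}{N_1} + \frac{1}{N_2} + \dots + \frac{1}{N_T}\right)$$

where time runs in discrete steps, T is the total number of time steps and $N_i$ is the population at time step i.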
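The effective population size defined here is the harmonic mean of the population sizes over the T time steps, so it is dominated by the smallest values. A minimal sketch follows; the function name and the bottleneck sizes in the example are made up for illustration.

```python
def effective_population_size(sizes):
    """Harmonic mean of the population sizes N_1 ... N_T."""
    if not sizes or any(n <= 0 for n in sizes):
        raise ValueError("sizes must be a non-empty sequence of positive numbers")
    return len(sizes) / sum(1.0 / n for n in sizes)


if __name__ == "__main__":
    constant = [1000] * 5
    bottleneck = [1000, 10, 1000]  # hypothetical one-step crash to 10 individuals
    print(effective_population_size(constant))
    print(effective_population_size(bottleneck))  # far below the arithmetic mean of 670
```

A single bottleneck step pulls the effective size down to about 29 in this example, even though the population is at 1000 for two of the three steps.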

Human Population Expansion

•  Neff ~ 10,000 (European HapMap)
•  Nonadiabatic expansion

Heterozygosity, H

•  Probability that 2 alleles drawn at random are different

•  Homozygosity, G=1-H

•  E.g. if biallelic then $H = 2p(1-p)$ and $G = p^2 + (1-p)^2$

Heterozygosity decay

•  Wright-Fisher: $H_t = H_0 \exp\left(-\frac{t}{N}\right)$

•  Moran: $H_t = H_0 \exp\left(-\frac{2t}{N^2}\right)$

Different microscopic models are equivalent up to rescaling of time
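The equivalence up to a rescaling of time can be made concrete with the exact per-step recursions. The sketch assumes that N counts gene copies in a haploid population, that t counts whole generations in the Wright-Fisher model (heterozygosity shrinks by 1 - 1/N per generation) and single birth/death events in the Moran model (shrinks by 1 - 2/N² per event). These conventions are assumptions that reproduce the exponents -t/N and -2t/N², and the function names are hypothetical.

```python
import math


def wf_heterozygosity(h0, n, generations):
    """Exact Wright-Fisher decay: factor (1 - 1/N) per generation."""
    return h0 * (1.0 - 1.0 / n) ** generations


def moran_heterozygosity(h0, n, events):
    """Exact Moran decay: factor (1 - 2/N^2) per birth/death event."""
    return h0 * (1.0 - 2.0 / n ** 2) ** events


def wf_approx(h0, n, generations):
    """Continuum limit H_t = H_0 exp(-t/N)."""
    return h0 * math.exp(-generations / n)


def moran_approx(h0, n, events):
    """Continuum limit H_t = H_0 exp(-2t/N^2)."""
    return h0 * math.exp(-2.0 * events / n ** 2)


if __name__ == "__main__":
    h0, n, t = 0.5, 1000, 500
    events = t * n // 2  # N/2 Moran events correspond to one Wright-Fisher generation
    print(wf_heterozygosity(h0, n, t), wf_approx(h0, n, t))
    print(moran_heterozygosity(h0, n, events), moran_approx(h0, n, events))
```

Under these conventions N/2 Moran events do the work of one Wright-Fisher generation, so the two decay curves coincide once time is rescaled.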

Mutation-Drift Balance

•  Drift decreases H
•  Mutation increases H
•  Two forces cancel out to give equilibrium variation in population

Homozygosity: $G = \frac{1}{1 + 4Nu}$

Heterozygosity: $H = \frac{4Nu}{1 + 4Nu}$
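The equilibrium values G = 1/(1+4Nu) and H = 4Nu/(1+4Nu) depend on N and u only through the product 4Nu. The sketch below evaluates them for a few parameter choices. The function names and the values of N and u are hypothetical.

```python
def equilibrium_homozygosity(n, u):
    """G = 1 / (1 + 4Nu) at mutation-drift balance."""
    return 1.0 / (1.0 + 4.0 * n * u)


def equilibrium_heterozygosity(n, u):
    """H = 4Nu / (1 + 4Nu) at mutation-drift balance."""
    theta = 4.0 * n * u
    return theta / (1.0 + theta)


if __name__ == "__main__":
    # Hypothetical (N, u) pairs spanning 4Nu << 1, 4Nu = 1 and 4Nu >> 1.
    for n, u in [(10_000, 2.5e-8), (10_000, 2.5e-5), (10_000, 2.5e-3)]:
        h = equilibrium_heterozygosity(n, u)
        print(f"4Nu = {4 * n * u:g}  ->  H = {h:.4f}")
```

H is close to 4Nu when 4Nu is small and approaches 1 when 4Nu is large, with H = 1/2 at 4Nu = 1.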

Mutation-Drift Balance

•  Time scale of mutations ~ 1/u
•  Time scale of drift ~ 4N
•  Remember, drift eliminates variation and mutations create variation
•  If 4N<<1/u (4uN<<1), population mostly devoid of variation
•  If 4N>>1/u (4uN>>1), population with much variation

Human SNP frequency distribution

Distribution of allele frequencies in Chromosome 1

[Figure: empirical distribution f of allele frequencies for non-coding (intergenic) SNPs; 180 Northern European samples (HapMap consortium).]

Coalescent

[Figure: genealogy traced back in time from the present. The 22 individuals sampled at the present have 18, 16, 14, 12, 9, 8, 8, 7, 7, 5, 5, 3, 3, 3, 2, 2 and finally 1 ancestor in successively earlier generations.]

[Figure: lineages traced back in time from the present.]

P(pair coalesce) = 1/2N per generation

P(k coalesce to k-1) = k(k-1)/4N per generation

Bifurcating Tree

After t generations?

[Figure: bifurcating tree running back in time from the present to the most recent common ancestor (MRCA).]

Many different trees can produce the present population!

Properties of coalescent

•  Random tree with random coalescent interval times ~ Wright-Fisher model

•  Time to coalescence gets longer the further we go back in time

•  The larger the population size the slower the rate of coalescence
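These properties can be seen in a small simulation of the coalescent interval times. The sketch assumes a sample much smaller than the population, so that the waiting time while k lineages remain is approximately exponential with rate k(k-1)/4N per generation; this is a continuous-time approximation of the per-generation probability. Summing the expected intervals 4N/(k(k-1)) gives an expected time to the MRCA of 4N(1 - 1/n). The function names and parameter values are illustrative.

```python
import random


def coalescent_times(n_samples, n_diploid, rng=None):
    """Waiting times (generations) while k = n, n-1, ..., 2 lineages remain."""
    rng = rng or random.Random()
    times = []
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / (4 * n_diploid)  # P(k coalesce to k-1) per generation
        times.append(rng.expovariate(rate))
    return times


def expected_tmrca(n_samples, n_diploid):
    """Sum of the expected intervals 4N / (k(k-1)) for k = 2 ... n."""
    return 4 * n_diploid * (1 - 1 / n_samples)


if __name__ == "__main__":
    rng = random.Random(42)
    n, big_n = 10, 1000
    replicates = [sum(coalescent_times(n, big_n, rng)) for _ in range(2000)]
    print("mean simulated TMRCA:", sum(replicates) / len(replicates))
    print("expected TMRCA:      ", expected_tmrca(n, big_n))
```

The last interval, with only two lineages left, has mean 2N and so accounts for more than half of the total, which is why coalescence slows down further back in time.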

Mutation?

[Figure: the same genealogy, from the present back to the most recent common ancestor (MRCA), shown in successive steps as mutations are placed on its branches.]

Sequences at the present once all mutations have been placed (* marks the variable sites):

TCGAGGTATTAAC
TCTAGGTATTAAC
TCGAGGCATTAAC
TCTAGGTGTTAAC
TCGAGGTATTAGC
TCTAGGTATCAAC
  *   ** * *

Efficient computer simulations of neutral mutation

1.  Generate random genealogy of individuals back in time

2.  Superimpose mutation
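The two steps can be sketched as follows, under several assumptions that the slides do not spell out: coalescence times are exponential with rate k(k-1)/4N, the pair of lineages that merges is chosen uniformly at random, mutations fall on each branch as a Poisson process with rate mu per sequence per generation, and every mutation hits a new site (infinite-sites model). All names and parameter values are hypothetical.

```python
import random


def poisson(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals that land before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_sample(n_samples, n_diploid, mu, rng=None):
    """Return (haplotypes, tmrca); each haplotype is the set of mutation ids it carries."""
    rng = rng or random.Random()
    # Each active lineage is the set of sampled sequences that descend from it.
    lineages = [frozenset([i]) for i in range(n_samples)]
    haplotypes = [set() for _ in range(n_samples)]
    next_mutation, elapsed = 0, 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Step 1: genealogy. Wait until the next coalescence among k lineages.
        wait = rng.expovariate(k * (k - 1) / (4 * n_diploid))
        elapsed += wait
        # Step 2: mutation. Drop mutations on each of the k branches of this interval.
        for lineage in lineages:
            for _ in range(poisson(mu * wait, rng)):
                for leaf in lineage:
                    haplotypes[leaf].add(next_mutation)
                next_mutation += 1
        # Merge a randomly chosen pair of lineages into their common ancestor.
        a, b = rng.sample(range(k), 2)
        merged = lineages[a] | lineages[b]
        lineages = [l for i, l in enumerate(lineages) if i not in (a, b)] + [merged]
    return haplotypes, elapsed


if __name__ == "__main__":
    haps, tmrca = simulate_sample(6, 1000, 1e-3, random.Random(7))
    print("TMRCA (generations):", round(tmrca))
    for h in haps:
        print(sorted(h))
```

Only the ancestors of the sample are ever simulated, never the whole population forward in time, which is where the efficiency comes from.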