Post on 18-Jan-2016
Maximum-likelihood estimation of admixture proportions from
genetic data
Jinliang Wang
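[Figure: the admixture model. An ancestral population P0 splits into two parental populations P1 and P2, with sizes n1 and n2, which drift for ξ generations. At the admixture event the hybrid population Ph is formed from P1 and P2 in proportions p1 and p2. P1, Ph and P2, with sizes N1, Nh and N2, then drift for ψ generations, after which samples S1, Sh and S2 are taken.]
Allele frequencies: w in P0; x1, xh, x2 at the admixture event; y1, yh, y2 at the time of sampling
Allele counts in the samples S1, Sh, S2: c1, ch, c2
C = (c1,c2,c3)
t1 = ξ/2n1, t2 = ξ/2n2
T1 = ψ/2N1, Th = ψ/2Nh, T2 = ψ/2N2
Ω = {p1, t1, t2, T1, Th, T2}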
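The scaled drift times in Ω are simple ratios of generations to twice a population size. The following is a minimal sketch of that arithmetic; the function name and all numeric values are hypothetical and chosen only for illustration, not taken from the talk.

```python
def scaled_drift_times(xi, psi, n1, n2, N1, Nh, N2):
    """Return the scaled drift times t1, t2, T1, Th, T2.

    xi  : generations between the population split and the admixture event
    psi : generations between the admixture event and sampling
    Each scaled time is (generations) / (2 * population size).
    """
    return {
        "t1": xi / (2 * n1),
        "t2": xi / (2 * n2),
        "T1": psi / (2 * N1),
        "Th": psi / (2 * Nh),
        "T2": psi / (2 * N2),
    }


# Hypothetical values, for illustration only.
omega = scaled_drift_times(xi=200, psi=20, n1=1000, n2=500, N1=1000, Nh=100, N2=500)
omega["p1"] = 0.3  # admixture proportion from P1 (hypothetical)
print(omega)
```

Because only the ratios enter the model, a small population drifting for a short time and a large population drifting for a long time can give the same scaled time.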
Likelihood function
\[
\Pr(C) = \int \Pr(c_1, c_h, c_2 \mid y_1, y_h, y_2)\,
\Pr(y_1, y_h, y_2 \mid x_1, x_2, p_1, T_1, T_h, T_2)\,
\Pr(x_1, x_2 \mid t_1, t_2, w)\, \Pr(w)\, dw
\]
Pr(c1, ch, c2 | y1, yh, y2): random sampling
Pr(y1, yh, y2 | x1, x2, p1, T1, Th, T2): admixture and genetic drift
Pr(x1, x2 | t1, t2, w): genetic drift
Pr(w): prior on w
Allele frequencies in P0
Pr(w): prior on the allele frequencies w in P0
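Genetic drift after population split
[Figure: P0, with allele frequencies w, splits into P1 and P2, with sizes n1 and n2, which drift for ξ generations to allele frequencies x1 and x2.]
\[
\Pr(x_1, x_2 \mid t_1, t_2, w), \qquad t_1 = \xi / 2n_1, \quad t_2 = \xi / 2n_2
\]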
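Genetic drift in independent populations
\[
\Pr(x_1, x_2 \mid t_1, t_2, w) = \prod_{i=1}^{2} \Pr(x_i \mid t_i, w), \qquad t_i = \xi / 2n_i
\]
Genetic drift: the diffusion approximation
\[
\Pr(x_i \mid t_i, w) = w(1-w) \sum_{a=1}^{\infty} a(a+1)(2a+1)\,
H(1-a, a+2, 2, w)\, H(1-a, a+2, 2, x_i)\,
\exp\!\left( -\frac{a(a+1)\,\xi}{4 n_i} \right)
\]
Crow and Kimura (1970) p. 382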
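The diffusion approximation cited from Crow and Kimura (1970) can be evaluated numerically as a truncated series. The sketch below assumes the two-allele form w(1-w) Σ a(a+1)(2a+1) H(1-a, a+2, 2, w) H(1-a, a+2, 2, x) exp(-a(a+1)t/2), with t = ξ/2n and H the hypergeometric function; the function names and the truncation at 30 terms are illustrative choices, and the result is likely to be unreliable for very small t, where many terms are needed and floating-point cancellation grows.

```python
import math


def hyp_terminating(a, z):
    """H(1 - a, a + 2, 2, z) for integer a >= 1.

    The first argument is a non-positive integer, so the hypergeometric
    series terminates and reduces to a polynomial of degree a - 1 in z.
    """
    total, term = 1.0, 1.0
    for k in range(a - 1):
        # Ratio of successive series terms:
        # (alpha + k)(beta + k) / ((gamma + k)(k + 1)) * z
        term *= (1 - a + k) * (a + 2 + k) / ((2 + k) * (k + 1)) * z
        total += term
    return total


def drift_density(x, w, t, max_terms=30):
    """Approximate density of allele frequency x (0 < x < 1) after scaled
    drift time t = generations / (2 * population size), starting from w."""
    total = 0.0
    for a in range(1, max_terms + 1):
        total += (
            a * (a + 1) * (2 * a + 1)
            * hyp_terminating(a, w)
            * hyp_terminating(a, x)
            * math.exp(-a * (a + 1) * t / 2)
        )
    return w * (1 - w) * total


print(drift_density(0.5, 0.5, 2.0))
```

The density integrates to less than one over 0 < x < 1, because it describes only the alleles that are still segregating; the remaining probability mass sits at loss and fixation.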
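The admixture event
[Figure: at the admixture event the hybrid population Ph is formed from P1 and P2, with allele frequencies x1 and x2, in proportions p1 and p2, giving allele frequencies xh in Ph.]
\[
x_h = p_1 x_1 + p_2 x_2
\]
\[
\Pr(y_1, y_h, y_2 \mid x_1, x_2, p_1, T_1, T_h, T_2)
\]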
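The admixture event sets the hybrid allele frequencies to a weighted average of the parental ones, xh = p1 x1 + p2 x2. A minimal sketch follows, assuming p2 = 1 - p1 (implied by Ω containing only p1, but not stated on the slide); the function name and the frequencies are hypothetical.

```python
def admix(x1, x2, p1):
    """Allele frequencies in the hybrid population at the admixture event.

    Applies xh = p1 * x1 + p2 * x2 allele by allele, with p2 = 1 - p1
    (an assumption of this sketch).
    """
    if not 0.0 <= p1 <= 1.0:
        raise ValueError("p1 must lie in [0, 1]")
    p2 = 1.0 - p1
    return [p1 * a + p2 * b for a, b in zip(x1, x2)]


# Hypothetical two-allele locus: P1 has frequencies (0.2, 0.8), P2 has (0.6, 0.4).
xh = admix([0.2, 0.8], [0.6, 0.4], p1=0.25)
print(xh)
```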
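Genetic drift since admixture event
[Figure: P1, Ph and P2, with sizes N1, Nh and N2, drift for ψ generations from allele frequencies x1, xh, x2 to y1, yh, y2.]
\[
\Pr(y_1, y_h, y_2 \mid x_1, x_2, p_1, T_1, T_h, T_2), \qquad x_h = p_1 x_1 + p_2 x_2
\]
T1 = ψ/2N1, Th = ψ/2Nh, T2 = ψ/2N2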
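Random sampling
[Figure: samples S1, Sh and S2, with allele counts c1, ch, c2, are drawn from P1, Ph and P2, which have allele frequencies y1, yh, y2.]
C = (c1,c2,c3)
\[
\Pr(c_1, c_h, c_2 \mid y_1, y_h, y_2) = \prod_{i=1,2,h} \Pr(c_i \mid y_i)
\]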
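The random-sampling term factorises over the three populations. The sketch below assumes that the allele counts in each sample are multinomial given the population allele frequencies; the slides do not spell out the sampling distribution, so this is an assumption, and the function names are hypothetical.

```python
import math


def log_sampling_prob(counts, freqs):
    """Log multinomial probability of allele counts given allele frequencies
    (multinomial sampling is an assumption of this sketch)."""
    n = sum(counts)
    logp = math.lgamma(n + 1)  # log n!
    for c, y in zip(counts, freqs):
        logp -= math.lgamma(c + 1)  # minus log c!
        if c > 0:
            if y <= 0.0:
                return float("-inf")  # observed an allele with zero frequency
            logp += c * math.log(y)
    return logp


def log_joint_sampling_prob(counts_by_pop, freqs_by_pop):
    """Sum of the per-population log probabilities, i.e. the log of the
    product over populations 1, 2 and h."""
    return sum(
        log_sampling_prob(counts_by_pop[k], freqs_by_pop[k]) for k in counts_by_pop
    )


# Hypothetical counts and frequencies at one two-allele locus.
counts = {"1": [1, 1], "h": [1, 1], "2": [1, 1]}
freqs = {"1": [0.5, 0.5], "h": [0.5, 0.5], "2": [0.5, 0.5]}
print(log_joint_sampling_prob(counts, freqs))
```

Working on the log scale turns the product over populations into a sum and avoids underflow when many loci are combined.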
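Likelihood function
\[
\Pr(C) = \int \prod_{j=1,2,h} \Pr(c_j \mid y_j)\,
\prod_{j=1,2,h} \Pr(y_j \mid x_j, T_j)\,
\prod_{i=1}^{2} \Pr(x_i \mid t_i, w)\, \Pr(w)\, dw
\]
Pr(cj | yj): random sampling
Pr(yj | xj, Tj): admixture and genetic drift
Pr(xi | ti, w): genetic drift
Pr(w): prior on w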
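To make the role of the admixture proportion concrete, the sketch below maximises a deliberately simplified likelihood over a grid of p1 values. It is not the method presented in the talk: it ignores genetic drift entirely, treats the parental allele frequencies as known rather than integrating over them, and assumes multinomial sampling in the hybrid population. All names and numbers are hypothetical.

```python
import math


def hybrid_loglik(p1, y1, y2, ch):
    """Log-likelihood of hybrid allele counts ch (up to a constant) when the
    hybrid frequencies are p1 * y1 + (1 - p1) * y2. Drift is ignored."""
    loglik = 0.0
    for a, b, c in zip(y1, y2, ch):
        yh = p1 * a + (1.0 - p1) * b
        if c > 0:
            if yh <= 0.0:
                return float("-inf")
            loglik += c * math.log(yh)
    return loglik


def grid_mle(y1, y2, ch, steps=100):
    """Return the grid value of p1 in [0, 1] with the highest log-likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: hybrid_loglik(p, y1, y2, ch))


# Hypothetical, strongly differentiated parental frequencies and hybrid counts.
p1_hat = grid_mle([0.9, 0.1], [0.1, 0.9], [50, 50])
print(p1_hat)
```

Evaluating the same function across the whole grid, rather than keeping only the maximum, gives a log-likelihood curve in p1 for this toy model.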
[Figure: African-American Admixture Proportions. Bar chart of European ancestry (vertical axis, ticks 0 to 30) for New Orleans, New York, Pittsburgh, Maywood nr Chicago, Houston, Detroit, Baltimore, Philadelphia 1, Philadelphia 2, Charleston (South Carolina) and Jamaica.]
[Figure: Profile log-likelihoods for New York. Panels: proportion of European ancestry; drift before admixture event; drift since admixture event.]
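Application to canid populations: Grey wolf and coyote in North America
[Figure: model diagram. A common ancestor splits into grey wolf and coyote, which give rise to a wolf-like hybrid population and a coyote-like hybrid population.]
[Figure: bar chart with vertical axis labelled "Wolverine ancestry" (ticks 0 to 70) for the grey wolf-like hybrid and the coyote-like hybrid.]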
Discussion
Suitable data
Assumptions of the method given the model
Comparing the model to other scenarios
Aspects of the data used for inference
Discussion: Suitable data
Human data
Genotypes at 10 nuclear loci, chosen because their alleles are either African- or European-specific, or highly differentiated between the two.
Canid data
10 microsatellite loci. Neither species-specific nor highly differentiated between wolves and coyotes.
Discussion: Assumptions of the method given the model
Alleles are inherited independently across loci in the admixture event
Drift acts independently on alleles across loci
Alleles in a sampled individual are independent across loci
Discussion: Assumptions of the method given the model (continued)
The prior distribution on w is flat, not U-shaped
Admixture occurs instantaneously
The effect of mutation on perturbing allele frequency is negligible
Discussion: Comparing the model to other scenarios
Modern ‘pure’ populations need to be sampled
Thus the ‘structure’ of the population is assumed to be known
If we cannot sample modern ‘pure’ populations, we cannot make inference on the admixture proportions
Discussion: Aspects of the data used for inference
Inference proceeds solely on the basis of allele frequencies
Linkage disequilibrium (LD) is, firstly, not used for inference and, secondly, assumed to be negligible
LD might be exploited to enhance inference when modern ‘pure’ populations are sampled, or to relax the necessity to sample modern ‘pure’ populations at all