Post on 22-Jul-2020
National Chiao Tung University (國立交通大學), Institute of Bioinformatics and Systems Biology (生物資訊及系統生物研究所), Instructor: 林勇欣
There are two principal philosophies in statistical data
analysis: the classical (or frequentist) and the Bayesian.
The frequentist defines the probability of an event as the expected frequency of occurrence of that event in
repeated draws from a real or imaginary population.
The performance of an inference procedure is judged by its properties in repeated sampling from the data-
generating model, with the parameters fixed.
Important concepts include bias and variance of an
estimator, confidence intervals, and p values.
Yang (2006) Computational Molecular Evolution
Bayesian statistics is not mentioned in most biostatistics courses.
The key feature of Bayesian methods is the notion of a
probability distribution for the parameter.
$$P(A \mid B) = \frac{P(A) \times P(B \mid A)}{P(B)}$$
$$P(B) \times P(A \mid B) = P(A) \times P(B \mid A)$$
Here probability cannot be interpreted as the frequency
in random draws from a population but instead is used to represent uncertainty about the parameter.
In classical statistics, parameters are unknown
constants and cannot have distributions.
Bayesian proponents argue that since the value of the parameter is unknown, it is sensible to specify a
probability distribution to describe its possible values.
The distribution of the parameter before the data are
analyzed is called the prior distribution.
This can be specified by using either an objective
assessment of prior evidence concerning the parameter or the researcher’s subjective opinion.
The Bayes theorem is then used to calculate the
posterior distribution of the parameter, that is, the conditional distribution of the parameter given the data.
All inferences about the parameter are based on the posterior.
Suppose the occurrence of a certain event B may depend on whether another event A has occurred. Then the probability that B occurs is given by the law of total probability:
$$P(B) = P(A) \times P(B \mid A) + P(\bar{A}) \times P(B \mid \bar{A})$$
Here Ā stands for ‘non A’ or ‘A does not occur’.
Bayes’s theorem, also known as the inverse-probability theorem, gives the conditional probability that A occurs given that B occurs:
$$P(A \mid B) = \frac{P(A) \times P(B \mid A)}{P(B)} = \frac{P(A) \times P(B \mid A)}{P(A) \times P(B \mid A) + P(\bar{A}) \times P(B \mid \bar{A})}$$
Example. (False positives of a clinical test)
Suppose a new test has been developed to screen for
an infection in the population. If a person has the infection, the test accurately reports a positive 99% of
the time, and if a person does not have the infection, the test falsely reports a positive only 2% of the time.
Suppose that 0.1% of the population have the
infection. What is the probability that a person who has tested positive actually has the infection?
Let A be the event that a person has the infection,
and Ā no infection. Let B stand for test-positive. Then
P(A) = 0.001, P(Ā) = 0.999, P(B|A) = 0.99, P(B|Ā) = 0.02
The probability that a random person from the
population tests positive is
$$P(B) = P(A) \times P(B \mid A) + P(\bar{A}) \times P(B \mid \bar{A}) = 0.001 \times 0.99 + 0.999 \times 0.02 = 0.02097$$
This is close to the false-positive rate of 0.02 among the noninfected individuals, who make up 99.9% of the population.
The probability that a person who has tested positive
has the infection is
$$P(A \mid B) = \frac{P(A) \times P(B \mid A)}{P(B)} = \frac{0.001 \times 0.99}{0.02097} = 0.0472$$
Thus only about 4.7% of those who test positive actually have the infection.
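The arithmetic of the clinical-test example can be checked with a short Python sketch. The function name `posterior_given_positive` and its argument names are illustrative choices, not part of the original example; the input numbers are those given in the text.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """Return (P(B), P(A|B)) for a diagnostic test.

    A = person has the infection, B = person tests positive.
    """
    # Law of total probability: P(B) = P(A) P(B|A) + P(not A) P(B|not A)
    p_positive = prevalence * sensitivity + (1.0 - prevalence) * false_positive_rate
    # Bayes's theorem: P(A|B) = P(A) P(B|A) / P(B)
    posterior = prevalence * sensitivity / p_positive
    return p_positive, posterior


p_positive, posterior = posterior_given_positive(
    prevalence=0.001, sensitivity=0.99, false_positive_rate=0.02
)
print(f"P(B)   = {p_positive:.5f}")
print(f"P(A|B) = {posterior:.4f}")
```

Raising the prevalence argument shows how strongly the answer depends on the prior P(A), even for a test with high sensitivity.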
When Bayes’s theorem is used in Bayesian statistics, A
and Ā correspond to different hypotheses H1 and H2, while B corresponds to the observed data (X).
Bayes’s theorem then specifies the conditional probability of hypothesis H1 given the data as
$$P(H_1 \mid X) = \frac{P(H_1) \times P(X \mid H_1)}{P(X)} = \frac{P(H_1) \times P(X \mid H_1)}{P(H_1) \times P(X \mid H_1) + P(H_2) \times P(X \mid H_2)}$$
$$P(H_2 \mid X) = 1 - P(H_1 \mid X)$$
Here P(H1) and P(H2) are called prior probabilities.
They are the probabilities assigned to the hypotheses
before the data are observed or analyzed.
The conditional probabilities P(H1|X) and P(H2|X) are called posterior probabilities.
P(X|H1) and P(X|H2) are the likelihoods under each hypothesis.
Note that in the disease testing example discussed
above, P(A) and P(Ā) are frequencies of infected and noninfected individuals in the population.
There is no controversy concerning the use of
Bayes’s theorem in such problems.
However, in Bayesian statistics, the prior probabilities
P(H1) and P(H2) often do not have such a frequentist interpretation. The use of Bayes’s theorem in such a
context is controversial.
When the hypothesis concerns unknown continuous
parameters, probability densities are used instead of probabilities.
Bayes’s theorem then takes the following form:
$$f(\theta \mid X) = \frac{f(\theta) f(X \mid \theta)}{f(X)} = \frac{f(\theta) f(X \mid \theta)}{\int f(\theta) f(X \mid \theta)\, d\theta}$$
Here f(θ) is the prior distribution,
f(θ|X) is the posterior distribution, and
f(X|θ) is the likelihood (the probability of data X given parameter θ).
The marginal probability of the data, f(X), is a normalizing constant, to make f(θ|X) integrate to 1.
An important strength of the Bayesian approach is that
it provides a natural way of dealing with nuisance parameters through integration or marginalization.
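Let θ = (λ, η), with λ the parameters of interest and η the nuisance parameters. The joint posterior density of λ and η is:
$$f(\lambda, \eta \mid X) = \frac{f(\lambda, \eta) f(X \mid \lambda, \eta)}{f(X)} = \frac{f(\lambda, \eta) f(X \mid \lambda, \eta)}{\int\!\!\int f(\lambda, \eta) f(X \mid \lambda, \eta)\, d\lambda\, d\eta}$$
from which the (marginal) posterior density of λ can be obtained as
$$f(\lambda \mid X) = \int f(\lambda, \eta \mid X)\, d\eta$$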
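As a rough illustration of marginalization, the sketch below approximates the integral over the nuisance parameter by a sum on a grid. The setting is hypothetical and not from the slides: the data are made-up normal observations, λ is the mean (parameter of interest), η is the standard deviation (nuisance parameter), and flat priors over the grid ranges are assumed.

```python
import math


def marginal_posterior_of_mean(data, mean_grid, sd_grid):
    """Grid approximation to f(lambda | X) = integral of f(lambda, eta | X) d eta.

    lambda is the normal mean, eta the normal standard deviation.
    Flat priors on both grids are an assumption made for this sketch.
    """
    # Unnormalized log joint posterior (equal to the log likelihood under flat priors).
    log_joint = []
    for m in mean_grid:
        row = []
        for s in sd_grid:
            row.append(sum(-math.log(s) - 0.5 * ((x - m) / s) ** 2 for x in data))
        log_joint.append(row)
    top = max(max(row) for row in log_joint)  # subtract the maximum to avoid underflow
    # Sum over the nuisance axis (eta), then normalize over lambda.
    marginal = [sum(math.exp(v - top) for v in row) for row in log_joint]
    total = sum(marginal)
    return [w / total for w in marginal]


data = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0]              # hypothetical observations
mean_grid = [4.0 + 0.01 * i for i in range(201)]   # lambda from 4.00 to 6.00
sd_grid = [0.05 + 0.01 * j for j in range(200)]    # eta from 0.05 to 2.04
post = marginal_posterior_of_mean(data, mean_grid, sd_grid)
post_mean = sum(m * w for m, w in zip(mean_grid, post))
print(f"posterior mean of lambda = {post_mean:.3f}")
```

A grid is only practical for one or two nuisance parameters; with many nuisance parameters the same integral is typically approximated by sampling methods such as MCMC.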
Example:
Consider the use of the JC69 model to estimate the
distance θ between the human and orangutan 12S rRNA genes from the mitochondrial genome.
The data are summarized as x = 90 differences out of n
= 948 sites.
To apply the Bayesian approach, one has to specify a prior. Uniform priors are commonly used in Bayesian
analysis. In this case, one could specify, say, U(0,100),
with a large upper bound. However, the uniform prior is not very reasonable since sequence distances
estimated from real data are often small (say, < 1).
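We instead use an exponential prior
$$f(\theta) = \frac{1}{\mu} e^{-\theta/\mu}$$
with mean µ = 0.2.
The posterior distribution of θ is:
$$f(\theta \mid x) = \frac{f(\theta) f(x \mid \theta)}{f(x)} = \frac{f(\theta) f(x \mid \theta)}{\int f(\theta) f(x \mid \theta)\, d\theta}$$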
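We consider the data to have a binomial distribution, with probability
$$p = \frac{3}{4} - \frac{3}{4} e^{-4\theta/3}$$
for a difference and 1 − p for an identity. The likelihood is thus:
$$f(x \mid \theta) = p^x (1 - p)^{n-x} = \left(\frac{3}{4} - \frac{3}{4} e^{-4\theta/3}\right)^{x} \left(\frac{1}{4} + \frac{3}{4} e^{-4\theta/3}\right)^{n-x}$$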
With µ = 0.2, the prior is
$$f(\theta) = \frac{1}{0.2} e^{-\theta/0.2}$$
and the marginal probability of the data is
$$f(x) = \int f(\theta) f(x \mid \theta)\, d\theta = 5.16776 \times 10^{-131}$$
The mean of the posterior distribution is found by numerical integration to be
$$E(\theta \mid x) = \int \theta f(\theta \mid x)\, d\theta = 0.10213$$
which is very similar to the MLE of θ̂ = 0.1015, despite their different interpretations.
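The numerical integration for the JC69 example can be sketched in a few lines of Python. This is a minimal midpoint-rule version, not necessarily the method used to produce the values quoted from Yang (2006); the function names, the truncation of the integral at θ = 2, and the number of steps are assumptions made for illustration.

```python
import math


def jc69_log_likelihood(theta, x, n):
    """Log of p^x (1 - p)^(n - x), with p = 3/4 - 3/4 exp(-4 theta / 3)."""
    p = 0.75 - 0.75 * math.exp(-4.0 * theta / 3.0)
    return x * math.log(p) + (n - x) * math.log(1.0 - p)


def exponential_log_prior(theta, mu):
    """Log density of the exponential prior with mean mu."""
    return -math.log(mu) - theta / mu


def posterior_summary(x=90, n=948, mu=0.2, upper=2.0, steps=20000):
    """Return (f(x), E(theta | x)) using the midpoint rule on (0, upper)."""
    h = upper / steps
    marginal = 0.0       # approximates the integral of f(theta) f(x|theta)
    first_moment = 0.0   # approximates the integral of theta f(theta) f(x|theta)
    for i in range(steps):
        theta = (i + 0.5) * h  # midpoints never hit theta = 0, where log(p) is undefined
        weight = math.exp(exponential_log_prior(theta, mu)
                          + jc69_log_likelihood(theta, x, n))
        marginal += weight * h
        first_moment += theta * weight * h
    return marginal, first_moment / marginal


marginal, posterior_mean = posterior_summary()
print(f"f(x)         = {marginal:.5e}")
print(f"E(theta | x) = {posterior_mean:.5f}")
```

The binomial coefficient is left out of the likelihood, as in the formula above; it is a constant in θ and cancels in the posterior.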
Criticisms of frequentist statistics:
A major Bayesian criticism of classical statistics is that
it does not answer the right question.
Classical methods provide probability statements about
the data or the method for analyzing the data, but not about the parameter, even though the data have
already been observed and our interest is in the parameter.
Suppose x = 9 heads and r = 3 tails are observed in n =
12 independent tosses of a coin, and we wish to test
the null hypothesis H0: θ = ½ against the alternative H1: θ > ½, where θ is the true probability of heads.
Suppose the number of tosses n is fixed, so that x has
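a binomial distribution, with probability
$$f(x \mid \theta) = \binom{n}{x} \theta^{x} (1 - \theta)^{n-x}$$
The probability of the observed data x = 9 under H0, which is 0.05371, is not the p value. The p value is the probability under H0 of data at least as extreme as those observed (x = 9, 10, 11, or 12), which is 0.073.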
The p values are also criticized for violating the likelihood
principle, which says that the likelihood function contains
all information in the data about θ and the same inference should be made from two experiments that have the same
likelihood.
Consider a different experimental design, in which the number of tails r was fixed beforehand; in other words, the
coin was tossed until r = 3 tails were observed, at which
point x = 9 heads had been observed.
The data x then have a negative binomial distribution, with probability
$$f(x \mid \theta) = \binom{r + x - 1}{x} \theta^{x} (1 - \theta)^{n-x}$$
where n − x = r.
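If we use this model, the p value becomes
$$p = \sum_{j=9}^{\infty} \binom{3 + j - 1}{j} \left(\frac{1}{2}\right)^{j} \left(\frac{1}{2}\right)^{3} = 0.0327$$
The two likelihoods differ only by a constant factor, yet at the 5% significance level H0 is rejected under the negative binomial design but not under the binomial design.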
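Both tail probabilities in the coin-tossing example can be evaluated exactly with a short Python sketch. The function names are illustrative; the negative binomial tail is computed as one minus the finite sum over x = 0, …, 8, which avoids truncating the infinite series. Evaluated this way, the sums come to 299/4096 ≈ 0.0730 for the binomial design and 67/2048 ≈ 0.0327 for the negative binomial design.

```python
from math import comb


def binomial_p_value(x, n, theta=0.5):
    """P(X >= x) when the number of tosses n is fixed."""
    return sum(comb(n, k) * theta**k * (1.0 - theta) ** (n - k)
               for k in range(x, n + 1))


def negative_binomial_p_value(x, r, theta=0.5):
    """P(X >= x) heads when tossing continues until r tails are observed."""
    below = sum(comb(r + j - 1, j) * theta**j * (1.0 - theta) ** r
                for j in range(x))
    return 1.0 - below


p_fixed_n = binomial_p_value(x=9, n=12)
p_fixed_r = negative_binomial_p_value(x=9, r=3)
print(f"binomial design:          p = {p_fixed_n:.4f}")
print(f"negative binomial design: p = {p_fixed_r:.4f}")
```

The same observed counts (9 heads, 3 tails) therefore fall on opposite sides of the 5% threshold depending on the stopping rule.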
Criticisms of Bayesian methods:
All criticisms of Bayesian methods are leveled at the
prior or the need for it.
Bayesians come in two flavors:
the objective and the subjective.
The objective Bayesians consider the prior to be a representation of prior objective information about the
parameter.
The approach runs into trouble when no prior
information is available about the parameter and the prior is supposed to represent total ignorance.
For a continuous parameter, a uniform distribution over the range of the parameter might be assigned.
However, such so-called flat or noninformative priors
lead to contradictions.
If x has a uniform distribution, x² cannot have a uniform distribution.
Similarly, a uniform prior for the probability p of a difference at a site is very different from a uniform prior for
sequence distance θ under the JC69 model.
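The JC69 case can be made explicit with a change of variables. As a worked sketch, assume p is given a uniform prior on its full range (0, 3/4) under JC69 (the range is implied by p = 3/4 − 3/4 e^{−4θ/3}; the choice of a prior over the full range is an assumption for illustration):

```latex
\begin{aligned}
f_p(p) &= \tfrac{4}{3}, \qquad 0 < p < \tfrac{3}{4}, \\
\frac{dp}{d\theta} &= e^{-4\theta/3}, \\
f_\theta(\theta) &= f_p\bigl(p(\theta)\bigr)\,\left|\frac{dp}{d\theta}\right|
                  = \tfrac{4}{3}\, e^{-4\theta/3}, \qquad \theta > 0 .
\end{aligned}
```

So a prior that is flat in p is an exponential prior with mean 3/4 in θ, which is far from flat.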
Such difficulties in representing total ignorance have
caused the objective Bayesian approach to fall out of
favor.
The subjective Bayesians consider the prior to represent the researcher’s subjective belief about the
parameter before seeing or analyzing the data.
One cannot really argue against somebody else’s subjective beliefs, but ‘classical’ statisticians reject the
notion of subjective probabilities and of letting personal
prejudices influence scientific inference.
Even though the choice of the likelihood model involves certain subjectivity as well, the model can nevertheless
be checked against the data, but no such validation is
possible for the prior.
Bayesian phylogenetics
It is straightforward to formulate the problem of
phylogeny reconstruction in the general framework of
Bayesian inference.
Let X be the sequence data. Let θ include all parameters in the substitution model, with a prior
distribution f(θ). Let τ_i be the ith tree topology, i = 1, 2, …, T_s, where T_s is the total number of tree topologies
for s species.
Usually a uniform prior f(τ_i) = 1/T_s is assumed, although this means nonuniform prior probabilities for clades.
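To get a sense of how small the uniform prior 1/T_s becomes, the sketch below counts tree topologies, assuming strictly bifurcating trees, for which the number of topologies is (2s − 5)!! when unrooted and (2s − 3)!! when rooted. Whether rooted or unrooted trees are intended here is not stated, so both are shown; the function names are illustrative.

```python
def odd_double_factorial(k):
    """Return k!! = k * (k - 2) * ... * 3 * 1 for odd k >= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_topologies(s, rooted=False):
    """Number of bifurcating tree topologies for s >= 3 species."""
    if s < 3:
        raise ValueError("need at least 3 species")
    return odd_double_factorial(2 * s - 3 if rooted else 2 * s - 5)


for s in (4, 5, 10, 20):
    t_unrooted = count_topologies(s)
    t_rooted = count_topologies(s, rooted=True)
    print(f"s = {s:2d}: unrooted T_s = {t_unrooted}, rooted T_s = {t_rooted}, "
          f"uniform prior (unrooted) = {1.0 / t_unrooted:.3e}")
```

The count grows so quickly with s that the sum over topologies in the posterior cannot be evaluated by enumeration except for very small data sets.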
Let b_i be the vector of branch lengths on tree τ_i, with prior probability f(b_i). MrBayes 3 assumes that branch
lengths have independent uniform or exponential priors,
with the parameter (upper bound for the uniform or mean for the exponential) set by the user.
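The posterior probability of tree τ_i is
$$P(\tau_i \mid X) = \frac{\int\!\!\int f(\theta)\, f(\tau_i \mid \theta)\, f(b_i \mid \theta, \tau_i)\, f(X \mid \theta, \tau_i, b_i)\, db_i\, d\theta}{\sum_{j=1}^{T_s} \int\!\!\int f(\theta)\, f(\tau_j \mid \theta)\, f(b_j \mid \theta, \tau_j)\, f(X \mid \theta, \tau_j, b_j)\, db_j\, d\theta}$$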
This is a direct application of
$$f(\lambda \mid X) = \int f(\lambda, \eta \mid X)\, d\eta$$
treating τ as the parameter of interest and all other parameters as nuisance parameters.
Note that the denominator, the marginal probability of the
data f(X), is a sum over all possible tree topologies and,
for each tree topology τ_j, an integral over all branch
lengths b_j and substitution parameters θ.
Bayesian versus Likelihood
In terms of computational efficiency, stochastic tree
search using the program MrBayes appears to be more
efficient than heuristic tree search under likelihood using the PAUP program.
Nevertheless, the running time of the MCMC algorithm
is proportional to the number of iterations the algorithm
is run for. In general, longer chains are needed to achieve convergence in larger data sets due to the
increased number of parameters to be averaged over.
However, many users run shorter chains for larger data sets because larger trees require more computation per
iteration. As a result, it is not always clear whether the
MCMC algorithm has converged in analyses of large data sets.
Furthermore, significant improvements to heuristic tree
search under likelihood are being made.
It seems that for obtaining a point estimate, likelihood
heuristic search using numerical optimization can be faster than Bayesian stochastic search using MCMC.
However, no one knows how to use the information in
the likelihood tree search to attach a confidence interval
or some other measure of the sampling error in the ML tree.
As a result, one must currently resort to bootstrapping.
Bootstrapping under likelihood is an expensive
procedure, and appears slower than Bayesian MCMC.
To many, Bayesian inference of molecular phylogenies
enjoys a theoretical advantage over ML with
bootstrapping.
Posterior probability for a tree or clade has an easy interpretation: it is the probability that the tree or clade
is correct given the data, model, and prior.
In contrast, the interpretation of the bootstrap in
phylogenetics has been controversial.
It has been noted that Bayesian posterior probabilities
calculated from real data sets are often extremely high.
One may observe that while bootstrap support values
are published only if they are > 50% (as otherwise the relationships may not be considered trustworthy),
posterior clade probabilities are sometimes reported only if they are < 100% (as most of them are 100%!).
The difference between the two measures of support
does not itself suggest anything inappropriate about the
Bayesian probabilities, especially given the difficulties in the interpretation of the bootstrap.
However, it has been observed that different models
may produce conflicting trees when applied to the same data, each with high posterior probabilities.
Similarly, different genes for the same set of species can produce conflicting trees or clades, again each with
high posterior probabilities.
Bayesian posterior probability for a tree or clade is the
probability that the tree or clade is true given the data,
the likelihood model and the prior.
Thus there can be only three possible reasons for spuriously high clade probabilities:
1. Computer program bugs or problems in running the MCMC algorithms
2. Misspecification of the likelihood (substitution) model
3. Misspecification and sensitivity of the prior
Note that high posterior probabilities were observed in
simulated data sets where the substitution model is
correct and in analyses of small data sets that did not use MCMC.
In those cases, the first two factors do not apply.
The third factor, the sensitivity of Bayesian inference to
prior specification, is more fundamental and difficult to
deal with.
Assume independent exponential priors with means µ0
and µ1 for internal and external branch lengths, respectively.
The posterior probabilities of trees might be unduly
influenced by the prior mean µ0 on the internal branch lengths.
It is easy to see that high posterior probabilities for
trees will decrease if µ0 is small; if µ0 = 0, all trees and clades will have posterior probabilities near zero.
It was observed that in large data sets, the posterior
clade probabilities are sensitive to µ0 only if µ0 is very small.
In an analysis of 40 land plant species, the sensitive region was found to be (10^-5, 10^-3). Such branch
lengths seem unrealistically small if we consider estimated internal branch lengths in published trees.
However, branch lengths in wrong or poorly supported
trees are typically small and often zero.
As the prior is specified to represent our prior
knowledge of internal branch lengths in all binary trees, the majority of which are wrong or poor trees, a very
small µ0 appears necessary.
While posterior clade probabilities are sensitive to the
mean of the prior for internal branch lengths, it is in
general unclear how to formulate sensible priors that are acceptable to most biologists.
The problem merits further investigation.