There are two principal philosophies in statistical data...

National Chiao Tung University, Institute of Bioinformatics and Systems Biology. Instructor: 林勇欣

There are two principal philosophies in statistical data analysis: the classical or frequentist and the Bayesian.

The frequentist defines the probability of an event as the expected frequency of occurrence of that event in

repeated draws from a real or imaginary population.

The performance of an inference procedure is judged by its properties in repeated sampling from the data-generating model, with the parameters fixed.

Important concepts include bias and variance of an

estimator, confidence intervals, and p values.

Yang (2006) Computational Molecular Evolution


Bayesian statistics is not mentioned in most biostatistics courses.

The key feature of Bayesian methods is the notion of a

probability distribution for the parameter.

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}$$

$$P(B) \times P(A \mid B) = P(A) \times P(B \mid A)$$


Here probability cannot be interpreted as the frequency

in random draws from a population but instead is used to represent uncertainty about the parameter.

In classical statistics, parameters are unknown

constants and cannot have distributions.

Bayesian proponents argue that since the value of the parameter is unknown, it is sensible to specify a

probability distribution to describe its possible values.

The distribution of the parameter before the data are

analyzed is called the prior distribution.

This can be specified by using either an objective

assessment of prior evidence concerning the parameter or the researcher’s subjective opinion.

The Bayes theorem is then used to calculate the

posterior distribution of the parameter, that is, the conditional distribution of the parameter given the data.

All inferences about the parameter are based on the posterior.

Suppose the occurrence of a certain event B may depend on whether another event A has occurred. Then the probability that B occurs is given by the law of total probabilities

$$P(B) = P(A) \times P(B \mid A) + P(\bar{A}) \times P(B \mid \bar{A})$$

Here Ā stands for ‘non A’ or ‘A does not occur’.

Bayes’s theorem, also known as the inverse-probability theorem, gives the conditional probability that A occurs given that B occurs.

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)} = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(\bar{A})\,P(B \mid \bar{A})}$$

Example. (False positives of a clinical test)

Suppose a new test has been developed to screen for

an infection in the population. If a person has the infection, the test accurately reports a positive 99% of

the time, and if a person does not have the infection, the test falsely reports a positive only 2% of the time.

Suppose that 0.1% of the population have the

infection. What is the probability that a person who has tested positive actually has the infection?

Let A be the event that a person has the infection, and Ā no infection. Let B stand for test-positive. Then

P(A) = 0.001, P(Ā) = 0.999, P(B|A) = 0.99, P(B|Ā) = 0.02

The probability that a random person from the population tests positive is

$$P(B) = P(A) \times P(B \mid A) + P(\bar{A}) \times P(B \mid \bar{A}) = 0.001 \times 0.99 + 0.999 \times 0.02 = 0.02097$$

This is close to the false-positive proportion (2%) among the noninfected individuals of the population.

The probability that a person who has tested positive has the infection is

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)} = \frac{0.001 \times 0.99}{0.02097} = 0.0472$$

Thus only about 4.7% of those who test positive are actually infected.
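The arithmetic of the clinical-test example can be checked with a few lines of Python. This is a minimal sketch; the function name `posterior_given_positive` and its argument names are illustrative choices, not part of the original slides.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Return (P(B), P(A|B)) for a test with two outcomes.

    prior               = P(A), frequency of infection in the population
    sensitivity         = P(B|A), positive given infected
    false_positive_rate = P(B|not A), positive given not infected
    """
    p_not_a = 1.0 - prior
    # Law of total probabilities
    p_b = prior * sensitivity + p_not_a * false_positive_rate
    # Bayes's theorem
    p_a_given_b = prior * sensitivity / p_b
    return p_b, p_a_given_b


p_b, p_a_given_b = posterior_given_positive(0.001, 0.99, 0.02)
print(f"P(B)   = {p_b:.5f}")
print(f"P(A|B) = {p_a_given_b:.4f}")
```

Changing the prior P(A) while keeping the test characteristics fixed shows how strongly the answer depends on the frequency of infection in the population.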

When Bayes’s theorem is used in Bayesian statistics, A

and Ā correspond to different hypotheses H1 and H2, while B corresponds to the observed data (X).

Bayes’s theorem then specifies the conditional probability of hypothesis H1 given the data as

$$P(H_1 \mid X) = \frac{P(H_1)\,P(X \mid H_1)}{P(X)} = \frac{P(H_1)\,P(X \mid H_1)}{P(H_1)\,P(X \mid H_1) + P(H_2)\,P(X \mid H_2)}$$

$$P(H_2 \mid X) = 1 - P(H_1 \mid X)$$

Here P(H1) and P(H2) are called prior probabilities. They are probabilities assigned to the hypotheses before the data are observed or analyzed.

The conditional probabilities P(H1|X) and P(H2|X) are called posterior probabilities.

P(X|H1) and P(X|H2) are the likelihoods under each hypothesis.

Note that in the disease testing example discussed

above, P(A) and P(Ā) are frequencies of infected and noninfected individuals in the population.

There is no controversy concerning the use of

Bayes’s theorem in such problems.

However, in Bayesian statistics, the prior probabilities

P(H1) and P(H2) often do not have such a frequentist interpretation. The use of Bayes’s theorem in such a

context is controversial.

When the hypothesis concerns unknown continuous

parameters, probability densities are used instead of probabilities.

Bayes’s theorem then takes the following form:

$$f(\theta \mid X) = \frac{f(\theta)\,f(X \mid \theta)}{f(X)} = \frac{f(\theta)\,f(X \mid \theta)}{\int f(\theta)\,f(X \mid \theta)\,d\theta}$$

Here f(θ) is the prior distribution

f(θ|X) is the posterior distribution

f(X|θ) is the likelihood (the probability of data X given parameter θ)

The marginal probability of the data, f(X), is a normalizing constant, to make f(θ|X) integrate to 1.

An important strength of the Bayesian approach is that

it provides a natural way of dealing with nuisance parameters through integration or marginalization.

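Let θ = (λ, η), with λ the parameters of interest and η the nuisance parameters. The joint posterior density of λ and η is:

$$f(\lambda, \eta \mid X) = \frac{f(\lambda, \eta)\,f(X \mid \lambda, \eta)}{f(X)} = \frac{f(\lambda, \eta)\,f(X \mid \lambda, \eta)}{\iint f(\lambda, \eta)\,f(X \mid \lambda, \eta)\,d\lambda\,d\eta}$$

from which the (marginal) posterior density of λ can be obtained as

$$f(\lambda \mid X) = \int f(\lambda, \eta \mid X)\,d\eta$$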
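Marginalization over a nuisance parameter can be illustrated numerically on a grid. The sketch below is not from the slides: it assumes a hypothetical normal model with unknown mean `lam` (the parameter of interest) and unknown standard deviation `eta` (the nuisance parameter), made-up data, and flat priors over the grid ranges.

```python
import math

# Hypothetical data, assumed normal with unknown mean lam (of interest)
# and unknown standard deviation eta (nuisance). Values are made up.
data = [4.8, 5.1, 5.3, 4.9, 5.4]

lam_grid = [4.1 + 0.01 * i for i in range(201)]   # 4.1 .. 6.1
eta_grid = [0.05 + 0.01 * j for j in range(196)]  # 0.05 .. 2.0


def log_likelihood(lam, eta):
    """Normal log likelihood, dropping constants that do not depend on lam or eta."""
    return sum(-math.log(eta) - 0.5 * ((x - lam) / eta) ** 2 for x in data)


# Unnormalized joint posterior on the grid. With flat priors the prior
# terms are constant and cancel after normalization.
joint = [[math.exp(log_likelihood(lam, eta)) for eta in eta_grid]
         for lam in lam_grid]
total = sum(sum(row) for row in joint)

# Marginal posterior of lam: sum the joint posterior over eta.
marginal_lam = [sum(row) / total for row in joint]

posterior_mean_lam = sum(l * w for l, w in zip(lam_grid, marginal_lam))
print(f"posterior mean of lam = {posterior_mean_lam:.3f}")
```

The sum over `eta_grid` plays the role of the integral over η. In realistic problems with many nuisance parameters such a grid is not feasible, which is one reason sampling methods are used instead.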

Example:

Consider the use of the JC69 model to estimate the

distance θ between the human and orangutan 12S rRNA genes from the mitochondrial genome.

The data are summarized as x = 90 differences out of n

= 948 sites.

To apply the Bayesian approach, one has to specify a prior. Uniform priors are commonly used in Bayesian

analysis. In this case, one could specify, say, U(0,100),

with a large upper bound. However, the uniform prior is not very reasonable since sequence distances

estimated from real data are often small (say, < 1).

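We instead use an exponential prior

$$f(\theta) = \frac{1}{\mu}\,e^{-\theta/\mu}$$

with mean µ = 0.2.

The posterior distribution of θ is:

$$f(\theta \mid x) = \frac{f(\theta)\,f(x \mid \theta)}{f(x)} = \frac{f(\theta)\,f(x \mid \theta)}{\int f(\theta)\,f(x \mid \theta)\,d\theta}$$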

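We consider the data to have a binomial distribution, with probability

$$p = \frac{3}{4} - \frac{3}{4}\,e^{-4\theta/3}$$

for a difference and 1 – p for an identity. The likelihood is thus:

$$f(x \mid \theta) = p^{x}(1 - p)^{n - x} = \left(\frac{3}{4} - \frac{3}{4}\,e^{-4\theta/3}\right)^{x} \left(\frac{1}{4} + \frac{3}{4}\,e^{-4\theta/3}\right)^{n - x}$$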

With the prior

$$f(\theta) = \frac{1}{0.2}\,e^{-\theta/0.2}$$

the normalizing constant is

$$f(x) = \int_0^{\infty} f(\theta)\,f(x \mid \theta)\,d\theta = 5.16776 \times 10^{-131}$$

The mean of the posterior distribution is found by numerical integration to be

$$E(\theta \mid x) = \int_0^{\infty} \theta\,f(\theta \mid x)\,d\theta = 0.10213$$

which is very similar to the MLE, $\hat{\theta} = 0.1015$, despite their different interpretations.
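The numerical integration for the JC69 example can be sketched as follows. The composite Simpson rule and the upper integration limit of 2.0 are assumptions made here (the slides do not say which quadrature was used); the likelihood omits the binomial coefficient, as in the formula above, and should give values close to those quoted.

```python
import math

x, n = 90, 948   # differences and sites, from the example
mu = 0.2         # mean of the exponential prior


def prior(theta):
    return math.exp(-theta / mu) / mu


def likelihood(theta):
    """JC69 likelihood p^x (1-p)^(n-x), computed via logs for stability."""
    p = 0.75 - 0.75 * math.exp(-4.0 * theta / 3.0)
    if p <= 0.0:
        return 0.0
    return math.exp(x * math.log(p) + (n - x) * math.log(1.0 - p))


def simpson(f, a, b, m):
    """Composite Simpson rule with m (even) subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3.0


upper = 2.0  # assumption: posterior mass beyond theta = 2 is negligible
f_x = simpson(lambda t: prior(t) * likelihood(t), 0.0, upper, 20000)
post_mean = simpson(lambda t: t * prior(t) * likelihood(t), 0.0, upper, 20000) / f_x

# Closed-form MLE under JC69, for comparison
mle = -0.75 * math.log(1.0 - 4.0 * x / (3.0 * n))

print(f"f(x)           = {f_x:.5e}")
print(f"posterior mean = {post_mean:.5f}")
print(f"MLE            = {mle:.4f}")
```

Because the posterior is concentrated near θ ≈ 0.1, truncating the integral at a moderate upper limit has no visible effect on the result.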

Criticisms of frequentist statistics:

A major Bayesian criticism of classical statistics is that

it does not answer the right question.

Classical methods provide probability statements about

the data or the method for analyzing the data, but not about the parameter, even though the data have

already been observed and our interest is in the parameter.

Suppose x = 9 heads and r = 3 tails are observed in n =

12 independent tosses of a coin, and we wish to test

the null hypothesis H0: θ = ½ against the alternative H1: θ > ½, where θ is the true probability of heads.

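Suppose the number of tosses n is fixed, so that x has a binomial distribution, with probability

$$f(x \mid \theta) = \binom{n}{x}\,\theta^{x}(1 - \theta)^{n - x}$$

The probability of the observed data x = 9 under H0, which is 0.05371, is not the p value. The p value is the probability of data at least as extreme as those observed, P(x ≥ 9 | θ = ½) = 299/4096 ≈ 0.073.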

The p values are also criticized for violating the likelihood

principle, which says that the likelihood function contains

all information in the data about θ and the same inference should be made from two experiments that have the same

likelihood.

Consider a different experimental design, in which the number of tails r was fixed beforehand; in other words, the coin was tossed until r = 3 tails were observed, at which point x = 9 heads were observed. The data x then have a negative binomial distribution, with probability

$$f(x \mid \theta) = \binom{r + x - 1}{x}\,\theta^{x}(1 - \theta)^{n - x}$$

where n − x = r. If we use this model, the p value becomes

$$p = \sum_{j=9}^{\infty} \binom{3 + j - 1}{j} \left(\frac{1}{2}\right)^{j} \left(\frac{1}{2}\right)^{3} = \frac{67}{2048} \approx 0.0327$$

The two likelihoods differ only by a constant factor, yet at the 5% significance level H0 is rejected under the negative binomial design but not under the binomial design.
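The two tail probabilities can be computed exactly with the standard library. A minimal sketch, assuming a one-sided test in which "at least as extreme" means x ≥ 9 heads under both designs; the exact sums are about 0.073 and 0.033.

```python
from math import comb

x, r = 9, 3      # observed heads and tails
n = x + r        # total number of tosses
theta0 = 0.5     # value of theta under H0

# Probability of exactly the observed data under the binomial design
prob_observed = comb(n, x) * theta0**x * (1 - theta0)**(n - x)

# Design 1: n fixed, x binomial. p value = P(X >= 9).
p_binomial = sum(comb(n, k) * theta0**k * (1 - theta0)**(n - k)
                 for k in range(x, n + 1))

# Design 2: r fixed, x negative binomial. P(X >= 9) = 1 - P(X <= 8).
p_negbin = 1.0 - sum(comb(r + k - 1, k) * theta0**k * (1 - theta0)**r
                     for k in range(x))

print(f"P(x = 9)             = {prob_observed:.5f}")
print(f"p value, binomial    = {p_binomial:.4f}")
print(f"p value, neg. binom. = {p_negbin:.4f}")
```

Both designs share the factor θ^x (1 − θ)^r in the likelihood, so any method that obeys the likelihood principle would draw the same conclusion from either experiment.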

Criticisms of Bayesian methods:

All criticisms of Bayesian methods are levied on the prior or the need for it.

Bayesians come in two flavors: the objective and the subjective.

The objective Bayesians consider the prior to be a representation of prior objective information about the

parameter.

The approach runs into trouble when no prior

information is available about the parameter and the prior is supposed to represent total ignorance.

For a continuous parameter, a uniform distribution over the range of the parameter might be assigned.

However, such so-called flat or noninformative priors

lead to contradictions.

If x has a uniform distribution, x² cannot have a uniform distribution.

Similarly, a uniform prior for the probability of different sites p is very different from a uniform prior for

sequence distance θ under the JC69 model.

Such difficulties in representing total ignorance have

caused the objective Bayesian approach to fall out of

favor.
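The JC69 case can be made concrete by simulation. The sketch below assumes the uniform prior on p is taken over (0, 3/4), the range of p under JC69. By a change of variables, that prior implies the density (4/3)e^{−4θ/3} for θ, an exponential distribution with mean 3/4, which is far from flat.

```python
import math
import random

random.seed(1)


def distance_from_p(p):
    """JC69 distance for a proportion p of different sites (0 <= p < 3/4)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Draw p from a uniform prior on (0, 3/4) and transform each draw to theta.
thetas = [distance_from_p(0.75 * random.random()) for _ in range(200_000)]

mean_theta = sum(thetas) / len(thetas)
frac_below_half = sum(t < 0.5 for t in thetas) / len(thetas)

print(f"implied prior mean of theta = {mean_theta:.3f}")
print(f"implied P(theta < 0.5)      = {frac_below_half:.3f}")
```

Under a uniform prior on θ over (0, 100), P(θ < 0.5) would be 0.005, whereas the "equally ignorant" uniform prior on p puts nearly half of its mass there.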

The subjective Bayesians consider the prior to represent the researcher’s subjective belief about the

parameter before seeing or analyzing the data.

One cannot really argue against somebody else’s subjective beliefs, but ‘classical’ statisticians reject the

notion of subjective probabilities and of letting personal

prejudices influence scientific inference.

Even though the choice of the likelihood model involves certain subjectivity as well, the model can nevertheless

be checked against the data, but no such validation is

possible for the prior.

Bayesian phylogenetics

It is straightforward to formulate the problem of

phylogeny reconstruction in the general framework of

Bayesian inference.

Let X be the sequence data. Let θ include all parameters in the substitution model, with a prior distribution f(θ). Let τ_i be the ith tree topology, i = 1, 2, …, T_s, where T_s is the total number of tree topologies for s species.

Usually a uniform prior f(τ_i) = 1/T_s is assumed, although this means nonuniform prior probabilities for clades.

Let b_i be the vector of branch lengths on tree τ_i, with prior probability f(b_i). MrBayes 3 assumes that branch lengths have independent uniform or exponential priors with the parameter (upper bound for the uniform or mean for the exponential) set by the user.

The posterior probability of tree τ_i is then

$$P(\tau_i \mid X) = \frac{\iint f(\theta)\,f(\tau_i \mid \theta)\,f(b_i \mid \theta, \tau_i)\,f(X \mid \theta, \tau_i, b_i)\,db_i\,d\theta}{\sum_{j=1}^{T_s} \iint f(\theta)\,f(\tau_j \mid \theta)\,f(b_j \mid \theta, \tau_j)\,f(X \mid \theta, \tau_j, b_j)\,db_j\,d\theta}$$

This is a direct application of

$$f(\lambda \mid X) = \int f(\lambda, \eta \mid X)\,d\eta,$$

treating τ as the parameter of interest and all other parameters as nuisance parameters.

Note that the denominator, the marginal probability of the data f(X), is a sum over all possible tree topologies and, for each tree topology τ_j, an integral over all branch lengths b_j and substitution parameters θ.
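The structure of this formula, a prior-weighted marginal likelihood for one tree divided by the sum over all trees, can be sketched for a toy case. The log marginal likelihoods below are made-up numbers standing in for the integrals over branch lengths and substitution parameters, which are assumed to have been evaluated already; the tree labels are hypothetical.

```python
import math

# Hypothetical log marginal likelihoods, one per rooted topology for
# s = 3 species (T_s = 3). The numbers are invented for illustration.
log_marginal_lik = {
    "((A,B),C)": -1000.0,
    "((A,C),B)": -1003.0,
    "((B,C),A)": -1004.5,
}

# Uniform prior over topologies: f(tau_i) = 1 / T_s
log_prior = -math.log(len(log_marginal_lik))

log_joint = {t: log_prior + ll for t, ll in log_marginal_lik.items()}

# log f(X) by log-sum-exp: subtract the maximum before exponentiating,
# since exp(-1000) underflows to zero in double precision.
m = max(log_joint.values())
log_f_x = m + math.log(sum(math.exp(v - m) for v in log_joint.values()))

posterior = {t: math.exp(v - log_f_x) for t, v in log_joint.items()}

for tree, prob in posterior.items():
    print(f"{tree}: {prob:.4f}")
```

In real analyses neither the integrals nor the sum over topologies can be computed directly, so the posterior is approximated by sampling.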

Bayesian versus Likelihood

In terms of computational efficiency, stochastic tree search using the program MrBayes appears to be more efficient than heuristic tree search under likelihood using the PAUP program.

Nevertheless, the running time of the MCMC algorithm

is proportional to the number of iterations the algorithm

is run for. In general, longer chains are needed to achieve convergence in larger data sets due to the

increased number of parameters to be averaged over.

However, many users run shorter chains for larger data sets because larger trees require more computation per

iteration. As a result, it is not always clear whether the

MCMC algorithm has converged in analyses of large data sets.
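To make the cost structure concrete, here is a minimal Metropolis sampler for the one-parameter JC69 distance example (x = 90, n = 948, exponential prior with mean 0.2). It is a sketch under assumed settings (sliding-window proposal of half-width 0.03, 50,000 iterations, burn-in of 5,000) and is not the algorithm used by MrBayes. Each iteration costs one likelihood evaluation, so running time grows linearly with chain length.

```python
import math
import random

x, n, mu = 90, 948, 0.2


def log_posterior(theta):
    """Unnormalized log posterior: log prior + log likelihood."""
    if theta <= 0.0:
        return -math.inf
    p = 0.75 - 0.75 * math.exp(-4.0 * theta / 3.0)
    if p <= 0.0:
        return -math.inf
    return -theta / mu + x * math.log(p) + (n - x) * math.log(1.0 - p)


def metropolis(n_iter, step, theta0, seed):
    rng = random.Random(seed)
    theta, lp = theta0, log_posterior(theta0)
    samples = []
    for _ in range(n_iter):
        # Symmetric sliding-window proposal
        proposal = theta + rng.uniform(-step, step)
        lp_new = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            theta, lp = proposal, lp_new
        samples.append(theta)
    return samples


samples = metropolis(n_iter=50_000, step=0.03, theta0=0.5, seed=42)
burn_in = 5_000  # assumed burn-in; discard early, non-converged samples
kept = samples[burn_in:]
post_mean_mcmc = sum(kept) / len(kept)
print(f"MCMC estimate of posterior mean = {post_mean_mcmc:.4f}")
```

With a single parameter the chain converges quickly from a poor starting value. With a tree topology, branch lengths, and substitution parameters to average over, both the cost per iteration and the number of iterations needed are much larger.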

Furthermore, significant improvements to heuristic tree

search under likelihood are being made.

It seems that for obtaining a point estimate, likelihood

heuristic search using numerical optimization can be faster than Bayesian stochastic search using MCMC.

However, no one knows how to use the information in

the likelihood tree search to attach a confidence interval

or some other measure of the sampling error in the ML tree.

As a result, one must currently resort to bootstrapping.

Bootstrapping under likelihood is an expensive

procedure, and appears slower than Bayesian MCMC.
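The cost of bootstrapping comes from repeating the whole estimation on every resampled data set. The sketch below uses a much simpler stand-in for a tree search: it bootstraps the sites of the two-sequence JC69 example and re-estimates the distance each time. The number of replicates (1000) and the percentile interval are assumptions made for illustration.

```python
import math
import random

x, n = 90, 948  # differences and sites, from the JC69 example


def jc69_distance(diffs, sites):
    """Closed-form ML estimate of the JC69 distance."""
    return -0.75 * math.log(1.0 - 4.0 * diffs / (3.0 * sites))


rng = random.Random(0)
sites = [1] * x + [0] * (n - x)  # 1 = different site, 0 = identical site

replicates = []
for _ in range(1000):
    resampled = rng.choices(sites, k=n)        # sample sites with replacement
    replicates.append(jc69_distance(sum(resampled), n))

replicates.sort()
lower, upper = replicates[24], replicates[974]  # approx. 2.5% and 97.5% points
print(f"bootstrap interval for the distance: ({lower:.4f}, {upper:.4f})")
```

Here each replicate is a closed-form calculation. In phylogenetics each replicate requires a full heuristic tree search under likelihood, which is what makes the procedure expensive.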

To many, Bayesian inference of molecular phylogenies

enjoys a theoretical advantage over ML with

bootstrapping.

Posterior probability for a tree or clade has an easy interpretation: it is the probability that the tree or clade

is correct given the data, model, and prior.

In contrast, the interpretation of the bootstrap in

phylogenetics has been controversial.

It has been noted that Bayesian posterior probabilities

calculated from real data sets are often extremely high.

One may observe that while bootstrap support values

are published only if they are > 50% (as otherwise the relationships may not be considered trustworthy),

posterior clade probabilities are sometimes reported only if they are < 100% (as most of them are 100%!).

The difference between the two measures of support

does not itself suggest anything inappropriate about the

Bayesian probabilities, especially given the difficulties in the interpretation of the bootstrap.

However, it has been observed that different models

may produce conflicting trees when applied to the same data, each with high posterior probabilities.

Similarly, different genes for the same set of species can produce conflicting trees or clades, again each with

high posterior probabilities.

Bayesian posterior probability for a tree or clade is the

probability that the tree or clade is true given the data,

the likelihood model and the prior.

Thus there can be only three possible reasons for spuriously high clade probabilities:

1. Computer program bugs or problems in running the MCMC algorithms

2. Misspecification of the likelihood (substitution) model

3. Misspecification and sensitivity of the prior

Note that high posterior probabilities were observed in

simulated data sets where the substitution model is

correct and in analyses of small data sets that did not use MCMC.

In those cases, the first two factors do not apply.

The third factor, the sensitivity of Bayesian inference to

prior specification, is more fundamental and difficult to

deal with.

Assume independent exponential priors with means µ0

and µ1 for internal and external branch lengths, respectively.

The posterior probabilities of trees might be unduly

influenced by the prior mean µ0 on the internal branch lengths.

It is easy to see that high posterior probabilities for

trees will decrease if µ0 is small; if µ0 = 0, all trees and clades will have posterior probabilities near zero.

It was observed that in large data sets, the posterior

clade probabilities are sensitive to µ0 only if µ0 is very small.

In an analysis of 40 land plant species, the sensitive region was found to be (10^-5, 10^-3). Such branch

lengths seem unrealistically small if we consider estimated internal branch lengths in published trees.

However, branch lengths in wrong or poorly supported

trees are typically small and often zero.

As the prior is specified to represent our prior

knowledge of internal branch lengths in all binary trees, the majority of which are wrong or poor trees, a very

small µ0 appears necessary.

While posterior clade probabilities are sensitive to the

mean of the prior for internal branch lengths, it is in

general unclear how to formulate sensible priors that are acceptable to most biologists.

The problem merits further investigation.
