
Probability, Statistics and Errors in High Energy Physics

Wen-Chen Chang, Institute of Physics, Academia Sinica

Outline

• Errors

• Probability distribution: Binomial, Poisson, Gaussian

• Confidence Level

• Monte Carlo Method

Why do we do experiments?

1. Parameter determination: determine the numerical value of some physical quantity.

2. Hypothesis testing: test whether a particular theory is consistent with our data.

Why estimate errors?

• We are concerned not only with the answer but also with its accuracy.

• For example, the speed of light is 2.998×10⁸ m/sec. Compare three measurements:
– (3.09±0.15)×10⁸: consistent with the known value (about 0.6σ away).
– (3.09±0.01)×10⁸: about 9σ away; either a discovery or, more likely, a mistake.
– (3.09±2)×10⁸: consistent, but the error is too large to be useful.

Source of Errors

• Random (statistical) error: the inability of any measuring device to give infinitely accurate answers.

• Systematic error: uncertainty arising from systematic effects in the measurement (see next slide).

Systematic Errors

Systematic effects are a general category which includes effects such as background, scanning efficiency, energy resolution, angle resolution, variation of counter efficiency with beam position and energy, dead time, etc. The uncertainty in the estimation of such a systematic effect is called a systematic error.

(Orear)

Systematic error: reproducible inaccuracy introduced by faulty equipment, calibration, or technique.

(Bevington)

Error = mistake? Error = uncertainty?

Experimental Examples

• Energy in a calorimeter: E = aD + b, with a and b determined by a calibration experiment.

• Branching ratio: B = N/(εN_T), with the efficiency ε found from Monte Carlo studies.

• Steel rule calibrated at 15°C but used in a warm lab:
  If not spotted, this is a mistake.
  If the temperature is measured, it is not a problem.
  If the temperature is not measured, guess the uncertainty; repeating measurements doesn't help.

[Figure: binomial distribution P(r) vs r]

The Binomial

n trials, r successes; individual success probability p.

$$P(r; n, p) = \frac{n!}{r!\,(n-r)!}\; p^r (1-p)^{n-r}$$

Mean: $\mu = \langle r \rangle = \sum_r r\,P(r) = np$

Variance: $V = \langle (r-\mu)^2 \rangle = \langle r^2 \rangle - \langle r \rangle^2 = np(1-p) = npq$, with $q \equiv 1-p$

A random process with exactly two possible outcomes which occur with fixed probabilities.
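As a quick numerical check (not from the original slides), a minimal Python sketch that builds the PMF from this formula for the arbitrary choice n = 10, p = 0.2 and verifies the mean and variance against np and np(1−p):

```python
# A minimal sketch: binomial PMF, mean and variance from first principles.
from math import comb

def binomial_pmf(r, n, p):
    """P(r; n, p) = C(n, r) * p^r * (1-p)^(n-r)"""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.2
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]
mean = sum(r * P for r, P in enumerate(pmf))              # n*p = 2.0
var = sum(r**2 * P for r, P in enumerate(pmf)) - mean**2  # n*p*(1-p) = 1.6
print(mean, var)
```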

Binomial Examples

[Figures: binomial distributions P(r) vs r for n = 10 with p = 0.2, 0.5, 0.8, and for p = 0.1 with n = 5, 20, 50]

Poisson

'Events in a continuum': the probability of observing r independent events in a time interval t, when the counting rate is λ and the expected number of events in the time interval is μ = λt.

[Figure: Poisson distribution P(r) vs r for μ = 2.5]

$$P(r; \mu) = \frac{e^{-\mu}\,\mu^{r}}{r!}, \qquad \mu = \lambda t$$

Mean: $\langle r \rangle = \sum_r r\,P(r) = \mu$

Variance: $V = \langle (r-\mu)^2 \rangle = \langle r^2 \rangle - \langle r \rangle^2 = \mu$

Limit of the binomial: $N \to \infty$, $p \to 0$, with $Np = \lambda t$ held constant.

More about Poisson

• The approach of the binomial to the Poisson distribution as N increases (see the sketch after this list).

• The mean value of r for a variable with a Poisson distribution is μ, and so is the variance. This is the basis of the well-known $\sqrt{n}$ formula that applies to statistical errors in many situations involving the counting of independent events during a fixed interval.

• As $\mu \to \infty$, the Poisson distribution tends to a Gaussian one.
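A minimal sketch of that binomial-to-Poisson limit, holding μ = Np fixed (μ = 2.5 is an arbitrary choice) while N grows:

```python
# A minimal sketch: the binomial approaches the Poisson as N grows
# with mu = N*p held fixed.
from math import comb, exp, factorial

def binomial_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r, mu):
    return exp(-mu) * mu**r / factorial(r)

mu = 2.5
for n in (10, 100, 1000):
    p = mu / n
    diff = max(abs(binomial_pmf(r, n, p) - poisson_pmf(r, mu)) for r in range(10))
    print(f"N={n:5d}  max |binomial - Poisson| = {diff:.5f}")
```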

Poisson Examples

[Figures: Poisson distributions P(r) vs r for μ = 0.5, 1.0, 2.0, 5.0, 10, 25]

Examples

• The number of particles detected by a counter in a time t, in a situation where the particle flux and detector are independent of time, and where the counter dead time τ is such that λτ ≪ 1.

• The number of interactions produced in a thin target when an intense pulse of N beam particles is incident on it.

• The number of entries in a given bin of a histogram when the data are accumulated over a fixed time interval.

Binomial and Poisson

From an exam paper: A student is standing by the road, hoping to hitch a lift. Cars pass according to a Poisson distribution with a mean frequency of 1 per minute. The probability of an individual car giving a lift is 1%. Calculate the probability that the student is still waiting for a lift

(a) after 60 cars have passed;

(b) after 1 hour.

a) Binomial, zero successes in 60 trials: $0.99^{60} = 0.5472$.

b) Poisson with $\mu = 60 \times 0.01 = 0.6$: $e^{-0.6}\,0.6^{0}/0! = 0.5488$.
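Both answers are easy to check numerically; a minimal sketch:

```python
# Checking the exam answers numerically.
from math import exp

p_lift, n_cars, rate_per_min, minutes = 0.01, 60, 1.0, 60

# (a) Binomial: zero lifts from exactly 60 passing cars.
print((1 - p_lift) ** n_cars)         # 0.5472...

# (b) Poisson: expected number of lift-giving cars in an hour.
mu = rate_per_min * minutes * p_lift  # = 0.6
print(exp(-mu))                       # P(0) = 0.5488...
```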

Gaussian (Normal)

Probability density:

$$P(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-(x-\mu)^2/2\sigma^2}$$

Mean: $\langle x \rangle = \int x\,P(x)\,dx = \mu$

Variance: $V = \langle (x-\mu)^2 \rangle = \langle x^2 \rangle - \langle x \rangle^2 = \sigma^2$

Different Gaussians

There's only one!

• Normalisation $1/\sigma\sqrt{2\pi}$ (if required)

• Location changed by $\mu$

• Width scaled by $\sigma$

• Falls to 1/e of its peak at $x = \mu \pm \sqrt{2}\,\sigma$

Probability Contents

68.27% within 1σ; 95.45% within 2σ; 99.73% within 3σ

90% within 1.645σ; 95% within 1.960σ; 99% within 2.576σ; 99.9% within 3.290σ

These numbers apply to Gaussians, and only Gaussians. Other distributions have equivalent values which you could use if you wanted.
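The coverages follow from the error function, since P(|x − μ| < kσ) = erf(k/√2); a minimal sketch reproducing the table:

```python
# A minimal sketch: Gaussian coverage probabilities from the error function.
from math import erf, sqrt

for k in (1.0, 2.0, 3.0, 1.645, 1.960, 2.576, 3.290):
    print(f"within {k:.3f} sigma: {100 * erf(k / sqrt(2)):.2f}%")
```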

Central Limit Theorem

Or: why is the Gaussian "normal"?

If a variable x is produced by the convolution (sum) of variables x₁, x₂, …, x_N:

I) $\langle x \rangle = \mu_1 + \mu_2 + \dots + \mu_N$

II) $V(x) = V_1 + V_2 + \dots + V_N$

III) P(x) becomes Gaussian for large N
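A minimal sketch of point III, summing N uniform variates (each has μ = 1/2 and V = 1/12) and checking points I and II along the way; the sample size and seed are arbitrary choices:

```python
# A minimal sketch of the CLT: sums of N uniforms approach a Gaussian.
import random

random.seed(42)

def sums_of_uniforms(n_vars, n_samples=100_000):
    return [sum(random.random() for _ in range(n_vars)) for _ in range(n_samples)]

for n in (1, 2, 12):
    s = sums_of_uniforms(n)
    mean = sum(s) / len(s)
    var = sum((x - mean) ** 2 for x in s) / len(s)
    # Expect mean = n/2 and variance = n/12.
    print(f"N={n:2d}: mean={mean:.3f} (exp {n/2:.3f}), var={var:.4f} (exp {n/12:.4f})")
```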

Multidimensional Gaussian

With correlation ρ:

$$P(x,y;\mu_x,\mu_y,\sigma_x,\sigma_y,\rho) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\!\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(x-\mu_x)^2}{\sigma_x^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} + \frac{(y-\mu_y)^2}{\sigma_y^2}\right)\right]$$

Without correlation (ρ = 0), it factorises:

$$P(x,y;\mu_x,\mu_y,\sigma_x,\sigma_y) = \frac{1}{2\pi\sigma_x\sigma_y}\; e^{-(x-\mu_x)^2/2\sigma_x^2}\; e^{-(y-\mu_y)^2/2\sigma_y^2}$$

Chi squared

Sum of squared discrepancies, scaled by expected error

Integrate all but 1-D of multi-D Gaussian

$$\chi^2 = \sum_{i=1}^{n} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2$$

$$P(\chi^2; n) = \frac{(\chi^2)^{n/2-1}\; e^{-\chi^2/2}}{2^{n/2}\,\Gamma(n/2)}$$

where n is the number of degrees of freedom.
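A minimal sketch of the χ² sum, with made-up measurements, predictions, and errors purely for illustration:

```python
# A minimal sketch: chi-squared of measurements against predictions.
measurements = [1.1, 1.9, 3.2, 4.1]   # x_i (made up)
predictions  = [1.0, 2.0, 3.0, 4.0]   # mu_i (made up)
errors       = [0.1, 0.1, 0.2, 0.1]   # sigma_i (made up)

chi2 = sum(((x - mu) / sig) ** 2
           for x, mu, sig in zip(measurements, predictions, errors))
print(chi2)  # compare with n = 4 degrees of freedom if nothing was fitted
```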

About Estimation

Probability calculus: Theory → Data. Given these distribution parameters, what can we say about the data?

Statistical inference: Data → Theory. Given this data, what can we say about the properties or parameters or correctness of the distribution functions?

What is an estimator?

An estimator (written with a hat) is a function of the data whose value, the estimate, is intended as a meaningful guess for the value of the parameter. (from PDG)

Examples for the mean and the variance:

$$\hat\mu\{x_i\} = \frac{1}{N}\sum_i x_i \qquad\qquad \hat\mu\{x_i\} = \frac{x_{max}+x_{min}}{2}$$

$$\hat V\{x_i\} = \frac{1}{N}\sum_i (x_i-\hat\mu)^2 \qquad\qquad \hat V\{x_i\} = \frac{1}{N-1}\sum_i (x_i-\hat\mu)^2$$
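A minimal sketch contrasting the two variance estimators above: sampling repeatedly from a known Gaussian shows the 1/N form is biased low by a factor (N−1)/N (all numbers here are arbitrary choices):

```python
# A minimal sketch: biased (1/N) vs unbiased (1/(N-1)) variance estimators.
import random

random.seed(1)
TRUE_MU, TRUE_SIGMA, N = 0.0, 1.0, 5

biased, unbiased = [], []
for _ in range(20_000):
    x = [random.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(N)]
    mu_hat = sum(x) / N
    s2 = sum((xi - mu_hat) ** 2 for xi in x)
    biased.append(s2 / N)
    unbiased.append(s2 / (N - 1))

# The 1/N estimator averages to (N-1)/N * sigma^2 = 0.8; 1/(N-1) to 1.0.
print(sum(biased) / len(biased), sum(unbiased) / len(unbiased))
```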

What is a good estimator?

A perfect estimator is:

• Consistent: $\lim_{N\to\infty} \hat a = a$

• Unbiased: $\langle \hat a \rangle = \int \hat a(x_1,\dots,x_N)\,P(x_1;a)\cdots P(x_N;a)\,dx_1\cdots dx_N = a$

• Efficient: $V(\hat a) = \langle (\hat a - \langle \hat a \rangle)^2 \rangle$ is minimum, saturating the Minimum Variance Bound

$$V(\hat a) \;\ge\; \frac{1}{\left\langle -\,d^2\ln L/da^2 \right\rangle}$$

One often has to work with less-than-perfect estimators.

The Likelihood Function

Set of data {x1, x2, x3, …xN}

Each x may be multidimensional – never mind

Probability depends on some parameter a

a may be multidimensional – never mind

Total probability (density):

$$P(x_1;a)\,P(x_2;a)\,P(x_3;a)\cdots P(x_N;a) = L(x_1, x_2, x_3, \dots, x_N; a)$$

the likelihood.

Maximum Likelihood Estimation

In practice usually maximise ln L as it’s easier to calculate and handle; just add the ln P(xi)

ML has lots of nice properties

Given data {x₁, x₂, x₃, …, x_N}, estimate a by maximising the likelihood L(x₁, x₂, x₃, …, x_N; a):

$$\left.\frac{dL}{da}\right|_{a=\hat a} = 0$$

[Figure: ln L vs a, with the maximum at â]
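A minimal sketch of the procedure for a standard textbook case (an exponential decay time, not an example from the slides): generate data with a known lifetime τ, then scan ln L on a grid to find the maximum:

```python
# A minimal sketch of maximum likelihood: estimate the lifetime tau of an
# exponential, P(t; tau) = exp(-t/tau)/tau, by scanning ln L on a grid.
import math, random

random.seed(7)
TRUE_TAU = 2.0
data = [random.expovariate(1 / TRUE_TAU) for _ in range(1000)]

def ln_l(tau):
    # ln L(tau) = sum_i ln P(t_i; tau)
    return sum(-t / tau - math.log(tau) for t in data)

taus = [1.0 + 0.001 * i for i in range(2000)]
tau_hat = max(taus, key=ln_l)
# For the exponential the ML estimate is the sample mean; the scan agrees.
print(tau_hat, sum(data) / len(data))
```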

Properties of ML estimation

• It's consistent (no big deal)

• It's biased for small N. May need to worry.

• It is efficient for large N: saturates the Minimum Variance Bound.

• It is invariant: if you switch to using u(a), then û = u(â).

[Figures: ln L vs a with maximum at â; ln L vs u with maximum at û = u(â)]

More about ML

• It is not ‘right’. Just sensible.

• It does not give the ‘most likely value of a’. It’s the value of a for which this data is most likely.

• Numerical Methods are often needed

• Maximisation / Minimisation in >1 variable is not easy

• Use MINUIT but remember the minus sign

ML does not give goodness-of-fit

• ML will not complain if your assumed P(x;a) is rubbish

• The value of L tells you nothing

Example: fit P(x) = a₁x + a₀ to data that are actually flat. The fit will give a₁ = 0, a constant P, and L = a₀^N: just what you would get from fitting the true distribution.

Least Squares

• Measurements of y at various x, with errors σ, and a prediction f(x; a)

• Probability: $P(y) \propto e^{-(y - f(x;a))^2/2\sigma^2}$

• $\ln L = -\frac{1}{2}\sum_i \left(\frac{y_i - f(x_i;a)}{\sigma_i}\right)^2 + \mathrm{const}$

• To maximise ln L, minimise $\chi^2 = \sum_i \left(\frac{y_i - f(x_i;a)}{\sigma_i}\right)^2$

[Figure: y vs x data points with fitted curve f(x; a)]

So ML ‘proves’ Least Squares. But what ‘proves’ ML? Nothing
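As an illustration, a minimal sketch of a weighted straight-line fit y = ax + b that minimises χ² analytically via the standard normal equations; the data points are made up:

```python
# A minimal sketch: weighted least-squares fit of y = a*x + b.
xs   = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up data
ys   = [2.1, 3.9, 6.2, 8.0, 9.8]
sigs = [0.2, 0.2, 0.2, 0.2, 0.2]

w   = [1 / s**2 for s in sigs]
S   = sum(w)
Sx  = sum(wi * x for wi, x in zip(w, xs))
Sy  = sum(wi * y for wi, y in zip(w, ys))
Sxx = sum(wi * x * x for wi, x in zip(w, xs))
Sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))

d = S * Sxx - Sx**2
a = (S * Sxy - Sx * Sy) / d       # slope
b = (Sxx * Sy - Sx * Sxy) / d     # intercept
chi2 = sum(wi * (y - (a * x + b)) ** 2 for wi, x, y in zip(w, xs, ys))
print(a, b, chi2)  # compare chi2 with 5 data points - 2 parameters = 3 d.o.f.
```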

Least Squares: The Really Nice Thing

• Should get χ² ≈ 1 per data point.

• Minimising χ² makes it smaller – the effect is 1 unit of χ² for each parameter adjusted. (Dimensionality of the multi-D Gaussian is decreased by 1.)

N_degrees of freedom = N_data points − N_parameters

• Provides a 'goodness of agreement' figure which allows a credibility check.

Chi Squared Results

Large χ² comes from

1. Bad Measurements

2. Bad Theory

3. Underestimated errors

4. Bad luck

Small χ² comes from

1. Overestimated errors

2. Good luck

Fitting Histograms

Often put {xᵢ} into bins. The data are then {nⱼ}, with nⱼ given by a Poisson of mean $f(x_j) = P(x_j)\,\Delta x$.

Four techniques:

• Full ML

• Binned ML

• Proper χ²

• Simple χ²

[Figure: histogram of binned data with fitted curve]

What you maximise/minimise

• Full ML: $\ln L = \sum_i \ln P(x_i; a)$

• Binned ML: $\ln L = \sum_j \ln \mathrm{Poisson}(n_j; f_j) = \sum_j \left( n_j \ln f_j - f_j \right) + \mathrm{const}$

• Proper χ²: $\chi^2 = \sum_j \frac{(n_j - f_j)^2}{f_j}$

• Simple χ²: $\chi^2 = \sum_j \frac{(n_j - f_j)^2}{n_j}$
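A minimal sketch evaluating the last three objectives for one set of made-up bin contents and model predictions (full ML needs the unbinned {xᵢ}, so it is omitted):

```python
# A minimal sketch: binned-fit objective functions for given n_j and f_j.
import math

n = [12, 25, 31, 18, 9]             # observed bin contents n_j (made up)
f = [10.0, 24.0, 33.0, 20.0, 8.0]   # model predictions f_j (made up)

binned_ml   = sum(nj * math.log(fj) - fj for nj, fj in zip(n, f))  # maximise
proper_chi2 = sum((nj - fj) ** 2 / fj for nj, fj in zip(n, f))     # minimise
simple_chi2 = sum((nj - fj) ** 2 / nj for nj, fj in zip(n, f))     # minimise
print(binned_ml, proper_chi2, simple_chi2)
```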

Confidence Level: Meaning of Error Estimates

• How often do we expect to include the true fixed value of our parameter, P₀, within our quoted range $p \pm \sigma_p$, for a repeated series of experiments?

• For the actual value P₀, the probability that a measurement will give us an answer in a specific range of p is given by the area under the relevant part of the Gaussian curve. A conventional choice of this probability is 68%.

The Straightforward Example

Apples of different weights: we need to describe the distribution.

μ = 68 g, σ = 17 g

[Figure: weight distribution, with 50 g and 100 g marked]

• All weights lie between 24 and 167 g (tolerance)

• 90% lie between 50 and 100 g

• 94% are less than 100 g

• 96% are more than 50 g

These are all confidence level statements.

Confidence Levels

• Can quote at any level (68%, 95%, 99%, …)

• Upper or lower or two-sided (x < U; L < x; L < x < U)

• Two-sided has a further choice (central, shortest, …)

[Figure: one-sided and two-sided intervals, with bounds L, U, U']

Maximum Likelihood and Confidence Levels

The ML estimator (large N) has variance given by the MVB:

$$V(\hat a) = \frac{1}{\left\langle -\,d^2\ln L/da^2 \right\rangle}$$

At the peak, $d\ln L/da = 0$; for large N, ln L is a parabola (L is a Gaussian). Expanding about the peak:

$$\ln L(a) = \ln L_{max} + \frac{1}{2}\left.\frac{d^2\ln L}{da^2}\right|_{\hat a}(a-\hat a)^2 = \ln L_{max} - \frac{(a-\hat a)^2}{2\,\sigma_{\hat a}^2}$$

ln L falls by 1/2 at $a = \hat a \pm \sigma_{\hat a}$, and by 2 at $a = \hat a \pm 2\sigma_{\hat a}$.

Read off 68% , 95% confidence regions
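A minimal sketch of reading off the 68% region, reusing the exponential-lifetime likelihood from the earlier sketch: find where ln L has fallen by 1/2 from its maximum:

```python
# A minimal sketch: 68% interval from Delta(ln L) = 1/2.
import math, random

random.seed(7)
data = [random.expovariate(1 / 2.0) for _ in range(1000)]  # true tau = 2.0

def ln_l(tau):
    return sum(-t / tau - math.log(tau) for t in data)

taus = [1.5 + 0.001 * i for i in range(1000)]
lnls = [ln_l(t) for t in taus]
ln_max = max(lnls)
inside = [t for t, l in zip(taus, lnls) if l >= ln_max - 0.5]
print(f"tau in [{min(inside):.3f}, {max(inside):.3f}] at 68% CL")
```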

Monte Carlo Calculations

• The Monte Carlo approach provides a method of solving probability theory problems in situations where the necessary integrals are too difficult to perform.

• Crucial element: random number generator.

An Example

To evaluate $I = \int_a^b y(x)\,dx$, approximate

$$I \approx \frac{b-a}{n}\sum_{i=1}^{n} y(x_i)$$

with the sample points chosen either as

1. $x_i = a + (i - 0.5)\,(b-a)/n$ (a regular grid), or

2. $x_i = a + (b-a)\,r_i$, where the $r_i$ are members of a series of random numbers uniformly distributed in the range 0–1 (Monte Carlo).
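A minimal sketch of both recipes, with y(x) = x² on [0, 1] (exact answer 1/3) as an arbitrary test integrand:

```python
# A minimal sketch: grid vs Monte Carlo estimates of an integral.
import random

def y(x):
    return x * x

a, b, n = 0.0, 1.0, 100_000
random.seed(3)

grid = (b - a) / n * sum(y(a + (i - 0.5) * (b - a) / n) for i in range(1, n + 1))
mc   = (b - a) / n * sum(y(a + (b - a) * random.random()) for _ in range(n))
print(grid, mc)  # both approach 1/3; the MC error scales as 1/sqrt(n)
```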

References

• Lectures and Notes on Statistics in HEP, http://www.ep.ph.bham.ac.uk//group/locdoc/lectures/stats/index.html

• Lecture notes of Prof. Roger Barlow, http://www.hep.man.ac.uk/u/roger/

• Louis Lyons, "Statistics for Nuclear and Particle Physicists", Cambridge University Press, 1986.

• Particle Data Group, http://pdg.lbl.gov/2004/reviews/contents_sports.html#mathtoolsetc