
Probability, Statistics and Errors in High Energy Physics

Wen-Chen Chang, Institute of Physics, Academia Sinica

Outline

• Errors

• Probability distribution: Binomial, Poisson, Gaussian

• Confidence Level

• Monte Carlo Method

Why do we do experiments?

1. Parameter determination: determine the numerical value of some physical quantity.

2. Hypothesis testing: test whether a particular theory is consistent with our data.

Why estimate errors?

• We are concerned not only with the answer but also with its accuracy.

• For example, the speed of light is 2.998×10⁸ m/sec. Compare three measurements:
– (3.09±0.15)×10⁸: consistent with the known value (about 0.6σ away).
– (3.09±0.01)×10⁸: about 9σ away; either a discovery or, more likely, a mistake.
– (3.09±2)×10⁸: consistent, but the error is too large to be useful.

Source of Errors

• Random (statistical) error: the inability of any measuring device to give infinitely accurate answers.

• Systematic error: uncertainty arising from systematic effects in the measurement (see next slide).

Systematic Errors

Systematic effects are a general category which includes effects such as background, scanning efficiency, energy resolution, angle resolution, variation of counter efficiency with beam position and energy, dead time, etc. The uncertainty in the estimation of such a systematic effect is called a systematic error.

(Orear)

Systematic error: reproducible inaccuracy introduced by faulty equipment, calibration, or technique.

(Bevington)

Error = mistake? Error = uncertainty?

Experimental Examples

• Energy in a calorimeter: E = aD + b, with a and b determined by a calibration experiment.

• Branching ratio: B = N/(εN_T), with the efficiency ε found from Monte Carlo studies.

• Steel rule calibrated at 15°C but used in a warm lab:
  If not spotted, this is a mistake.
  If the temperature is measured, it is not a problem.
  If the temperature is not measured, guess the uncertainty; repeating measurements doesn't help.

[Figure: binomial distribution P(r) vs r]

The Binomial

n trials, r successes; individual success probability p.

$$P(r; n, p) = \frac{n!}{r!\,(n-r)!}\; p^r (1-p)^{n-r}$$

Mean: $\mu = \langle r \rangle = \sum_r r\,P(r) = np$

Variance: $V = \langle (r-\mu)^2 \rangle = \langle r^2 \rangle - \langle r \rangle^2 = np(1-p) = npq$, with $q \equiv 1-p$

A random process with exactly two possible outcomes which occur with fixed probabilities.
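As a quick numerical check (not from the original slides), a minimal Python sketch that builds the PMF from this formula for the arbitrary choice n = 10, p = 0.2 and verifies the mean and variance against np and np(1−p):

```python
# A minimal sketch: binomial PMF, mean and variance from first principles.
from math import comb

def binomial_pmf(r, n, p):
    """P(r; n, p) = C(n, r) * p^r * (1-p)^(n-r)"""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.2
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]
mean = sum(r * P for r, P in enumerate(pmf))              # n*p = 2.0
var = sum(r**2 * P for r, P in enumerate(pmf)) - mean**2  # n*p*(1-p) = 1.6
print(mean, var)
```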

Binomial Examples

[Figures: binomial distributions P(r) vs r for n = 10 with p = 0.2, 0.5, 0.8, and for p = 0.1 with n = 5, 20, 50]

Poisson

'Events in a continuum': the probability of observing r independent events in a time interval t, when the counting rate is λ and the expected number of events in the time interval is μ = λt.

[Figure: Poisson distribution P(r) vs r for μ = 2.5]

$$P(r; \mu) = \frac{e^{-\mu}\,\mu^{r}}{r!}, \qquad \mu = \lambda t$$

Mean: $\langle r \rangle = \sum_r r\,P(r) = \mu$

Variance: $V = \langle (r-\mu)^2 \rangle = \langle r^2 \rangle - \langle r \rangle^2 = \mu$

Limit of the binomial: $N \to \infty$, $p \to 0$, with $Np = \lambda t$ held constant.

More about Poisson

• The approach of the binomial to the Poisson distribution as N increases (see the sketch after this list).

• The mean value of r for a variable with a Poisson distribution is μ, and so is the variance. This is the basis of the well-known $\sqrt{n}$ formula that applies to statistical errors in many situations involving the counting of independent events during a fixed interval.

• As $\mu \to \infty$, the Poisson distribution tends to a Gaussian one.
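A minimal sketch of that binomial-to-Poisson limit, holding μ = Np fixed (μ = 2.5 is an arbitrary choice) while N grows:

```python
# A minimal sketch: the binomial approaches the Poisson as N grows
# with mu = N*p held fixed.
from math import comb, exp, factorial

def binomial_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r, mu):
    return exp(-mu) * mu**r / factorial(r)

mu = 2.5
for n in (10, 100, 1000):
    p = mu / n
    diff = max(abs(binomial_pmf(r, n, p) - poisson_pmf(r, mu)) for r in range(10))
    print(f"N={n:5d}  max |binomial - Poisson| = {diff:.5f}")
```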

Poisson Examples

[Figures: Poisson distributions P(r) vs r for μ = 0.5, 1.0, 2.0, 5.0, 10, 25]

Examples

• The number of particles detected by a counter in a time t, in a situation where the particle flux and detector are independent of time, and where the counter dead time τ is such that λτ ≪ 1.

• The number of interactions produced in a thin target when an intense pulse of N beam particles is incident on it.

• The number of entries in a given bin of a histogram when the data are accumulated over a fixed time interval.

Binomial and Poisson

From an exam paper: A student is standing by the road, hoping to hitch a lift. Cars pass according to a Poisson distribution with a mean frequency of 1 per minute. The probability of an individual car giving a lift is 1%. Calculate the probability that the student is still waiting for a lift

(a) after 60 cars have passed;

(b) after 1 hour.

a) Binomial, zero successes in 60 trials: $0.99^{60} = 0.5472$.

b) Poisson with $\mu = 60 \times 0.01 = 0.6$: $e^{-0.6}\,0.6^{0}/0! = 0.5488$.
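Both answers are easy to check numerically; a minimal sketch:

```python
# Checking the exam answers numerically.
from math import exp

p_lift, n_cars, rate_per_min, minutes = 0.01, 60, 1.0, 60

# (a) Binomial: zero lifts from exactly 60 passing cars.
print((1 - p_lift) ** n_cars)         # 0.5472...

# (b) Poisson: expected number of lift-giving cars in an hour.
mu = rate_per_min * minutes * p_lift  # = 0.6
print(exp(-mu))                       # P(0) = 0.5488...
```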

Gaussian (Normal)

Probability density:

$$P(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-(x-\mu)^2/2\sigma^2}$$

Mean: $\langle x \rangle = \int x\,P(x)\,dx = \mu$

Variance: $V = \langle (x-\mu)^2 \rangle = \langle x^2 \rangle - \langle x \rangle^2 = \sigma^2$

Different Gaussians

There's only one!

• Normalisation $1/\sigma\sqrt{2\pi}$ (if required)

• Location changed by $\mu$

• Width scaled by $\sigma$

• Falls to 1/e of its peak at $x = \mu \pm \sqrt{2}\,\sigma$

Probability Contents

68.27% within 1σ; 95.45% within 2σ; 99.73% within 3σ

90% within 1.645σ; 95% within 1.960σ; 99% within 2.576σ; 99.9% within 3.290σ

These numbers apply to Gaussians, and only Gaussians. Other distributions have equivalent values which you could use if you wanted.
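The coverages follow from the error function, since P(|x − μ| < kσ) = erf(k/√2); a minimal sketch reproducing the table:

```python
# A minimal sketch: Gaussian coverage probabilities from the error function.
from math import erf, sqrt

for k in (1.0, 2.0, 3.0, 1.645, 1.960, 2.576, 3.290):
    print(f"within {k:.3f} sigma: {100 * erf(k / sqrt(2)):.2f}%")
```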

Central Limit Theorem

Or: why is the Gaussian "normal"?

If a variable x is produced by the convolution (sum) of variables x₁, x₂, …, x_N:

I) $\langle x \rangle = \mu_1 + \mu_2 + \dots + \mu_N$

II) $V(x) = V_1 + V_2 + \dots + V_N$

III) P(x) becomes Gaussian for large N
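A minimal sketch of point III, summing N uniform variates (each has μ = 1/2 and V = 1/12) and checking points I and II along the way; the sample size and seed are arbitrary choices:

```python
# A minimal sketch of the CLT: sums of N uniforms approach a Gaussian.
import random

random.seed(42)

def sums_of_uniforms(n_vars, n_samples=100_000):
    return [sum(random.random() for _ in range(n_vars)) for _ in range(n_samples)]

for n in (1, 2, 12):
    s = sums_of_uniforms(n)
    mean = sum(s) / len(s)
    var = sum((x - mean) ** 2 for x in s) / len(s)
    # Expect mean = n/2 and variance = n/12.
    print(f"N={n:2d}: mean={mean:.3f} (exp {n/2:.3f}), var={var:.4f} (exp {n/12:.4f})")
```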

Multidimensional Gaussian

With correlation ρ:

$$P(x,y;\mu_x,\mu_y,\sigma_x,\sigma_y,\rho) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\!\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(x-\mu_x)^2}{\sigma_x^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} + \frac{(y-\mu_y)^2}{\sigma_y^2}\right)\right]$$

Without correlation (ρ = 0), it factorises:

$$P(x,y;\mu_x,\mu_y,\sigma_x,\sigma_y) = \frac{1}{2\pi\sigma_x\sigma_y}\; e^{-(x-\mu_x)^2/2\sigma_x^2}\; e^{-(y-\mu_y)^2/2\sigma_y^2}$$

Chi squared

Sum of squared discrepancies, scaled by expected error

Integrate all but 1-D of multi-D Gaussian

$$\chi^2 = \sum_{i=1}^{n} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2$$

$$P(\chi^2; n) = \frac{(\chi^2)^{n/2-1}\; e^{-\chi^2/2}}{2^{n/2}\,\Gamma(n/2)}$$

where n is the number of degrees of freedom.
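A minimal sketch of the χ² sum, with made-up measurements, predictions, and errors purely for illustration:

```python
# A minimal sketch: chi-squared of measurements against predictions.
measurements = [1.1, 1.9, 3.2, 4.1]   # x_i (made up)
predictions  = [1.0, 2.0, 3.0, 4.0]   # mu_i (made up)
errors       = [0.1, 0.1, 0.2, 0.1]   # sigma_i (made up)

chi2 = sum(((x - mu) / sig) ** 2
           for x, mu, sig in zip(measurements, predictions, errors))
print(chi2)  # compare with n = 4 degrees of freedom if nothing was fitted
```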

About Estimation

Probability calculus: Theory → Data. Given these distribution parameters, what can we say about the data?

Statistical inference: Data → Theory. Given this data, what can we say about the properties or parameters or correctness of the distribution functions?

What is an estimator?

An estimator (written with a hat) is a function of the data whose value, the estimate, is intended as a meaningful guess for the value of the parameter. (from PDG)

Examples for the mean and the variance:

$$\hat\mu\{x_i\} = \frac{1}{N}\sum_i x_i \qquad\qquad \hat\mu\{x_i\} = \frac{x_{max}+x_{min}}{2}$$

$$\hat V\{x_i\} = \frac{1}{N}\sum_i (x_i-\hat\mu)^2 \qquad\qquad \hat V\{x_i\} = \frac{1}{N-1}\sum_i (x_i-\hat\mu)^2$$
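A minimal sketch contrasting the two variance estimators above: sampling repeatedly from a known Gaussian shows the 1/N form is biased low by a factor (N−1)/N (all numbers here are arbitrary choices):

```python
# A minimal sketch: biased (1/N) vs unbiased (1/(N-1)) variance estimators.
import random

random.seed(1)
TRUE_MU, TRUE_SIGMA, N = 0.0, 1.0, 5

biased, unbiased = [], []
for _ in range(20_000):
    x = [random.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(N)]
    mu_hat = sum(x) / N
    s2 = sum((xi - mu_hat) ** 2 for xi in x)
    biased.append(s2 / N)
    unbiased.append(s2 / (N - 1))

# The 1/N estimator averages to (N-1)/N * sigma^2 = 0.8; 1/(N-1) to 1.0.
print(sum(biased) / len(biased), sum(unbiased) / len(unbiased))
```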

What is a good estimator?

A perfect estimator is:

• Consistent: $\lim_{N\to\infty} \hat a = a$

• Unbiased: $\langle \hat a \rangle = \int \hat a(x_1,\dots,x_N)\,P(x_1;a)\cdots P(x_N;a)\,dx_1\cdots dx_N = a$

• Efficient: $V(\hat a) = \langle (\hat a - \langle \hat a \rangle)^2 \rangle$ is minimum, saturating the Minimum Variance Bound

$$V(\hat a) \;\ge\; \frac{1}{\left\langle -\,d^2\ln L/da^2 \right\rangle}$$

One often has to work with less-than-perfect estimators.

The Likelihood Function

Set of data {x1, x2, x3, …xN}

Each x may be multidimensional – never mind

Probability depends on some parameter a

a may be multidimensional – never mind

Total probability (density):

$$P(x_1;a)\,P(x_2;a)\,P(x_3;a)\cdots P(x_N;a) = L(x_1, x_2, x_3, \dots, x_N; a)$$

the likelihood.

Maximum Likelihood Estimation

In practice usually maximise ln L as it’s easier to calculate and handle; just add the ln P(xi)

ML has lots of nice properties

Given data {x₁, x₂, x₃, …, x_N}, estimate a by maximising the likelihood L(x₁, x₂, x₃, …, x_N; a):

$$\left.\frac{dL}{da}\right|_{a=\hat a} = 0$$

[Figure: ln L vs a, with the maximum at â]
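A minimal sketch of the procedure for a standard textbook case (an exponential decay time, not an example from the slides): generate data with a known lifetime τ, then scan ln L on a grid to find the maximum:

```python
# A minimal sketch of maximum likelihood: estimate the lifetime tau of an
# exponential, P(t; tau) = exp(-t/tau)/tau, by scanning ln L on a grid.
import math, random

random.seed(7)
TRUE_TAU = 2.0
data = [random.expovariate(1 / TRUE_TAU) for _ in range(1000)]

def ln_l(tau):
    # ln L(tau) = sum_i ln P(t_i; tau)
    return sum(-t / tau - math.log(tau) for t in data)

taus = [1.0 + 0.001 * i for i in range(2000)]
tau_hat = max(taus, key=ln_l)
# For the exponential the ML estimate is the sample mean; the scan agrees.
print(tau_hat, sum(data) / len(data))
```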

Properties of ML estimation

• It's consistent (no big deal)

• It's biased for small N. May need to worry.

• It is efficient for large N: saturates the Minimum Variance Bound.

• It is invariant: if you switch to using u(a), then û = u(â).

[Figures: ln L vs a with maximum at â; ln L vs u with maximum at û = u(â)]

More about ML

• It is not ‘right’. Just sensible.

• It does not give the ‘most likely value of a’. It’s the value of a for which this data is most likely.

• Numerical Methods are often needed

• Maximisation / Minimisation in >1 variable is not easy

• Use MINUIT but remember the minus sign

ML does not give goodness-of-fit

• ML will not complain if your assumed P(x;a) is rubbish

• The value of L tells you nothing

Example: fit P(x) = a₁x + a₀ to data that are actually flat. The fit will give a₁ = 0, a constant P, and L = a₀^N: just what you would get from fitting the true distribution.

Least Squares

• Measurements of y at various x, with errors σ, and a prediction f(x; a)

• Probability: $P(y) \propto e^{-(y - f(x;a))^2/2\sigma^2}$

• $\ln L = -\frac{1}{2}\sum_i \left(\frac{y_i - f(x_i;a)}{\sigma_i}\right)^2 + \mathrm{const}$

• To maximise ln L, minimise $\chi^2 = \sum_i \left(\frac{y_i - f(x_i;a)}{\sigma_i}\right)^2$

[Figure: y vs x data points with fitted curve f(x; a)]

So ML ‘proves’ Least Squares. But what ‘proves’ ML? Nothing
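As an illustration, a minimal sketch of a weighted straight-line fit y = ax + b that minimises χ² analytically via the standard normal equations; the data points are made up:

```python
# A minimal sketch: weighted least-squares fit of y = a*x + b.
xs   = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up data
ys   = [2.1, 3.9, 6.2, 8.0, 9.8]
sigs = [0.2, 0.2, 0.2, 0.2, 0.2]

w   = [1 / s**2 for s in sigs]
S   = sum(w)
Sx  = sum(wi * x for wi, x in zip(w, xs))
Sy  = sum(wi * y for wi, y in zip(w, ys))
Sxx = sum(wi * x * x for wi, x in zip(w, xs))
Sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))

d = S * Sxx - Sx**2
a = (S * Sxy - Sx * Sy) / d       # slope
b = (Sxx * Sy - Sx * Sxy) / d     # intercept
chi2 = sum(wi * (y - (a * x + b)) ** 2 for wi, x, y in zip(w, xs, ys))
print(a, b, chi2)  # compare chi2 with 5 data points - 2 parameters = 3 d.o.f.
```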

Least Squares: The Really Nice Thing

• Should get χ² ≈ 1 per data point.

• Minimising χ² makes it smaller – the effect is 1 unit of χ² for each parameter adjusted. (Dimensionality of the multi-D Gaussian is decreased by 1.)

N_degrees of freedom = N_data points − N_parameters

• Provides a 'goodness of agreement' figure which allows a credibility check.

Chi Squared Results

Large χ² comes from

1. Bad Measurements

2. Bad Theory

3. Underestimated errors

4. Bad luck

Small χ² comes from

1. Overestimated errors

2. Good luck

Fitting Histograms

Often put {xᵢ} into bins. The data are then {nⱼ}, with nⱼ given by a Poisson of mean $f(x_j) = P(x_j)\,\Delta x$.

Four techniques:

• Full ML

• Binned ML

• Proper χ²

• Simple χ²

[Figure: histogram of binned data with fitted curve]

What you maximise/minimise

• Full ML: $\ln L = \sum_i \ln P(x_i; a)$

• Binned ML: $\ln L = \sum_j \ln \mathrm{Poisson}(n_j; f_j) = \sum_j \left( n_j \ln f_j - f_j \right) + \mathrm{const}$

• Proper χ²: $\chi^2 = \sum_j \frac{(n_j - f_j)^2}{f_j}$

• Simple χ²: $\chi^2 = \sum_j \frac{(n_j - f_j)^2}{n_j}$
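A minimal sketch evaluating the last three objectives for one set of made-up bin contents and model predictions (full ML needs the unbinned {xᵢ}, so it is omitted):

```python
# A minimal sketch: binned-fit objective functions for given n_j and f_j.
import math

n = [12, 25, 31, 18, 9]             # observed bin contents n_j (made up)
f = [10.0, 24.0, 33.0, 20.0, 8.0]   # model predictions f_j (made up)

binned_ml   = sum(nj * math.log(fj) - fj for nj, fj in zip(n, f))  # maximise
proper_chi2 = sum((nj - fj) ** 2 / fj for nj, fj in zip(n, f))     # minimise
simple_chi2 = sum((nj - fj) ** 2 / nj for nj, fj in zip(n, f))     # minimise
print(binned_ml, proper_chi2, simple_chi2)
```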

Confidence Level: Meaning of Error Estimates

• How often do we expect to include the true fixed value of our parameter, P₀, within our quoted range $p \pm \sigma_p$, for a repeated series of experiments?

• For the actual value P₀, the probability that a measurement will give us an answer in a specific range of p is given by the area under the relevant part of the Gaussian curve. A conventional choice of this probability is 68%.

The Straightforward Example

Apples of different weights: we need to describe the distribution.

μ = 68 g, σ = 17 g

[Figure: weight distribution, with 50 g and 100 g marked]

• All weights lie between 24 and 167 g (tolerance)

• 90% lie between 50 and 100 g

• 94% are less than 100 g

• 96% are more than 50 g

These are all confidence level statements.

Confidence Levels

• Can quote at any level (68%, 95%, 99%, …)

• Upper or lower or two-sided (x < U; L < x; L < x < U)

• Two-sided has a further choice (central, shortest, …)

[Figure: one-sided and two-sided intervals, with bounds L, U, U']

Maximum Likelihood and Confidence Levels

The ML estimator (large N) has variance given by the MVB:

$$V(\hat a) = \frac{1}{\left\langle -\,d^2\ln L/da^2 \right\rangle}$$

At the peak, $d\ln L/da = 0$; for large N, ln L is a parabola (L is a Gaussian). Expanding about the peak:

$$\ln L(a) = \ln L_{max} + \frac{1}{2}\left.\frac{d^2\ln L}{da^2}\right|_{\hat a}(a-\hat a)^2 = \ln L_{max} - \frac{(a-\hat a)^2}{2\,\sigma_{\hat a}^2}$$

ln L falls by 1/2 at $a = \hat a \pm \sigma_{\hat a}$, and by 2 at $a = \hat a \pm 2\sigma_{\hat a}$.

Read off 68% , 95% confidence regions
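A minimal sketch of reading off the 68% region, reusing the exponential-lifetime likelihood from the earlier sketch: find where ln L has fallen by 1/2 from its maximum:

```python
# A minimal sketch: 68% interval from Delta(ln L) = 1/2.
import math, random

random.seed(7)
data = [random.expovariate(1 / 2.0) for _ in range(1000)]  # true tau = 2.0

def ln_l(tau):
    return sum(-t / tau - math.log(tau) for t in data)

taus = [1.5 + 0.001 * i for i in range(1000)]
lnls = [ln_l(t) for t in taus]
ln_max = max(lnls)
inside = [t for t, l in zip(taus, lnls) if l >= ln_max - 0.5]
print(f"tau in [{min(inside):.3f}, {max(inside):.3f}] at 68% CL")
```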

Monte Carlo Calculations

• The Monte Carlo approach provides a method of solving probability theory problems in situations where the necessary integrals are too difficult to perform.

• Crucial element: random number generator.

An Example

To evaluate $I = \int_a^b y(x)\,dx$, approximate

$$I \approx \frac{b-a}{n}\sum_{i=1}^{n} y(x_i)$$

with the sample points chosen either as

1. $x_i = a + (i - 0.5)\,(b-a)/n$ (a regular grid), or

2. $x_i = a + (b-a)\,r_i$, where the $r_i$ are members of a series of random numbers uniformly distributed in the range 0–1 (Monte Carlo).
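A minimal sketch of both recipes, with y(x) = x² on [0, 1] (exact answer 1/3) as an arbitrary test integrand:

```python
# A minimal sketch: grid vs Monte Carlo estimates of an integral.
import random

def y(x):
    return x * x

a, b, n = 0.0, 1.0, 100_000
random.seed(3)

grid = (b - a) / n * sum(y(a + (i - 0.5) * (b - a) / n) for i in range(1, n + 1))
mc   = (b - a) / n * sum(y(a + (b - a) * random.random()) for _ in range(n))
print(grid, mc)  # both approach 1/3; the MC error scales as 1/sqrt(n)
```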

References

• Lectures and Notes on Statistics in HEP, http://www.ep.ph.bham.ac.uk//group/locdoc/lectures/stats/index.html

• Lecture notes of Prof. Roger Barlow, http://www.hep.man.ac.uk/u/roger/

• Louis Lyons, "Statistics for Nuclear and Particle Physicists", Cambridge University Press, 1986.

• Particle Data Group, http://pdg.lbl.gov/2004/reviews/contents_sports.html#mathtoolsetc