Statistics --- Summary
Transcript of tcs.inf.kyushu-u.ac.jp/~kijima/GPS20/GPS20-13.pdf (40 pages)

Page 1

Statistics ---Summary

Aug. 5, 2020

来嶋 秀治 (Shuji Kijima)

Dept. Informatics,

Graduate School of ISEE

確率統計特論 (Probability & Statistics)

Lesson 13

Page 2

Final exam (期末試験)

Date/time: August 12 (8/12), 13:00-14:30

Place (場所): on Moodle.

Submit electronic files (incl. photo: recommended). ≤10MB.

Keep your “original data” (I may ask to submit them later).


Topics (範囲):

Probability and Statistics.

Check the course page (講義ページを参照のこと):

http://tcs.inf.kyushu-u.ac.jp/~kijima/

Books, notes, Google, etc. may be used (持ち込み可).

Communication (e-mail, SNS, BBS) is prohibited (相談不可).

Page 3

Statistics I

July 1, 2020

来嶋 秀治 (Shuji Kijima)

Dept. Informatics,

Graduate School of ISEE

Today's topics

• estimating population mean

• estimating population variance

• consistent estimator (一致推定量)

• unbiased estimator (不偏推定量)

確率統計特論 (Probability & Statistics)

Lesson 8

Page 4

Statistical Inference (統計的推論)

Estimation (推定): Lessons 8, 9, 12

Statistical test (統計検定): Lesson 10

Regression (回帰): Lesson 11

Applications

Machine learning (機械学習),

Pattern recognition (パターン認識),

Data mining (データマイニング), etc.

Statistics / Data science

Page 5

Population, sample, stochastic model

Example 1

We sample 6 accounts of twixxer at random. The following table

shows the numbers of followers.

account      1    2    3    4     5    6
#followers  372  623   89  781  3219  152

Q. How large is the population mean of followers?

Suppose that the number of followers follows some distribution (e.g., Ex(λ)) with expectation μ.

=> Sample mean $\bar{X} = \frac{X_1 + \cdots + X_n}{n} = 872.7$

population (母集団)

sample (標本)

stochastic model (確率モデル)
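
As a quick arithmetic check of the sample mean above, here is a minimal Python sketch (not part of the original slides) using only the follower counts from the table:

```python
# Sample mean of the follower counts from Example 1 (Page 5).
followers = [372, 623, 89, 781, 3219, 152]

sample_mean = sum(followers) / len(followers)
print(f"sample mean = {sample_mean:.1f}")  # prints 872.7
```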

Page 6

Sample mean

Proposition

The sample mean $\bar{X} = \frac{X_1 + \cdots + X_n}{n}$ is a consistent estimator of μ.

Proof.

By the law of large numbers.

Proposition

$\bar{X} = \frac{X_1 + \cdots + X_n}{n}$ is an unbiased estimator of μ.

Definition

$T(X)$ is an unbiased estimator of $g(\theta)$ if $E_\theta[T(X)] - g(\theta) = 0$ holds.

Page 7

Population, sample, stochastic model

Example 1

We sample 6 accounts of twixxer at random. The following table

shows the numbers of followers.

account      1    2    3    4     5    6
#followers  372  623   89  781  3219  152

Q. How large is the population variance of #followers?

Suppose that the number of followers follows some distribution (e.g., Ex(λ)) with expectation μ and variance σ².

Recall $\mathrm{Var}[X] := E[(X - \mu)^2]$.

population (母集団)

sample (標本)

stochastic model (確率モデル)

Page 8

Consistency of a sample variance

Proposition

$\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$ is a consistent estimator of σ² (if $\mathrm{Var}[X] = E[(X - E[X])^2] < \infty$).

Proposition

$\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$ is an unbiased estimator of σ² (if $\mathrm{Var}[X] = E[(X - E[X])^2] < \infty$).

Proposition

$\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}$ is NOT an unbiased estimator of σ² (in general).
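
The gap between the n−1 and n denominators can be seen numerically. The following Python sketch (my own illustration, assuming an Ex(λ) population with λ = 1 so that the true variance is 1) averages both statistics over many simulated samples:

```python
# Compare the (n-1)-denominator and n-denominator sample variances by simulation.
# Population: Ex(lambda) with lambda = 1, so the true variance is 1 / lambda^2 = 1.
import random

random.seed(0)
lam = 1.0
true_var = 1.0 / lam ** 2
n, trials = 6, 100_000

sum_unbiased = 0.0  # accumulates sum_i (X_i - X_bar)^2 / (n - 1)
sum_biased = 0.0    # accumulates sum_i (X_i - X_bar)^2 / n
for _ in range(trials):
    xs = [random.expovariate(lam) for _ in range(n)]
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    sum_unbiased += ss / (n - 1)
    sum_biased += ss / n

print(f"true variance      : {true_var:.3f}")
print(f"average of ss/(n-1): {sum_unbiased / trials:.3f}")  # close to 1.000
print(f"average of ss/n    : {sum_biased / trials:.3f}")    # close to (n-1)/n = 0.833
```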

Page 9

Statistics II

July 8, 2020

来嶋 秀治 (Shuji Kijima)

Dept. Informatics,

Graduate School of ISEE

Today's topics

• maximum likelihood (最尤推定)

確率統計特論 (Probability & Statistics)

Lesson 9

Page 10

Statistical inference: maximum likelihood

Example 1

The number of defective products per 10,000 products.

How often do defectives appear?

lot         1  2  3  4  5  6  7  8  9  10
#defective  0  2  0  0  1  1  0  3  1   0

Let X be a r.v. denoting #defectives; then X ∼ Po(λ), i.e.,

$\Pr[X = x] =: f(x) = \frac{e^{-\lambda}\lambda^x}{x!}$

for unknown parameter λ.

Page 11

Maximum likelihood

Preparation

Let X₁, …, Xₙ be i.i.d. with density function f(x; θ), and let

$f(x_1, \dots, x_n; \theta) := f(x_1;\theta)\cdots f(x_n;\theta)$,

i.e., the density function of the joint distribution of X₁, …, Xₙ.

θ: parameter(s), e.g. N(μ, σ²), Ex(λ)

remark

X1,…,Xn are independent

Maximum likelihood estimation

Given sample values X₁ = a₁, …, Xₙ = aₙ, let $L(\theta \mid \boldsymbol{a}) := f(a_1, \dots, a_n; \theta)$, called the likelihood function; then

$\theta^* = \operatorname{argmax}_\theta L(\theta)$

is called the maximum likelihood estimator.

Page 12

Ex. Maximum likelihood

The maximum likelihood estimator of λ is λ* = argmax_λ L(λ).

$L(\lambda; a_1, \dots, a_n) = \frac{e^{-\lambda}\lambda^{a_1}}{a_1!}\cdot\frac{e^{-\lambda}\lambda^{a_2}}{a_2!}\cdots\frac{e^{-\lambda}\lambda^{a_n}}{a_n!}$

$\frac{\partial}{\partial\lambda} L(\lambda; a_1, \dots, a_n) = \cdots$

Ex. Poisson distribution

Let 𝑋1, … , 𝑋𝑛 be independent r.v.s according to Po(𝜆), and

𝑎1, … , 𝑎𝑛 are sample values.

Poisson distribution Po(λ): $f(x) = \frac{e^{-\lambda}\lambda^x}{x!}$

Page 13

Ex. Maximum likelihood

ā is the maximum likelihood estimator of λ.

Ex. Poisson distribution

Let 𝑋1, … , 𝑋𝑛 be independent r.v.s according to Po(𝜆), and

𝑎1, … , 𝑎𝑛 are sample values.

Poisson distribution Po(λ): $f(x) = \frac{e^{-\lambda}\lambda^x}{x!}$

$\log L(\lambda) = \sum_{i=1}^n \log f(a_i; \lambda) = \sum_{i=1}^n \left(-\lambda + a_i \log\lambda - \log a_i!\right) = n\left(\bar{a}\log\lambda - \lambda\right) - \sum_{i=1}^n \log a_i!$

$\frac{\partial}{\partial\lambda} \log L(\lambda) = n\left(\bar{a}\cdot\frac{1}{\lambda} - 1\right)$
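
To confirm numerically that the log-likelihood is maximized at λ = ā, a small Python sketch (not from the slides) evaluates log L(λ) for the defect counts of Example 1 at a few values of λ around the sample mean ā = 0.8:

```python
# Poisson log-likelihood for the defect counts of Example 1; it peaks at lambda = mean(a).
import math

a = [0, 2, 0, 0, 1, 1, 0, 3, 1, 0]   # defects per lot
a_bar = sum(a) / len(a)              # 0.8

def log_likelihood(lam):
    # log L(lambda) = sum_i ( -lambda + a_i * log(lambda) - log(a_i!) )
    return sum(-lam + ai * math.log(lam) - math.lgamma(ai + 1) for ai in a)

for lam in [0.5, 0.7, 0.8, 0.9, 1.2]:
    print(f"lambda = {lam:.1f}  log L = {log_likelihood(lam):.4f}")
print(f"sample mean a_bar = {a_bar}  (the maximum likelihood estimator)")
```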

Page 14

Statistical inference: maximum likelihood

Example 2

Examination scores.

How well do the students understand the material?

student   1   2   3   4   5   6   7   8   9  10
score    72  89  64  52  96  64  70  83  56  70

Let X be a r.v. denoting the score; then X ∼ N(μ, σ²), i.e.,

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$

for unknown parameters μ and σ.

Page 15

Ex. Maximum likelihood

Ex. Normal distribution

Let 𝑋1, … , 𝑋𝑛 be independent r.v.s according to N 𝜇, 𝜎2 , and

𝑎1, … , 𝑎𝑛 are sample values.

$\ln L(\mu, \sigma; a) = \sum_{i=1}^n \ln f(a_i; \mu, \sigma)$
$= \sum_{i=1}^n \ln\left(\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(a_i-\mu)^2}{2\sigma^2}\right)\right)$
$= \sum_{i=1}^n \left(-\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 - \frac{(a_i-\mu)^2}{2\sigma^2}\right)$
$= -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 - \sum_{i=1}^n \frac{(a_i-\mu)^2}{2\sigma^2}$

Page 16

Ex. Maximum likelihood

ā is the maximum likelihood estimator of μ. Since μ = ā maximizes L(μ, σ) (independently of σ),

$(\sigma^*)^2 = \frac{\sum_{i=1}^n (a_i - \bar{a})^2}{n}$

Ex. Normal distribution

Let 𝑋1, … , 𝑋𝑛 be independent r.v.s according to N 𝜇, 𝜎2 , and

𝑎1, … , 𝑎𝑛 are sample values.

$\frac{\partial}{\partial\mu} \ln L(\mu, \sigma; a) = -\frac{\partial}{\partial\mu} \frac{\sum_{i=1}^n (a_i - \mu)^2}{2\sigma^2} = \frac{\sum_{i=1}^n (a_i - \mu)}{\sigma^2}$

$\frac{\partial}{\partial\sigma} \ln L(\mu, \sigma; a) = -\frac{\partial}{\partial\sigma} \frac{n}{2}\ln\sigma^2 - \frac{\partial}{\partial\sigma} \frac{\sum_{i=1}^n (a_i - \mu)^2}{2\sigma^2} = -\frac{n}{\sigma} + \frac{\sum_{i=1}^n (a_i - \mu)^2}{\sigma^3}$
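
Setting both partial derivatives to zero gives μ* = ā and (σ*)² = Σᵢ(aᵢ − ā)²/n. A minimal Python sketch (my own, applied to the score data of Example 2) computes these maximum likelihood estimates:

```python
# Maximum likelihood estimates of mu and sigma^2 for the exam scores of Example 2,
# assuming the scores are i.i.d. N(mu, sigma^2).
scores = [72, 89, 64, 52, 96, 64, 70, 83, 56, 70]
n = len(scores)

mu_mle = sum(scores) / n                               # mu* = a_bar
var_mle = sum((x - mu_mle) ** 2 for x in scores) / n   # (sigma*)^2, note the n (not n-1)

print(f"mu*        = {mu_mle:.2f}")
print(f"(sigma*)^2 = {var_mle:.2f}")
```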

Page 17

Statistics III

July 15, 2020

来嶋 秀治 (Shuji Kijima)

Dept. Informatics,

Graduate School of ISEE

Today's topics

• interval estimation (区間推定)

• hypothesis testing (仮説検定)

• t-test

• χ²-test

確率統計特論 (Probability & Statistics)

Lesson 10

Page 18

1. Interval estimation

Statistical Inference (統計的推定)

point estimation (点推定)

consistent estimation (一致推定)

unbiased estimation (不偏推定)

maximum likelihood (最尤推定)

interval estimation (区間推定)

Page 19

Statistical inference

Example 1

A clerk says “our eggs are big. 70[g] in average.”

X̄ = 66.3 [g], s² = 17.584 [g²] for 6 eggs.

Suppose σ² = 18.0 for simplicity.

Let z* (> 0) satisfy

$\Pr\left[-z^* \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z^*\right] \ge 0.95$.

By the central limit theorem,

$\Pr\left[-z^* \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z^*\right] = \int_{-z^*}^{z^*} \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}x^2\right)\mathrm{d}x$

… and we see that z* = 1.960 (see the normal distribution table).

“two-sided 95% confidence interval” (両側95%信頼区間)

Page 20

Normal distribution

Wikipedia: Standard normal table

http://en.wikipedia.org/wiki/Normal_distribution

Page 21

Statistical inference

Example 1

A clerk says “our eggs are big. 70[g] in average.”

X̄ = 66.3 [g], s² = 17.584 [g²] for 6 eggs.

Suppose σ² = 18.0 for simplicity.

X̄ = 66.3 [g], z* = 1.960, σ² = 18.0, n = 6

$\Pr\left[-z^* \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z^*\right] = \Pr\left[-z^*\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le z^*\frac{\sigma}{\sqrt{n}}\right]$
$= \Pr\left[-\bar{X} - z^*\frac{\sigma}{\sqrt{n}} \le -\mu \le -\bar{X} + z^*\frac{\sigma}{\sqrt{n}}\right]$
$= \Pr\left[\bar{X} + z^*\frac{\sigma}{\sqrt{n}} \ge \mu \ge \bar{X} - z^*\frac{\sigma}{\sqrt{n}}\right]$
$= \Pr\left[66.3 + 1.960\sqrt{\tfrac{18}{6}} \ge \mu \ge 66.3 - 1.960\sqrt{\tfrac{18}{6}}\right]$
$= \Pr\left[69.69 \ge \mu \ge 62.91\right]$
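
The same interval in a few lines of Python (a sketch reproducing the numbers above; z* = 1.960 is read off the standard normal table as on the previous page):

```python
# Two-sided 95% confidence interval for the mean egg weight,
# assuming sigma^2 = 18.0 is known (as in the slide).
import math

x_bar = 66.3     # sample mean [g]
sigma2 = 18.0    # assumed population variance [g^2]
n = 6
z_star = 1.960   # two-sided 95% point of the standard normal distribution

half_width = z_star * math.sqrt(sigma2 / n)
print(f"95% CI: [{x_bar - half_width:.2f}, {x_bar + half_width:.2f}]")  # about [62.91, 69.69]
```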

Page 22

2. hypothesis testing (仮説検定)

Today's topics

• interval estimation (区間推定)

• hypothesis testing (仮説検定)

• t-test

• χ²-test

Page 23

Hypothesis testing (仮説検定)

Terminology

• null hypothesis (帰無仮説)

• alternative hypothesis (対立仮説)

Idea

Pr[null hypothesis is true] is small
⇒ reject the null hypothesis at significance level α (有意水準αで帰無仮説を棄却する)

Pr[null hypothesis is true] is not small
⇒ fail to reject the null hypothesis at significance level α (有意水準αで帰無仮説を棄却しない)

Page 24

Statistical inference

Example 1

A clerk says “our eggs are big. 70[g] in average.”

You bought 6 eggs in a shop.

How large are eggs sold in this shop?

X̄ = 66.3 [g], s² = 17.584 [g²]

Is the clerk honest?

egg          1     2     3     4     5     6
weight [g]  64.3  70.4  63.2  67.8  71.3  60.8

Page 25

$\Pr\left[-z^* \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z^*\right] = \Pr\left[-z^*\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le z^*\frac{\sigma}{\sqrt{n}}\right]$
$= \Pr\left[\mu - z^*\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + z^*\frac{\sigma}{\sqrt{n}}\right]$
$= \Pr\left[70 - 1.960\sqrt{\tfrac{18}{6}} \le \bar{X} \le 70 + 1.960\sqrt{\tfrac{18}{6}}\right]$
$= \Pr\left[66.6 \le \bar{X} \le 73.4\right]$

Statistical inference

Example 1

A clerk says “our eggs are big. 70[g] in average.”

X̄ = 66.3 [g], s² = 17.584 [g²] for 6 eggs.

Assume μ = 70.0 and suppose σ² = 18.0 for simplicity.

The null hypothesis μ = 70.0 is rejected at significance level 5%
(帰無仮説 μ = 70.0 は有意水準5%で棄却される).

μ = 70, z* = 1.960, σ² = 18.0, n = 6, X̄ = 66.3 [g]
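
The same test as a Python sketch: under the null hypothesis μ = 70.0 (with σ² = 18.0 assumed known), X̄ falls in [66.6, 73.4] with probability about 95%; the observed X̄ = 66.3 lies outside this region, so the null hypothesis is rejected at the 5% level.

```python
# Two-sided z-test of the null hypothesis mu = 70.0 for the egg weights,
# assuming sigma^2 = 18.0 is known.
import math

weights = [64.3, 70.4, 63.2, 67.8, 71.3, 60.8]
x_bar = sum(weights) / len(weights)              # 66.3
mu0, sigma2, n, z_star = 70.0, 18.0, len(weights), 1.960

lower = mu0 - z_star * math.sqrt(sigma2 / n)     # about 66.6
upper = mu0 + z_star * math.sqrt(sigma2 / n)     # about 73.4
print(f"acceptance region: [{lower:.1f}, {upper:.1f}], observed X_bar = {x_bar:.1f}")
print("reject H0 at the 5% level" if not (lower <= x_bar <= upper) else "fail to reject H0")
```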

Page 26

Statistics IV

July 22, 2020

来嶋 秀治 (Shuji Kijima)

Dept. Informatics,

Graduate School of ISEE

Today's topics

• linear regression (線形回帰)

• simple regression (単回帰)

• multiple regression (重回帰)

• autoregression (自己回帰)

• model selection, AIC (モデル選択)

確率統計特論 (Probability & Statistics)

Lesson 11

Page 27

Ex. Advertisement

Question

How does 𝑦 increase, as 𝑥 increasing?

year             1    2    3    4    5    6    7    8
x: ad. cost      8   11   13   10   15   19   17   20
y: sale amount  115  124  138  120  151  186  169  193

[Scatter plot of y (sale amount) against x (ad. cost)]

Page 28

Least Square Estimator

Question

How does 𝑦 increase, as 𝑥 increasing?

Linear regression (線形回帰)

Suppose 𝑦𝑖 = 𝛼 + 𝛽𝑥𝑖 + 𝑒𝑖 where 𝑒𝑖 ∼ N(0, 𝜎2).

Estimate α and β minimizing

$\sum_{i=1}^n \left(y_i - (\alpha + \beta x_i)\right)^2$

year             1    2    3    4    5    6    7    8
x: ad. cost      8   11   13   10   15   19   17   20
y: sale amount  115  124  138  120  151  186  169  193

Page 29

Least Square Estimator

Linear regression (線形回帰)

Suppose 𝑦𝑖 = 𝛼 + 𝛽𝑥𝑖 + 𝑒𝑖 where 𝑒𝑖 ∼ N(0, 𝜎2).

Estimate α and β minimizing $g(\alpha, \beta) := \sum_{i=1}^n \left(y_i - (\alpha + \beta x_i)\right)^2$.

$\frac{\partial}{\partial\alpha} g(\alpha, \beta) = \sum_{i=1}^n (-2)\left(y_i - (\alpha + \beta x_i)\right)$
$\frac{\partial}{\partial\beta} g(\alpha, \beta) = \sum_{i=1}^n (-2 x_i)\left(y_i - (\alpha + \beta x_i)\right)$

Setting $\frac{\partial}{\partial\alpha} g(\alpha, \beta) = 0$ and $\frac{\partial}{\partial\beta} g(\alpha, \beta) = 0$ gives

$\alpha + \beta\bar{x} = \bar{y}$,
$\alpha\bar{x} + \beta\,\overline{x^2} = \overline{xy}$,

hence

$\hat{\beta} = \frac{\overline{xy} - \bar{x}\cdot\bar{y}}{\overline{x^2} - \bar{x}^2}$,  $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$.

Prop.

$E[\hat{\beta}] = E\!\left[\frac{s_{xy}}{s_x^2}\right] = \beta$,  $E[\hat{\alpha}] = E[\bar{y} - \hat{\beta}\bar{x}] = E[\bar{y}] - E[\hat{\beta}]\,\bar{x} = \alpha$;

$\hat{\alpha}$ and $\hat{\beta}$ are unbiased estimators.
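
A Python sketch (not from the slides) computing α̂ and β̂ from the formulas above for the advertising data; any standard least-squares routine gives the same result:

```python
# Least-squares estimates alpha_hat and beta_hat for the advertising data,
# using beta_hat = (mean(xy) - mean(x)*mean(y)) / (mean(x^2) - mean(x)^2).
x = [8, 11, 13, 10, 15, 19, 17, 20]            # advertising cost
y = [115, 124, 138, 120, 151, 186, 169, 193]   # sale amount
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
mean_xy = sum(xi * yi for xi, yi in zip(x, y)) / n
mean_x2 = sum(xi * xi for xi in x) / n

beta_hat = (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)
alpha_hat = mean_y - beta_hat * mean_x
print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}")
```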

Page 30

Statistics V

July 29, 2020

来嶋 秀治 (Shuji Kijima)

Dept. Informatics,

Graduate School of ISEE

Today's topics

• Bayes estimation

• MAP estimation

確率統計特論 (Probability & Statistics)

Lesson 12

Page 31

Bayesian inference (for discrete θ)

Thm. (Bayes; ベイズの定理)

Let 𝑋 ∼ 𝑓(𝑥; 𝜃) where 𝜃 is an unknown parameter(s).

Now suppose we obtain sample 𝑋 = 𝑧,

then

$w'(\theta \mid z) = \frac{w(\theta)\cdot f(z \mid \theta)}{\sum_{\theta' \in \Theta} w(\theta')\cdot f(z \mid \theta')}$

where

𝑤(𝜃) is prior probability distribution (事前分布) of 𝜃,

𝑤′ 𝜃 𝑧 is posterior probability distribution (事後分布) of 𝜃.

Rem. $w'(\theta \mid z) \propto w(\theta)\cdot L(\theta \mid z)$, where $L(\theta \mid z) := f(z \mid \theta)$ is the likelihood function.

Page 32

Bayesian inference (for continuous θ)

Thm. (Bayes; ベイズの定理)

Let 𝑋 ∼ 𝑓(𝑥; 𝜃) where 𝜃 is an unknown parameter(s).

Now suppose we obtain sample 𝑋 = 𝑧,

then

$w'(\theta \mid z) = \frac{w(\theta)\cdot f(z \mid \theta)}{\int_\Theta w(\theta')\cdot f(z \mid \theta')\,\mathrm{d}\theta'}$

where

𝑤(𝜃) is prior probability distribution (事前分布) of 𝜃,

𝑤′ 𝜃 𝑧 is posterior probability distribution (事後分布) of 𝜃.

Rem. $w'(\theta \mid z) \propto w(\theta)\cdot L(\theta \mid z)$, where $L(\theta \mid z) := f(z \mid \theta)$ is the likelihood function.

Page 33

Conjugate Prior

Distribution   Conjugate prior
Binomial       Beta
Poisson        Gamma
Normal         Normal
Multinomial    Dirichlet

Page 34

Ex. Bayesian inference

Let 𝑋 ∼ Po 𝜆 where 𝜆 > 0 is an unknown parameter,

but we know prior prob. of 𝜆 such that 𝜆 ∼ Ga(𝛼, 𝜈).

Now suppose we obtain sample 𝑋 = 𝑘.

Q. Compute the posterior $w'(\lambda \mid k)$.

Ga(α, ν) density: $f(x) = \frac{\alpha^\nu}{\Gamma(\nu)}\, x^{\nu-1}\exp(-\alpha x)$

$w'(\lambda \mid k) \propto w(\lambda)\cdot L(\lambda \mid k) = w(\lambda)\cdot f(k \mid \lambda)$
$= \frac{\alpha^\nu}{\Gamma(\nu)}\,\lambda^{\nu-1}\exp(-\alpha\lambda)\cdot\frac{\lambda^k}{k!}\exp(-\lambda)$
$\propto \lambda^{\nu+k-1}\exp\left(-(\alpha+1)\lambda\right)$
$\propto \frac{(\alpha+1)^{\nu+k}}{\Gamma(\nu+k)}\,\lambda^{\nu+k-1}\exp\left(-(\alpha+1)\lambda\right)$,

hence $w'(\lambda \mid k)$ is Ga(α+1, ν+k), the conjugate distribution.
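
Because the posterior is again a Gamma distribution, the Bayesian update reduces to a parameter update. A minimal Python sketch (using the Ga(α, ν) parameterization of the slide, with α the rate and ν the shape; the prior values below are purely illustrative):

```python
# Conjugate update for X ~ Po(lambda) with prior lambda ~ Ga(alpha, nu):
# observing X = k turns Ga(alpha, nu) into Ga(alpha + 1, nu + k).
def gamma_poisson_update(alpha, nu, k):
    """Return the posterior parameters (alpha', nu') after observing one count k."""
    return alpha + 1, nu + k

alpha, nu = 2.0, 3.0    # illustrative prior: rate alpha, shape nu
k = 4                   # observed sample X = k
alpha_post, nu_post = gamma_poisson_update(alpha, nu, k)
print(f"prior Ga({alpha}, {nu}) -> posterior Ga({alpha_post}, {nu_post})")
```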

Page 35

MAP estimation

Maximum a posteriori estimator

Page 36

MAP estimation

A MAP (maximum a posteriori) estimator is given by

$\theta^* = \operatorname{argmax}_\theta w'(\theta \mid z)$,

meaning that θ* maximizes the posterior probability/density.

Note.

A Bayes estimator usually refers to the posterior distribution itself, while the MAP estimator is the θ* maximizing the posterior.

Page 37

Maximum likelihood estimator and MAP estimator

Rem. posterior density

$w'(\theta \mid z) = \frac{w(\theta)\cdot f(z \mid \theta)}{\sum_{\theta' \in \Theta} w(\theta')\cdot f(z \mid \theta')}$

where $w(\theta)$ is the prior density.

Prop.

The posterior density satisfies

$w'(\theta \mid z) \propto w(\theta)\cdot L(\theta \mid z)$

where $L(\theta \mid z) = f(z \mid \theta)$ is the likelihood function.

Roughly speaking:

A maximum likelihood estimator maximizes the likelihood function, an artificial concept closely related to, but not exactly the same as, the joint probability.

A MAP estimator maximizes the posterior probability, artificially assuming a prior distribution.
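
For the Poisson-Gamma example of the previous pages, both estimators have closed forms: from a single observation X = k, the maximum likelihood estimator is λ* = k, while the MAP estimator is the mode of the Ga(α+1, ν+k) posterior, λ* = (ν + k − 1)/(α + 1) when ν + k ≥ 1. A small Python sketch contrasting the two (my own illustration; the numbers are arbitrary):

```python
# MLE vs. MAP for a single Poisson observation X = k with a Ga(alpha, nu) prior on lambda.
def mle_poisson(k):
    # argmax over lambda of the likelihood e^{-lambda} * lambda^k / k!
    return float(k)

def map_poisson_gamma(k, alpha, nu):
    # mode of the Ga(alpha + 1, nu + k) posterior (valid when nu + k >= 1)
    return (nu + k - 1) / (alpha + 1)

k, alpha, nu = 4, 2.0, 3.0    # illustrative values
print(f"MLE = {mle_poisson(k):.3f}")                    # 4.000
print(f"MAP = {map_poisson_gamma(k, alpha, nu):.3f}")   # 2.000
```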

Page 38

Further Topics

Page 39

[Course map]

Probability Theory: probability space; distribution, expectation, variance; stochastic inequalities, law of large numbers, central limit theorem

Statistics: estimation, test, regression

Related fields: Optimization, Experimental Design, Machine Learning, Probability Theory, Information Theory, Calculus, Linear Algebra, Computer Science (programming, algorithms), Data Science, Data Mining, Pattern Recognition

Page 40

Further topics

Probability Inequalities

Stochastic Process

Markov process

Brownian motion/stochastic diff. eq.

Martingale

Ergodic theory

Multivariate Statistics

Principal component analysis (主成分分析)

Machine Learning

SVM

NMF

Deep Learning / neural network

Data Mining