Statistics --- Summary
Source: tcs.inf.kyushu-u.ac.jp/~kijima/GPS20/GPS20-13.pdf (Author: Shuji Kijima, created 2/25/2020)
Aug. 5, 2020
来嶋 秀治 (Shuji Kijima)
Dept. Informatics,
Graduate School of ISEE
確率統計特論 (Probability & Statistics)
Lesson 13
Final exam (期末試験)
Date/time: August 12 (8/12), 13:00-14:30
Place (場所): on Moodle.
Submit electronic files (photos accepted and recommended). ≤10MB.
Keep your original paper/data at hand (I may ask you to submit them later).
電子ファイルを提出 (写真可: 推奨). 10MB以内.
紙/データを手元に保存しておくこと (後日提出を求める場合がある).
Topics (範囲):
Probability and Statistics.
check the course page (講義ページを参照のこと)
http://tcs.inf.kyushu-u.ac.jp/~kijima/
Books, notes, Google, etc. may be used (持ち込み可).
Communication (e-mail, SNS, BBS) is prohibited (相談不可).
Statistics I
July 1, 2020
来嶋 秀治 (Shuji Kijima)
Dept. Informatics,
Graduate School of ISEE
Today's topics
• estimating population mean
• estimating population variance
• consistent estimator (一致推定量)
• unbiased estimator (不偏推定量)
確率統計特論 (Probability & Statistics)
Lesson 8
Statistical Inference (統計的推論)
Estimation (推定): Lessons 8, 9, 12
Statistical test (統計検定): Lesson 10
Regression (回帰): Lesson 11
Applications
Machine learning (機械学習),
Pattern recognition (パターン認識),
Data mining (データマイニング), etc.
Statistics / Data science

Population, sample, stochastic model
Example 1
We sample 6 accounts of twixxer at random. The following table
shows the numbers of followers.
account       1    2   3    4     5    6
#followers  372  623  89  781  3219  152
Q. How large is the population mean of followers?
Suppose that the number of followers follows some
distribution (e.g., Ex(λ)) with expectation μ.
⇒ sample mean X̄ = (X₁ + ⋯ + Xₙ)/n = 872.7
population (母集団)
sample (標本)
stochastic model (確率モデル)
Sample mean

Proposition
The sample mean X̄ = (X₁ + ⋯ + Xₙ)/n is a consistent estimator of μ.
Proof. By the law of large numbers.

Proposition
X̄ = (X₁ + ⋯ + Xₙ)/n is an unbiased estimator of μ.
Proof. By linearity of expectation, E[X̄] = (E[X₁] + ⋯ + E[Xₙ])/n = μ.

Definition
T(X) is an unbiased estimator of g(θ)
if E_θ[T(X) − g(θ)] = 0 holds.
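The consistency of the sample mean can be checked empirically. A minimal sketch, assuming a hypothetical exponential population with mean 1000 (the distribution and mean are illustrative choices, not from the slides):

```python
import random

random.seed(0)
mu = 1000.0  # hypothetical population mean; model X ~ Ex(lambda) with lambda = 1/mu

def sample_mean(n):
    # X-bar = (X_1 + ... + X_n) / n for n i.i.d. exponential draws
    return sum(random.expovariate(1.0 / mu) for _ in range(n)) / n

# Consistency: the sample mean concentrates around mu as n grows.
for n in (10, 1000, 100000):
    print(n, sample_mean(n))
```

By the law of large numbers, the printed values approach μ = 1000 as n increases.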
Population, sample, stochastic model
Example 1
We sample 6 accounts of twixxer at random. The following table
shows the numbers of followers.
account       1    2   3    4     5    6
#followers  372  623  89  781  3219  152
Q. How large is the population variance of #followers?
Suppose that the number of followers follows some
distribution (e.g., Ex(λ)) with expectation μ and variance σ².
Recall Var(X) := E[(X − μ)²].
population (母集団)
sample (標本)
stochastic model (確率モデル)
Consistency of a sample variance

Proposition
Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n−1) is a consistent estimator of σ² (if Var[(X − E[X])²] < ∞).

Proposition
Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n−1) is an unbiased estimator of σ² (if Var[(X − E[X])²] < ∞).

Proposition
Σᵢ₌₁ⁿ (Xᵢ − X̄)² / n is NOT an unbiased estimator of σ² (in general).
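A small simulation illustrates the last two propositions (the N(0, 4) population, n = 5, and trial count are hypothetical choices): dividing by n − 1 is unbiased, while dividing by n underestimates σ² by a factor (n − 1)/n.

```python
import random

random.seed(1)
mu, sigma2, n, trials = 0.0, 4.0, 5, 20000

def one_sample():
    # draw one sample of size n and return both variance estimates
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    return ss / (n - 1), ss / n  # unbiased vs naive

unb = naive = 0.0
for _ in range(trials):
    u, b = one_sample()
    unb += u
    naive += b
unb /= trials
naive /= trials
print(unb, naive)  # unb near 4.0; naive near (n-1)/n * 4.0 = 3.2
```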
Statistics II
July 8, 2020
来嶋 秀治 (Shuji Kijima)
Dept. Informatics,
Graduate School of ISEE
Today's topics
• maximum likelihood (最尤推定)
確率統計特論 (Probability & Statistics)
Lesson 9
Statistical inference: maximum likelihood

Example 1
The number of defective products per 10,000 products.
How often do defectives appear?
lot         1  2  3  4  5  6  7  8  9  10
#defective  0  2  0  0  1  1  0  3  1  0
Let X be a r.v. denoting #defectives.
Then X ∼ Po(λ), i.e.,
Pr[X = x] =: f(x) = e^(−λ) λ^x / x!
for unknown parameter λ.
Maximum likelihood

Preparation
Let X₁, …, Xₙ be i.i.d. with density function f(x; θ), and
let f(x₁, …, xₙ; θ) := f(x₁; θ) ⋯ f(xₙ; θ),
i.e., the density function of the joint distribution of X₁, …, Xₙ.
θ: parameter(s), e.g. N(μ, σ²), Ex(λ).
Remark: X₁, …, Xₙ are independent.
Maximum likelihood estimation
Given sample values X₁ = a₁, …, Xₙ = aₙ,
let L(θ | a) := f(a₁, …, aₙ; θ), called the likelihood function, and
θ* = argmax_θ L(θ) is called the maximum likelihood estimator.
Ex. Maximum likelihood

Ex. Poisson distribution
Let X₁, …, Xₙ be independent r.v.s according to Po(λ), and
a₁, …, aₙ are sample values.
Poisson distribution Po(λ): f(x) = e^(−λ) λ^x / x!

The maximum likelihood estimator of λ is argmax_λ L(λ):
L(λ; a₁, …, aₙ) = (e^(−λ) λ^(a₁) / a₁!) (e^(−λ) λ^(a₂) / a₂!) ⋯ (e^(−λ) λ^(aₙ) / aₙ!)
∂/∂λ L(λ; a₁, …, aₙ) = ⋯
Ex. Maximum likelihood (cont.)

Ex. Poisson distribution
Let X₁, …, Xₙ be independent r.v.s according to Po(λ), and
a₁, …, aₙ are sample values.
Poisson distribution Po(λ): f(x) = e^(−λ) λ^x / x!

log L(λ) = Σᵢ₌₁ⁿ log f(aᵢ; λ) = Σᵢ₌₁ⁿ (−λ + aᵢ log λ − log aᵢ!)
         = n(ā log λ − λ) − Σᵢ₌₁ⁿ log aᵢ!
∂/∂λ log L(λ) = n(ā/λ − 1)
Setting this to 0 gives λ = ā:
ā is the maximum likelihood estimator of λ.
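Using the defect counts from Example 1, a grid search over λ confirms the closed-form result λ* = ā (the grid bounds and step size are arbitrary choices for illustration):

```python
import math

# defect counts from the slide's table
a = [0, 2, 0, 0, 1, 1, 0, 3, 1, 0]
n = len(a)
abar = sum(a) / n  # closed-form MLE: lambda* = a-bar = 0.8

def log_L(lam):
    # log L(lambda) = sum_i (-lambda + a_i log lambda - log a_i!)
    return sum(-lam + ai * math.log(lam) - math.lgamma(ai + 1) for ai in a)

# numerical grid search over (0, 3] confirms the closed form
grid = [k / 1000 for k in range(1, 3001)]
lam_star = max(grid, key=log_L)
print(abar, lam_star)
```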
Statistical inference: maximum likelihood

Example 2
Examination scores.
How well do the students understand the material?
student  1  2  3  4  5  6  7  8  9  10
score   72 89 64 52 96 64 70 83 56 70
Let X be a r.v. denoting a score.
Then X ∼ N(μ, σ²), i.e.,
f(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))
for unknown parameters μ and σ.
Ex. Maximum likelihood

Ex. Normal distribution
Let X₁, …, Xₙ be independent r.v.s according to N(μ, σ²), and
a₁, …, aₙ are sample values.

ln L(μ, σ; a) = Σᵢ₌₁ⁿ ln f(aᵢ; μ, σ)
  = Σᵢ₌₁ⁿ ln[(1 / (√(2π) σ)) exp(−(aᵢ − μ)² / (2σ²))]
  = Σᵢ₌₁ⁿ [−(1/2) ln 2π − (1/2) ln σ² − (aᵢ − μ)² / (2σ²)]
  = −(n/2) ln 2π − (n/2) ln σ² − Σᵢ₌₁ⁿ (aᵢ − μ)² / (2σ²)
Ex. Maximum likelihood (cont.)

Ex. Normal distribution
Let X₁, …, Xₙ be independent r.v.s according to N(μ, σ²), and
a₁, …, aₙ are sample values.

∂/∂μ ln L(μ, σ; a) = −∂/∂μ [Σᵢ₌₁ⁿ (aᵢ − μ)² / (2σ²)] = Σᵢ₌₁ⁿ (aᵢ − μ) / σ²
Setting this to 0 gives μ = ā:
ā is the maximum likelihood estimator of μ.

∂/∂σ ln L(μ, σ; a) = −∂/∂σ [(n/2) ln σ²] − ∂/∂σ [Σᵢ₌₁ⁿ (aᵢ − μ)² / (2σ²)]
  = −n/σ + Σᵢ₌₁ⁿ (aᵢ − μ)² / σ³
Since μ = ā maximizes L(μ, σ) independently of σ, setting this to 0 with μ = ā gives
(σ*)² = Σᵢ₌₁ⁿ (aᵢ − ā)² / n.
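Applied to the exam scores of Example 2, the closed forms μ* = ā and (σ*)² = Σᵢ (aᵢ − ā)²/n give:

```python
a = [72, 89, 64, 52, 96, 64, 70, 83, 56, 70]  # exam scores from the table
n = len(a)

mu_mle = sum(a) / n                                # mu* = a-bar
var_mle = sum((x - mu_mle) ** 2 for x in a) / n    # (sigma*)^2, note /n not /(n-1)
print(mu_mle, var_mle)  # -> 71.6 177.64
```

Note the MLE of σ² divides by n, so (as shown in Lesson 8) it is not an unbiased estimator of σ².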
Statistics III
July 15, 2020
来嶋 秀治 (Shuji Kijima)
Dept. Informatics,
Graduate School of ISEE
Today's topics
• interval estimation (区間推定)
• hypothesis testing (仮説検定)
• t-test
• χ²-test
確率統計特論 (Probability & Statistics)
Lesson 10
1. Interval estimation
Statistical Inference (統計的推定)
point estimation (点推定)
consistent estimation (一致推定)
unbiased estimation (不偏推定)
maximum likelihood (最尤推定)
interval estimation (区間推定)
Statistical inference

Example 1
A clerk says "our eggs are big: 70[g] on average."
X̄ = 66.3[g], s² = 17.584[g²] for 6 eggs.
Suppose σ² = 18.0 for simplicity.

Let z* (> 0) satisfy
Pr[−z* ≤ (X̄ − μ)/(σ/√n) ≤ z*] ≥ 0.95.
By the central limit theorem,
Pr[−z* ≤ (X̄ − μ)/(σ/√n) ≤ z*] ≈ ∫_{−z*}^{z*} (1/√(2π)) exp(−x²/2) dx
… and we see that z* = 1.960 (see a normal distribution table).
This yields the "two-sided 95% confidence interval" (両側95%信頼区間).
Normal distribution
Wikipedia: Standard normal table
http://en.wikipedia.org/wiki/Normal_distribution
Statistical inference

Example 1 (cont.)
A clerk says "our eggs are big: 70[g] on average."
X̄ = 66.3[g], s² = 17.584[g²] for 6 eggs.
Suppose σ² = 18.0 for simplicity.

With X̄ = 66.3[g], z* = 1.960, σ² = 18.0, n = 6:
Pr[−z* ≤ (X̄ − μ)/(σ/√n) ≤ z*]
  = Pr[−z*σ/√n ≤ X̄ − μ ≤ z*σ/√n]
  = Pr[−X̄ − z*σ/√n ≤ −μ ≤ −X̄ + z*σ/√n]
  = Pr[X̄ − z*σ/√n ≤ μ ≤ X̄ + z*σ/√n]
  = Pr[66.3 − 1.960·√(18/6) ≤ μ ≤ 66.3 + 1.960·√(18/6)]
  = Pr[62.91 ≤ μ ≤ 69.69]
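The interval above can be reproduced numerically with the slide's values (X̄ = 66.3, σ² = 18.0, n = 6, z* = 1.960):

```python
import math

xbar, sigma2, n, z_star = 66.3, 18.0, 6, 1.960
half = z_star * math.sqrt(sigma2 / n)   # half-width z* * sigma / sqrt(n)
lo, hi = xbar - half, xbar + half       # two-sided 95% confidence interval
print(round(lo, 2), round(hi, 2))       # -> 62.91 69.69
```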
2. Hypothesis testing (仮説検定)
Hypothesis testing (仮説検定)

Terminology
• null hypothesis (帰無仮説)
• alternative hypothesis (対立仮説)

Idea
Pr[null hypo is true] is small
⇒ reject the null hypothesis at significance level α
(有意水準αで帰無仮説を棄却する)
Pr[null hypo is true] is not small
⇒ fail to reject the null hypothesis at significance level α
(有意水準αで帰無仮説を棄却しない)
Statistical inference

Example 1 (cont.)
A clerk says "our eggs are big: 70[g] on average."
You bought 6 eggs in a shop.
How large are the eggs sold in this shop?
X̄ = 66.3[g], s² = 17.584[g²]
Is the clerk honest?
egg         1    2    3    4    5    6
weight[g] 64.3 70.4 63.2 67.8 71.3 60.8

Pr[−z* ≤ (X̄ − μ)/(σ/√n) ≤ z*]
  = Pr[−z*σ/√n ≤ X̄ − μ ≤ z*σ/√n]
  = Pr[μ − z*σ/√n ≤ X̄ ≤ μ + z*σ/√n]
  = Pr[70 − 1.960·√(18/6) ≤ X̄ ≤ 70 + 1.960·√(18/6)]
  = Pr[66.6 ≤ X̄ ≤ 73.4]
Statistical inference

Example 1 (cont.)
A clerk says "our eggs are big: 70[g] on average."
X̄ = 66.3[g], s² = 17.584[g²] for 6 eggs.
Assume μ = 70.0 (null hypothesis); suppose σ² = 18.0 for simplicity.
With μ = 70, z* = 1.960, σ² = 18.0, n = 6, the observed X̄ = 66.3[g]
falls outside [66.6, 73.4], so
the null hypothesis μ = 70.0 is rejected at significance level 5%
(帰無仮説 μ = 70.0 は有意水準5%で棄却される).
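Equivalently, one can standardize X̄ and compare |z| with z*. A minimal sketch of this two-sided z-test with the slide's numbers:

```python
import math

mu0, sigma2, n, xbar, z_star = 70.0, 18.0, 6, 66.3, 1.960
z = (xbar - mu0) / math.sqrt(sigma2 / n)  # standardized test statistic
reject = abs(z) > z_star                  # reject at significance level 5%
print(round(z, 3), reject)                # -> -2.136 True
```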
Statistics IV
July 22, 2020
来嶋 秀治 (Shuji Kijima)
Dept. Informatics,
Graduate School of ISEE
Today's topics
• linear regression (線形回帰)
• simple regression (単回帰)
• multiple regression (重回帰)
• autoregression (自己回帰)
• model selection, AIC (モデル選択)
確率統計特論 (Probability & Statistics)
Lesson 11
Ex. Advertisement

Question
How does y increase as x increases?
year            1   2   3   4   5   6   7   8
x: ad. cost     8  11  13  10  15  19  17  20
y: sale amount 115 124 138 120 151 186 169 193
[Scatter plot of y (sale amount) against x (ad. cost)]
Least Square Estimator

Question
How does y increase as x increases?

Linear regression (線形回帰)
Suppose yᵢ = α + βxᵢ + eᵢ where eᵢ ∼ N(0, σ²).
Estimate α and β such that
min Σᵢ₌₁ⁿ (yᵢ − (α + βxᵢ))²

year            1   2   3   4   5   6   7   8
x: ad. cost     8  11  13  10  15  19  17  20
y: sale amount 115 124 138 120 151 186 169 193
Least Square Estimator

Linear regression (線形回帰)
Suppose yᵢ = α + βxᵢ + eᵢ where eᵢ ∼ N(0, σ²).
Estimate α and β such that min Σᵢ₌₁ⁿ (yᵢ − (α + βxᵢ))².

Let g(α, β) := Σᵢ₌₁ⁿ (yᵢ − (α + βxᵢ))².
∂/∂α g(α, β) = Σᵢ₌₁ⁿ −2(yᵢ − (α + βxᵢ))
∂/∂β g(α, β) = Σᵢ₌₁ⁿ (−2xᵢ)(yᵢ − (α + βxᵢ))
Setting ∂g/∂α = 0 and ∂g/∂β = 0:
α + β x̄ = ȳ
α x̄ + β·mean(x²) = mean(xy)

⇒ β̂ = (mean(xy) − x̄·ȳ) / (mean(x²) − x̄²)
   α̂ = ȳ − β̂ x̄

Prop.
E[β̂] = E[s_xy / s_x²] = β
E[α̂] = E[ȳ − β̂ x̄] = E[ȳ] − E[β̂] x̄ = α
i.e., α̂ and β̂ are unbiased estimators.
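The least-squares formulas above, applied directly to the advertisement data:

```python
x = [8, 11, 13, 10, 15, 19, 17, 20]              # ad. cost
y = [115, 124, 138, 120, 151, 186, 169, 193]     # sale amount
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
xybar = sum(xi * yi for xi, yi in zip(x, y)) / n  # mean(xy)
x2bar = sum(xi * xi for xi in x) / n              # mean(x^2)

beta = (xybar - xbar * ybar) / (x2bar - xbar ** 2)  # beta-hat
alpha = ybar - beta * xbar                          # alpha-hat
print(round(alpha, 2), round(beta, 2))
```

Each additional unit of ad. cost is estimated to raise the sale amount by about β̂ ≈ 6.9.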
Statistics V
July 29, 2020
来嶋 秀治 (Shuji Kijima)
Dept. Informatics,
Graduate School of ISEE
Today's topics
• Bayes estimation
• MAP estimation
確率統計特論 (Probability & Statistics)
Lesson 12
Bayesian inference (for discrete Θ)

Thm. (Bayes; ベイズの定理)
Let X ∼ f(x; θ) where θ is an unknown parameter(s).
Now suppose we obtain sample X = z; then
w′(θ | z) = w(θ)·f(z | θ) / Σ_{θ′∈Θ} w(θ′)·f(z | θ′)
where
w(θ) is the prior probability distribution (事前分布) of θ,
w′(θ | z) is the posterior probability distribution (事後分布) of θ.
Rem. w′(θ | z) ∝ w(θ)·L(θ | z),
L(θ | z) := f(z | θ): the likelihood function.
Bayesian inference (for continuous Θ)

Thm. (Bayes; ベイズの定理)
Let X ∼ f(x; θ) where θ is an unknown parameter(s).
Now suppose we obtain sample X = z; then
w′(θ | z) = w(θ)·f(z | θ) / ∫_Θ w(θ′)·f(z | θ′) dθ′
where
w(θ) is the prior probability density (事前分布) of θ,
w′(θ | z) is the posterior probability density (事後分布) of θ.
Rem. w′(θ | z) ∝ w(θ)·L(θ | z),
L(θ | z) := f(z | θ): the likelihood function.
Conjugate Prior

Distribution    Conjugate Prior
Binomial        Beta
Poisson         Gamma
Normal          Normal
Multinomial     Dirichlet
Ex. Bayesian inference

Let X ∼ Po(λ) where λ > 0 is an unknown parameter,
but we know a prior distribution of λ: λ ∼ Ga(α, ν).
Now suppose we obtain sample X = k.
Q. Compute the posterior w′(λ | k).
Gamma density Ga(α, ν): f(x) = (α^ν / Γ(ν)) x^(ν−1) exp(−αx)

w′(λ | k) ∝ w(λ)·L(λ | k) = w(λ)·f(k | λ)
  = (α^ν / Γ(ν)) λ^(ν−1) exp(−αλ) · (λ^k / k!) exp(−λ)
  ∝ λ^(ν+k−1) exp(−(α + 1)λ)
  ∝ ((α + 1)^(ν+k) / Γ(ν + k)) λ^(ν+k−1) exp(−(α + 1)λ),
hence the posterior is Ga(α + 1, ν + k): the Gamma prior is conjugate.
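A numerical sanity check of the conjugacy, with arbitrary hypothetical values α = 2, ν = 3, k = 4: normalizing w(λ)·f(k | λ) on a grid recovers the Ga(α + 1, ν + k) density.

```python
import math

alpha, nu, k = 2.0, 3.0, 4  # hypothetical prior Ga(alpha, nu) and observation X = k

def gamma_pdf(x, a, v):
    # Ga(a, v) density: a^v / Gamma(v) * x^(v-1) * exp(-a x)
    return a ** v / math.gamma(v) * x ** (v - 1) * math.exp(-a * x)

def unnormalized_posterior(lam):
    # prior w(lambda) times Poisson likelihood f(k | lambda)
    return gamma_pdf(lam, alpha, nu) * lam ** k * math.exp(-lam) / math.factorial(k)

# normalize on a grid and compare pointwise against Ga(alpha+1, nu+k)
h = 0.001
xs = [h * i for i in range(1, 30001)]
Z = sum(unnormalized_posterior(x) for x in xs) * h
max_err = max(abs(unnormalized_posterior(x) / Z - gamma_pdf(x, alpha + 1, nu + k))
              for x in xs)
print(max_err)  # small: the normalized posterior matches Ga(alpha+1, nu+k)
```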
MAP estimation
Maximum a posteriori estimator

A MAP (maximum a posteriori) estimator is given by
θ* = argmax_θ w′(θ | z),
meaning that θ* maximizes the posterior probability/density.

Note.
"Bayes estimator" usually refers to the posterior distribution,
while the MAP estimator is the θ* maximizing the posterior.
Maximum likelihood estimator and MAP estimator

Rem. Posterior density
w′(θ | z) = w(θ)·f(z | θ) / Σ_{θ′∈Θ} w(θ′)·f(z | θ′)
where w(θ) is the prior density.

Prop.
The posterior density satisfies
w′(θ | z) ∝ w(θ)·L(θ | z)
where L(θ | z) = f(z | θ) is the likelihood function.

Roughly speaking:
A maximum likelihood estimator maximizes the likelihood function,
which is an artificial concept closely related to, but not exactly
the same as, the joint probability.
A MAP estimator maximizes the posterior probability, artificially
assuming a prior distribution.
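For the Poisson-Gamma example, the MAP estimate is the mode of the Ga(α + 1, ν + k) posterior, which is (ν + k − 1)/(α + 1) when ν + k ≥ 1, while the maximum likelihood estimate from the single observation X = k is λ* = k. A sketch with hypothetical numbers:

```python
alpha, nu, k = 2.0, 3.0, 4  # hypothetical prior Ga(alpha, nu), observation X = k

# MAP: mode of the Ga(alpha+1, nu+k) posterior, (shape - 1) / rate for shape >= 1
map_est = (nu + k - 1) / (alpha + 1)
# MLE from the single Poisson observation: lambda* = k
ml_est = float(k)
print(map_est, ml_est)  # -> 2.0 4.0
```

The prior pulls the MAP estimate away from the raw MLE toward values it considers plausible.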
Further Topics

Probability Theory:
probability space; distribution, expectation, variance;
stochastic inequalities, law of large numbers, central limit theorem.
Statistics: Estimation, Test, Regression.

Related fields:
Optimization
Experimental Design
Machine Learning
Probability Theory
Information Theory
Calculus, Linear Algebra
Computer Science (Programming, Algorithms)
Data Science
Data Mining
Pattern Recognition
Further topics
Probability Inequalities
Stochastic Process
Markov process
Brownian motion/stochastic diff. eq.
Martingale
Ergodic theory
Multivariate Statistics
Principal component analysis (主成分分析)
Machine Learning
SVM
NMF
Deep Learning / neural network
Data Mining