Analysis of Variance Comparisons among multiple populations.
Date posted: 21-Dec-2015
Analysis of Variance
Comparisons among multiple populations
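More than two populations
Group A observations: $X_{A,1}, X_{A,2}, X_{A,3}, X_{A,4}, \ldots, X_{A,n_a}$
Group B observations: $X_{B,1}, X_{B,2}, X_{B,3}, \ldots, X_{B,n_b}$
Group C observations: $X_{C,1}, X_{C,2}, X_{C,3}, X_{C,4}, X_{C,5}, \ldots, X_{C,n_c}$
Group D observations: $X_{D,1}, X_{D,2}, X_{D,3}, X_{D,4}, \ldots, X_{D,n_d}$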
H0: μA = μB = μC = μD
Discussed in Chapter 8: H0: μA = μB, H0: μB = μC, H0: μC = μD
Basic assumptions and the hypothesis testing logic
The observed data are normally distributed with a common, unknown variance σ².
Derive two estimators for σ²:
The first is valid whether or not the hypothesis H0: μA = μB = μC = μD is true.
The second is valid only when H0 is true; when H0 is not true, it tends to overestimate σ².
Compare the two estimators through the sample ratio (2nd/1st).
If the ratio is too large, reject H0.
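The two-estimator logic above can be sketched in a few lines of Python. This is only an illustration under the assumption of equal group sizes; the function name `variance_estimators` and the toy data are hypothetical, not from the slides.

```python
from statistics import mean, variance

def variance_estimators(groups):
    """Return (first, second) estimates of sigma^2 for equal-size groups."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    # First estimator: average of the group sample variances.
    # It does not rely on H0 being true.
    first = mean(variance(g) for g in groups)
    # Second estimator: n times the sample variance of the group means.
    # It estimates sigma^2 only if all group means are equal (H0).
    second = n * variance([mean(g) for g in groups])
    return first, second

# Toy data (made up for illustration): three groups of three observations.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
first, second = variance_estimators(groups)
ratio = second / first  # first = 1.0, second = 21.0, ratio = 21.0
```

A large ratio, as here, is evidence against H0; how large is "too large" is decided by the F distribution.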
ANOVA testing
ANalysis Of VAriance
To test the considered hypothesis by analyzing the variance σ²
Search for the proper estimators
Decomposition of variance …
E.g., decomposition of Syy: Syy = SSr + SSm
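[Figure: regression line $\hat{Y}$ plotted against X, together with the horizontal line $\bar{Y}$. For an observation $Y_i$, the deviation $(Y_i-\bar{Y})$ is split into $(Y_i-\hat{Y})$ and $(\hat{Y}-\bar{Y})$.]
$(\hat{Y}-\bar{Y})$: the larger this deviation, the less likely $\hat{Y}$ is a horizontal line, because if it were horizontal the sum of squared differences would be very small.
$(Y_i-\hat{Y})$: the smaller this deviation, the closer the regression line is to the true values and the more accurate the prediction.
$(Y_i-\bar{Y})$: the deviation of Y, which is fixed once the sample is given.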
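Review: χ² distribution
$X_1, X_2, X_3, \ldots, X_n$ are independent random variables
If $X_i \sim N(0,1)$, then $\sum_{i=1}^{n} X_i^2 \sim \chi^2_n$
If $X_i \sim N(\mu,\sigma^2)$, then $(X_i-\mu)/\sigma \sim N(0,1)$, so $\sum_{i=1}^{n} (X_i-\mu)^2/\sigma^2 \sim \chi^2_n$
Replacing μ by the sample mean $\bar{X}$: $\sum_{i=1}^{n} (X_i-\bar{X})^2/\sigma^2 \sim \chi^2_{n-1}$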
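A quick simulation can make the degrees of freedom concrete, since a χ² variable with k d.f. has mean k. The sketch below is an illustration only; the function name, the seed and the number of repetitions are arbitrary choices, not part of the slides.

```python
import random

def mean_sums_of_squares(n, reps=20000, seed=0):
    """Average, over many samples of n standard normals, of
    (a) the sum of squares about the true mean 0 and
    (b) the sum of squares about the sample mean."""
    rng = random.Random(seed)
    about_true_mean = 0.0
    about_sample_mean = 0.0
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        zbar = sum(z) / n
        about_true_mean += sum(v * v for v in z)
        about_sample_mean += sum((v - zbar) ** 2 for v in z)
    return about_true_mean / reps, about_sample_mean / reps

# With n = 5 the two averages should be close to 5 and 4 respectively.
avg_n, avg_n_minus_1 = mean_sums_of_squares(5)
```

Replacing the true mean by the sample mean costs one degree of freedom, which is the step reused throughout the ANOVA derivations.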
One-way ANOVA approach (i)
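Equal group size, n
$X_{ij}$: the jth observation in the ith group, $i=1,\ldots,m$ (# of groups), $j=1,\ldots,n$ (# of obs. in each group), $X_{ij} \sim N(\mu_i,\sigma^2)$
$\sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij}-\mu_i)^2/\sigma^2 \sim \chi^2_{nm}$
Total d.f. = # of groups × obs. in each group; $\mu_i$ is the ith group mean
Replace the group mean $\mu_i$ by the sample mean $X_{i.} = \sum_{j=1}^{n} X_{ij}/n$:
$\sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij}-X_{i.})^2/\sigma^2 \sim \chi^2_{nm-m}$
Total d.f. = # of groups × (obs. in each group − 1)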
One-way ANOVA approach (ii)
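By definition, $SS_w = \sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij}-X_{i.})^2$, called the "within samples sum of squares"
$SS_w/\sigma^2 \sim \chi^2_{nm-m}$
∵ $E[\chi^2_{nm-m}] = nm-m$
∴ $E[SS_w/(nm-m)] = \sigma^2$, whether or not H0 is true
By definition, $SS_b = n\sum_{i=1}^{m} (X_{i.}-X_{..})^2$, called the "between samples sum of squares"
$X_{i.}$: group sample mean; $X_{..}$: total sample mean
∵ Var(Xi.) = Var(Xij)/n = σ²/n, so when all group means equal μ, $(X_{i.}-\mu)/\sqrt{\sigma^2/n} \sim N(0,1)$
∴ $n\sum_{i=1}^{m} (X_{i.}-\mu)^2/\sigma^2 \sim \chi^2_m$
Replace the total mean μ by the total sample mean X..: d.f. = # of groups − 1
i.e., assume all Xi. population means are equal to μ in order to replace μ by the total sample mean X..
$SS_b/\sigma^2 \sim \chi^2_{m-1}$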
One-way ANOVA approach (iii)
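Under H0, $TS = \frac{SS_b/(m-1)}{SS_w/(nm-m)} \sim F_{m-1,\,nm-m}$
Reject H0 when $TS > F_{\alpha;\,m-1,\,nm-m}$, i.e., when the numerator is sufficiently large relative to the denominator
Or reject H0 when p-value $= P(F_{m-1,\,nm-m} > TS) < \alpha$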
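As a hedged illustration of the test statistic for the balanced case, the sketch below computes SSb, SSw and TS in plain Python. The function name `one_way_f` and the data are made up for the example; the critical value still has to come from an F table or a statistics package.

```python
def one_way_f(groups):
    """Return (SSb, SSw, TS) for m groups of equal size n."""
    m = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    group_means = [sum(g) / n for g in groups]
    total_mean = sum(sum(g) for g in groups) / (n * m)
    # Within samples sum of squares: deviations from each group's own mean.
    ssw = sum((x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g)
    # Between samples sum of squares: deviations of group means from the total mean.
    ssb = n * sum((gm - total_mean) ** 2 for gm in group_means)
    ts = (ssb / (m - 1)) / (ssw / (n * m - m))
    return ssb, ssw, ts

# Hypothetical data: m = 3 groups, n = 3 observations each.
data = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [5.0, 6.0, 7.0]]
ssb, ssw, ts = one_way_f(data)  # ssb = 6.0, ssw = 6.0, ts = 3.0
```

Here TS = 3.0 on (2, 6) degrees of freedom. The tabulated 5% critical value for F with (2, 6) d.f. is about 5.14, so H0 would not be rejected for this toy data.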
Decomposition of Var(Xij) (i)
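[Figure: observations Xij plotted for A group, B group and C group, with the group means XA., XB., XC. and the total mean X.. marked. For one observation, the total deviation (Xij − X..) is split into the between deviation (Xi. − X..) and the within deviation (Xij − Xi.).]
If the differences among the groups are small, most of the deviation from the total mean is caused by within-group randomness.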
Decomposition of Var(Xij) (ii)
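$\sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij}^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij}-X_{..})^2 + nm X_{..}^2$
Usually, define SST as the total sum of squares:
$SST = \sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij}-X_{..})^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij}-X_{i.}+X_{i.}-X_{..})^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij}-X_{i.})^2 + n\sum_{i=1}^{m} (X_{i.}-X_{..})^2$
The cross term vanishes because $\sum_{j=1}^{n} (X_{ij}-X_{i.}) = 0$ for every i.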
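The decomposition can be checked numerically. The following sketch uses made-up balanced data and a hypothetical helper name; it verifies that the total sum of squares equals the within part plus the between part, and that the raw sum of squares equals SST plus nm times the squared total mean.

```python
def sums_of_squares(groups):
    """Return (SST, SSb, SSw, raw sum of squares, total mean) for balanced data."""
    m = len(groups)
    n = len(groups[0])
    total_mean = sum(sum(g) for g in groups) / (n * m)
    group_means = [sum(g) / n for g in groups]
    sst = sum((x - total_mean) ** 2 for g in groups for x in g)
    ssw = sum((x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g)
    ssb = n * sum((gm - total_mean) ** 2 for gm in group_means)
    raw = sum(x * x for g in groups for x in g)
    return sst, ssb, ssw, raw, total_mean

# Hypothetical data: m = 3 groups of n = 3 observations.
obs = [[3.0, 5.0, 7.0], [6.0, 8.0, 10.0], [1.0, 2.0, 6.0]]
sst, ssb, ssw, raw, total_mean = sums_of_squares(obs)
# For this data: SST = 68, SSb = 38, SSw = 30 (up to floating-point rounding).
```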
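What if H0 is not true?
$X_{i.} \sim N(\mu_i, \sigma^2/n)$. Set $Y_i = X_{i.}-\mu_i+\mu_.$, so that $Y_i \sim N(\mu_., \sigma^2/n)$, where $\mu_. = \sum_{i=1}^{m}\mu_i/m$
∵ $X_{i.} = Y_i+\mu_i-\mu_.$ & $X_{..} = Y_.$
∴ $E[\sum_{i=1}^{m}(X_{i.}-X_{..})^2] = E[\sum_{i=1}^{m}(Y_i-Y_.)^2] + \sum_{i=1}^{m}(\mu_i-\mu_.)^2 + 2\sum_{i=1}^{m}(\mu_i-\mu_.)E[Y_i-Y_.]$
The $Y_i$ share a common mean, so $(Y_i-Y_.)$ is pure random deviation: $E[\sum_{i=1}^{m}(Y_i-Y_.)^2] = (m-1)\sigma^2/n$
The cross term vanishes: $E[Y_i-Y_.] = E[Y_i]-E[Y_.] = \mu_.-\mu_. = 0$
∴ $E[SS_b/(m-1)] = \sigma^2 + n\sum_{i=1}^{m}(\mu_i-\mu_.)^2/(m-1) \ge \sigma^2$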
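The standard result behind this slide is that, when the group means differ, E[SSb/(m−1)] = σ² + nΣ(μi − μ.)²/(m−1), which exceeds σ². A small simulation, sketched below under assumed values (group means 0, 1, 2, σ = 1, n = 4, all chosen only for illustration), compares the simulated average of SSb/(m−1) with that formula.

```python
import random

def mean_msb(mus, n, sigma=1.0, reps=20000, seed=1):
    """Simulated average of SSb/(m-1) when group i has true mean mus[i]."""
    rng = random.Random(seed)
    m = len(mus)
    total = 0.0
    for _ in range(reps):
        # Draw each group sample mean directly: X_i. ~ N(mu_i, sigma^2 / n).
        xbar = [rng.gauss(mu, sigma / n ** 0.5) for mu in mus]
        grand = sum(xbar) / m
        total += n * sum((x - grand) ** 2 for x in xbar) / (m - 1)
    return total / reps

mus = [0.0, 1.0, 2.0]  # assumed true group means (H0 is false)
n = 4                  # assumed observations per group
mu_dot = sum(mus) / len(mus)
expected = 1.0 + n * sum((mu - mu_dot) ** 2 for mu in mus) / (len(mus) - 1)  # 5.0
simulated = mean_msb(mus, n)
```

The simulated value should land near 5, well above σ² = 1, which is why a large ratio of the second estimator to the first points away from H0.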
ANOVA table
∴ SST = SSb + SSw
If p-value < α, reject H0
The meaning of ANOVA table
Source    Sum of squares   Degrees of freedom             Mean square         F
Between   SSb              m-1 (# of groups - 1)          MSb = SSb/(m-1)     MSb/MSw
Within    SSw              nm-m = m(n-1)                  MSw = SSw/(nm-m)
Total     SST              nm-1 (# of observations - 1)
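SST = SSb + SSw. For a given SST, the larger SSb is, the smaller SSw must be. With the number of groups m unchanged, MSb is then larger and MSw smaller, so the F value is larger and H0 (no difference among the group means) is more likely to be rejected. In other words, the deviation of the observed variable Xij from the total mean X.. is mostly caused by the differences among the group means Xi.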
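Unbalanced case: unequal sample sizes within the groups
Different group size $n_i$, $i=1,\ldots,m$; $X_{i.} = \sum_{j=1}^{n_i} X_{ij}/n_i$ and $X_{..} = \sum_{i=1}^{m}\sum_{j=1}^{n_i} X_{ij}/\sum_{i=1}^{m} n_i$
Conditional estimator of σ² (valid only when H0 is true): $SS_b/(m-1)$, where $SS_b = \sum_{i=1}^{m} n_i (X_{i.}-X_{..})^2$
Unconditional estimator of σ² (valid whether or not H0 is true): $SS_w/(\sum_{i=1}^{m} n_i - m)$, where $SS_w = \sum_{i=1}^{m}\sum_{j=1}^{n_i} (X_{ij}-X_{i.})^2$
Unbalanced F-test for ANOVA: under H0, $TS = \frac{SS_b/(m-1)}{SS_w/(\sum_{i} n_i - m)} \sim F_{m-1,\,\sum_{i} n_i - m}$; reject H0 when TS is too large
A balanced design is preferable to an unbalanced one because the test is then less sensitive to slight departures from the assumption of equal population variances.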
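For unequal group sizes, the usual form of the test weights each squared group-mean deviation by its group size ni and uses Σni − m degrees of freedom for the within part. A minimal sketch, with a hypothetical function name and made-up data:

```python
def unbalanced_f(groups):
    """Return (MSb, MSw, TS) for groups of possibly different sizes."""
    m = len(groups)
    total_n = sum(len(g) for g in groups)
    total_mean = sum(sum(g) for g in groups) / total_n
    group_means = [sum(g) / len(g) for g in groups]
    # Within part: deviations from each group's own mean, d.f. = total_n - m.
    ssw = sum((x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g)
    # Between part: each group mean is weighted by its own size n_i, d.f. = m - 1.
    ssb = sum(len(g) * (gm - total_mean) ** 2 for g, gm in zip(groups, group_means))
    msb = ssb / (m - 1)
    msw = ssw / (total_n - m)
    return msb, msw, msb / msw

# Hypothetical data with group sizes 3, 2 and 4.
sample = [[1.0, 2.0, 3.0], [4.0, 6.0], [7.0, 8.0, 9.0, 10.0]]
msb, msw, ts_unbalanced = unbalanced_f(sample)
```

For this data the statistic is compared with an F distribution on (2, 6) degrees of freedom.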
Two classification factors
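Column factor: A, B, C, D. Row factor: a, b, c. Each cell holds k observations.
Row \ Column   A                                        B                                        C                                        D
a              $X_{a,A,1}, X_{a,A,2}, \ldots, X_{a,A,k}$   $X_{a,B,1}, X_{a,B,2}, \ldots, X_{a,B,k}$   $X_{a,C,1}, X_{a,C,2}, \ldots, X_{a,C,k}$   $X_{a,D,1}, X_{a,D,2}, \ldots, X_{a,D,k}$
b              $X_{b,A,1}, X_{b,A,2}, \ldots, X_{b,A,k}$   $X_{b,B,1}, X_{b,B,2}, \ldots, X_{b,B,k}$   $X_{b,C,1}, X_{b,C,2}, \ldots, X_{b,C,k}$   $X_{b,D,1}, X_{b,D,2}, \ldots, X_{b,D,k}$
c              $X_{c,A,1}, X_{c,A,2}, \ldots, X_{c,A,k}$   $X_{c,B,1}, X_{c,B,2}, \ldots, X_{c,B,k}$   $X_{c,C,1}, X_{c,C,2}, \ldots, X_{c,C,k}$   $X_{c,D,1}, X_{c,D,2}, \ldots, X_{c,D,k}$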
Two-way ANOVA approach (i)
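$X_{ij}$: $i=1,\ldots,m$ (row factor, m types), $j=1,\ldots,n$ (column factor, n types)
Only one observation within each cell
Review (αi = μi − μ, the deviation from the total μ):
$\sum_{i=1}^{m}\alpha_i = \sum_{i=1}^{m}(\mu_i-\mu) = \sum_{i=1}^{m}\mu_i - m\mu = 0$ (∵ $\mu = \sum_{i=1}^{m}\mu_i/m$)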
Two-way ANOVA approach (ii)
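$E[X_{ij}] = \mu_{ij}$: the cell mean (the cell may hold a sample of size k or other)
Suppose an additive model for the cell mean, composed of $a_i$ and $b_j$: $\mu_{ij} = a_i + b_j$
The ith row mean: $\mu_{i.} = \sum_{j=1}^{n}\mu_{ij}/n = a_i + b_.$
The jth column mean: $\mu_{.j} = \sum_{i=1}^{m}\mu_{ij}/m = a_. + b_j$
The total mean: $\mu_{..} = a_. + b_.$
Average row factor: $a_. = \sum_{i=1}^{m} a_i/m$
Average column factor: $b_. = \sum_{j=1}^{n} b_j/n$
The ith row mean = the average column factor + the specific ith row factor
Deviations from the average row factor and the average column factor: $\alpha_i = a_i - a_. = \mu_{i.}-\mu_{..}$ & $\beta_j = b_j - b_. = \mu_{.j}-\mu_{..}$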
Two-way ANOVA approach (iii)
Two-way ANOVA approach (iv)
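The assumed two-way ANOVA model: $E[X_{ij}] = \mu + \alpha_i + \beta_j$, with $\sum_{i=1}^{m}\alpha_i = 0$ & $\sum_{j=1}^{n}\beta_j = 0$
i.e., the expected value of a specific (i, j) cell can be decomposed into: the total mean + the ith deviation from the average row factor (the ith row deviation from the total mean) + the jth deviation from the average column factor (the jth column deviation from the total mean)
Unbiased estimators: $\hat{\mu} = X_{..}$, $\hat{\alpha}_i = X_{i.}-X_{..}$, $\hat{\beta}_j = X_{.j}-X_{..}$
Use the unbiased estimators to test the objective hypotheses H0: all $\alpha_i = 0$ & H0: all $\beta_j = 0$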
Two-way ANOVA approach (v)
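$\sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij}-\mu-\alpha_i-\beta_j)^2/\sigma^2 \sim \chi^2_{nm}$
Apply each unbiased estimator: $\hat{\beta}_j$ reduces n−1 d.f., $\hat{\alpha}_i$ reduces m−1 d.f., $\hat{\mu}$ reduces 1 d.f.
$\sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij}-X_{i.}-X_{.j}+X_{..})^2/\sigma^2 \sim \chi^2_{(n-1)(m-1)}$, since nm − (n−1) − (m−1) − 1 = (n−1)(m−1)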
Two-way ANOVA approach (vi)
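If H0: all $\alpha_i = 0$ is true,
then $X_{i.} \sim N(\mu, \sigma^2/n)$ and $\sum_{i=1}^{m} n(X_{i.}-\mu)^2/\sigma^2 \sim \chi^2_m$
Replacing μ by $X_{..}$: $\sum_{i=1}^{m} n(X_{i.}-X_{..})^2/\sigma^2 \sim \chi^2_{m-1}$
Define the row sum of squares: $SS_r = n\sum_{i=1}^{m} (X_{i.}-X_{..})^2$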
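Two-way ANOVA table
Source   Sum of squares                                                        Degrees of freedom   Test statistic
Row      $SS_r = n\sum_{i=1}^{m} (X_{i.}-X_{..})^2$                            m-1                  [SSr/(m-1)] / [SSe/((n-1)(m-1))]
Column   $SS_c = m\sum_{j=1}^{n} (X_{.j}-X_{..})^2$                            n-1                  [SSc/(n-1)] / [SSe/((n-1)(m-1))]
Error    $SS_e = \sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij}-X_{i.}-X_{.j}+X_{..})^2$   (n-1)(m-1)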
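A minimal sketch of the two-way computation with one observation per cell follows, using the standard row, column and error sums of squares. The function name, the dictionary keys and the 3 × 3 data are hypothetical and only meant to show the arithmetic.

```python
def two_way_anova(x):
    """x[i][j] is the single observation for row type i and column type j."""
    m = len(x)        # number of row types
    n = len(x[0])     # number of column types
    total_mean = sum(sum(row) for row in x) / (m * n)
    row_means = [sum(row) / n for row in x]
    col_means = [sum(x[i][j] for i in range(m)) / m for j in range(n)]
    ss_row = n * sum((r - total_mean) ** 2 for r in row_means)
    ss_col = m * sum((c - total_mean) ** 2 for c in col_means)
    # Error: what is left of each cell after removing row and column effects.
    ss_err = sum((x[i][j] - row_means[i] - col_means[j] + total_mean) ** 2
                 for i in range(m) for j in range(n))
    ms_err = ss_err / ((m - 1) * (n - 1))
    return {
        "ss_row": ss_row, "ss_col": ss_col, "ss_err": ss_err,
        "f_row": (ss_row / (m - 1)) / ms_err,   # tests H0: no row effect
        "f_col": (ss_col / (n - 1)) / ms_err,   # tests H0: no column effect
    }

# Hypothetical 3 x 3 table of observations.
table = [[1.0, 2.0, 4.0], [2.0, 4.0, 5.0], [4.0, 5.0, 9.0]]
result = two_way_anova(table)
```

Each F value is compared with the F distribution on (m−1, (n−1)(m−1)) or (n−1, (n−1)(m−1)) degrees of freedom respectively.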
Two-way ANOVA with interaction (i)
Two-way ANOVA with interaction (ii)
Two-way ANOVA with interaction (iii)
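With l observations $X_{ijk}$, $k=1,\ldots,l$, in cell (i, j) and cell mean $\mu_{ij} = E[X_{ijk}]$:
$\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{l} (X_{ijk}-\mu_{ij})^2/\sigma^2 \sim \chi^2_{nml}$
Replacing $\mu_{ij}$ by the cell sample mean $X_{ij.}$: $\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{l} (X_{ijk}-X_{ij.})^2/\sigma^2 \sim \chi^2_{nm(l-1)}$
$SS_e/\sigma^2 \sim \chi^2_{nm(l-1)}$, where $SS_e = \sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{l} (X_{ijk}-X_{ij.})^2$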
Two-way ANOVA with interaction (iv)
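If there is no interaction, $l\sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij.}-\mu-\alpha_i-\beta_j)^2/\sigma^2 \sim \chi^2_{nm}$
Replacing μ, $\alpha_i$, $\beta_j$ by their estimators: $l\sum_{i=1}^{m}\sum_{j=1}^{n} (X_{ij.}-X_{i..}-X_{.j.}+X_{...})^2/\sigma^2 \sim \chi^2_{(n-1)(m-1)}$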
Two-way ANOVA with interaction (v)
Two-way ANOVA with interaction (vi)
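If all row deviations $\alpha_i$ are 0, $nl\sum_{i=1}^{m} (X_{i..}-\mu)^2/\sigma^2 \sim \chi^2_m$
Replacing μ by $X_{...}$: $nl\sum_{i=1}^{m} (X_{i..}-X_{...})^2/\sigma^2 \sim \chi^2_{m-1}$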
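With l observations per cell, the usual computation adds an interaction sum of squares and takes the error from the variation inside each cell. The sketch below assumes equal cell sizes; the function name, dictionary keys and the 2 × 2 × 2 data are hypothetical.

```python
def two_way_interaction(x):
    """x[i][j] is the list of l observations in cell (row i, column j)."""
    m, n, l = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / l for j in range(n)] for i in range(m)]
    row = [sum(cell[i]) / n for i in range(m)]
    col = [sum(cell[i][j] for i in range(m)) / m for j in range(n)]
    total_mean = sum(row) / m
    # Error: variation of observations around their own cell mean, d.f. = nm(l-1).
    ss_err = sum((v - cell[i][j]) ** 2
                 for i in range(m) for j in range(n) for v in x[i][j])
    # Interaction: cell means after removing row and column effects, d.f. = (n-1)(m-1).
    ss_int = l * sum((cell[i][j] - row[i] - col[j] + total_mean) ** 2
                     for i in range(m) for j in range(n))
    ss_row = n * l * sum((r - total_mean) ** 2 for r in row)
    ss_col = m * l * sum((c - total_mean) ** 2 for c in col)
    ms_err = ss_err / (n * m * (l - 1))
    return {
        "ss_err": ss_err, "ss_int": ss_int, "ss_row": ss_row, "ss_col": ss_col,
        "f_int": (ss_int / ((n - 1) * (m - 1))) / ms_err,
        "f_row": (ss_row / (m - 1)) / ms_err,
        "f_col": (ss_col / (n - 1)) / ms_err,
    }

# Hypothetical data: 2 row types, 2 column types, 2 observations per cell.
cells = [[[1.0, 3.0], [4.0, 6.0]], [[5.0, 7.0], [12.0, 14.0]]]
out = two_way_interaction(cells)
```

Each F value is referred to the F distribution whose first d.f. is that of its numerator and whose second d.f. is nm(l−1).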
Homework #3: Problems 5, 15, 19, 20, 25