Analysis of Variance Comparisons among multiple populations.


More than two populations

Group   Observations
A       XA,1, XA,2, XA,3, XA,4, …, XA,na
B       XB,1, XB,2, XB,3, …, XB,nb
C       XC,1, XC,2, XC,3, XC,4, XC,5, …, XC,nc
D       XD,1, XD,2, XD,3, XD,4, …, XD,nd

H0: μA=μB=μC=μD

Discussed in Chapter 8:

H0: μA=μB, H0: μB=μC, H0: μC=μD

Basic assumptions and the hypothesis testing logic

The observed data are normally distributed with a common (but unknown) variance σ².

Derive two estimators of σ²:

The first is valid whether or not the hypothesis H0: μA=μB=μC=μD is true.

The second is valid only when H0 is true; when H0 is not true, it tends to exceed the true σ².

Compare the two estimators through the sample by taking their ratio (2nd/1st).

If the ratio is too large, reject H0.
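The logic above can be illustrated with a small simulation. This is a minimal sketch, assuming equal group sizes and taking the first estimator to be the average of the group sample variances and the second to be n times the sample variance of the group means; the function names, the seed, and the shift values are hypothetical choices made for illustration only.

```python
import random
from statistics import fmean, variance

def within_estimate(groups):
    # First estimator: average of the group sample variances (equal group sizes).
    # It does not depend on where each group is centred.
    return fmean([variance(g) for g in groups])

def between_estimate(groups):
    # Second estimator: n times the sample variance of the group means.
    # It estimates sigma^2 only when all population means are equal.
    n = len(groups[0])
    return n * variance([fmean(g) for g in groups])

rng = random.Random(0)  # fixed seed, an arbitrary choice
n, sigma = 50, 2.0
base = [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(4)]

# Shift each group by a different constant so that H0 no longer holds.
shifts = [0.0, 2.0, 4.0, 6.0]
shifted = [[x + s for x in g] for g, s in zip(base, shifts)]

for label, data in (("H0 true ", base), ("H0 false", shifted)):
    w, b = within_estimate(data), between_estimate(data)
    print(label, "within:", round(w, 2), "between:", round(b, 2),
          "ratio:", round(b / w, 2))
```

Shifting the groups leaves the first estimate unchanged (up to rounding) but inflates the second, so the ratio grows when H0 is false.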

ANOVA testing

ANalysis Of VAriance

To test the considered hypothesis by analyzing the variance σ²:

Search for the proper estimators

Decomposition of variance …

E.g., Decomposition of Syy, Syy=SSr+SSm

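[Figure: scatter plot of Y against X showing the fitted regression line Ŷ and the horizontal line at the sample mean Ȳ. For an observation Yi, the deviation (Yi - Ȳ) splits into (Yi - Ŷi) and (Ŷi - Ȳ).]

(Ŷi - Ȳ): the larger this deviation, the less likely it is that Ŷ is a horizontal line, because for a horizontal line the sum of these squared differences would be very small.

(Yi - Ŷi): the smaller this deviation, the closer the regression line is to the observed values and the more accurate the prediction.

(Yi - Ȳ): the deviation of Y, which is fixed once the sample is given.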
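The decomposition Syy = SSr + SSm can be checked numerically. The sketch below assumes that SSr denotes the residual sum of squares Σ(Yi - Ŷi)² and SSm the part explained by the fitted line, Σ(Ŷi - Ȳ)²; the slides do not spell this out. The function name and the data are made up for illustration.

```python
from statistics import fmean

def decompose_syy(x, y):
    # Ordinary least-squares fit of y on x.
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    y_hat = [intercept + slope * xi for xi in x]

    syy = sum((yi - y_bar) ** 2 for yi in y)                # total deviation of Y
    ss_r = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual part (assumed meaning of SSr)
    ss_m = sum((fi - y_bar) ** 2 for fi in y_hat)           # fitted-line part (assumed meaning of SSm)
    return syy, ss_r, ss_m

syy, ss_r, ss_m = decompose_syy([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(syy, ss_r, ss_m)  # the last two values add up to the first
```

For this data the fitted line is Ŷ = 2.2 + 0.6X, and the two parts sum to Syy.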

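Review: χ² distribution

X1, X2, X3, …, Xn are independent random variables, each ~ N(0,1):

$\sum_{i=1}^{n} X_i^2 \sim \chi^2_n$

With the deviations taken from the sample mean $\bar{X}$ instead, one degree of freedom is lost:

$\sum_{i=1}^{n} (X_i-\bar{X})^2 \sim \chi^2_{n-1}$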
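A quick Monte Carlo check of the loss of one degree of freedom, assuming the review refers to the standard results that ΣXi² ~ χ²_n and Σ(Xi - X̄)² ~ χ²_{n-1} for independent N(0,1) variables. Since a χ² variable with k degrees of freedom has mean k, the two simulated averages should land near n and n-1. The function name, seed, and replication count are arbitrary illustrative choices.

```python
import random
from statistics import fmean

def simulate_chi2_means(n, reps, seed=0):
    rng = random.Random(seed)
    about_zero, about_mean = [], []
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z_bar = fmean(z)
        about_zero.append(sum(v * v for v in z))             # sum of squares about the true mean 0
        about_mean.append(sum((v - z_bar) ** 2 for v in z))  # sum of squares about the sample mean
    return fmean(about_zero), fmean(about_mean)

mean_n, mean_n_minus_1 = simulate_chi2_means(n=5, reps=20000)
print(mean_n, mean_n_minus_1)  # expected to be close to 5 and 4 respectively
```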

One-way ANOVA approach (i)

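Equal group size, n: m = # of groups, n = # of obs. in each group, and Xij ~ N(μi, σ²) for i = 1, …, m and j = 1, …, n, where μi is the ith group mean.

$\sum_{i=1}^{m}\sum_{j=1}^{n}\frac{(X_{ij}-\mu_i)^2}{\sigma^2} \sim \chi^2_{nm}$

Total d.f. = # of groups × # of obs. in each group

Replace the group mean μi by the sample mean Xi.:

$\sum_{i=1}^{m}\sum_{j=1}^{n}\frac{(X_{ij}-X_{i\cdot})^2}{\sigma^2} \sim \chi^2_{nm-m}$

Total d.f. = # of groups × (# of obs. in each group - 1)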

One-way ANOVA approach (ii)

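By definition, the "within samples sum of squares" is

$SS_w = \sum_{i=1}^{m}\sum_{j=1}^{n}(X_{ij}-X_{i\cdot})^2$

∵ $SS_w/\sigma^2 \sim \chi^2_{nm-m}$

∴ $E[SS_w/(nm-m)] = \sigma^2$

By definition, the "between samples sum of squares" is

$SS_b = n\sum_{i=1}^{m}(X_{i\cdot}-X_{\cdot\cdot})^2$

where Xi. is the group sample mean and X.. is the total sample mean.

Var(Xi.) = Var(Xij)/n = σ²/n

Assume H0 is true, i.e., all Xi. have the same population mean μ, so that μ can be replaced by the total sample mean X..:

∵ $\frac{X_{i\cdot}-\mu}{\sqrt{\sigma^2/n}} \sim N(0,1)$

∴ $n\sum_{i=1}^{m}\frac{(X_{i\cdot}-\mu)^2}{\sigma^2} \sim \chi^2_m$

Replacing the total mean μ by the total sample mean X.. gives d.f. = # of groups - 1:

$SS_b/\sigma^2 \sim \chi^2_{m-1}$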

One-way ANOVA approach (iii)

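If H0 is true,

$TS = \frac{SS_b/(m-1)}{SS_w/(nm-m)} \sim F_{m-1,\,nm-m}$

Reject H0 when $TS > F_{m-1,\,nm-m,\,\alpha}$, i.e., when the numerator is sufficiently large relative to the denominator.

Or reject H0 when p-value < α.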
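A minimal sketch of the balanced one-way test statistic, TS = [SSb/(m-1)] / [SSw/(nm-m)], computed from the definitions of SSb and SSw. The function name and the data are hypothetical. The sketch stops at the statistic; the critical value or p-value would come from an F table or a statistics library.

```python
from statistics import fmean

def one_way_f(groups):
    m = len(groups)     # number of groups
    n = len(groups[0])  # observations per group (balanced case)
    if any(len(g) != n for g in groups):
        raise ValueError("balanced case: all groups must have the same size")

    group_means = [fmean(g) for g in groups]
    total_mean = fmean(group_means)  # equals the overall mean when sizes are equal

    ss_w = sum((x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g)
    ss_b = n * sum((gm - total_mean) ** 2 for gm in group_means)

    ts = (ss_b / (m - 1)) / (ss_w / (n * m - m))
    return ss_b, ss_w, ts

ss_b, ss_w, ts = one_way_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(ss_b, ss_w, ts)
```

Here m = 3 and n = 3, so TS is compared with the F distribution on (2, 6) degrees of freedom, whose tabulated 5% critical value is about 5.14.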

Decomposition of Var(Xij) (i)

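[Figure: observations Xij from the A, B and C groups, with the group means XA., XB., XC. and the total mean X... For each observation, the total deviation (Xij - X..) splits into the between deviation (Xi. - X..) and the within deviation (Xij - Xi.).]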

If the differences between the groups are small, most of the deviation from the total mean must be caused by within-group randomness.

Decomposition of Var(Xij) (ii)

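$\sum_{i=1}^{m}\sum_{j=1}^{n}(X_{ij}-X_{\cdot\cdot})^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}X_{ij}^2 - nm\,X_{\cdot\cdot}^2$

Usually, SST is defined as this total sum of squares.

$SS_T = \sum_{i=1}^{m}\sum_{j=1}^{n}(X_{ij}-X_{i\cdot}+X_{i\cdot}-X_{\cdot\cdot})^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}(X_{ij}-X_{i\cdot})^2 + n\sum_{i=1}^{m}(X_{i\cdot}-X_{\cdot\cdot})^2 = SS_w + SS_b$

(The cross term vanishes because $\sum_{j=1}^{n}(X_{ij}-X_{i\cdot}) = 0$ for every i.)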
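The two identities for the total sum of squares, the shortcut form ΣΣXij² - nm·X..² and the split into within and between parts, can be verified on a small balanced data set. This is an illustrative sketch; the function name and the numbers are invented.

```python
from statistics import fmean

def sums_of_squares(groups):
    m, n = len(groups), len(groups[0])
    all_values = [x for g in groups for x in g]
    total_mean = fmean(all_values)

    ss_t = sum((x - total_mean) ** 2 for x in all_values)
    # Shortcut form: sum of squared observations minus nm times the squared total mean.
    ss_t_shortcut = sum(x * x for x in all_values) - n * m * total_mean ** 2

    ss_w = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ss_b = n * sum((fmean(g) - total_mean) ** 2 for g in groups)
    return ss_t, ss_t_shortcut, ss_w, ss_b

ss_t, ss_t_shortcut, ss_w, ss_b = sums_of_squares([[3, 5, 7], [0, 2, 4], [7, 8, 9]])
print(ss_t, ss_t_shortcut, ss_w, ss_b)
```

The group means are 5, 2 and 8 and the total mean is 5, so both forms of the total should agree with the sum of the within and between parts.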

If H0 is not accepted?

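Xi. ~ N(μi, σ²/n). Set μ. = (μ1 + … + μm)/m and Yi = Xi. - μi + μ., so that Yi ~ N(μ., σ²/n).

∵ Xi. = Yi + μi - μ. & X.. = Y.

∴ $E\left[\sum_{i=1}^{m}(X_{i\cdot}-X_{\cdot\cdot})^2\right] = E\left[\sum_{i=1}^{m}(Y_i-Y_\cdot)^2\right] + \sum_{i=1}^{m}(\mu_i-\mu_\cdot)^2 + 2\sum_{i=1}^{m}(\mu_i-\mu_\cdot)E[Y_i-Y_\cdot]$

The cross term vanishes because E[Yi - Y.] = E[Yi] - E[Y.] = μ. - μ. = 0, and $E[\sum_i (Y_i-Y_\cdot)^2] = (m-1)\sigma^2/n$, so

$E\left[\frac{SS_b}{m-1}\right] = \sigma^2 + \frac{n}{m-1}\sum_{i=1}^{m}(\mu_i-\mu_\cdot)^2 \ge \sigma^2$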

ANOVA table

∴ SST = SSb + SSw

If p-value < α, reject H0.

The meaning of ANOVA table

Source    Sum of squares   Degrees of freedom              Mean square        F
Between   SSb              m-1 (# of groups - 1)           MSb=SSb/(m-1)      MSb/MSw
Within    SSw              nm-m=m(n-1)                     MSw=SSw/(nm-m)
Total     SST              nm-1 (# of observations - 1)

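SST = SSb + SSw. For a given SST, the larger SSb is, the smaller SSw is. With the number of groups m fixed, MSb is then larger and MSw smaller, so the F value is larger and H0 (no difference among the group means) is more likely to be rejected. In other words, most of the difference between the observations Xij and the total mean X.. is then due to the differences among the group means Xi..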

Unbalanced case—unequal sample size within the groups

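Different group sizes ni, i = 1, …, m

Conditional estimator of σ² (valid only when H0 is true):

$\frac{SS_b}{m-1}, \quad SS_b = \sum_{i=1}^{m} n_i (X_{i\cdot}-X_{\cdot\cdot})^2$

Unconditional estimator of σ² (valid whether or not H0 is true):

$\frac{SS_w}{\sum_{i} n_i - m}, \quad SS_w = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(X_{ij}-X_{i\cdot})^2$

Unbalanced F-test for ANOVA

$TS = \frac{SS_b/(m-1)}{SS_w/(\sum_i n_i - m)} \sim F_{m-1,\,\sum_i n_i - m}$ when H0 is true; reject H0 when TS is too large.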

A balanced design is preferable to an unbalanced one because the test is then less sensitive to slight departures from the assumption of equal population variances.
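For unequal group sizes the same computation goes through with ni in place of n. The sketch below assumes the usual unbalanced forms SSb = Σ ni (Xi. - X..)² and SSw = ΣΣ (Xij - Xi.)², with m-1 and Σni - m degrees of freedom, and with X.. taken as the mean of all observations. The function name and the data are hypothetical.

```python
from statistics import fmean

def unbalanced_one_way_f(groups):
    m = len(groups)
    total_n = sum(len(g) for g in groups)
    # The total mean is the mean of all observations, not the mean of the group means.
    total_mean = sum(sum(g) for g in groups) / total_n

    ss_w = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ss_b = sum(len(g) * (fmean(g) - total_mean) ** 2 for g in groups)

    ts = (ss_b / (m - 1)) / (ss_w / (total_n - m))
    return ss_b, ss_w, ts

ss_b, ss_w, ts = unbalanced_one_way_f([[1, 3], [4, 5, 6], [6, 8, 7, 5]])
print(ss_b, ss_w, ts)
```

With group sizes 2, 3 and 4 the statistic has (2, 6) degrees of freedom.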

Two classification factors

Row factor \ Column factor   A                            B                            C                            D
a                            Xa,A,1, Xa,A,2, …, Xa,A,k    Xa,B,1, Xa,B,2, …, Xa,B,k    Xa,C,1, Xa,C,2, …, Xa,C,k    Xa,D,1, Xa,D,2, …, Xa,D,k
b                            Xb,A,1, Xb,A,2, …, Xb,A,k    Xb,B,1, Xb,B,2, …, Xb,B,k    Xb,C,1, Xb,C,2, …, Xb,C,k    Xb,D,1, Xb,D,2, …, Xb,D,k
c                            Xc,A,1, Xc,A,2, …, Xc,A,k    Xc,B,1, Xc,B,2, …, Xc,B,k    Xc,C,1, Xc,C,2, …, Xc,C,k    Xc,D,1, Xc,D,2, …, Xc,D,k

Two-way ANOVA approach (i)

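Row factor: m types (i = 1, …, m); column factor: n types (j = 1, …, n)

Only one observation within each cell: Xij

Review (αi = μi - μ, the deviation from the total μ):

$\sum_{i=1}^{m}\alpha_i = \sum_{i=1}^{m}(\mu_i-\mu) = \sum_{i=1}^{m}\mu_i - m\mu = 0 \quad (\because \mu = \frac{1}{m}\sum_{i=1}^{m}\mu_i)$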

Two-way ANOVA approach (ii)

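The cell mean (of a cell of size k, or other): μij = E[Xij]

Suppose an additive model for the cell mean, composed of ai and bj: μij = ai + bj

Average row factor: $a_\cdot = \frac{1}{m}\sum_{i=1}^{m} a_i$

Average column factor: $b_\cdot = \frac{1}{n}\sum_{j=1}^{n} b_j$

The ith row mean: $\mu_{i\cdot} = \frac{1}{n}\sum_{j=1}^{n}\mu_{ij} = a_i + b_\cdot$

The jth column mean: $\mu_{\cdot j} = \frac{1}{m}\sum_{i=1}^{m}\mu_{ij} = a_\cdot + b_j$

The total mean: $\mu_{\cdot\cdot} = a_\cdot + b_\cdot$

The ith row mean = the average column factor + the specific ith row factor

Deviations from the average row factor and the average column factor:

$\alpha_i = a_i - a_\cdot = \mu_{i\cdot}-\mu_{\cdot\cdot}$ & $\beta_j = b_j - b_\cdot = \mu_{\cdot j}-\mu_{\cdot\cdot}$, so that $\sum_i \alpha_i = \sum_j \beta_j = 0$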

Two-way ANOVA approach (iii)

Two-way ANOVA approach (iv)

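The assumed two-way ANOVA model: E[Xij] = μ + αi + βj, with μ = μ..

i.e., the expected value of the specific (i, j) cell can be decomposed into the total mean, plus the ith deviation from the average row factor (the ith row deviation from the total mean), plus the jth deviation from the average column factor (the jth column deviation from the total mean).

Use the unbiased estimators to test the hypotheses of interest, H0: all αi = 0 & H0: all βj = 0:

$\hat{\mu} = X_{\cdot\cdot}, \quad \hat{\alpha}_i = X_{i\cdot}-X_{\cdot\cdot}, \quad \hat{\beta}_j = X_{\cdot j}-X_{\cdot\cdot}$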

Two-way ANOVA approach (v)

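$\sum_{i=1}^{m}\sum_{j=1}^{n}\frac{(X_{ij}-\mu-\alpha_i-\beta_j)^2}{\sigma^2} \sim \chi^2_{nm}$

Apply each unbiased estimator:

$\sum_{i=1}^{m}\sum_{j=1}^{n}\frac{(X_{ij}-\hat{\mu}-\hat{\alpha}_i-\hat{\beta}_j)^2}{\sigma^2} = \sum_{i=1}^{m}\sum_{j=1}^{n}\frac{(X_{ij}-X_{i\cdot}-X_{\cdot j}+X_{\cdot\cdot})^2}{\sigma^2} \sim \chi^2_{(n-1)(m-1)}$

The nm d.f. are reduced by n-1 d.f. (for the β̂j), by m-1 d.f. (for the α̂i), and by 1 d.f. (for μ̂): nm - (n-1) - (m-1) - 1 = (n-1)(m-1).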

Two-way ANOVA approach (vi)

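If H0: all αi = 0 is true, then E[Xi.] = μ, i.e., Xi. ~ N(μ, σ²/n), and

$n\sum_{i=1}^{m}\frac{(X_{i\cdot}-\mu)^2}{\sigma^2} \sim \chi^2_m$

Replacing μ by X..:

$n\sum_{i=1}^{m}\frac{(X_{i\cdot}-X_{\cdot\cdot})^2}{\sigma^2} \sim \chi^2_{m-1}$

Define the row sum of squares: $SS_r = n\sum_{i=1}^{m}(X_{i\cdot}-X_{\cdot\cdot})^2$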

Two way ANOVA table

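Let N = (n-1)(m-1).

Source   Sum of squares                            Degrees of freedom   F
Row      SSr = n Σi (Xi. - X..)²                   m-1                  [SSr/(m-1)] / [SSe/N]
Column   SSc = m Σj (X.j - X..)²                   n-1                  [SSc/(n-1)] / [SSe/N]
Error    SSe = Σi Σj (Xij - Xi. - X.j + X..)²      (n-1)(m-1)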
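A minimal sketch of the two-way computation with one observation per cell, assuming the usual row, column and error sums of squares, SSr = nΣ(Xi. - X..)², SSc = mΣ(X.j - X..)² and SSe = ΣΣ(Xij - Xi. - X.j + X..)², with m rows and n columns. The function name, the dictionary keys, and the data are hypothetical.

```python
from statistics import fmean

def two_way_anova(x):
    # x[i][j] is the single observation in row i, column j.
    m, n = len(x), len(x[0])
    row_means = [fmean(row) for row in x]
    col_means = [fmean([x[i][j] for i in range(m)]) for j in range(n)]
    total_mean = fmean(row_means)

    ss_r = n * sum((r - total_mean) ** 2 for r in row_means)
    ss_c = m * sum((c - total_mean) ** 2 for c in col_means)
    ss_e = sum((x[i][j] - row_means[i] - col_means[j] + total_mean) ** 2
               for i in range(m) for j in range(n))

    ms_e = ss_e / ((n - 1) * (m - 1))
    return {
        "SSr": ss_r, "SSc": ss_c, "SSe": ss_e,
        "F_row": (ss_r / (m - 1)) / ms_e,  # tests H0: no row effect
        "F_col": (ss_c / (n - 1)) / ms_e,  # tests H0: no column effect
    }

table = two_way_anova([[1, 2, 3],
                       [3, 5, 4],
                       [5, 8, 5]])
print(table)
```

Each F value is compared with the F distribution on (m-1, (n-1)(m-1)) or (n-1, (n-1)(m-1)) degrees of freedom respectively.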

Two-way ANOVA with interaction (i)

Two-way ANOVA with interaction (ii)

Two-way ANOVA with interaction (iii)

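With l observations in each cell, Xijk ~ N(μij, σ²), k = 1, …, l:

$\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{l}\frac{(X_{ijk}-\mu_{ij})^2}{\sigma^2} \sim \chi^2_{nml}$

Replacing the cell mean μij by the cell sample mean Xij.:

$\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{l}\frac{(X_{ijk}-X_{ij\cdot})^2}{\sigma^2} \sim \chi^2_{nm(l-1)}$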

Two-way ANOVA with interaction (iv)

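If there is no interaction, μij = μ + αi + βj and Xij. ~ N(μij, σ²/l), so

$l\sum_{i=1}^{m}\sum_{j=1}^{n}\frac{(X_{ij\cdot}-\mu-\alpha_i-\beta_j)^2}{\sigma^2} \sim \chi^2_{nm}$

Applying the unbiased estimators:

$l\sum_{i=1}^{m}\sum_{j=1}^{n}\frac{(X_{ij\cdot}-X_{i\cdot\cdot}-X_{\cdot j\cdot}+X_{\cdot\cdot\cdot})^2}{\sigma^2} \sim \chi^2_{(n-1)(m-1)}$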

Two-way ANOVA with interaction (v)

Two-way ANOVA with interaction (vi)

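If H0: all αi = 0 is true, Xi.. ~ N(μ, σ²/(nl)), so

$nl\sum_{i=1}^{m}\frac{(X_{i\cdot\cdot}-\mu)^2}{\sigma^2} \sim \chi^2_m$

Replacing μ by X...:

$nl\sum_{i=1}^{m}\frac{(X_{i\cdot\cdot}-X_{\cdot\cdot\cdot})^2}{\sigma^2} \sim \chi^2_{m-1}$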
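The interaction case can be sketched in the same way. This assumes l observations per cell, an error sum of squares taken within cells with nm(l-1) degrees of freedom, and an interaction sum of squares l·ΣΣ(Xij. - Xi.. - X.j. + X...)² with (n-1)(m-1) degrees of freedom. The function name, the dictionary keys (including the label SSint), and the data are hypothetical.

```python
from statistics import fmean

def two_way_with_interaction(x):
    # x[i][j] is the list of l observations in row i, column j.
    m, n, l = len(x), len(x[0]), len(x[0][0])
    cell = [[fmean(x[i][j]) for j in range(n)] for i in range(m)]
    row = [fmean(cell[i]) for i in range(m)]
    col = [fmean([cell[i][j] for i in range(m)]) for j in range(n)]
    total = fmean(row)

    # Error: variation of the observations around their own cell mean.
    ss_e = sum((v - cell[i][j]) ** 2
               for i in range(m) for j in range(n) for v in x[i][j])
    # Interaction: what the cell means retain after removing row and column effects.
    ss_int = l * sum((cell[i][j] - row[i] - col[j] + total) ** 2
                     for i in range(m) for j in range(n))
    ss_r = l * n * sum((r - total) ** 2 for r in row)
    ss_c = l * m * sum((c - total) ** 2 for c in col)

    ms_e = ss_e / (n * m * (l - 1))
    return {
        "SSe": ss_e, "SSint": ss_int, "SSr": ss_r, "SSc": ss_c,
        "F_int": (ss_int / ((n - 1) * (m - 1))) / ms_e,
        "F_row": (ss_r / (m - 1)) / ms_e,
        "F_col": (ss_c / (n - 1)) / ms_e,
    }

data = [[[1, 3], [4, 6]],
        [[5, 7], [2, 4]]]
result = two_way_with_interaction(data)
print(result)
```

In this made-up 2 × 2 layout the cell means cross over (2 and 5 in the first row, 6 and 3 in the second), so nearly all of the structure shows up in the interaction term rather than in the row or column terms.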

Homework #3 Problem 5, 15, 19, 20, 25
