STAT22200 Spring 2014 Chapter 6
Yibi Huang
May 1 and May 6, 2014
Chapter 6 Model Assumption Checking and Remedies
Chapter 6 - 1
THREE ASSUMPTIONS We Need to Check
In the means model yij = µi + εij or the effects model yij = µ + τi + εij, we make three assumptions about the error terms εij:
1. the errors εij are independent, randomly distributed
2. the errors εij have constant variance across treatments
3. the errors εij follow a normal distribution
As all three assumptions concern the errors εij, most of the model diagnostic methods are based on the residuals

eij = yij − ŷij = yij − ȳi•.
Chapter 6 - 2
Standardized and Studentized Residuals
Because εij has SD σ, we sometimes standardize the residuals:

standardized residual: dij = eij/√MSE.

By the normality assumption, dij is approximately N(0, 1). Observations with |dij| > 3 are potential outliers.
If the design is not balanced, a more accurate standardization is the

studentized residual: sij = eij/√(MSE(1 − 1/ni)).

We divide by √(MSE(1 − 1/ni)) because eij = yij − ȳi• has SD σ√(1 − 1/ni). Again, if the errors are normal, sij is approximately N(0, 1). Observations with |sij| > 3 are potential outliers.
Chapter 6 - 3
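The formulas on the slide above are straightforward to check by hand. The slides do all computation in R; purely as an illustration, here is a self-contained Python sketch (the function name and layout are mine, not from the slides):

```python
import math

def one_way_residuals(groups):
    """Standardized and studentized residuals for a one-way (CRD) layout.

    groups: list of lists of responses, one inner list per treatment.
    Returns (standardized, studentized) as flat lists in group order.
    """
    n_total = sum(len(g) for g in groups)
    n_groups = len(groups)
    means = [sum(g) / len(g) for g in groups]
    # residuals e_ij = y_ij - ybar_i
    resid = [[y - m for y in g] for g, m in zip(groups, means)]
    sse = sum(e * e for g in resid for e in g)
    mse = sse / (n_total - n_groups)  # error df = N - g
    standardized = [e / math.sqrt(mse) for g in resid for e in g]
    studentized = [e / math.sqrt(mse * (1 - 1 / len(g)))
                   for g in resid for e in g]
    return standardized, studentized
```

Note that in a balanced design all ni are equal, so the studentized residuals are just the standardized residuals rescaled by a common constant.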
Studentized Residuals In a Regression Model (May Skip)
In general, in a regression model Y = Xβ + ε, the residual is
e = Y − Xβ̂ = Y − X(XᵀX)⁻¹XᵀY = (I − H)Y,

in which H = X(XᵀX)⁻¹Xᵀ is called the hat matrix. One can show that the variance-covariance matrix of the residuals is

Var(e) = σ²(I − H).

Thus Var(ei) = σ²(1 − hii), in which hii is the ith diagonal element of the hat matrix H, called the leverage. The studentized residuals are thus defined as

si = ei/√(MSE(1 − hii)).
In a complete randomized design, hii = 1/ni .
Chapter 6 - 4
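The claim that hii = 1/ni in a completely randomized design can be verified numerically from the definition H = X(XᵀX)⁻¹Xᵀ. A stdlib-only Python sketch using exact rational arithmetic (all names are mine; an illustration, not code from the slides):

```python
from fractions import Fraction

def hat_diagonal(X):
    """Diagonal of H = X (X^T X)^{-1} X^T for a small integer design matrix,
    via Gauss-Jordan elimination in exact rational arithmetic."""
    n, p = len(X), len(X[0])
    # A = X^T X, B = X^T; solve A W = X^T so that B becomes W = (X^T X)^{-1} X^T
    A = [[sum(Fraction(X[k][i]) * X[k][j] for k in range(n)) for j in range(p)]
         for i in range(p)]
    B = [[Fraction(X[k][i]) for k in range(n)] for i in range(p)]
    for col in range(p):
        piv = next(r for r in range(col, p) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        B[col], B[piv] = B[piv], B[col]
        d = A[col][col]
        A[col] = [a / d for a in A[col]]
        B[col] = [b / d for b in B[col]]
        for r in range(p):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
                B[r] = [a - f * b for a, b in zip(B[r], B[col])]
    # h_kk = x_k^T (X^T X)^{-1} X^T at position (k, k) = x_k . W[:, k]
    return [sum(Fraction(X[k][i]) * B[i][k] for i in range(p)) for k in range(n)]

# CRD with two treatments of sizes 2 and 3, cell-means (indicator) coding:
X = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
h = hat_diagonal(X)  # expect [1/2, 1/2, 1/3, 1/3, 1/3]
```

The diagonal entries come out as 1/ni within each group, and their sum equals p = trace(H), as theory predicts.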
Example: Resin Glue Failure Time
In previous lectures, the response of the resin glue experiment was log10(lifetime). In the original data, the response is simply the lifetime of the glue in hours.
[Figure: scatter plot of Life (hours) vs. Temperature (Celsius) at 175, 194, 213, 231, and 250°C.]
Temperature | Lifetime in hours
175°C | 110.0 82.2 99.1 82.9 71.3 91.7 76.0 79.2
194°C | 45.8 51.3 26.5 58.0 45.3 40.8 35.8 45.6
213°C | 33.8 34.8 24.2 20.5 22.5 18.8 18.2 24.2
231°C | 14.2 16.7 14.8 14.6 16.2 18.9 14.8
250°C | 18.0 6.7 12.0 10.5 12.2 11.4
Chapter 6 - 5
Residuals
> mydata = read.table("resinlife.txt", header=T)
> attach(mydata)
> aov1 = aov(life ~ as.factor(temp))
> aov1$res # get residuals, output omitted
> round(aov1$res,2) # round residuals to 2 decimals
1 2 3 4 5 6 7 8
23.45 -4.35 12.55 -3.65 -15.25 5.15 -10.55 -7.35
9 10 11 12 13 14 15 16
2.16 7.66 -17.14 14.36 1.66 -2.84 -7.84 1.96
17 18 19 20 21 22 23 24
9.17 10.17 -0.42 -4.12 -2.12 -5.82 -6.43 -0.42
25 26 27 28 29 30 31 32
-1.54 0.96 -0.94 -1.14 0.46 3.16 -0.94 6.20
33 34 35 36 37
-5.10 0.20 -1.30 0.40 -0.40
We see the first observation has the largest residual, 23.45. Is it an outlier?
Chapter 6 - 6
Studentized Residuals
In R, studres() is the command to get the studentized residuals.
> library(MASS) # must load library "MASS" to use studres()
> ?studres # see help file of studres()
> studres(aov1) # studentized residual
1 2 3 4 5
3.55341908 -0.55843235 1.67395045 -0.46787374 -2.07936247
(... some output is omitted ...)
36 37
0.05235783 -0.05235783
> round(studres(aov1),2) # round studentized residuals to 2 decimals
1 2 3 4 5 6 7 8 9 10
3.55 -0.56 1.67 -0.47 -2.08 0.66 -1.39 -0.95 0.28 0.99
11 12 13 14 15 16 17 18 19 20
-2.38 1.94 0.21 -0.36 -1.02 0.25 1.20 1.34 -0.05 -0.53
21 22 23 24 25 26 27 28 29 30
-0.27 -0.75 -0.83 -0.05 -0.20 0.12 -0.12 -0.15 0.06 0.41
31 32 33 34 35 36 37
-0.12 0.82 -0.67 0.03 -0.17 0.05 -0.05
The first observation with sij = 3.55 > 3 is a potential outlier.
Chapter 6 - 7
3.4.1 Methods for Checking the Normality Assumption
I Histogram of the residuals: if normal, it should be bell-shaped
  I Pros: simple, intuitive
  I Cons: for a small sample, the histogram may not be bell-shaped even though the sample is from a normal distribution
I Normal probability plot of the residuals
  I a.k.a. normal QQ plot, in which QQ stands for "quantile-quantile"
  I the best method to assess normality
  I See next slide for details
Chapter 6 - 8
How to Make a Normal Probability Plot?
1. Given data (y1, y2, . . . , yn), standardize: xi = (yi − ȳ)/s.

2. Arrange the data in increasing order: x(1), x(2), . . . , x(n).

3. Find quantiles of the N(0, 1) distribution: z(1/(n+1)), z(2/(n+1)), . . . , z(n/(n+1)). That is, z(i/(n+1)) is the value such that P(Z ≤ z(i/(n+1))) = i/(n+1) for Z ∼ N(0, 1).

4. Plot the x(i) values against the z(i/(n+1)) values. That is, plot the points (z(i/(n+1)), x(i)) for i = 1, 2, . . . , n.
Chapter 6 - 9
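The four steps above translate directly into code. As an illustration in Python (statistics.NormalDist supplies the N(0, 1) quantiles; the function name is mine):

```python
from statistics import NormalDist, mean, stdev

def qq_points(y):
    """Return the points (z_(i/(n+1)), x_(i)) of a normal probability plot,
    following the four steps above."""
    n = len(y)
    ybar, s = mean(y), stdev(y)
    x = sorted((yi - ybar) / s for yi in y)                           # steps 1-2
    z = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]  # step 3
    return list(zip(z, x))                                            # step 4
```

For an odd sample size the middle point lands at the median quantile z(0.5) = 0, and the theoretical quantiles are symmetric about 0.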
Interpreting Normal Probability Plots
I If the data are approximately normal, the plot will be close to a straight line.

I Systematic deviations from a straight line indicate a non-normal distribution.

I Outliers appear as points that are far away from the overall pattern of the plot.
I R command: qqnorm()
Chapter 6 - 10
[Figure: four normal Q-Q plots (sample vs. theoretical quantiles) illustrating typical shapes: Normal, Skewed, Heavy tailed, and Light tailed.]
Chapter 6 - 11
QQ plot for the Resin Glue Residuals
> qqnorm(lmorig$res)
> qqline(lmorig$res)
> hist(lmorig$res,xlab="Residuals",main="Histogram of Residuals")
[Figure: left, normal Q-Q plot of the residuals; right, histogram of the residuals.]
Do the residuals look normal?
Chapter 6 - 12
Example: Hodgkin’s Disease
Hodgkin’s disease is a type of lymphoma, a cancer originating from white blood cells called lymphocytes. The data set Hodgkins.txt contains plasma bradykininogen levels (in micrograms of bradykininogen per milliliter of plasma) in 3 types of subjects:
I normal,
I in patients with active Hodgkin’s disease, and
I in patients with inactive Hodgkin’s disease.
The globulin bradykininogen is the precursor substance for bradykinin, which is thought to be a chemical mediator of inflammation.
I Is this an experiment?
I We can use ANOVA to compare means of several samples in an observational study.
Chapter 6 - 13
> mydata = read.table("Hodgkins.txt", header=T)
> attach(mydata)
> plot(Hodgkins,BradyLevel)
> plot(as.integer(Hodgkins), BradyLevel, pch=20, cex=0.75)
[Figure: strip plots of BradyLevel (2 to 14) for the Active, Inactive, and Normal groups.]
The distribution within each group looks right-skewed. Let’s fit the ANOVA model anyway and take a look at the residuals.
> brady1 = lm(BradyLevel ~ Hodgkins)
> anova(brady1)
Df Sum Sq Mean Sq F value Pr(>F)
Hodgkins 2 65.89 32.95 10.67 0.000104 ***
Residuals 62 191.45 3.09
Chapter 6 - 14
Normal QQ Plot for The Hodgkin Data
> qqnorm(brady1$res, ylab="Residuals")
> qqline(brady1$res)
> library(MASS)
> qqnorm(studres(brady1), ylab="Studentized Residuals")
> qqline(studres(brady1))
[Figure: left, normal Q-Q plot of the residuals; right, normal Q-Q plot of the studentized residuals.]
Does the distribution of the residuals look normal? No, it is right-skewed and has an outlier with sij > 5.
Chapter 6 - 15
Remedies for Non-Normality
Asymmetry can often be ameliorated by transforming the response, often with a power transformation:

fλ(y) = y^λ if λ ≠ 0, and log(y) if λ = 0.

1. If right-skewed, try taking the square root, the logarithm, or other powers < 1: y → log(y), √y, or y^λ with λ < 1

2. If left-skewed, try squaring, cubing, or other powers > 1: y → y², y³, or y^λ with λ > 1
Chapter 6 - 16
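As a quick numerical check of how a power transformation reduces right skew, here is a hedged Python sketch (the toy sample and the moment-based skewness helper are my own, not from the slides):

```python
import math

def power_transform(y, lam):
    """The power family f_lambda: y**lam for lam != 0, log(y) for lam == 0."""
    return math.log(y) if lam == 0 else y ** lam

def skewness(xs):
    """Moment-based sample skewness (population-moment version)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

y = [1, 1, 2, 2, 3, 10]                     # a right-skewed toy sample
log_y = [power_transform(v, 0) for v in y]  # log transform (lambda = 0)
```

The log pulls the long right tail in, so the skewness of log_y is smaller than that of y.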
Example: Hodgkin’s Disease – QQ Plots
[Figure: normal Q-Q plot of the studentized residuals on the original scale.]

After a log transformation, the residuals are less skewed, and the outlier looks less extreme.
The square-root transformation also improves normality, but not as much as the log transformation.
[Figure: normal Q-Q plot of the studentized residuals after the square-root transform.]
[Figure: normal Q-Q plot of the studentized residuals after the log transform.]
Chapter 6 - 17
R-Codes for Making the Plots on the Previous Slide
> brady1 = lm(BradyLevel ~ Hodgkins)
> brady2 = lm(sqrt(BradyLevel) ~ Hodgkins)
> brady3 = lm(log(BradyLevel) ~ Hodgkins)
> library(MASS)
> qqnorm(studres(brady1), main="Original Scale")
> qqline(studres(brady1))
> qqnorm(studres(brady2), main="Square-root Transform")
> qqline(studres(brady2))
> qqnorm(studres(brady3), main="Log Transform")
> qqline(studres(brady3))
Chapter 6 - 18
Box-Cox Method
The Box-Cox method is an automatic procedure for selecting the “best” power λ that makes the residuals of the model

yij^λ = µi + εij

closest to normal.
I We usually round the optimal λ to a convenient power like −1, −1/2, −1/3, 0, 1/3, 1/2, 1, 2, . . . , since the practical difference between y^0.5827 and y^0.5 is usually small, but the square-root transformation is much easier to interpret.
I A confidence interval for the optimal λ can also be obtained; see Oehlert, p. 129 for details. We usually select a convenient power λ* in this C.I.
Chapter 6 - 19
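The Box-Cox profile log-likelihood for the one-way model can be written from the textbook formula ℓ(λ) = −(n/2) log(SSEλ/n) + (λ − 1) ∑ log yij, where SSEλ is the residual sum of squares after applying the scaled power (y^λ − 1)/λ. A grid-search sketch in Python (an illustration of the idea, not the MASS::boxcox implementation; all names are mine):

```python
import math

def boxcox_loglik(groups, lam):
    """Box-Cox profile log-likelihood for a one-way means model
    (textbook formula; a sketch, not the MASS::boxcox implementation)."""
    ys = [y for g in groups for y in g]
    n = len(ys)

    def f(y):
        return math.log(y) if lam == 0 else (y ** lam - 1) / lam

    sse = 0.0
    for g in groups:
        t = [f(y) for y in g]
        m = sum(t) / len(t)
        sse += sum((v - m) ** 2 for v in t)
    return -n / 2 * math.log(sse / n) + (lam - 1) * sum(math.log(y) for y in ys)

def boxcox_best(groups, grid=None):
    """Grid search for the lambda maximizing the profile log-likelihood."""
    grid = grid if grid is not None else [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: boxcox_loglik(groups, lam))

# Toy data built from exponentials of simple numbers, so the log scale is tidy.
groups = [[math.exp(v) for v in (1, 2, 3)], [math.exp(v) for v in (4, 5, 6)]]
```

For this toy data, ℓ(0) exceeds ℓ(1), i.e., the log transform is preferred to leaving the response untransformed.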
Example: Hodgkin’s Disease – Box-Cox
In R, one must first load the library MASS to use boxcox(). The argument of the boxcox() function can be a model formula, an lm model, or an aov model.
> library(MASS)
> boxcox(brady1)
> boxcox(BradyLevel ~ Hodgkins)

[Figure: Box-Cox profile log-likelihood plotted against λ, with the 95% C.I. marked.]
The middle dashed line marks the optimal λ; the left and right dashed lines mark the 95% C.I. for the optimal λ.
From the plot, we see the optimal λ is around −0.2, and the 95% C.I. contains 0. For simplicity, we use the log-transformed BradyLevel as our response.
Chapter 6 - 20
A Remark On the Log-Scale
In fact, for concentrations, the log scale is more commonly used than the original scale. For example,
I Concentrations of 10.1 and 10.001 are almost the same, but concentrations of 0.1 and 0.001 are very different, since 0.1 is 100 times higher than 0.001.

I In the original scale, (10.1, 10.001) and (0.1, 0.001) differ by the same amount, 0.099.
I In the log scale,

log10 0.1 − log10 0.001 = 2, which is much greater than log10 10.1 − log10 10.001 ≈ 0.0043.
Chapter 6 - 21
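The two differences quoted above are easy to verify numerically:

```python
import math

# The two log10 differences from the slide, checked numerically.
low = math.log10(0.1) - math.log10(0.001)     # exactly 2
high = math.log10(10.1) - math.log10(10.001)  # about 0.0043
```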
Example: Hodgkin’s Disease
From the two ANOVA tables below, we see that a log transformation makes the difference between the 3 groups of patients more significant, because the outlier becomes less extreme and does not inflate the MSE as much.
Response: BradyLevel
Df Sum Sq Mean Sq F value Pr(>F)
Hodgkins 2 65.893 32.946 10.67 0.0001042 ***
Residuals 62 191.449 3.088
Response: log(BradyLevel)
Df Sum Sq Mean Sq F value Pr(>F)
Hodgkins 2 2.2526 1.12631 15.436 3.628e-06 ***
Residuals 62 4.5238 0.07297
Chapter 6 - 22
Non-Parametric Tests
I Transformation does not always work. For example, it helps little for symmetric but heavy-tailed (many outliers) distributions.

I The ANOVA F-test is robust to non-normality, but it is not resistant to outliers.

I If outliers are unavoidable and cannot be removed, try non-parametric tests (like the permutation test or the Kruskal-Wallis test, Section 3.11.1) that don’t rely on the normality assumption.
I There might be a lecture specifically on non-parametric testsif we have time.
Chapter 6 - 23
Constant Variance Assumption
I Why Is Non-Constant Variance a Problem?I Tools for checking constant variance assumption — Residual
PlotsI Residuals v.s. Fitted ValuesI Residuals v.s. TreatmentsI Residuals v.s. Other Variables
I RemediesI Transforming the Response,
Variance-Stabilizing TransformationI Brown-Forsythe Modified F -test — an alternative to ANOVA
F -testI Welch Test for Contrasts w/o Constant Variance Assumption
Chapter 6 - 24
Why Is Non-Constant Variance a Problem?
Example: Serial Dilution Plating (Exercise 6.2 on p. 143), a method to count the number of bacteria in solution.
Chapter 6 - 25
Example — Serial Dilution Plating (Cont’d)
I Goal: to compare 3 pasteurization methods for milk
I Design: 15 samples of milk randomly assigned to the 3 trts
I Response: the bacterial load in each sample after treatment, determined via serial dilution plating
I Data: http://users.stat.umn.edu/~gary/book/fcdae.data/ex6.2
Method 1 | Method 2 | Method 3
26×10² | 35×10³ | 29×10⁵
29×10² | 23×10³ | 23×10⁵
20×10² | 20×10³ | 17×10⁵
22×10² | 30×10³ | 29×10⁵
32×10² | 27×10³ | 20×10⁵
Mean: 25.8×10² | 27×10³ | 23.6×10⁵
SD: 492 | 5874 | 536656

√MSE = 309900
Chapter 6 - 26
Example — Serial Dilution Plating (Cont’d)
The 95% C.I. for the mean of Method 1:

ȳ1• ± t0.025,15−3 × √MSE/√n1 = 2580 ± 2.179 × 309900/√5 = 2580 ± 301965 = (−299385, 304545),

which is way wider than the range of the 5 observations for Method 1 (2000 to 3200). What happened?
Chapter 6 - 27
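The interval above can be reproduced directly from the summary numbers. A sketch in Python (the t quantile 2.179 is copied from the slide, since the Python standard library has no t-distribution quantile function):

```python
import math

# Summary numbers from the slide: Method 1 mean, pooled sqrt(MSE), n1, and
# t_{0.025,12} = 2.179 (hardcoded from a t-table).
ybar1, root_mse, n1, t = 2580, 309900, 5, 2.179
margin = t * root_mse / math.sqrt(n1)
ci = (ybar1 - margin, ybar1 + margin)
# The interval is absurdly wide (its lower end is negative) because the
# pooled sqrt(MSE) is dominated by Method 3's huge variance.
```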
Checking Constant Variance Assumption (1)
Tests for equality of variances are available, but they are not useful, because

1. the tests do not tell us how much non-constant variance is present or how much it affects our inferences;

2. classical tests of constant variance are very sensitive to non-normality in the errors. (It is very difficult to distinguish non-normality from normality with non-constant variance.)
Chapter 6 - 28
Checking Constant Variance Assumption (2)
A better tool for checking constant variance: residual plots

I Residuals vs. Fitted Values

I Residuals vs. Treatments

I Residuals vs. Other Variables

If the constant variance assumption is true, the residuals will spread evenly around the zero line.

Rule of thumb: ANOVA F-tests for a CRD can tolerate non-constant variance to some extent, and so can tests for contrasts. It is usually fine as long as

max{σi}/min{σi} ≤ 2 or even 3,

especially when the group sizes ni are (roughly) equal.
Chapter 6 - 29
Example: Resin Glue Data
> plot(lmorig$fit,lmorig$res,ylab="Residuals",xlab="Fitted Values")
> abline(h=0) # adding a zero line
> plot(temp,lmorig$res,ylab="Residuals",xlab="Centigrade Temperature")
> abline(h=0) # adding a zero line
[Figure: left, Residuals vs. Fitted Values; right, Residuals vs. Treatments (Centigrade Temperature).]
I Why do the points form vertical strips?

I The spread of the residuals increases with the fitted values.
I The lower the temperature, the more variable the response.
Chapter 6 - 30
More on Residual Plots
Residual plots can be used to check many other things, like non-linearity.

In the resin glue example, if one fits the linear model yij = β0 + β1T + εij, one can check whether lifetime is indeed linear in temperature using a residual plot.
> lmorig1 = lm(life ~ temp)
> plot(temp,lmorig1$res,ylab="Residuals",xlab="Centigrade Temperature")
> abline(h=0)
[Figure: residuals of the linear model vs. Centigrade Temperature (175, 194, 213, 231, 250).]
From the residual plot, we see

I non-constant variance across temperatures

I lifetime is curved, not linear, in temperature
Chapter 6 - 31
Plot of Residuals v.s. Other Variables
If there are other variables that might possibly affect the response but are not included in the model, then one should check residual plots against these variables. For example,

I if experimental units come from different batches, then the residuals should be plotted vs. batches

I if measurements are made by several operators, then the residuals should be plotted vs. operators
Patterns in such residual plots suggest these variables should be

I either included in the analysis (but note one CANNOT claim these variables have a causal effect on the response, since they are not controlled in advance)

I or controlled more carefully (e.g., by a block design) in future experiments
Chapter 6 - 32
Remedy 1: Variance-Stabilizing Transformation
If the SD σ (the spread of the residuals) depends on the mean µ (the fitted value), you can try a variance-stabilizing transformation of the response to make the variance (close to) constant.

I If the SD ∝ the fitted value, then transform the response y to log(y)
  I e.g., the serial dilution plating example
I If the SD ∝ √(the fitted value), i.e., the variance ∝ the fitted value, then y → √y
I In general, if σ ∝ µ^α, then the variance-stabilizing transformation is

y → y^(1−α) for α ≠ 1, and log(y) for α = 1
Chapter 6 - 33
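To see the general rule in action, a small simulation can help: for Poisson-type counts the variance equals the mean (α = 1/2), so the square root should roughly equalize the SDs. A hedged Python sketch with made-up treatment means (all names are mine):

```python
import math
import random
import statistics

random.seed(0)  # reproducible

def poisson(lam):
    """Poisson sampler via Knuth's product-of-uniforms method (stdlib-only)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# Three hypothetical "treatments" with means 5, 20, 80.  For Poisson counts,
# variance = mean, so the raw SDs grow with the fitted value.
groups = [[poisson(mu) for _ in range(500)] for mu in (5, 20, 80)]
sd_raw = [statistics.stdev(g) for g in groups]
sd_sqrt = [statistics.stdev([x ** 0.5 for x in g]) for g in groups]
ratio_raw = max(sd_raw) / min(sd_raw)     # roughly sqrt(80/5) = 4
ratio_sqrt = max(sd_sqrt) / min(sd_sqrt)  # much closer to 1
```

After the square root, the max/min SD ratio drops within the "2 or 3" rule of thumb from the previous slide.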
Example: Resin Glue
> lm1 = lm(life ~ as.factor(temp))
> lm2 = lm(sqrt(life) ~ as.factor(temp))
> lm3 = lm(log(life) ~ as.factor(temp))
> plot(lm1$fit, lm1$res, xlab="Fitted Value", ylab="Residuals")
> abline(h=0,lty=2)
> plot(lm2$fit, lm2$res, xlab="Fitted Value", ylab="Residuals")
> abline(h=0,lty=2)
> plot(lm3$fit, lm3$res, xlab="Fitted Value", ylab="Residuals")
> abline(h=0,lty=2)
[Figure: residuals vs. fitted values in three panels: Original Scale, Square-root Transform, and Log Transform.]
Chapter 6 - 34
Box-Cox Again
In many cases, we don’t have a good idea of the value of α. We can still try a power transformation of the response:

fλ(y) = y^λ if λ ≠ 0, and log(y) if λ = 0.
How to select λ?
I Trial and error: try convenient powers like −1, −1/2, −1/3, 0, 1/3, 1/2, 2, . . . , and then check the residual plot of each for constant variance.

I Box-Cox method: though Box-Cox was developed to select a power transformation making the residuals as normal as possible, it has been shown that the optimal λ is often close to the variance-stabilizing λ.
Chapter 6 - 35
Example – Count of Bacteria Revisited
> library(MASS)
> boxcox(count ~ as.factor(method))
> plot(method, log10(count),
xlab="Methods",
ylab="log10(Count of Bacteria)")
[Figure: left, Box-Cox profile log-likelihood with its 95% C.I.; right, log10(Count of Bacteria) vs. Methods.]
After log transformation, the 3 variances look even.
Chapter 6 - 36
Example: Resin Glue – Box-Cox
> library(MASS)
> boxcox(life ~ as.factor(temp))
[Figure: Box-Cox profile log-likelihood plotted against λ, with the 95% C.I. marked.]
The 95% C.I. for λ contains both 0 and 1/2. As λ = 1/2 is very close to the boundary of the C.I., λ = 0 seems to be the better choice, which is also consistent with the Arrhenius Law.
Chapter 6 - 37
Drawbacks of Transformation
I Except for a few special transformations (log, square root, reciprocal), the transformed response usually lacks a natural interpretation (what does y^0.1 mean?)

I Unless the transformed response has a good interpretation, think again before transforming
Remember that ANOVA tests have some tolerance for non-constant variance. If

max{σi}/min{σi} ≤ 2 or 3,

don’t worry too much about non-constant variance. In that case, it is fine to leave the response untransformed.
Chapter 6 - 38
> tapply(life, temp, sd)
175 194 213 231 250
12.895514 9.557785 6.378703 1.661181 3.646917
> tapply(sqrt(life), temp, sd)
175 194 213 231 250
0.6802037 0.7505140 0.6223047 0.2048698 0.5305554
> tapply(log(life), temp, sd)
175 194 213 231 250
0.1440872 0.2393055 0.2451210 0.1012540 0.3177677
After the log transformation, the ratio of the largest to the smallest SD is 2.2, which is acceptable.
Chapter 6 - 39
Brown-Forsythe Modified F-test
If transformation doesn’t work well, or you don’t want to transform because the transformed response lacks a good interpretation, try tests that don’t rely on the constant variance assumption:

I Brown-Forsythe modified F-test, an alternative to the ANOVA F-test:

BF = ∑ᵢ₌₁ᵍ ni(ȳi• − ȳ••)² / ∑ᵢ₌₁ᵍ si²(1 − ni/N) = SSTrt / ∑ᵢ₌₁ᵍ si²(1 − ni/N),

in which si² is the sample variance in treatment i. Under the null hypothesis of equal treatment means, BF is approximately distributed as an F-distribution with g − 1 and ν degrees of freedom, where

ν = (∑ᵢ₌₁ᵍ di)² / ∑ᵢ₌₁ᵍ di²/(ni − 1),

in which di = si²(1 − ni/N).
Chapter 6 - 40
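The BF statistic and its approximate degrees of freedom follow mechanically from the formulas on the slide above. A Python sketch (an illustration, not R code; the function name is mine):

```python
def brown_forsythe(groups):
    """Brown-Forsythe modified F statistic and its approximate df,
    following the BF and nu formulas from the slide (a sketch)."""
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    ss_trt = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    s2 = [sum((y - m) ** 2 for y in g) / (len(g) - 1)   # sample variances
          for g, m in zip(groups, means)]
    d = [v * (1 - len(g) / N) for v, g in zip(s2, groups)]
    bf = ss_trt / sum(d)
    nu = sum(d) ** 2 / sum(di ** 2 / (len(g) - 1) for di, g in zip(d, groups))
    return bf, nu
```

Applied to the eight gravity series later in this chapter, the same formulas should reproduce the BF ≈ 3.17 and ν ≈ 29.5 computed there.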
Welch Test for Contrasts w/o Constant Variance Assumption

If transformation doesn’t work well, try tests that don’t rely on the constant variance assumption:

I Welch test for a contrast ∑ᵢ₌₁ᵍ ciµi: The test statistic is

t = ∑ᵢ₌₁ᵍ ci ȳi• / √(∑ᵢ₌₁ᵍ ci²si²/ni),

which is approximately t-distributed with ν degrees of freedom, where

ν = (∑ᵢ₌₁ᵍ ci²si²/ni)² / ∑ᵢ₌₁ᵍ [1/(ni − 1)] ci⁴si⁴/ni².

Observe that this is a generalization of the two-sample t-test without the equal variance assumption.
Chapter 6 - 41
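The same translation works for the Welch contrast test. A Python sketch of the two formulas on the slide above (names are mine); for c = (1, −1) and two groups it reduces to the familiar Welch two-sample t statistic:

```python
import math

def welch_contrast(groups, c):
    """Welch t statistic and approximate df for the contrast sum_i c_i mu_i,
    following the formulas on the slide above (a sketch)."""
    n = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    s2 = [sum((y - m) ** 2 for y in g) / (len(g) - 1)   # sample variances
          for g, m in zip(groups, means)]
    var = sum(ci ** 2 * v / ni for ci, v, ni in zip(c, s2, n))
    t = sum(ci * m for ci, m in zip(c, means)) / math.sqrt(var)
    nu = var ** 2 / sum((ci ** 4 * v ** 2 / ni ** 2) / (ni - 1)
                        for ci, v, ni in zip(c, s2, n))
    return t, nu
```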
Checking for Dependent Errors and Remedies
Chapter 6 - 42
Checking for Dependent Errors
I Among the 3 assumptions, violation of the independence assumption causes the severest problems. Most of our analyses (ANOVA, tests of contrasts, multiple comparisons, etc.) have little tolerance for dependence among the errors.

I There are various forms of dependence; serial dependence and spatial dependence are two common ones.

I A time plot of the residuals (vs. the order in which we measure them) is a good way to check for serial dependence.
  I It’s better to keep track of the order in which units are measured.
  I A smooth time plot is a sign of positive serial dependence, since it means successive residuals are too close together.
Chapter 6 - 43
Autocorrelation and Autocorrelation Plots
Autocorrelation plots are also a common way to check for serial dependence.

I Lag-1 autocorrelation plot: plot (X1, . . . , Xn−1) vs. (X2, . . . , Xn)

I Lag-k autocorrelation plot: plot (X1, . . . , Xn−k) vs. (X1+k, . . . , Xn)

I Any trend in the autocorrelation plot is a sign of serial dependence.

A numerical way to check for serial dependence is to compute the lag-k autocorrelation coefficient, which is the correlation coefficient of (X1, . . . , Xn−k) and (X1+k, . . . , Xn), for k = 1, 2, 3, . . .
Chapter 6 - 44
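The lag-k autocorrelation coefficient described above is just a Pearson correlation between the series and its shifted copy. A minimal Python sketch (the function name is mine):

```python
import math

def lag_autocorrelation(x, k):
    """Lag-k autocorrelation: the Pearson correlation of (x_1,...,x_{n-k})
    with (x_{1+k},...,x_n)."""
    a, b = x[:-k], x[k:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) *
                    sum((v - mb) ** 2 for v in b))
    return num / den
```

A steadily increasing series has lag-1 autocorrelation 1, and a perfectly alternating series has lag-1 autocorrelation −1.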
Spatial Dependence
Spatial dependence can arise when the experimental units are arranged in space, like plants on a farm. Spatial dependence occurs when units that are closer together are more similar than units farther apart.
Chapter 6 - 45
Remedies for Dependence
1. There isn’t much we can do about dependence with our current machinery, since no simple transformation can remove dependence.

2. Analysis of dependent data requires tools like time series or spatial statistics, which are beyond the scope of this class.
Chapter 6 - 46
Example: Standard Gravity
The National Bureau of Standards performed eight series of experiments between 1924 and 1935 to determine g, the standard gravity.

The data are given in the table below (in deviations from 9.8 m/s², × 10⁵, so that, for example, the first measurement of g is 9.80076 m/s²), with series 1 representing the earliest set of experiments and series 8 the last.
Series | Measurements
1 | 76 82 83 54 35 46 87 68
2 | 87 95 98 100 109 109 100 81 75 68 67
3 | 105 83 76 75 51 76 93 75 62
4 | 95 90 76 76 87 79 77 71
5 | 76 76 78 79 72 68 75 78
6 | 78 78 78 86 87 81 73 67 75 82 83
7 | 82 79 81 79 77 79 79 78 79 82 76 73 64
8 | 84 86 85 82 77 76 77 80 83 81 78 78 78
Chapter 6 - 47
Example: Standard Gravity
The National Bureau of Standards¹ is the government agency that measures things. The following statement is taken from the NIST website:
Founded in 1901, NIST is a non-regulatory federal agency within the U.S. Department of Commerce. NIST’s mission is to promote U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology in ways that enhance economic security and improve our quality of life.
Thus, it is safe to assume that the NBS scientists were trying hard to measure the same quantity g (e.g., all experiments were done in the same location) throughout all eight series of experiments.
¹ now called the National Institute of Standards and Technology
Chapter 6 - 48
> gee = read.table("g.txt", header=T)
> attach(gee)
> plot(series,g)
The variance decreases with series, which makes sense since the accuracy of measurement improves as time goes by.
[Figure: strip plot of Standard Gravity g (m/s²) vs. series 1-8.]
What happens if we ignore this fact and do an ANOVA F-test?
> lmg = lm(g ~ as.factor(series), data=gee)
> anova(lmg)
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(series) 7 2819 402.7 3.568 0.00236 **
Residuals 73 8239 112.9
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
ANOVA rejects the null hypothesis that the 8 series have the same mean.
Chapter 6 - 49
Question
What is the implication if the null hypothesis is rejected? Will you conclude that
(a) g had changed over time, or
(b) the ANOVA F-test failed?
If your choice is (a), how do you explain the change in g? If your choice is (b), why does the ANOVA F-test fail?
The ANOVA F-test here is not reliable because at least one of its assumptions is not met: constant variance.
> round(tapply(g,series,sd),2)
1 2 3 4 5 6 7 8
19.25 15.29 15.76 8.30 3.65 5.84 4.74 3.36
Will a transformation work here? Try Box-Cox?
Chapter 6 - 50
Brown-Forsythe Modified F-test
Since the variances are not constant, let’s try the Brown-Forsythe modified F-test:

BF = SSTrt / ∑ᵢ₌₁ᵍ si²(1 − ni/N)
The numerator SSTrt = 2819 can be found in the ANOVA table above. The denominator is found using R (see the code below) to be 888.5747.
> sds = tapply(g,series,sd)
> nis = tapply(g,series,length)
> di = sds^2*(1-nis/length(g))
> BFbottom = sum(di)
> BFbottom
[1] 888.5747
> BF = 2819/BFbottom
The BF statistic is thus

BF = 2819/888.5747 = 3.1725
Chapter 6 - 51
Under the null hypothesis of equal treatment means, BF is approximately distributed as an F-distribution with g − 1 and ν degrees of freedom, where

ν = (∑ᵢ₌₁ᵍ di)² / ∑ᵢ₌₁ᵍ di²/(ni − 1), in which di = si²(1 − ni/N),
which is calculated as 29.46249 in the R code below.
> nu = (BFbottom)^2/sum(di^2/(nis-1))
> nu
[1] 29.46249
> 1 - pf(BF, df1 = 7, df2 = nu) # P-value of the BF-test
[1] 0.01263341
However, the BF-test, which does not rely on the constant variance assumption, also rejects the null hypothesis of equal means, with a P-value of 0.0126. How can this be true?
Chapter 6 - 52
Maybe the earlier measurements are less reliable; what if we exclude the first series?
> lmg2 = lm(g ~ as.factor(series), subset = (series > 1))
> anova(lmg2)
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(series) 6 1428.6 238.092 2.7835 0.01787 *
Residuals 66 5645.5 85.538
The Brown-Forsythe modified F-test gives BF = 2.616 and a P-value of 0.038.
What if we remove the first and second series?
> lmg2 = lm(g ~ as.factor(series), subset = (series>2))
> anova(lmg2)
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(series) 5 222.8 44.553 0.7545 0.5863
Residuals 56 3306.6 59.046
The Brown-Forsythe modified F-test gives BF = 0.658 and a P-value of 0.659.
Can we conclude that series 3-8 have a common mean, but not series 1 and 2?
Chapter 6 - 53
The observations in each series are in fact given in the time order in which they were taken. We can thus make a time plot.
[Figure: time plot of Standard Gravity g (m/s²) vs. measurement index.]
The vertical lines separate the measurements of different series.

We can see that many measurements are close to the previous measurement, which indicates a positive serial correlation.

It’s not surprising that the scientists might unconsciously have matched their results with the previous measurement, since the previous measurement was often regarded as the most accurate one up to that time.
Chapter 6 - 54
> plot(g[2:81],g[1:80], pch=20,ylab = "g",xlab="g (Lag = 1)")
> cor(g[2:81],g[1:80])
[1] 0.5002454
> plot(g[3:81],g[1:79], pch=20,ylab = "g",xlab="g (Lag = 2)")
> cor(g[3:81],g[1:79])
[1] 0.1232966
[Figure: left, lag-1 scatter plot of g, with lag-1 autocorrelation = 0.5002; right, lag-2 scatter plot of g, with lag-2 autocorrelation = 0.1233.]
Chapter 6 - 55
Effect of Dependent Errors
With positive serial correlation, the mean of SSTrt is higher than (8 − 1)σ², which makes the F statistic larger and the null hypothesis easier to reject.
Chapter 6 - 56
We cannot check the independence assumption since we don’t have other information. If the data are collected in the order they are recorded, we can check the autocorrelation. The time plot below shows no clear pattern.
[Figure: time plot of the residuals vs. index.]
The lag-1 autocorrelation coefficient is −0.113, which means, basically, no correlation.
> cor(brady3$res[-65],brady3$res[-1])
[1] -0.1132061
If there are other variables available, one should plot the residuals against each of these variables.
Chapter 6 - 57
Coming Up Next...
Chapter 6 - 58