STAT22200 Spring 2014 Chapter 6
Yibi Huang
May 1 and May 6, 2014
Chapter 6 Model Assumption Checking and Remedies
Chapter 6 - 1
THREE ASSUMPTIONS We Need to Check
In the means model yij = µi + εij or the effects model yij = µ + τi + εij, we make three assumptions about the error terms εij:
1. the errors εij are independent, randomly distributed
2. the errors εij have constant variance across treatments
3. the errors εij follow a normal distribution
As all three assumptions concern the errors εij, most of the model diagnostic methods are based on the residuals

eij = yij − ŷij = yij − ȳi•.
Chapter 6 - 2
Standardized and Studentized Residuals
Because εij has SD σ, we sometimes standardize the residuals:

standardized residual: dij = eij/√MSE.

By the normality assumption, dij is approximately N(0, 1). Observations with |dij| > 3 are potential outliers.
If the design is not balanced, a more accurate standardization is the

studentized residual: sij = eij/√(MSE(1 − 1/ni)).

We divide by √(MSE(1 − 1/ni)) because eij = yij − ȳi• has SD σ√(1 − 1/ni). Again, if the errors are normal, sij is approximately N(0, 1). Observations with |sij| > 3 are potential outliers.
Chapter 6 - 3
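The formulas on the slide above are straightforward to check by hand. The slides do all computation in R; purely as an illustration, here is a self-contained Python sketch (the function name and layout are mine, not from the slides):

```python
import math

def one_way_residuals(groups):
    """Standardized and studentized residuals for a one-way (CRD) layout.

    groups: list of lists of responses, one inner list per treatment.
    Returns (standardized, studentized) as flat lists in group order.
    """
    n_total = sum(len(g) for g in groups)
    n_groups = len(groups)
    means = [sum(g) / len(g) for g in groups]
    # residuals e_ij = y_ij - ybar_i
    resid = [[y - m for y in g] for g, m in zip(groups, means)]
    sse = sum(e * e for g in resid for e in g)
    mse = sse / (n_total - n_groups)  # error df = N - g
    standardized = [e / math.sqrt(mse) for g in resid for e in g]
    studentized = [e / math.sqrt(mse * (1 - 1 / len(g)))
                   for g in resid for e in g]
    return standardized, studentized
```

Note that in a balanced design all ni are equal, so the studentized residuals are just the standardized residuals rescaled by a common constant.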
Studentized Residuals In a Regression Model (May Skip)
In general, in a regression model Y = Xβ + ε, the residual is
e = Y − Xβ̂ = Y − X(XᵀX)⁻¹XᵀY = (I − H)Y,

in which H = X(XᵀX)⁻¹Xᵀ is called the hat matrix. One can show that the variance-covariance matrix of the residuals is

Var(e) = σ²(I − H).

Thus Var(ei) = σ²(1 − hii), in which hii is the ith diagonal element of the hat matrix H, called the leverage. The studentized residuals are thus defined as

si = ei/√(MSE(1 − hii)).
In a complete randomized design, hii = 1/ni .
Chapter 6 - 4
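The claim that hii = 1/ni in a completely randomized design can be verified numerically from the definition H = X(XᵀX)⁻¹Xᵀ. A stdlib-only Python sketch using exact rational arithmetic (all names are mine; an illustration, not code from the slides):

```python
from fractions import Fraction

def hat_diagonal(X):
    """Diagonal of H = X (X^T X)^{-1} X^T for a small integer design matrix,
    via Gauss-Jordan elimination in exact rational arithmetic."""
    n, p = len(X), len(X[0])
    # A = X^T X, B = X^T; solve A W = X^T so that B becomes W = (X^T X)^{-1} X^T
    A = [[sum(Fraction(X[k][i]) * X[k][j] for k in range(n)) for j in range(p)]
         for i in range(p)]
    B = [[Fraction(X[k][i]) for k in range(n)] for i in range(p)]
    for col in range(p):
        piv = next(r for r in range(col, p) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        B[col], B[piv] = B[piv], B[col]
        d = A[col][col]
        A[col] = [a / d for a in A[col]]
        B[col] = [b / d for b in B[col]]
        for r in range(p):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
                B[r] = [a - f * b for a, b in zip(B[r], B[col])]
    # h_kk = x_k^T (X^T X)^{-1} X^T at position (k, k) = x_k . W[:, k]
    return [sum(Fraction(X[k][i]) * B[i][k] for i in range(p)) for k in range(n)]

# CRD with two treatments of sizes 2 and 3, cell-means (indicator) coding:
X = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
h = hat_diagonal(X)  # expect [1/2, 1/2, 1/3, 1/3, 1/3]
```

The diagonal entries come out as 1/ni within each group, and their sum equals p = trace(H), as theory predicts.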
Example: Resin Glue Failure Time
In previous lectures, the response of the resin glue experiment was log10(lifetime). In the original data, the response is simply the lifetime of the glue in hours.
[Figure: scatter plot of Life (hours) vs. Temperature (Celsius) at 175, 194, 213, 231, and 250°C.]
Temperature | Lifetime in hours
175°C | 110.0 82.2 99.1 82.9 71.3 91.7 76.0 79.2
194°C | 45.8 51.3 26.5 58.0 45.3 40.8 35.8 45.6
213°C | 33.8 34.8 24.2 20.5 22.5 18.8 18.2 24.2
231°C | 14.2 16.7 14.8 14.6 16.2 18.9 14.8
250°C | 18.0 6.7 12.0 10.5 12.2 11.4
Chapter 6 - 5
Residuals
> mydata = read.table("resinlife.txt", header=T)
> attach(mydata)
> aov1 = aov(life ~ as.factor(temp))
> aov1$res # get residuals, output omitted
> round(aov1$res,2) # round residuals to 2 decimals
1 2 3 4 5 6 7 8
23.45 -4.35 12.55 -3.65 -15.25 5.15 -10.55 -7.35
9 10 11 12 13 14 15 16
2.16 7.66 -17.14 14.36 1.66 -2.84 -7.84 1.96
17 18 19 20 21 22 23 24
9.17 10.17 -0.42 -4.12 -2.12 -5.82 -6.43 -0.42
25 26 27 28 29 30 31 32
-1.54 0.96 -0.94 -1.14 0.46 3.16 -0.94 6.20
33 34 35 36 37
-5.10 0.20 -1.30 0.40 -0.40
We see the first observation has the largest residual, 23.45. Is it an outlier?
Chapter 6 - 6
Studentized Residuals
In R, studres() is the command to get the studentized residuals.
> library(MASS) # must load library "MASS" to use studres()
> ?studres # see help file of studres()
> studres(aov1) # studentized residual
1 2 3 4 5
3.55341908 -0.55843235 1.67395045 -0.46787374 -2.07936247
(... some output is omitted ...)
36 37
0.05235783 -0.05235783
> round(studres(aov1),2) # round studentized residuals to 2 decimals
1 2 3 4 5 6 7 8 9 10
3.55 -0.56 1.67 -0.47 -2.08 0.66 -1.39 -0.95 0.28 0.99
11 12 13 14 15 16 17 18 19 20
-2.38 1.94 0.21 -0.36 -1.02 0.25 1.20 1.34 -0.05 -0.53
21 22 23 24 25 26 27 28 29 30
-0.27 -0.75 -0.83 -0.05 -0.20 0.12 -0.12 -0.15 0.06 0.41
31 32 33 34 35 36 37
-0.12 0.82 -0.67 0.03 -0.17 0.05 -0.05
The first observation with sij = 3.55 > 3 is a potential outlier.
Chapter 6 - 7
3.4.1 Methods for Checking the Normality Assumption
I Histogram of the residuals: if normal, it should be bell-shaped
  I Pros: simple, intuitive
  I Cons: for a small sample, the histogram may not be bell-shaped even though the sample is from a normal distribution
I Normal probability plot of the residuals
  I a.k.a. normal QQ plot, in which QQ stands for "quantile-quantile"
  I the best method to assess normality
  I See next slide for details
Chapter 6 - 8
How to Make a Normal Probability Plot?
1. Given data (y1, y2, . . . , yn), standardize: xi = (yi − ȳ)/s.

2. Arrange the data in increasing order: x(1), x(2), . . . , x(n).

3. Find quantiles of the N(0, 1) distribution: z(1/(n+1)), z(2/(n+1)), . . . , z(n/(n+1)). That is, z(i/(n+1)) is the value such that P(Z ≤ z(i/(n+1))) = i/(n+1) for Z ∼ N(0, 1).

4. Plot the x(i) values against the z(i/(n+1)) values. That is, plot the points (z(i/(n+1)), x(i)) for i = 1, 2, . . . , n.
Chapter 6 - 9
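The four steps above translate directly into code. As an illustration in Python (statistics.NormalDist supplies the N(0, 1) quantiles; the function name is mine):

```python
from statistics import NormalDist, mean, stdev

def qq_points(y):
    """Return the points (z_(i/(n+1)), x_(i)) of a normal probability plot,
    following the four steps above."""
    n = len(y)
    ybar, s = mean(y), stdev(y)
    x = sorted((yi - ybar) / s for yi in y)                           # steps 1-2
    z = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]  # step 3
    return list(zip(z, x))                                            # step 4
```

For an odd sample size the middle point lands at the median quantile z(0.5) = 0, and the theoretical quantiles are symmetric about 0.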
Interpreting Normal Probability Plots
I If the data are approximately normal, the plot will be close to a straight line.

I Systematic deviations from a straight line indicate a non-normal distribution.

I Outliers appear as points that are far away from the overall pattern of the plot.
I R command: qqnorm()
Chapter 6 - 10
[Figure: four normal Q-Q plots (sample vs. theoretical quantiles) illustrating typical shapes: Normal, Skewed, Heavy tailed, and Light tailed.]
Chapter 6 - 11
QQ plot for the Resin Glue Residuals
> qqnorm(lmorig$res)
> qqline(lmorig$res)
> hist(lmorig$res,xlab="Residuals",main="Histogram of Residuals")
[Figure: left, normal Q-Q plot of the residuals; right, histogram of the residuals.]
Do the residuals look normal?
Chapter 6 - 12
Example: Hodgkin’s Disease
Hodgkin’s disease is a type of lymphoma, a cancer originating from white blood cells called lymphocytes. The data set Hodgkins.txt contains plasma bradykininogen levels (in micrograms of bradykininogen per milliliter of plasma) in 3 types of subjects:
I normal,
I in patients with active Hodgkin’s disease, and
I in patients with inactive Hodgkin’s disease.
The globulin bradykininogen is the precursor substance for bradykinin, which is thought to be a chemical mediator of inflammation.
I Is this an experiment?
I We can use ANOVA to compare means of several samples in an observational study.
Chapter 6 - 13
> mydata = read.table("Hodgkins.txt", header=T)
> attach(mydata)
> plot(Hodgkins,BradyLevel)
> plot(as.integer(Hodgkins), BradyLevel, pch=20, cex=0.75)
[Figure: strip plots of BradyLevel (2 to 14) for the Active, Inactive, and Normal groups.]
The distribution within each group looks right-skewed. Let’s fit the ANOVA model anyway and take a look at the residuals.
> brady1 = lm(BradyLevel ~ Hodgkins)
> anova(brady1)
Df Sum Sq Mean Sq F value Pr(>F)
Hodgkins 2 65.89 32.95 10.67 0.000104 ***
Residuals 62 191.45 3.09
Chapter 6 - 14
Normal QQ Plot for The Hodgkin Data
> qqnorm(brady1$res, ylab="Residuals")
> qqline(brady1$res)
> library(MASS)
> qqnorm(studres(brady1), ylab="Studentized Residuals")
> qqline(studres(brady1))
[Figure: left, normal Q-Q plot of the residuals; right, normal Q-Q plot of the studentized residuals.]
Does the distribution of the residuals look normal? No, it is right-skewed and has an outlier with sij > 5.
Chapter 6 - 15
Remedies for Non-Normality
Asymmetry can often be ameliorated by transforming the response, often with a power transformation:

fλ(y) = y^λ if λ ≠ 0, and log(y) if λ = 0.

1. If right-skewed, try taking the square root, the logarithm, or other powers < 1: y → log(y), √y, or y^λ with λ < 1

2. If left-skewed, try squaring, cubing, or other powers > 1: y → y², y³, or y^λ with λ > 1
Chapter 6 - 16
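As a quick numerical check of how a power transformation reduces right skew, here is a hedged Python sketch (the toy sample and the moment-based skewness helper are my own, not from the slides):

```python
import math

def power_transform(y, lam):
    """The power family f_lambda: y**lam for lam != 0, log(y) for lam == 0."""
    return math.log(y) if lam == 0 else y ** lam

def skewness(xs):
    """Moment-based sample skewness (population-moment version)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

y = [1, 1, 2, 2, 3, 10]                     # a right-skewed toy sample
log_y = [power_transform(v, 0) for v in y]  # log transform (lambda = 0)
```

The log pulls the long right tail in, so the skewness of log_y is smaller than that of y.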
Example: Hodgkin’s Disease – QQ Plots
[Figure: normal Q-Q plot of the studentized residuals on the original scale.]

After a log transformation, the residuals are less skewed, and the outlier looks less extreme.
The square-root transformation also improves normality, but not as much as the log transformation.
[Figure: normal Q-Q plot of the studentized residuals after the square-root transform.]
[Figure: normal Q-Q plot of the studentized residuals after the log transform.]
Chapter 6 - 17
R-Codes for Making the Plots on the Previous Slide
> brady1 = lm(BradyLevel ~ Hodgkins)
> brady2 = lm(sqrt(BradyLevel) ~ Hodgkins)
> brady3 = lm(log(BradyLevel) ~ Hodgkins)
> library(MASS)
> qqnorm(studres(brady1), main="Original Scale")
> qqline(studres(brady1))
> qqnorm(studres(brady2), main="Square-root Transform")
> qqline(studres(brady2))
> qqnorm(studres(brady3), main="Log Transform")
> qqline(studres(brady3))
Chapter 6 - 18
Box-Cox Method
The Box-Cox method is an automatic procedure for selecting the “best” power λ that makes the residuals of the model

yij^λ = µi + εij

closest to normal.
I We usually round the optimal λ to a convenient power like −1, −1/2, −1/3, 0, 1/3, 1/2, 1, 2, . . . , since the practical difference between y^0.5827 and y^0.5 is usually small, but the square-root transformation is much easier to interpret.
I A confidence interval for the optimal λ can also be obtained; see Oehlert, p. 129 for details. We usually select a convenient power λ* in this C.I.
Chapter 6 - 19
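The Box-Cox profile log-likelihood for the one-way model can be written from the textbook formula ℓ(λ) = −(n/2) log(SSEλ/n) + (λ − 1) ∑ log yij, where SSEλ is the residual sum of squares after applying the scaled power (y^λ − 1)/λ. A grid-search sketch in Python (an illustration of the idea, not the MASS::boxcox implementation; all names are mine):

```python
import math

def boxcox_loglik(groups, lam):
    """Box-Cox profile log-likelihood for a one-way means model
    (textbook formula; a sketch, not the MASS::boxcox implementation)."""
    ys = [y for g in groups for y in g]
    n = len(ys)

    def f(y):
        return math.log(y) if lam == 0 else (y ** lam - 1) / lam

    sse = 0.0
    for g in groups:
        t = [f(y) for y in g]
        m = sum(t) / len(t)
        sse += sum((v - m) ** 2 for v in t)
    return -n / 2 * math.log(sse / n) + (lam - 1) * sum(math.log(y) for y in ys)

def boxcox_best(groups, grid=None):
    """Grid search for the lambda maximizing the profile log-likelihood."""
    grid = grid if grid is not None else [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: boxcox_loglik(groups, lam))

# Toy data built from exponentials of simple numbers, so the log scale is tidy.
groups = [[math.exp(v) for v in (1, 2, 3)], [math.exp(v) for v in (4, 5, 6)]]
```

For this toy data, ℓ(0) exceeds ℓ(1), i.e., the log transform is preferred to leaving the response untransformed.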
Example: Hodgkin’s Disease – Box-Cox
In R, one must first load the library MASS to use boxcox(). The argument of the boxcox() function can be a model formula, an lm model, or an aov model.
> library(MASS)
> boxcox(brady1)
> boxcox(BradyLevel ~ Hodgkins)

[Figure: Box-Cox profile log-likelihood plotted against λ, with the 95% C.I. marked.]
The middle dashed line marks the optimal λ; the left and right dashed lines mark the 95% C.I. for the optimal λ.
From the plot, we see the optimal λ is around −0.2, and the 95% C.I. contains 0. For simplicity, we use the log-transformed BradyLevel as our response.
Chapter 6 - 20
A Remark On the Log-Scale
In fact, for concentrations, the log scale is more commonly used than the original scale. For example,
I Concentrations of 10.1 and 10.001 are almost the same, but concentrations of 0.1 and 0.001 are very different, since 0.1 is 100 times higher than 0.001.

I In the original scale, (10.1, 10.001) and (0.1, 0.001) differ by the same amount, 0.099.
I In the log scale,

log10 0.1 − log10 0.001 = 2, which is much greater than log10 10.1 − log10 10.001 ≈ 0.0043.
Chapter 6 - 21
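The two differences quoted above are easy to verify numerically:

```python
import math

# The two log10 differences from the slide, checked numerically.
low = math.log10(0.1) - math.log10(0.001)     # exactly 2
high = math.log10(10.1) - math.log10(10.001)  # about 0.0043
```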
Example: Hodgkin’s Disease
From the two ANOVA tables below, we see that a log transformation makes the difference between the 3 groups of patients more significant, because the outlier becomes less extreme and does not inflate the MSE as much.
Response: BradyLevel
Df Sum Sq Mean Sq F value Pr(>F)
Hodgkins 2 65.893 32.946 10.67 0.0001042 ***
Residuals 62 191.449 3.088
Response: log(BradyLevel)
Df Sum Sq Mean Sq F value Pr(>F)
Hodgkins 2 2.2526 1.12631 15.436 3.628e-06 ***
Residuals 62 4.5238 0.07297
Chapter 6 - 22
Non-Parametric Tests
I Transformation does not always work. For example, it helps little for symmetric but heavy-tailed (many outliers) distributions.

I The ANOVA F-test is robust to non-normality, but it is not resistant to outliers.

I If outliers are unavoidable and cannot be removed, try non-parametric tests (like the permutation test or the Kruskal-Wallis test, Section 3.11.1) that don’t rely on the normality assumption.
I There might be a lecture specifically on non-parametric testsif we have time.
Chapter 6 - 23
Constant Variance Assumption
I Why Is Non-Constant Variance a Problem?I Tools for checking constant variance assumption — Residual
PlotsI Residuals v.s. Fitted ValuesI Residuals v.s. TreatmentsI Residuals v.s. Other Variables
I RemediesI Transforming the Response,
Variance-Stabilizing TransformationI Brown-Forsythe Modified F -test — an alternative to ANOVA
F -testI Welch Test for Contrasts w/o Constant Variance Assumption
Chapter 6 - 24
Why Is Non-Constant Variance a Problem?
Example: Serial Dilution Plating (Exercise 6.2 on p. 143), a method to count the number of bacteria in solution.
Chapter 6 - 25
Example — Serial Dilution Plating (Cont’d)
I Goal: to compare 3 pasteurization methods for milk
I Design: 15 samples of milk randomly assigned to the 3 trts
I Response: the bacterial load in each sample after treatment, determined via serial dilution plating
I Data: http://users.stat.umn.edu/~gary/book/fcdae.data/ex6.2
Method 1 | Method 2 | Method 3
26×10² | 35×10³ | 29×10⁵
29×10² | 23×10³ | 23×10⁵
20×10² | 20×10³ | 17×10⁵
22×10² | 30×10³ | 29×10⁵
32×10² | 27×10³ | 20×10⁵
Mean: 25.8×10² | 27×10³ | 23.6×10⁵
SD: 492 | 5874 | 536656

√MSE = 309900
Chapter 6 - 26
Example — Serial Dilution Plating (Cont’d)
The 95% C.I. for the mean of Method 1:

ȳ1• ± t0.025,15−3 × √MSE/√n1 = 2580 ± 2.179 × 309900/√5 = 2580 ± 301965 = (−299385, 304545),

which is way wider than the range of the 5 observations for Method 1 (2000 to 3200). What happened?
Chapter 6 - 27
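The interval above can be reproduced directly from the summary numbers. A sketch in Python (the t quantile 2.179 is copied from the slide, since the Python standard library has no t-distribution quantile function):

```python
import math

# Summary numbers from the slide: Method 1 mean, pooled sqrt(MSE), n1, and
# t_{0.025,12} = 2.179 (hardcoded from a t-table).
ybar1, root_mse, n1, t = 2580, 309900, 5, 2.179
margin = t * root_mse / math.sqrt(n1)
ci = (ybar1 - margin, ybar1 + margin)
# The interval is absurdly wide (its lower end is negative) because the
# pooled sqrt(MSE) is dominated by Method 3's huge variance.
```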
Checking Constant Variance Assumption (1)
Tests for equality of variances are available, but they are not useful, because

1. the tests do not tell us how much non-constant variance is present or how much it affects our inferences;

2. classical tests of constant variance are very sensitive to non-normality in the errors. (It is very difficult to distinguish non-normality from normality with non-constant variance.)
Chapter 6 - 28
Checking Constant Variance Assumption (2)
A better tool for checking constant variance: residual plots

I Residuals vs. Fitted Values

I Residuals vs. Treatments

I Residuals vs. Other Variables

If the constant variance assumption is true, the residuals will spread evenly around the zero line.

Rule of thumb: ANOVA F-tests for a CRD can tolerate non-constant variance to some extent, and so can tests for contrasts. It is usually fine as long as

max{σi}/min{σi} ≤ 2 or even 3,

especially when the group sizes ni are (roughly) equal.
Chapter 6 - 29
Example: Resin Glue Data
> plot(lmorig$fit,lmorig$res,ylab="Residuals",xlab="Fitted Values")
> abline(h=0) # adding a zero line
> plot(temp,lmorig$res,ylab="Residuals",xlab="Centigrade Temperature")
> abline(h=0) # adding a zero line
[Figure: left, Residuals vs. Fitted Values; right, Residuals vs. Treatments (Centigrade Temperature).]
I Why do the points form vertical strips?

I The spread of the residuals increases with the fitted values.
I The lower the temperature, the more variable the response.
Chapter 6 - 30
More on Residual Plots
Residual plots can be used to check many other things, like non-linearity.

In the resin glue example, if one fits the linear model yij = β0 + β1T + εij, one can check whether lifetime is indeed linear in temperature using a residual plot.
> lmorig1 = lm(life ~ temp)
> plot(temp,lmorig1$res,ylab="Residuals",xlab="Centigrade Temperature")
> abline(h=0)
[Figure: residuals of the linear model vs. Centigrade Temperature (175, 194, 213, 231, 250).]
From the residual plot, we see

I non-constant variance across temperatures

I lifetime is curved, not linear, in temperature
Chapter 6 - 31
Plot of Residuals v.s. Other Variables
If there are other variables that might possibly affect the response but are not included in the model, then one should check residual plots against these variables. For example,

I if experimental units come from different batches, then the residuals should be plotted vs. batches

I if measurements are made by several operators, then the residuals should be plotted vs. operators
Patterns in such residual plots suggest these variables should be

I either included in the analysis (but note one CANNOT claim these variables have a causal effect on the response, since they are not controlled in advance)

I or controlled more carefully (e.g., by a block design) in future experiments
Chapter 6 - 32
Remedy 1: Variance-Stabilizing Transformation
If the SD σ (the spread of the residuals) depends on the mean µ (the fitted value), you can try a variance-stabilizing transformation of the response to make the variance (close to) constant.

I If the SD ∝ the fitted value, then transform the response y to log(y)
  I e.g., the serial dilution plating example
I If the SD ∝ √(the fitted value), i.e., the variance ∝ the fitted value, then y → √y
I In general, if σ ∝ µ^α, then the variance-stabilizing transformation is

y → y^(1−α) for α ≠ 1, and log(y) for α = 1
Chapter 6 - 33
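To see the general rule in action, a small simulation can help: for Poisson-type counts the variance equals the mean (α = 1/2), so the square root should roughly equalize the SDs. A hedged Python sketch with made-up treatment means (all names are mine):

```python
import math
import random
import statistics

random.seed(0)  # reproducible

def poisson(lam):
    """Poisson sampler via Knuth's product-of-uniforms method (stdlib-only)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# Three hypothetical "treatments" with means 5, 20, 80.  For Poisson counts,
# variance = mean, so the raw SDs grow with the fitted value.
groups = [[poisson(mu) for _ in range(500)] for mu in (5, 20, 80)]
sd_raw = [statistics.stdev(g) for g in groups]
sd_sqrt = [statistics.stdev([x ** 0.5 for x in g]) for g in groups]
ratio_raw = max(sd_raw) / min(sd_raw)     # roughly sqrt(80/5) = 4
ratio_sqrt = max(sd_sqrt) / min(sd_sqrt)  # much closer to 1
```

After the square root, the max/min SD ratio drops within the "2 or 3" rule of thumb from the previous slide.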
Example: Resin Glue
> lm1 = lm(life ~ as.factor(temp))
> lm2 = lm(sqrt(life) ~ as.factor(temp))
> lm3 = lm(log(life) ~ as.factor(temp))
> plot(lm1$fit, lm1$res, xlab="Fitted Value", ylab="Residuals")
> abline(h=0,lty=2)
> plot(lm2$fit, lm2$res, xlab="Fitted Value", ylab="Residuals")
> abline(h=0,lty=2)
> plot(lm3$fit, lm3$res, xlab="Fitted Value", ylab="Residuals")
> abline(h=0,lty=2)
[Figure: residuals vs. fitted values in three panels: Original Scale, Square-root Transform, and Log Transform.]
Chapter 6 - 34
Box-Cox Again
In many cases, we don’t have a good idea of the value of α. We can still try a power transformation of the response:

fλ(y) = y^λ if λ ≠ 0, and log(y) if λ = 0.
How to select λ?
I Trial and error: try convenient powers like −1, −1/2, −1/3, 0, 1/3, 1/2, 2, . . . , and then check the residual plot of each for constant variance.

I Box-Cox method: though Box-Cox was developed to select a power transformation making the residuals as normal as possible, it has been shown that the optimal λ is often close to the variance-stabilizing λ.
Chapter 6 - 35
Example – Count of Bacteria Revisited
> library(MASS)
> boxcox(count ~ as.factor(method))
> plot(method, log10(count),
xlab="Methods",
ylab="log10(Count of Bacteria)")
[Figure: left, Box-Cox profile log-likelihood with its 95% C.I.; right, log10(Count of Bacteria) vs. Methods.]
After log transformation, the 3 variances look even.
Chapter 6 - 36
Example: Resin Glue – Box-Cox
> library(MASS)
> boxcox(life ~ as.factor(temp))
[Figure: Box-Cox profile log-likelihood plotted against λ, with the 95% C.I. marked.]
The 95% C.I. for λ contains both 0 and 1/2. As λ = 1/2 is very close to the boundary of the C.I., λ = 0 seems to be the better choice, which is also consistent with the Arrhenius Law.
Chapter 6 - 37
Drawbacks of Transformation
I Except for a few special transformations (log, square root, reciprocal), the transformed response usually lacks a natural interpretation (what does y^0.1 mean?)

I Unless the transformed response has a good interpretation, think again before transforming
Remember that ANOVA tests have some tolerance for non-constant variance. If

max{σi}/min{σi} ≤ 2 or 3,

don’t worry too much about non-constant variance. In that case, it is fine to leave the response untransformed.
Chapter 6 - 38
> tapply(life, temp, sd)
175 194 213 231 250
12.895514 9.557785 6.378703 1.661181 3.646917
> tapply(sqrt(life), temp, sd)
175 194 213 231 250
0.6802037 0.7505140 0.6223047 0.2048698 0.5305554
> tapply(log(life), temp, sd)
175 194 213 231 250
0.1440872 0.2393055 0.2451210 0.1012540 0.3177677
After the log transformation, the ratio of the largest to the smallest SD is 2.2, which is acceptable.
Chapter 6 - 39
Brown-Forsythe Modified F-test
If transformation doesn’t work well, or you don’t want to transform because the transformed response lacks a good interpretation, try tests that don’t rely on the constant variance assumption:

I Brown-Forsythe modified F-test, an alternative to the ANOVA F-test:

BF = ∑ᵢ₌₁ᵍ ni(ȳi• − ȳ••)² / ∑ᵢ₌₁ᵍ si²(1 − ni/N) = SSTrt / ∑ᵢ₌₁ᵍ si²(1 − ni/N),

in which si² is the sample variance in treatment i. Under the null hypothesis of equal treatment means, BF is approximately distributed as an F-distribution with g − 1 and ν degrees of freedom, where

ν = (∑ᵢ₌₁ᵍ di)² / ∑ᵢ₌₁ᵍ di²/(ni − 1),

in which di = si²(1 − ni/N).
Chapter 6 - 40
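The BF statistic and its approximate degrees of freedom follow mechanically from the formulas on the slide above. A Python sketch (an illustration, not R code; the function name is mine):

```python
def brown_forsythe(groups):
    """Brown-Forsythe modified F statistic and its approximate df,
    following the BF and nu formulas from the slide (a sketch)."""
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    ss_trt = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    s2 = [sum((y - m) ** 2 for y in g) / (len(g) - 1)   # sample variances
          for g, m in zip(groups, means)]
    d = [v * (1 - len(g) / N) for v, g in zip(s2, groups)]
    bf = ss_trt / sum(d)
    nu = sum(d) ** 2 / sum(di ** 2 / (len(g) - 1) for di, g in zip(d, groups))
    return bf, nu
```

Applied to the eight gravity series later in this chapter, the same formulas should reproduce the BF ≈ 3.17 and ν ≈ 29.5 computed there.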
Welch Test for Contrasts w/o Constant Variance Assumption

If transformation doesn’t work well, try tests that don’t rely on the constant variance assumption:

I Welch test for a contrast ∑ᵢ₌₁ᵍ ciµi: The test statistic is

t = ∑ᵢ₌₁ᵍ ci ȳi• / √(∑ᵢ₌₁ᵍ ci²si²/ni),

which is approximately t-distributed with ν degrees of freedom, where

ν = (∑ᵢ₌₁ᵍ ci²si²/ni)² / ∑ᵢ₌₁ᵍ [1/(ni − 1)] ci⁴si⁴/ni².

Observe that this is a generalization of the two-sample t-test without the equal variance assumption.
Chapter 6 - 41
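The same translation works for the Welch contrast test. A Python sketch of the two formulas on the slide above (names are mine); for c = (1, −1) and two groups it reduces to the familiar Welch two-sample t statistic:

```python
import math

def welch_contrast(groups, c):
    """Welch t statistic and approximate df for the contrast sum_i c_i mu_i,
    following the formulas on the slide above (a sketch)."""
    n = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    s2 = [sum((y - m) ** 2 for y in g) / (len(g) - 1)   # sample variances
          for g, m in zip(groups, means)]
    var = sum(ci ** 2 * v / ni for ci, v, ni in zip(c, s2, n))
    t = sum(ci * m for ci, m in zip(c, means)) / math.sqrt(var)
    nu = var ** 2 / sum((ci ** 4 * v ** 2 / ni ** 2) / (ni - 1)
                        for ci, v, ni in zip(c, s2, n))
    return t, nu
```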
Checking for Dependent Errors and Remedies
Chapter 6 - 42
Checking for Dependent Errors
I Among the 3 assumptions, violation of the independence assumption causes the severest problems. Most of our analyses (ANOVA, tests of contrasts, multiple comparisons, etc.) have little tolerance for dependence among the errors.

I There are various forms of dependence; serial dependence and spatial dependence are two common ones.

I A time plot of the residuals (vs. the order in which we measure them) is a good way to check for serial dependence.
  I It’s better to keep track of the order in which units are measured.
  I A smooth time plot is a sign of positive serial dependence, since it means successive residuals are too close together.
Chapter 6 - 43
Autocorrelation and Autocorrelation Plots
Autocorrelation plots are also a common way to check for serial dependence.

I Lag-1 autocorrelation plot: plot (X1, . . . , Xn−1) vs. (X2, . . . , Xn)

I Lag-k autocorrelation plot: plot (X1, . . . , Xn−k) vs. (X1+k, . . . , Xn)

I Any trend in the autocorrelation plot is a sign of serial dependence.

A numerical way to check for serial dependence is to compute the lag-k autocorrelation coefficient, which is the correlation coefficient of (X1, . . . , Xn−k) and (X1+k, . . . , Xn), for k = 1, 2, 3, . . .
Chapter 6 - 44
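The lag-k autocorrelation coefficient described above is just a Pearson correlation between the series and its shifted copy. A minimal Python sketch (the function name is mine):

```python
import math

def lag_autocorrelation(x, k):
    """Lag-k autocorrelation: the Pearson correlation of (x_1,...,x_{n-k})
    with (x_{1+k},...,x_n)."""
    a, b = x[:-k], x[k:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) *
                    sum((v - mb) ** 2 for v in b))
    return num / den
```

A steadily increasing series has lag-1 autocorrelation 1, and a perfectly alternating series has lag-1 autocorrelation −1.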
Spatial Dependence
Spatial dependence can arise when the experimental units are arranged in space, like plants on a farm. Spatial dependence occurs when units that are closer together are more similar than units farther apart.
Chapter 6 - 45
Remedies for Dependence
1. There isn’t much we can do about dependence with our current machinery, since no simple transformation can remove dependence.

2. Analysis of dependent data requires tools like time series or spatial statistics, which are beyond the scope of this class.
Chapter 6 - 46
Example: Standard Gravity
The National Bureau of Standards performed eight series of experiments between 1924 and 1935 to determine g, the standard gravity.

The data are given in the table below (in deviations from 9.8 m/s², × 10⁵, so that, for example, the first measurement of g is 9.80076 m/s²), with series 1 representing the earliest set of experiments and series 8 the last.
Series | Measurements
1 | 76 82 83 54 35 46 87 68
2 | 87 95 98 100 109 109 100 81 75 68 67
3 | 105 83 76 75 51 76 93 75 62
4 | 95 90 76 76 87 79 77 71
5 | 76 76 78 79 72 68 75 78
6 | 78 78 78 86 87 81 73 67 75 82 83
7 | 82 79 81 79 77 79 79 78 79 82 76 73 64
8 | 84 86 85 82 77 76 77 80 83 81 78 78 78
Chapter 6 - 47
Example: Standard Gravity
The National Bureau of Standards¹ is the government agency that measures things. The following statement is taken from the NIST website:
Founded in 1901, NIST is a non-regulatory federal agency within the U.S. Department of Commerce. NIST’s mission is to promote U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology in ways that enhance economic security and improve our quality of life.
Thus, it is safe to assume that the NBS scientists were trying hard to measure the same quantity g (e.g., all experiments were done in the same location) throughout all eight series of experiments.
¹ now called the National Institute of Standards and Technology
Chapter 6 - 48
> gee = read.table("g.txt", header=T)
> attach(gee)
> plot(series,g)
The variance decreases with series, which makes sense since the accuracy of measurement improves as time goes by.
[Figure: strip plot of Standard Gravity g (m/s²) vs. series 1-8.]
What happens if we ignore this fact and do an ANOVA F-test?
> lmg = lm(g ~ as.factor(series), data=gee)
> anova(lmg)
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(series) 7 2819 402.7 3.568 0.00236 **
Residuals 73 8239 112.9
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
ANOVA rejects the null hypothesis that the 8 series have the same mean.
Chapter 6 - 49
Question
What is the implication if the null hypothesis is rejected? Will you conclude that
(a) g had changed over time, or
(b) the ANOVA F-test failed?
If your choice is (a), how do you explain the change in g? If your choice is (b), why does the ANOVA F-test fail?
The ANOVA F-test here is not reliable because at least one of its assumptions is not met: constant variance.
> round(tapply(g,series,sd),2)
1 2 3 4 5 6 7 8
19.25 15.29 15.76 8.30 3.65 5.84 4.74 3.36
Will a transformation work here? Try Box-Cox?
Chapter 6 - 50
Brown-Forsythe Modified F-test
Since the variances are not constant, let’s try the Brown-Forsythe modified F-test:

BF = SSTrt / ∑ᵢ₌₁ᵍ si²(1 − ni/N)
The numerator SSTrt = 2819 can be found in the ANOVA table above. The denominator is found using R (see the code below) to be 888.5747.
> sds = tapply(g,series,sd)
> nis = tapply(g,series,length)
> di = sds^2*(1-nis/length(g))
> BFbottom = sum(di)
> BFbottom
[1] 888.5747
> BF = 2819/BFbottom
The BF statistic is thus

BF = 2819/888.5747 = 3.1725
Chapter 6 - 51
Under the null hypothesis of equal treatment means, BF is approximately distributed as an F-distribution with g − 1 and ν degrees of freedom, where

ν = (∑ᵢ₌₁ᵍ di)² / ∑ᵢ₌₁ᵍ di²/(ni − 1), in which di = si²(1 − ni/N),
which is calculated as 29.46249 in the R code below.
> nu = (BFbottom)^2/sum(di^2/(nis-1))
> nu
[1] 29.46249
> 1 - pf(BF, df1 = 7, df2 = nu) # P-value of the BF-test
[1] 0.01263341
However, the BF-test, which does not rely on the constant variance assumption, also rejects the null hypothesis of equal means, with a P-value of 0.0126. How can this be true?
Chapter 6 - 52
Maybe the earlier measurements are less reliable; what if we exclude the first series?
> lmg2 = lm(g ~ as.factor(series), subset = (series > 1))
> anova(lmg2)
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(series) 6 1428.6 238.092 2.7835 0.01787 *
Residuals 66 5645.5 85.538
The Brown-Forsythe modified F-test gives BF = 2.616 and a P-value of 0.038.
What if we remove the first and second series?
> lmg2 = lm(g ~ as.factor(series), subset = (series>2))
> anova(lmg2)
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(series) 5 222.8 44.553 0.7545 0.5863
Residuals 56 3306.6 59.046
The Brown-Forsythe modified F-test gives BF = 0.658 and a P-value of 0.659.
Can we conclude that series 3-8 have a common mean, but not series 1 and 2?
Chapter 6 - 53
The observations in each series are in fact given in the time order in which they were taken. We can thus make a time plot.
[Figure: time plot of Standard Gravity g (m/s²) vs. measurement index.]
The vertical lines separate the measurements of different series.

We can see that many measurements are close to the previous measurement, which indicates a positive serial correlation.

It’s not surprising that the scientists might unconsciously have matched their results with the previous measurement, since the previous measurement was often regarded as the most accurate one up to that time.
Chapter 6 - 54
> plot(g[2:81],g[1:80], pch=20,ylab = "g",xlab="g (Lag = 1)")
> cor(g[2:81],g[1:80])
[1] 0.5002454
> plot(g[3:81],g[1:79], pch=20,ylab = "g",xlab="g (Lag = 2)")
> cor(g[3:81],g[1:79])
[1] 0.1232966
[Figure: left, lag-1 scatter plot of g, with lag-1 autocorrelation = 0.5002; right, lag-2 scatter plot of g, with lag-2 autocorrelation = 0.1233.]
Chapter 6 - 55
Effect of Dependent Errors
With positive serial correlation, the mean of SSTrt is higher than (8 − 1)σ², which makes the F statistic larger and the null hypothesis easier to reject.
Chapter 6 - 56
We cannot check the independence assumption since we don’t have other information. If the data are collected in the order they are recorded, we can check the autocorrelation. The time plot below shows no clear pattern.
[Figure: time plot of the residuals vs. index.]
The lag-1 autocorrelation coefficient is −0.113, which means, basically, no correlation.
> cor(brady3$res[-65],brady3$res[-1])
[1] -0.1132061
If there are other variables available, one should plot the residuals against each of these variables.
Chapter 6 - 57
Coming Up Next...
Chapter 6 - 58