Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between...
Transcript of Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between...
![Page 1: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/1.jpg)
Regression and Programming in R
Anja Bråthen Kristoffersen
Biomedical Research Group
![Page 2: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/2.jpg)
R Reference Card
http://cran.r-project.org/doc/contrib/Short-refcard.pdf
![Page 3: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/3.jpg)
Simple linear regression
• Describes the relationship between two variables x and y
• The numbers α and β are called parameters, and ϵ is the error term.
2014.01.15 3
xy
![Page 4: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/4.jpg)
Example data
• Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.
head(faithful)
eruptions waiting
1 3.600 79
2 1.800 54
3 3.333 74
4 2.283 62
5 4.533 85
6 2.883 55
2014.01.15 4
![Page 5: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/5.jpg)
2014.01.15 5
P-value
Estimated
parameters
Coefficient of Determination
![Page 6: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/6.jpg)
Coefficient of determination
• the quotient of the variances of the fitted values and observed values of the dependent variable.
2014.01.15 6
2
2
2ˆ
yy
yyr
i
i
![Page 7: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/7.jpg)
Prediction
• develop a 95% confidence interval of the mean eruption duration for the waiting time of 80 minutes
newdata <- data.frame(waiting=80)
predict(eruption.lm, newdata, interval="confidence") fit lwr upr 1 4.1762 4.1048 4.2476
2014.01.15 7
![Page 8: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/8.jpg)
Residuals
• The difference between the observed data of the dependent variable y and the fitted values ŷ.
2014.01.15 8
yyResidual ˆ
eruption.res <- resid(eruption.lm)
plot(faithful$waiting, eruption.res, ylab="Residuals",
xlab="Waiting Time", main="Old Faithful Eruptions")
abline(0,0, col = "red", lwd = 2)
![Page 9: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/9.jpg)
2014.01.15 9
![Page 10: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/10.jpg)
Standardized residual
• the residual divided by its standard deviation
eruption.stdres = rstandard(eruption.lm)
plot(faithful$waiting, eruption.stdres,
ylab="Standardized Residuals", xlab="Waiting Time",
main="Old Faithful Eruptions")
abline(0, 0, col ="red", lwd = 2)
2014.01.15 10
i
i
i
residual of deviationstandard
residualresidualedStandardiz
![Page 11: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/11.jpg)
2014.01.15 11
![Page 12: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/12.jpg)
Normal Probability Plot of Residuals
qqnorm(eruption.stdres,
ylab="Standardized Residuals",
xlab="Normal Scores",
main="Old Faithful Eruptions")
qqline(eruption.stdres, col = "red", lwd=3)
2014.01.15 12
![Page 13: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/13.jpg)
Generalized additive model
• Can replace the linear relationship between response and variable, as in linear regression, with a non-linear relationship. A spline.
2014.01.15 13
![Page 14: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/14.jpg)
GAM
install.packages("mgcv")
library(mgcv)
This is mgcv 1.5-5 . For overview type
`help("mgcv-package")'.
eruption.gam <- gam(eruptions ~ 1+s(waiting), data=faithful)
plot(eruption.gam)
2014.01.15 14
![Page 15: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/15.jpg)
plot(eruption.gam)
2014.01.15 15
![Page 16: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/16.jpg)
2014.01.15 16
Coefficient of
Determination
p-value
![Page 17: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/17.jpg)
eruption.res.gam <- resid(eruption.gam)
plot(faithful$waiting, eruption.res.gam, ylab="Residuals",
xlab="Waiting Time", main="Old Faithful Eruptions")
abline(0,0, col="red", lwd=2)
2014.01.15 17
![Page 18: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/18.jpg)
eruption.res.gam <- resid(eruption.gam)
qqnorm(eruption.res.gam)
qqline(eruption.res.gam, col="red", lwd=3)
2014.01.15 18
![Page 19: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/19.jpg)
Multiple regresion
• describes a dependent variable y (response) by independent variables x1, x2, ..., xp (p > 1) is expressed by the equation
where the numbers α and βk (k = 1, 2, ..., p) are the parameters, and ϵ is the error term.
2014.01.15 19
k
kkxy
![Page 20: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/20.jpg)
Multiple regresion Explore your data
• Explore your data before starting the analysis
– Are the responses correlated?
– Use plot (trellis graphics, boxplot)
– Use correlation (pearson, spearman)
2014.01.15 20
![Page 21: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/21.jpg)
2014.01.15 21
DLBCL patient data
• Response: Germinal.cnter.B.cell.signature
• Explanatory variables – Lymph.node.signature
– Proliferation.signature
– BMP6
– MHC.class.II.signature
2014.01.15 21
![Page 22: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/22.jpg)
pairs(dat[,8:11])
2014.01.15 22
![Page 23: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/23.jpg)
cor(dat[,8:11])
2014.01.15 23
![Page 24: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/24.jpg)
2014.01.15 24
Adjusted Coefficient of Determination
p-vaules
Not significant
p values
![Page 25: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/25.jpg)
Adjusted Coefficient of Determination
2014.01.15 25
1
1)1(1
22
pn
nRR
adj
Hvor n er antall observasjoner og
p er antall parametere brukt i modellen
![Page 26: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/26.jpg)
Comparing models
• As long as analysis are done on the same responses you can compare your models by information criteria:
• AIC (Akaike's An Information Criterion)
-2*log-likelihood + 2*npar
• BIC (Schwarz's Bayesian criterion)
-2*log-likelihood + log(n)*npar
• Goal: as small AIC or BIC as possible, i.e. explain most with less parameters
2014.01.15 26
![Page 27: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/27.jpg)
27
The AIC value is less, chose this model
All p values are significant
Adjusted coefficient of
determination hardly
changed
![Page 28: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/28.jpg)
fit2.res <- resid(fit2) fit2.fitted <- fitted(fit2) pairs(fit2.fitted, fit2.res, col = "darkgreen", pch="*")
2014.01.15 28
![Page 29: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/29.jpg)
Logistic regression
• We use the logistic regression equation to predict the probability of a dependent variable that takes on only the values 0 and 1. Suppose x1, x2, ..., xp (p > 1) are the independent variables, α, βk (k = 1, 2, ..., p) are the parameters, and E(y) is the expected value of the dependent variable y
2014.01.15 29
k
kkx
e
yE
1
1)(
![Page 30: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/30.jpg)
DLBCL patient data
• Response: alive or dead
• Explanatory variables – Subgroup
– IPI.Group
– Germinal.center.B.cell.signature
– Lymph.node.signature
– Proliferation.signature
– BMP6
– MHC.class.II.signature
2014.01.15 30
![Page 31: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/31.jpg)
DLBCL.glm <- glm(Status.at.follow.up ~ Subgroup + IPI.Group + Germinal.center.B.cell.signature + Lymph.node.signature + Proliferation.signature + BMP6 + MHC.class.II.signature, family= "binomial", data = dat)
2014.01.15 31
Here
logistic
regression
is chosen
![Page 32: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/32.jpg)
DLBCL2.glm <- glm(Status.at.follow.up ~ IPI.Group, family= "binomial", data = dat) summary(DLBCL2.glm) High Low Medium missing NA's 32 82 108 1 17
2014.01.15 32
The survival of
those containing
the IPI.group Low
is significantly
different from
those that are in
the group High
The estimate
is negative,
hence pations
in group Low
have less
probability to
die then those
in group High
![Page 33: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/33.jpg)
Backward an forward inclusion of response variables
• Forward
– Start with all response variables, exclude the one that is least significant, compare AIC values.
• Backward
– Start with one regression for each response variable
– Include more and more response variables that are significant in the model, compare AIC values
2014.01.15 33
![Page 34: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/34.jpg)
Programming in R
2014.01.15 34
![Page 35: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/35.jpg)
Function
panel.hist <- function(x){
usrx <- par("usr")
on.exit(par(usrx))
par(usr = c(usrx[1:2], 0, 1.5) ) # indicates the position in the plot
hi <- hist(x, plot = FALSE) # calculate histogram without plotting it
Breaks <- hi$breaks # define breaks used in h
nB <- length(Breaks) # count the number of breaks
y <- hi$counts # find the counts in each interval
y <- y/max(y) # scale y for plotting
rect(Breaks[-nB], 0, Breaks[-1], y) #plots rectangulars in existing plot
}
pairs(dat[,7:11], col = dat[,5], cex = 0.5, pch = 24, diag.panel = panel.hist,
upper.panel=NULL)
![Page 36: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/36.jpg)
2014.01.15 36
![Page 37: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/37.jpg)
2014.01.15 37
![Page 38: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/38.jpg)
2014.01.15 38
![Page 39: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/39.jpg)
apply()
2014.01.15 39
• Use apply when you want the same thing done on every row or column of a matrix.
• apply(d, 1, functionA) will use functionA on every row of dataset d.
• apply(d, 2, functionA) will use functionA on every column of dataset d.
![Page 40: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/40.jpg)
Example: apply()
2014.01.15 40
d <- matrix(runif(90), ncol = 10, nrow = 9)
head(d)
par(mfrow = c(3,3))
apply(d, 1, hist)
![Page 41: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/41.jpg)
for – loop
• for (name in expr_1) expr_2
– name on variable: i
– expr_1 should be 1:antall
– expr_2 should be
sum(dbinom(1:6, 8, p[i])), but you have to save it somewhere:
beta8[i] <- sum(dbinom(1:6, 8, p[i])) then you have to create beta8 before the loop.
![Page 42: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/42.jpg)
for – loop
p <- seq(0.6, 1, 0.01)
antall <- length(p)
beta8 <- rep(NA, antall) #allocate space for beta8
for(i in 1:antall){
beta8[i] <- sum(dbinom(1:6, 8, p[i]))
}
power8 <- 1 - beta8
plot(p, power8, type = "l")
![Page 43: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/43.jpg)
2014.01.15 43
![Page 44: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/44.jpg)
if statment
• if (expr_1) expr_2 else expr_3 • expr_1 is a statement that is either true or
false • expr_2 is preformed if expr_1 is true • expr_3 is optional, but preformed if expr_1 is
false and else statement is included
![Page 45: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/45.jpg)
Example: if()
2014.01.15 45
x <- runif(1,0,1)
y <- rnorm(1,0,1)
if(x > y){
print(paste("x =", round(x,3), "is greater
than y =", round(y,3), sep = " "))
} else {
print(paste("x =", round(x,3), "is less than
y =", round(y,3), sep = " "))
}
[1] "x = 0.783 is greater than y 0.536"
![Page 46: Anja Bråthen Kristoffersen Biomedical Research Group · Example data • Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone](https://reader030.fdocument.pub/reader030/viewer/2022041216/5e05657fe83bb03a33641aea/html5/thumbnails/46.jpg)
When finished
• You could save your workspace in R, you could then save the workspace in different project folders. I do not do that!
• You could just save your script and run it once again if you want to work further on it.
• You could write your partial results (matrix) that you want to work further on to a .txt file, use write.table(objectname, ”path”, sep = ”\t”)
– objectname is the name of your object in R
– ”path” is your path to where you will save the object including the file name of the object.
– sep = ”\t” ensure that your txt file is tabulate separated and makes it easier to open in excel.