Data Analysis Using R: Multiple linear regression analysis

11
Data Analysis Using R: Multiple linear regression analysis Tuan V. Nguyen Garvan Institute of Medical Research, Sydney, Australia

description

Data Analysis Using R: Multiple linear regression analysis. Tuan V. Nguyen Garvan Institute of Medical Research, Sydney, Australia. Data. Effects of temperature (x1), time (x2) on CO2 production. - PowerPoint PPT Presentation

Transcript of Data Analysis Using R: Multiple linear regression analysis

Page 1: Data Analysis Using R: Multiple linear regression analysis

Data Analysis Using R:Multiple linear regression analysis

Tuan V. Nguyen

Garvan Institute of Medical Research,

Sydney, Australia

Page 2: Data Analysis Using R: Multiple linear regression analysis

Data• Effects of temperature (x1), time (x2) on CO2 production.

Id y X1 X2 X3 X4 X5 X6 X7

1 36.98 5.1 400 51.37 4.24 1484.83 2227.25 2.06

2 13.74 26.4 400 72.33 30.87 289.94 434.90 1.33

3 10.08 23.8 400 71.44 33.01 320.79 481.19 0.97

4 8.53 46.4 400 79.15 44.61 164.76 247.14 0.62

5 36.42 7.0 450 80.47 33.84 1097.26 1645.89 0.22

6 26.59 12.6 450 89.90 41.26 605.06 907.59 0.76

7 19.07 18.9 450 91.48 41.88 405.37 608.05 1.71

8 5.96 30.2 450 98.60 70.79 253.70 380.55 3.93

9 15.52 53.8 450 98.05 66.82 142.27 213.40 1.97

… … … … … … … … …

25 11.19 11.5 450 77.88 25.20 663.09 994.63 1.61

26 75.62 5.2 470 75.50 8.66 1464.11 2196.17 4.78

27 36.03 10.6 470 83.15 22.39 720.07 1080.11 5.88

Chú thích: y = sản lượng CO2; X1 = thời gian (phút); X2 = nhiệt độ (C); X3 = phần trăm hòa tan;

X4 = lượng dầu (g/100g); X5 = lượng than đá; X6 = tổng số lượng hòa tan; X7 = số hydrogen tiêu thụ.

Page 3: Data Analysis Using R: Multiple linear regression analysis

Read data into R

y <- c(36.98,13.74,10.08, 8.53,36.42,26.59,19.07, 5.96,15.52,56.61, 26.72,20.80, 6.99,45.93,43.09,15.79,21.60,35.19,26.14, 8.60, 11.63, 9.59, 4.42,38.89,11.19,75.62,36.03)x1 <- c(5.1,26.4,23.8,46.4, 7.0,12.6,18.9,30.2,53.8,5.6,15.1,20.3,48.4, 5.8,11.2,27.9,5.1,11.7,16.7,24.8,24.9,39.5,29.0, 5.5, 11.5, 5.2,10.6)x2 <- c(400,400, 400, 400, 450, 450, 450, 450, 450, 400, 400, 400, 400, 425, 425, 425, 450, 450, 450, 450, 450, 450, 450, 460, 450, 470, 470)x3 <- c(51.37,72.33,71.44,79.15,80.47,89.90,91.48,98.60,98.05,55.69, 66.29,58.94,74.74,63.71,67.14,77.65,67.22,81.48,83.88,89.38, 79.77,87.93,79.50,72.73,77.88,75.50,83.15)x4 <- c(4.24,30.87,33.01,44.61,33.84,41.26,41.88,70.79,66.82, 8.92,17.98,17.79,33.94,11.95,14.73,34.49,14.48,29.69,26.33, 37.98,25.66,22.36,31.52,17.86,25.20, 8.66,22.39)x5 <- c(1484.83, 289.94, 320.79, 164.76, 1097.26, 605.06, 405.37, 253.70, 142.27,1362.24, 507.65, 377.60, 158.05, 130.66, 682.59, 274.20, 1496.51, 652.43, 458.42, 312.25, 307.08, 193.61, 155.96,1392.08, 663.09,1464.11, 720.07)x6 <- c(2227.25, 434.90, 481.19, 247.14,1645.89, 907.59, 608.05, 380.55, 213.40,2043.36, 761.48, 566.40, 237.08,1961.49,1023.89, 411.30,2244.77, 978.64, 687.62, 468.38, 460.62, 290.42, 233.95,2088.12, 994.63,2196.17,1080.11)x7 <- c(2.06,1.33,0.97,0.62,0.22,0.76,1.71,3.93,1.97,5.08,0.60,0.90, 0.63,2.04,1.57,2.38,0.32,0.44,8.82,0.02,1.72,1.88,1.43, 1.35,1.61,4.78,5.88)REGdata <- data.frame(y, x1,x2,x3,x4,x5,x6,x7)

Page 4: Data Analysis Using R: Multiple linear regression analysis

Results of analysis

> reg <- lm(y ~ x1+x2+x3+x4+x5+x6+x7, data=REGdata)> summary(reg)

Residuals: Min 1Q Median 3Q Max -20.035 -4.681 -1.144 4.072 21.214 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 53.937016 57.428952 0.939 0.3594 x1 -0.127653 0.281498 -0.453 0.6553 x2 -0.229179 0.232643 -0.985 0.3370 x3 0.824853 0.765271 1.078 0.2946 x4 -0.438222 0.358551 -1.222 0.2366 x5 -0.001937 0.009654 -0.201 0.8431 x6 0.019886 0.008088 2.459 0.0237 *x7 1.993486 1.089701 1.829 0.0831 .---Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 10.61 on 19 degrees of freedomMultiple R-Squared: 0.728, Adjusted R-squared: 0.6278 F-statistic: 7.264 on 7 and 19 DF, p-value: 0.0002674

Page 5: Data Analysis Using R: Multiple linear regression analysis

Inter-correlations

y

10 30 50 50 70 90 200 1000 0 4 8

10

40

70

10

30

50

x1

x2

40

04

40

50

70

90

x3

x4

10

40

70

20

01

00

0

x5

x6

50

02

00

0

10 40 70

04

8

400 440 10 40 70 500 2000

x7

Page 6: Data Analysis Using R: Multiple linear regression analysis

Stepwise regression

reg <- lm(y ~ ., data=REGdata)

step(reg, direction=”both”)

Page 7: Data Analysis Using R: Multiple linear regression analysis

Stepwise regressionStart: AIC= 134.07 y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 Df Sum of Sq RSS AIC- x5 1 4.54 2145.37 132.13- x1 1 23.17 2164.00 132.36- x2 1 109.34 2250.18 133.42- x3 1 130.90 2271.74 133.68<none> 2140.83 134.07- x4 1 168.31 2309.14 134.12- x7 1 377.09 2517.92 136.45- x6 1 681.09 2821.92 139.53

Step 1: AIC= 132.13 y ~ x1 + x2 + x3 + x4 + x6 + x7 Df Sum of Sq RSS AIC- x1 1 22.7 2168.1 130.4- x2 1 113.8 2259.1 131.5- x3 1 133.5 2278.9 131.8<none> 2145.4 132.1- x4 1 170.8 2316.2 132.2+ x5 1 4.5 2140.8 134.1- x7 1 375.7 2521.1 134.5- x6 1 1058.5 3203.8 141.0

Step 2: AIC= 130.42 y ~ x2 + x3 + x4 + x6 + x7 Df Sum of Sq RSS AIC- x2 1 96.8 2264.9 129.6- x3 1 122.0 2290.0 129.9<none> 2168.1 130.4- x4 1 187.4 2355.5 130.7+ x1 1 22.7 2145.4 132.1+ x5 1 4.1 2164.0 132.4- x7 1 385.0 2553.1 132.8- x6 1 1526.2 3694.3 142.8

Step 3: AIC= 129.59 y ~ x3 + x4 + x6 + x7 Df Sum of Sq RSS AIC- x3 1 25.4 2290.3 127.9- x4 1 90.9 2355.8 128.7<none> 2264.9 129.6+ x2 1 96.8 2168.1 130.4+ x5 1 8.3 2256.5 131.5+ x1 1 5.7 2259.1 131.5- x7 1 384.9 2649.7 131.8- x6 1 2015.6 4280.5 144.8

Page 8: Data Analysis Using R: Multiple linear regression analysis

Stepwise regressionStep 4: AIC= 127.9 y ~ x4 + x6 + x7 Df Sum of Sq RSS AIC- x4 1 73.5 2363.8 126.7<none> 2290.3 127.9+ x3 1 25.4 2264.9 129.6+ x1 1 11.3 2279.0 129.8+ x5 1 6.3 2284.0 129.8+ x2 1 0.3 2290.0 129.9- x7 1 486.6 2776.9 131.1- x6 1 1993.8 4284.1 142.8

Step 5: AIC= 126.75 y ~ x6 + x7 Df Sum of Sq RSS AIC<none> 2363.8 126.7+ x4 1 73.5 2290.3 127.9+ x1 1 33.4 2330.4 128.4+ x3 1 8.1 2355.8 128.7+ x5 1 7.7 2356.1 128.7+ x2 1 7.3 2356.6 128.7- x7 1 497.3 2861.2 129.9- x6 1 4477.0 6840.8 153.4

Call:lm(formula = y ~ x6 + x7, data = REGdata)Coefficients:(Intercept) x6 x7 2.52646 0.01852 2.18575

Page 9: Data Analysis Using R: Multiple linear regression analysis

Bayesian Model Average (BMA) analysis

library(BMA) xvars <- REGdata[,-1]co2 <- REGdata[,1]bma <- bicreg(xvars, co2, strict=FALSE, OR=20)summary(bma)

Page 10: Data Analysis Using R: Multiple linear regression analysis

Bayesian Model Average (BMA) analysis

16 models were selected Best 5 models (cumulative posterior probability = 0.6599 ): p!=0 EV SD model 1 model 2 model 3 model 4 model 5 Intercept 100.0 5.75672 14.6244 2.5264 6.1441 8.6120 7.5936 7.3537x1 12.4 -0.01807 0.1008 . . . -0.1393 . x2 10.4 -0.00075 0.0282 . . . . . x3 10.7 0.00011 0.0791 . . . . -0.0572x4 20.2 -0.03059 0.1020 . . -0.1419 . . x5 10.5 -0.00023 0.0030 . . . . . x6 100.0 0.01815 0.0040 0.0185 0.0193 0.0164 0.0162 0.0179x7 73.7 1.60766 1.2821 2.1857 . 2.1628 2.1233 2.2382 nVar 2 1 3 3 3 r2 0.700 0.636 0.709 0.704 0.701 BIC -25.8832 -24.0238 -23.4412 -22.9721 -22.6801post prob 0.311 0.123 0.092 0.072 0.063

Page 11: Data Analysis Using R: Multiple linear regression analysis

Bayesian Model Average (BMA) analysis

imageplot.bma(bma)

Models selected by BMA

Model #

1 2 3 4 5 6 7 8 10 13

x7

x6

x5

x4

x3

x2

x1