20150329 tokyo r47
-
Uploaded by takashi-kitano
Transcript of 20150329 tokyo r47
-
Tokyo.R#47 2015-03-28 @kashitan
-
> summary(kashitan)
TwitterID : @kashitan
-
3/13
-
-
Tokyo.R#43 LT (@fqz7c3): 32-bit Windows, Random Forest
-
1.
2.
3.
4.
5.
-
1.
-
-
Wikipedia(1/2)
-
Wikipedia(2/2)
-
https://citizennet.com/blog/2012/11/10/random-forests-ensembles-and-performance-metrics/
-
Tokyo.R#21 LT @holidayworking
-
> library(randomForest)
> mdl <- randomForest(AGI ~ . - INSTWGHT, data = d.t)
> print(mdl)

Call:
 randomForest(formula = AGI ~ . - INSTWGHT, data = d.t)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 6

        OOB estimate of  error rate: 6.2%
Confusion matrix:
          - 50000.  50000+.  class.error
- 50000.    187117       23 0.0001229026
50000+.      12353       29 0.9976578905
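The OOB error of 6.2% looks good, but the class.error column shows that almost every "50000+." row is misclassified. A minimal base-R sketch that recomputes both figures from the confusion matrix printed above (the variable names `cm`, `class_error`, and `oob_error` are illustrative, not from the slides):

```r
# Confusion matrix copied from the print(mdl) output above
# (rows = actual class, columns = predicted class).
cm <- matrix(c(187117, 23,
               12353,  29),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("- 50000.", "50000+."),
                             predicted = c("- 50000.", "50000+.")))

# Per-class error: share of each actual class that was misclassified.
class_error <- 1 - diag(cm) / rowSums(cm)

# Overall error: all misclassified rows divided by all rows.
oob_error <- 1 - sum(diag(cm)) / sum(cm)

print(class_error)
print(round(100 * oob_error, 2))
```

The overall rate is dominated by the majority class, which is why it hides the near-total failure on the minority class.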
-
2.
-
Tokyo.R#20@sfchaos
-
Tokyo.R#20@sfchaos
-
R
-
R
randomForest(..., classwt = c(1, n))
-
> system.time(mdl.wt <- randomForest(AGI ~ . - INSTWGHT, data = d.t, classwt = c(1, 15)))
> print(mdl.wt)

Call:
 randomForest(formula = AGI ~ . - INSTWGHT, data = d.t, classwt = c(1, 15))
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 6

        OOB estimate of  error rate: 6.21%
Confusion matrix:
          - 50000.  50000+.  class.error
- 50000.    187140        0           0
50000+.      12382        0           1
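In the run above, classwt = c(1, 15) did not help: every row was predicted as "- 50000.". To experiment with classwt without the census data, here is a small sketch on synthetic imbalanced data. The data frame `toy`, its columns, and the 950/50 split are assumptions made up for illustration; the weights follow the order of the factor levels.

```r
library(randomForest)

set.seed(47)
n_major <- 950
n_minor <- 50

# Hypothetical two-class data: the minority class is shifted on both features.
toy <- data.frame(
  x1 = c(rnorm(n_major, 0), rnorm(n_minor, 1.5)),
  x2 = c(rnorm(n_major, 0), rnorm(n_minor, 1.5)),
  y  = factor(c(rep("major", n_major), rep("minor", n_minor)))
)

# Unweighted fit versus a fit with class weights.
# classwt is matched to levels(toy$y), i.e. c(major = 1, minor = 15).
fit_plain <- randomForest(y ~ ., data = toy, ntree = 200)
fit_wt    <- randomForest(y ~ ., data = toy, ntree = 200, classwt = c(1, 15))

# Compare the class.error column of both confusion matrices.
print(fit_plain$confusion)
print(fit_wt$confusion)
```

Whether the weights improve the minority-class error depends on the data, so compare the two matrices rather than assuming a gain.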
-
3.
-
http://d.hatena.ne.jp/shakezo/20121221/1356089207
-
R
-
ntree
mtry: tune with tuneRF()
-
tuneRF()
> system.time(mdl.tune
-
Number of variables per split (mtry): 6
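The tuneRF() call above is cut off in this transcript. As a stand-in, a minimal sketch of searching mtry with tuneRF() on the built-in iris data; the data set and all argument values here are assumptions for illustration, not the author's settings.

```r
library(randomForest)

set.seed(47)
x <- iris[, 1:4]
y <- iris$Species

# Start at mtry = 2, then try mtry scaled down and up by stepFactor
# while the OOB error keeps improving by at least `improve`.
res <- tuneRF(x, y,
              mtryStart  = 2,
              ntreeTry   = 100,
              stepFactor = 2,
              improve    = 0.01,
              trace      = FALSE,
              plot       = FALSE)

# tuneRF() returns a matrix with columns "mtry" and "OOBError".
best_mtry <- res[which.min(res[, "OOBError"]), "mtry"]
print(res)
print(best_mtry)
```

The chosen mtry can then be passed to randomForest(), as the slides do with mtry = 6.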
-
> plot(mdl)
Number of trees (ntree): 100
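plot(mdl) draws the error rate against the number of trees, and the slides read ntree off that curve. The same numbers are stored in the err.rate component of the fitted object. A sketch on iris, where the tolerance `tol` and the stopping rule are assumptions of this example rather than the author's method:

```r
library(randomForest)

set.seed(47)
fit <- randomForest(Species ~ ., data = iris, ntree = 500)

# OOB error after 1, 2, ..., 500 trees (the curve that plot(fit) draws).
oob <- fit$err.rate[, "OOB"]

# Heuristic: smallest number of trees after which the OOB error stays
# within `tol` of its final value.
tol <- 0.01
within_tol   <- abs(oob - oob[length(oob)]) <= tol
last_outside <- max(c(0, which(!within_tol)))
ntree_enough <- last_outside + 1

print(ntree_enough)
```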
-
> mdl.tuned <- randomForest(AGI ~ . - INSTWGHT, data = d.t, ntree = 100, mtry = 6)
> print(mdl.tuned)

Call:
 randomForest(formula = AGI ~ . - INSTWGHT, data = d.t, ntree = 100, mtry = 6)
               Type of random forest: classification
                     Number of trees: 100
No. of variables tried at each split: 6

        OOB estimate of  error rate: 6.18%
Confusion matrix:
          - 50000.  50000+.  class.error
- 50000.    187090       50 0.0002671797
50000+.      12271      111 0.9910353739
-
4.
-
importance(), varImpPlot()
partialPlot()
-
importance()
> importance(mdl.tuned)
        MeanDecreaseGini
AAGE         1705.651869
ACLSWKR       544.340658
ADTIND       1649.357768
ADTOCC       2332.457474
AHGA         1823.620156
AHRSPAY       228.468096
AHSCOL          8.161362
AMARITL       340.210957
AMJIND        915.882423
AMJOCC       1216.616396
ARACE         175.041013
AREORGN       148.378241
ASEX          575.004856
AUNMEM        230.627948
AUNTYPE        43.356676
-
varImpPlot()
> varImpPlot(mdl.tuned)
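varImpPlot() shows the variables sorted by importance. The same ranking can be computed directly. This base-R sketch sorts the MeanDecreaseGini values listed in the importance() output above (only the variables visible in this transcript; `gini` and `ranked` are illustrative names):

```r
# MeanDecreaseGini values copied from the importance(mdl.tuned) output above.
# With a fitted model, the same vector would come from
# importance(mdl.tuned)[, "MeanDecreaseGini"].
gini <- c(AAGE = 1705.651869, ACLSWKR = 544.340658, ADTIND = 1649.357768,
          ADTOCC = 2332.457474, AHGA = 1823.620156, AHRSPAY = 228.468096,
          AHSCOL = 8.161362, AMARITL = 340.210957, AMJIND = 915.882423,
          AMJOCC = 1216.616396, ARACE = 175.041013, AREORGN = 148.378241,
          ASEX = 575.004856, AUNMEM = 230.627948, AUNTYPE = 43.356676)

# Sort from most to least important and show the top five.
ranked <- sort(gini, decreasing = TRUE)
print(head(ranked, 5))
```

Among the listed variables, ADTOCC and AAGE rank near the top, and these are the two variables the slides go on to inspect with partialPlot().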
-
partialPlot()
> partialPlot(mdl.tuned, d.t, ADTOCC," 50000+.")
-
partialPlot()
> partialPlot(mdl.tuned, d.t, AAGE," 50000+.")
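partialPlot() takes the fitted model, the data, the variable, and the class of interest, in the same order as the calls above. A runnable sketch on iris (the data set and the chosen variable and class are assumptions for illustration), using plot = FALSE to get the curve as numbers instead of a figure:

```r
library(randomForest)

set.seed(47)
fit <- randomForest(Species ~ ., data = iris, ntree = 100)

# Partial dependence of the "versicolor" class on Petal.Width.
# The result is a list with the grid points (x) and the
# partial dependence values (y).
pd <- partialPlot(fit, iris, Petal.Width, "versicolor", plot = FALSE)

str(pd)
```

Higher y values mean the model leans more toward the chosen class at that value of the variable.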
-
5.
-
https://citizennet.com/blog/2012/11/10/random-forests-ensembles-and-performance-metrics/
R
-
!
3,888 !
P.144
-
> library(foreach)
> library(doMC)
> registerDoMC(4)
> system.time(
+ mdl.p
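The parallel call above is cut off in this transcript, so the author's exact code is not recoverable. A common pattern with foreach, sketched here as an assumption, is to grow several smaller forests in parallel and merge them with randomForest's combine(). The iris data, the worker count, and the 4 x 50 split of trees are illustrative choices; doMC relies on forking and is not available on Windows.

```r
library(randomForest)
library(foreach)
library(doMC)

# Register two forked workers (the slides use registerDoMC(4)).
registerDoMC(2)

# Grow four forests of 50 trees each in parallel,
# then merge them into one forest with combine().
fit_p <- foreach(ntree = rep(50, 4),
                 .combine  = randomForest::combine,
                 .packages = "randomForest") %dopar% {
  randomForest(Species ~ ., data = iris, ntree = ntree)
}

# Total number of trees in the merged forest.
print(fit_p$ntree)
```

A merged forest can be used with predict() as usual, but combine() drops the OOB components (err.rate and confusion), so the error has to be checked on held-out data.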
-
1.
2.
3.
4.
5.