20150329 tokyo r47

A Random Forest Roundup for Reaching Level 2  Tokyo.R#47 20150328 @kashitan


  • > summary(kashitan)

    TwitterID : @kashitan

  • 3/13

  • The 43rd R study meeting @ Tokyo

    Tokyo.R#43 LT: Random Forest on 32-bit Windows (@fqz7c3)

  • 1. Random forest basics

    2. Imbalanced data

    3. Parameter tuning

    4. Variable importance

    5. Parallel processing

  • 1. Random forest basics

  • Wikipedia (1/2)

  • Wikipedia (2/2)

  • https://citizennet.com/blog/2012/11/10/random-forests-ensembles-and-performance-metrics/
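A minimal base-R sketch of the bagging step, added for illustration (no data from the slides): each tree in a random forest is grown on a bootstrap sample of the rows drawn with replacement, and each split considers only mtry randomly chosen variables. Because sampling is with replacement, roughly (1 - 1/n)^n ≈ 1/e ≈ 36.8% of the rows are left out of any one tree; randomForest uses these out-of-bag (OOB) rows for the "OOB estimate of error rate" it prints. The row count n = 10000 and the seed are arbitrary choices.

```r
# Minimal sketch (base R): fraction of rows left out of one bootstrap sample
set.seed(42)
n <- 10000                                     # arbitrary number of rows
boot.idx <- sample.int(n, n, replace = TRUE)   # draw n row indices with replacement
oob <- setdiff(seq_len(n), boot.idx)           # rows never drawn = out-of-bag (OOB)
oob.frac <- length(oob) / n

print(oob.frac)   # empirical OOB fraction for this sample
print(exp(-1))    # limit of (1 - 1/n)^n, about 0.368
```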

  • Tokyo.R#21 LT @holidayworking

  • > library(randomForest)
    > mdl <- randomForest(AGI ~ . - INSTWGHT, data = d.t)
    > print(mdl)
    Call:
     randomForest(formula = AGI ~ . - INSTWGHT, data = d.t)
                   Type of random forest: classification
                         Number of trees: 500
    No. of variables tried at each split: 6
            OOB estimate of  error rate: 6.2%
    Confusion matrix:
              - 50000. 50000+.  class.error
     - 50000.   187117      23 0.0001229026
     50000+.     12353      29 0.9976578905
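The class.error column and the overall OOB error rate can be recomputed from the confusion matrix. A small base-R check using only the numbers printed above: class.error is the misclassified share of each actual class (row), and the OOB error rate is the misclassified share of all rows.

```r
# Confusion matrix copied from the print(mdl) output
# (rows = actual class, columns = predicted class)
cm <- matrix(c(187117,    23,
                12353,    29),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c(" - 50000.", " 50000+."),
                             predicted = c(" - 50000.", " 50000+.")))

# class.error: share of each actual class that was misclassified
class.error <- 1 - diag(cm) / rowSums(cm)
print(class.error)

# overall OOB error rate: off-diagonal cells / all cells
oob.error <- 1 - sum(diag(cm)) / sum(cm)
print(round(100 * oob.error, 1))   # in percent
```

With only 12,382 of 199,522 rows in the " 50000+." class, a 6.2% overall error is compatible with misclassifying almost every " 50000+." case.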

  • 2. Imbalanced data

  • Tokyo.R#20 @sfchaos

  • Handling it in R

  • Give the minority class a larger weight through the classwt argument:

    randomForest(..., classwt = c(1, n))
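The slides leave the choice of n open. One common heuristic, assumed here rather than taken from the talk, is the majority-to-minority ratio. Using the class totals from the confusion matrices in this section (187,140 rows of " - 50000." and 12,382 of " 50000+."), that ratio rounds to 15, the value passed as classwt = c(1, 15) in the next call. The weights follow the order of the response factor's levels.

```r
# Class counts: row totals of the confusion matrices in these slides
n.major <- 187140   # " - 50000."
n.minor <- 12382    # " 50000+."

# Hypothetical rule of thumb: weight the minority class by the imbalance ratio
ratio <- n.major / n.minor
wt <- c(1, round(ratio))
names(wt) <- c(" - 50000.", " 50000+.")   # same order as the factor levels

print(ratio)
print(wt)
# wt would then be passed as randomForest(..., classwt = wt)
```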

  • > system.time(mdl.wt <- randomForest(AGI ~ . - INSTWGHT, data = d.t, classwt = c(1, 15)))
    > print(mdl.wt)
    Call:
     randomForest(formula = AGI ~ . - INSTWGHT, data = d.t, classwt = c(1, 15))
                   Type of random forest: classification
                         Number of trees: 500
    No. of variables tried at each split: 6
            OOB estimate of  error rate: 6.21%
    Confusion matrix:
              - 50000. 50000+. class.error
     - 50000.   187140       0           0
     50000+.     12382       0           1

  • 3. Parameter tuning

  • http://d.hatena.ne.jp/shakezo/20121221/1356089207

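  • Main parameters in R

  • ntree: number of trees to grow

    mtry: number of variables tried at each split; a good value can be searched for with tuneRF()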

  • tuneRF()

    > system.time(mdl.tune <- tuneRF(…

  • Chosen number of variables per split (mtry): 6
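The tuneRF() call above is cut off in the transcript, so its arguments are unknown. As a sketch of the same search on the built-in iris data (a stand-in, since d.t is not available): tuneRF() starts at mtryStart, divides or multiplies mtry by stepFactor, and continues in each direction while the relative improvement in OOB error is at least improve. The argument values here are illustrative choices; for classification, randomForest's default mtry is floor(sqrt(p)) for p predictors.

```r
library(randomForest)

set.seed(1)
x <- iris[, 1:4]      # predictors (stand-in data)
y <- iris$Species     # factor response -> classification

# Start from the classification default floor(sqrt(p)) and search around it
res <- tuneRF(x, y,
              mtryStart  = floor(sqrt(ncol(x))),
              ntreeTry   = 100,
              stepFactor = 2,
              improve    = 0.01,
              trace      = FALSE,
              plot       = FALSE)
print(res)   # one row per mtry tried: columns mtry and OOBError

# mtry with the lowest OOB error among those tried
best.mtry <- res[which.min(res[, "OOBError"]), "mtry"]
print(best.mtry)
```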

  • > plot(mdl)

    Number of trees (ntree): 100
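plot() on a randomForest object draws the OOB error (and the per-class errors) against the number of trees. The same curves are stored in the err.rate component, so the choice of ntree can also be checked numerically. A sketch on iris as stand-in data; the 0.005 tolerance is a hypothetical rule of thumb, not something from the slides.

```r
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 300)

# err.rate: one row per number of trees (1..ntree); column "OOB" is the
# overall OOB error, the other columns are per-class errors
oob <- rf$err.rate[, "OOB"]

# Hypothetical stopping rule: first tree count after which the OOB error
# never moves more than 0.005 away from its final value
final <- oob[length(oob)]
unsettled <- which(abs(oob - final) > 0.005)
n.enough <- if (length(unsettled) == 0) 1 else max(unsettled) + 1
print(n.enough)
```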

  • > mdl.tuned <- randomForest(AGI ~ . - INSTWGHT, data = d.t, ntree = 100, mtry = 6)
    > print(mdl.tuned)
    Call:
     randomForest(formula = AGI ~ . - INSTWGHT, data = d.t, ntree = 100, mtry = 6)
                   Type of random forest: classification
                         Number of trees: 100
    No. of variables tried at each split: 6
            OOB estimate of  error rate: 6.18%
    Confusion matrix:
              - 50000. 50000+.  class.error
     - 50000.   187090      50 0.0002671797
     50000+.     12271     111 0.9910353739
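Lining up the three printed outputs (arithmetic on the numbers above only) shows what the overall OOB error hides: all three rates sit near the error of a trivial model that always answers " - 50000.", while the share of " 50000+." rows predicted correctly stays below 1%.

```r
# Numbers copied from the three print() outputs in these slides
models <- data.frame(
  model   = c("mdl", "mdl.wt", "mdl.tuned"),
  oob.pct = c(6.2, 6.21, 6.18),     # "OOB estimate of error rate" (%)
  hit     = c(29, 0, 111),           # actual " 50000+." predicted " 50000+."
  missed  = c(12353, 12382, 12271)   # actual " 50000+." predicted " - 50000."
)
# Recall of the minority class = 1 - its class.error
models$recall <- models$hit / (models$hit + models$missed)

# Error rate (%) of a trivial model that always predicts " - 50000."
n.total  <- 187140 + 12382
baseline <- 100 * 12382 / n.total

print(models)
print(round(baseline, 2))
```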

  • 4. Variable importance

  • importance(), varImpPlot()

    partialPlot()

  • importance()

    > importance(mdl.tuned)
            MeanDecreaseGini
    AAGE         1705.651869
    ACLSWKR       544.340658
    ADTIND       1649.357768
    ADTOCC       2332.457474
    AHGA         1823.620156
    AHRSPAY       228.468096
    AHSCOL          8.161362
    AMARITL       340.210957
    AMJIND        915.882423
    AMJOCC       1216.616396
    ARACE         175.041013
    AREORGN       148.378241
    ASEX          575.004856
    AUNMEM        230.627948
    AUNTYPE        43.356676
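Sorting the MeanDecreaseGini values listed above (only the 15 variables visible in the transcript) gives the ordering that varImpPlot() draws. A base-R sketch:

```r
# MeanDecreaseGini values copied from importance(mdl.tuned)
imp <- c(AAGE = 1705.651869, ACLSWKR = 544.340658, ADTIND = 1649.357768,
         ADTOCC = 2332.457474, AHGA = 1823.620156, AHRSPAY = 228.468096,
         AHSCOL = 8.161362, AMARITL = 340.210957, AMJIND = 915.882423,
         AMJOCC = 1216.616396, ARACE = 175.041013, AREORGN = 148.378241,
         ASEX = 575.004856, AUNMEM = 230.627948, AUNTYPE = 43.356676)

# Rank from most to least important
ranked <- sort(imp, decreasing = TRUE)
print(head(ranked, 5))

# Share (%) of the listed total, to compare variables on one scale
print(round(100 * ranked / sum(ranked), 1))
```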

  • varImpPlot()

    > varImpPlot(mdl.tuned)
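By default randomForest only records MeanDecreaseGini. Fitting with importance = TRUE also computes the permutation-based MeanDecreaseAccuracy, and varImpPlot() then shows both measures. A sketch on iris as a stand-in for d.t:

```r
library(randomForest)

set.seed(1)
# importance = TRUE adds the permutation-based MeanDecreaseAccuracy
rf <- randomForest(Species ~ ., data = iris, ntree = 200, importance = TRUE)

# Per-class columns, then MeanDecreaseAccuracy and MeanDecreaseGini
imp <- importance(rf)
print(round(imp, 2))

# Both measures as dot charts
varImpPlot(rf)
```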

  • partialPlot()

    > partialPlot(mdl.tuned, d.t, ADTOCC," 50000+.")

  • partialPlot()

    > partialPlot(mdl.tuned, d.t, AAGE," 50000+.")
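partialPlot() can also return the curve instead of drawing it (plot = FALSE), which makes it easy to compare variables or reuse the values. A sketch on iris; Petal.Width and "versicolor" stand in for ADTOCC/AAGE and " 50000+.". According to the randomForest documentation, for classification the y values are log(p_k) minus the mean of log(p_j) over all K classes, for the chosen class k.

```r
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 200)

# Partial dependence of class "versicolor" on Petal.Width;
# x.var may be given as a string, plot = FALSE returns the curve
pd <- partialPlot(rf, pred.data = iris, x.var = "Petal.Width",
                  which.class = "versicolor", plot = FALSE)

# pd$x: grid of Petal.Width values; pd$y: class score at each grid point
print(head(data.frame(x = pd$x, y = pd$y)))
```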

  • 5. Parallel processing

  • https://citizennet.com/blog/2012/11/10/random-forests-ensembles-and-performance-metrics/

    In R

  • 3,888

    P.144

  • > library(foreach)
    > library(doMC)
    > registerDoMC(4)
    > system.time(
    + mdl.p
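The foreach/doMC code above is cut off after mdl.p. A common pattern is to grow several smaller forests in parallel and merge them with randomForest::combine(). The sketch below uses the base parallel package instead of foreach/doMC (a substitution; doMC relies on forking, which is not available on Windows), with iris as stand-in data and 4 × 50 trees as arbitrary sizes.

```r
library(randomForest)
library(parallel)

# Hypothetical setup: 2 worker processes (PSOCK clusters also run on Windows)
cl <- makeCluster(2)
invisible(clusterEvalQ(cl, library(randomForest)))  # load the package on each worker
clusterSetRNGStream(cl, 123)                         # independent, reproducible RNG streams

# Grow 4 sub-forests of 50 trees each on the workers; data passed explicitly
forests <- parLapply(cl, rep(50, 4), function(n, d) {
  randomForest(Species ~ ., data = d, ntree = n)
}, d = iris)
stopCluster(cl)

# Merge the sub-forests into one 200-tree forest
rf.all <- do.call(randomForest::combine, forests)
print(rf.all$ntree)
print(table(predicted = predict(rf.all, iris), actual = iris$Species))
```

combine() sums the trees but leaves the OOB-based components (err.rate, confusion) as NULL, so error estimates have to be computed separately for a merged forest.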

  • Summary

    1. Random forest basics

    2. Imbalanced data

    3. Parameter tuning

    4. Variable importance

    5. Parallel processing