Random Forests with Julia 0.3

JuliaTokyo #1: Random Forests with Julia 0.3

@gepuro

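About me

@gepuro

Habitat: University of Electro-Communications (UEC)

Major: reliability engineering

Method I use most: survival analysis

Languages: R, occasionally Python

Total time I have spent with Julia so far: less than 24 hours.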

DecisionTree.jl

● Developed by bensadeghi
○ He is also involved in developing, among others:
■ MineSweeperSolver.jl
■ METADATA.jl
■ pyplot.jl
● MIT license
● Decision trees are implemented with the ID3 algorithm.

A random forest implemented with CART is being developed by @bicycle1885.
https://github.com/bicycle1885/RandomForests.jl
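To make the ID3 versus CART distinction a little more concrete: ID3 is conventionally described as choosing splits by information gain (a drop in entropy), while CART is commonly associated with Gini impurity. The sketch below computes both measures for a label vector. It is written for current Julia 1.x rather than 0.3, all function names are made up for illustration, and it does not claim to reproduce the code of either package.

```julia
# Impurity measures, sketched for current Julia 1.x (not 0.3).
# Every function name here is illustrative and not part of DecisionTree.jl.

# Relative frequency of each distinct label.
function class_probabilities(labels::AbstractVector)
    counts = Dict{eltype(labels),Int}()
    for l in labels
        counts[l] = get(counts, l, 0) + 1
    end
    return [c / length(labels) for c in values(counts)]
end

# Shannon entropy in bits: the quantity behind information gain.
function entropy_bits(labels)
    h = 0.0
    for p in class_probabilities(labels)
        h -= p * log2(p)
    end
    return h
end

# Gini impurity: the measure commonly associated with CART.
function gini(labels)
    g = 1.0
    for p in class_probabilities(labels)
        g -= p^2
    end
    return g
end

# Information gain obtained by splitting `labels` with a boolean mask.
function information_gain(labels, mask::AbstractVector{Bool})
    left, right = labels[mask], labels[.!mask]
    n = length(labels)
    weighted = (length(left) / n) * entropy_bits(left) +
               (length(right) / n) * entropy_bits(right)
    return entropy_bits(labels) - weighted
end

toy = ["a", "a", "b", "b"]
entropy_bits(toy)                                    # two equally likely classes
gini(toy)
information_gain(toy, [true, true, false, false])    # a split that separates the classes
```

A split that separates the two classes perfectly removes all of the entropy, which is why it scores the highest possible gain for this toy vector.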

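「R言語による Random Forest 徹底入門－集団学習による分類・予測－」 (a thorough introduction to Random Forest in R: classification and prediction with ensemble learning), #TokyoR #11
http://www.slideshare.net/hamadakoichi/introduction-torandomforest-tokyor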

Adding the package

Pkg.add("DecisionTree")

Loading the packages

using DecisionTree
using RDatasets

RDatasets is loaded as well so that the iris data is available.

Preparing the data

iris = dataset("datasets", "iris")
features = array(iris[:, 1:4])
labels = array(iris[:, 5])

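Building the model and predicting

# Build the model; the last two arguments are the number of features used (nsubfeatures) and the number of trees (ntrees)
model = build_forest(labels, features, 2, 10)

# Predict
apply_forest(model, features)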
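As a rough picture of what a forest prediction involves: random forest classifiers conventionally let every tree predict and then take the most frequent label. The sketch below shows that voting step on made-up per-tree predictions. It targets current Julia 1.x, `majority_vote_sketch` is a hypothetical name, and whether DecisionTree.jl breaks ties the same way is an assumption not checked here.

```julia
# Majority vote over per-tree predictions, sketched for current Julia 1.x.
# `majority_vote_sketch` is an illustrative name, not the package's function.
function majority_vote_sketch(votes::AbstractVector)
    counts = Dict{eltype(votes),Int}()
    for v in votes
        counts[v] = get(counts, v, 0) + 1
    end
    best, best_count = first(votes), 0
    for v in votes                       # input order, so ties go to the earliest label
        if counts[v] > best_count
            best, best_count = v, counts[v]
        end
    end
    return best
end

# Each row holds the (made-up) predictions of three trees for one sample.
tree_predictions = ["setosa"    "setosa"     "versicolor";
                    "virginica" "versicolor" "virginica"]

forest_prediction = [majority_vote_sketch(tree_predictions[i, :])
                     for i in 1:size(tree_predictions, 1)]
```

Comparing such a prediction vector element by element with the true labels gives the training accuracy.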

Cross-validation

# The last argument is the number of folds
accuracy = nfoldCV_forest(labels, features, 2, 10, 3)

Output:

Mean Accuracy: 0.9466666666666667
3-element Array{Float64,1}:
 0.92
 0.94
 0.98
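The idea behind n-fold cross-validation can be sketched without the package: split the row indices into folds, hold one fold out for testing each time, and average the per-fold accuracies. The code below is written for current Julia 1.x, `fold_indices` is a hypothetical helper, and shuffling before splitting is an assumption rather than a statement about how `nfoldCV_forest` works internally.

```julia
using Random, Statistics

# Split the indices 1:n into `nfolds` shuffled folds of near-equal size.
# `fold_indices` is an illustrative helper, not part of DecisionTree.jl.
function fold_indices(n::Integer, nfolds::Integer)
    shuffled = randperm(n)
    return [shuffled[k:nfolds:n] for k in 1:nfolds]
end

nrows = 150                                  # the iris data has 150 rows
folds = fold_indices(nrows, 3)

for (k, test_idx) in enumerate(folds)
    train_idx = setdiff(1:nrows, test_idx)
    # A model would be built on train_idx and scored on test_idx here.
    println("fold ", k, ": ", length(train_idx), " train / ", length(test_idx), " test")
end

# The reported mean accuracy is the mean of the per-fold accuracies.
mean_accuracy = mean([0.92, 0.94, 0.98])
```

With three folds of 50 rows each, the per-fold accuracies 0.92, 0.94 and 0.98 correspond to 46, 47 and 49 correct predictions.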

Peeking at the source code (classification)

function build_forest(labels::Vector, features::Matrix, nsubfeatures::Integer, ntrees::Integer, partialsampling=0.7)
    partialsampling = partialsampling > 1.0 ? 1.0 : partialsampling
    Nlabels = length(labels)
    Nsamples = int(partialsampling * Nlabels)
    forest = @parallel (vcat) for i in [1:ntrees]
        inds = rand(1:Nlabels, Nsamples)
        build_tree(labels[inds], features[inds,:], nsubfeatures)
    end
    return Ensemble([forest])
end

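● Perhaps for compatibility with version 0.2, the data is held as Vector and Matrix.
● How do these differ from array?
● There are three parameters: nsubfeatures, ntrees, and partialsampling.
● partialsampling has a default value, so it works without being specified.
● It appears to support parallel processing.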
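The sampling step in the function above can be tried in isolation. The sketch below is written for current Julia 1.x, where `int(x)` from 0.3 is spelled `round(Int, x)`. `bootstrap_indices` is a hypothetical name. The first two lines also touch on the Vector and Matrix question: in Julia they are aliases for one- and two-dimensional `Array` types.

```julia
# Vector and Matrix are aliases for 1- and 2-dimensional Array types.
Vector{Float64} === Array{Float64,1}     # true
Matrix{Float64} === Array{Float64,2}     # true

# Sampling step of build_forest, sketched for current Julia 1.x.
# `bootstrap_indices` is an illustrative name, not part of DecisionTree.jl.
function bootstrap_indices(nlabels::Integer, partialsampling::Real = 0.7)
    partialsampling = min(partialsampling, 1.0)      # same clamp as in the source
    nsamples = round(Int, partialsampling * nlabels)
    return rand(1:nlabels, nsamples)                 # drawn with replacement
end

inds = bootstrap_indices(150)            # 150 rows, as in the iris data
length(inds)                             # 105 indices with the default of 0.7
length(unique(inds))                     # usually smaller, since duplicates are allowed
```

Because the indices are drawn with replacement, each tree sees a different, partly repeated subset of the rows, which is what makes the trees in the forest differ from one another.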

Peeking at the source code (regression)

function build_forest{T<:FloatingPoint,U<:Real}(labels::Vector{T}, features::Matrix{U}, nsubfeatures::Integer, ntrees::Integer, maxlabels=0.5, partialsampling=0.7)
    partialsampling = partialsampling > 1.0 ? 1.0 : partialsampling
    Nlabels = length(labels)
    Nsamples = int(partialsampling * Nlabels)
    forest = @parallel (vcat) for i in [1:ntrees]
        inds = rand(1:Nlabels, Nsamples)
        build_tree(labels[inds], features[inds,:], maxlabels, nsubfeatures)
    end
    return Ensemble([forest])
end

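● Is this something like what Java calls overriding?
● Are the types specified right after the function name is declared?
● A maxlabels parameter has been added. It specifies the average number of samples per leaf.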
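The two `build_forest` definitions are two methods of one function, and Julia picks between them from the argument types (multiple dispatch). This is closer to what Java calls overloading than to overriding, except that the choice is made at run time from the types of all arguments. The sketch below shows the same pattern on a toy function. It uses current Julia 1.x syntax, in which the 0.3 form `f{T<:FloatingPoint}(...)` is written `f(...) where {T<:AbstractFloat}`. `describe_task` is a hypothetical name.

```julia
# Two methods of one function, selected by argument type (multiple dispatch).
# `describe_task` is an illustrative name, not part of DecisionTree.jl.

# Generic method: any label vector is treated as classification.
describe_task(labels::Vector) = "classification"

# More specific method: floating-point labels are treated as regression.
# The type parameter T is declared after the argument list in Julia 1.x.
describe_task(labels::Vector{T}) where {T<:AbstractFloat} = "regression"

describe_task(["setosa", "virginica"])   # uses the generic method
describe_task([1.5, 2.0])                # uses the floating-point method
describe_task([1, 2])                    # integer labels fall back to the generic method
```

When both methods match, the more specific signature wins, which is how floating-point labels end up in the regression method.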

References

● bensadeghi/DecisionTree.jl
  https://github.com/bensadeghi/DecisionTree.jl
● 「R言語による Random Forest 徹底入門－集団学習による分類・予測－」, #TokyoR #11
  http://www.slideshare.net/hamadakoichi/introduction-torandomforest-tokyor
