Exploring a fast, memory-efficient way to dump data in libsvm format

Post on 06-Jan-2017


libsvm

2016/11/27 hskksk @ JapanR 2016

• R, Python, C++

xgboost kaggler

Bosch Production Line Performance 15

xgboost

xgb.DMatrix

library(xgboost)

# Load the label and the features
label = readRDS("label.rds")
feature_set_A = readRDS("feature_set_A.rds")
feature_set_B = readRDS("feature_set_B.rds")

# cbind the features into one matrix
mat = cbind(feature_set_A, feature_set_B)

# Build the DMatrix
dmat = xgb.DMatrix(mat, label=label)

cbind

※ After cbind: rm(vars); gc()
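As a rough illustration of why cbind is costly in memory (a sketch with made-up random data, not a measurement from the talk): cbind allocates a new matrix that holds copies of both inputs, so the inputs and the combined matrix coexist until the inputs are removed. The objects `a` and `b` are hypothetical stand-ins for the feature sets.

```r
# Two hypothetical feature blocks standing in for feature_set_A / feature_set_B
a <- matrix(runif(1e5), ncol = 100)
b <- matrix(runif(1e5), ncol = 100)

# cbind returns a new matrix; a and b are copied, not shared
mat <- cbind(a, b)

size_inputs <- as.numeric(object.size(a)) + as.numeric(object.size(b))
size_mat <- as.numeric(object.size(mat))
print(c(inputs = size_inputs, combined = size_mat))

# Memory held by the inputs can only be released after cbind has finished
rm(a, b)
invisible(gc())
```

The combined matrix is about as large as the two inputs together, so peak usage during cbind is roughly twice the size of the feature data.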

xgb.DMatrix

Python

libsvm ※R

1. Instead of cbind, dump the features in libsvm format
2. Load the libsvm file as a DMatrix

cbind libsvm

data.table::fwrite_libsvm(list_of_matrices, file)

Implemented in a fork of data.table, building on fwrite
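To make the idea concrete, here is a minimal base-R sketch of writing several matrices to one libsvm file without cbind-ing them first. It is not the fork's fwrite_libsvm: `write_libsvm_sketch` is a hypothetical helper, it is single-threaded and slow, and it assumes numeric matrices, 1-based feature indices, and that zero and NA entries are simply omitted. Each output line has the form `label index:value index:value ...`, and each matrix's columns are shifted by the number of columns that precede it.

```r
# Hypothetical helper, NOT the fork's fwrite_libsvm: writes a label vector and
# a list of numeric matrices as one libsvm file without cbind-ing them.
write_libsvm_sketch <- function(label, matrices, file) {
  stopifnot(all(vapply(matrices, nrow, integer(1)) == length(label)))
  # Column offset of each matrix in the combined feature index space
  offsets <- cumsum(c(0L, vapply(matrices, ncol, integer(1))))
  con <- file(file, open = "w")
  on.exit(close(con))
  for (i in seq_along(label)) {
    fields <- character(0)
    for (k in seq_along(matrices)) {
      row <- matrices[[k]][i, ]
      # Assumption: zero and NA entries are left out of the file
      nz <- which(!is.na(row) & row != 0)
      if (length(nz) > 0) {
        # Assumption: feature indices start at 1
        fields <- c(fields, paste0(offsets[k] + nz, ":", row[nz]))
      }
    }
    writeLines(paste(c(label[i], fields), collapse = " "), con)
  }
  invisible(file)
}

# Usage with tiny made-up data
label <- c(1, 0)
A <- matrix(c(1, 0, 0, 2), nrow = 2)  # becomes features 1-2
B <- matrix(c(0, 3), nrow = 2)        # becomes feature 3
out <- tempfile(fileext = ".txt")
write_libsvm_sketch(label, list(A, B), out)
lines <- readLines(out)
print(lines)
# "1 1:1"  "0 2:2 3:3"
```

Only one row is held as text at a time, so the combined matrix never has to exist in memory; the price in this sketch is an R-level loop over rows.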

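library(data.table)  # the fork from https://github.com/hskksk/data.table, which provides fwrite_libsvm
library(xgboost)

# Load the label and the features
label = readRDS("label.rds")
feature_set_A = fread("feature_set_A.csv")
feature_set_B = fread("feature_set_B.csv")

# Put the features in a list
# The first element is the label
matrices = list(label, feature_set_A, feature_set_B)

# Dump in libsvm format
fwrite_libsvm(matrices, "libsvm.txt")

# Load as a DMatrix
dmat = xgb.DMatrix("libsvm.txt")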

fwrite is parallelized with OpenMP

8.5 GB / 120 sec @ Xeon 2.5 GHz × 8

data.table PR

https://github.com/hskksk/data.table

kaggler !!

Enjoy Kaggling with R !!