[PyCon 2015] Experimenting with Deep Learning Right Now (submission version)

2015. 08. 30.

Kim Hyun-ho (김현호)

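Introduction

Kim Hyun-ho (김현호)
- UST, Computer Science major
- ETRI (Electronics and Telecommunications Research Institute), Automatic Interpretation Research Lab
- Team Popong, in charge of mobile
- Artificial intelligence, machine learning, natural language processing
- [email protected]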

Outline

1. Understanding Neural Networks
2. Deep Neural Network
   a. Pretraining
   b. Rectified Linear Unit
   c. Dropout
3. Theano library
4. Deep Learning code using Theano
5. Deep Learning for Natural Language Processing
   a. Gensim library
   b. Automatic word spacing by Recurrent Neural Network


Recent interest in deep learning

Numerous talks on deep learning

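Artificial Neural Network

A machine learning system inspired by the way neurons, the building blocks of the human nervous system, operate.

Biological neuron vs. artificial neuron

[Figure: direction of signal propagation through the network; each connection carries a Weight]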
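As a minimal sketch of the artificial neuron described above: it multiplies each input by a weight, sums the results, and passes the sum through an activation function. The sigmoid activation, the `neuron` helper, and the input values are assumptions chosen for illustration; they are not taken from the slides.

```python
import math

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical inputs and weights, chosen so the weighted sum is exactly 0.
output = neuron([1.0, 0.5], [0.4, -0.8], 0.0)
```

With a weighted sum of 0 the sigmoid sits at its midpoint, so `output` is 0.5. Learning consists of adjusting the weights so that the output moves toward the desired value.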

Artificial Neural Network Learning

[Figure: Weight, Weight, Weight between successive layers]

Forward Propagation

Backward Propagation

Deep Neural Network

What is a Deep Neural Network?

An Artificial Neural Network with three or more hidden layers

Difficulties of earlier deep learning

Networks deeper than two or three levels yielded poorer results.

Why deep learning is hard

- Overfitting: deep nets have lots of parameters
- Underfitting: the gradient vanishes during gradient descent
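The vanishing-gradient problem listed above can be illustrated with a rough sketch. The derivative of the sigmoid never exceeds 0.25, and backpropagation multiplies one such factor per layer. The helper names are hypothetical, and all weights are assumed to be 1 to keep the arithmetic visible.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def chained_gradient(n_layers, z=0.0):
    """Product of n_layers sigmoid derivatives (all weights assumed to be 1)."""
    grad = 1.0
    for _ in range(n_layers):
        grad *= sigmoid_grad(z)
    return grad

shallow = chained_gradient(2)   # 0.25 ** 2
deep = chained_gradient(10)     # 0.25 ** 10, below one millionth
```

Even in this best case for the sigmoid, the gradient that reaches the first layer of a ten-layer stack is several orders of magnitude smaller than in a two-layer stack, so the early layers barely learn.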

Breakthroughs in deep learning

- Pretraining
- Dropout
- Rectified Linear Unit

Pretraining performance

“Why Does Unsupervised Pre-training Help Deep Learning?” (Erhan, Bengio, et al., 2010)

- Initialization by pretraining starts from a better local minimum than random initialization.

Pretraining performance

[Figure: Without Pretraining vs. With Pretraining]

Pretraining methods

1) Contrastive Divergence
   a) http://www.quora.com/What-is-contrastive-divergence
   b) https://www.youtube.com/watch?v=p4Vh_zMw-HQ&index=36&list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH
2) AutoEncoder

Dropout

Rectified Linear Unit (ReLU)

Activation function

[Figure: Sigmoid function vs. Rectified Linear Unit]
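The two activation functions compared above can be written in a few lines. This is a plain-Python sketch for scalar inputs, not the Theano code used in the experiment; the sample values are arbitrary.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: output always lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified Linear Unit: pass positive values, clamp negatives to zero."""
    return max(z, 0.0)

def relu_grad(z):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

samples = [-2.0, 0.0, 3.0]
relu_out = [relu(z) for z in samples]        # [0.0, 0.0, 3.0]
sigmoid_out = [sigmoid(z) for z in samples]  # every value stays inside (0, 1)
```

Because the slope of ReLU is exactly 1 for positive inputs, the gradient is not shrunk as it passes back through active units, which is one commonly cited reason it trains faster than the sigmoid.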

Rectified Linear Unit (ReLU) experiment results

Experimental conditions
- code: https://github.com/Newmu/Theano-Tutorials
- data: MNIST

Epoch   sigmoid   ReLU
1       0.7053    0.9433
2       0.8302    0.9647
3       0.8684    0.9723
3       0.8837    0.9737
4       0.89      0.9763
5       0.895     0.9792
...     ...       ...
11      0.9116    0.9829
12      0.9127    0.9838
13      0.9142    0.9821
14      0.9152    0.9838
15      0.9159    0.9832

Data Sets

- MNIST: the MNIST database of handwritten digits
  - 28x28 grayscale images
  - 10 classes
- Cifar-10: the CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class.
- word2vec

Deep learning experiments

Starting the deep learning experiments

Origin of the name Theano
- A female mathematician
- Wife of Pythagoras

Comparison of deep learning libraries

Source: http://t-robotics.blogspot.kr/2015/06/hw-sw.html#.Vd59KPntlBe

Theano

- Q) Does it build a DNN for you automatically?
- A) No, you have to build the Deep Neural Network yourself.

Theano

- A DNN model learning library (x)
- A library useful for matrix operations and the like (o)

Why Theano

- Definition: Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. (http://deeplearning.net/software/theano/)

- Optimizing GPU-meta-programming code generating array oriented optimizing math compiler in Python (https://github.com/josephmisiti/awesome-machine-learning)

Why Theano

- Run GPU computations from Python code, without writing CUDA code
- grad(), updates, function()
- symbolic function

Why Theano - grad(), updates, function()

Calling gradients = T.grad(...) computes the gradient for you.

ex)

x = T.scalar()
gx = T.grad(x**2, x)  # gradient of x**2 with respect to x (= 2x)
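The claim that the gradient of x**2 with respect to x is 2x can be checked numerically without Theano. The sketch below uses a central-difference estimate; `numerical_grad` is a hypothetical helper and only approximates what a symbolic gradient gives exactly.

```python
def f(x):
    return x ** 2

def numerical_grad(func, x, eps=1e-6):
    """Central-difference estimate of d(func)/dx at x."""
    return (func(x + eps) - func(x - eps)) / (2.0 * eps)

x0 = 3.0
estimate = numerical_grad(f, x0)  # should be close to 2 * x0
```

A check of this kind is a common way to validate a hand-written gradient. With T.grad the gradient expression is derived symbolically, so no step size has to be chosen.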


updates = [
    (param, param - learning_rate * gparam)
    for param, gparam in zip(classifier.params, gparams)
]

……..

train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
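Each pair in the updates list above says "replace this parameter with param - learning_rate * gradient". The same rule can be sketched with plain Python floats. The function `sgd_updates` and the toy objective f(w) = w ** 2 are assumptions for illustration only.

```python
def sgd_updates(params, grads, learning_rate):
    """Return the new value of each parameter: param - learning_rate * grad."""
    return [p - learning_rate * g for p, g in zip(params, grads)]

# Minimise f(w) = w ** 2, whose gradient is 2 * w, starting from w = 1.0.
w = [1.0]
for _ in range(50):
    grads = [2.0 * w[0]]
    w = sgd_updates(w, grads, learning_rate=0.1)
```

Every step multiplies w by 0.8, so after 50 steps it is very close to the minimum at 0. In Theano the same arithmetic is applied to shared variables each time the compiled function is called.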


This module provides function(), commonly accessed as theano.function, the interface for compiling graphs into callable objects.

You’ve already seen example usage in the basic tutorial... something like this:

>>> x = theano.tensor.dscalar()
>>> f = theano.function([x], 2*x)
>>> print f(4) # prints 8.0

http://deeplearning.net/software/theano/library/compile/function.html#module-function



Graphs are compiled into callable objects: input → callable object → output

import theano
from theano.tensor import dmatrix

x = dmatrix('x')
y = dmatrix('y')
z = x + y
f = theano.function([x, y], z)

Theano represents symbolic mathematical computations as graphs.

[Figure: graph with two input nodes labeled scalar and one output node labeled scalar]

x = theano.tensor.dscalar('x')
y = theano.tensor.dscalar('y')
z = x + y
f = theano.function([x, y], z)
print f(4, 3)  # array(7.0)
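To make the idea of "build a graph first, evaluate it later" concrete, here is a toy sketch in plain Python. The classes `Var` and `Add` and the helper `function` are hypothetical and only loosely mimic the behaviour shown above; Theano's real implementation also optimizes the graph and generates compiled code.

```python
class Var(object):
    """A named placeholder; it has no value until the graph is evaluated."""
    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        return Add(self, other)

    def evaluate(self, env):
        return env[self.name]


class Add(object):
    """A graph node that adds the results of its two child nodes."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def __add__(self, other):
        return Add(self, other)

    def evaluate(self, env):
        return self.left.evaluate(env) + self.right.evaluate(env)


def function(inputs, output):
    """Turn a graph into a callable that binds values to the input variables."""
    def compiled(*values):
        env = dict((var.name, val) for var, val in zip(inputs, values))
        return output.evaluate(env)
    return compiled


x, y = Var('x'), Var('y')
z = x + y                 # builds a graph node; nothing is computed yet
f = function([x, y], z)
result = f(4, 3)          # the addition happens only here
```

Keeping the expression as a graph is what lets a library differentiate it symbolically or retarget it to a GPU before any numbers flow through it.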

Install Theano

- Environment: Ubuntu 14.04 64bit

- Install document: http://deeplearning.net/software/theano/install_ubuntu.html#install-ubuntu

$ sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git

$ sudo pip install Theano

Download Tutorial code

$ git clone https://github.com/lisa-lab/DeepLearningTutorials.git

Cloning into 'DeepLearningTutorials'...

remote: Counting objects: 3652, done.

remote: Total 3652 (delta 0), reused 0 (delta 0), pack-reused 3652

Receiving objects: 100% (3652/3652), 7.79 MiB | 2.32 MiB/s, done.

Resolving deltas: 100% (2161/2161), done.

Checking connectivity... done.

$ ls

DeepLearningTutorials


Run DBN

DeepLearningTutorials$ cd code
DeepLearningTutorials/code$ python DBN.py
Using gpu device 0: GeForce GTX 770
Downloading data from http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz
... loading data
... building the model
... getting the pretraining functions
... pre-training the model
Pre-training layer 0, epoch 0, cost -98.5296

DBN.py

from logistic_sgd import LogisticRegression, load_data

……

datasets = load_data(dataset)

train_set_x, train_set_y = datasets[0]
valid_set_x, valid_set_y = datasets[1]
test_set_x, test_set_y = datasets[2]

DBN.py

# start-snippet-1
class DBN(object):

…….

print '... building the model'
# construct the Deep Belief Network
dbn = DBN(numpy_rng=numpy_rng, n_ins=28 * 28,
          hidden_layers_sizes=[1000, 1000, 1000],
          n_outs=10)

[Figure: network with 28 * 28 inputs and 10 outputs]

DBN.py

pretraining_fns = dbn.pretraining_functions(train_set_x=train_set_x,
                                            batch_size=batch_size,
                                            k=k)

……………

print '... getting the finetuning functions'
train_fn, validate_model, test_model = dbn.build_finetune_functions(
    datasets=datasets,
    batch_size=batch_size,
    learning_rate=finetune_lr
)

DBN.py

train_fn = theano.function(
    inputs=[index],
    outputs=self.finetune_cost,
    updates=updates,
    givens={
        self.x: train_set_x[
            index * batch_size: (index + 1) * batch_size
        ],
        self.y: train_set_y[
            index * batch_size: (index + 1) * batch_size
        ]
    }
)

DBN.py

while (epoch < training_epochs) and (not done_looping):
    epoch = epoch + 1
    for minibatch_index in xrange(n_train_batches):

        minibatch_avg_cost = train_fn(minibatch_index)
        iter = (epoch - 1) * n_train_batches + minibatch_index

        if (iter + 1) % validation_frequency == 0:

            validation_losses = validate_model()
            this_validation_loss = numpy.mean(validation_losses)

DNN using ReLU

import theano
from theano import tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
import numpy as np
from load import mnist

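DNN using ReLU

def floatX(X):
    # Cast to the float type Theano is configured to use.
    return np.asarray(X, dtype=theano.config.floatX)

def init_weights(shape):
    # Shared variable initialised with small Gaussian noise.
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))

def rectify(X):
    # ReLU: element-wise max(x, 0).
    return T.maximum(X, 0.)

def softmax(X):
    # Subtract the row-wise max before exponentiating for numerical stability.
    e_x = T.exp(X - X.max(axis=1).dimshuffle(0, 'x'))
    return e_x / e_x.sum(axis=1).dimshuffle(0, 'x')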
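The softmax above subtracts the row maximum before exponentiating. A plain-Python sketch for a single row shows why. This version works on a list of floats rather than a Theano matrix, and the scores are hypothetical values picked to trigger the problem.

```python
import math

def softmax(row):
    """Numerically stable softmax of one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]   # the largest exponent is exp(0) = 1
    total = sum(exps)
    return [e / total for e in exps]

# math.exp(1000.0) on its own raises OverflowError; shifting by the max avoids it.
probs = softmax([1000.0, 1001.0, 1002.0])
shifted = softmax([0.0, 1.0, 2.0])
```

Subtracting a constant from every score does not change the result, so `probs` and `shifted` hold the same probabilities; the shift only keeps the intermediate values in a representable range.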

DNN using ReLU

def model(X, w_h, w_h2, w_o):
    # Two ReLU hidden layers followed by a softmax output layer.
    h = rectify(T.dot(X, w_h))
    h2 = rectify(T.dot(h, w_h2))
    py_x = softmax(T.dot(h2, w_o))
    return h, h2, py_x

def prop(cost, params, lr=0.001):
    # Plain gradient descent: p <- p - lr * d(cost)/dp for every parameter.
    grads = T.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        updates.append((p, p - lr * g))
    return updates

trX, teX, trY, teY = mnist(onehot=True)

X = T.fmatrix()
Y = T.fmatrix()

w_h = init_weights((784, 625))
w_h2 = init_weights((625, 625))
w_o = init_weights((625, 10))

h, h2, py_x = model(X, w_h, w_h2, w_o)
y_x = T.argmax(py_x, axis=1)

cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
params = [w_h, w_h2, w_o]
updates = prop(cost, params, lr=0.001)

train = theano.function(inputs=[X, Y], outputs=cost, updates=updates)
predict = theano.function(inputs=[X], outputs=y_x)

for i in range(100):
    for start, end in zip(range(0, len(trX), 128), range(128, len(trX), 128)):
        cost = train(trX[start:end], trY[start:end])
    print np.mean(np.argmax(teY, axis=1) == predict(teX))
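One detail of the training loop above is worth seeing in isolation: pairing the two ranges with zip produces the batch boundaries, but it stops before the tail of the data. The sketch below reproduces that indexing with small numbers. `batches_zip` and `batches_all` are hypothetical helper names, and the second one is an alternative, not the code used in the talk.

```python
def batches_zip(n, batch_size):
    """Boundaries built as in the loop above; the tail of the data is skipped."""
    return list(zip(range(0, n, batch_size), range(batch_size, n, batch_size)))

def batches_all(n, batch_size):
    """Variant that also yields the final, possibly shorter, batch."""
    return [(start, min(start + batch_size, n))
            for start in range(0, n, batch_size)]

skipping = batches_zip(10, 4)   # [(0, 4), (4, 8)]
covering = batches_all(10, 4)   # [(0, 4), (4, 8), (8, 10)]
```

For a training set of 60000 examples (an assumed size) and a batch size of 128, the zip form yields 468 batches and never visits the last 96 examples. For an experiment this is usually harmless, but it matters when every example must be seen.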

Play with data

load_data()

def load_data(dataset):
    …...
    if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
        dataset = new_path
    …...
    print '... loading data'

    # Load the dataset
    f = gzip.open(dataset, 'rb')
    train_set, valid_set, test_set = cPickle.load(f)
    f.close()

Creating data 1

train_set.x.txt

[Figure: file contents, annotated with input vector length and input vector size]

train_set.y.txt

[Figure: file contents, annotated with input vector size]

Creating data 1

from numpy import genfromtxt
import gzip, cPickle
…………….
train_set_x = genfromtxt(dir_path+"train_set.x.txt", delimiter=",")
…………………..
# Each set is an (inputs, labels) pair; the *_y arrays are assumed to be
# loaded in the elided lines above in the same way as train_set_x.
train_set = train_set_x, train_set_y
valid_set = valid_set_x, valid_set_y
test_set = test_set_x, test_set_y
print "writing to pkl.gz..."
data_set = [train_set, valid_set, test_set]
print "zip data into a file"
f = gzip.open(output_dir+str(i)+"_"+pkl_filename+".pkl.gz",'wb')
print "zip data file name is " + str(i)+"_"+pkl_filename+".pkl.gz"
cPickle.dump(data_set,f,protocol=2)
f.close()

Creating data 2

for n, sentence in enumerate(file_lines):
    ……………………..
    data_batch_fpath = vector_dir+"data_batch_"+str(n)+".npz"
    ……………………….
    # save vector list
    numpy.savez(data_batch_fpath,
                data=numpy.asarray(sentence_vector_list),
                labels=label_vector,
                length=max_length,
                dim=dimension)

Save and load model

save model

load model

Theano modes

.bashrc

# Theano Settings
export THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32,exception_verbosity=high

Deep Learning for Natural Language Processing

- 나는밥을먹는다. ("I eat rice.")

Split into morpheme units

one-hot (1 of K) representation: each character is represented as a vector

- 밥 = [0,0,0,0,0,0,0,………,0,0,0,0,1,0,0,0,0,0,0]

index   0(나)  1(가)  2(는)  ...  ...  ...  ...  999(.)
나      1      0      0      0    0    0    0    0
는      0      0      1      0    0    0    0    0
..      0      0      0      0    0    1    0    0
..      0      0      0      0    1    0    0    0
다      0      0      0      0    0    0    1    0
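The one-hot (1 of K) representation above can be sketched in a few lines of Python 3. The five-entry vocabulary is a hypothetical stand-in for the roughly 1000-entry index on the slide; only the positions of 나 (0), 가 (1) and 는 (2) follow the table.

```python
def one_hot(index, size):
    """Vector of zeros with a single 1 at the given index."""
    vec = [0] * size
    vec[index] = 1
    return vec

# Hypothetical toy vocabulary; the order defines each character's index.
vocab = ['나', '가', '는', '밥', '.']
char_to_index = dict((ch, i) for i, ch in enumerate(vocab))

encoded = [one_hot(char_to_index[ch], len(vocab)) for ch in '나는']
```

Here `encoded` is `[[1, 0, 0, 0, 0], [0, 0, 1, 0, 0]]`, matching the first two rows of the table. The vector length grows with the vocabulary and every pair of distinct characters is equally far apart, which is what motivates a dense representation.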


word2vec representation

- Word2Vec model
- 밥 = [0.323112, -0.021232, …….. , 0.82123123]

one-hot (1 of K) representation vs. word2vec representation

- one-hot (1 of K) representation: 밥 = [0,0,0,0,0,0,0,………,0,0,0,0,1,0,0,0,0,0,0]
- word2vec representation: 밥 = [0.323112, -0.021232, …….. , 0.82123123]

Gensim

- Definition: Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora
- word2vec class
  - word vector representation
  - multi threading
  - Skip Gram
  - Continuous Bag of Words

Gensim - import, settings

# imports
from gensim.models.word2vec import LineSentence
from gensim.models import word2vec

# settings
THEADS = 8  # progress with multi threading
DIMENSION = 50
SKIPGRAM = 1  # 1 is skip gram, 0 is cbow
WINDOW_SIZE = 8
NTimes = 10  # repeat number of sentences
min_count_of_word = 5

………………..

from gensim import utils

Gensim - training, save model

# load raw sentence
sentences = LineSentence(input_train_file_path)
# model settings
model = word2vec.Word2Vec(size=dimension, workers=THEADS,
                          min_count=min_count_of_word, sg=SKIPGRAM, window=WINDOW_SIZE)

# build voca and train
number_iter = NTimes  # number of iterations (epochs) over the corpus
model.build_vocab(sentences)

ss = utils.RepeatCorpusNTimes(sentences, number_iter)
model.train(ss)
# save model
model.save(model_file_name)
model.save_word2vec_format(model_file_name + '.bin', binary=True)

Gensim - load model, test

try:
    model = utils.SaveLoad.load(fname=model_file_name)
except:
    print "failed to load. Retrying by load_word2vec_format() !!"
    # The .bin file was written with binary=True, so it is read back the same way.
    model = word2vec.Word2Vec.load_word2vec_format(fname=model_file_name + ".bin", binary=True)

……

x = model[w.decode('utf-8')]

……

mw, score = model.most_similar(positive=[x])[0]
print "most similar : ", mw
print "target vector :", x

Most similar words to ‘서울’ (Seoul)

most similar word        similarity
대구 (Daegu)             0.4282917082309723
광주 (Gwangju)           0.4046330451965332
부산 (Busan)             0.40132588148117065
울산 (Ulsan)             0.3863871693611145
수원 (Suwon)             0.38555505871772766
청주 (Cheongju)          0.35919708013534546
안양 (Anyang)            0.35622960329055786
주왕산 (Juwangsan)       0.3543151617050171
평택 (Pyeongtaek)        0.3505415618419647
cebu                     0.34598737955093384
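The similarity scores above are cosine similarities between word vectors, which is the measure gensim's most_similar ranks by. A minimal plain-Python sketch of that measure follows; the three-dimensional vectors are made-up illustrations, not values from the trained model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors: the first pair points the same way, the second is orthogonal.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

The score depends only on direction, not length: it is 1 for vectors pointing the same way and 0 for orthogonal ones, so values around 0.4 as in the table indicate a moderate relation.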

Auto word spacing with Recurrent Neural Network

- 0 0 1 0 1 0 0

- [0.323112, -0.021232, …….. , 0.82123123]
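One way to read the label sequence 0 0 1 0 1 0 0 above is as one label per character, where 1 marks a character that starts a new word. That convention is an assumption inferred from the example sentence 나는밥을먹는다 used earlier; the slides do not state it. Under that assumption, training labels can be derived from correctly spaced text like this (Python 3; `spacing_labels` is a hypothetical helper):

```python
def spacing_labels(spaced_sentence):
    """Return (characters, labels); label 1 marks a character preceded by a space."""
    chars, labels = [], []
    prev_space = False
    for ch in spaced_sentence:
        if ch == ' ':
            prev_space = True
            continue
        chars.append(ch)
        labels.append(1 if prev_space else 0)
        prev_space = False
    return chars, labels

chars, labels = spacing_labels('나는 밥을 먹는다')
```

Here `labels` comes out as `[0, 0, 1, 0, 1, 0, 0]`. A sequence model then receives the unspaced characters as vectors and is trained to predict this label sequence, from which the spaces can be restored.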

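Difficulties encountered while experimenting with deep learning

- There are many parameters to choose: number of layers, number of nodes per layer, learning rate, number of epochs, number of batches, choice of activation function, and so on.

- It takes a long time to see the result of an experiment after changing a parameter.

- GPU memory problems, because the data is big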

Thank you

Setting GPU

building lmdb for caffe

Softmax function

Bias

Negative Log Likelihood

http://goo.gl/forms/IR45liXoQ3
