深層学習入門

深層学習ダヌシカ　ボレガラ

深層学習 Deep Learning

概要

• 応用

• 理論

• 実践

3

深層学習• 英語ではDeep Learning

• 特徴の数段の組み合わせを考慮することでより複雑な現象を学習する仕組み

• 神経網回路(ニューラル・ネットワーク)の層を重ねることで学習することが殆ど

• 人間の脳の仕組みに似ている？

• とにかく，様々な認識タスクで大幅に良い精度を示しており，Google, Facebook,Microsoft,…など多くの企業が研究開発をしている．

4

イメージ図

5

ピクセル色特徴

線認識形認識物体認識

教師信号

6

Deep Learning in the News

13!

Researcher Dreams Up Machines That Learn Without Humans 06.27.13

Scientists See Promise in Deep-Learning Programs John Markoff November 23, 2012

Google!taps!U!of!T!professor!to!teach!context!to!computers!03.11.13!

報道

slide credit: Bengio KDD’14

7


13!




報道


ヒントン先生(Google)

8


13!




報道


ベンジオ先生 Montreal大学

9


13!




報道


LeCun先生 (Facebook)

応用

深層学習の実世界応用• 画像認識

• 物体認識

• 情報検索

• 画像検索，機械翻訳，類似検索

• 音声

• 音声認識，話者同定

• ロボット工学

• 自動運転, ゲーム11

認識タスクの概要

12

入力低次元特徴学習アルゴリズム

How$is$computer$percep>on$done?$

Image$ LowTlevel$vision$features$

Recogni>on$

Object$detec>on$

Input Data Learning Algorithm

Low-level features

Slide$Credit:$Honglak$Lee$

Audio$classifica>on$

Audio$ LowTlevel$audio$features$

Speaker$iden>fica>on$



Recogni>on$

Object$detec>on$


Low-level features







Recogni>on$

Object$detec>on$


Low-level features





物体認識

低次元画像特徴認識



Recogni>on$

Object$detec>on$


Low-level features







Recogni>on$

Object$detec>on$


Low-level features







Recogni>on$

Object$detec>on$


Low-level features





話者同定

低次元音声特徴同定

slide credit: Honglak Lee

画像特徴量

13

Computer$vision$features$

SIFT$ Spin$image$

HoG$ RIFT$

Textons$ GLOH$Slide$Credit:$Honglak$Lee$

音声特徴量

14

Audio$features$

ZCR$

Spectrogram$ MFCC$

Rolloff$Flux$

画像認識

• デモ：http://deeplearning.cs.toronto.edu/15

Figure 4: (Left) Eight ILSVRC-2010 test images and the five labels considered most probable by our model.The correct label is written under each image, and the probability assigned to the correct label is also shownwith a red bar (if it happens to be in the top 5). (Right) Five ILSVRC-2010 test images in the first column. Theremaining columns show the six training images that produce feature vectors in the last hidden layer with thesmallest Euclidean distance from the feature vector for the test image.

In the left panel of Figure 4 we qualitatively assess what the network has learned by computing itstop-5 predictions on eight test images. Notice that even off-center objects, such as the mite in thetop-left, can be recognized by the net. Most of the top-5 labels appear reasonable. For example,only other types of cat are considered plausible labels for the leopard. In some cases (grille, cherry)there is genuine ambiguity about the intended focus of the photograph.

Another way to probe the network’s visual knowledge is to consider the feature activations inducedby an image at the last, 4096-dimensional hidden layer. If two images produce feature activationvectors with a small Euclidean separation, we can say that the higher levels of the neural networkconsider them to be similar. Figure 4 shows five images from the test set and the six images fromthe training set that are most similar to each of them according to this measure. Notice that at thepixel level, the retrieved training images are generally not close in L2 to the query images in the firstcolumn. For example, the retrieved dogs and elephants appear in a variety of poses. We present theresults for many more test images in the supplementary material.

Computing similarity by using Euclidean distance between two 4096-dimensional, real-valued vec-tors is inefficient, but it could be made efficient by training an auto-encoder to compress these vectorsto short binary codes. This should produce a much better image retrieval method than applying auto-encoders to the raw pixels [14], which does not make use of image labels and hence has a tendencyto retrieve images with similar patterns of edges, whether or not they are semantically similar.

7 Discussion

Our results show that a large, deep convolutional neural network is capable of achieving record-breaking results on a highly challenging dataset using purely supervised learning. It is notablethat our network’s performance degrades if a single convolutional layer is removed. For example,removing any of the middle layers results in a loss of about 2% for the top-1 performance of thenetwork. So the depth really is important for achieving our results.

To simplify our experiments, we did not use any unsupervised pre-training even though we expectthat it will help, especially if we obtain enough computational power to significantly increase thesize of the network without obtaining a corresponding increase in the amount of labeled data. Thusfar, our results have improved as we have made our network larger and trained it longer but we stillhave many orders of magnitude to go in order to match the infero-temporal pathway of the humanvisual system. Ultimately we would like to use very large and deep convolutional nets on videosequences where the temporal structure provides very helpful information that is missing or far lessobvious in static images.

8

Krizhevsky+ NIPS’12

http://deeplearning.cs.toronto.edu/

類似画像検索

16

Figure 4: (Left) Eight ILSVRC-2010 test images and the five labels considered most probable by our model.The correct label is written under each image, and the probability assigned to the correct label is also shownwith a red bar (if it happens to be in the top 5). (Right) Five ILSVRC-2010 test images in the first column. Theremaining columns show the six training images that produce feature vectors in the last hidden layer with thesmallest Euclidean distance from the feature vector for the test image.

In the left panel of Figure 4 we qualitatively assess what the network has learned by computing itstop-5 predictions on eight test images. Notice that even off-center objects, such as the mite in thetop-left, can be recognized by the net. Most of the top-5 labels appear reasonable. For example,only other types of cat are considered plausible labels for the leopard. In some cases (grille, cherry)there is genuine ambiguity about the intended focus of the photograph.

Another way to probe the network’s visual knowledge is to consider the feature activations inducedby an image at the last, 4096-dimensional hidden layer. If two images produce feature activationvectors with a small Euclidean separation, we can say that the higher levels of the neural networkconsider them to be similar. Figure 4 shows five images from the test set and the six images fromthe training set that are most similar to each of them according to this measure. Notice that at thepixel level, the retrieved training images are generally not close in L2 to the query images in the firstcolumn. For example, the retrieved dogs and elephants appear in a variety of poses. We present theresults for many more test images in the supplementary material.

Computing similarity by using Euclidean distance between two 4096-dimensional, real-valued vec-tors is inefficient, but it could be made efficient by training an auto-encoder to compress these vectorsto short binary codes. This should produce a much better image retrieval method than applying auto-encoders to the raw pixels [14], which does not make use of image labels and hence has a tendencyto retrieve images with similar patterns of edges, whether or not they are semantically similar.

7 Discussion

Our results show that a large, deep convolutional neural network is capable of achieving record-breaking results on a highly challenging dataset using purely supervised learning. It is notablethat our network’s performance degrades if a single convolutional layer is removed. For example,removing any of the middle layers results in a loss of about 2% for the top-1 performance of thenetwork. So the depth really is important for achieving our results.

To simplify our experiments, we did not use any unsupervised pre-training even though we expectthat it will help, especially if we obtain enough computational power to significantly increase thesize of the network without obtaining a corresponding increase in the amount of labeled data. Thusfar, our results have improved as we have made our network larger and trained it longer but we stillhave many orders of magnitude to go in order to match the infero-temporal pathway of the humanvisual system. Ultimately we would like to use very large and deep convolutional nets on videosequences where the temporal structure provides very helpful information that is missing or far lessobvious in static images.

8

Krizhevsky+ NIPS’12

今は20層の深層学習 by Google!

衛星写真から道の推定

17

Predic>ng$Roads$from$Satellite$Images$

(Mnih$and$Hinton, ICML 2012)!

衛星写真から道の推定

18

Predic>ng$Roads$from$Satellite$Images$

(Mnih$and$Hinton, ICML 2012)!

アナロジー問題• 単語の意味をどのように表すか

• 他の単語との共起頻度を要素とするベクトルで表す

• 分布説

• 単語の意味はその周辺単語で決まる

• 単語の意味表現学習する問題を「ある単語の周辺に出現する他の単語を推定する」問題として解釈する．

• v(king) - v(man) + v(woman) が v(queen)と似ている!19

理論

深層学習の歴史

• ニューラル・ネットワークは1950年代からあった

• 一層のニューラル・ネットワークでは線で分類できるものしか認識できない

• ミンスキー　1960年ころ

21

パーセプトロン• Perceptronは単一層からなるニューラル・ネットワークである

22

x1 x2 x3 xn

＋

…

w1

w2 wn wn

パーセプトロン• Perceptronは単一層からなるニューラル・ネットワークである

23

x1 x2 x3 xn

＋

…

w1

w2 wn wn

s = x1w1 + x2w2 + … + xnwn

if s > 0: return 1 else: return 0

-2.4 -2 -1.6 -1.2 -0.8 -0.4 0 0.4 0.8 1.2 1.6 2 2.4

-1.5

-1

-0.5

0.5

1

1.5

ロジスティクス関数

線型分離可能性とは

24

+

+

+

+ ++ +

---

--

-

--

---

-

２次元では線ですが，多次元だと平面となる． ax + by +c = 0

線型分離不可能な問題

25

XOR (排他的論理和)

x=0 x=1

y=0 0 1

y=1 1 0x

y

-

- -- -

--

-

-++ +

+

+ ++

+

+


26


x=0 x=1

y=0 0 1

y=1 1 0x

y

-

- -- -

--

-

-++ +

+

+ ++

+

+


27


x=0 x=1

y=0 0 1

y=1 1 0x

y

-

- -- -

--

-

-++ +

+

+ ++

+

+


28


x=0 x=1

y=0 0 1

y=1 1 0x

y

-

- -- -

--

-

-++ +

+

+ ++

+

+

どう頑張っても線を引いて分離不可能!

そんな単純なこともできないの？• XORは至るところで現るロジック

• そんな単純なものも表現できないなら何の約にも立たない

• 複数の層にすれば扱えるが重みの学習をどうして良いか分からない

• error backpropagation (誤差逆伝播法)が1980年代に考案され，使われる

• しかし，複数層で学習すると「過学習になりやすい」し遅い．そんなに沢山のパラメータが学習できるほどデータがない．

• ニューラル・ネットワークの暗黒時代29

過学習 (over-fitting)

• 学習データに当てはめすぎて，汎化できなくなる現象

• 学習時：高精度，テスト時：低精度

• 原因

• モデルの自由度が高すぎる

• 学習データの量が少なすぎる

30

過学習

31

Non-linearities: Why they’re needed •  For$logis>c$regression:$map$to$probabili>es$•  Here:$func>on$approxima>on,$$

e.g.,$regression$or$classifica>on$•  Without$non;lineari>es,$deep$neural$networks$can’t$do$anything$more$than$a$linear$transform$

•  Extra$layers$could$just$be$compiled$down$into$a$single$linear$transform$

•  Probabilis>c$interpreta>on$unnecessary$except$in$the$Boltzmann$machine/graphical$models$

•  People$o\en$use$other$non;lineari>es,$such$as$tanh,$as$we’ll$discuss$in$part$3$

31$

同じ10個の点を多項式を使って説明する場合（多項式を当てはまる場合）できるだけ小さい次元の多項式を使った方が過学習になり難い． !問題は次元数よりも，特徴の値の変化である． !オッカムのカミソリ

同じ現象を説明するならより簡単な仮説の方が望ましい

深層学習の長点• 特徴を自分で考えなくて良い，基本特徴量さえ作成しておけば，その有効な組み合わせが自動的に学習できる．

• ラベル無しのデータからでもこの「有効な組み合わせ」が学習可能

• 大域的な表現が可能 (distributed representations)

• 深層学習（大域的な表現学習）vs. クラスタリング(局所的な表現学習)

32

局所的 vs. 大域的表現

33

Local$vs.$Distributed$Representa>ons$• $Clustering,$Nearest$Neighbors,$RBF$SVM,$local$density$es>mators$$$

Learned$prototypes$

Local$regions$C1=1$

C1=0$

C2=1$

C2=1$C1=1$C2=0$

C1=0$C2=0$

• $RBMs,$Factor$models,$PCA,$Sparse$Coding,$Deep$models$

C2$C1$ C3$

• $Parameters$for$each$region.$• $#$of$regions$is$linear$with$$$$$$$$$#$of$parameters.$

Bengio, 2009, Foundations and Trends in Machine Learning!

slide credit: Salakhutdinov KDD’14

局所的 vs. 大域的表現

34slide credit: Salakhutdinov KDD’14

Local$vs.$Distributed$Representa>ons$• $Clustering,$Nearest$Neighbors,$RBF$SVM,$local$density$es>mators$$$

Learned$prototypes$

Local$regions$

C3=0$

C1=1$

C1=0$

C3=0$C3=0$

C2=1$

C2=1$C1=1$C2=0$

C1=0$C2=0$C3=0$

C1=1$C2=1$C3=1$

C1=0$C2=1$C3=1$

C1=0$C2=0$C3=1$

• $RBMs,$Factor$models,$PCA,$Sparse$Coding,$Deep$models$

• $Parameters$for$each$region.$• $#$of$regions$is$linear$with$$$$$$$$$#$of$parameters.$

C2$C1$ C3$

Bengio, 2009, Foundations and Trends in Machine Learning!

深層学習の大発見！• 層を重ねることでより表現豊なニューラル・ネットワークが学習できることが分かっていたが，いかに過学習を避けるかが課題だった

• 貪欲法(greedy layer-wise training)

• ２つの層だけで学習を行い，そのように学習させた２つずつの層を重ねるとよい!

• A Fast Learning Algorithm for Deep Belief Nets, Hinton et al., Neural Computing, 2006.

35

Layer-wise Unsupervised Learning

… input

40!

Layer-Wise Unsupervised Pre-training

…

…

input

features

41!


…

…

…

input

features

reconstruction of input

= ?

… input

42!


…

…

input

features

43!


…

…

input

features

… More abstract features

44!

…

…

input

features


reconstruction of features

= ?

… … … …

Layer-Wise Unsupervised Pre-training Layer-wise Unsupervised Learning

45!

…

…

input

features



46!

…

…

input

features


… Even more abstract

features

Layer-wise Unsupervised Learning

47!

教師なし事前学習 (unsupervised pre-training)

…

…

input

features


… Even more abstract

features

Output f(X) six

Target Y

two! = ?

Supervised Fine-Tuning

•  AddiMonal!hypothesis:!features!good!for!P(x)!good!for!P(y|x)!48!

slide credit: Bengio KDD 2014

教師有り事後学習(supervised post-training)

深層学習手法• 大きく分けて２種類の有名な手法がある

• 自己符号化器 (Autoencoders AE)

• 数学が簡単

• 実装が簡単

• 理論的な解析がし難い

• 制限ボルツマンマシン(Restricted Boltzman Machine RBM)

• 確率モデルに基づく

• 一部の条件ではAEと同じ最適化を行っている45

深層学習手法• 大きく分けて２種類の有名な手法がある

• 自己符号化器 (Autoencoders AE)

• 数学が簡単

• 実装が簡単

• 理論的な解析がし難い

• 制限ボルツマンマシン(Restricted Boltzman Machine RBM)

• 確率モデルに基づく

• 一部の条件ではAEと同じ最適化を行っている46

自己符号化器 Autoencoder

47

入力

特徴表現

Encoder (符号化器)

Decoder (復号化器)

自己符号化器 Autoencoder

48

入力

特徴表現

Encoder (符号化器)

Decoder (復号化器)

x

y

y = f(Wx + b)

z = f(WTy+ b’)

誤差=||x-z||2

自己符号化器

49

x1 x2 x3 +1

入力層

y1 y2 y3 +1

隠れ層

自己符号化器

50

x1 x2 x3 +1

入力層

y1 y2 y3 +1

隠れ層

wij : xj —> yi

b3

W32

符号化器の部分

詳細• 復号化器の行列W’は符号化器の行列Wの転置とすることで学習すべきパラメータ数を減らす

• bとb’はバイヤス項と呼ばれており，入力の値がどの基本レベルを中心に変動しているかを表わしている．

• 常にONとなっている(+1)特徴が存在すると仮定し，それに対する重みとしてバイヤスを解釈することができる．

• 関数fはそれぞれの次元毎に何らかの非線形な演算を行っている．これがなければ主成分分析(PCA)と同じく，線形な関係しか扱うことができない！

51

基本的な流れ1. 入力xを符号化器で符号化する．

1. Wx + bを計算し，それをfに入力する

2. (1)の出力yを復号化器に入力させて，入力を再現する．

1. WTy+ b’を計算し，それをfに入力する

3. (2)の結果zと元の入力xを比較する．

1. 誤差関数を問題依存する．実数特徴ならば自乗誤差，バイナリならば交差エントロピー誤差を使う．

4. (3)の誤差を最小となるようにパラメータ(W, b, b’)を調整する．

1. 誤差をパラメータで微分し，それをゼロとなるようなパラメータ値を求める．(確率的勾配法)

52

確率的勾配法• 英語：Stochastic Gradient Descent (SGD)

• 誤差が最小となる方向(勾配と逆向き)へ少しずつ移動する

53

θ tan(θ)=∂y/∂x勾配

x

(t+1) = x

(t) � ⌘

@E(x, z)

@x

|t=t

確率的勾配法• 英語：Stochastic Gradient Descent (SGD)

• 誤差が最小となる方向(勾配と逆向き)へ少しずつ移動する

54

θ tan(θ)=∂y/∂x勾配

x

(t+1) = x

(t) � ⌘

@E(x, z)

@x

|t=t

現代の多くの機械学習アルゴリズムの最適化に

使われている

まずは手計算

55

x1 x2 x3 +1

y1 y2 y3 +1

b3

W32

自己符号化器の導出

ボレガラ　ダヌシカ

1 ３つの入力ノードと 3つの隠れノードの場合

y1 = f(x1W11 + x2W12 + x3W13 + b1) (1)y2 = f(x1W21 + x2W22 + x3W23 + b2) (2)y3 = f(x1W31 + x2W32 + x3W33 + b3) (3)

1





ここでf(t) =

1

1 + exp(−t)(4)

1

行列の形で表すと

56





ここでf(t) =

1

1 + exp(−t)(4)

行列式として書くと，

W =

⎛

⎝w11 w12 w13

w21 w22 w23

w31 w32 w33

⎞

⎠ (5)

1





ここでf(t) =

1

1 + exp(−t)(4)


W =

⎛

⎝w11 w12 w13

w21 w22 w23

w31 w32 w33

⎞

⎠ (5)

b =

⎛

⎝b1b2b3

⎞

⎠ (6)

1





ここでf(t) =

1

1 + exp(−t)(4)


W =

⎛

⎝w11 w12 w13

w21 w22 w23

w31 w32 w33

⎞

⎠ (5)

b =

⎛

⎝b1b2b3

⎞

⎠ (6)

x =

⎛

⎝x1

x2

x3

⎞

⎠ (7)

y =

⎛

⎝y1y2y3

⎞

⎠ (8)

z =

⎛

⎝z1z2z3

⎞

⎠ (9)

b′ =

⎛

⎝b′1b′2b′3

⎞

⎠ (10)

1





ここでf(t) =

1

1 + exp(−t)(4)


W =

⎛

⎝w11 w12 w13

w21 w22 w23

w31 w32 w33

⎞

⎠ (5)

b =

⎛

⎝b1b2b3

⎞

⎠ (6)

x =

⎛

⎝x1

x2

x3

⎞

⎠ (7)

y =

⎛

⎝y1y2y3

⎞

⎠ (8)

z =

⎛

⎝z1z2z3

⎞

⎠ (9)

b′ =

⎛

⎝b′1b′2b′3

⎞

⎠ (10)

1





ここでf(t) =

1

1 + exp(−t)(4)


W =

⎛

⎝w11 w12 w13

w21 w22 w23

w31 w32 w33

⎞

⎠ (5)

b =

⎛

⎝b1b2b3

⎞

⎠ (6)

x =

⎛

⎝x1

x2

x3

⎞

⎠ (7)

y =

⎛

⎝y1y2y3

⎞

⎠ (8)

z =

⎛

⎝z1z2z3

⎞

⎠ (9)

b′ =

⎛

⎝b′1b′2b′3

⎞

⎠ (10)

1





ここでf(t) =

1

1 + exp(−t)(4)


W =

⎛

⎝w11 w12 w13

w21 w22 w23

w31 w32 w33

⎞

⎠ (5)

b =

⎛

⎝b1b2b3

⎞

⎠ (6)

x =

⎛

⎝x1

x2

x3

⎞

⎠ (7)

y =

⎛

⎝y1y2y3

⎞

⎠ (8)

z =

⎛

⎝z1z2z3

⎞

⎠ (9)

b′ =

⎛

⎝b′1b′2b′3

⎞

⎠ (10)

1





ここでf(t) =

1

1 + exp(−t)(4)


W =

⎛

⎝w11 w12 w13

w21 w22 w23

w31 w32 w33

⎞

⎠ (5)

b =

⎛

⎝b1b2b3

⎞

⎠ (6)

x =

⎛

⎝x1

x2

x3

⎞

⎠ (7)

y =

⎛

⎝y1y2y3

⎞

⎠ (8)

z =

⎛

⎝z1z2z3

⎞

⎠ (9)

b′ =

⎛

⎝b′1b′2b′3

⎞

⎠ (10)

y = f(Wx+ b) (11)

1

復号化の部分

57

復号化の部分は，

z1 = f(y1W11 + y2W21 + y3W31 + b′1) (12)z2 = f(y1W12 + y2W22 + y3W32 + b′2) (13)z3 = f(y1W13 + y2W23 + y3W33 + b′3) (14)

2




z = f(W⊤y + b′) (15)

2




z = f(W⊤y + b′) (15)

自乗誤差 (squared loss)を考える

||x− z||2 =d!

k=1

(xk − zk)2 (16)

2

自乗誤差

パラメータ更新

58




z = f(W⊤y + b′) (15)


||x− z||2 =d!

k=1

(xk − zk)2 (16)

例えば，w12 の値を更新するための更新式を導出する．

∂ ||x− z||2

∂w12= −2(x− z)

∂z2∂w12

∂z2∂wij

=∂

∂wijf(w12jy1 + w22y2 + w32y3 + b′2)

ここでt = w12jy1 + w22y2 + w32y3 + b′2

とすると，∂f(t)

∂w12=

∂f(t)

∂t

∂t

∂w12=

exp(−t)

(1 + exp(−t))2y1 = σ(t)(1− σ(t))y1

2




z = f(W⊤y + b′) (15)


||x− z||2 =d!

k=1

(xk − zk)2 (16)


∂ ||x− z||2

∂w12= −2(x− z)

∂z2∂w12

∂z2∂wij

=∂

∂wijf(w12jy1 + w22y2 + w32y3 + b′2)

ここでt = w12jy1 + w22y2 + w32y3 + b′2


∂w12=

∂f(t)

∂t

∂t

∂w12=

exp(−t)

(1 + exp(−t))2y1 = σ(t)(1− σ(t))y1

よって，∂ ||x− z||2

∂w12= −2(x− z)σ(t)(1− σ(t))y1

2




z = f(W⊤y + b′) (15)


||x− z||2 =d!

k=1

(xk − zk)2 (16)


∂ ||x− z||2

∂w12= −2(x− z)

∂z2∂w12

∂z2∂wij

=∂

∂wijf(w12jy1 + w22y2 + w32y3 + b′2)

ここでt = w12jy1 + w22y2 + w32y3 + b′2


∂w12=

∂f(t)

∂t

∂t

∂w12=

exp(−t)

(1 + exp(−t))2y1 = σ(t)(1− σ(t))y1

よって，∂ ||x− z||2

∂w12= −2(x− z)σ(t)(1− σ(t))y1

w12 に関する更新式は，

w(n+1)12 = w(n)

12 + 2η(x− z)σ(t)(1− σ(t))y1

2

結果を詳しく見てみる• 最後に得られた更新式の右辺ではx(入力)が分かればyが分かり，z

が分かるので(x-z)が計算できる．

• yが分かればtが計算できるのでシグモイド関数の値も計算できる

• よってWの値が更新できる

• 同様にしてbとb’に関する更新式も導入できる．

• アルゴリズム

• パラメータを全てのランダムに初期化をする

• 更新式に従って更新する（決まった回数分，収束するまで,…)

• 最終的なパラメータ値を返す.それが自己符号化器のパラメータとなる．

59

実践

Theano• Pythonで書かれた汎用的な数値解析ライブラリ

• 深層学習に向いている理由

• 自動的にGPUへの最適化をしてくれる!

• 自動微分ができる!

• 自己符号化器，制限ボルツマンマシンなど深層学習でよく出てくるモジュールが既に含まれている

61

Theanoを使ったAEの実装• ソース（Deep Learning Tutorial)

• http://deeplearning.net/tutorial/gettingstarted.html

• git clone git://github.com/lisa-lab/DeepLearningTutorials.git

• Theanoのインストール

• http://deeplearning.net/software/theano/install.html

• 必要なもの(Python >= 2.6, g++, python-dev, NumPy, SciPy, BLAS)

• 全てopen sourceソフトでDebian系だとsudo apt-get installでインストールできる．(Mac OSX: brew/macports)

62

git://github.com/lisa-lab/DeepLearningTutorials.git

http://deeplearning.net/software/theano/install.html

参考文献• http://deeplearning.net/ [深層学習ポータル]

• 人工知能学会誌，深層学習特集がある！[日本語]

• まもなく本がでます←宣伝

• http://www.iro.umontreal.ca/~bengioy/dlbook/ [教科書]

• http://deeplearning.net/software/theano/tutorial/ [Theano tutotial 英語]

• http://www.chino-js.com/ja/tech/theano-rbm/ [Theano入門：日本語，RBM実装]

• http://www.slideshare.net/tushuhei/121227deep-learning-iitsuka [Theanoを使ったRBM/AE実装]

63

http://deeplearning.net/

http://www.iro.umontreal.ca/~bengioy/dlbook/

http://deeplearning.net/software/theano/tutorial/

http://www.chino-js.com/ja/tech/theano-rbm/

http://www.slideshare.net/tushuhei/121227deep-learning-iitsuka

深層学習入門

Engineering

Transcript of 深層学習入門