Deep Learning intro. - cs.kangwon.ac.kr/.../12_deeplearning_intro.pdf · 2016-06-17
Deep Learning intro.
2016.01.02.
Outline
Natural Language Processing (NLP)
Representation and Processing
Deep Learning Models
Natural Language Processing
Natural Language Processing (NLP)
[Figure: NLP overview. Language understanding (언어이해) and language generation (언어생성) connect a question (질문) to an answer (답변), via word understanding (단어이해), meaning understanding (의미이해), intent recognition (의도파악), inference (추론), search (검색), and dialogue (대화). Applications (응용): intelligent robots (지능형 로봇), information retrieval (정보검색), machine translation (기계번역), document summarization (문서요약).]
Representation and Processing
Representation in mathematics
[Figure: real-world objects (images via https://www.google.com/imghp?hl=ko) mapped from a "Real World" panel into a "Vector Space" panel, each object becoming a real-valued vector such as <0.156, 0.421, 0.954, …>]
오리 vs. 토끼 (duck vs. rabbit)
위장 (camouflage)
Neural Network for Human
[Image: the human brain, from https://uncyclopedia.kr/wiki/%EB%87%8C]
A neural network performs pattern recognition over multiple layers; the human visual system stacks about 10 layers before it can conclude "I see a lion."
Neural Network
Vector representation
Pattern of layers
+ Learning
Pattern of layers
Deep learning combines patterns automatically, layer by layer. Why do we say "deep"? Because many layers of units are stacked.
[Figure: m stacked layers, each with n units, fully connected between adjacent layers]
Connection links: (n × n) × (m − 1)
Automatic combination
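As a quick check of that count, a minimal sketch (the numbers n = 100 and m = 4 are my own illustration, not the lecture's):

```python
def connection_links(n: int, m: int) -> int:
    """Fully connected links between m stacked layers of n units each."""
    return (n * n) * (m - 1)

print(connection_links(100, 4))  # 30000 links for 4 layers of 100 units
```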
How to use layers?
Input: a vector
Output: a real number or a class (as a vector)
Vector representation: "one-hot"
Vector representation
[Figure: the symbol "Lion" in three forms: text representation ("Lion"), one-hot representation <0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …>, and dense symbol representation <1.45, 75.12, 0.425, 0.953, …>]
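A minimal sketch of the two representations, assuming a toy vocabulary of my own (the dense values are the slide's illustrative numbers, not trained features):

```python
import numpy as np

vocab = ["dog", "cat", "wolf", "mouse", "tiger", "lion"]  # toy vocabulary

def one_hot(word: str) -> np.ndarray:
    """Exactly one dimension is 1: the word's index in the vocabulary."""
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

print(one_hot("lion"))                               # [0. 0. 0. 0. 0. 1.]
lion_dense = np.array([1.45, 75.12, 0.425, 0.953])   # dense symbol vector
```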
Jung, DEEP LEARNING FOR KOREAN NLP
How to define symbol to one-hot
Lion: <0, 0, 1, 0, 0>
Big cat: <0, 1, 0, 0, 1>
[Symbolic words → one-hot]
Comparing the two with an AND operation gives no match at all (see the sketch below),
∴ we need a symbolic vector representation.
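The mismatch is easy to see in code; a sketch using the slide's two vectors:

```python
import numpy as np

lion    = np.array([0, 0, 1, 0, 0])   # "Lion"
big_cat = np.array([0, 1, 0, 0, 1])   # "Big cat"

# An AND-style comparison finds no shared active dimension at all,
# even though the symbols are closely related in meaning.
print(np.logical_and(lion, big_cat).any())   # False: non-match
```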
How to define symbol to one-hot
Symbolic words: Lion, Big cat, Tiger, Dog, Wolf, Mouse
One-hot: <0, 0, 1, 0, 0>, <0, 1, 0, 0, 1>, …
∴ Symbolic representation: dense vectors from an NNLM, e.g. <1.45, 75.12, 0.425, 0.953, …> and <1.78, 61.11, 0.611, 2.011, …>
Compare them with cosine similarity (sketch below).
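A sketch of the cosine comparison on the slide's dense vectors (the values are the slide's illustrations, not trained NNLM features):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 for same direction, 0.0 for orthogonal."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

lion  = np.array([1.45, 75.12, 0.425, 0.953])
tiger = np.array([1.78, 61.11, 0.611, 2.011])
print(cosine(lion, tiger))   # ~0.9998: similar animals, similar vectors
```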
Neural Network Language Model
Feed-forward NN: a parametric estimator with overall parameter set θ = (C, w)
• One-hot representation: [0 1 0 0 0 0 0 0 0 0]
• Lookup table: word embedding
• Non-linear projection: activation function
• Normalized weights: softmax (length |V|)
Neural Network Language Model
Maximize the log-likelihood:
L = max_θ (1/T) Σ_t log f(w_t, w_{t−1}, …, w_{t−n+1}; θ)
Parameters:
• h: the number of hidden units
• m: the number of features associated with each word
• b: the output biases
• d: the hidden layer biases
• U: the hidden-to-output weights
• W: the word-features-to-output weights
• H: the hidden layer weights
• C: the word features (lookup table)
• θ = (b, d, W, U, H, C)
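A minimal sketch of one NNLM forward pass with these parameters, following the Bengio-style form y = b + Wx + U tanh(d + Hx); the sizes and random initialisation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
V, m, h, n = 10, 4, 8, 3                  # |vocab|, features/word, hidden, n-gram

C = rng.normal(size=(V, m))               # word features (lookup table)
H = rng.normal(size=(h, (n - 1) * m))     # hidden layer weights
d = np.zeros(h)                           # hidden layer biases
U = rng.normal(size=(V, h))               # hidden-to-output weights
W = np.zeros((V, (n - 1) * m))            # word-features-to-output weights
b = np.zeros(V)                           # output biases

def nnlm_probs(context_ids):
    """P(w_t | w_{t-n+1}, ..., w_{t-1}) for one context of n-1 word ids."""
    x = C[context_ids].reshape(-1)        # look up and concatenate embeddings
    y = b + W @ x + U @ np.tanh(d + H @ x)
    e = np.exp(y - y.max())
    return e / e.sum()                    # softmax of length |V|

print(nnlm_probs([1, 5]).sum())           # 1.0: a proper distribution
```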
NNLM for Korean
Leeck, 딥러닝을 이용한 한국어 의존 구문 분석 (Korean Dependency Parsing Using Deep Learning)
Deep Learning Models
Deep learning Models
Feed-forward Neural Network (FFNN)
Example: "강남 주변에 스타벅스 위치가 어디야?" ("Where is a Starbucks around Gangnam?") • morphemes: 강남/NNG 주변/NNG 에/JX 스타벅스/NNG …
[Figure: an FFNN predicts a tag y_t for each morpheme; variants 1-FFNN, 2-FFNN, 3-FFNN]
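A minimal per-token FFNN tagger sketch for this example: each morpheme embedding goes through one hidden layer to a B/I tag distribution. The sizes and weights are illustrative assumptions (the slide's 1-/2-/3-FFNN variants are not modelled here):

```python
import numpy as np

rng = np.random.default_rng(0)
m, h, T = 4, 8, 2                     # embedding size, hidden units, tags (B, I)

W1 = rng.normal(size=(h, m)); b1 = np.zeros(h)
W2 = rng.normal(size=(T, h)); b2 = np.zeros(T)

def tag_probs(x_t):
    """Tag distribution for one morpheme embedding x_t, in isolation."""
    z = W2 @ np.tanh(W1 @ x_t + b1) + b2
    e = np.exp(z - z.max())
    return e / e.sum()

print(tag_probs(rng.normal(size=m)))  # [P(B), P(I)] for a single token
```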
Deep learning Models
Recurrent Neural Network (RNN)
Example: "강남 주변에 스타벅스 위치가 어디야?"
• s_text: [강남 주변에 스타벅스 위치], [어디]
• s_tag: [B I I I I], [B]
[Figure: the RNN unfolded over the sentence, tagging 강남/B 주변/I 에/I 스타벅스/I 위치/I]
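A sketch of the recurrence that separates the RNN from the FFNN: the hidden state at step t also sees h_{t−1}, so each B/I decision can depend on the whole prefix of the sentence. Sizes and weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, h = 4, 8
Wx = rng.normal(size=(h, m))          # input-to-hidden weights
Wh = rng.normal(size=(h, h))          # hidden-to-hidden (recurrent) weights
b = np.zeros(h)

def rnn_states(xs):
    """One hidden state per morpheme; each feeds a softmax tag layer."""
    h_t, states = np.zeros(h), []
    for x_t in xs:
        h_t = np.tanh(Wx @ x_t + Wh @ h_t + b)
        states.append(h_t)
    return states

states = rnn_states([rng.normal(size=m) for _ in range(5)])  # 5 morphemes
```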
Deep learning Models
Long Short-Term Memory RNN (LSTM-RNN)
• Uses gate matrices (LSTM or GRU)
Same example: s_text [강남 주변에 스타벅스 위치], [어디]; s_tag [B I I I I], [B]
[Figure: the LSTM-RNN unfolded over the sentence, tagging 강남/B 주변/I 에/I 스타벅스/I 위치/I]
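A sketch of one LSTM step showing the gate matrices the slide refers to (input gate i, forget gate f, output gate o, candidate g); all sizes and weights are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
m, h = 4, 8
W = rng.normal(size=(4 * h, m + h))   # the four gate matrices, stacked
b = np.zeros(4 * h)

def lstm_step(x_t, h_prev, c_prev):
    """One step: the gates decide what the memory cell keeps and emits."""
    z = W @ np.concatenate([x_t, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_t = f * c_prev + i * g          # forget old memory, admit new
    return o * np.tanh(c_t), c_t

h_t, c_t = lstm_step(rng.normal(size=m), np.zeros(h), np.zeros(h))
```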
Deep learning Models
LSTM-RNN CRF
• Uses gate matrices (LSTM or GRU) with a CRF output layer
• Decoding by Viterbi or beam search
Same example: s_text [강남 주변에 스타벅스 위치], [어디]; s_tag [B I I I I], [B]
[Figure: LSTM-RNN with a CRF layer tagging the sentence]
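A sketch of Viterbi decoding over per-token tag scores plus a CRF transition matrix; the scores below are hand-picked so the best path is the slide's B I I I I:

```python
import numpy as np

def viterbi(emit, trans):
    """emit: (T, K) per-token tag scores; trans: (K, K) prev-to-next scores."""
    T, K = emit.shape
    score, back = emit[0].copy(), np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans + emit[t]   # (prev tag, next tag)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

tags = ["B", "I"]
emit = np.array([[2., 0.], [0., 2.], [0., 2.], [0., 2.], [0., 2.]])
trans = np.array([[0., 1.], [0., 1.]])            # staying in I is cheap
print([tags[k] for k in viterbi(emit, trans)])    # ['B', 'I', 'I', 'I', 'I']
```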
Deep learning Models
Bidirectional LSTM-RNN CRF (Bi-LSTM-RNN CRF)
• Uses gate matrices (LSTM or GRU); a forward and a backward pass over the sentence
• Decoding by Viterbi or beam search
Same example: s_text [강남 주변에 스타벅스 위치], [어디]; s_tag [B I I I I], [B]
[Figure: forward and backward LSTMs feeding a CRF over 강남/B 주변/I 에/I 스타벅스/I 위치/I]
Deep learning Models
Sequence-to-sequence model
• Two different LSTMs: one for the input sentence, one for the output sentence
• Uses a shallow LSTM
• Reverses the input sentence
• Training: decoding & rescoring
Deep learning Models
Encoder-Decoder Architecture
Deep learning Models
Pointer Networks
• A deep learning model based on the attention mechanism of seq2seq
• Its outputs are positions (indexes) in the input sequence
• X = {A:0, B:1, C:2, D:3, <EOS>:4}
• Y = {3, 2, 0, 4}
[Figure: the encoder reads A B C D <EOS>; the decoder emits D C A <EOS> by pointing at input positions]
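A sketch of the pointing step, following the standard pointer-network attention u_j = v·tanh(W1 e_j + W2 d): the attention distribution over the encoder states is itself the output, so the model emits an input index. Weights and sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
h = 8
enc = rng.normal(size=(5, h))         # encoder states for A B C D <EOS>
W1 = rng.normal(size=(h, h)); W2 = rng.normal(size=(h, h))
v = rng.normal(size=h)

def point(dec_state):
    """Distribution over input positions for one decoder state."""
    u = np.array([v @ np.tanh(W1 @ e + W2 @ dec_state) for e in enc])
    e = np.exp(u - u.max())
    return e / e.sum()

p = point(rng.normal(size=h))
print(int(p.argmax()))                # the predicted index into the input
```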
Deep learning Models
Siamese Neural Network
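A minimal sketch of the Siamese idea: one shared encoder (same weights) maps both inputs to vectors, and a distance between those vectors scores the pair. Sizes and weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, h = 4, 8
W = rng.normal(size=(h, m)); b = np.zeros(h)    # shared by both branches

def encode(x):
    """The same function (same weights) embeds either input."""
    return np.tanh(W @ x + b)

x1, x2 = rng.normal(size=m), rng.normal(size=m)
print(np.linalg.norm(encode(x1) - encode(x2)))  # small distance = similar pair
```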
References
Jung, DEEP LEARNING FOR KOREAN NLP
Lee, 딥러닝을 이용한 한국어 의존 구문 분석 (Korean Dependency Parsing Using Deep Learning)
Park, Pointer Networks for Coreference Resolution
Park, Bi-LSTM-RNN CRF for Mention Detection
Q&A
감사합니다. (Thank you.)
λ°μ²μ, μ΅μκΈΈ, λ°μ°¬λ―Ό, μ΅μ¬ν, νλ€μ
Kangwon National University (강원대학교)
Email: [email protected]