Introduction to Neural Networks
柯揚, Media IC & System Lab, 國立臺灣大學 (National Taiwan University)
Transcript of the slide deck at media.ee.ntu.edu.tw/crash_course/2019/Intro_NN.pdf

Page 1 (title slide)

Page 2

Agenda

• Origin of Neural Networks

• Math of Neural Networks

• Machine Learning

• Deep Learning

• Convolutional Neural Networks

• Recurrent Neural Networks

Page 3

Origin

[Diagram: the scientific method as a cycle — Observation → Theory → Prediction → Verification → back to Observation]

Page 4

Origin

Page 5

Origin

• What if we don’t have a theory?
  • In physics, we look for variables and their relations: f = ma
• What if we have variables but cannot figure out their relations?
• Examples:
  • Recognize a person
    • Variables: pixels of an image → identity
  • Solve a riddle
    • Variables: riddle description → solution description
  • …

Page 6

Origin

Take a look at nature

Page 7

Origin

• Hubel and Wiesel: their recordings from the cat visual cortex showed single neurons responding to simple visual features, such as edges at particular orientations

Page 8

Origin → Math

• Single cells that do a simple computation:
  • Hidden: y_i = σ(Σ_{j=1,2,3} w^H_{i,j} x_j + b^H_i) for i = 1,2,3,4
  • σ: ℝ → ℝ … activation function, traditionally tanh(x)
  • Σ_{j=1,2,3} w^H_{i,j} x_j … is a scalar product: ⟨w^H_i, x⟩
  • z = σ(W^H x + b^H) in matrix-vector notation
• Output: y = W^O z + b^O
• We call W^H and W^O weights, because they give (more or less) weight to the input
• b^H and b^O are called biases, because they bias (偏誤/偏差) the output in one direction
• Two matrix multiplications. How many MACs (multiply-accumulate operations)?

[Diagram: a fully connected network with inputs x_1, x_2, x_3, hidden units z_1 … z_4, and outputs y_1, y_2]
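The forward pass on this slide can be sketched in a few lines of numpy (assumed available). The 3-4-2 shapes follow the diagram; the weights here are random placeholders, not values from the lecture:

```python
# Sketch of the slide's 3-4-2 network: z = sigma(W_H x + b_H), y = W_O z + b_O.
import numpy as np

rng = np.random.default_rng(0)
W_H, b_H = rng.standard_normal((4, 3)), rng.standard_normal(4)  # hidden layer
W_O, b_O = rng.standard_normal((2, 4)), rng.standard_normal(2)  # output layer

x = np.array([1.0, -0.5, 2.0])          # input vector
z = np.tanh(W_H @ x + b_H)              # hidden activations
y = W_O @ z + b_O                       # outputs

# MAC count: one multiply-accumulate per weight in each matrix-vector product.
macs = W_H.size + W_O.size              # 4*3 + 2*4 = 20
```

Counting one MAC per weight answers the slide's question: 4·3 + 2·4 = 20 MACs for the two matrix-vector products (the bias additions are plain adds, not MACs).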

Page 9

Math

• Neural Network:
  • Compute z = σ(W^H x + b^H) and y = W^O z + b^O
  • x are our inputs, y are our outputs
• How do we get our parameters θ = (W^H, b^H, W^O, b^O)?
• The neural network should do something very well, so optimise it to do that…

Page 10

Math

• Example: predict tomorrow’s temperature from the same day of the last five years
  • Input x ∈ ℝ⁵: temperature vector for the last five years
  • Output y ∈ ℝ: temperature tomorrow
  • The network produces predictions ŷ ∈ ℝ
• To measure how well our neural network is doing, we need some kind of loss measure
  • Take the mean squared error in this example

[Diagram: inputs are the temperatures on July 17th of 2014–2018; the prediction target is July 17th, 2019]

Page 11

Math → Machine Learning

• We have:
  • A neural network f(x; θ) = ŷ
  • A loss: ℒ(Y, Ŷ) = (1/D) Σ_{i=1}^{D} (y_i − ŷ_i)²
  • Y = (y_1, …, y_D), Ŷ = (ŷ_1, …, ŷ_D)
• How do we get our truth, Y?
  • Collect data…
  • An observation is the best truth you can get.
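As a concrete sketch of the loss above, in plain Python with made-up temperatures:

```python
# Mean squared error from the slide: L(Y, Yhat) = (1/D) * sum_i (y_i - yhat_i)^2.
def mse(Y, Y_hat):
    D = len(Y)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(Y, Y_hat)) / D

# Three observed temperatures vs. three (made-up) predictions:
loss = mse([30.0, 31.0, 29.0], [29.0, 31.0, 31.0])  # (1 + 0 + 4) / 3
```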

Page 12

Machine Learning

• Learning from Observations

• Recipe for practitioners:
  1. Collect (lots of) examples X → Y (data + labels)
  2. Choose a suitable function f(x; θ)
  3. Choose a loss function ℒ
  4. Optimize θ to minimize ℒ

Page 13

Machine Learning

• 1. Collect data
  • Most algorithms assume that data is IID: independent and identically distributed
• An example:
  • Tell dogs apart from cats
  • Give your algorithm 990 images of dogs, 10 images of cats
  • If your algorithm always answers “dog”, what’s the error?
  → Data is not IID
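The arithmetic behind the slide's question, spelled out: the "always dog" classifier reaches 1% error while having learned nothing, so a low error rate is meaningless on data this imbalanced.

```python
# The slide's imbalance example: 990 dog images, 10 cat images.
# A classifier that always answers "dog" is wrong only on the cats.
n_dogs, n_cats = 990, 10
errors = n_cats                           # every cat is misclassified
error_rate = errors / (n_dogs + n_cats)   # 0.01, i.e. 1% error
```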

Page 14

Machine Learning

• 2. Choose a suitable function 𝑓 𝑥; 𝜃

• How to choose a good function?

• Avoid under- and overfitting
• How to check this?
  • Train on one part of the training data: (X_Train, Y_Train)
  • Validate on the other part: (X_Test, Y_Test)

Page 15

Machine Learning

• 3. Loss ℒ & 4. Optimisation

• Optimisation in high school: solve ∂ℒ(Y, f(X; θ))/∂θ = 0
• No closed-form solution…
• Instead, gradient descent:

Page 16

Machine Learning

• 3. Loss ℒ & 4. Optimisation

• Gradient Descent: iterative minimisation, take the steepest direction in each step:

• θ_{t+1} = θ_t − η ∂ℒ/∂θ
• η … step size (how far you walk before you check your direction)
• ∂ℒ/∂θ … gradient (you need a loss function that is differentiable)
• Does not give a global optimum!
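A minimal gradient-descent loop on a toy one-dimensional loss L(θ) = (θ − 3)², whose gradient is known in closed form; the step size and iteration count are arbitrary choices:

```python
# Gradient descent: theta_{t+1} = theta_t - eta * dL/dtheta.
# Toy loss L(theta) = (theta - 3)^2, so dL/dtheta = 2 * (theta - 3).
def grad(theta):
    return 2.0 * (theta - 3.0)

theta, eta = 0.0, 0.1                 # start far from the minimum at 3
for _ in range(100):
    theta = theta - eta * grad(theta) # step against the gradient
```

On this convex toy loss the iterates converge to the minimum; on a real network's non-convex loss the same loop only reaches some local optimum, as the slide warns.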

Page 17

Machine Learning → Deep Learning

• Deep Learning: machine learning with deep neural networks
  • Deep = a lot of layers
• Each layer is a function y_l = σ(W_l x_l + b_l) where
  • x_l is each layer’s input (the previous layer’s output)
  • y_l is each layer’s output
  • θ_l = (W_l, b_l) are the parameters of each layer
• Three-layer neural network:
  • y_3 = σ(W_3 σ(W_2 σ(W_1 x_1 + b_1) + b_2) + b_3)

• How do we compute the gradient to update the parameters?
  • ∂ℒ/∂θ_3 = (∂ℒ/∂y_3)(∂y_3/∂θ_3), with ∂y_3/∂b_3 = σ′ and ∂y_3/∂W_3 = σ′ x_3
  • ∂ℒ/∂θ_2 = (∂ℒ/∂y_3)(∂y_3/∂y_2)(∂y_2/∂θ_2), with ∂y_2/∂b_2 = σ′ and ∂y_2/∂W_2 = σ′ x_2
  • ∂ℒ/∂θ_1 = (∂ℒ/∂y_3)(∂y_3/∂y_2)(∂y_2/∂y_1)(∂y_1/∂θ_1), with ∂y_1/∂b_1 = σ′ and ∂y_1/∂W_1 = σ′ x_1

Page 18

Deep Learning

• How do we compute the gradient to update the parameters?
  • ∂ℒ/∂θ_3 = (∂ℒ/∂y_3)(∂y_3/∂θ_3)
  • ∂ℒ/∂θ_2 = (∂ℒ/∂y_3)(∂y_3/∂y_2)(∂y_2/∂θ_2)
  • ∂ℒ/∂θ_1 = (∂ℒ/∂y_3)(∂y_3/∂y_2)(∂y_2/∂y_1)(∂y_1/∂θ_1)
• Gradient computation with the chain rule yields shared operations
  • Compute them only once:
  • ∂ℒ/∂θ_3 = (∂ℒ/∂y_3)(∂y_3/∂θ_3)
  • ∂ℒ/∂θ_2 = (∂ℒ/∂y_2)(∂y_2/∂θ_2), reusing ∂ℒ/∂y_2 = (∂ℒ/∂y_3)(∂y_3/∂y_2)
  • ∂ℒ/∂θ_1 = (∂ℒ/∂y_1)(∂y_1/∂θ_1), reusing ∂ℒ/∂y_1 = (∂ℒ/∂y_2)(∂y_2/∂y_1)
→ Backpropagation
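A scalar sketch of these equations (tanh hidden layers, a linear output layer, squared-error loss; all numbers made up). Each ∂ℒ/∂y_l is computed once and then reused by that layer's parameter gradients, which is exactly the sharing the slide points out:

```python
import math

x, t = 0.5, 1.0                               # input and target (made up)
w1, b1, w2, b2, w3, b3 = 0.3, 0.1, -0.4, 0.2, 0.7, -0.1

# forward pass
y1 = math.tanh(w1 * x + b1)
y2 = math.tanh(w2 * y1 + b2)
y3 = w3 * y2 + b3                             # linear output layer
L = (y3 - t) ** 2

# backward pass: each dL/dy_l is computed once and shared
dL_dy3 = 2.0 * (y3 - t)
dL_dw3, dL_db3 = dL_dy3 * y2, dL_dy3

dL_dy2 = dL_dy3 * w3                          # reuse dL/dy3
s2 = 1.0 - y2 ** 2                            # tanh' of layer 2's pre-activation
dL_dw2, dL_db2 = dL_dy2 * s2 * y1, dL_dy2 * s2

dL_dy1 = dL_dy2 * s2 * w2                     # reuse dL/dy2
s1 = 1.0 - y1 ** 2
dL_dw1, dL_db1 = dL_dy1 * s1 * x, dL_dy1 * s1
```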

Page 19

Deep Learning

• In practice, for standard operations, this isn’t implemented manually

• Modern deep learning frameworks support automatic differentiation
  → Tell them what f looks like and they will take care of the rest
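To illustrate what "take care of the rest" means, here is a toy scalar reverse-mode autodiff: each operation records its inputs and local derivatives, and `backward` replays the chain rule. This is only an illustration — real frameworks such as PyTorch or JAX do this over tensors and with proper graph traversal:

```python
import math

class Var:
    """A scalar that remembers how it was computed (toy autodiff)."""
    def __init__(self, value, parents=()):
        self.value = value          # forward value
        self.parents = parents      # (parent Var, local derivative) pairs
        self.grad = 0.0             # filled in by backward()

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def tanh(self):
        t = math.tanh(self.value)
        return Var(t, ((self, 1.0 - t * t),))

    def backward(self, upstream=1.0):
        # chain rule: push upstream * local derivative into each parent
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)

# y = tanh(w * x + b); backward() yields dL/dw without any manual derivation
w, x, b = Var(0.5), Var(2.0), Var(0.1)
y = (w * x + b).tanh()
y.backward()
```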

Page 20

Deep Learning

• The non-linearity σ…
• Let’s look at the last layer’s gradient: ∂ℒ/∂θ_3 = (∂ℒ/∂y_3)(∂y_3/∂θ_3)
  • y_3 = σ(W_3 x_3 + b_3)
  • For each parameter: ∂y_3/∂b_3 = σ′ and ∂y_3/∂W_3 = σ′ x_3
  • σ′ is shorthand for σ′(W_3 x_3 + b_3)
• Traditionally: σ(x) = tanh(x) (from biology)
• What if W_3 x_3 + b_3 is very far away from 0?

Page 21

Deep Learning

• The non-linearity σ…
• Vanishing gradient:
  • The gradient may become very small in a network
• Modern non-linearity:
  • σ(x) = ReLU(x) = x if x ≥ 0, else 0
  • σ′(x) = ReLU′(x) = 1 if x ≥ 0, else 0
• Easy to compute, and only one half vanishes…
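A quick numerical look at why tanh gradients vanish far from 0, and at the ReLU alternative from the slide (the convention ReLU′(0) = 1 follows the slide):

```python
import math

def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2       # sigma'(x) for sigma = tanh

def relu(x):
    return x if x >= 0 else 0.0

def relu_grad(x):
    return 1.0 if x >= 0 else 0.0        # the slide's convention at x = 0

g_near = tanh_grad(0.0)                  # 1.0: healthy gradient near zero
g_far = tanh_grad(10.0)                  # ~1e-8: practically vanished
```

At a pre-activation of 10, tanh's gradient is already around 10⁻⁸, so almost nothing flows back through that unit; ReLU keeps a gradient of exactly 1 on the entire positive side.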

Page 22

Deep Learning

• Optimisation

• (Iterative) gradient descent: θ_{t+1} = θ_t − η ∂ℒ/∂θ
• In each iteration: compute gradients for all examples
• With big data, that takes a lot of time…
→ Stochastic Gradient Descent (SGD):
  • Randomly choose a small set of examples (called a “batch”)
  • Compute the gradient only for that subset of examples
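An SGD sketch on a deliberately trivial model: fitting a single constant c to data under squared error, so the per-example gradient is 2(c − x_i). The data, batch size, and step size are all made up:

```python
import random

random.seed(0)
data = [2.0] * 50 + [4.0] * 50          # 100 "examples" with mean 3.0
c, eta, batch_size = 0.0, 0.05, 8

for _ in range(500):
    batch = random.sample(data, batch_size)           # random subset, not all data
    grad = sum(2.0 * (c - x) for x in batch) / batch_size
    c -= eta * grad                                    # step on the batch gradient
```

Each step uses only 8 of the 100 examples, so the gradient is noisy, but on average it points the same way as the full-data gradient and c ends up fluctuating around the true mean of 3.0.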

Page 23

Deep Learning → Convolutional Neural Networks

• How to choose the right f?
• Sample from a salary questionnaire:
  • Age, Gender, Experience, Education, Salary, Bonus, …
  • Can I change their order without changing the meaning of the sample?
  • Salary, Gender, Education, Bonus, Age, Experience, …
• Natural language sentence:
  • 我喜歡機器學習 (“I like machine learning”)
  • Can I change the order?
  • 機器喜歡我學習 (the same characters reordered, with a different meaning)

Page 24

Convolutional Neural Networks

• In a lot of data, we have direct neighborhood relations
  • Images, audio, video, natural language, …
• All data that is sampled according to one (or more) changing variable(s)
  • Time series: sampled at different times
  • Images: sampled at different locations

Page 25

Convolutional Neural Networks

• Convolution: neighborhood operator

Page 26

Convolutional Neural Networks

• How to use convolutions in a neural network?
• Example for images:
  • Data has four dimensions: (B, C, H, W)
    • B “batch”: the examples in a batch
    • C “channels”: the channels of each example (RGB for an image)
    • H, W “spatial dimensions”: the intensities of a single channel of one example
  • The weights of a single layer have four dimensions: (F, C, K_H, K_W)
    • F “filters”: the filters of a layer
    • C “channels”: the channels of a layer, corresponding to the channels of the data
    • K_H, K_W “kernel dimensions”: the actual kernel that is applied to a single channel

Page 27

Convolutional Neural Networks

• How to use convolutions in a neural network?
  • Input: single image, 1 × 3 × 4 × 4
  • Weight: single kernel, 1 × 3 × 2 × 2
    • There’s one kernel slice for each channel!
  • Output: single image with a single channel, 1 × 1 × 3 × 3
  • I_R ∗ K_R + I_G ∗ K_G + I_B ∗ K_B
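A naive loop implementation of this multi-channel convolution (numpy assumed; real frameworks use heavily optimized kernels). With all-ones input and kernel, every output value equals the patch size 3·2·2 = 12, and the output shape is 1 × 1 × 3 × 3 as on the slide:

```python
import numpy as np

def conv2d(inp, ker):
    """inp: (B, C, H, W); ker: (F, C, KH, KW). Valid convolution, stride 1,
    no kernel flip (the deep-learning convention)."""
    B, C, H, W = inp.shape
    F, _, KH, KW = ker.shape
    out = np.zeros((B, F, H - KH + 1, W - KW + 1))
    for b in range(B):
        for f in range(F):
            for i in range(H - KH + 1):
                for j in range(W - KW + 1):
                    # one output value: MACs over all channels and kernel taps
                    out[b, f, i, j] = np.sum(inp[b, :, i:i+KH, j:j+KW] * ker[f])
    return out

inp = np.ones((1, 3, 4, 4))   # the slide's single RGB image
ker = np.ones((1, 3, 2, 2))   # one kernel, one slice per channel
out = conv2d(inp, ker)        # shape (1, 1, 3, 3)
```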

Page 28

Convolutional Neural Networks

• How to use convolutions in a neural network?
  • I_R ∗ K_R + I_G ∗ K_G + I_B ∗ K_B, of size 1 × 1 × 3 × 3
  • Apply the non-linearity again: ReLU(I_R ∗ K_R + I_G ∗ K_G + I_B ∗ K_B)
  • Use multiple filters:
    • Filter 1: 1 × 3 × 2 × 2 → output 1 × 1 × 3 × 3
    • Filter 2: 1 × 3 × 2 × 2 → output 1 × 1 × 3 × 3
    • Filter 3: 1 × 3 × 2 × 2 → output 1 × 1 × 3 × 3
    • Together: weights 3 × 3 × 2 × 2, output 1 × 3 × 3 × 3

Page 29

Convolutional Neural Networks

• How to use convolutions in a neural network?
• One layer:
  • Input: (B, C, H, W)
  • Filters: (F, C, K_H, K_W)
  • Output: (B, F, H − K_H + 1, W − K_W + 1)
  • Apply the non-linearity to each of the outputs
• Each layer is characterized by
  • the number of filters
  • the kernel size
• The channels are the filters of the previous layer!

Page 30

Convolutional Neural Networks

• Calculation exercise
  • How many MACs for the example with three filters?
    • Input 1 × 3 × 4 × 4
    • Filter 3 × 3 × 2 × 2
  • For N images?
    • Input N × 3 × 4 × 4
    • Filter 3 × 3 × 2 × 2
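One way to organize the count: a MAC for every kernel tap, per channel, per filter, per output position, per image. The helper below is a sketch; the batch of 8 stands in for the slide's N:

```python
def conv_macs(B, C, H, W, F, KH, KW):
    """MACs for one conv layer: valid convolution, stride 1, no padding."""
    H_out, W_out = H - KH + 1, W - KW + 1
    return B * F * H_out * W_out * C * KH * KW

single = conv_macs(1, 3, 4, 4, 3, 2, 2)   # 1 * 3 * (3*3) * 3 * (2*2) = 324
batch_n = conv_macs(8, 3, 4, 4, 3, 2, 2)  # scales linearly with batch size
```

For the slide's example this gives 324 MACs for one image, and N times that for N images.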

Page 31

Convolutional Neural Networks

• Padding: add 0s around the outside of the input data to preserve size
  • Example: 3 × 3 kernel → pad the input with one line of zeros on each side
• Pooling: reduce the spatial size
  • Higher layers work on more “abstract” data
  • Max pooling: keep the maximum of a K_H × K_W patch (no calculation)
  • Average pooling: compute the mean value of a K_H × K_W patch
  • For K_H × K_W = 2 × 2 the spatial size is reduced by a factor of 2:
    • 1 × 3 × 4 × 4 → 1 × 3 × 2 × 2
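Both pooling variants as a sketch (numpy assumed), with the stride equal to the 2 × 2 window so each spatial dimension halves:

```python
import numpy as np

def pool2x2(x, op):
    """x: (B, C, H, W) with even H, W; op: np.max or np.mean over each patch."""
    B, C, H, W = x.shape
    out = np.zeros((B, C, H // 2, W // 2))
    for i in range(H // 2):
        for j in range(W // 2):
            patch = x[:, :, 2*i:2*i+2, 2*j:2*j+2]   # one 2x2 window
            out[:, :, i, j] = op(patch, axis=(2, 3))
    return out

x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
mx = pool2x2(x, np.max)    # keeps the maximum of each patch
av = pool2x2(x, np.mean)   # averages each patch
```

The top-left patch of x is [[0, 1], [4, 5]], so max pooling keeps 5 and average pooling produces 2.5; the 4 × 4 input becomes 2 × 2 either way.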

Page 32

Convolutional Neural Networks

• Calculation exercise
  • How many MACs with padding and 2 × 2 max pooling?
    • Input 1 × 3 × 4 × 4
    • Filter 4 × 3 × 3 × 3
    • What’s the output size?
  • How many MACs with padding and 2 × 2 average pooling?
    • Input 1 × 3 × 4 × 4
    • Filter 4 × 3 × 3 × 3
    • What’s the output size?

Page 33

Convolutional Neural Networks

• Multiple layers:
  • Input 1 × 3 × 4 × 4
  • Layer 1: filter 4 × 3 × 3 × 3, with padding
  • 2 × 2 max pooling
  • Layer 2: filter 1 × 4 × 1 × 1, with padding
• Questions:
  • What’s the output of layer 1?
  • After max pooling?
  • And after layer 2?

Page 34

Classification Loss Functions

• Classification loss
• “Softmax”: P(class c | ŷ) = exp(ŷ_c) / Σ_{k=1}^{C} exp(ŷ_k)
  • Converts the prediction vector ŷ to probabilities
• Cross entropy: ℒ = −Σ_{k=1}^{C} [ y_k log P(k | ŷ) + (1 − y_k) log(1 − P(k | ŷ)) ]
  • y is a 1-hot vector: it’s 0 everywhere except at the true class
• Cats, dogs, and mice example:
  • (1, 0, 0) is the label for cat
  • (0, 1, 0) for dog
  • (0, 0, 1) for mouse
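Softmax and this cross-entropy form in plain Python, on made-up scores for the cat/dog/mouse classes. Note that the slide's formula includes the (1 − y_k) term, i.e. it also explicitly penalizes probability mass placed on the wrong classes:

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]    # non-negative, sums to 1

def cross_entropy(y, p):
    # slide's form: -sum_k [ y_k log p_k + (1 - y_k) log(1 - p_k) ]
    return -sum(yk * math.log(pk) + (1 - yk) * math.log(1 - pk)
                for yk, pk in zip(y, p))

p = softmax([2.0, 1.0, 0.1])        # made-up network scores for (cat, dog, mouse)
loss = cross_entropy([1, 0, 0], p)  # one-hot label: the true class is "cat"
```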

Page 35

Recurrent Neural Networks

• Time series / language modelling
  • Input/output at each time step
  • The weights are the same at each time step (similar to convolution)
  • Update the state: h_t = σ(V h_{t−1} + U x_{t−1} + b_h)
  • Compute the output: o_t = σ(W h_t + b_o)
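The two update equations as a loop over time steps, with scalar weights for readability (all values made up); note that the same V, U, and W are reused at every step, as the slide says:

```python
import math

V, U, b_h = 0.5, 1.0, 0.0     # state-update weights, shared across time steps
W, b_o = 2.0, 0.0             # output weights, also shared

xs = [0.1, -0.2, 0.3]         # a tiny input sequence
h = 0.0                       # initial state h_0
outputs = []
for x_prev in xs:
    h = math.tanh(V * h + U * x_prev + b_h)   # h_t = sigma(V h_{t-1} + U x_{t-1} + b_h)
    o = math.tanh(W * h + b_o)                # o_t = sigma(W h_t + b_o)
    outputs.append(o)
```

Because each h_t depends on h_{t−1}, the loop cannot be run in parallel across time steps — which is the parallelisation disadvantage the next slide raises.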

Page 36

Recurrent Neural Networks

• Disadvantage:
  • Difficult to parallelise
  • Use of a GPU is less efficient
• Multiple layers:
  • Run an RNN in one direction first (left to right)
  • Run a second RNN in the other direction, using the o_t as inputs

Page 37

Summary

• Neural networks:
  • Layers of linear functions + non-linearities
• Convolutional neural networks:
  • Layers of convolutions + non-linearities
• Recurrent neural networks:
  • Time steps of linear functions + non-linearities
• Machine learning:
  • Learning from observations
  • DFLO = Data + Function + Loss + Optimisation
