Deep Learning Theory and Practice
Lecture 5
Introduction to deep neural networks
Dr. Ted Willke [email protected]
Tuesday, January 21, 2020
Review of Lecture 4

• Adaline ‘neuron’ minimizes the squared loss
The Adaline Algorithm:

1: $w(1) = 0$
2: for iteration $t = 1, 2, 3, \ldots$
3:   pick a point (at random) $(x^*, y^*) \in D$
4:   compute $s(t) = w(t)^T x^*$ (forward pass of ‘signal’)
5:   update the weights: $w(t+1) = w(t) + \eta \cdot (y^* - s(t)) \cdot x^*$ (backward pass of updates)
6:   $t \leftarrow t + 1$
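A minimal NumPy sketch of this loop (the function name, the uniform random pick, and the fixed iteration budget are my own illustrative choices; the slides give only the six steps above):

```python
import numpy as np

def adaline_sgd(X, y, eta=0.01, iters=1000, seed=0):
    """Sketch of the Adaline algorithm above.

    X : (N, d) data matrix (a bias column of 1s is prepended here).
    y : (N,) real-valued targets (or +/-1 labels).
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # bias coordinate x0 = 1
    w = np.zeros(Xb.shape[1])                      # step 1: w(1) = 0
    for _ in range(iters):                         # step 2: for t = 1, 2, 3, ...
        n = rng.integers(len(y))                   # step 3: pick a point at random
        s = w @ Xb[n]                              # step 4: forward pass, s = w^T x*
        w = w + eta * (y[n] - s) * Xb[n]           # step 5: backward pass of updates
    return w
```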
Review of Lecture 4

• Logistic regression: Better classification
Uses $h_w(x) = \theta(w^T x)$, where $\theta(s) = \dfrac{1}{1 + e^{-s}}$.

Gives us the probability of $y$ being the label: $P(y \mid x) = \theta(y\, w^T x)$.
• Learning should strive to maximize this joint probability over the training data:

$P(y_1, \ldots, y_N \mid x_1, \ldots, x_N) = \prod_{n=1}^{N} P(y_n \mid x_n).$
• The principle of maximum likelihood says we can do this if we minimize this error:

$E_{in}(w) = \frac{1}{N} \sum_{n=1}^{N} \ln\!\left(1 + e^{-y_n w^T x_n}\right).$
• We can’t minimize this analytically, but we can numerically/iteratively drive $\nabla_w E_{in}(w) \to 0$:

1. Compute the gradient: $g_t = \nabla E_{in}(w(t))$
2. Move in the direction: $\hat{v}_t = -g_t$
3. Update the weights: $w(t+1) = w(t) + \eta \hat{v}_t$
4. Repeat until converged! (A convex problem.)
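As a concrete instance of these four steps, here is a sketch of batch gradient descent on the cross-entropy error above (function names and the fixed iteration budget are my assumptions, not course code):

```python
import numpy as np

def cross_entropy(w, X, y):
    """E_in(w) = (1/N) * sum_n ln(1 + exp(-y_n w^T x_n))."""
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def gradient(w, X, y):
    """grad E_in(w) = -(1/N) * sum_n y_n x_n / (1 + exp(y_n w^T x_n))."""
    return -((y / (1.0 + np.exp(y * (X @ w)))) @ X) / len(y)

def logistic_gd(X, y, eta=0.1, iters=1000):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        g = gradient(w, X, y)   # 1. compute the gradient
        v = -g                  # 2. move in the direction v = -g
        w = w + eta * v         # 3. update the weights
    return w                    # 4. (a fixed budget stands in for 'until converged')
```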
Review: Gradient Descent

Ball on complicated hilly terrain:
- rolls down to a local valley
- this is called a local minimum
Questions:
1. How to get to the bottom of the deepest valley?
2. How to do this when we don’t have gravity :-)?
Today’s Lecture
• Review of gradient descent
• What is a deep neural network?
• How do we train one?
• How do we train one efficiently?
• Tutorial: Image classification using a logistic regression network

(Many slides adapted from Yaser Abu-Mostafa and Malik Magdon-Ismail, with permission of the authors. Thanks guys!)
Our $E_{in}$ has only one valley

… because $E_{in}(w)$ is a convex function of $w$.
Can you prove this for logistic regression?
How to ‘roll’ down

Assume you are at weights $w(t)$ and you take a step of size $\eta$ in the direction $\hat{v}$:

$w(t+1) = w(t) + \eta \hat{v}$

We get to select $\hat{v}$.

Select $\hat{v}$ to make $E_{in}(w(t+1))$ as small as possible.

What’s the best direction to take a step in?
The gradient is the fastest way to roll down

The change in $E_{in}$ is approximately $\Delta E_{in} \approx \eta\, \nabla E_{in}(w(t))^T \hat{v}$.

What choice of $\hat{v}$ will maximize the descent (make $\Delta E_{in}$ as negative as possible)?
Maximizing the descent

How do we maximize the descent $\Delta E_{in}$?

$\Delta E_{in} \approx \eta \nabla E_{in}(w(t))^T \hat{v} = \eta \|\nabla E_{in}(w(t))\| \|\hat{v}\| \cos\theta$, where $\theta$ is the angle between $\nabla E_{in}$ and $\hat{v}$.

The inner product is maximized when $\cos\theta = 1$, i.e., when $\hat{v}$ points in the direction of $\nabla E_{in}(w(t))$.

Therefore, we take the largest negative step when

$\hat{v} = -\dfrac{\nabla E_{in}(w(t))}{\|\nabla E_{in}(w(t))\|}.$
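A quick numerical sanity check of this claim. The quadratic bowl below is an illustrative stand-in for $E_{in}$ (not anything from the course): the normalized negative gradient beats 1,000 random unit directions.

```python
import numpy as np

H = np.array([[3.0, 1.0], [1.0, 2.0]])       # a convex quadratic 'E_in'
E = lambda w: 0.5 * w @ H @ w
gradE = lambda w: H @ w

rng = np.random.default_rng(0)
w, eta = np.array([1.0, -2.0]), 1e-4
v_star = -gradE(w) / np.linalg.norm(gradE(w))  # steepest-descent direction

best_random = min(E(w + eta * v / np.linalg.norm(v)) - E(w)
                  for v in rng.standard_normal((1000, 2)))
print(E(w + eta * v_star) - E(w) <= best_random)  # True: no direction does better
```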
‘Rolling down’ = iterating the negative gradient
The ‘Goldilocks’ step size

(Too small an $\eta$ converges slowly; too large an $\eta$ overshoots and can oscillate; ‘just right’ is in between.)
Fixed learning rate gradient descent

Define $g_t = \nabla E_{in}(w(t))$. Then take the step

$w(t+1) = w(t) - \eta\, g_t,$

an effective step of size $\eta \|g_t\|$ in the unit direction $-g_t / \|g_t\|$ (reduces step size as minimum is approached).

Gradient descent can minimize any smooth function (to a local minimum).
Summary of linear models

| Model | Credit analysis task | Error measure (algorithm) |
| --- | --- | --- |
| Perceptron | Approve or deny | Classification error (PLA) |
| Linear regression | Amount of credit | Squared error (pseudo-inverse) |
| Logistic regression | Probability of default | Cross-entropy error (gradient descent) |
Today’s Lecture
• Review of gradient descent
• What is a deep neural network?
• How do we train one?
• How do we train one efficiently?
• Tutorial: Image classification using a logistic regression network
The neural network - biologically inspired

[Figure: biological function and biological structure of a neuron]
Biological inspiration, not bio-literalism

Engineering success can draw upon biological inspiration at many levels of abstraction. We must account for the unique demands and constraints of the in-silico system.
XOR: A limitation of the linear model

[Figure: XOR data, which no single linear separator can classify]
$f = h_1 \bar{h}_2 + \bar{h}_1 h_2$, where

$h_1(x) = \mathrm{sign}(w_1^T x), \qquad h_2(x) = \mathrm{sign}(w_2^T x)$
Perceptrons for OR and AND

$\mathrm{OR}(x_1, x_2) = \mathrm{sign}(x_1 + x_2 + 1.5), \qquad \mathrm{AND}(x_1, x_2) = \mathrm{sign}(x_1 + x_2 - 1.5)$

(for inputs $x_1, x_2 \in \{-1, +1\}$; see the sketch below)
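A sketch of these two perceptrons, composed to implement the XOR decomposition $f = h_1\bar{h}_2 + \bar{h}_1 h_2$ from the previous slide (helper names are mine):

```python
import numpy as np

def sign(s):
    return np.where(s >= 0, 1, -1)

def OR(a, b):  return sign(a + b + 1.5)   # OR(x1, x2) = sign(x1 + x2 + 1.5)
def AND(a, b): return sign(a + b - 1.5)   # AND(x1, x2) = sign(x1 + x2 - 1.5)
def NOT(a):    return -a                  # negation flips a +/-1 value

def XOR(h1, h2):
    # f = h1 h2-bar + h1-bar h2, written with OR and AND
    return OR(AND(h1, NOT(h2)), AND(NOT(h1), h2))

for h1 in (-1, +1):
    for h2 in (-1, +1):
        print(h1, h2, XOR(h1, h2))  # +1 exactly when the inputs disagree
```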
Representing $f$ using OR and AND

$f = h_1 \bar{h}_2 + \bar{h}_1 h_2 = \mathrm{OR}\!\left(\mathrm{AND}(h_1, \bar{h}_2),\ \mathrm{AND}(\bar{h}_1, h_2)\right)$
The multilayer perceptron

[Figure: the XOR network drawn as a 3-layer ‘feedforward’ network built from linear signals $w^T x$; the intermediate layers are hidden layers]
Universal Approximation

Any target function $f$ that can be decomposed into linear separators can be implemented by a 3-layer MLP.
A powerful model

[Figure: Target function vs. approximations using 8 perceptrons and 16 perceptrons]
Red flags for generalization and optimization.
What tradeoff is involved here?
Minimizing $E_{in}$

The combinatorial challenge for the MLP is even greater than that of the perceptron.

$E_{in}$ is not smooth (due to $\mathrm{sign}(\cdot)$), so we cannot use gradient descent.

$\mathrm{sign}(x) \approx \tanh(x) \;\longrightarrow\;$ use gradient descent to minimize $E_{in}$.
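A two-line illustration of the smooth surrogate (values are purely illustrative):

```python
import numpy as np

x = np.linspace(-4, 4, 9)
print(np.sign(x))                      # hard threshold: not differentiable at 0
print(np.round(np.tanh(x), 2))         # smooth approximation, -> +/-1 for |x| large
print(np.round(1 - np.tanh(x)**2, 2))  # its derivative, used by backpropagation
```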
The deep neural network
input layer $l = 0$; hidden layers $0 < l < L$; output layer $l = L$
How the network operates
Weights $w_{ij}^{(l)}$, with

- $1 \le l \le L$ layers
- $0 \le i \le d^{(l-1)}$ inputs
- $1 \le j \le d^{(l)}$ outputs

Each unit outputs

$x_j^{(l)} = \theta\!\left(s_j^{(l)}\right) = \theta\!\left(\sum_{i=0}^{d^{(l-1)}} w_{ij}^{(l)}\, x_i^{(l-1)}\right)$

Apply $x$ to the input layer $x_1^{(0)}, \ldots, x_{d^{(0)}}^{(0)}$ and propagate forward to get $x_1^{(L)} = h(x)$.

$\theta(s) = \tanh(s) = \dfrac{e^s - e^{-s}}{e^s + e^{-s}}$
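A NumPy sketch of this forward computation (the weight-matrix layout, with a bias row multiplying $x_0 = 1$, is my own convention; the slides define only the equations above):

```python
import numpy as np

def forward(x, weights):
    """Forward pass: x -> x^(0) -> ... -> x^(L), with theta = tanh.

    weights[l] has shape (d^(l-1) + 1, d^(l)); row 0 multiplies the
    bias coordinate x0 = 1. Returns h(x) plus all x's and s's, which
    the backward pass sketched later reuses.
    """
    xs, ss = [np.concatenate(([1.0], x))], []
    for W in weights:
        s = W.T @ xs[-1]                                # s^(l) = (W^(l))^T x^(l-1)
        ss.append(s)
        xs.append(np.concatenate(([1.0], np.tanh(s))))  # x^(l) = theta(s^(l))
    return xs[-1][1], xs, ss                            # h(x) = x_1^(L)
```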
Today’s Lecture
• Review of gradient descent
• What is a deep neural network?
• How do we train one?
• How do we train one efficiently?
• Tutorial: Image classification using a logistic regression network
How can we efficiently train a deep network?
Gradient descent minimizes

$E_{in}(w) = \frac{1}{N} \sum_{n=1}^{N} e(h(x_n), y_n)$

by iterative steps along $-\nabla E_{in}$:

$\Delta w = -\eta \nabla E_{in}(w)$

$\nabla E_{in}$ is based on ALL examples $(x_n, y_n)$: ‘batch’ GD.

(For logistic regression, $e(h(x_n), y_n) = \ln(1 + e^{-y_n w^T x_n})$.)
The stochastic aspect
‘Average’ direction:

$\mathbb{E}_n\!\left[-\nabla e(h(x_n), y_n)\right] = -\frac{1}{N} \sum_{n=1}^{N} \nabla e(h(x_n), y_n) = -\nabla E_{in}$

Pick one $(x_n, y_n)$ at a time. Apply GD to $e(h(x_n), y_n)$.

This is stochastic gradient descent (SGD): a randomized version of GD.
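A minimal SGD sketch (the looping scheme and names are my own; the slide specifies only "pick one point, apply GD to its error"):

```python
import numpy as np

def sgd(X, y, grad_e, eta=0.1, epochs=10, seed=0):
    """Stochastic gradient descent: one randomly picked example per update.

    grad_e(w, x, y) returns the gradient of the pointwise error e(h(x), y);
    for logistic regression this is -y * x / (1 + exp(y * (w @ x))).
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for n in rng.permutation(len(y)):      # visit the data in random order
            w -= eta * grad_e(w, X[n], y[n])   # GD step on one example's error
    return w

# Usage with the logistic-regression error from earlier:
# w = sgd(X, y, lambda w, x, y: -y * x / (1 + np.exp(y * (w @ x))))
```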
Benefits of SGD
Randomization helps:

1. cheaper computation
2. randomization
3. simple

Rule of thumb: $\eta = 0.1$ works (adjust empirically, on an exponential scale).
The linear signal
Input $s^{(l)}$ is a linear combination (using the weights) of the outputs $x^{(l-1)}$ of the previous layer:

$s^{(l)} = (W^{(l)})^T x^{(l-1)}$

(recall the linear signal $s = w^T x$)
Forward propagation: Computing $h(x)$
Minimizing $E_{in}$

Using $\theta = \tanh$ makes $E_{in}$ differentiable, so we can use gradient descent (or SGD) $\longrightarrow$ a local minimum.
Gradient descent of $E_{in}$

We need the gradient of the pointwise error with respect to the weights in every layer: $\partial e / \partial W^{(l)}$ for $l = 1, \ldots, L$.
Numerical Approach

Approximate each partial derivative with a finite difference, perturbing one weight at a time: approximate, and inefficient (a separate error evaluation per weight) :-(
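For concreteness, a central-difference sketch of the numerical approach (an illustrative helper, not course code), which makes the inefficiency visible: two error evaluations per weight.

```python
import numpy as np

def numerical_gradient(e, w, eps=1e-6):
    """Finite-difference estimate of de/dw for a scalar error e(w).

    e : callable taking the flattened weight vector, returning the error.
    Cost: 2 * w.size evaluations of e, hence 'inefficient'.
    """
    g = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps                   # perturb one weight at a time
        w_minus[i] -= eps
        g[i] = (e(w_plus) - e(w_minus)) / (2 * eps)
    return g
```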
Algorithmic Approach :-)

$e$ is a function of $s^{(l)}$, and $s^{(l)} = (W^{(l)})^T x^{(l-1)}$, so (chain rule)

$\dfrac{\partial e}{\partial W^{(l)}} = x^{(l-1)} \left(\delta^{(l)}\right)^T, \qquad \delta^{(l)} \equiv \dfrac{\partial e}{\partial s^{(l)}} \ \text{(the ‘sensitivity’)}$
Computing $\delta^{(l)}$ using the chain rule

Multiple applications of the chain rule give a backward recursion:

$\delta^{(l-1)} = \theta'(s^{(l-1)}) \otimes \left[W^{(l)} \delta^{(l)}\right]$

where $\otimes$ is componentwise multiplication and the bias component of $W^{(l)} \delta^{(l)}$ is dropped.
The backpropagation algorithm
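The algorithm itself appeared as a figure on this slide. Below is a sketch that pairs with the forward() function above, using the sensitivity recursion from the previous slide; the squared pointwise error $e = (x_1^{(L)} - y)^2$ and all names are my own illustrative choices.

```python
import numpy as np

def backward(weights, xs, ss, y):
    """Backpropagation: returns grads[l], the gradient of e w.r.t. weights[l].

    xs, ss come from forward(); e = (x^(L) - y)^2 with theta = tanh.
    """
    grads = [None] * len(weights)
    # Output sensitivity: delta^(L) = de/ds^(L) = 2 (x^(L) - y) theta'(s^(L))
    delta = 2.0 * (np.tanh(ss[-1]) - y) * (1.0 - np.tanh(ss[-1]) ** 2)
    for l in range(len(weights) - 1, -1, -1):
        # x^(l-1) (delta^(l))^T in the slide's 1-indexed notation
        grads[l] = np.outer(xs[l], delta)
        if l > 0:
            # delta^(l-1) = theta'(s^(l-1)) (componentwise) [W^(l) delta^(l)],
            # with the bias component of W^(l) delta^(l) dropped
            delta = (1.0 - np.tanh(ss[l - 1]) ** 2) * (weights[l] @ delta)[1:]
    return grads
```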
Algorithm for gradient descent on $E_{in}$

Can do a batch version or a sequential version (SGD).
Digits Data
Today’s Lecture
• Review of gradient descent
• What is a deep neural network?
• How do we train one?
• How do we train one efficiently?
• Tutorial: Image classification using a logistic regression network
Further reading
• Abu-Mostafa, Y. S., Magdon-Ismail, M., Lin, H.-T. (2012) Learning from data. AMLbook.com.
• Goodfellow et al. (2016) Deep Learning. https://www.deeplearningbook.org/
• Boyd, S., and Vandenberghe, L. (2018) Introduction to Applied Linear Algebra - Vectors, Matrices, and Least Squares. http://vmls-book.stanford.edu/
• VanderPlas, J. (2016) Python Data Science Handbook. https://jakevdp.github.io/PythonDataScienceHandbook/