Learning From Data, Lecture 13
Validation and Model Selection

The Validation Set
Model Selection
Cross Validation

M. Magdon-Ismail, CSCI 4100/6100
recap: Regularization
Regularization combats the effects of noise by putting a leash on the algorithm:

    Eaug(h) = Ein(h) + (λ/N) Ω(h)

Ω(h) favors smooth, simple h; noise is rough and complex.

Different regularizers give different results, and we can choose λ, the amount of regularization.

[Figure: fits for λ = 0, 0.0001, 0.01, 1; each panel shows the data, the target, and the fit, moving from overfitting (λ = 0) to underfitting (λ = 1).]

The optimal λ balances approximation and generalization, bias and variance.
© AML. Creator: Malik Magdon-Ismail.
Validation: A Sneak Peek at Eout
    Eout(g) = Ein(g) + overfit penalty

The VC bound controls the overfit penalty with a complexity error bar for H; regularization estimates it through a heuristic complexity penalty for g.

Validation goes directly for the jugular: it estimates Eout(g) itself.

An in-sample estimate of Eout is the Holy Grail of learning from data.
The Test Set
Set aside from D (N data points) a test set Dtest of K points that the learning algorithm never sees, and evaluate the final hypothesis g on it:

    ek = e(g(xk), yk),  k = 1, ..., K

    Etest = (1/K) Σ_{k=1}^{K} ek

Etest is an estimate for Eout(g). Since E_{Dtest}[ek] = Eout(g),

    E[Etest] = (1/K) Σ_{k=1}^{K} E[ek] = (1/K) Σ_{k=1}^{K} Eout(g) = Eout(g).

Because e1, ..., eK are independent,

    Var[Etest] = (1/K²) Σ_{k=1}^{K} Var[ek] = (1/K) Var[e],

which decreases like 1/K. Bigger K ⟹ more reliable Etest.
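Not from the slides: a small simulation of this variance calculation. The target f, hypothesis g, and squared error below are hypothetical stand-ins; the only point is that Var[Etest] shrinks like 1/K.

```python
# A simulation sketch: Var[E_test] over repeated test-set draws should
# shrink roughly like 1/K. Target, hypothesis, and error are assumed.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(np.pi * x)          # assumed target

def g(x):
    return 0.8 * x                    # assumed final hypothesis

def E_test(K):
    """Average squared error of g on K fresh test points."""
    x = rng.uniform(-1, 1, K)
    e = (g(x) - f(x)) ** 2            # pointwise errors e_k
    return e.mean()

trials = 2000
v_small = np.var([E_test(10) for _ in range(trials)])
v_large = np.var([E_test(100) for _ in range(trials)])
ratio = v_small / v_large             # should be close to 100/10 = 10
print(ratio)
```

With K ten times larger, the variance of the estimate drops by roughly a factor of ten, matching the (1/K) Var[e] formula.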
The Validation Set
Split D (N data points) into Dtrain (N − K training points) and Dval (K validation points). Learning on Dtrain produces g⁻, which is then evaluated on Dval:

    ek = e(g⁻(xk), yk)  →  e1, e2, ..., eK

    Eval = (1/K) Σ_{k=1}^{K} ek  →  estimates Eout(g⁻)

1. Remove K points from D: D = Dtrain ∪ Dval.
2. Learn using Dtrain → g⁻.
3. Test g⁻ on Dval → Eval.
4. Use the error Eval to estimate Eout(g⁻).
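The four steps can be sketched for a toy setup; the least-squares line, synthetic sinusoidal data, and squared error are assumed specifics, not the lecture's:

```python
# A minimal sketch of validation steps 1-4 with assumed data and model.
import numpy as np

rng = np.random.default_rng(1)

N, K = 50, 10
x = rng.uniform(-1, 1, N)
y = np.sin(np.pi * x) + 0.1 * rng.normal(size=N)

# 1. Remove K points from D: D = D_train ∪ D_val.
x_tr, y_tr = x[:-K], y[:-K]
x_val, y_val = x[-K:], y[-K:]

# 2. Learn using D_train -> g_minus (here: a least-squares line).
Z_tr = np.column_stack([np.ones(N - K), x_tr])
w = np.linalg.lstsq(Z_tr, y_tr, rcond=None)[0]

# 3. Test g_minus on D_val -> E_val (mean squared error).
Z_val = np.column_stack([np.ones(K), x_val])
E_val = np.mean((Z_val @ w - y_val) ** 2)

# 4. Use E_val as the estimate of E_out(g_minus).
print(E_val)
```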
Reliability of Eval

Eval is an estimate for Eout(g⁻). Since E_{Dval}[ek] = Eout(g⁻),

    E[Eval] = (1/K) Σ_{k=1}^{K} E[ek] = (1/K) Σ_{k=1}^{K} Eout(g⁻) = Eout(g⁻).

Because e1, ..., eK are independent,

    Var[Eval] = (1/K²) Σ_{k=1}^{K} Var[ek] = (1/K) Var[e(g⁻)],

which decreases like 1/K; the error bar depends on g⁻, not on H.

Bigger K ⟹ more reliable Eval?
Choosing K

[Figure: expected Eval versus the size of the validation set, K (ticks at 10, 20, 30).]

Rule of thumb: K* = N/5.
Restoring D
[Figure: D (N points) splits into Dtrain (N − K points), which produces g⁻, and Dval (K points), which produces Eval(g⁻); g, retrained on all of D, goes to the CUSTOMER.]

Primary goal: output the best hypothesis. Hand over g, which was trained on all the data.

Secondary goal: estimate Eout(g). The catch: g⁻, not g, is what sat behind closed doors during validation.

    Eout(g)    Eout(g⁻)
       ↓           ↓
    Ein(g)     Eval(g⁻)

Which should we use?
Eval Versus Ein
    Eout(g) ≤ Ein(g) + O( √( (dvc/N) · log N ) )          ← biased; error bar depends on H

    Eout(g) ≤ Eout(g⁻) ≤ Eval(g⁻) + O( 1/√K )             ← unbiased; error bar depends on g⁻

The first inequality in the second line holds because the learning curve is decreasing (a practical truth, not a theorem).

Eval(g⁻) usually wins as an estimate for Eout(g), especially when the learning curve is not steep.
Model Selection
The most important use of validation.

    H1      H2      H3     ···     HM
     ↓       ↓       ↓              ↓       (learn on Dtrain)
    g1⁻     g2⁻     g3⁻    ···    gM⁻
     ↓       ↓       ↓              ↓       (evaluate on Dval)
    E1      E2      E3     ···     EM
Pick The Best Model According to Validation Error
Pick m* = argmin_{1 ≤ m ≤ M} Em, and output the winning model Hm* with its hypothesis gm*⁻.
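The full selection pipeline can be sketched with assumed specifics: here the "models" H1..HM are polynomial degrees fit by least squares on synthetic data, with squared error; none of these choices come from the lecture.

```python
# Sketch: model selection by validation error over hypothetical models.
import numpy as np

rng = np.random.default_rng(2)

N, K = 60, 12
x = rng.uniform(-1, 1, N)
y = np.sin(np.pi * x) + 0.2 * rng.normal(size=N)
x_tr, y_tr = x[:-K], y[:-K]
x_val, y_val = x[-K:], y[-K:]

degrees = [1, 2, 3, 5, 8]                      # hypothetical H_1..H_M

# Learn g_m_minus on D_train, evaluate E_m on D_val.
E = []
for d in degrees:
    w = np.polyfit(x_tr, y_tr, d)              # g_m_minus
    E.append(np.mean((np.polyval(w, x_val) - y_val) ** 2))

m_star = int(np.argmin(E))                     # best validation error
g_final = np.polyfit(x, y, degrees[m_star])    # restore D: retrain on all data
print(degrees[m_star], E[m_star])
```

Note the final refit on all of D, mirroring the "restoring D" step that follows.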
Eval(gm*⁻) Is Not Unbiased for Eout(gm*⁻)

[Figure: expected error versus validation set size K (ticks at 5, 15, 25; errors between 0.5 and 0.8). Eval(gm*⁻) lies below Eout(gm*⁻).]

... because we chose one of the M finalists:

    Eout(gm*⁻) ≤ Eval(gm*⁻) + O( √( ln M / K ) )

the VC error bar for selecting one hypothesis from M using a data set of size K.
Restoring D
After selection, retrain the winning model on all of D:

    H1      H2      H3     ···     HM
     ↓       ↓       ↓              ↓       (learn on all of D)
    g1      g2      g3     ···     gM

The model with the best g⁻ also has the best g            ← leap of faith
We can find the model with the best g⁻ using validation   ← true modulo the Eval error bar
Comparing Ein and Eval for Model Selection
[Figure: expected Eout versus validation set size K (ticks at 5, 15, 25; Eout between 0.48 and 0.56). Curves: the optimal model, validation selection with the retrained gm*, validation selection with gm*⁻, and in-sample selection. Inset: the selection pipeline, training g1..gM on Dtrain, picking (Hm*, Em*) by the best Em on Dval.]
Application to Selecting λ
Which regularization parameter should we use? Candidates:

    λ1, λ2, ..., λM

This is a special case of model selection over M models:

    (H, λ1)   (H, λ2)   (H, λ3)   ···   (H, λM)
        ↓         ↓         ↓              ↓
      g1⁻       g2⁻       g3⁻      ···   gM⁻

Picking a model amounts to choosing the optimal λ.
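A minimal sketch of λ selection by validation, using the weight-decay solution wreg = (ZᵀZ + λI)⁻¹Zᵀy that appears on the computational slide; the data, feature transform, and λ grid here are assumptions for illustration.

```python
# Sketch: choosing lambda by validation for linear regression with
# weight decay. Data, features, and the lambda grid are assumed.
import numpy as np

rng = np.random.default_rng(3)

N, K = 40, 8
x = rng.uniform(-1, 1, N)
y = np.sin(np.pi * x) + 0.3 * rng.normal(size=N)
Z = np.vander(x, 7, increasing=True)           # polynomial features, degree 6
Z_tr, y_tr = Z[:-K], y[:-K]
Z_val, y_val = Z[-K:], y[-K:]

def w_reg(Z, y, lam):
    """Weight-decay solution w = (Z^T Z + lam*I)^{-1} Z^T y."""
    I = np.eye(Z.shape[1])
    return np.linalg.solve(Z.T @ Z + lam * I, Z.T @ y)

lambdas = [0.0, 1e-4, 1e-2, 1.0]               # the M models (H, lambda_m)
E = [np.mean((Z_val @ w_reg(Z_tr, y_tr, lam) - y_val) ** 2) for lam in lambdas]
lam_star = lambdas[int(np.argmin(E))]          # chosen regularization
print(lam_star, min(E))
```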
The Dilemma When Choosing K
Validation relies on the following chain of reasoning:

    Eout(g)  ≈  Eout(g⁻)  ≈  Eval(g⁻)
         (small K)     (large K)
Can we get away with K = 1?
Yes, almost!
The Leave One Out Error (K = 1)
[Figure: leave out the single point (x1, y1), fit g1⁻ to the remaining N − 1 points, and set e1 = e(g1⁻(x1), y1).]

    E[e1] = Eout(g1⁻)

... but it is a wild estimate.
The Leave One Out Errors

[Figure: three panels; leaving out (x1, y1), (x2, y2), (x3, y3) in turn gives errors e1, e2, e3.]

Each en satisfies E[en] = Eout(gn⁻); the cross-validation error is their average:

    Ecv = (1/N) Σ_{n=1}^{N} en
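Ecv can be computed directly by N retrainings. A brute-force sketch for a least-squares line (the data and model are assumed for illustration):

```python
# Brute-force E_cv: N retrainings, each leaving out one point.
import numpy as np

rng = np.random.default_rng(4)

N = 20
x = rng.uniform(-1, 1, N)
y = np.sin(np.pi * x) + 0.2 * rng.normal(size=N)
Z = np.column_stack([np.ones(N), x])

errors = []
for n in range(N):
    keep = np.arange(N) != n                   # leave out point n
    w = np.linalg.lstsq(Z[keep], y[keep], rcond=None)[0]   # g_n_minus
    errors.append((Z[n] @ w - y[n]) ** 2)      # e_n = e(g_n_minus(x_n), y_n)

E_cv = np.mean(errors)
print(E_cv)
```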
Cross Validation is Unbiased
Theorem. Ecv is an unbiased estimate of Ēout(N − 1), the expected Eout when learning with N − 1 points.
Reliability of Ecv
en and em are not independent: en depends on gn⁻, which was trained on (xm, ym), while em is evaluated on (xm, ym).

[Figure: the effective number of fresh examples giving a comparable estimate of Eout lies between 1 (a single en) and N.]
Cross Validation is Computationally Intensive
N sessions of learning, each on a data set of size N − 1. Two ways to lighten the load:

• Analytic approaches. For example, for linear regression with weight decay,

      wreg = (ZᵀZ + λI)⁻¹ Zᵀy,        H(λ) = Z (ZᵀZ + λI)⁻¹ Zᵀ,

  the leave-one-out error has a closed form (ŷn is the in-sample prediction on all of D):

      Ecv = (1/N) Σ_{n=1}^{N} ( (ŷn − yn) / (1 − Hnn(λ)) )²

• 10-fold cross validation. Partition D into D1, D2, ..., D10; validate on one part and train on the other nine, rotating through all ten parts.
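The analytic shortcut can be sanity-checked against brute-force leave-one-out: for linear regression with weight decay and a fixed λ, the closed form is exact. A sketch (the data and feature transform are assumptions):

```python
# Checking the analytic E_cv formula against brute-force leave-one-out
# for weight decay; the two agree to numerical precision.
import numpy as np

rng = np.random.default_rng(5)

N, lam = 25, 0.1
x = rng.uniform(-1, 1, N)
y = np.sin(np.pi * x) + 0.2 * rng.normal(size=N)
Z = np.vander(x, 4, increasing=True)           # features 1, x, x^2, x^3
I = np.eye(Z.shape[1])

# Analytic: H = Z (Z^T Z + lam*I)^{-1} Z^T, y_hat = H y,
# E_cv = (1/N) sum ((y_hat_n - y_n) / (1 - H_nn))^2.
H = Z @ np.linalg.solve(Z.T @ Z + lam * I, Z.T)
y_hat = H @ y
E_cv_analytic = np.mean(((y_hat - y) / (1 - np.diag(H))) ** 2)

# Brute force: N separate regularized fits, each leaving out one point.
errs = []
for n in range(N):
    keep = np.arange(N) != n
    w = np.linalg.solve(Z[keep].T @ Z[keep] + lam * I, Z[keep].T @ y[keep])
    errs.append((Z[n] @ w - y[n]) ** 2)
E_cv_brute = np.mean(errs)

print(E_cv_analytic, E_cv_brute)
```

One fit replaces N fits: the analytic route costs a single matrix solve instead of N of them.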
Restoring D
[Figure: leaving out each (xn, yn) in turn gives g1⁻, g2⁻, ..., gN⁻ and errors e1, e2, ..., eN; take the average to get Ecv. The final g, trained on all of D, goes to the CUSTOMER.]

    Eout(g(N))  ≤  Ēout(N − 1)  ≤  Ecv + O( 1/√N )
        ↑ learning curve            ↑ nearly independent en

Ecv can be used for model selection just as Eval can, for example to choose λ.
Digits Problem: ‘1’ Versus ‘Not 1’
[Figure: left, the digits data in the (average intensity, symmetry) plane, '1' versus 'not 1'; right, Ein, Ecv, and Eout versus the number of features used (5 to 20), with errors between 0.01 and 0.03.]

    x = (1, x1, x2)

    z = (1, x1, x2, x1^2, x1x2, x2^2, x1^3, x1^2x2, x1x2^2, x2^3, ..., x1^5, x1^4x2, x1^3x2^2, x1^2x2^3, x1x2^4, x2^5)

the 5th-order polynomial transform: a 20-dimensional nonlinear feature space.
Validation Wins In the Real World
[Figure: the two decision boundaries in the (average intensity, symmetry) plane.]

no validation (20 features):     Ein = 0%,     Eout = 2.5%
cross validation (6 features):   Ein = 0.8%,   Eout = 1.5%