Convexity
Instructor: Taylor Berg-Kirkpatrick. Slides: Sanjoy Dasgupta
Course website: http://cseweb.ucsd.edu/classes/wi19/cse151-b/
Convexity

[Figure: a convex function; the chord joining the points of the graph above a and b lies above the graph between them.]

A function f : R^d → R is convex if for all a, b ∈ R^d and 0 < θ < 1,

    f(θa + (1 − θ)b) ≤ θ f(a) + (1 − θ) f(b).

It is strictly convex if strict inequality holds for all a ≠ b.

f is concave ⇔ −f is convex.
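The defining inequality can be spot-checked numerically. Below is a small sketch (using NumPy; the helper name and the test functions are illustrative choices, not from the slides). Passing such a check does not prove convexity, but a single violation disproves it:

```python
import numpy as np

def is_convex_on_samples(f, d, trials=1000, seed=0):
    """Spot-check f(θa + (1-θ)b) <= θf(a) + (1-θ)f(b) on random points.

    Passing the check is only evidence of convexity;
    failing it is a proof of non-convexity.
    """
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        a, b = rng.normal(size=d), rng.normal(size=d)
        theta = rng.uniform()
        lhs = f(theta * a + (1 - theta) * b)
        rhs = theta * f(a) + (1 - theta) * f(b)
        if lhs > rhs + 1e-9:
            return False
    return True

print(is_convex_on_samples(lambda z: np.dot(z, z), d=3))   # ‖z‖² passes
print(is_convex_on_samples(lambda z: -np.dot(z, z), d=3))  # -‖z‖² (concave) fails
```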
Checking convexity

A function of one variable, f : R → R, is convex if its second derivative is ≥ 0 everywhere.

Example: f(z) = z^2 has f''(z) = 2 ≥ 0 everywhere, so it is convex.
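For one-variable functions this test is easy to approximate numerically with a central finite difference (a sketch; the helper name and sample points are illustrative):

```python
def second_derivative(f, z, h=1e-4):
    """Central finite-difference approximation of f''(z)."""
    return (f(z + h) - 2 * f(z) + f(z - h)) / h**2

f = lambda z: z * z
# f''(z) = 2 at every point, consistent with convexity of z^2
print([round(second_derivative(f, z), 3) for z in (-2.0, 0.0, 3.5)])
```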
First and second derivatives of multivariate functions

For a function f : R^d → R,

• the first derivative is a vector with d entries:

    ∇f(z) = (∂f/∂z_1, …, ∂f/∂z_d)^T

• the second derivative is a d × d matrix, the Hessian H(z), with entries

    H_jk = ∂²f / ∂z_j ∂z_k
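Both objects can be approximated by finite differences. A minimal sketch (NumPy; the helper name and the test point are illustrative), verified here on f(z) = ‖z‖², whose Hessian is 2I:

```python
import numpy as np

def hessian_fd(f, z, h=1e-5):
    """Finite-difference Hessian: H_jk approximates d²f / dz_j dz_k."""
    d = len(z)
    H = np.empty((d, d))
    for j in range(d):
        for k in range(d):
            ej, ek = np.eye(d)[j] * h, np.eye(d)[k] * h
            H[j, k] = (f(z + ej + ek) - f(z + ej - ek)
                       - f(z - ej + ek) + f(z - ej - ek)) / (4 * h * h)
    return H

z = np.array([1.0, -2.0, 0.5])
H = hessian_fd(lambda v: v @ v, z)
print(np.round(H, 3))   # ≈ 2·I for f(z) = ‖z‖²
```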
Example

Find the second derivative matrix of f(z) = ‖z‖².

    f(z) = Σ_{i=1}^d z_i^2

    ∇f(z) = (2z_1, …, 2z_d)^T = 2z

    ∇²f(z) = 2I
Second-derivative test for convexity

A function f : R^d → R is convex if its matrix of second derivatives is positive semidefinite everywhere.

Recall: every square matrix M encodes a quadratic function:

    x ↦ x^T M x = Σ_{i,j=1}^d M_ij x_i x_j

(M is a d × d matrix and x is a vector in R^d.)

Sometimes x^T M x is always ≥ 0, no matter what x you plug in.
A hierarchy of square matrices (each class contains the next):

    Square:                  M ∈ R^{d×d}
    Symmetric:               M = M^T
    Positive semidefinite:   z^T M z ≥ 0 for all z ∈ R^d
    Positive definite:       z^T M z > 0 for all z ≠ 0
A symmetric matrix M is positive semidefinite (psd) if:

    z^T M z ≥ 0 for all vectors z

PSD or not?

• M = [1 1; 1 1]:

    x^T M x = x_1^2 + 2 x_1 x_2 + x_2^2 = (x_1 + x_2)^2 ≥ 0 for all x, so M is PSD.

• M = [1 2; 2 1]:

    x^T M x = x_1^2 + 4 x_1 x_2 + x_2^2 = (x_1 + x_2)^2 + 2 x_1 x_2,

  which is negative at, e.g., x = (1, −1) (it equals −2), so M is not PSD.
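These two examples can also be checked with eigenvalues: a symmetric matrix is PSD exactly when all its eigenvalues are ≥ 0 (a standard fact, not stated on the slides). A NumPy sketch:

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """A symmetric matrix is PSD iff all its eigenvalues are >= 0."""
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

A = np.array([[1.0, 1.0], [1.0, 1.0]])   # x^T A x = (x1 + x2)^2
B = np.array([[1.0, 2.0], [2.0, 1.0]])   # x^T B x = (x1 + x2)^2 + 2 x1 x2

print(is_psd(A))     # True
print(is_psd(B))     # False
x = np.array([1.0, -1.0])
print(x @ B @ x)     # -2.0: a witness vector showing B is not PSD
```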
A symmetric matrix M is positive semidefinite (psd) if:

    z^T M z ≥ 0 for all vectors z

When is a diagonal matrix PSD?

    A = diag(a_1, a_2, …, a_n)

    x^T A x = a_1 x_1^2 + a_2 x_2^2 + ⋯ + a_n x_n^2

This is ≥ 0 for every x exactly when all the diagonal entries a_i are ≥ 0.
A symmetric matrix M is positive semidefinite (psd) if:

    z^T M z ≥ 0 for all vectors z

If M, N are of the same size and PSD, must M + N be PSD?

    x^T (M + N) x = x^T M x + x^T N x ≥ 0 + 0 = 0,

so yes: the sum of PSD matrices is PSD.
Checking if a matrix is PSD

A matrix M is PSD if and only if it can be written as M = UU^T for some matrix U.

Quick check: say U ∈ R^{r×d} and M = UU^T.

1. M is square (r × r).
2. M is symmetric: M^T = (UU^T)^T = UU^T = M.
3. Pick any z ∈ R^r. Then

    z^T M z = z^T U U^T z = (z^T U)(U^T z) = (U^T z)^T (U^T z) = ‖U^T z‖^2 ≥ 0.

Another useful fact: any covariance matrix is PSD.
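The quick check is easy to reproduce numerically for a random U (a sketch; the shapes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(4, 6))      # any U works; here r = 4, d = 6
M = U @ U.T                      # M is 4 x 4

print(np.allclose(M, M.T))                         # symmetric
print(bool(np.all(np.linalg.eigvalsh(M) >= -1e-10)))  # PSD
z = rng.normal(size=4)
print(np.isclose(z @ M @ z, np.linalg.norm(U.T @ z) ** 2))  # z^T M z = ‖U^T z‖²
```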
Another useful fact: any covariance matrix is PSD.

    Σ = (1/N) (X − M)^T (X − M)
      = ( (1/√N)(X − M) )^T ( (1/√N)(X − M) ),

where X, M ∈ R^{N×d} and M stacks the mean vector μ in every row:

    M = (μ; …; μ)

So Σ = UU^T with U = (1/√N)(X − M)^T, and it is PSD by the fact above.
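A numerical illustration of this factorization on synthetic data (a sketch; the data and shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 100, 3
X = rng.normal(size=(N, d))            # N data points in R^d
Mu = np.tile(X.mean(axis=0), (N, 1))   # each row is the mean vector μ
Sigma = (X - Mu).T @ (X - Mu) / N      # covariance matrix, d x d

# Sigma = U U^T with U = (X - Mu)^T / sqrt(N), hence PSD:
print(bool(np.all(np.linalg.eigvalsh(Sigma) >= -1e-10)))   # True
```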
Second-derivative test for convexity

A function (of several variables) is convex if its second-derivative matrix is positive semidefinite everywhere.

More formally: suppose that for f : R^d → R, the second partial derivatives exist everywhere and are continuous functions of z. Then:

1. H(z) is a symmetric matrix.
2. f is convex ⇔ H(z) is positive semidefinite for all z ∈ R^d.
Example

Is f(x) = ‖x‖^2 convex?

• Recall: ∇²f(x) = 2I.
• This is a diagonal matrix with all positive entries along the diagonal, hence PSD everywhere, so f is convex.
Fix any vector u ∈ R^d. Is this function f : R^d → R convex?

    f(z) = (u · z)^2

    ∇f(z) = 2(u · z) u

    ∇²f(z) = 2 [ u_1^2    u_1 u_2  ⋯  u_1 u_d
                 u_1 u_2  u_2^2    ⋯  u_2 u_d
                   ⋮        ⋮             ⋮
                 u_1 u_d  u_2 u_d  ⋯  u_d^2  ]  = 2uu^T

This has the form UU^T (take U = √2 u as a single column), so the Hessian is PSD everywhere: f is convex.
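Checking this Hessian numerically (a sketch; the choice of u is arbitrary) also shows it is PSD but not positive definite, since 2uu^T has rank 1:

```python
import numpy as np

u = np.array([3.0, -1.0, 2.0])
H = 2 * np.outer(u, u)           # ∇²f for f(z) = (u · z)²

# H = U U^T with U the single column √2·u, so it is PSD:
print(bool(np.all(np.linalg.eigvalsh(H) >= -1e-10)))   # True
print(np.linalg.matrix_rank(H))  # 1: PSD but not positive definite
```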
Least-squares regression

Recall loss function: for data points (x^(i), y^(i)) ∈ R^d × R,

    L(w) = Σ_{i=1}^n (y^(i) − w · x^(i))^2
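Differentiating this loss twice (a standard computation not shown on the slide) gives the Hessian 2 Σ_i x^(i) x^(i)T = 2 X^T X, where the rows of X are the x^(i). That is again of the UU^T form, so the loss is convex; a NumPy sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))   # rows are the data points x^(i)

# Hessian of L(w) = Σ (y^(i) - w·x^(i))² is 2 Σ x^(i) x^(i)T = 2 X^T X,
# i.e. U U^T with U = √2 X^T, hence PSD:
H = 2 * X.T @ X
print(bool(np.all(np.linalg.eigvalsh(H) >= -1e-10)))   # True: the loss is convex
```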
Logistic regression

Recall loss function: for data (x^(i), y^(i)) ∈ R^d × {−1, +1},

    L(w) = Σ_{i=1}^n ln(1 + e^{−y^(i) (w · x^(i))})

We earlier found the first derivative:

    ∂L/∂w_j = − Σ_{i=1}^n y^(i) x_j^(i) · 1 / (1 + e^{y^(i) (w · x^(i))}).
Logistic regression, cont’d

Second derivative: the (j, k) entry of the Hessian H(w) is

    ∂²L/∂w_k ∂w_j = Σ_{i=1}^n x_j^(i) x_k^(i) · 1/(1 + e^{w · x^(i)}) · 1/(1 + e^{−w · x^(i)})

This is u_j · u_k, where vectors u_1, …, u_d ∈ R^n are defined as follows: u_j has ith coordinate

    x_j^(i) · √( 1 / ((1 + e^{w · x^(i)})(1 + e^{−w · x^(i)})) )

Therefore H(w) = UU^T, where U is the matrix with rows u_j. Convex!
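This factorization can be reproduced numerically (a sketch on synthetic data; the shapes and the random w are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 40, 3
X = rng.normal(size=(n, d))      # rows are the x^(i)
w = rng.normal(size=d)

s = X @ w                                            # w · x^(i) for each i
scale = 1.0 / ((1 + np.exp(s)) * (1 + np.exp(-s)))   # per-example factor

# U has rows u_j: the (j, i) entry is x_j^(i) * sqrt(scale_i)
U = (X * np.sqrt(scale)[:, None]).T                  # d x n
H = U @ U.T                                          # the Hessian H(w)
print(bool(np.all(np.linalg.eigvalsh(H) >= -1e-10)))  # True: PSD, so L is convex
```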