Screening Tests for the LASSO (hanxiaol/slides/screening.pdf)
Outline
LASSO
  Primal & dual form
  Primal-dual correspondence
Safe Test for LASSO
  Static case (example: sphere test)
  Dynamic case
Better safe test based on duality gap
  Geometric illustration
  Convergence
Empirical Results
LASSO
$X \in \mathbb{R}^{n \times p}$ where $p \gg n$
$$\min_{\beta \in \mathbb{R}^p} \ \frac{1}{2}\|X\beta - y\|_2^2 + \lambda\|\beta\|_1 \tag{1}$$
Commonly used for high-dimensional feature selection
Today's topic: screening tests
Rules that discard irrelevant features before a LASSO solver starts, without changing the final optimal solution.
The “chicken-and-egg problem”?
LASSO Dual
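Primal:
$$\min_{\beta \in \mathbb{R}^p,\, z \in \mathbb{R}^n} \ \frac{1}{2}\|y - z\|_2^2 + \lambda\|\beta\|_1 \quad \text{s.t.} \quad z = X\beta \tag{2}$$
Dual:
$$\max_{u \in \mathbb{R}^n} \ \underbrace{\min_{\beta, z} \ \underbrace{\frac{1}{2}\|y - z\|_2^2 + \lambda\|\beta\|_1 + u^\top (z - X\beta)}_{\text{Lagrangian } L(\beta, z, u)}}_{\text{Dual objective } g(u)} \tag{3}$$
$$\Longrightarrow \ \max_{u} \left[ \min_{z} \left( \frac{1}{2}\|y - z\|_2^2 + u^\top z \right) - \lambda \max_{\beta} \left( \frac{u^\top X}{\lambda}\beta - \|\beta\|_1 \right) \right] \tag{4}$$
$$\Longrightarrow \ \max_{u} \ \frac{1}{2}\|y\|_2^2 - \frac{1}{2}\|u - y\|_2^2 - \lambda I_{\left\{\left\|\frac{X^\top u}{\lambda}\right\|_\infty \le 1\right\}} \tag{5}$$
$$\overset{\theta = u/\lambda}{\Longrightarrow} \ \max_{\theta \in \mathbb{R}^n} \ \frac{1}{2}\|y\|_2^2 - \frac{\lambda^2}{2}\left\|\theta - \frac{y}{\lambda}\right\|_2^2 \quad \text{s.t.} \quad \|X^\top \theta\|_\infty \le 1 \tag{6}$$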
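The primal objective (1) and the dual objective (6) can be compared numerically. The sketch below is illustrative only: the function names, the random data, and the choice of dual point (a rescaled copy of y) are assumptions, not part of the slides. It checks weak duality, i.e. that the primal value is never below the dual value at a feasible point.

```python
import numpy as np

def primal_objective(X, y, beta, lam):
    """P(beta) = 0.5 * ||X beta - y||_2^2 + lam * ||beta||_1, as in (1)."""
    return 0.5 * np.sum((X @ beta - y) ** 2) + lam * np.sum(np.abs(beta))

def dual_objective(y, theta, lam):
    """D(theta) = 0.5 * ||y||_2^2 - 0.5 * lam^2 * ||theta - y/lam||_2^2, as in (6)."""
    return 0.5 * np.sum(y ** 2) - 0.5 * lam ** 2 * np.sum((theta - y / lam) ** 2)

def is_dual_feasible(X, theta, tol=1e-12):
    """Check the dual constraint ||X^T theta||_inf <= 1 (up to rounding)."""
    return np.max(np.abs(X.T @ theta)) <= 1.0 + tol

# Hypothetical toy data, chosen only for illustration.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
lam = 0.5 * np.max(np.abs(X.T @ y))

theta = y / np.max(np.abs(X.T @ y))     # rescaled y, so ||X^T theta||_inf = 1
beta = 0.01 * rng.standard_normal(p)    # an arbitrary primal point
gap = primal_objective(X, y, beta, lam) - dual_objective(y, theta, lam)
```

For any primal point and any dual-feasible point, `gap` is nonnegative; it is zero only at an optimal pair.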
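Karush-Kuhn-Tucker Condition
Primal-dual correspondence (hats denote optimal solutions)
$$\hat\beta, \hat z \in \arg\min_{\beta, z} L(\beta, z, \hat\theta) \tag{7}$$
$$\Downarrow$$
$$0_p \in \partial_\beta L(\hat\beta, \hat z, \hat\theta) \tag{8}$$
$$= \partial_\beta \left[ \frac{1}{2}\|y - \hat z\|_2^2 + \lambda\|\hat\beta\|_1 + \lambda\hat\theta^\top (\hat z - X\hat\beta) \right] \tag{9}$$
$$= \lambda\,\partial_\beta\|\hat\beta\|_1 - \lambda X^\top \hat\theta \tag{10}$$
$$\Downarrow$$
$$x_j^\top \hat\theta \in \partial_{\beta_j}|\hat\beta_j| = \begin{cases} \{\operatorname{sign}(\hat\beta_j)\} & \hat\beta_j \neq 0 \\ [-1, 1] & \hat\beta_j = 0 \end{cases} \quad \forall j \in [p] \tag{11}$$
Key observation: $|x_j^\top \hat\theta| < 1 \implies \hat\beta_j = 0$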
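A companion step, not written out on the slide, is stationarity of the same Lagrangian in z. The short derivation below is a sketch under the scaled form of the Lagrangian used in (9), writing hats for optimal values. It links the dual optimum to the primal residual.

```latex
\begin{align*}
0 &= \nabla_z L(\hat\beta, \hat z, \hat\theta)
   = (\hat z - y) + \lambda\hat\theta \\
\Longrightarrow\quad
\hat\theta &= \frac{y - \hat z}{\lambda} = \frac{y - X\hat\beta}{\lambda}
\end{align*}
```

So the dual optimum is the optimal residual divided by λ, and the key observation says that a feature whose correlation with this scaled residual is strictly below 1 in absolute value has a zero coefficient.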
Safe Test
$|x_j^\top \hat\theta| < 1 \implies \hat\beta_j = 0$
Challenge: the dual solution $\hat\theta$ is unknown
Workaround: relaxation
Let $C$ be a set containing $\hat\theta$ and define $\mu_C(x_j) := \sup_{\theta \in C} |x_j^\top \theta|$. Then $|x_j^\top \hat\theta| \le \mu_C(x_j)$.
Safe Test
$$\mu_C(x_j) < 1 \implies \hat\beta_j = 0 \tag{12}$$
The test is useful when
1. $\mu_C(x_j)$ can be evaluated efficiently.
2. $C$ is small, which leads to a small $\mu_C(x_j)$.
Goal: find $C$ satisfying both conditions above.
Sphere Tests
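Parametrize $C$ as a closed $\ell_2$-ball $B(c, r) = \{\theta : \|\theta - c\|_2 \le r\}$.
$$\mu_C(x_j) = \mu_{B(c,r)}(x_j) = \sup_{\theta \in B(c,r)} |x_j^\top \theta| = |c^\top x_j| + r\|x_j\|_2 \tag{13}$$
Q: How to find a $B(c, r)$ that contains $\hat\theta$ without knowing $\hat\theta$?
A: Any dual-feasible $\theta'$ defines a ball containing $\hat\theta$:
$$\Big\|\hat\theta - \underbrace{\frac{y}{\lambda}}_{c}\Big\|_2 \le \underbrace{\Big\|\theta' - \frac{y}{\lambda}\Big\|_2}_{r} \tag{14}$$
Recall:
$$\max_{\theta \in \mathbb{R}^n} \ \frac{1}{2}\|y\|_2^2 - \frac{\lambda^2}{2}\Big\|\theta - \frac{y}{\lambda}\Big\|_2^2 \quad \text{s.t.} \quad \underbrace{\|X^\top \theta\|_\infty \le 1}_{\theta \in \Delta_X}$$
A trivial feasible point: $\theta' = y / \lambda_{\max}$, where $\lambda_{\max} := \|X^\top y\|_\infty$.¹
$$\implies c = \frac{y}{\lambda}, \quad r = \|y\|_2 \left| \frac{1}{\lambda} - \frac{1}{\lambda_{\max}} \right| \tag{15}$$
¹ In fact, $\theta'$ obtained in this manner is the dual solution when $\lambda = \lambda_{\max}$, corresponding to the all-zero primal solution.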
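The static sphere test of (13) and (15) fits in a few lines of NumPy. This is a minimal sketch, not a reference implementation: the function name and the toy data are assumptions. The design matrix is taken to be the identity so that the LASSO solution is known in closed form (soft-thresholding of y at λ), which lets the result be checked by hand.

```python
import numpy as np

def static_sphere_test(X, y, lam):
    """Return a boolean mask; True means feature j is certified inactive.

    Uses the ball B(c, r) with c = y/lam and
    r = ||y||_2 * |1/lam - 1/lam_max|, as in (15).
    """
    lam_max = np.max(np.abs(X.T @ y))
    c = y / lam
    r = np.linalg.norm(y) * abs(1.0 / lam - 1.0 / lam_max)
    # mu_j = |c^T x_j| + r * ||x_j||_2, as in (13)
    mu = np.abs(X.T @ c) + r * np.linalg.norm(X, axis=0)
    return mu < 1.0

# Hypothetical toy problem with X = I, so beta_hat = soft_threshold(y, lam).
X = np.eye(4)
y = np.array([3.0, 0.1, -2.0, 0.2])
lam = 2.5
mask = static_sphere_test(X, y, lam)
# mask is [False, True, False, True]
```

Here the true solution is (0.5, 0, 0, 0). Features 2 and 4 are discarded safely. Feature 3 is also inactive but is not caught, which illustrates that the static test is safe yet conservative.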
Dynamic Safe Tests
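Iteratively apply safe tests as the algorithm proceeds
Recall: every $\theta' \in \Delta_X$ defines an $\ell_2$-ball containing $\hat\theta$
Let $\theta_k \in \Delta_X$ be a dual-feasible point at iteration $k$; then $\{\theta_k\}_{k \in \mathbb{N}}$ defines a sequence of balls $\{B(y/\lambda, \|\theta_k - y/\lambda\|_2)\}_{k \in \mathbb{N}}$
Each ball defines a safe test
How is $\theta_k$ obtained from $\beta_k$?
$$\theta_k := \Pi_{\Delta_X \cap \operatorname{span}(\rho_k)}\left(\frac{y}{\lambda}\right) \quad \text{where } \rho_k := y - X\beta_k$$
Intuition
1. Dual optimality: $\hat\theta = \Pi_{\Delta_X}(y/\lambda)$
2. Primal-dual correspondence: $\hat\theta \in \operatorname{span}(\hat\rho)$ where $\hat\rho := y - X\hat\beta$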
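Because span(ρ_k) is one-dimensional, the projection that defines θ_k reduces to choosing a scalar α and clipping it. The sketch below assumes this closed form: θ_k = α ρ_k, where α is the unconstrained minimizer of ‖α ρ_k − y/λ‖ clipped to |α| ≤ 1/‖Xᵀρ_k‖∞. The function name and data are hypothetical.

```python
import numpy as np

def dual_point_from_residual(X, y, beta, lam):
    """Project y/lam onto Delta_X intersected with span(rho), rho = y - X beta."""
    rho = y - X @ beta
    rho_sq = rho @ rho
    if rho_sq == 0.0:
        return np.zeros_like(y)          # span(rho) = {0}, and 0 is feasible
    denom = np.max(np.abs(X.T @ rho))
    # Feasibility ||X^T (alpha * rho)||_inf <= 1 means |alpha| <= 1/denom.
    bound = np.inf if denom == 0.0 else 1.0 / denom
    # Unconstrained minimizer of ||alpha * rho - y/lam||_2 over alpha.
    alpha = (y @ rho) / (lam * rho_sq)
    return np.clip(alpha, -bound, bound) * rho

# Hypothetical toy data, for illustration only.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
lam_max = np.max(np.abs(X.T @ y))
lam = 0.5 * lam_max

theta0 = dual_point_from_residual(X, y, np.zeros(p), lam)
```

With β = 0 the residual is y itself, so for λ below λ_max the result is y/λ_max, the trivial feasible point used by the static sphere test.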
Mind the Duality Gap
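Can we bound $\hat\theta$ more tightly by also using primal information?
For all $(\beta, \theta) \in \mathbb{R}^p \times \Delta_X$, we claim
$$R(\beta) \le \left\|\hat\theta - \frac{y}{\lambda}\right\| \le R(\theta) \tag{16}$$
where $\hat\theta$ is the dual optimum and
$R(\theta) = \left\|\theta - \frac{y}{\lambda}\right\|$ (dual optimality)
$R(\beta) = \frac{1}{\lambda}\sqrt{\left(\|y\|^2 - \|y - X\beta\|^2 - 2\lambda\|\beta\|_1\right)_+}$ (duality gap)
Weak duality:
$$\frac{1}{2}\|y\|^2 - \frac{\lambda^2}{2}\left\|\hat\theta - \frac{y}{\lambda}\right\|^2 \le \frac{1}{2}\|y - X\beta\|^2 + \lambda\|\beta\|_1 \tag{17}$$
Therefore, $\hat\theta$ lies in an annulus, i.e. $\hat\theta \in A\left(\frac{y}{\lambda}, R(\theta), R(\beta)\right)$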
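The lower bound in (16) follows from (17) by rearranging. The lines below spell out that step as a sketch, writing $\hat\theta$ for the dual optimum and $\beta$ for an arbitrary primal point.

```latex
\begin{align*}
\tfrac{1}{2}\|y\|^2 - \tfrac{\lambda^2}{2}\Bigl\|\hat\theta - \tfrac{y}{\lambda}\Bigr\|^2
  &\le \tfrac{1}{2}\|y - X\beta\|^2 + \lambda\|\beta\|_1 \\
\Longleftrightarrow\quad
\Bigl\|\hat\theta - \tfrac{y}{\lambda}\Bigr\|^2
  &\ge \tfrac{1}{\lambda^2}\bigl(\|y\|^2 - \|y - X\beta\|^2 - 2\lambda\|\beta\|_1\bigr)
\end{align*}
```

The left-hand side is nonnegative, so the right-hand side can be replaced by its positive part, which gives the inner radius. The outer radius comes from the projection property: the dual optimum is the point of $\Delta_X$ closest to $y/\lambda$, so it is at least as close as any dual-feasible $\theta$.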
Geometric Illustration
Recall $R(\beta) \le \left\|\hat\theta - \frac{y}{\lambda}\right\| \le R(\theta)$
Figure source: [Fercoq et al., 2015]
Fine-grained Analysis
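Two geometric observations, for $\theta \in \Delta_X$ and the dual optimum $\hat\theta$
1. $[\theta, \hat\theta] \subseteq A\left(\frac{y}{\lambda}, R(\theta), R(\beta)\right)$
Proof. $\Delta_X$ is a convex polyhedron, so $\Delta_X \cap B\left(\frac{y}{\lambda}, R(\theta)\right)$ is convex. Every point of $\Delta_X$ is at distance at least $\left\|\hat\theta - \frac{y}{\lambda}\right\| \ge R(\beta)$ from $\frac{y}{\lambda}$, so this set equals $\Delta_X \cap A\left(\frac{y}{\lambda}, R(\theta), R(\beta)\right)$, which is therefore convex and contains both $\theta$ and $\hat\theta$.
2. $\operatorname{vecAngle}\left(\theta - \hat\theta, \frac{y}{\lambda} - \hat\theta\right) \ge 90^\circ$
Proof. For every $\theta' \in [\theta, \hat\theta]$ we have $\left\|\frac{y}{\lambda} - \hat\theta\right\| \le \left\|\frac{y}{\lambda} - \theta'\right\|$, since $\theta' \in \Delta_X$ and $\hat\theta = \Pi_{\Delta_X}\left(\frac{y}{\lambda}\right)$. If the angle were less than $90^\circ$, then $\theta' := \Pi_{[\theta, \hat\theta]}\left(\frac{y}{\lambda}\right)$ would be strictly closer to $\frac{y}{\lambda}$ than $\hat\theta$, a contradiction.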
Geometric Illustration
Recall $\operatorname{vecAngle}\left(\theta - \hat\theta, \frac{y}{\lambda} - \hat\theta\right) \ge 90^\circ$
Figure source: [Fercoq et al., 2015]
Safe Tests Refined
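Two convex relaxation schemes
Sphere:
$$C_{\text{relaxed}} = B\Big(\theta, \underbrace{\sqrt{R(\theta)^2 - R(\beta)^2}}_{r(\beta, \theta)}\Big) \tag{18}$$
Dome:
$$C_{\text{relaxed}} = \operatorname{conv}(\text{darkBlueRegion}) \tag{19}$$
Figure source: [Fercoq et al., 2015]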
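The sphere relaxation (18) plugs directly into the sphere formula for μ. The sketch below is an illustration under assumptions: the function name is hypothetical, and the toy problem uses an identity design matrix so that the optimal primal-dual pair is known in closed form (soft-thresholding, with the dual point equal to the residual divided by λ).

```python
import numpy as np

def gap_safe_sphere_test(X, y, beta, theta, lam):
    """Sphere test with center theta (dual-feasible) and radius r(beta, theta).

    Returns (mask, r); mask[j] is True when feature j is certified inactive.
    """
    R_theta = np.linalg.norm(theta - y / lam)
    inner = y @ y - np.sum((y - X @ beta) ** 2) - 2.0 * lam * np.sum(np.abs(beta))
    R_beta = np.sqrt(max(inner, 0.0)) / lam
    # max(..., 0) guards against tiny negative values caused by rounding.
    r = np.sqrt(max(R_theta ** 2 - R_beta ** 2, 0.0))
    mu = np.abs(X.T @ theta) + r * np.linalg.norm(X, axis=0)
    return mu < 1.0, r

# Hypothetical toy problem with X = I; the exact solution is soft-thresholding.
X = np.eye(4)
y = np.array([3.0, 0.1, -2.0, 0.2])
lam = 2.5
beta_hat = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)   # (0.5, 0, 0, 0)
theta_hat = (y - X @ beta_hat) / lam                       # dual optimum
mask, r = gap_safe_sphere_test(X, y, beta_hat, theta_hat, lam)
# mask is [False, True, True, True]; r is zero up to rounding
```

At an optimal pair the radius collapses to zero, so every feature whose correlation with the dual optimum is strictly below 1 in absolute value is screened, and only the active feature survives.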
Convergence
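Proposition
Let $G(\beta, \theta)$ be the LASSO duality gap. For all $(\beta, \theta) \in \mathbb{R}^p \times \Delta_X$ we have $r(\beta, \theta)^2 \le \frac{2}{\lambda^2} G(\beta, \theta)$.
$$\underbrace{\frac{1}{2}\|y\|^2 - \frac{\lambda^2}{2}\left\|\theta - \frac{y}{\lambda}\right\|^2}_{\text{dual obj}} + G(\beta, \theta) = \underbrace{\frac{1}{2}\|y - X\beta\|^2 + \lambda\|\beta\|_1}_{\text{primal obj}} \tag{20}$$
$$\frac{2}{\lambda^2} G(\beta, \theta) = \underbrace{\left\|\theta - \frac{y}{\lambda}\right\|^2}_{= R(\theta)^2} - \underbrace{\frac{1}{\lambda^2}\left(\|y\|^2 - \|y - X\beta\|^2 - 2\lambda\|\beta\|_1\right)}_{\le R(\beta)^2} \tag{21}$$
$$\ge r(\beta, \theta)^2 \tag{22}$$
$\implies \lim_{k \to \infty} r(\beta_k, \theta_k) = 0$. Convergence of the domes is also implied.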
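The bound in the proposition can be observed numerically. The sketch below is illustrative and rests on several assumptions that are not from the slides: the solver is plain proximal gradient descent (ISTA), the dual point is obtained by rescaling the residual into the feasible set (a simpler choice than the projection used for the dynamic tests), and the data are random. At each iteration it records the duality gap and the squared radius.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def gap_and_radius_sq(X, y, beta, lam):
    """Return (G, r^2) for beta and a dual point built from its residual."""
    rho = y - X @ beta
    # Assumption: rescale the residual so that ||X^T theta||_inf <= 1.
    theta = rho / max(lam, np.max(np.abs(X.T @ rho)))
    primal = 0.5 * (rho @ rho) + lam * np.sum(np.abs(beta))
    R_theta_sq = np.sum((theta - y / lam) ** 2)
    dual = 0.5 * (y @ y) - 0.5 * lam ** 2 * R_theta_sq
    inner = y @ y - rho @ rho - 2.0 * lam * np.sum(np.abs(beta))
    R_beta_sq = max(inner, 0.0) / lam ** 2
    return primal - dual, max(R_theta_sq - R_beta_sq, 0.0)

# Hypothetical toy data, for illustration only.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
lam = 0.5 * np.max(np.abs(X.T @ y))
step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient

beta = np.zeros(p)
history = []                              # (gap, r^2) per iteration
for _ in range(500):
    history.append(gap_and_radius_sq(X, y, beta, lam))
    grad = X.T @ (X @ beta - y)
    beta = soft_threshold(beta - step * grad, step * lam)
```

Every recorded pair satisfies r² ≤ 2G/λ², so the screening radius shrinks as the solver drives the gap toward zero.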
Experiment Results (a)
Proportion of active variables vs. (1) number of iterations and (2) $\lambda$
Figure source: [Fercoq et al., 2015]
Experiment Results (b)
Dataset       p       n       p/n
Leukemia      7129    72      ≈ 99.0
20NewsGroup   10094   961     ≈ 10.5
RCV1          47236   20242   ≈ 2.3
Figure source: [Fercoq et al., 2015]
Conclusion
Summary
LASSO: primal & dual
Safe rules: discard irrelevant variables prior to optimization
Refined safe rules based on the duality gap
Additional Note
Safe rules can be applied to other models as well, e.g. support vector machines [Ogawa et al., 2013]
Reference I
El Ghaoui, L., Viallon, V., and Rabbani, T. (2010). Safe feature elimination in sparse supervised learning. Technical Report UCB/EECS-2010-126, EECS Department, University of California, Berkeley.
Fercoq, O., Gramfort, A., and Salmon, J. (2015). Mind the duality gap: safer rules for the lasso. arXiv preprint arXiv:1505.03410.
Ogawa, K., Suzuki, Y., and Takeuchi, I. (2013). Safe screening of non-support vectors in pathwise SVM computation. In Proceedings of the 30th International Conference on Machine Learning, pages 1382–1390.
Xiang, Z. J., Wang, Y., and Ramadge, P. J. (2014). Screening tests for lasso problems. arXiv preprint arXiv:1405.4897.