Introduction of Online Machine Learning Algorithms
Paper Report for SDM course in 2016
Ad Click Prediction: a View from the Trenches (Online Machine Learning)
Presenters: 蔡宗倫, 洪紹嚴, 蔡佳盈
Date: 2016/12/22
https://aci.info/2014/07/12/the-data-explosion-in-2014-minute-by-minute-infographic/
Reading a 2 GB file (4 million rows, 200 variables):

READ DATA          Time            Memory
read.csv           264.5 (secs)    8.73 (GB)
fread              33.18 (secs)    2.98 (GB)
read.big.matrix    205.03 (secs)   0.2 (MB)

Fitting lm on the same data:

lm                 Time            Memory
read.csv           X               X
fread              X               X
read.big.matrix    2.72 (mins)     83.6 (MB)
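The comparison above can be reproduced with something along these lines (a sketch: the file name big.csv, the backing-file names, and the column type are placeholder assumptions, and the timings depend on the machine):

library(data.table)   # provides fread
library(bigmemory)    # provides read.big.matrix

# Base R: parses the CSV into an in-memory data.frame
t_csv   <- system.time(d1 <- read.csv("big.csv"))

# data.table::fread: a much faster parser, still fully in memory
t_fread <- system.time(d2 <- fread("big.csv"))

# bigmemory::read.big.matrix: file-backed matrix, tiny R-side footprint
t_big   <- system.time(
  d3 <- read.big.matrix("big.csv", header = TRUE, type = "double",
                        backingfile = "big.bin", descriptorfile = "big.desc")
)

rbind(read.csv = t_csv, fread = t_fread, read.big.matrix = t_big)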
Problem: training a model on big data (TB, PB, ZB) is constrained by memory and by time/accuracy.
Solutions:
• Parallel computation: Hadoop, MapReduce, Spark (TB, PB, ZB)
• R packages: read.table, bigmemory, ff (GB)
• Online learning algorithms
Online learning algorithms (for logistic regression):
• AOGD (2007, IBM): Adaptive Online Gradient Descent
• TG (2009, Microsoft): Truncated Gradient
• FOBOS (2009, Google): Forward-Backward Splitting
• RDA (2010, Microsoft): Regularized Dual Averaging
• FTRL-Proximal (2011, Google): Follow-The-Regularized-Leader Proximal
Problem: train a model on big data (TB, PB, ZB), then renew its weights as new data arrives.
• Memory and time/accuracy constraints
• Sparsity (LASSO)
• SGD/OGD (NN/GBM)
Online Gradient Descent (OGD)
• A class of algorithms for online convex optimization
• Can be formulated as a repeated game between a player and an adversary
• At round $t$, the player chooses an action $w_t$ from some convex subset $K \subseteq \mathbb{R}^n$, and then the adversary chooses a convex loss function $f_t$
• A central question is how the regret grows with the number of rounds $T$ of the game
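For reference (this definition is not spelled out on the slide), the regret after $T$ rounds in this setting is
$$R(T) = \sum_{t=1}^{T} f_t(w_t) \;-\; \min_{w \in K} \sum_{t=1}^{T} f_t(w),$$
and Zinkevich [7] showed that online gradient descent attains $R(T) = O(\sqrt{T})$.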
Online Gradient Descent (OGD)
Zinkevich considered the following gradient descent algorithm, with step size $\eta_t = \Theta(1/\sqrt{t})$:
$$w_{t+1} = \Pi_K\big(w_t - \eta_t \nabla f_t(w_t)\big)$$
Here, $\Pi_K$ denotes Euclidean projection onto the convex set $K$.
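A minimal R sketch of this update for logistic loss. It assumes an unconstrained problem ($K = \mathbb{R}^n$, so the projection is the identity), step size $\eta_t = \eta_0/\sqrt{t}$, and the toy data at the end is purely illustrative.

# Online gradient descent for logistic regression (sketch).
# Streams over rows of X with labels y in {0, 1}; no projection step.
ogd_logistic <- function(X, y, eta0 = 1) {
  w <- rep(0, ncol(X))
  for (t in seq_len(nrow(X))) {
    x_t <- X[t, ]
    p   <- 1 / (1 + exp(-sum(w * x_t)))   # predicted probability
    g   <- (p - y[t]) * x_t               # gradient of the logistic loss at w
    w   <- w - (eta0 / sqrt(t)) * g       # OGD step with eta_t = eta0 / sqrt(t)
  }
  w
}

# Toy usage (hypothetical data):
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)
y <- rbinom(100, 1, 1 / (1 + exp(-(X %*% c(1, -2)))))
ogd_logistic(X, y)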
Forward-Backward Splitting (FOBOS)
(1) Loss function of logistic regression:
$$\ell(W, X, y) = -\,y \log \sigma(W^{\top}X) - (1 - y)\log\big(1 - \sigma(W^{\top}X)\big), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$
Batch gradient descent formula: $W_{t+1} = W_t - \eta \sum_i \dfrac{\partial \ell(W_t, X_i, y_i)}{\partial W_t}$
Online gradient descent formula: $W_{t+1} = W_t - \eta_t \dfrac{\partial \ell(W_t, X_t, y_t)}{\partial W_t}$
Forward-Backward Splitting (FOBOS)
(2) The FOBOS update can be split into two parts:
$$W_{t+\frac{1}{2}} = W_t - \eta_t\, g_t, \qquad W_{t+1} = \arg\min_{W}\Big\{ \tfrac{1}{2}\lVert W - W_{t+\frac{1}{2}}\rVert^2 + \eta_{t+\frac{1}{2}}\, r(W) \Big\}$$
• First part: a fine-tuning step that stays close to the plain gradient-descent result $W_{t+\frac{1}{2}}$
• Second part: handles the regularization term and produces sparsity
$r(W)$: the regularization function (e.g. $\lambda\lVert W\rVert_1$)
Forward-Backward Splitting (FOBOS)
(3) A sufficient condition for $W_{t+1}$ to be the optimal solution of (2): $0$ belongs to its subgradient set,
$$0 \in \partial\Big\{ \tfrac{1}{2}\lVert W - W_{t+\frac{1}{2}}\rVert^2 + \eta_{t+\frac{1}{2}}\, r(W) \Big\}\Big|_{W = W_{t+1}}$$
(4) Because $W_{t+\frac{1}{2}} = W_t - \eta_t\, g_t$, (3) can be rewritten as:
$$0 \in W_{t+1} - W_t + \eta_t\, g_t + \eta_{t+\frac{1}{2}}\, \partial r(W_{t+1})$$
(5) In other words, rearranging (4):
$$W_{t+1} = W_t - \eta_t\, g_t - \eta_{t+\frac{1}{2}}\, r'(W_{t+1}), \qquad r'(W_{t+1}) \in \partial r(W_{t+1})$$
① $W_t - \eta_t\, g_t$: the state and loss gradient from before the update (the explicit, "forward" step)
② $\eta_{t+\frac{1}{2}}\, r'(W_{t+1})$: the regularization information of the current iterate (the implicit, "backward" step)
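A standard consequence (from the FOBOS paper [2], not printed on this slide): for $r(W) = \lambda\lVert W\rVert_1$ the second step has the closed-form soft-thresholding solution
$$W_{t+1,j} = \operatorname{sgn}\big(W_{t+\frac{1}{2},j}\big)\,\max\Big(\big|W_{t+\frac{1}{2},j}\big| - \eta_{t+\frac{1}{2}}\lambda,\; 0\Big),$$
so any coordinate whose magnitude after the gradient step falls below the threshold $\eta_{t+\frac{1}{2}}\lambda$ is set exactly to zero; this is how FOBOS produces sparse weights.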
FOBOS, RDA, FTRL-Proximal
All three updates can be described with the same ingredients:
(A) the accumulated past gradients
(B) the regularization function $r(W)$ (a non-smooth convex function such as $\lambda\lVert W\rVert_1$), entering through a certain subgradient of $r$
(C) a proximal term scaled by the learning rate, which guarantees that the fine-tuned solution does not move too far from $0$ or from the already-iterated solutions
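As a concrete instance of (A)–(C) (the formula below is not in the transcript; it is the FTRL-Proximal update in the notation of [5], with $g_{1:t} = \sum_{s=1}^{t} g_s$ and $\sigma_{1:t} = 1/\eta_t$):
$$W_{t+1} = \arg\min_{W}\Big( \underbrace{g_{1:t}\cdot W}_{(A)} \;+\; \underbrace{\lambda_1\lVert W\rVert_1}_{(B)} \;+\; \underbrace{\tfrac{1}{2}\textstyle\sum_{s=1}^{t}\sigma_s\lVert W - W_s\rVert_2^2}_{(C)} \Big)$$
FOBOS differs by using only the latest gradient $g_t$ and centering the proximal term at $W_t$; RDA uses the accumulated gradient $g_{1:t}$ but centers its strong-convexity term at the origin.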
FOBOS, RDA, FTRL-Proximal
• OGD: not sparse enough
• FOBOS: a gradient-descent-style method with comparatively good accuracy; it can produce better sparse features than OGD
• RDA: strikes a better balance between accuracy and sparsity; its sparsity is even better
• The key difference among them is how the accumulated L1 penalty is handled
• FTRL-Proximal: combines the accuracy of FOBOS with the sparsity of RDA
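To make this concrete, below is a minimal R sketch of the per-coordinate FTRL-Proximal update (Algorithm 1 in [5]) for logistic loss with L1/L2 regularization. The hyperparameter values and the toy data at the end are illustrative assumptions, not the paper's settings.

# Per-coordinate FTRL-Proximal for logistic regression (sketch of
# Algorithm 1 in McMahan et al., KDD 2013). Labels y must be in {0, 1}.
ftrl_proximal <- function(X, y, alpha = 0.1, beta = 1, lambda1 = 1, lambda2 = 1) {
  d <- ncol(X)
  z <- rep(0, d)   # accumulated gradients minus proximal corrections
  n <- rep(0, d)   # accumulated squared gradients (per coordinate)
  w <- rep(0, d)
  for (t in seq_len(nrow(X))) {
    x_t <- X[t, ]
    idx <- which(x_t != 0)                  # only touch non-zero features
    # Closed-form ("lazy") weights for the active coordinates
    for (i in idx) {
      if (abs(z[i]) <= lambda1) {
        w[i] <- 0                           # L1 threshold -> exact zero (sparsity)
      } else {
        w[i] <- -(z[i] - sign(z[i]) * lambda1) /
                 ((beta + sqrt(n[i])) / alpha + lambda2)
      }
    }
    p <- 1 / (1 + exp(-sum(w[idx] * x_t[idx])))   # predicted click probability
    g <- (p - y[t]) * x_t[idx]                    # gradient of the logistic loss
    sigma <- (sqrt(n[idx] + g^2) - sqrt(n[idx])) / alpha
    z[idx] <- z[idx] + g - sigma * w[idx]
    n[idx] <- n[idx] + g^2
  }
  w
}

# Toy usage (hypothetical data): many weights are expected to be exactly zero.
set.seed(1)
X <- matrix(rbinom(1000 * 20, 1, 0.3), ncol = 20)
y <- rbinom(1000, 1, 1 / (1 + exp(-(X[, 1] - X[, 2]))))
round(ftrl_proximal(X, y), 3)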
Per-Coordinate
Snapshots of the model as training examples stream in:
f(x) = 0.5A + 1.1B + 3.8C + 0.1D + 11E + 41F
f(x) = 0.4A + 0.8B + 3.8C + 0.8D + 0E + 41F
f(x) = 0.4A + 1.2B + 3.5C + 0.9D + 0.3E + 41F
Different coordinates (features) are observed and updated a different number of times, so a single global learning rate fits them poorly; this motivates a separate, per-coordinate learning rate.
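The per-coordinate learning rate used in [5] (stated here for reference; it does not appear as text on these slides) is
$$\eta_{t,i} = \frac{\alpha}{\beta + \sqrt{\sum_{s=1}^{t} g_{s,i}^{2}}},$$
so a frequently updated coordinate (large accumulated squared gradient) takes small steps, while a rarely seen coordinate keeps a large step size.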
Putting the pieces together:
Problem: train a model on big data (TB, PB, ZB) and renew its weights (per-coordinate) as new data arrives, under memory and time/accuracy constraints.
• Sparsity (LASSO)
• SGD/OGD (NN/GBM)
Solution: online learning for logistic regression with FOBOS (2009, Google), RDA (2010, Microsoft), and FTRL-Proximal (2011, Google).
R package: FTRLProximal
https://www.kaggle.com/c/avazu-ctr-prediction
Dataset size: 5.87 GB
Prediction result
References
[1] John Langford, Lihong Li & Tong Zhang. Sparse Online Learning via Truncated Gradient. Journal of Machine Learning Research, 2009.
[2] John Duchi & Yoram Singer. Efficient Online and Batch Learning using Forward Backward Splitting. Journal of Machine Learning Research, 2009.
[3] Lin Xiao. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization. Journal of Machine Learning Research, 2010.
[4] H. Brendan McMahan. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization. In AISTATS, 2011.
[5] H. Brendan McMahan, Gary Holt, D. Sculley et al. Ad Click Prediction: a View from the Trenches. In KDD, 2013.
[6] Peter Bartlett, Elad Hazan & Alexander Rakhlin. Adaptive Online Gradient Descent. Technical Report UCB/EECS-2007-82, EECS Department, University of California, Berkeley, June 2007.
[7] Martin Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. In ICML, pages 928–936, 2003.