Alternating Direction Method of Multipliers - Stanford University
Alternating Direction Method of Multipliers
Prof S. Boyd
EE364b, Stanford University
source:
Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers (Boyd, Parikh, Chu, Peleato, Eckstein)
Goals
robust methods for
◮ arbitrary-scale optimization
– machine learning/statistics with huge data-sets
– dynamic optimization on large-scale network
◮ decentralized optimization
– devices/processors/agents coordinate to solve large problem, by passing relatively small messages
Outline
Dual decomposition
Method of multipliers
Alternating direction method of multipliers
Common patterns
Examples
Consensus and exchange
Conclusions
Dual problem
◮ convex equality constrained optimization problem
minimize f(x)
subject to Ax = b

◮ Lagrangian: L(x, y) = f(x) + y^T(Ax − b)
◮ dual function: g(y) = inf_x L(x, y)
◮ dual problem: maximize g(y)
◮ recover x⋆ = argmin_x L(x, y⋆)
Dual ascent
◮ gradient method for dual problem: y^{k+1} = y^k + α^k ∇g(y^k)
◮ ∇g(y^k) = Ax − b, where x = argmin_x L(x, y^k)
◮ dual ascent method is

x^{k+1} := argmin_x L(x, y^k) // x-minimization
y^{k+1} := y^k + α^k (Ax^{k+1} − b) // dual update
◮ works, with lots of strong assumptions
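As a concrete illustration, a minimal Python sketch of dual ascent on a toy instance with f(x) = (1/2)‖x‖_2^2, so the x-minimization has a closed form; the data and step size rule are illustrative assumptions, not from the slides:

```python
import numpy as np

# toy instance: minimize (1/2)||x||^2 subject to Ax = b
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 10))
b = rng.standard_normal(3)

y = np.zeros(3)                                   # dual variable
alpha = 1.0 / np.linalg.eigvalsh(A @ A.T).max()   # safe fixed step size

for k in range(500):
    x = -A.T @ y                 # x-minimization: argmin_x (1/2)||x||^2 + y^T(Ax - b)
    y = y + alpha * (A @ x - b)  # dual update along grad g(y) = Ax - b

print(np.linalg.norm(A @ x - b))  # primal residual, driven toward zero
```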
Dual decomposition
◮ suppose f is separable:
f(x) = f_1(x_1) + · · · + f_N(x_N),   x = (x_1, . . . , x_N)

◮ then L is separable in x: L(x, y) = L_1(x_1, y) + · · · + L_N(x_N, y) − y^T b,

L_i(x_i, y) = f_i(x_i) + y^T A_i x_i

◮ x-minimization in dual ascent splits into N separate minimizations

x_i^{k+1} := argmin_{x_i} L_i(x_i, y^k)
which can be carried out in parallel
Dual decomposition
◮ dual decomposition (Everett, Dantzig, Wolfe, Benders 1960–65)
x_i^{k+1} := argmin_{x_i} L_i(x_i, y^k),   i = 1, . . . , N

y^{k+1} := y^k + α^k (∑_{i=1}^N A_i x_i^{k+1} − b)

◮ scatter y^k; update x_i in parallel; gather A_i x_i^{k+1}
◮ solve a large problem
– by iteratively solving subproblems (in parallel)
– dual variable update provides coordination
◮ works, with lots of assumptions; often slow
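A sketch of the same toy instance split into blocks (f_i(x_i) = (1/2)‖x_i‖_2^2, A partitioned into column blocks A_i); the serial loop over i stands in for parallel workers, and the data are again illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
blocks = [rng.standard_normal((3, 2)) for _ in range(5)]   # A_1, ..., A_5
b = rng.standard_normal(3)

A_full = np.hstack(blocks)
alpha = 1.0 / np.linalg.eigvalsh(A_full @ A_full.T).max()  # safe fixed step size
y = np.zeros(3)

for k in range(1000):
    xs = [-Ai.T @ y for Ai in blocks]                # N independent x_i-minimizations
    Ax = sum(Ai @ xi for Ai, xi in zip(blocks, xs))  # gather A_i x_i
    y = y + alpha * (Ax - b)                         # dual update coordinates the blocks
```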
Method of multipliers
◮ a method to robustify dual ascent
◮ use augmented Lagrangian (Hestenes, Powell 1969), ρ > 0
Lρ(x, y) = f(x) + y^T(Ax − b) + (ρ/2)‖Ax − b‖_2^2
◮ method of multipliers (Hestenes, Powell; analysis in Bertsekas 1982)
x^{k+1} := argmin_x Lρ(x, y^k)
y^{k+1} := y^k + ρ(Ax^{k+1} − b)
(note specific dual update step length ρ)
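For the same toy equality-constrained quadratic as before, the method of multipliers is a small change: minimize the augmented Lagrangian, then take the dual step with length exactly ρ. A minimal sketch with illustrative data:

```python
import numpy as np

# minimize (1/2)||x||^2 subject to Ax = b
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 10))
b = rng.standard_normal(3)

rho = 1.0
y = np.zeros(3)
M = np.eye(10) + rho * A.T @ A   # Hessian of L_rho in x; constant across iterations

for k in range(50):
    x = np.linalg.solve(M, A.T @ (rho * b - y))  # x-minimization of L_rho(x, y)
    y = y + rho * (A @ x - b)                    # dual update with step length rho

print(np.linalg.norm(A @ x - b))  # primal residual, driven toward zero
```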
Method of multipliers dual update step
◮ optimality conditions (for differentiable f):
Ax⋆ − b = 0,   ∇f(x⋆) + A^T y⋆ = 0
(primal and dual feasibility)
◮ since x^{k+1} minimizes Lρ(x, y^k),

0 = ∇x Lρ(x^{k+1}, y^k)
  = ∇f(x^{k+1}) + A^T (y^k + ρ(Ax^{k+1} − b))
  = ∇f(x^{k+1}) + A^T y^{k+1}
◮ dual update y^{k+1} = y^k + ρ(Ax^{k+1} − b) makes (x^{k+1}, y^{k+1}) dual feasible
◮ primal feasibility achieved in limit: Ax^{k+1} − b → 0
Method of multipliers
(compared to dual decomposition)
◮ good news: converges under much more relaxed conditions (f can be nondifferentiable, take on value +∞, . . . )
◮ bad news: quadratic penalty destroys splitting of the x-update, so can't do decomposition
Alternating direction method of multipliers
◮ a method
– with good robustness of method of multipliers
– which can support decomposition
◮ “robust dual decomposition” or “decomposable method of multipliers”
◮ proposed by Gabay, Mercier, Glowinski, Marrocco in 1976
Alternating direction method of multipliers
◮ ADMM problem form (with f , g convex)
minimize f(x) + g(z)
subject to Ax + Bz = c

– two sets of variables, with separable objective

◮ Lρ(x, z, y) = f(x) + g(z) + y^T(Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖_2^2

◮ ADMM:

x^{k+1} := argmin_x Lρ(x, z^k, y^k) // x-minimization
z^{k+1} := argmin_z Lρ(x^{k+1}, z, y^k) // z-minimization
y^{k+1} := y^k + ρ(Ax^{k+1} + Bz^{k+1} − c) // dual update
Alternating direction method of multipliers
◮ if we minimized over x and z jointly, reduces to method of multipliers
◮ instead, we do one pass of a Gauss-Seidel method
◮ we get splitting since we minimize over x with z fixed, and vice versa
ADMM and optimality conditions
◮ optimality conditions (for differentiable case):
– primal feasibility: Ax + Bz − c = 0
– dual feasibility: ∇f(x) + A^T y = 0, ∇g(z) + B^T y = 0
◮ since z^{k+1} minimizes Lρ(x^{k+1}, z, y^k) we have

0 = ∇g(z^{k+1}) + B^T y^k + ρB^T (Ax^{k+1} + Bz^{k+1} − c)
  = ∇g(z^{k+1}) + B^T y^{k+1}
◮ so with ADMM dual variable update, (x^{k+1}, z^{k+1}, y^{k+1}) satisfies second dual feasibility condition
◮ primal and first dual feasibility are achieved as k → ∞
ADMM with scaled dual variables
◮ combine linear and quadratic terms in augmented Lagrangian
Lρ(x, z, y) = f(x) + g(z) + y^T(Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖_2^2
            = f(x) + g(z) + (ρ/2)‖Ax + Bz − c + u‖_2^2 + const.

with u^k = (1/ρ)y^k
◮ ADMM (scaled dual form):

x^{k+1} := argmin_x ( f(x) + (ρ/2)‖Ax + Bz^k − c + u^k‖_2^2 )
z^{k+1} := argmin_z ( g(z) + (ρ/2)‖Ax^{k+1} + Bz − c + u^k‖_2^2 )
u^{k+1} := u^k + (Ax^{k+1} + Bz^{k+1} − c)
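The scaled iteration maps directly onto code once the two minimizations are available as oracles. A schematic sketch, where `argmin_x` and `argmin_z` are user-supplied callables (placeholders, not a library API) with ρ baked in:

```python
import numpy as np

def admm_scaled(argmin_x, argmin_z, A, B, c, z0, iters=100):
    """Scaled-form ADMM for: minimize f(x) + g(z) subject to Ax + Bz = c.

    argmin_x(z, u) returns argmin_x f(x) + (rho/2)||Ax + Bz - c + u||^2;
    argmin_z(x, u) returns argmin_z g(z) + (rho/2)||Ax + Bz - c + u||^2.
    """
    z, u = z0, np.zeros(len(c))
    for k in range(iters):
        x = argmin_x(z, u)          # x-minimization
        z = argmin_z(x, u)          # z-minimization
        u = u + A @ x + B @ z - c   # scaled dual update
    return x, z, u
```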
Convergence
◮ assume (very little!)
– f, g convex, closed, proper
– L_0 has a saddle point
◮ then ADMM converges:
– iterates approach feasibility: Ax^k + Bz^k − c → 0
– objective approaches optimal value: f(x^k) + g(z^k) → p⋆
Related algorithms
◮ operator splitting methods (Douglas, Peaceman, Rachford, Lions, Mercier, . . . 1950s, 1979)
◮ proximal point algorithm (Rockafellar 1976)
◮ Dykstra’s alternating projections algorithm (1983)
◮ Spingarn’s method of partial inverses (1985)
◮ Rockafellar-Wets progressive hedging (1991)
◮ proximal methods (Rockafellar, many others, 1976–present)
◮ Bregman iterative methods (2008–present)
◮ most of these are special cases of the proximal point algorithm
Common patterns
◮ x-update step requires minimizing f(x) + (ρ/2)‖Ax − v‖_2^2 (with v = −Bz^k + c − u^k, which is constant during the x-update)
◮ similar for z-update
◮ several special cases come up often
◮ can simplify updates by exploiting structure in these cases
Decomposition
◮ suppose f is block-separable,
f(x) = f_1(x_1) + · · · + f_N(x_N),   x = (x_1, . . . , x_N)

◮ A is conformably block separable: A^T A is block diagonal

◮ then x-update splits into N parallel updates of x_i
Proximal operator
◮ consider x-update when A = I
x⁺ = argmin_x ( f(x) + (ρ/2)‖x − v‖_2^2 ) = prox_{f,ρ}(v)
◮ some special cases:

– f = I_C (indicator fct. of set C): x⁺ := Π_C(v) (projection onto C)
– f = λ‖·‖_1 (ℓ1 norm): x_i⁺ := S_{λ/ρ}(v_i) (soft thresholding), where S_a(v) = (v − a)₊ − (−v − a)₊
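Both special cases are one-liners in practice; a sketch, with a box set standing in for C as an example of an easy projection:

```python
import numpy as np

def soft_threshold(v, a):
    """S_a(v) = (v - a)_+ - (-v - a)_+ elementwise: the prox of a*||.||_1."""
    return np.maximum(v - a, 0.0) - np.maximum(-v - a, 0.0)

def project_box(v, lo, hi):
    """Projection onto C = {x : lo <= x <= hi}: the prox of the indicator I_C."""
    return np.clip(v, lo, hi)

# prox_{f,rho}(v) for f = lam*||.||_1 is soft_threshold(v, lam/rho)
```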
Quadratic objective
◮ f(x) = (1/2)x^T Px + q^T x + r

◮ x⁺ := (P + ρA^T A)^{−1}(ρA^T v − q)

◮ use matrix inversion lemma when computationally advantageous

(P + ρA^T A)^{−1} = P^{−1} − ρP^{−1}A^T (I + ρAP^{−1}A^T)^{−1}AP^{−1}
◮ (direct method) cache factorization of P + ρA^T A (or I + ρAP^{−1}A^T)
◮ (iterative method) warm start, early stopping, reducing tolerances
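A sketch of the direct method with a cached Cholesky factorization (assuming P + ρA^T A is positive definite): factor once, then each x-update costs only two triangular solves:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def make_x_update(P, q, A, rho):
    """Return a function computing x+ = (P + rho A^T A)^{-1} (rho A^T v - q)."""
    factor = cho_factor(P + rho * A.T @ A)   # computed once, reused every iteration
    def x_update(v):
        return cho_solve(factor, rho * A.T @ v - q)
    return x_update
```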
Smooth objective
◮ f smooth
◮ can use standard methods for smooth minimization
– gradient, Newton, or quasi-Newton
– preconditioned CG, limited-memory BFGS (scale to very large problems)
◮ can exploit
– warm start
– early stopping, with tolerances decreasing as ADMM proceeds
Constrained convex optimization
◮ consider ADMM for generic problem
minimize f(x)
subject to x ∈ C
◮ ADMM form: take g to be indicator of C
minimize f(x) + g(z)
subject to x − z = 0
◮ algorithm:
x^{k+1} := argmin_x ( f(x) + (ρ/2)‖x − z^k + u^k‖_2^2 )
z^{k+1} := Π_C(x^{k+1} + u^k)
u^{k+1} := u^k + x^{k+1} − z^{k+1}
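A minimal numeric sketch with a toy f(x) = (1/2)‖x − a‖_2^2 (so the x-update is in closed form) and C a box, whose projection is clipping; the data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(5)    # f(x) = (1/2)||x - a||^2
lo, hi = -0.5, 0.5            # C = [lo, hi]^5
rho = 1.0

x = z = u = np.zeros(5)
for k in range(100):
    x = (a + rho * (z - u)) / (1 + rho)  # x-update (closed form for this f)
    z = np.clip(x + u, lo, hi)           # z-update: projection onto C
    u = u + x - z                        # dual update

print(np.allclose(x, np.clip(a, lo, hi), atol=1e-4))  # solution is the projection of a
```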
Lasso
◮ lasso problem:
minimize (1/2)‖Ax − b‖_2^2 + λ‖x‖_1
◮ ADMM form:
minimize (1/2)‖Ax − b‖_2^2 + λ‖z‖_1
subject to x − z = 0
◮ ADMM:
x^{k+1} := (A^T A + ρI)^{−1}(A^T b + ρz^k − y^k)
z^{k+1} := S_{λ/ρ}(x^{k+1} + y^k/ρ)
y^{k+1} := y^k + ρ(x^{k+1} − z^{k+1})
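A compact sketch of this iteration in scaled form (u = y/ρ), with the A^T A + ρI factorization cached across iterations; written directly from the updates above, with illustrative defaults:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lasso_admm(A, b, lam, rho=1.0, iters=200):
    """ADMM for minimize (1/2)||Ax - b||_2^2 + lam*||x||_1, scaled dual u = y/rho."""
    n = A.shape[1]
    factor = cho_factor(A.T @ A + rho * np.eye(n))  # factor once, reuse every iteration
    Atb = A.T @ b
    x = z = u = np.zeros(n)
    for k in range(iters):
        x = cho_solve(factor, Atb + rho * (z - u))   # x-update
        v = x + u
        z = np.maximum(v - lam / rho, 0.0) - np.maximum(-v - lam / rho, 0.0)  # S_{lam/rho}
        u = u + x - z                                # scaled dual update
    return z
```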
Lasso example
◮ example with dense A ∈ R^{1500×5000}
(1500 measurements; 5000 regressors)
◮ computation times
factorization (same as ridge regression) 1.3s
subsequent ADMM iterations 0.03s
lasso solve (about 50 ADMM iterations) 2.9s
full regularization path (30 λ’s) 4.4s
◮ not bad for a very short Matlab script
Sparse inverse covariance selection
◮ S: empirical covariance of samples from N(0, Σ), with Σ^{−1} sparse (i.e., Gaussian Markov random field)

◮ estimate Σ^{−1} via ℓ1 regularized maximum likelihood

minimize Tr(SX) − log det X + λ‖X‖_1
◮ methods: COVSEL (Banerjee et al 2008), graphical lasso (FHT 2008)
Sparse inverse covariance selection via ADMM
◮ ADMM form:
minimize Tr(SX) − log det X + λ‖Z‖_1
subject to X − Z = 0
◮ ADMM:
X^{k+1} := argmin_X ( Tr(SX) − log det X + (ρ/2)‖X − Z^k + U^k‖_F^2 )
Z^{k+1} := S_{λ/ρ}(X^{k+1} + U^k)
U^{k+1} := U^k + (X^{k+1} − Z^{k+1})
Analytical solution for X-update
◮ compute eigendecomposition ρ(Z^k − U^k) − S = QΛQ^T

◮ form diagonal matrix X̃ with

X̃_ii = (λ_i + √(λ_i^2 + 4ρ)) / (2ρ)

◮ let X^{k+1} := QX̃Q^T
◮ cost of X-update is an eigendecomposition
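The analytical X-update is a few lines given a symmetric eigensolver; a sketch:

```python
import numpy as np

def x_update(S, Z, U, rho):
    """argmin_X Tr(SX) - log det X + (rho/2)||X - Z + U||_F^2, solved analytically."""
    lam, Q = np.linalg.eigh(rho * (Z - U) - S)             # Q diag(lam) Q^T
    xtilde = (lam + np.sqrt(lam**2 + 4.0 * rho)) / (2.0 * rho)
    return (Q * xtilde) @ Q.T                              # Q diag(xtilde) Q^T
```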
Sparse inverse covariance selection example
◮ Σ^{−1} is 1000 × 1000 with 10^4 nonzeros

– graphical lasso (Fortran): 20 seconds – 3 minutes
– ADMM (Matlab): 3 – 10 minutes
– (depends on choice of λ)
◮ very rough experiment, but with no special tuning, ADMM is in the ballpark of recent specialized methods
◮ (for comparison, COVSEL takes 25+ min when Σ^{−1} is a 400 × 400 tridiagonal matrix)
Consensus optimization
◮ want to solve problem with N objective terms
minimize ∑_{i=1}^N f_i(x)

– e.g., f_i is the loss function for the ith block of training data

◮ ADMM form:

minimize ∑_{i=1}^N f_i(x_i)
subject to x_i − z = 0

– x_i are local variables
– z is the global variable
– x_i − z = 0 are consistency or consensus constraints
– can add regularization using a g(z) term
Consensus optimization via ADMM
◮ Lρ(x, z, y) = ∑_{i=1}^N ( f_i(x_i) + y_i^T(x_i − z) + (ρ/2)‖x_i − z‖_2^2 )

◮ ADMM:

x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + y_i^{kT}(x_i − z^k) + (ρ/2)‖x_i − z^k‖_2^2 )
z^{k+1} := (1/N) ∑_{i=1}^N ( x_i^{k+1} + (1/ρ)y_i^k )
y_i^{k+1} := y_i^k + ρ(x_i^{k+1} − z^{k+1})
◮ with regularization, averaging in the z-update is followed by prox_{g,ρ}
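A schematic sketch of the loop; `argmin_fi(i, z, yi)` is a user-supplied oracle for the x_i-minimization, and the inner loop would run in parallel in a real implementation:

```python
import numpy as np

def consensus_admm(argmin_fi, N, n, rho=1.0, iters=100):
    """argmin_fi(i, z, yi) returns
    argmin_{xi} f_i(xi) + yi^T (xi - z) + (rho/2)||xi - z||_2^2."""
    x = np.zeros((N, n))
    y = np.zeros((N, n))
    z = np.zeros(n)
    for k in range(iters):
        for i in range(N):                 # parallel in practice
            x[i] = argmin_fi(i, z, y[i])
        z = np.mean(x + y / rho, axis=0)   # averaging (apply prox_g here if regularized)
        y = y + rho * (x - z)              # dual updates, local to each processor
    return z
```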
Consensus optimization via ADMM
◮ using ∑_{i=1}^N y_i^k = 0, algorithm simplifies to

x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + y_i^{kT}(x_i − x̄^k) + (ρ/2)‖x_i − x̄^k‖_2^2 )
y_i^{k+1} := y_i^k + ρ(x_i^{k+1} − x̄^{k+1})

where x̄^k = (1/N) ∑_{i=1}^N x_i^k

◮ in each iteration

– gather x_i^k and average to get x̄^k
– scatter the average x̄^k to processors
– update y_i^k locally (in each processor, in parallel)
– update x_i locally
Statistical interpretation
◮ f_i is negative log-likelihood for parameter x given ith data block

◮ x_i^{k+1} is MAP estimate under prior N(x̄^k + (1/ρ)y_i^k, ρI)
◮ prior mean is previous iteration's consensus shifted by 'price' of processor i disagreeing with previous consensus
◮ processors only need to support a Gaussian MAP method
– type or number of data in each block not relevant
– consensus protocol yields global maximum-likelihood estimate
Consensus classification
◮ data (examples) (a_i, b_i), i = 1, . . . , N, a_i ∈ R^n, b_i ∈ {−1, +1}

◮ linear classifier sign(a^T w + v), with weight w, offset v

◮ margin for ith example is b_i(a_i^T w + v); want margin to be positive

◮ loss for ith example is l(b_i(a_i^T w + v))

– l is loss function (hinge, logistic, probit, exponential, . . . )

◮ choose w, v to minimize (1/N) ∑_{i=1}^N l(b_i(a_i^T w + v)) + r(w)

– r(w) is regularization term (ℓ2, ℓ1, . . . )
◮ split data and use ADMM consensus to solve
Consensus SVM example
◮ hinge loss l(u) = (1 − u)₊ with ℓ2 regularization
◮ baby problem with n = 2, N = 400 to illustrate
◮ examples split into 20 groups, in worst possible way: each group contains only positive or negative examples
Iteration 1
[plot omitted]
Iteration 5
[plot omitted]
Iteration 40
[plot omitted]
Distributed lasso example
◮ example with dense A ∈ R^{400000×8000} (roughly 30 GB of data)

– distributed solver written in C using MPI and GSL
– no optimization or tuned libraries (like ATLAS, MKL)
– split into 80 subsystems across 10 (8-core) machines on Amazon EC2
◮ computation times
loading data 30s
factorization 5m
subsequent ADMM iterations 0.5–2s
lasso solve (about 15 ADMM iterations) 5–6m
Exchange problem
minimize ∑_{i=1}^N f_i(x_i)
subject to ∑_{i=1}^N x_i = 0
◮ another canonical problem, like consensus
◮ in fact, it’s the dual of consensus
◮ can interpret as N agents exchanging n goods to minimize a total cost
◮ (x_i)_j ≥ 0 means agent i receives (x_i)_j of good j from exchange
◮ (x_i)_j < 0 means agent i contributes |(x_i)_j| of good j to exchange
◮ constraint ∑_{i=1}^N x_i = 0 is equilibrium or market clearing constraint
◮ optimal dual variable y⋆ is a set of valid prices for the goods
◮ suggests real or virtual cash payment (y⋆)^T x_i by agent i
Exchange ADMM
◮ solve as a generic constrained convex problem with constraint set
C = {x ∈ R^{nN} | x_1 + x_2 + · · · + x_N = 0}
◮ scaled form:
x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + (ρ/2)‖x_i − x_i^k + x̄^k + u^k‖_2^2 )
u^{k+1} := u^k + x̄^{k+1}
◮ unscaled form:
x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + y^{kT} x_i + (ρ/2)‖x_i − (x_i^k − x̄^k)‖_2^2 )
y^{k+1} := y^k + ρ x̄^{k+1}
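A schematic sketch of the scaled form; `prox_fi(i, v)` is a user-supplied oracle returning argmin_{x_i} f_i(x_i) + (ρ/2)‖x_i − v‖_2^2 for agent i:

```python
import numpy as np

def exchange_admm(prox_fi, N, n, rho=1.0, iters=100):
    """Scaled-form exchange ADMM; prox_fi(i, v) returns
    argmin_{xi} f_i(xi) + (rho/2)||xi - v||_2^2."""
    x = np.zeros((N, n))
    u = np.zeros(n)                        # common scaled price vector, u = y/rho
    for k in range(iters):
        xbar = x.mean(axis=0)
        for i in range(N):                 # each agent solves its own prox problem
            x[i] = prox_fi(i, x[i] - xbar - u)
        u = u + x.mean(axis=0)             # price update driven by the imbalance
    return x, rho * u                      # profiles, and prices y = rho*u
```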
Interpretation as tatonnement process
◮ tatonnement process: iteratively update prices to clear market
◮ work towards equilibrium by increasing/decreasing prices of goods based on excess demand/supply
◮ dual decomposition is the simplest tatonnement algorithm
◮ ADMM adds proximal regularization
– incorporate agents’ prior commitment to help clear market– convergence far more robust convergence than dual decomposition
Distributed dynamic energy management
◮ N devices exchange power in time periods t = 1, . . . , T
◮ x_i ∈ R^T is power flow profile for device i
◮ f_i(x_i) is cost of profile x_i (and encodes constraints)
◮ x_1 + · · · + x_N = 0 is energy balance (in each time period)
◮ dynamic energy management problem is exchange problem
◮ exchange ADMM gives distributed method for dynamic energy management
◮ each device optimizes its own profile, with quadratic regularization for coordination
◮ residual (energy imbalance) is driven to zero
Generators
[plots omitted]
◮ 3 example generators
◮ left: generator costs/limits; right: ramp constraints
◮ can add cost for power changes
Fixed loads
[plot omitted]
◮ 2 example fixed loads
◮ cost is +∞ for not supplying load; zero otherwise
Shiftable load
[plot omitted]
◮ total energy consumed over an interval must exceed given minimum level
◮ limits on energy consumed in each period
◮ cost is +∞ for violating constraints; zero otherwise
Battery energy storage system
[plots omitted]
◮ energy store with maximum capacity, charge/discharge limits
◮ black: battery charge, red: charge/discharge profile
◮ cost is +∞ for violating constraints; zero otherwise
Electric vehicle charging system
[plot omitted]
◮ black: desired charge profile; blue: charge profile
◮ shortfall cost for not meeting desired charge
HVAC
[plots omitted]
◮ thermal load (e.g., room, refrigerator) with temperature limits
◮ magenta: ambient temperature; blue: load temperature
◮ red: cooling energy profile
◮ cost is +∞ for violating constraints; zero otherwise
External tie
[plot omitted]
◮ buy/sell energy from/to external grid at price p_ext(t) ± γ(t)
◮ solid: p_ext(t); dashed: p_ext(t) ± γ(t)
Smart grid example
10 devices (already described above)
◮ 3 generators
◮ 2 fixed loads
◮ 1 shiftable load
◮ 1 EV charging system
◮ 1 battery
◮ 1 HVAC system
◮ 1 external tie
Convergence
iteration: k = 1
[plots omitted]
◮ left: solid: optimal generator profile, dashed: profile at kth iteration
◮ right: residual vector x̄^k
Convergence

iteration: k = 3, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50

[the same pair of plots is repeated at each of these iterations; the dashed generator profiles approach the optimal ones, and the residual vector x̄^k is driven toward zero]
Summary and conclusions
ADMM
◮ is the same as, or closely related to, many methods with other names
◮ has been around since the 1970s
◮ gives simple single-processor algorithms that can be competitive with state-of-the-art
◮ can be used to coordinate many processors, each solving a substantial problem, to solve a very large problem