Prediction, Learning and Games - Chapter 7: Prediction and Playing Games
Walid Krichene
November 4, 2013
Walid Krichene Prediction, Learning and Games - Chapter 7 Prediction and Playing GamesNovember 4, 2013 1 / 21
The setting

- $K$ players; player $k$ has $N_k$ actions
- joint action vector $i \in [N_1] \times \cdots \times [N_K]$
- loss vectors $\ell^{(k)}(i) \in [0, 1]$
- players randomize: mixed strategy profile $\pi \in \Delta_{[N_1]} \times \cdots \times \Delta_{[N_K]}$, with $\pi(i) = p^{(1)}(i_1) \cdots p^{(K)}(i_K)$
- joint random action $I \in [N_1] \times \cdots \times [N_K]$, $I \sim \pi$; expected loss $\mathbb{E}[\ell^{(k)}] = \sum_i \pi(i)\, \ell^{(k)}(i)$
- by abuse of notation, write $\ell^{(k)}(\pi)$: the linear extension of $\ell^{(k)}$ from $[N_1] \times \cdots \times [N_K]$ to $\Delta_{[N_1]} \times \cdots \times \Delta_{[N_K]}$
Nash equilibrium

A mixed strategy profile $p$ is a Nash equilibrium if for all $k$ and all $p'$,

$$\ell^{(k)}(p^{(k)}, p^{(-k)}) \le \ell^{(k)}(p', p^{(-k)})$$

i.e. no player has an incentive to unilaterally deviate. Set of Nash equilibria: $\mathcal{N}$.
Correlated equilibria

Correlated equilibrium (Aumann): a probability distribution $P$ over $[N_1] \times \cdots \times [N_K]$, not necessarily a product distribution. A coordinator draws advice $I \sim P$; any player who unilaterally deviates from the advice is worse off in expectation.

Equivalently, "switching from $j$ to $j'$ is a bad idea": for all $k$ and all $j, j'$,

$$\sum_{i : i_k = j} P(i) \left[ \ell^{(k)}(j, i^{(-k)}) - \ell^{(k)}(j', i^{(-k)}) \right] \le 0$$

This is linked to internal regret.

Structure of correlated equilibria: think of $P$ as a vector in $\Delta_{[N_1] \times \cdots \times [N_K]}$.
- The defining linear inequalities in $P$ make the set a closed convex polyhedron.
- Any convex combination of Nash equilibria is a correlated equilibrium (but the converse fails).
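These defining inequalities are easy to check mechanically. Below is a minimal sketch for the two-player case (function and variable names are my own, not from the text):

```python
import numpy as np

def is_correlated_eq(P, L1, L2, tol=1e-9):
    """Check the correlated-equilibrium inequalities in a two-player game.

    P:  joint distribution over action pairs, shape (N1, N2)
    L1: loss matrix of player 1, L1[i1, i2]
    L2: loss matrix of player 2, L2[i1, i2]
    For every player k and actions j, j', switching the advice j -> j'
    must not reduce expected loss:
        sum_{i : i_k = j} P(i) [l_k(j, i_-k) - l_k(j', i_-k)] <= 0
    """
    N1, N2 = P.shape
    # Player 1: condition on the advised row j
    for j in range(N1):
        for jp in range(N1):
            if np.sum(P[j, :] * (L1[j, :] - L1[jp, :])) > tol:
                return False
    # Player 2: condition on the advised column j
    for j in range(N2):
        for jp in range(N2):
            if np.sum(P[:, j] * (L2[:, j] - L2[:, jp])) > tol:
                return False
    return True
```

For example, in matching pennies the uniform product distribution (the Nash equilibrium) passes, while a point mass on a single action pair fails.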
Repeated games

- Player $k$ maintains $p^{(k)}_t$ and draws an action $I^{(k)}_t \sim p^{(k)}_t$
- then observes all players' actions $I_t = (I^{(1)}_t, \ldots, I^{(K)}_t)$
- Uncoupled play: each player knows only their own loss function; in fact, only their own regret.

Regret-minimizing procedures: does the repeated play converge to $\mathcal{N}$ (or to another equilibrium concept) in some sense?

Summary of results:
- Minimizing (external) regret:
  - the average marginal distributions $\bar p^{(1)}_t \times \cdots \times \bar p^{(K)}_t$ converge to $\mathcal{N}$
  - the average joint distribution converges to the Hannan set
- Minimizing internal regret:
  - the average joint distribution converges to the set of correlated equilibria
Regret minimization vs. fictitious play

On iteration $t$, play a best response to the cumulative losses.

$\ell^{(k)}_t$: vector of losses, $\ell^{(k)}_t(i) = \ell^{(k)}(i, I^{(-k)}_t)$, and $L^{(k)}_t = \sum_{\tau=1}^{t} \ell^{(k)}_\tau$.

Fictitious play:

$$i^{(k)}_{t+1} \in \arg\min_{i \in [N_k]} L^{(k)}_t(i), \quad \text{equivalently} \quad \min_{p \in \Delta_{[N_k]}} \langle p, L^{(k)}_t \rangle$$

Fictitious play only converges in very special cases (e.g. two players with two actions each), and is not Hannan-consistent (it does not minimize average expected regret).
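A minimal sketch of fictitious play in a two-player zero-sum matrix game, as described above (the loss-matrix convention and names are my own):

```python
import numpy as np

def fictitious_play(M, T):
    """Fictitious play in a two-player zero-sum game with loss matrix M
    (row player's loss is M[i, j]; column player's loss is -M[i, j]).
    Each player plays a best response to cumulative past losses."""
    N1, N2 = M.shape
    L_row = np.zeros(N1)          # cumulative loss of each row action
    L_col = np.zeros(N2)          # cumulative loss of each column action
    freq_row = np.zeros(N1)
    freq_col = np.zeros(N2)
    i, j = 0, 0                   # arbitrary initial actions
    for _ in range(T):
        freq_row[i] += 1
        freq_col[j] += 1
        L_row += M[:, j]          # what each row action would have lost
        L_col += -M[i, :]         # column player minimizes the negated loss
        i = int(np.argmin(L_row))
        j = int(np.argmin(L_col))
    return freq_row / T, freq_col / T
```

In matching pennies, one of the 2x2 cases where fictitious play does converge, the empirical frequencies approach the uniform equilibrium even though the pure actions keep cycling.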
Regret minimization vs. fictitious play

Fictitious play:

$$p^{(k)}_{t+1} \in \arg\min_{p \in \Delta_{[N_k]}} \langle p, L^{(k)}_t \rangle$$

Regret minimization: some algorithms can be written as

$$p^{(k)}_{t+1} \in \arg\min_{p \in \Delta_{[N_k]}} \langle p, L^{(k)}_t \rangle + \frac{1}{\gamma} R(p)$$

where $R$ is a regularization term.
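For one common choice of regularizer, the negative entropy $R(p) = \sum_i p_i \log p_i$, this minimization has the exponential-weights closed form $p_i \propto \exp(-\gamma L_t(i))$. A small sketch checking this numerically (my own illustration, not from the text):

```python
import numpy as np

def ftrl_entropic(L, gamma):
    """argmin over the simplex of <p, L> + (1/gamma) * sum_i p_i log p_i.
    With the entropic regularizer the minimizer is exponential weights:
    p_i proportional to exp(-gamma * L_i)."""
    w = np.exp(-gamma * (L - L.min()))   # shift for numerical stability
    return w / w.sum()

def ftrl_objective(p, L, gamma):
    """The regularized objective <p, L> + (1/gamma) R(p), with 0 log 0 = 0."""
    safe = np.where(p > 0, p, 1.0)
    return p @ L + np.sum(p * np.log(safe)) / gamma
```

One can then verify that the closed form attains a lower objective value than any other point of the simplex.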
A Minimax theorem

Minimax theorem via regret minimization. Let $f(x, y)$ with:
- $X$ convex compact, $f(\cdot, y)$ convex and continuous,
- $Y$ convex compact, $f(x, \cdot)$ concave and continuous.

Then

$$\inf_x \sup_y f(x, y) = \sup_y \inf_x f(x, y)$$

The direction $\ge$ is immediate: when $x$ plays first, it does worse than playing second.
A Minimax theorem

Proof of $\le$. Idea: sequential play $(x_t), (y_t)$, then

$$\inf_x \sup_y f(x, y) \le \frac{1}{t} \sum_{\tau \le t} f(x_\tau, y_\tau) \le \sup_y \inf_x f(x, y) + O\!\left(\frac{1}{\sqrt{t}}\right)$$
A Minimax theorem

Let $\varepsilon > 0$. Find a finite cover of $X$ by balls $B(a^{(i)}, \varepsilon)$, and use the centers $a^{(i)}$ as the actions.

- $x$ plays first: applies a regret-minimizing procedure over the action set $\{a^{(i)}\}_i$, with losses $f(a^{(i)}, y_t)$.
- $y$ plays second: plays a best response $y_t \in \arg\max_{y \in Y} f(x_t, y)$.

Averages: $\bar x_t = \frac{1}{t} \sum_{\tau=1}^{t} x_\tau$, $\bar y_t = \frac{1}{t} \sum_{\tau=1}^{t} y_\tau$.
A Minimax theorem

$$\begin{aligned}
\inf_x \sup_y f(x, y) &\le \sup_y f(\bar x_t, y) \\
&\le \sup_y \frac{1}{t} \sum_{\tau \le t} f(x_\tau, y) && \text{by convexity} \\
&\le \frac{1}{t} \sum_{\tau \le t} f(x_\tau, y_\tau) && \text{since } y \text{ plays optimally} \\
&\le \min_i \frac{1}{t} \sum_{\tau \le t} f(a^{(i)}, y_\tau) + \sqrt{\frac{\ln N}{2t}} && \text{by the regret bound} \\
&\le \min_i f(a^{(i)}, \bar y_t) + \sqrt{\frac{\ln N}{2t}} && \text{by concavity} \\
&\le \sup_y \min_i f(a^{(i)}, y) + \sqrt{\frac{\ln N}{2t}}
\end{aligned}$$

Then let $t \to \infty$ and $\varepsilon \to 0$.
Zero-sum two-player game

$\ell^{(\mathrm{row})}(i, j) = -\ell^{(\mathrm{col})}(i, j)$

Matrix of losses: $M$ with $M_{i,j} = \ell^{(\mathrm{row})}(i, j)$. Joint strategy $(p, q)$; loss of the row player: $\sum_{i,j} p_i M_{i,j} q_j = \langle p, Mq \rangle$.

Nash equilibrium: $(p, q)$ is a Nash equilibrium iff for all $p', q'$,

$$\langle p, Mq' \rangle \le \langle p, Mq \rangle \le \langle p', Mq \rangle$$

In fact, by von Neumann's minimax theorem,

$$\min_{p'} \max_{q'} \langle p', Mq' \rangle = \max_{q'} \min_{p'} \langle p', Mq' \rangle$$

Call this common value $V$, the value of the game.

Value of the game: if $(p, q)$ is a Nash equilibrium, then $\langle p, Mq \rangle = V$; conversely, any pair of minimax-optimal strategies is a Nash equilibrium.
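The value $V$ can be approximated by regret-minimizing self-play, anticipating the convergence results of the following slides; a sketch assuming exponential-weights updates (the step size and names are my own):

```python
import numpy as np

def approx_game_value(M, T=2000):
    """Approximate the value V = min_p max_q <p, M q> of a zero-sum game
    by self-play: both players run exponential weights on their expected
    losses, and the running average of <p_t, M q_t> tends to V."""
    N1, N2 = M.shape
    eta = np.sqrt(8.0 * np.log(max(N1, N2)) / T)
    L_p = np.zeros(N1)            # cumulative expected losses, row player
    L_q = np.zeros(N2)            # cumulative expected losses, column player
    total = 0.0
    for _ in range(T):
        p = np.exp(-eta * (L_p - L_p.min())); p /= p.sum()
        q = np.exp(-eta * (L_q - L_q.min())); q /= q.sum()
        total += p @ M @ q
        L_p += M @ q              # row player's per-action expected loss
        L_q -= p @ M              # column player's loss is the negation
    return total / T
```

For matching pennies this returns a value near $V = 0$, and for the loss matrix $[[0, 1], [1, 0]]$ a value near $V = 1/2$.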
Repeated zero-sum two-player game

Row player plays first. Given a sequence of column plays $j_1, j_2, \ldots$, score the performance of the row sequence $i_1, i_2, \ldots$ by the regret

$$\max_i \left[ \sum_{t=1}^{T} \ell^{(\mathrm{row})}(i_t, j_t) - \sum_{t=1}^{T} \ell^{(\mathrm{row})}(i, j_t) \right]$$

This can be extended to time-varying losses $\ell^{(\mathrm{row})}_t$.

Note: implicit assumption of an oblivious opponent. The comparator is not the loss Row would actually have suffered had he played a constant strategy (an adaptive opponent would have responded differently).
Repeated zero-sum two-player game

Convergence of average losses: if the row strategies are Hannan-consistent, then

$$\limsup_T \frac{1}{T} \sum_{t=1}^{T} \ell^{(\mathrm{row})}(i_t, j_t) \le V$$

Note: this is only an upper bound; Row can do much better if the column player is not adversarial.

Proof: it suffices to show

$$\limsup_T \min_i \frac{1}{T} \sum_{t=1}^{T} \ell^{(\mathrm{row})}(i, j_t) \le V$$

(then apply the regret bound).
Repeated zero-sum two-player game

Proof:

$$\min_i \frac{1}{T} \sum_{t=1}^{T} \ell^{(\mathrm{row})}(i, j_t) = \min_p \frac{1}{T} \sum_{t=1}^{T} \langle p, M e_{j_t} \rangle = \min_p \left\langle p, M \frac{1}{T} \sum_{t=1}^{T} e_{j_t} \right\rangle \le \min_p \max_q \langle p, Mq \rangle = V$$
Repeated zero-sum two-player game

If both players minimize regret, the average distributions converge to $\mathcal{N}$:

$$\bar p_t = \frac{1}{t} \sum_{\tau \le t} p_\tau, \qquad \bar q_t = \frac{1}{t} \sum_{\tau \le t} q_\tau, \qquad \bar p_t \times \bar q_t \to \mathcal{N}$$

- This does not prove convergence of the actual probability sequences $(p_t, q_t)$. Typical example: the sequences $(p_t, q_t)$ oscillate, but the averages $(\bar p_t, \bar q_t)$ converge to $\mathcal{N}$.
- It also does not prove convergence of the average joint distribution $\overline{p \times q}_t$:

$$\overline{p \times q}_t(i, j) = \frac{1}{t} \sum_{\tau \le t} p_\tau(i)\, q_\tau(j)$$
Note: average marginals vs. average joint

They are not the same:

$$\bar p_T \times \bar q_T\,(i, j) = \frac{1}{T^2} \sum_{t=1}^{T} \sum_{t'=1}^{T} p_t(i)\, q_{t'}(j), \qquad \overline{p \times q}_T\,(i, j) = \frac{1}{T} \sum_{t=1}^{T} p_t(i)\, q_t(j)$$
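A quick numerical check that the two formulas above define different distributions for a generic sequence of strategies (the random sequence is my own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
T, N1, N2 = 50, 2, 3
ps = rng.dirichlet(np.ones(N1), size=T)   # p_t, shape (T, N1)
qs = rng.dirichlet(np.ones(N2), size=T)   # q_t, shape (T, N2)

# product of average marginals: (1/T^2) sum_{t, t'} p_t(i) q_{t'}(j)
avg_marginals = np.outer(ps.mean(axis=0), qs.mean(axis=0))
# average joint: (1/T) sum_t p_t(i) q_t(j)
avg_joint = np.einsum('ti,tj->ij', ps, qs) / T
```

Both are probability distributions over action pairs, but generically they differ (the average joint retains the round-by-round correlation).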
Correlated equilibria and internal regret

Back to the general $K$-player game. Let $\bar P_t$ be the average joint distribution:

$$\bar P_t(i) = \frac{1}{t} \sum_{\tau \le t} p^{(1)}_\tau(i_1) \cdots p^{(K)}_\tau(i_K)$$

If every player minimizes (external) regret, $\bar P_t$ converges to the Hannan set

$$\mathcal{H} = \left\{ P : \forall k, \forall j, \ \sum_i P(i)\, \ell^{(k)}(i^{(k)}, i^{(-k)}) \le \sum_i P(i)\, \ell^{(k)}(j, i^{(-k)}) \right\}$$

$\mathcal{H}$ contains the correlated equilibria, but is a proper superset in most cases:

$$\mathcal{N} \subset \mathcal{C} \subset \mathcal{H}$$
Correlated equilibria and internal regret

Convergence of the average joint distribution to $\mathcal{C}$: if all players minimize internal regret, then $\bar P_t \to \mathcal{C}$.

Recall the definition of internal regret:

$$R^{(k)}_{j,j',T} = \sum_{t=1}^{T} p^{(k)}_{j,t} \left( \ell^{(k)}(j, I^{(-k)}_t) - \ell^{(k)}(j', I^{(-k)}_t) \right), \qquad R^{(k)}_T = \max_{j,j'} R^{(k)}_{j,j',T}$$

Also recall that any regret-minimizing procedure over $N$ actions can be turned into an internal-regret-minimizing procedure by considering the set of action pairs (of size $N^2$).
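A sketch of an internal-regret-minimizing step in this spirit: the positive internal regrets are normalized into a stochastic matrix $Q$ and the played distribution is the fixed point $p = pQ$ (this is the conditional-regret-matching flavor of the construction; the names and normalization details are my own):

```python
import numpy as np

def stationary(Q, iters=100):
    """Stationary distribution p = p @ Q of a row-stochastic matrix,
    by damped power iteration (the damping guarantees convergence)."""
    p = np.ones(Q.shape[0]) / Q.shape[0]
    for _ in range(iters):
        p = 0.5 * p + 0.5 * (p @ Q)
    return p

def internal_regret_play(loss_seq):
    """Play against a fixed sequence of loss vectors while keeping
    internal regret small.

    loss_seq: array of shape (T, N).
    Maintain the cumulative internal regrets
        R[j, jp] = sum_t p_t(j) * (l_t(j) - l_t(jp)),
    turn their positive parts into a stochastic matrix Q, and play the
    fixed point p = p @ Q.  Returns the final internal-regret matrix R.
    """
    T, N = loss_seq.shape
    R = np.zeros((N, N))
    for t in range(T):
        Rp = np.maximum(R, 0.0)
        np.fill_diagonal(Rp, 0.0)
        row_max = Rp.sum(axis=1).max()
        if row_max <= 0:
            p = np.ones(N) / N                 # no regret yet: play uniform
        else:
            Q = Rp / row_max                   # off-diagonal switching rates
            Q += np.diag(1.0 - Q.sum(axis=1))  # complete rows to sum to 1
            p = stationary(Q)
        l = loss_seq[t]
        R += p[:, None] * (l[:, None] - l[None, :])
    return R
```

On a random loss sequence, the per-round internal regret $\max_{j,j'} R_{j,j'}/T$ stays small.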
Summary of results

$$\mathcal{N} \subset \mathcal{C} \subset \mathcal{H}$$

Minimizing (external) regret:
- $\bar p^{(1)}_t \times \cdots \times \bar p^{(K)}_t \to \mathcal{N}$ (zero-sum two-player case)
- $\bar p^{(1)}_t \times \cdots \times \bar p^{(K)}_t \to \mathcal{N}$ in general?
- $\bar P_t \to \mathcal{H}$

Minimizing internal regret:
- $\bar P_t \to \mathcal{C}$, with updates of size $O(N^2)$

Next:
- (Unknown game: exploration)
- Blackwell approachability theorem
- Potential-based approachability
Next week

- Finish Chapter 7 (approachability)
- Application to the routing game