Prediction, Learning and Games - Chapter 7: Prediction and Playing Games
Walid Krichene
November 4, 2013
Walid Krichene Prediction, Learning and Games - Chapter 7 Prediction and Playing GamesNovember 4, 2013 1 / 21
The setting

- $K$ players; player $k$ has $N_k$ actions
- joint action vector $i \in [N_1] \times \cdots \times [N_K]$
- loss vectors $\ell^{(k)}(i) \in [0, 1]$
- players randomize: mixed strategy profile $\pi \in \Delta_{[N_1]} \times \cdots \times \Delta_{[N_K]}$, with $\pi(i) = p^{(1)}(i_1) \cdots p^{(K)}(i_K)$
- joint random action $I \in [N_1] \times \cdots \times [N_K]$, $I \sim \pi$; expected loss $\mathbb{E}[\ell^{(k)}] = \sum_i \pi(i)\, \ell^{(k)}(i)$
- by abuse of notation, write $\ell^{(k)}(\pi)$: the linear extension of $\ell^{(k)}$ from $[N_1] \times \cdots \times [N_K]$ to $\Delta_{[N_1]} \times \cdots \times \Delta_{[N_K]}$
Nash equilibrium

A mixed strategy profile $p$ is a Nash equilibrium if for all $k$ and all $p'$,

$$\ell^{(k)}(p^{(k)}, p^{(-k)}) \le \ell^{(k)}(p', p^{(-k)})$$

i.e. no player has an incentive to unilaterally deviate. Set of Nash equilibria: $\mathcal{N}$.
Correlated equilibria

Correlated equilibrium (Aumann): a probability distribution $P$ over $[N_1] \times \cdots \times [N_K]$, not necessarily a product distribution. A coordinator draws advice $I \sim P$; any player who unilaterally deviates from the advice is worse off in expectation.

Equivalently, "switching from $j$ to $j'$ is a bad idea": for all $k$ and all $j, j'$,

$$\sum_{i : i_k = j} P(i) \left[ \ell^{(k)}(j, i^{(-k)}) - \ell^{(k)}(j', i^{(-k)}) \right] \le 0$$

This is linked to internal regret.

Structure of correlated equilibria: think of $P$ as a vector in $\Delta_{[N_1] \times \cdots \times [N_K]}$.
- The defining linear inequalities in $P$ make the set a closed convex polyhedron.
- Any convex combination of Nash equilibria is a correlated equilibrium (but the converse fails).
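These defining inequalities are easy to check mechanically. Below is a minimal sketch for the two-player case (function and variable names are my own, not from the text):

```python
import numpy as np

def is_correlated_eq(P, L1, L2, tol=1e-9):
    """Check the correlated-equilibrium inequalities in a two-player game.

    P:  joint distribution over action pairs, shape (N1, N2)
    L1: loss matrix of player 1, L1[i1, i2]
    L2: loss matrix of player 2, L2[i1, i2]
    For every player k and actions j, j', switching the advice j -> j'
    must not reduce expected loss:
        sum_{i : i_k = j} P(i) [l_k(j, i_-k) - l_k(j', i_-k)] <= 0
    """
    N1, N2 = P.shape
    # Player 1: condition on the advised row j
    for j in range(N1):
        for jp in range(N1):
            if np.sum(P[j, :] * (L1[j, :] - L1[jp, :])) > tol:
                return False
    # Player 2: condition on the advised column j
    for j in range(N2):
        for jp in range(N2):
            if np.sum(P[:, j] * (L2[:, j] - L2[:, jp])) > tol:
                return False
    return True
```

For example, in matching pennies the uniform product distribution (the Nash equilibrium) passes, while a point mass on a single action pair fails.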
Repeated games

- Player $k$ maintains $p^{(k)}_t$ and draws an action $I^{(k)}_t \sim p^{(k)}_t$
- then observes all players' actions $I_t = (I^{(1)}_t, \ldots, I^{(K)}_t)$
- Uncoupled play: each player knows only their own loss function; in fact, only their own regret.

Regret-minimizing procedures: does the repeated play converge to $\mathcal{N}$ (or to another equilibrium concept) in some sense?

Summary of results:
- Minimizing (external) regret:
  - the average marginal distributions $\bar p^{(1)}_t \times \cdots \times \bar p^{(K)}_t$ converge to $\mathcal{N}$
  - the average joint distribution converges to the Hannan set
- Minimizing internal regret:
  - the average joint distribution converges to the set of correlated equilibria
Regret minimization vs. fictitious play

On iteration $t$, play a best response to the cumulative losses.

$\ell^{(k)}_t$: vector of losses, $\ell^{(k)}_t(i) = \ell^{(k)}(i, I^{(-k)}_t)$, and $L^{(k)}_t = \sum_{\tau=1}^{t} \ell^{(k)}_\tau$.

Fictitious play:

$$i^{(k)}_{t+1} \in \arg\min_{i \in [N_k]} L^{(k)}_t(i), \quad \text{equivalently} \quad \min_{p \in \Delta_{[N_k]}} \langle p, L^{(k)}_t \rangle$$

Fictitious play only converges in very special cases (e.g. two players with two actions each), and is not Hannan-consistent (it does not minimize average expected regret).
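A minimal sketch of fictitious play in a two-player zero-sum matrix game, as described above (the loss-matrix convention and names are my own):

```python
import numpy as np

def fictitious_play(M, T):
    """Fictitious play in a two-player zero-sum game with loss matrix M
    (row player's loss is M[i, j]; column player's loss is -M[i, j]).
    Each player plays a best response to cumulative past losses."""
    N1, N2 = M.shape
    L_row = np.zeros(N1)          # cumulative loss of each row action
    L_col = np.zeros(N2)          # cumulative loss of each column action
    freq_row = np.zeros(N1)
    freq_col = np.zeros(N2)
    i, j = 0, 0                   # arbitrary initial actions
    for _ in range(T):
        freq_row[i] += 1
        freq_col[j] += 1
        L_row += M[:, j]          # what each row action would have lost
        L_col += -M[i, :]         # column player minimizes the negated loss
        i = int(np.argmin(L_row))
        j = int(np.argmin(L_col))
    return freq_row / T, freq_col / T
```

In matching pennies, one of the 2x2 cases where fictitious play does converge, the empirical frequencies approach the uniform equilibrium even though the pure actions keep cycling.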
Regret minimization vs. fictitious play

Fictitious play:

$$p^{(k)}_{t+1} \in \arg\min_{p \in \Delta_{[N_k]}} \langle p, L^{(k)}_t \rangle$$

Regret minimization: some algorithms can be written as

$$p^{(k)}_{t+1} \in \arg\min_{p \in \Delta_{[N_k]}} \langle p, L^{(k)}_t \rangle + \frac{1}{\gamma} R(p)$$

where $R$ is a regularization term.
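For one common choice of regularizer, the negative entropy $R(p) = \sum_i p_i \log p_i$, this minimization has the exponential-weights closed form $p_i \propto \exp(-\gamma L_t(i))$. A small sketch checking this numerically (my own illustration, not from the text):

```python
import numpy as np

def ftrl_entropic(L, gamma):
    """argmin over the simplex of <p, L> + (1/gamma) * sum_i p_i log p_i.
    With the entropic regularizer the minimizer is exponential weights:
    p_i proportional to exp(-gamma * L_i)."""
    w = np.exp(-gamma * (L - L.min()))   # shift for numerical stability
    return w / w.sum()

def ftrl_objective(p, L, gamma):
    """The regularized objective <p, L> + (1/gamma) R(p), with 0 log 0 = 0."""
    safe = np.where(p > 0, p, 1.0)
    return p @ L + np.sum(p * np.log(safe)) / gamma
```

One can then verify that the closed form attains a lower objective value than any other point of the simplex.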
A Minimax theorem

Minimax theorem via regret minimization. Let $f(x, y)$ with:
- $X$ convex compact, $f(\cdot, y)$ convex and continuous,
- $Y$ convex compact, $f(x, \cdot)$ concave and continuous.

Then

$$\inf_x \sup_y f(x, y) = \sup_y \inf_x f(x, y)$$

The direction $\ge$ is immediate: when $x$ plays first, it does worse than playing second.
A Minimax theorem

Proof of $\le$. Idea: sequential play $(x_t), (y_t)$, then

$$\inf_x \sup_y f(x, y) \le \frac{1}{t} \sum_{\tau \le t} f(x_\tau, y_\tau) \le \sup_y \inf_x f(x, y) + O\!\left(\frac{1}{\sqrt{t}}\right)$$
A Minimax theorem

Let $\varepsilon > 0$. Find a finite cover of $X$ by balls $B(a^{(i)}, \varepsilon)$, and use the centers $a^{(i)}$ as the actions.

- $x$ plays first: applies a regret-minimizing procedure over the action set $\{a^{(i)}\}_i$, with losses $f(a^{(i)}, y_t)$.
- $y$ plays second: plays a best response $y_t \in \arg\max_{y \in Y} f(x_t, y)$.

Averages: $\bar x_t = \frac{1}{t} \sum_{\tau=1}^{t} x_\tau$, $\bar y_t = \frac{1}{t} \sum_{\tau=1}^{t} y_\tau$.
A Minimax theorem

$$\begin{aligned}
\inf_x \sup_y f(x, y) &\le \sup_y f(\bar x_t, y) \\
&\le \sup_y \frac{1}{t} \sum_{\tau \le t} f(x_\tau, y) && \text{by convexity} \\
&\le \frac{1}{t} \sum_{\tau \le t} f(x_\tau, y_\tau) && \text{since } y \text{ plays optimally} \\
&\le \min_i \frac{1}{t} \sum_{\tau \le t} f(a^{(i)}, y_\tau) + \sqrt{\frac{\ln N}{2t}} && \text{by the regret bound} \\
&\le \min_i f(a^{(i)}, \bar y_t) + \sqrt{\frac{\ln N}{2t}} && \text{by concavity} \\
&\le \sup_y \min_i f(a^{(i)}, y) + \sqrt{\frac{\ln N}{2t}}
\end{aligned}$$

Then let $t \to \infty$ and $\varepsilon \to 0$.
Zero-sum two-player game

$\ell^{(\mathrm{row})}(i, j) = -\ell^{(\mathrm{col})}(i, j)$

Matrix of losses: $M$ with $M_{i,j} = \ell^{(\mathrm{row})}(i, j)$. Joint strategy $(p, q)$; loss of the row player: $\sum_{i,j} p_i M_{i,j} q_j = \langle p, Mq \rangle$.

Nash equilibrium: $(p, q)$ is a Nash equilibrium iff for all $p', q'$,

$$\langle p, Mq' \rangle \le \langle p, Mq \rangle \le \langle p', Mq \rangle$$

In fact, by von Neumann's minimax theorem,

$$\min_{p'} \max_{q'} \langle p', Mq' \rangle = \max_{q'} \min_{p'} \langle p', Mq' \rangle$$

Call this common value $V$, the value of the game.

Value of the game: if $(p, q)$ is a Nash equilibrium, then $\langle p, Mq \rangle = V$; conversely, any pair of minimax-optimal strategies is a Nash equilibrium.
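The value $V$ can be approximated by regret-minimizing self-play, anticipating the convergence results of the following slides; a sketch assuming exponential-weights updates (the step size and names are my own):

```python
import numpy as np

def approx_game_value(M, T=2000):
    """Approximate the value V = min_p max_q <p, M q> of a zero-sum game
    by self-play: both players run exponential weights on their expected
    losses, and the running average of <p_t, M q_t> tends to V."""
    N1, N2 = M.shape
    eta = np.sqrt(8.0 * np.log(max(N1, N2)) / T)
    L_p = np.zeros(N1)            # cumulative expected losses, row player
    L_q = np.zeros(N2)            # cumulative expected losses, column player
    total = 0.0
    for _ in range(T):
        p = np.exp(-eta * (L_p - L_p.min())); p /= p.sum()
        q = np.exp(-eta * (L_q - L_q.min())); q /= q.sum()
        total += p @ M @ q
        L_p += M @ q              # row player's per-action expected loss
        L_q -= p @ M              # column player's loss is the negation
    return total / T
```

For matching pennies this returns a value near $V = 0$, and for the loss matrix $[[0, 1], [1, 0]]$ a value near $V = 1/2$.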
Repeated zero-sum two-player game

Row player plays first. Given a sequence of column plays $j_1, j_2, \ldots$, score the performance of the row sequence $i_1, i_2, \ldots$ by the regret

$$\max_i \left[ \sum_{t=1}^{T} \ell^{(\mathrm{row})}(i_t, j_t) - \sum_{t=1}^{T} \ell^{(\mathrm{row})}(i, j_t) \right]$$

This can be extended to time-varying losses $\ell^{(\mathrm{row})}_t$.

Note: implicit assumption of an oblivious opponent. The comparator is not the loss Row would actually have suffered had he played a constant strategy (an adaptive opponent would have responded differently).
Repeated zero-sum two-player game

Convergence of average losses: if the row strategies are Hannan-consistent, then

$$\limsup_T \frac{1}{T} \sum_{t=1}^{T} \ell^{(\mathrm{row})}(i_t, j_t) \le V$$

Note: this is only an upper bound; Row can do much better if the column player is not adversarial.

Proof: it suffices to show

$$\limsup_T \min_i \frac{1}{T} \sum_{t=1}^{T} \ell^{(\mathrm{row})}(i, j_t) \le V$$

(then apply the regret bound).
Repeated zero-sum two-player game

Proof:

$$\min_i \frac{1}{T} \sum_{t=1}^{T} \ell^{(\mathrm{row})}(i, j_t) = \min_p \frac{1}{T} \sum_{t=1}^{T} \langle p, M e_{j_t} \rangle = \min_p \left\langle p, M \frac{1}{T} \sum_{t=1}^{T} e_{j_t} \right\rangle \le \min_p \max_q \langle p, Mq \rangle = V$$
Repeated zero-sum two-player game

If both players minimize regret, the average distributions converge to $\mathcal{N}$:

$$\bar p_t = \frac{1}{t} \sum_{\tau \le t} p_\tau, \qquad \bar q_t = \frac{1}{t} \sum_{\tau \le t} q_\tau, \qquad \bar p_t \times \bar q_t \to \mathcal{N}$$

- This does not prove convergence of the actual probability sequences $(p_t, q_t)$. Typical example: the sequences $(p_t, q_t)$ oscillate, but the averages $(\bar p_t, \bar q_t)$ converge to $\mathcal{N}$.
- It also does not prove convergence of the average joint distribution $\overline{p \times q}_t$:

$$\overline{p \times q}_t(i, j) = \frac{1}{t} \sum_{\tau \le t} p_\tau(i)\, q_\tau(j)$$
Note: average marginals vs. average joint

They are not the same:

$$\bar p_T \times \bar q_T\,(i, j) = \frac{1}{T^2} \sum_{t=1}^{T} \sum_{t'=1}^{T} p_t(i)\, q_{t'}(j), \qquad \overline{p \times q}_T\,(i, j) = \frac{1}{T} \sum_{t=1}^{T} p_t(i)\, q_t(j)$$
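A quick numerical check that the two formulas above define different distributions for a generic sequence of strategies (the random sequence is my own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
T, N1, N2 = 50, 2, 3
ps = rng.dirichlet(np.ones(N1), size=T)   # p_t, shape (T, N1)
qs = rng.dirichlet(np.ones(N2), size=T)   # q_t, shape (T, N2)

# product of average marginals: (1/T^2) sum_{t, t'} p_t(i) q_{t'}(j)
avg_marginals = np.outer(ps.mean(axis=0), qs.mean(axis=0))
# average joint: (1/T) sum_t p_t(i) q_t(j)
avg_joint = np.einsum('ti,tj->ij', ps, qs) / T
```

Both are probability distributions over action pairs, but generically they differ (the average joint retains the round-by-round correlation).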
Correlated equilibria and internal regret

Back to the general $K$-player game. Let $\bar P_t$ be the average joint distribution:

$$\bar P_t(i) = \frac{1}{t} \sum_{\tau \le t} p^{(1)}_\tau(i_1) \cdots p^{(K)}_\tau(i_K)$$

If every player minimizes (external) regret, $\bar P_t$ converges to the Hannan set

$$\mathcal{H} = \left\{ P : \forall k, \forall j, \ \sum_i P(i)\, \ell^{(k)}(i^{(k)}, i^{(-k)}) \le \sum_i P(i)\, \ell^{(k)}(j, i^{(-k)}) \right\}$$

$\mathcal{H}$ contains the correlated equilibria, but is a proper superset in most cases:

$$\mathcal{N} \subset \mathcal{C} \subset \mathcal{H}$$
Correlated equilibria and internal regret

Convergence of the average joint distribution to $\mathcal{C}$: if all players minimize internal regret, then $\bar P_t \to \mathcal{C}$.

Recall the definition of internal regret:

$$R^{(k)}_{j,j',T} = \sum_{t=1}^{T} p^{(k)}_{j,t} \left( \ell^{(k)}(j, I^{(-k)}_t) - \ell^{(k)}(j', I^{(-k)}_t) \right), \qquad R^{(k)}_T = \max_{j,j'} R^{(k)}_{j,j',T}$$

Also recall that any regret-minimizing procedure over $N$ actions can be turned into an internal-regret-minimizing procedure by considering the set of action pairs (of size $N^2$).
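A sketch of an internal-regret-minimizing step in this spirit: the positive internal regrets are normalized into a stochastic matrix $Q$ and the played distribution is the fixed point $p = pQ$ (this is the conditional-regret-matching flavor of the construction; the names and normalization details are my own):

```python
import numpy as np

def stationary(Q, iters=100):
    """Stationary distribution p = p @ Q of a row-stochastic matrix,
    by damped power iteration (the damping guarantees convergence)."""
    p = np.ones(Q.shape[0]) / Q.shape[0]
    for _ in range(iters):
        p = 0.5 * p + 0.5 * (p @ Q)
    return p

def internal_regret_play(loss_seq):
    """Play against a fixed sequence of loss vectors while keeping
    internal regret small.

    loss_seq: array of shape (T, N).
    Maintain the cumulative internal regrets
        R[j, jp] = sum_t p_t(j) * (l_t(j) - l_t(jp)),
    turn their positive parts into a stochastic matrix Q, and play the
    fixed point p = p @ Q.  Returns the final internal-regret matrix R.
    """
    T, N = loss_seq.shape
    R = np.zeros((N, N))
    for t in range(T):
        Rp = np.maximum(R, 0.0)
        np.fill_diagonal(Rp, 0.0)
        row_max = Rp.sum(axis=1).max()
        if row_max <= 0:
            p = np.ones(N) / N                 # no regret yet: play uniform
        else:
            Q = Rp / row_max                   # off-diagonal switching rates
            Q += np.diag(1.0 - Q.sum(axis=1))  # complete rows to sum to 1
            p = stationary(Q)
        l = loss_seq[t]
        R += p[:, None] * (l[:, None] - l[None, :])
    return R
```

On a random loss sequence, the per-round internal regret $\max_{j,j'} R_{j,j'}/T$ stays small.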
Summary of results

$$\mathcal{N} \subset \mathcal{C} \subset \mathcal{H}$$

Minimizing (external) regret:
- $\bar p^{(1)}_t \times \cdots \times \bar p^{(K)}_t \to \mathcal{N}$ (zero-sum two-player case)
- $\bar p^{(1)}_t \times \cdots \times \bar p^{(K)}_t \to \mathcal{N}$ in general?
- $\bar P_t \to \mathcal{H}$

Minimizing internal regret:
- $\bar P_t \to \mathcal{C}$, with updates of size $O(N^2)$

Next:
- (Unknown game: exploration)
- Blackwell approachability theorem
- Potential-based approachability
Next week

- Finish Chapter 7 (approachability)
- Application to the routing game