Uncertainty Awareness in Integrating Machine Learning and Game Theory


The Connection between Machine Learning and Game Theory, Viewed through Uncertainty

Rikiya Takahashi, SmartNews, Inc.


Mar 5, 2017, Game Theory Workshop 2017

https://www.slideshare.net/rikija/uncertainty-awareness-in-integrating-machine-learning-and-game-theory


About Myself

● Rikiya TAKAHASHI, Ph.D. (高橋力矢)
  – Engineer at SmartNews, Inc., 2015 to present
  – Research Staff Member at IBM Research – Tokyo, 2004 to 2015

● Research interests: machine learning, reinforcement learning, cognitive science, behavioral economics, complex systems
  – Descriptive models of real human behavior
  – Prescriptive decision making from descriptive models
  – Robust algorithms that work under high uncertainty
    ● Limited sample size, high dimensionality, high noise

Example of Previous Work

● Budget-Constrained Markov Decision Process for Marketing-Mix Optimization (Takahashi+, 2013 & 2014)

[Figure: pipeline from Historical Data (stimuli such as e-mail and TV CM; responses such as browsing and purchase) through Consumer Segmentation and Time-Series Predictive Modeling to Optimal Marketing-Mix & Targeting Rules. A decision tree splits consumers into strategic segments and micro-segments MS#1 to MS#256 with questions such as "Revenues in past 16 weeks > $200?", "#purchase in past 8 weeks > 2?", "#browsing in past 4 weeks > 15?", and "#EMs in past 2 weeks > 2?". The output table assigns EM (e-mail), DM (direct mail), and TM (tele-marketing) to Segments #1 to #N for dates from 2014/01/01 to 2014/12/31.]

Example of Previous Work

● Travel-Time Distribution Prediction on a Large Road Network (Takahashi+, 2012)

[Figure: pipeline from Road Network & Travel Time Data by Taxi through Predictive Modeling of Travel Time Distribution to Route-Choice Recommendation or Traffic Simulation, illustrated by a road network of intersections and links between points A and B with per-link functions ψ1(y) to ψ6(y).]

Example of Previous Work

● Bayesian Discrete Choice Modeling for Irrational Compromise Effect (Takahashi & Morimura, 2015)
  – Explained later today

[Figure: options A, B, C, D plotted by inexpensiveness and product quality, with the option having the highest share marked for choice sets {A, B, C} and {B, C, D}. A Utility Calculator (UC) takes the vector of attributes and sends utility samples (u_iA = 3.26, u_iB = 3.33, u_iC = 2.30) to a Decision Making System (DMS), which turns utility samples into utility estimates.]

Agenda

1. Uncertainty Awareness as an Essence in Data-Oriented Real-World Decision Making

2. From Machine Learning to Game Theory #1 – Linking Uncertainty with Bounded Rationality

3. From Machine Learning to Game Theory #2 – Open Questions Implied by Numerical Issues

Machine Learning (ML)

● A set of inductive disciplines for designing a probabilistic model and estimating the parameters that maximize out-of-sample predictive accuracy
  – Supervised learning: model and fit P(Y|X)
  – Unsupervised learning: model and fit P(X)

● What machine learners care about
  – Bias-variance trade-off
  – Curse of dimensionality

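Estimation via Bayes' theorem

● The basis of most of today's ML algorithms (data: D, model parameter: θ, prior: p(θ))

Bayesian estimation

posterior distribution:
$$ p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{\int_\theta p(D \mid \theta)\, p(\theta)\, d\theta} $$

predictive distribution:
$$ p(y^* \mid D) = \int_\theta p(y^* \mid \theta)\, p(\theta \mid D)\, d\theta $$

Maximum A Posteriori (MAP) estimation, an approximation of Bayesian estimation

posterior mode:
$$ \hat{\theta} = \arg\max_\theta \left[ \log p(D \mid \theta) + \log p(\theta) \right] $$

predictive distribution:
$$ p(y^* \mid D) \simeq p(y^* \mid \hat{\theta}) $$

● Q. Why place a prior?
  – A1. To quantify uncertainty as the posterior
  – A2. To avoid overfitting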

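E.g., Gaussian Process Regression (GPR)

● Bayesian Ridge Regression
  – Unlike MAP Ridge regression (dark gray), input-dependent uncertainty (light gray) is quantified.

prior:
$$ \begin{pmatrix} f \\ f^* \end{pmatrix} \sim N\left( 0_{n+1}, \begin{pmatrix} K & k^* \\ k^{*T} & K(x^*, x^*) \end{pmatrix} \right) $$

where
$$ K = (K_{ij} \equiv K(x_i, x_j)), \quad k^* = (K(x_1, x^*), \dots, K(x_n, x^*))^T, \quad K(x, x') = \exp(-\gamma \|x - x'\|^2) $$

data likelihood:
$$ \begin{pmatrix} y \\ y^* \end{pmatrix} \sim N\left( \begin{pmatrix} f \\ f^* \end{pmatrix}, \sigma^2 I_{n+1} \right) $$

predictive distribution:
$$ y^* \mid K, x^*, X, y \sim N\left( k^{*T} (\sigma^2 I_n + K)^{-1} y,\; K(x^*, x^*) - k^{*T} (\sigma^2 I_n + K)^{-1} k^* + \sigma^2 \right) $$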
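The predictive distribution above can be evaluated in a few lines. The following is a minimal sketch, assuming NumPy and toy one-dimensional data; the function names `rbf_kernel` and `gpr_predict` and the values of `gamma` and `sigma2` are illustrative choices, not taken from the slides.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """K(x, x') = exp(-gamma * ||x - x'||^2) for every pair of rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def gpr_predict(X, y, X_star, gamma=1.0, sigma2=0.1):
    """Predictive mean and variance of y* at each row of X_star."""
    K = rbf_kernel(X, X, gamma)
    k_star = rbf_kernel(X, X_star, gamma)          # shape (n, n_star)
    A = sigma2 * np.eye(len(X)) + K                # sigma^2 I_n + K
    mean = k_star.T @ np.linalg.solve(A, y)
    # K(x*, x*) = 1 for the RBF kernel, so the variance is 1 - k*^T A^{-1} k* + sigma^2
    var = 1.0 - np.sum(k_star * np.linalg.solve(A, k_star), axis=0) + sigma2
    return mean, var

# Toy data (hypothetical): three training inputs and two test inputs,
# one inside the training range and one far away from it.
X = np.array([[0.0], [0.5], [1.0]])
y = np.array([0.0, 1.0, 0.5])
X_star = np.array([[0.5], [5.0]])
mean, var = gpr_predict(X, y, X_star)
print(mean, var)
```

The test input far from the data receives a predictive mean close to the prior mean 0 and a variance close to the prior variance plus noise, which is the input-dependent uncertainty that a MAP point estimate does not provide.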

Gap between Deduction & Induction

Today's AI is integrating both.

Do not divide the work between inductive & deductive researchers.

Deductive Mind

● Optimize decisions for a given environment

● Casino owner's mentality

● Game theorist, probabilist, operations researcher

Inductive Mind

● Estimate the environment from observations

● Gambler's mentality

● Statistician, machine learner, econometrician

Induction ↔ Deduction

Typical Problem Solving in the Real World

Dataset D → [Inductive Process: machine learning, statistics, econometrics, etc.] → Estimate of Environment Θ_D → [Deductive Process: game theory, mathematical programming, Markov Decision Process, etc.] → Policy Decisions π_D

The estimate Θ_D is different from the true environment Θ.

$$ \forall i \in \{1, \dots, n\}: \quad \pi_{D,i} = \arg\max_{\pi_i} R\left( \pi_i \mid \{\pi_{D,j}\}_{j \neq i}, \Theta_D \right) $$


How does the estimation-based policy π_D differ from the true optimal policy π*? The policy π_D is an equilibrium under the estimate Θ_D, whereas π* is an equilibrium under the true environment Θ:

$$ \forall i \in \{1, \dots, n\}: \quad \pi_i^* = \arg\max_{\pi_i} R\left( \pi_i \mid \{\pi_j^*\}_{j \neq i}, \Theta \right) $$



State-of-the-art AI

Dataset D → [Direct Optimization: integration of machine learning and optimization algorithms] → Policy Decisions π_D

The estimate of the environment Θ_D is obtained only as a by-product.

See the Difference

Typical Problem Solving in the Real World:
● Unnecessarily large effort spent on solving each subproblem
● Vulnerable to estimation error
● Θ_D: accurately fitted to minimize the prediction error on dataset D, although minimizing the error of this parameter is not the goal
● π_D: excessively optimized under a wrong assumption

State-of-the-art AI:
● Less effort spent on needless intermediate estimation
● Robust to estimation error
● Θ_D: fitted, but not by minimizing the error on dataset D; often less complex than the estimate of the typical approach
● π_D: safely optimized with less reliance on Θ_D

See the Difference

Typical Problem Solving in the Real World: solve a hard inductive problem, then solve another hard deductive problem.

State-of-the-art AI: solve an easier problem that involves both induction & deduction.

● A recommendation for simple solutions
  – Gigerenzer & Taleb, https://www.youtube.com/watch?v=4VSqfRnxvV8

Optimization under Uncertainty

● Interval Estimation (e.g., Bayesian)
  – Quantify uncertainty
  – Optimize over all possible environments

● Minimal Estimation (e.g., Vapnik)
  – Omit the intermediate step
  – Solve the minimal optimization problem

● Both principles are effective in practice.

Vapnik's Principle (Vapnik, 1995)

"When solving a problem of interest, do not solve a more general problem as an intermediate step."
– Vladimir N. Vapnik

● E.g., classification or regression: predict Y given X
  – #1. Fit P(X,Y) and infer P(Y|X) by Bayes' theorem
  – #2. Only fit P(Y|X)

● #2 is better than #1 because it incurs less estimation error.
  – Better particularly when uncertainty is high: small sample size, high dimensionality, and/or high noise

Batch Reinforcement Learning

● A good example of involving both inductive and deductive processes.

● Also a good example of how to avoid needlessly hard estimation.

● The basis of the recent success of Deep Q-Network in playing games (Mnih+, 2013 & 2015) and of AlphaGo (Silver+, 2016)

Markov Decision Process

● Framework for long-term-optimal decision making
  – S: set of states, A: set of actions, P(s'|s,a): state-transition probability, r(s,a): immediate reward, γ ∈ [0,1]: discounting factor
  – Optimize the policy π(a|s) for maximal cumulative reward

[Figure: transitions among State #1 (e.g., Gold Customer), State #2 (e.g., Silver Customer), and State #3 (e.g., Normal Customer) over t = 0, 1, 2, ..., with the rewards obtained by Action #1 (e.g., ordinary discount on flight ticket) and by Action #2 (e.g., free business-class upgrade).]

Markov Decision Process

● Easy to solve if the environment is known
  – Via dynamic programming or linear programming when P(s'|s,a) & r(s,a) are given with no uncertainty
  – Behave myopically at t→∞
    ● For each state s, choose the action a that maximizes r(s,a).
  – At time (t-1), choose the action that maximizes the immediate reward at time (t-1) plus the expected reward after time t over the state-transition distribution.

● What if the environment is unknown?
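The dynamic-programming recursion described above can be sketched as value iteration on a known MDP. The two-state, two-action example below is hypothetical (the transition probabilities and rewards are made up for illustration), and `value_iteration` is an illustrative name rather than a library function.

```python
import numpy as np

def value_iteration(P, r, gamma=0.9, tol=1e-8, max_iter=10_000):
    """P[a, s, s2]: transition probabilities, r[s, a]: immediate rewards."""
    V = np.zeros(r.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = r(s, a) + gamma * sum_{s2} P(s2 | s, a) V(s2)
        Q = r + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = r + gamma * np.einsum("ast,t->sa", P, V)
    return Q, Q.argmax(axis=1)

# Hypothetical MDP: state 0 = normal customer, state 1 = gold customer;
# action 0 = ordinary discount, action 1 = free upgrade (costly now,
# but more likely to lead to the gold state).
P = np.array([
    [[0.9, 0.1], [0.5, 0.5]],   # action 0
    [[0.3, 0.7], [0.1, 0.9]],   # action 1
])
r = np.array([[1.0, 0.0],
              [3.0, 2.0]])

_, myopic_policy = value_iteration(P, r, gamma=0.0)
_, farsighted_policy = value_iteration(P, r, gamma=0.9)
print(myopic_policy, farsighted_policy)
```

With γ = 0 the policy simply maximizes r(s,a) in each state; with γ = 0.9 the normal-customer state switches to the upgrade action because of its long-term benefit.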

Types of Reinforcement Learning

● Model-based ↔ Model-free
● On policy ↔ Off policy
● Value iteration ↔ Policy search

● Model-based approach
  – 1. System identification: estimate the MDP parameters
  – 2. Sample multiple MDPs from the interval estimate
  – 3. Solve every MDP & take the best action of the best MDP
    ● Optimism in the face of uncertainty

Model-free approach

● Remember: our aim is to get the optimal policy. In principle, there is no need to estimate the environment.
  – Act without fully identifying the system: as long as we choose the optimal action, it turns out right in the end.

● Even when doing estimation, use an intermediate statistic less complex than P(s'|s,a) & r(s,a).

Bellman Optimality Equation

● The policy is derived if we have an estimate of Q(s,a).
  – Simpler than estimating P(s'|s,a) & r(s,a)

$$ Q(s, a) = E[r(s, a)] + \gamma\, E_{P(s' \mid s, a)}\left[ \max_{a'} Q(s', a') \right] $$

$$ \pi(a \mid s) = \begin{cases} 1 & a = \arg\max_{a'} Q(s, a') \\ 0 & \text{otherwise} \end{cases} $$

● Get an estimate of Q(s,a) from episodes $(s_i, a_i, s_i', r_i)_{i=1}^{n}$

Fitted Q-Iteration (Ernst+, 2005)

● For k = 1, 2, ..., iterate 1) value computation and 2) regression as

1) $$ \forall i \in \{1, \dots, n\}: \quad v_i^{(k)} := r_i + \gamma\, Q_k^{(1)}\left( s_i', \arg\max_{a'} Q_k^{(0)}(s_i', a') \right) $$

2) $$ \forall f \in \{0, 1\}: \quad Q_{k+1}^{(f)} := \arg\min_{Q \in H} \left[ \frac{1}{2} \sum_{i \in J_f} \left( v_i^{(k)} - Q(s_i, a_i) \right)^2 + R(Q) \right] $$

  – H: hypothesis space of functions, Q_0 ≡ 0, R: regularization term
  – Indices 1...n are randomly split into sets J_0 and J_1, to avoid over-estimation of Q values (Double Q-Learning (Hasselt, 2010)).

● Related to Experience Replay in Deep Q-Network (Mnih+, 2013 & 2015)
  – See (Lange+, 2012) for more details.
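The two steps above can be sketched on a tabular toy problem. This is a minimal illustration under stated assumptions, not the implementation of (Ernst+, 2005): the hypothesis space H is a table with one value per (s, a) cell, R(Q) is an L2 penalty with weight `ridge`, the final estimate averages the two folds, and the toy environment (action a deterministically leads to state a, with reward 1 on arrival in state 1) is hypothetical.

```python
import numpy as np

def fitted_q_iteration(S, A, S_next, R, n_states, n_actions,
                       gamma=0.9, n_iter=50, ridge=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n = len(S)
    fold = rng.permutation(n) % 2                  # random split into J_0 and J_1
    Q = np.zeros((2, n_states, n_actions))         # both folds start from zero
    for _ in range(n_iter):
        # 1) value computation: the action is chosen by Q^(0) and evaluated by Q^(1)
        a_star = Q[0, S_next].argmax(axis=1)
        v = R + gamma * Q[1, S_next, a_star]
        # 2) regression: for a table with an L2 penalty, the minimizer in each
        #    (s, a) cell is sum(v) / (count + ridge)
        Q_new = np.zeros_like(Q)
        for f in (0, 1):
            idx = fold == f
            num = np.zeros((n_states, n_actions))
            cnt = np.zeros((n_states, n_actions))
            np.add.at(num, (S[idx], A[idx]), v[idx])
            np.add.at(cnt, (S[idx], A[idx]), 1.0)
            Q_new[f] = num / (cnt + ridge)
        Q = Q_new
    return Q.mean(axis=0)                          # assumption: average the two folds

# Hypothetical episodes: every (s, a) pair is observed 20 times.
S = np.tile([0, 0, 1, 1], 20)
A = np.tile([0, 1, 0, 1], 20)
S_next = A.copy()
R = (S_next == 1).astype(float)

Q_hat = fitted_q_iteration(S, A, S_next, R, n_states=2, n_actions=2)
print(Q_hat, Q_hat.argmax(axis=1))
```

No transition probability or reward function is estimated here; only Q is regressed, and the greedy policy is read off from it.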

Policy Gradient

● Accurately fit the policy π_θ(a|s) while roughly fitting Q(s,a)
  – More direct with respect to the final aim
  – Applicable to continuous-action problems

Policy Gradient Theorem (Sutton+, 2000)

$$ \underbrace{\nabla_\theta J(\theta)}_{\text{gradient of performance}} = \underbrace{E_{\pi_\theta}\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s, a) \right]}_{\text{expected log-policy gradient times cumulative reward over } s \text{ and } a} $$

● Variations in how the rough estimate of Q is provided
  – REINFORCE (Williams, 1992): reward samples
  – Actor-Critic: regression models (e.g., Natural Gradient (Kakade, 2002), A3C (Mnih+, 2016))
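As a rough illustration of the REINFORCE variant, the sketch below applies the policy gradient with reward samples to a two-armed bandit (a single state, so Q reduces to the expected reward of each arm). The softmax parameterization, learning rate, and reward means are assumptions made for this example.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def reinforce_bandit(mean_rewards, n_steps=2000, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(mean_rewards))
    for _ in range(n_steps):
        pi = softmax(theta)
        a = rng.choice(len(pi), p=pi)
        reward = mean_rewards[a] + rng.normal(scale=0.1)   # noisy reward sample
        grad_log_pi = -pi                                   # gradient of log softmax:
        grad_log_pi[a] += 1.0                               # one-hot(a) - pi
        theta += lr * grad_log_pi * reward                  # stochastic gradient ascent
    return softmax(theta)

# Hypothetical arms: arm 1 pays more on average than arm 0.
pi_final = reinforce_bandit(np.array([0.2, 1.0]))
print(pi_final)
```

The reward sample stands in for Q, so no value function is fitted at all; an actor-critic method would replace it with a regression estimate to reduce variance.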

Functional Approximation in Practice

● Concrete functional form of Q(s,a) and/or π(a|s)
  – Q should be a universal function approximator: a class of functions that can approximate any function if sufficiently many parameters are introduced.

● Examples of universal approximators
  – Tree ensembles: Random Forest, Gradient Boosted Decision Trees
  – (Deep) Neural Networks
  – Mixture of Radial Basis Functions (RBFs)

Functional Approximation in Practice

● Is any universal approximator OK? No, unfortunately.
  – A universal approximator is merely asymptotically unbiased.
  – Better to have
    ● Low variance in terms of the bias-variance trade-off
    ● Resistance to the curse of dimensionality

● One reason for deep learning's success
  – Flexibility to represent multi-modal functions with fewer parameters than nonparametric (RBF or tree) models
  – Techniques to stabilize numerical optimization
    ● AdaGrad or ADAM, dropout, ReLU, batch normalization, etc.

Message

● Uncertainty awareness is essential in data-oriented decision making.
  – No division between induction and deduction
  – Removing needless intermediate estimation
  – Fitted Q-Iteration as an illustrative example

● Fewer parameters, less uncertainty





Shrinkage Matters in the Real World.

● Q. Why does a prior help avoid over-fitting?
  – A. Shrinkage towards the prior mean (e.g., 0 in Ridge regression)

● Over-optimization ↔ Over-rationalization?
  – (e.g., (Takahashi and Morimura, 2015))

[Figure: solutions of 2-dimensional OLS & Ridge regression in the plane of Coefficient #1 and Coefficient #2. The Ridge solution is closer to the prior mean 0 than the Ordinary Least Squares (OLS) solution. The prior mean 0 is independent of the training data.]
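The shrinkage towards the prior mean 0 can be checked numerically. The sketch below fits OLS and Ridge regression on synthetic two-dimensional data; the data-generating coefficients, noise level, and penalty weights are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
w_true = np.array([1.0, -2.0])                       # hypothetical true coefficients
y = X @ w_true + rng.normal(scale=0.5, size=20)

def ridge(X, y, lam):
    """Minimizer of ||y - Xw||^2 + lam * ||w||^2 (lam = 0 gives OLS)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_ols = ridge(X, y, 0.0)
w_ridge = ridge(X, y, 10.0)
w_heavy = ridge(X, y, 1e6)
print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge), np.linalg.norm(w_heavy))
```

The norm of the solution decreases as the penalty grows: the estimate moves from the OLS solution towards the prior mean 0, regardless of what the training data say.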

Discrete Choice Modelling

Goal: predict the probability of choosing an option from a choice set.

Why solve this problem?

Brand positioning among competitors

Sales promotion (yet involving some abuse)

Random Utility Theory as a Rational Model

Each human is a rational maximizer of random utility.

Theoretical basis behind many statistical marketing models: Logit models (e.g., (McFadden, 1980; Williams, 1977; McFadden and Train, 2000)), Learning to rank (e.g., (Chapelle and Harchaoui, 2005)), Conjoint analysis (Green and Srinivasan, 1978), Matrix factorization (e.g., (Lawrence and Urtasun, 2009)), ...


Complexity of Real Human's Choice

An example of choosing a PC (Kivetz et al., 2004)

Each subject chooses 1 option from a choice set.

             A     B     C     D     E
CPU [MHz]    250   300   350   400   450
Mem. [MB]    192   160   128   96    64

Choice Set   #subjects
{A, B, C}    36:176:144
{B, C, D}    56:177:115
{C, D, E}    94:181:109

Can random utility theory still explain the preference reversals?

B ≻ C or C ≻ B?


Similarity Effect (Tversky, 1972)

Top-share choice can change due to correlated utilities.

E.g., one color from {Blue, Red} or {Violet, Blue, Red}?


Attraction Effect (Huber et al., 1982)

Introduction of an absolutely inferior option A⁻ (= decoy) causes an irregular increase of option A's attractiveness, despite the natural guess that a decoy never affects the choice.

If D ≻ A, then D ≻ A ≻ A⁻.

If A ≻ D, then A is superior to both A⁻ and D.


Compromise Effect (Simonson, 1989)

Moderate options within each choice set are preferred.

Different from a non-linear utility function involving diminishing returns (e.g., √inexpensiveness + √quality).


Positioning of the Proposed Work

Sim.: similarity, Attr.: attraction, Com.: compromise

Model     Sim.  Attr.  Com.  Mechanism                       Predict. for Test Set  Likelihood Maximization
SPM       OK    NG     NG    correlation                     OK                     MCMC
MDFT      OK    OK     OK    dominance & indifference        OK                     MCMC
PD        OK    OK     OK    nonlinear pairwise comparison   OK                     MCMC
MMLM      OK    NG     OK    none                            OK                     Non-convex
NLM       OK    NG     NG    hierarchy                       NG                     Non-convex
BSY       OK    OK     OK    Bayesian                        OK                     MCMC
LCA       OK    OK     OK    loss aversion                   OK                     MCMC
MLBA      OK    OK     OK    nonlinear accumulation          OK                     Non-convex
Proposed  OK    NG     OK    Bayesian                        OK                     Convex

MDFT: Multialternative Decision Field Theory (Roe et al., 2001)
PD: Proportional Difference Model (Gonzalez-Vallejo, 2002)
MMLM: Mixed Multinomial Logit Model (McFadden and Train, 2000)
SPM: Structured Probit Model (Yai, 1997; Dotson et al., 2009)
NLM: Nested Logit Models (Williams, 1977; Wen and Koppelman, 2001)
BSY: Bayesian Model of (Shenoy and Yu, 2013)
LCA: Leaky Competing Accumulator Model (Usher and McClelland, 2004)
MLBA: Multiattribute Linear Ballistic Accumulator Model (Trueblood, 2014)


Key Idea #1: a Dual Personality Model

Regard a human as an estimator of her/his own utility function.

Assumption 1: the DMS does not know the original utility function.
1. The UC computes the sample value of every option's utility, and sends only these samples to the DMS.
2. The DMS statistically estimates the utility function.


Utility Calculator as Rational Personality

For every context i and option j, the UC computes a noiseless sample of utility $v_{ij}$ by applying the utility function $f_{UC}: R^{d_X} \to R$.

$$ v_{ij} = f_{UC}(x_{ij}), \qquad f_{UC}(x) \triangleq b + w_\phi^T \phi(x) $$

b: bias term

$\phi: R^{d_X} \to R^{d_\phi}$: mapping function

$w_\phi \in R^{d_\phi}$: vector of coefficients

Key Idea #2: DMS is a Bayesian estimator

The DMS does not know $f_{UC}$ but has utility samples $\{v_{ij}\}_{j=1}^{m[i]}$.

Assumption 2: the DMS places a choice-set-dependent Gaussian Process (GP) prior on regressing the utility function.

$$ \mu_i \sim N\left( 0_{m[i]}, \sigma^2 K(X_i) \right), \qquad K(X_i) = (K(x_{ij}, x_{ij'})) \in R^{m[i] \times m[i]} $$

$$ v_i \triangleq (v_{i1}, \dots, v_{im[i]})^T \sim N\left( \mu_i, \sigma^2 I_{m[i]} \right) $$

$\mu_i \in R^{m[i]}$: vector of utility
$\sigma^2$: noise level
$K(\cdot, \cdot)$: similarity function
$X_i \triangleq (x_{i1} \in R^{d_X}, \dots, x_{im[i]})^T$

The posterior mean is given as

$$ u_i^* \triangleq E[\mu_i \mid v_i, X_i, K] = K(X_i) \left( I_{m[i]} + K(X_i) \right)^{-1} \left( b 1_{m[i]} + \Phi_i w_\phi \right). $$


Convex Optimization for Model Parameters

The likelihood of the entire model is tractable, assuming the choice is given by a logit whose mean utility is the posterior mean $u_i^*$.

Thus we can fit the function $f_{UC}$ from the choice data. Conveniently, MAP estimation of $f_{UC}$ is convex for fixed K.

$$ (\hat{b}, \hat{w}_\phi) = \arg\max_{b, w_\phi} \sum_{i=1}^{n} \ell\left( b H_i 1_{m[i]} + H_i \Phi_i w_\phi,\; y_i \right) - \frac{c}{2} \|w_\phi\|^2 $$

where

$$ \ell(u_i^*, y_i) \triangleq \log \frac{\exp(u^*_{i y_i})}{\sum_{j'=1}^{m[i]} \exp(u^*_{ij'})} \quad \text{and} \quad H_i \triangleq K(X_i) \left( I_{m[i]} + K(X_i) \right)^{-1} $$


Irrationality as Bayesian Shrinkage

Implication from the posterior-mean utility in (1):
Each option's utility is shrunk towards the prior mean 0.
Shrinkage is strong for an option dissimilar to the others, due to its high posterior variance (= uncertainty).

$$ u_i^* = \underbrace{K(X_i) \left( I_{m[i]} + K(X_i) \right)^{-1}}_{\text{shrinkage factor}} \; \underbrace{\left( b 1_{m[i]} + \Phi_i w_\phi \right)}_{\text{vec. of utility samples}} \tag{1} $$

Context effects as Bayesian uncertainty aversion

E.g., RBF kernel $K(x, x') = \exp(-\gamma \|x - x'\|^2)$

[Figure: final evaluation of options A, B, C, D placed along X1 = (5 − X2), under choice sets {A,B,C} and {B,C,D}.]
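The shrinkage factor in (1) can be computed directly to see a compromise effect emerge. The sketch below is an illustration under stated assumptions: four options placed on the line X1 + X2 = 5 with hypothetical coordinates, identical utility samples for all options, and an RBF kernel with γ = 0.5. It is not the fitted model of the original study.

```python
import numpy as np

def posterior_mean_utility(X, v, gamma=0.5):
    """u* = K (I + K)^{-1} v with the RBF kernel K(x, x') = exp(-gamma ||x - x'||^2)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-gamma * sq)
    H = K @ np.linalg.inv(np.eye(len(X)) + K)      # shrinkage factor
    return H @ v

# Hypothetical attribute vectors on the trade-off line X1 + X2 = 5.
options = {"A": [1.0, 4.0], "B": [2.0, 3.0], "C": [3.0, 2.0], "D": [4.0, 1.0]}

def evaluate(names):
    X = np.array([options[name] for name in names])
    v = np.full(len(names), 3.0)                   # assumption: equal utility samples
    return dict(zip(names, posterior_mean_utility(X, v)))

u_abc = evaluate(["A", "B", "C"])
u_bcd = evaluate(["B", "C", "D"])
print(u_abc)
print(u_bcd)
```

Although every option has the same utility sample, the middle option of each choice set is similar to both neighbours and is therefore shrunk the least: B is evaluated highest in {A, B, C} and C is evaluated highest in {B, C, D}.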


Recovered Context-Dependent Choice Criteria

For a speaker dataset: successfully captured a mixture of objective preference and subjective context effects.

               A     B     C     D     E
Power [Watt]   50    75    100   125   150
Price [USD]    100   130   160   190   220

Choice Set   #subjects
{A, B, C}    45:135:145
{B, C, D}    58:137:111
{C, D, E}    95:155:91

[Figure: evaluation versus Price [USD] for options A to E, showing the objective evaluation and the evaluations under choice sets {A,B,C}, {B,C,D}, and {C,D,E}.]

[Figure: average log-likelihood on datasets PC, SP, and SM for LinLogit, NpLogit, LinMix, NpMix, and GPUA.]


A Result of p-beauty Contest by Real Humans

Each player guesses 2/3 of the mean of all votes (0-100). The observed mean is far from the Nash equilibrium 0 (Camerer et al., 2004; Ho et al., 2006).

Table: Average Choice in (2/3)-beauty Contests

Subject Pool           Group Size   Sample Size   Mean[Y_i]
Caltech Board          73           73            49.4
80 year olds           33           33            37.0
High School Students   20-32        52            32.5
Economics PhDs         16           16            27.4
Portfolio Managers     26           26            24.3
Caltech Students       3            24            21.5
Game Theorists         27-54        136           19.1


Modeling Bounded Rationality

Early stopping at step k: Level-k thinking or Cognitive Hierarchy Theory (Camerer et al., 2004)

Humans cannot predict the infinite future.
Using a non-stationary transitional state

Randomization of utility via noise $\varepsilon_{it}$: Quantal Response Equilibrium (McKelvey and Palfrey, 1995)

$$ \forall i \in \{1, \dots, n\}: \quad Y_i^{(t)} \mid Y_{\setminus i}^{(t-1)} = \arg\max_{Y} \left[ f_i\left( Y, Y_{\setminus i}^{(t-1)} \right) + \varepsilon_{it} \right] $$

Both methods essentially work as regularization of rationality.

Shrinkage towards initial values or uniform choice probabilities
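Early stopping at step k can be made concrete for the (2/3)-beauty contest. The sketch below assumes that level-0 players guess 50 on average and that a level-k player best-responds to a population of level-(k-1) players while ignoring the effect of their own vote on the mean (reasonable for a large group). The function name `level_k_guess` is illustrative.

```python
def level_k_guess(k, p=2.0 / 3.0, level0_mean=50.0):
    """Guess of a level-k player who best-responds to level-(k-1) players."""
    guess = level0_mean
    for _ in range(k):
        guess = p * guess          # best response: p times the anticipated mean
    return guess

guesses = [level_k_guess(k) for k in range(5)]
print(guesses)
print(level_k_guess(200))
```

Stopping the iteration at a small k gives guesses between 50 and 0, in the range of the observed means in the table above, while only the limit k→∞ reaches the Nash equilibrium 0.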


Linking ML with Game Theory (GT) via Shrinkage Principle

Optimization without shrinkage
  ML: Maximum-Likelihood estimation. Optimal for training data, but less generalization capability to test data.
  GT: Nash Equilibrium. Optimal for the given game, but less predictive of real-world decisions.

Optimization with shrinkage
  ML: Bayesian estimation. Shrinkage towards the prior causes suboptimality for training data, but more generalization capability to test data.
  GT: Transitional State or Quantal Response Equilibrium. Shrinkage towards uniform probabilities causes suboptimality for the given game, but is more predictive of real-world decisions.

Early Stopping and Regularization

[Figure, ML as a dynamical system to find the optimal parameters: in the plane of Parameter #1 and Parameter #2, a fitting trajectory starts from 0 and is marked at t=10, t=20, t=30, and t=50. It shows the exact maximum-likelihood estimate (e.g., OLS), the exact Bayesian estimate shrunk towards zero (e.g., Ridge regression), and an early-stopping estimate (e.g., Partial Least Squares).]

[Figure, GT as a dynamical system to find the equilibrium: states labeled t=0, t=1, t=2, ..., t→∞ with mean = 50, 34, 15, and 0. The state with mean = 0 is the Nash equilibrium; the Level-2 transitional state is marked.]

Message

● Bayesian shrinkage ↔ Bounded rationality
  – Dual-personality model for contextual effects
  – Towards data-oriented & more realistic games: export ML regularization techniques to GT

● Analyze dynamics or uncertainty-aware equilibria
  – Early-stopped transitional state, or
  – QRE with uncertainty on each player's utility function





Additional Implications from ML

● Multiple equilibria or saddle points?

● Equilibria or “typical” transitional states?
  – Slow convergence
  – Plateau of objective function

Recent history in ML

● About 20 years were wasted on the local-optimality issue
  – Neural Networks (NNs) were criticized for local optimality in fitting their parameters.
  – The ML community stuck with convex optimization approaches (e.g., Support Vector Machines (Vapnik, 1995)).
  – Most solutions in fitting high-dimensional NNs, however, were found to be not local optima but saddle points (Bray & Dean, 2007; Dauphin+, 2014)!
  – After skipping saddle points by perturbation, most of the local optima empirically provide similar prediction capabilities.

● Please do not make the same mistake in multi-agent optimization problems (= games)!

Why are most critical points saddle points?

● See the spectrum of Hessian matrices of a non-linear function randomly drawn from a Gaussian process.
  – Local minimum: every eigenvalue is positive.
  – Local maximum: every eigenvalue is negative.
  – Saddle point: both positive & negative eigenvalues exist.

● In a high-dimensional function, the Hessian contains both positive & negative eigenvalues with high probability.

[Figure: a univariate function and a bivariate function with a saddle point; image from https://en.wikipedia.org/wiki/Saddle_point]
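The claim about mixed-sign eigenvalues can be probed with a small simulation. As a stand-in for the Hessian of a function drawn from a Gaussian process, the sketch below samples random symmetric matrices with Gaussian entries; this substitution, the sample size, and the function name `fraction_of_saddles` are assumptions made for illustration.

```python
import numpy as np

def fraction_of_saddles(dim, n_samples=2000, seed=0):
    """Fraction of random symmetric matrices with both positive and negative eigenvalues."""
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_samples):
        G = rng.normal(size=(dim, dim))
        H = (G + G.T) / 2.0                    # random symmetric "Hessian"
        eig = np.linalg.eigvalsh(H)            # eigenvalues in ascending order
        if eig[0] < 0.0 < eig[-1]:
            count += 1
    return count / n_samples

fractions = {d: fraction_of_saddles(d) for d in (1, 2, 5, 10)}
print(fractions)
```

In one dimension a critical point is never a saddle, but the fraction of mixed-sign spectra rises quickly with the dimension, so a randomly chosen critical point of a high-dimensional function is almost always a saddle point under this simplified model.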

Open Questions for Multiple Equilibria

● If a game is very complex, involving lots of parameters in pay-off or utility functions, then
  – Are most of its critical points unstable saddle points?
  – Is the number of equilibria much smaller than our guess?

● If we obtain a few equilibria of such a complex game,
  – Do most of such equilibria have similar properties?
  – Don't we have to obtain other equilibria?

See Dynamics: “Typical” Transitional State?

● MLers are sensitive to the convergence rate in fitting.
  – We are in the finite-sample & high-dimensional world: asymptotics alone is powerless, and a computational estimate is not an equilibrium but a transitional state.

[Figure: optimizer trajectories from http://sebastianruder.com/optimizing-gradient-descent/ (Kingma & Ba, 2015)]

See Dynamics: “Typical” Transitional State?

● The mixing time of the Markov processes of some games is exponential in the number of players.
  – E.g., the Nash demand game (Axtell+, 2000): the equilibrium is equality of wealth, whereas the transitional states show severe inequality.

● What if #players is in the thousands or millions?
  – Severe inequality most of the time

See Dynamics: Trapped in Plateau?

● Fitting of a deep NN is often trapped in plateaus.
  – Natural gradient descent (Amari, 1997) is often used for quickly escaping from a plateau.
  – In real-world games, are people trapped in plateaus rather than equilibria?

[Figure from https://www.safaribooksonline.com/library/view/hands-on-machine-learning/9781491962282/ch04.html]

Conclusion

● Discussed how uncertainty should be incorporated in inductive & deductive decision making.
  – Quantifying uncertainty or simpler minimal estimation

● Linked Bayesian shrinkage with bounded rationality
  – Towards data-oriented regularized equilibrium

● Implications from high-dimensional ML
  – Saddle points, transitional state, and/or plateau

THANK YOU FOR ATTENDING!

Download this material from https://www.slideshare.net/rikija/uncertainty-awareness-in-integrating-machine-learning-and-game-theory

References

Amari, S. (1997). Neural learning in structured parameter spaces - natural Riemannian gradient. In Advances in Neural Information Processing Systems 9, pages 127–133. MIT Press.

Axtell, R., Epstein, J., and Young, H. (2000). The emergence of classes in a multi-agent bargaining model. Working papers, Brookings Institution - Working Papers.

Bray, A. J. and Dean, D. S. (2007). Statistics of critical points of Gaussian fields on large-dimensional spaces. Physical Review Letters, 98:150201.

Bruza, P., Kitto, K., Nelson, D., and McEvoy, C. (2009). Is there something quantum-like about the human mental lexicon? Journal of Mathematical Psychology, 53(5):362–377.

Camerer, C. F., Ho, T. H., and Chong, J. (2004). A cognitive hierarchy model of games. Quarterly Journal of Economics, 119:861–898.


Chapelle, O. and Harchaoui, Z. (2005). A machine learning approach to conjoint analysis. In Advances in Neural Information Processing Systems 17, pages 257–264. MIT Press, Cambridge, MA, USA.

Clarke, E. H. (1971). Multipart pricing of public goods. Public Choice, 2:19–33.

Dauphin, Y. N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., and Bengio, Y. (2014). Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems 27, pages 2933–2941. Curran Associates, Inc.

de Barros, J. A. and Suppes, P. (2009). Quantum mechanics, interference, and the brain. Journal of Mathematical Psychology, 53(5):306–313.


Dotson, J. P., Lenk, P., Brazell, J., Otter, T., Maceachern, S. N., and Allenby, G. M. (2009). A probit model with structured covariance for similarity effects and source of volume calculations. http://ssrn.com/abstract=1396232.

Gonzalez-Vallejo, C. (2002). Making trade-offs: A probabilistic and context-sensitive model of choice behavior. Psychological Review, 109:137–154.

Green, P. and Srinivasan, V. (1978). Conjoint analysis in consumer research: Issues and outlook. Journal of Consumer Research, 5:103–123.

Ho, T. H., Lim, N., and Camerer, C. F. (2006). Modeling the psychology of consumer and firm behavior with behavioral economics. Journal of Marketing Research, 43(3):307–331.

Huber, J., Payne, J. W., and Puto, C. (1982). Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis. Journal of Consumer Research, 9:90–98.


Kakade, S. M. (2002). A natural policy gradient. In Dietterich, T. G., Becker, S., and Ghahramani, Z., editors, Advances in Neural Information Processing Systems 14, pages 1531–1538. MIT Press.

Kingma, D. and Ba, J. (2015). Adam: A method for stochastic optimization. In The International Conference on Learning Representations (ICLR), San Diego.

Kivetz, R., Netzer, O., and Srinivasan, V. S. (2004). Alternative models for capturing the compromise effect. Journal of Marketing Research, 41(3):237–257.

Lawrence, N. D. and Urtasun, R. (2009). Non-linear matrix factorization with Gaussian processes. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009), pages 601–608, New York, NY, USA. ACM.

McFadden, D. and Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15:447–470.


McFadden, D. L. (1980). Econometric models of probabilistic choice among products. Journal of Business, 53(3):13–29.

McKelvey, R. and Palfrey, T. (1995). Quantal response equilibria for normal form games. Games and Economic Behavior, 10:6–38.

Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In Proceedings of The 33rd International Conference on Machine Learning (ICML 2016), pages 1928–1937.

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M., Graves, A., Riedmiller, M., Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518:529–533.

Mogiliansky, A. L., Zamir, S., and Zwirn, H. (2009). Type indeterminacy: A model of the KT (Kahneman-Tversky)-man. Journal of Mathematical Psychology, 53(5):349–361.


Roe, R. M., Busemeyer, J. R., and Townsend, J. T. (2001). Multialternative decision field theory: A dynamic connectionist model of decision making. Psychological Review, 108:370–392.

Shenoy, P. and Yu, A. J. (2013). A rational account of contextual effects in preference choice: What makes for a bargain? In Proceedings of the Cognitive Science Society Conference.

Silver, D., Huang, A., Maddison, C., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529:484–489.

Simonson, I. (1989). Choice based on reasons: The case of attraction and compromise effects. Journal of Consumer Research, 16:158–174.


Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, pages 1057–1063. MIT Press.

Takahashi, R. and Morimura, T. (2015). Predicting preference reversals via Gaussian process uncertainty aversion. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS 2015), pages 958–967.

Trueblood, J. S. (2014). The multiattribute linear ballistic accumulator model of context effects in multialternative choice. Psychological Review, 121(2):179–205.

Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79:281–299.

Usher, M. and McClelland, J. L. (2004). Loss aversion and inhibition in dynamical models of multialternative choice. Psychological Review, 111:757–769.


Wen, C.-H. and Koppelman, F. (2001). The generalized nested logit model. Transportation Research Part B, 35:627–641.

Williams, H. (1977). On the formulation of travel demand models and economic evaluation measures of user benefit. Environment and Planning A, 9(3):285–344.

Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229–256.

Yai, T. (1997). Multinomial probit with structured covariance for route choice behavior. Transportation Research Part B: Methodological, 31(3):195–207.

