
Evolution & Learning in Games, Econ 243B

Jean-Paul Carvalho

Lecture 3: Social Learning


Bayesian Social Learning
Bikhchandani, Hirshleifer & Welch (1992), Banerjee (1992)

- $n$ agents sequentially decide between two actions, $s_i \in \{a, b\}$.

- Two states of the world: $\theta \in \{A, B\}$.

- Action $a$ is the preferred action in state $A$, and $b$ is the preferred action in state $B$. Payoffs:

$$
\pi_i = \begin{cases} 1 & \text{if } s_i = \theta, \\ -1 & \text{if } s_i \neq \theta. \end{cases}
$$


Private Information

At the outset, each agent $i$ receives a private signal $z_i$, where signals are i.i.d. conditional on $\theta$.

For all $i$:

- If $\theta = A$, $z_i = A$ with prob. $q > \frac{1}{2}$ and $B$ with prob. $1 - q$.

- If $\theta = B$, $z_i = B$ with prob. $q > \frac{1}{2}$ and $A$ with prob. $1 - q$.

Social learning: $i$'s decision is based not only on $z_i$ but also on information learned from other players.

- Benchmark with observable signals: for large $n$, agents learn the state and make the correct choice with high probability, by the law of large numbers.


Learning by Observing Actions

Actions can provide information on private signals, but less than you might think, due to information cascades.

- Suppose that for the first mover, $z_1 = A$. By Bayes' theorem, 1's posterior belief is

$$
P(\theta = A \mid z_1 = A) = \frac{P(z_1 = A \mid \theta = A)\,P(\theta = A)}{P(z_1 = A)} = \frac{q \cdot \frac{1}{2}}{q \cdot \frac{1}{2} + (1 - q) \cdot \frac{1}{2}} = q > \frac{1}{2}.
$$

- Hence 1's expected payoff from $s_1 = a$ is $q - (1 - q) = 2q - 1 > 0$, and from $s_1 = b$ it is $1 - 2q < 0$.

- Agent 1 chooses $s_1 = a$.


Information Cascade

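Now suppose the second mover's signal is $z_2 = A$.

- 2 infers from observing $s_1 = a$ that $z_1 = A$.

- By Bayes' theorem, 2's posterior belief is

$$
P(\theta = A \mid z_1 = A, z_2 = A) = \frac{P(z_1 = A, z_2 = A \mid \theta = A)\,P(\theta = A)}{P(z_1 = A, z_2 = A)} = \frac{q^2 \cdot \frac{1}{2}}{q^2 \cdot \frac{1}{2} + (1 - q)^2 \cdot \frac{1}{2}} > q > \frac{1}{2}.
$$

- Hence $s_2 = a$.

Thenceforth, all subsequent agents choose $s_i = a$ regardless of $z_i$: after two choices of $a$, even a private signal $z_i = B$ leaves the posterior on $\theta = A$ above $\frac{1}{2}$, so actions no longer reveal signals.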
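The posteriors on this slide can be checked with a short calculation. The following is a minimal sketch, assuming a uniform prior (as in the formulas above) and assuming agent 3 treats both earlier choices of $a$ as revealing $A$ signals. The function name `posterior_A` and the value $q = 3/5$ are illustrative and not part of the lecture.

```python
from fractions import Fraction

def posterior_A(n_A, n_B, q):
    """P(theta = A | n_A signals equal to A and n_B equal to B), uniform prior."""
    like_A = q**n_A * (1 - q)**n_B        # likelihood of the signals if theta = A
    like_B = (1 - q)**n_A * q**n_B        # likelihood of the signals if theta = B
    return like_A / (like_A + like_B)     # the prior 1/2 cancels

q = Fraction(3, 5)            # illustrative precision; any q > 1/2 behaves the same way
p1 = posterior_A(1, 0, q)     # first mover after z1 = A
p2 = posterior_A(2, 0, q)     # second mover: inferred z1 = A plus own z2 = A
p3 = posterior_A(2, 1, q)     # third mover: two inferred A signals plus own z3 = B
print(p1, p2, p3)
```

With exact fractions, `p3` equals `q`: one opposing signal cancels one of the two inferred signals, so the third mover still prefers $a$.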


Decision Rule

General rule:

- Let $d$ be the number of prior choices of $a$ minus the number of prior choices of $b$.

  - $d > 1$: choose $a$ regardless of private signal.
  - $d = 1$: choose $a$ if $z_i = A$; indifferent between $a$ and $b$ if $z_i = B$.
  - $d = 0$: follow private signal.
  - Cases are symmetric for $d < 0$.

Precisely which cascade emerges is path dependent.
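Path dependence is easy to see in simulation. The sketch below implements the rule above, assuming indifferent agents pick an action uniformly at random. The function name `simulate_actions`, the seed, and the run sizes are illustrative choices, not from the lecture.

```python
import random

def simulate_actions(theta, q, n, rng):
    """Simulate n sequential choices under the d-rule; returns a list of 'a'/'b'."""
    other = "B" if theta == "A" else "A"
    actions = []
    d = 0  # prior choices of a minus prior choices of b
    for _ in range(n):
        z = theta if rng.random() < q else other   # private signal, correct w.p. q
        if d > 1:
            s = "a"                                # cascade on a: signal ignored
        elif d < -1:
            s = "b"                                # cascade on b: signal ignored
        elif d == 1:
            s = "a" if z == "A" else rng.choice("ab")   # indifferent after signal B
        elif d == -1:
            s = "b" if z == "B" else rng.choice("ab")   # indifferent after signal A
        else:
            s = "a" if z == "A" else "b"           # d = 0: follow own signal
        actions.append(s)
        d += 1 if s == "a" else -1
    return actions

rng = random.Random(0)
runs = [simulate_actions("A", 0.51, 50, rng) for _ in range(2000)]
share_correct = sum(r[-1] == "a" for r in runs) / len(runs)
print(f"share of runs ending on the correct action: {share_correct:.3f}")
```

Once $|d|$ reaches 2 in a run, every later action is identical, whatever the signals; which action the population locks onto depends on the first few draws.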


Likelihood of Cascades & Information Transmission

Example. $q = 0.51$.

- If agents choose an action uniformly at random when indifferent, the probability of a cascade after the first two individuals is approx. 0.75.

- A cascade starts after signal pairs $AA$ and $BB$ with prob. one, and after $AB$ and $BA$ with prob. $\frac{1}{2}$.

- Probability of a correct cascade is 0.5133 (calculation in Bikhchandani, Hirshleifer and Welch 1992).

- Probability of a correct belief without social learning is $q = 0.51$.
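The two numbers in this example can be reproduced directly. This is a sketch under the stated tie-breaking assumption (uniform randomization when indifferent), conditioning on $\theta = A$; the variable names are illustrative. After a pair of opposite actions the process restarts from $d = 0$, so the probability of eventually entering the correct cascade is a geometric sum.

```python
q = 0.51

# Outcomes for a pair of agents starting from d = 0, given theta = A.
up = q * (q + (1 - q) / 2)          # both choose a: signals AA, or AB with a coin flip to a
down = (1 - q) * ((1 - q) + q / 2)  # both choose b: signals BB, or BA with a coin flip to b
stay = q * (1 - q)                  # opposite actions: no cascade yet, d returns to 0

cascade_after_two = up + down       # equals 1 - q(1 - q)
correct_cascade = up / (up + down)  # sum over k of stay**k * up

print(f"cascade after two agents: {cascade_after_two:.4f}")
print(f"eventually correct cascade: {correct_cascade:.4f}")
```

The gain from social learning over following one's own signal is small here because cascades start early, on very little information.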

See Smith and Sorensen (2000) for a generalization in which a continuous action space improves information transmission and can lead to full revelation of beliefs.


Culture

“Culture is information that people acquire from others by teaching, imitation and other forms of social learning. On a scale unknown in any other species, people acquire skills, beliefs, and values from the people around them, and these strongly affect behaviour. People living in human populations are heirs to a pool of socially transmitted information that affects how they make a living, how they communicate, and what they think is right and wrong.” (Boyd & Richerson 2005, p. 3).

Especially important is the ability to transmit information across generations.

- Produced by vertical (parent to child), oblique (adult non-parent to child) and horizontal (peer-to-peer) transmission.

- In complex societies, organizations (media, state, schools, clerics) play an important role in cultural transmission.

- External information storage devices (libraries, www) and big data allow for more complex forms of learning.


The Evolution of Cultural Evolution
(Henrich and McElreath 2003)

- Humans can survive in a far wider range of environments than other primates.

- There is no evolved hardwired cognitive architecture for doing so, e.g. the Burke and Wills expedition.

- The main difference between human beings and other animals is the human capacity for social learning and the accumulation of knowledge over generations (e.g. hunting technologies, food processing methods).

- Call this cultural learning.

- It is necessary to understand the psychological mechanisms that make cultural learning possible and the population dynamics produced by cultural learning.


A Model of Cultural Learning
(Giuliano and Nunn 2017, based on Rogers 1988)

- A continuum of agents.

- Discrete time: $t = 1, 2, \ldots$

- Two states of the world: $\theta \in \{A, B\}$.

- Two actions: $s_i \in \{a, b\}$. (Action $a$ is the preferred action in state $A$, and $b$ is the preferred action in state $B$.)

- Payoffs:

$$
\pi_i = \begin{cases} \beta & \text{if } s_i = \theta, \\ -\beta & \text{if } s_i \neq \theta. \end{cases}
$$


Environmental Variability & Learning

State of the world:

- With probability $1 - \Delta$, the state in period $t + 1$ is the same as in $t$.

- With probability $\Delta$, the state in period $t + 1$ is $A$ or $B$, each with probability $\frac{1}{2}$.

Strategies:

- Social Learner (SL): copies the action of an agent in the previous generation, chosen uniformly at random.

- Individual Learner (IL): pays cost $c > 0$ to learn the state (for sure).

Population state: $x$ is the proportion of social learners.


Social Learning

The share $1 - x$ of individual learners choose the correct action.

But so does a social learner if

(i) she copies an individual learner since the latest state change,

(ii) she copies from a social learner who copied from an individual learner since the latest state change,

(iii) she copies from a social learner who copied from a social learner who copied from an individual learner since the latest state change,

(iv) and so on.


Up-to-date Social Learning

In equilibrium (i.e. $x_t = x$ for all $t$), the probability of

- (i) is $(1 - x)(1 - \Delta)$,

- (ii) is $x(1 - x)(1 - \Delta)^2$,

- (iii) is $x^2(1 - x)(1 - \Delta)^3$,

and so on.

Iterating and summing gives

$$
\sum_{t=1}^{\infty} x^{t-1}(1 - x)(1 - \Delta)^t.
$$
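The sum is a geometric series with ratio $x(1 - \Delta)$, so it has the closed form $\frac{(1 - x)(1 - \Delta)}{1 - x(1 - \Delta)}$. A quick numerical check, as a sketch: the function names and the values $x = 0.6$, $\Delta = 0.2$ are illustrative and not from the lecture.

```python
def up_to_date_prob(x, delta, terms=500):
    """Partial sum of x^(t-1) (1-x) (1-delta)^t for t = 1..terms."""
    return sum(x ** (t - 1) * (1 - x) * (1 - delta) ** t
               for t in range(1, terms + 1))

def closed_form(x, delta):
    """Geometric-series limit (1-x)(1-delta) / (1 - x(1-delta))."""
    return (1 - x) * (1 - delta) / (1 - x * (1 - delta))

x, delta = 0.6, 0.2   # illustrative parameter values
approx = up_to_date_prob(x, delta)
exact = closed_form(x, delta)
print(approx, exact)
```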


Out-of-date Social Learning

With complementary probability, a social learner chooses an action that was copied before the last state change:

$$
1 - \sum_{t=1}^{\infty} x^{t-1}(1 - x)(1 - \Delta)^t.
$$

After a state change, there is a $\frac{1}{2}$ chance of being in either state, so there is a 50% chance that a social learner is correct and a 50% chance she is incorrect.


Learning Payoffs

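$$
\begin{aligned}
\pi_{SL} &= \left( \sum_{t=1}^{\infty} x^{t-1}(1 - x)(1 - \Delta)^t \right) \beta \\
&\quad + \frac{1}{2} \left( 1 - \sum_{t=1}^{\infty} x^{t-1}(1 - x)(1 - \Delta)^t \right) \beta \\
&\quad + \frac{1}{2} \left( 1 - \sum_{t=1}^{\infty} x^{t-1}(1 - x)(1 - \Delta)^t \right) (-\beta) \\
&= \frac{(1 - x)(1 - \Delta)}{1 - x(1 - \Delta)}\,\beta,
\end{aligned}
$$

which declines monotonically from $(1 - \Delta)\beta$ to $0$ as $x$ goes through $[0, 1]$.

$$
\pi_{IL} = \beta - c.
$$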

Equilibrium

Three regimes:

1. $c \le \Delta\beta$. IL Monomorphic Equilibrium: $x = 0$.

2. $c \ge \beta$. SL Monomorphic Equilibrium: $x = 1$.

3. $\Delta\beta < c < \beta$. Polymorphic Equilibrium:

$$
x^* = \frac{c - \Delta\beta}{(1 - \Delta)c}.
$$
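The three regimes can be sketched in a few lines, using the payoff expressions $\pi_{SL} = \frac{(1 - x)(1 - \Delta)}{1 - x(1 - \Delta)}\beta$ and $\pi_{IL} = \beta - c$ from the model. The function names and the parameter values $\beta = 1$, $c = 0.5$, $\Delta = 0.2$ are illustrative assumptions.

```python
def pi_sl(x, beta, delta):
    """Expected payoff of a social learner when a share x are social learners."""
    return (1 - x) * (1 - delta) / (1 - x * (1 - delta)) * beta

def pi_il(beta, c):
    """Payoff of an individual learner: always correct, pays the learning cost."""
    return beta - c

def x_star(beta, c, delta):
    """Equilibrium share of social learners in each of the three regimes."""
    if c <= delta * beta:
        return 0.0                                   # IL monomorphic
    if c >= beta:
        return 1.0                                   # SL monomorphic
    return (c - delta * beta) / ((1 - delta) * c)    # polymorphic

beta, c, delta = 1.0, 0.5, 0.2   # illustrative values with delta*beta < c < beta
xs = x_star(beta, c, delta)
print(xs, pi_sl(xs, beta, delta), pi_il(beta, c))
```

In the polymorphic regime the two printed payoffs coincide, which is the indifference condition that pins down $x^*$.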


Environmental Variability & Social Learning

$$
\frac{dx^*}{d\Delta} = -\frac{c(\beta - c)}{[(1 - \Delta)c]^2} < 0
$$

Therefore, environmental variability limits social learning by reducing the amount of information stored in the behavior of previous generations.
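The sign of the derivative follows from the quotient rule applied to the polymorphic equilibrium $x^*$. A sketch of the intermediate step, using only symbols from the model:

```latex
\begin{align*}
x^* &= \frac{c - \Delta\beta}{(1 - \Delta)c}, \\
\frac{dx^*}{d\Delta}
  &= \frac{-\beta(1 - \Delta)c + c\,(c - \Delta\beta)}{[(1 - \Delta)c]^2}
   = \frac{c\,(c - \beta)}{[(1 - \Delta)c]^2}
   = -\frac{c(\beta - c)}{[(1 - \Delta)c]^2}.
\end{align*}
```

The expression is negative because $c < \beta$ in the polymorphic regime.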

Giuliano and Nunn (2017) find that populations with ancestors who lived in regions with greater environmental variability (measured by temperature variation) are less ‘traditional’ today.


Welfare and Adaptation

- Notice that in the polymorphic equilibrium both types earn $\beta - c$, so the mean payoff is the same as in a population of purely individual learners.

- Hence social learning may not be adaptive.

- However, Boyd and Richerson (1995) show that social learning leads to a higher average payoff in the population if it allows the accumulation of behaviors that no individual learner could acquire in a lifetime.

- Cumulative cultural evolution is rare among animals because it only spreads when there is a critical mass of cultural learners (Boyd and Richerson 1996).


Cultural Learning with Coordination

(Carvalho & McBride, in progress)

- It is hard to conceive of situations in which individuals learn socially but act in isolation (payoff not frequency dependent).

- Now let the payoff depend on matching the state and coordinating with other agents. All else is the same.

- Define $p$ as the proportion of agents choosing the “correct” action.

- Payoffs:

$$
\pi_i = \begin{cases} \beta + \alpha p & \text{if } s_i = \theta, \\ -\beta + \alpha(1 - p) & \text{if } s_i \neq \theta. \end{cases}
$$


Learning Payoffs

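Hence the expected payoff to a social learner is

$$
\begin{aligned}
\pi_{SL} &= \left( \sum_{t=1}^{\infty} x^{t-1}(1 - x)(1 - \Delta)^t \right) (\beta + \alpha p) \\
&\quad + \frac{1}{2} \left( 1 - \sum_{t=1}^{\infty} x^{t-1}(1 - x)(1 - \Delta)^t \right) (\beta + \alpha p) \\
&\quad + \frac{1}{2} \left( 1 - \sum_{t=1}^{\infty} x^{t-1}(1 - x)(1 - \Delta)^t \right) (-\beta + \alpha(1 - p)) \\
&= \frac{(1 - x)(1 - \Delta)}{1 - x(1 - \Delta)} \left( \beta + \alpha \left( p - \frac{1}{2} \right) \right) + \frac{1}{2}\alpha.
\end{aligned}
$$

The payoff to an individual learner is

$$
\pi_{IL} = \beta + \alpha p - c.
$$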


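Solving for p

The proportion of individuals choosing the correct action, $p$, is

$$
\begin{aligned}
p &= 1 - x + x \left( \sum_{t=1}^{\infty} x^{t-1}(1 - x)(1 - \Delta)^t \right) + x\,\frac{1}{2} \left( 1 - \sum_{t=1}^{\infty} x^{t-1}(1 - x)(1 - \Delta)^t \right) \\
&= 1 - x + x \left( \frac{(1 - x)(1 - \Delta)}{1 - x(1 - \Delta)} + \frac{1}{2} \left( 1 - \frac{(1 - x)(1 - \Delta)}{1 - x(1 - \Delta)} \right) \right) \\
&= 1 - \frac{1}{2}\,\frac{x\Delta}{1 - x(1 - \Delta)}.
\end{aligned}
$$

Clearly, $p > \frac{1}{2}$ for $x < 1$. Furthermore:

$$
\frac{\partial p}{\partial x} = -\frac{1}{2}\,\frac{\Delta}{(1 - x(1 - \Delta))^2} < 0
$$

with $p \to 1$ as $x \to 0$, and $p \to \frac{1}{2}$ as $x \to 1$.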
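The simplification of $p$ and its limiting behavior can be verified numerically. A minimal sketch: the function names, the grid, and $\Delta = 0.2$ are illustrative assumptions, and the up-to-date share uses the geometric-series closed form $\frac{(1 - x)(1 - \Delta)}{1 - x(1 - \Delta)}$.

```python
def share_up_to_date(x, delta):
    """Probability that a social learner's action is up to date."""
    return (1 - x) * (1 - delta) / (1 - x * (1 - delta))

def p_from_definition(x, delta):
    """Individual learners, plus up-to-date social learners, plus half of the rest."""
    s = share_up_to_date(x, delta)
    return (1 - x) + x * s + x * 0.5 * (1 - s)

def p_closed_form(x, delta):
    """Simplified expression 1 - (1/2) x delta / (1 - x(1 - delta))."""
    return 1 - 0.5 * x * delta / (1 - x * (1 - delta))

delta = 0.2                                   # illustrative value
grid = [i / 10 for i in range(11)]            # x = 0.0, 0.1, ..., 1.0
values = [p_closed_form(x, delta) for x in grid]
for x, p in zip(grid, values):
    print(f"x = {x:.1f}  p = {p:.4f}")
```

On this grid the two expressions agree, and $p$ falls from 1 at $x = 0$ to $\frac{1}{2}$ at $x = 1$.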


Learning Payoffs with Coordination

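Substituting for $p$ in the payoff functions, the expected payoff to the individual learner is

$$
\pi_{IL} = \beta + \alpha \left( 1 - \frac{1}{2}\,\frac{x\Delta}{1 - x(1 - \Delta)} \right) - c,
$$

and the expected payoff to the social learner is

$$
\begin{aligned}
\pi_{SL} &= \frac{(1 - x)(1 - \Delta)}{1 - x(1 - \Delta)} \left( \beta + \alpha \left( 1 - \frac{1}{2}\,\frac{x\Delta}{1 - x(1 - \Delta)} - \frac{1}{2} \right) \right) + \frac{1}{2}\alpha \\
&= \frac{(1 - x)(1 - \Delta)}{1 - x(1 - \Delta)} \left( \beta + \frac{1}{2}\alpha\,\frac{1 - x}{1 - x(1 - \Delta)} \right) + \frac{1}{2}\alpha.
\end{aligned}
$$

Both payoffs are strictly decreasing in $x$.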


Equilibrium

In a polymorphic equilibrium:

$$
x^{**} = \frac{c - \Delta\beta - \left( p(x^{**}) - \frac{1}{2} \right) \Delta\alpha}{(1 - \Delta)c}.
$$

Recall that $p(x) > \frac{1}{2}$. Hence $x^{**} < x^* = \frac{c - \Delta\beta}{(1 - \Delta)c}$.

- Social coordination (i.e., $\alpha > 0$) reduces the equilibrium proportion of social learners.

- Direct effect of $\alpha > 0$: Individual learners get a higher social coordination payoff because they always coordinate with the majority while not all social learners do so.

- Indirect effect of $\alpha > 0$: The direct effect reduces $x$ and thus raises both $\pi_{IL}$ and $\pi_{SL}$.
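Because $x^{**}$ appears on both sides of its defining equation, it has to be found numerically. The sketch below uses bisection on the payoff gap $\pi_{IL} - \pi_{SL}$, assuming the gap is negative at $x = 0$ and positive at $x = 1$ (true for the illustrative parameters chosen here). The function names and the values $\beta = 1$, $c = 0.5$, $\Delta = 0.2$, $\alpha = 0.5$ are assumptions, not from the lecture.

```python
def p_correct(x, delta):
    """Share of agents choosing the correct action at population state x."""
    return 1 - 0.5 * x * delta / (1 - x * (1 - delta))

def payoff_gap(x, beta, alpha, c, delta):
    """pi_IL - pi_SL at population state x, with coordination weight alpha."""
    s = (1 - x) * (1 - delta) / (1 - x * (1 - delta))
    p = p_correct(x, delta)
    pi_il = beta + alpha * p - c
    pi_sl = s * (beta + alpha * (p - 0.5)) + 0.5 * alpha
    return pi_il - pi_sl

def solve_x(beta, alpha, c, delta, tol=1e-12):
    """Bisection on [0, 1]; assumes the gap is negative at 0 and positive at 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if payoff_gap(mid, beta, alpha, c, delta) < 0:
            lo = mid      # social learners still do better: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta, c, delta, alpha = 1.0, 0.5, 0.2, 0.5       # illustrative values
x_no_coord = solve_x(beta, 0.0, c, delta)        # alpha = 0 recovers x*
x_coord = solve_x(beta, alpha, c, delta)         # alpha > 0 gives x**
rhs = (c - delta * beta - (p_correct(x_coord, delta) - 0.5) * delta * alpha) / ((1 - delta) * c)
print(x_no_coord, x_coord, rhs)
```

For these values the solution with coordination lies below the one without, and it satisfies the fixed-point formula for $x^{**}$ up to numerical tolerance.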
