ujava.org Reinforcement Learning (2nd)

www.idosi.com

Reinforcement Learning (2nd)

ujava.org Workshop

2016-08-12

www.idosi.com

CEO Shindong KANG

ujava.org

spaceapi.org

Reinforcement Learning for Brick Game

To Flip Pancake

Crawling Robot on Carpet

Pavlov's Dog

Pavlov

Reinforcement

Reinforcement Learning

Forecast

Forecast with probability

Unknown model & real facts

Deep Neural Network

Bayesian Probability

Variance

Random Variable

Types of Random Variable

Discrete Probability Distribution

Continuous Probability Distribution, Probability Density Function

Density

Expected value

EV = Σ x P(x)
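The expected value of a discrete random variable is the probability-weighted sum of its outcomes. A minimal sketch, assuming a fair six-sided die as the example (the die and the helper name `expected_value` are illustrative and not from the slides):

```python
from fractions import Fraction


def expected_value(outcomes, probabilities):
    """Return the sum of x * P(x) over all outcomes of a discrete variable."""
    if sum(probabilities) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(outcomes, probabilities))


# Assumed example: a fair die, each face with probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
die_ev = expected_value(faces, probs)
print(die_ev)  # 7/2
```

Exact fractions are used so the sum-to-one check is not affected by floating-point rounding.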

Expected Value for Continuous variable

Covariance

Probability

Conditional Probability

Bayes rule

Bayesian Probability

P(fair|H) = ?

P(A) = P(fair), P(B) = P(H), P(B|A) = P(H|fair)

P(fair|H) = P(H|fair) P(fair) / P(H) = 1/3
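The numeric inputs of this example did not survive in the transcript; only the result 1/3 did. One setup that reproduces that result, shown here purely as an assumption, is a coin drawn at random from two coins, one fair and one two-headed. A sketch of the Bayes rule calculation under that assumption:

```python
from fractions import Fraction


def posterior(prior, likelihood, evidence):
    """Bayes rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Assumed setup (not from the slides): one fair coin, one two-headed coin,
# each picked with probability 1/2.
p_fair = Fraction(1, 2)
p_h_given_fair = Fraction(1, 2)
p_h_given_two_headed = Fraction(1, 1)

# Law of total probability for the evidence P(H).
p_h = p_h_given_fair * p_fair + p_h_given_two_headed * (1 - p_fair)

p_fair_given_h = posterior(p_fair, p_h_given_fair, p_h)
print(p_h, p_fair_given_h)  # 3/4 1/3
```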

Brownian motion

Brownian motion, Gaussian distribution

Snapshot of state

Markov Chain

Process Probability

s1 → s2 → s3

Episode process:

s1, s2 = ?

s2, s3 = ?

s1, s3 = ?

Markov Process

Math Product Symbol

Markov Process

Stochastic Matrix

0.4 0.6
0.7 0.3
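In a first-order Markov process, the probability of a state sequence is the product of its one-step transition probabilities, and multi-step transition probabilities come from powers of the stochastic matrix. A minimal sketch using the 2x2 matrix above; the 0-based state indices and the function names are assumptions made for illustration:

```python
# Transition matrix: P[i][j] is the probability of moving from state i to j.
P = [[0.4, 0.6],
     [0.7, 0.3]]


def is_row_stochastic(matrix, tol=1e-9):
    """Check that every row is non-negative and sums to 1."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def mat_mul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def path_probability(matrix, states):
    """Product of one-step probabilities along a given state sequence."""
    prob = 1.0
    for current, nxt in zip(states, states[1:]):
        prob *= matrix[current][nxt]
    return prob


P2 = mat_mul(P, P)  # two-step transition probabilities
print(is_row_stochastic(P))
print(path_probability(P, [0, 1, 0]))  # probability of the path 0 -> 1 -> 0
print(P2[0][0])  # probability of being back in state 0 after two steps
```

The two-step entry sums over every intermediate state, while `path_probability` fixes one particular path.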

2 Snapshots of state

Direction using Second Order

Markov Process

3 Snapshots of state

Acceleration using 3rd order

Exploitation and Exploration

State-action exploration vs. Parameter exploration

Multi-armed bandit problem

Thompson sampling
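Thompson sampling balances exploration and exploitation by sampling a plausible reward rate for each arm from its posterior and playing the arm with the highest sample. A minimal sketch for Bernoulli rewards with a Beta(1, 1) prior; the arm win rates, the round count, and the function name are hypothetical values chosen for the simulation, not figures from the slides:

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on simulated arms.

    true_rates are hypothetical win probabilities, hidden from the learner.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    wins = [0] * n_arms    # observed rewards of 1 per arm
    losses = [0] * n_arms  # observed rewards of 0 per arm

    for _ in range(n_rounds):
        # Draw one plausible win rate per arm from its Beta posterior
        # (uniform Beta(1, 1) prior plus the counts seen so far).
        draws = [rng.betavariate(wins[a] + 1, losses[a] + 1)
                 for a in range(n_arms)]
        # Play the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda a: draws[a])
        # Simulate a Bernoulli reward from the hidden true rate.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1

    return wins, losses


wins, losses = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
pulls = [w + l for w, l in zip(wins, losses)]
print("pulls per arm:", pulls)
print("total reward:", sum(wins))
```

Arms with few pulls keep wide posteriors, so they still win the draw occasionally; that is where the exploration comes from.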

Simulated Bandit Performance

Multi-armed bandit problem

Multi-Armed Bandit Algorithms

MAB Reward

Function's Probability Distribution

Function's Probability Distribution ?

Function's Probability Distribution

y = ax^2 + b

Function's Probability Distribution with Gaussian Distribution

y = ax^2 + b

Function's Probability Distribution with Gaussian Distribution
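One reading of these slides, offered here as an assumption, is that the coefficients a and b are themselves Gaussian random variables, so each draw of (a, b) gives a different parabola and y at any fixed x is Gaussian too. A sketch of that idea, with made-up means and standard deviations, comparing sampled statistics at one x against the closed-form mean and variance for independent a and b:

```python
import random
import statistics


def sample_parabolas(n_functions, mu_a, sigma_a, mu_b, sigma_b, seed=0):
    """Draw (a, b) pairs; each pair defines one function y = a*x^2 + b."""
    rng = random.Random(seed)
    return [(rng.gauss(mu_a, sigma_a), rng.gauss(mu_b, sigma_b))
            for _ in range(n_functions)]


def evaluate(params, x):
    """Evaluate every sampled function at the same input x."""
    return [a * x ** 2 + b for a, b in params]


# Hypothetical prior over the coefficients (not from the slides).
MU_A, SIGMA_A, MU_B, SIGMA_B = 1.0, 0.5, 0.0, 1.0
params = sample_parabolas(20000, MU_A, SIGMA_A, MU_B, SIGMA_B)

x = 2.0
ys = evaluate(params, x)
empirical_mean = statistics.mean(ys)
empirical_var = statistics.pvariance(ys)

# y is linear in (a, b), so with independent Gaussian a and b:
analytic_mean = MU_A * x ** 2 + MU_B
analytic_var = SIGMA_A ** 2 * x ** 4 + SIGMA_B ** 2

print(empirical_mean, analytic_mean)
print(empirical_var, analytic_var)
```

The spread of y grows with x because the uncertainty in a is multiplied by x^2.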

Gaussian Process Regression

Gaussian Process

From C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, the MIT Press, 2006

Thompson sampling

Thank you!

Intelligent City Ltd.

Shindong KANG
