Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science...

36
KeumGang Cha https://gitbub.com/chagmgang Idea Suggestions 6/28/2018 1 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 강화학습

Transcript of Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science...

Page 1: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

KeumGang Cha

https://gitbub.com/chagmgang

Idea Suggestions

6/28/2018 1Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering

강화학습

Page 2: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

소개

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering2

• 부산대학교기계공학부학사

• 부산대학교기계공학부제어자동화시스템공학과석사

• 한국생산기술연구원연구원

• 한국과학기술원박사과정 (휴학)

• ㈜플랜아이미래연구소연구원

• 강화학습을취미로하는사람

Page 3: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

KeumGang Cha

Idea Suggestions

6/28/2018 3Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering

딥러닝

Page 4: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

딥러닝

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering4

Page 5: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

딥러닝

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering5

지도학습• 무엇인지알고싶을때

• 어디있는지알고싶을때등등..

• 사람으로따지면시각, 촉각, 후각, 미각

비지도학습• 생성하고싶을때

• 답이없는것을나누고싶을때

• 상상하고싶을때

• 비슷한것을만들어내고싶을때

• 사람으로따지면상상, 추론..

Page 6: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

딥러닝

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering6

강화학습• 어떻게행동해야되는지알고싶을때

• 이행동으로인해다음상태가어떻게될지알고싶을때

• 원하는업무를수행하도록하고싶을때

강화학습

Page 7: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

강화학습의차이점

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering7

지도학습 비지도학습

강화학습

• 환경과의상호작용

• 답을스스로도출

• 입력값에대한보상만주어짐

Page 8: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

지도학습의현주소

이미지분류및위치검출

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering8

• 분류 : LeNet, AlexNet(Inception), VGG, GoogLeNet, ResNet…

• 분할 : RPN(Region Proposal Network)…

• 분류 + 분할 : RCNN, Fast-RCNN, Faster-RCNN

GoogLeNet ArchitectureFaster-RCNN

Page 9: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

비지도학습의현주소

데이터의재생성및스타일분류

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering9

Image Style Transfer

• starGAN, waveGAN, cycleGAN, waveGAN ….

• AutoEncoder, Denoising Network, Unet…

Image Segmentation

waveGAN

Medical Analysis

Page 10: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

강화학습의현주소

원하는대상의최적화

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering10

• Q-Network(Double, Dueling, DD)

• Prioritized Experience Replay

• Policy Gradient, ….

Page 11: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

KeumGang Cha

Idea Suggestions

6/28/2018 11Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering

왜강화학습을해야하는가

Page 12: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

희소성

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering12

Page 13: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

인공지능의단계

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering13

• 주변의상황을인지• 차선, 신호등, 보행자, 속도, 가속도…

• 상황에따른판단• 휠, 제동, 가속…

인지 & 판단

강화학습

지도학습

Page 14: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

KeumGang Cha

Idea Suggestions

6/28/2018 14Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering

강화학습이란?

Page 15: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

강화학습은?

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering15

출처 : 파이썬과케라스로배우는강화학습(저자:이웅원)

Page 16: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

환경을정의해보자

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering16

Markov Decision Process

Markov Decision Process is Markov reward process with decisions. It is environment in

which all states are Markov.

• A Markov Decision Process is a set of < 𝑆, 𝑃, 𝐴, 𝑅, 𝛾 >

• 𝑃 is a probability set of

• 𝑆 is a finite set of states

• 𝐴 is a finite set of actions

• 𝑅 is reward function, 𝑅𝑠𝑎 = 𝐸[𝑅𝑡+1|𝑆𝑡 = 𝑠, 𝐴𝑡 = 𝑎]

• 𝛾 is a discount factor 𝛾 ∈ 0,1

Bellman Equation

• 𝑄 𝑠𝑡 , 𝑎𝑡 = 𝑅𝑠𝑎 + 𝛾max𝑄 𝑠𝑡+1, 𝑎𝑡+1

Page 17: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

알고리즘

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering17

가치최적화• 𝑎𝑐𝑡𝑖𝑜𝑛 = 𝑎𝑟𝑔𝑚𝑎𝑥

𝑎𝑄 𝑠𝑡 , 𝑎𝑡

• 현재상태에서가장가치가높은행동을선택

정책최적화• 𝑚𝑎𝑥𝑖𝑚𝑖𝑧𝑒 𝐽 𝜃 = Ε log 𝜋𝜃 𝑎 𝑠 𝑄 𝑠𝑡 , 𝑎𝑡

• 𝜋𝜃 𝑎 𝑠 : 현재상태에서어떠한행동을선택할확률

• 𝐽 𝜃 : 기대보상값

𝜃1

𝜃2

𝑒2

𝑒1

𝑦3

𝑦1

𝑦2

𝑝2

𝑝1

𝑑𝑠

Page 18: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

길찾기에적용해보자

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering18

가치최적화(다이나믹프로그래밍)• 𝛾 is a discount factor 𝛾 ∈ 0,1

• 𝑄 𝑠𝑡 , 𝑎𝑡 = 𝑅𝑠𝑎 + 𝛾𝑄 𝑠𝑡+1, 𝑎𝑡+1

1 2 3 …

Up 0 0 0

Down 0 0 0

Left 0 0 0

Right 0 0 0

행렬로풀수있는문제

딥러닝과무슨관계가있는것인가

없습니다.

Page 19: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

DP

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering19

강화학습으로다른문제를풀어보자

• 현실세계에서는모든 state들이정수, 자연수로표현되지않는다.

• 정수, 자연수로표현되더라도거대한차원의행렬을필요로한다.

• 예) CartPole => [a,b,c,d], Breakout => [210, 180, 3]

• 행동이 Discrete하게정의되지않을수있다.

• 예) 자율주행자동차 => 휠의각도, 제동의세기, 가속의세기

문제점

Page 20: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

해결책

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering20

뉴럴넷으로해결

?

Page 21: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

해결책

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering21

Page 22: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

KeumGang Cha

Idea Suggestions

6/28/2018 22Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering

강화학습연구분야

Page 23: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

연구분야

6/28/201823

Page 24: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

알고리즘

6/28/201824

Agent

Value Optimization Policy Optimization

Deep Q Network Policy Gradient

Double

Deep Q NetworkDeuling

Deep Q Network Actor-Critic

True Region

Policy Optimization

Proximal

Policy Optimization

Deuling

Deep Q Network

DD

Deep Q Network

Distributed

Deep Q Network

Multi Environment

Asynchronous

Synchronous

Page 25: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

성능

6/28/201825

Value Optimization

• 성능비교는 Value Optimization끼리만비교가가능

• 네트워크구성자체가다르기때문

Policy-Value Optimization

• OpenAI, ML-Unity 등많은강화학습관련회사에서

Proximal Policy Optimization(PPO)를 Baseline으로사용

Page 26: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

보상

6/28/201826

GAE(Generalized Advantaged Estimation)

HER(Hindsight Experience Replay)

Page 27: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

성능

6/28/201827

GAE

HER

Page 28: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

KeumGang Cha

Idea Suggestions

6/28/2018 28Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering

환경

Page 29: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

환경이란

6/28/201829

환경의조건• Agent와의상호작용이가능

• Agent의 Action에따른환경의변화

• Agent의 Action에따른보상

• Agent의목표미달성에따른손실제거

최적의환경(시뮬레이션)

• 게임(인간지능이가장많이필요한분야)

• 스타크래프트, 바둑, 체스, Atari, 카드게임

Page 30: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

무료로제공되는환경

6/28/201830

무료로제공되는환경

Page 31: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

KeumGang Cha

Idea Suggestions

6/28/2018 31Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering

어떤환경으로강화학습을할까

Page 32: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

여러가지환경

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering32

로보틱스

시뮬레이션

Page 33: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

무엇을선택해야할까

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering33

Page 34: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

왜스타크래프트2인가?

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering34

Page 35: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

왜스타크래프트2인가?

6/28/2018Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering35

Page 36: Your presentation title here · 2020-01-13 · 소개 6/28/2018 Korea Advanced Institute of Science Technology (KAIST) Dept. of Civil and Environmental Engineering 2 • 부산대학교기계공학부학사

THANK YOU

Any questions or comments?

E-mail : [email protected], [email protected]

Github : http://github.com/chagmgang

6/28/2018 36Korea Advanced Institute of Science Technology (KAIST)

Dept. of Civil and Environmental Engineering