ML + Stocks phase 2

Stock tinkering log, phase 2 (+ reinforcement learning)

신호철

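Machine learning

- Supervised learning: learns from data whose correct answers are known.

- Unsupervised learning: learns from the given data without correct answers.

- Reinforcement learning: learns from the rewards that result from its actions.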

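Components an agent needs to define the problem

- State
  - The agent's current information.
  - It should give the agent enough information to judge the situation and decide on an action.

- Action
  - What the agent can do in a given state (buy, sell, …).

- Reward
  - The signal that lets the agent evaluate the actions it has taken.

- Environment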

Q Learning

- Q function: the state-action value function
  - Q(state, action) → reward (quality)

- Example
  - Q(state1, LEFT) = 0
  - Q(state1, RIGHT) = 0.5
  - Q(state1, UP) = 0.2
  - Q(state1, DOWN) = 0.1
  - In state1 the expected value is highest for RIGHT, so the agent should take the RIGHT action.
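A minimal sketch of how the action is read off the Q function, using the four values from the example. The dictionary layout and the name `greedy_action` are illustrative assumptions, not part of the original slides.

```python
# Q-values for state1, taken from the example.
q_state1 = {"LEFT": 0.0, "RIGHT": 0.5, "UP": 0.2, "DOWN": 0.1}

def greedy_action(q_values):
    """Return the action whose Q-value is largest."""
    return max(q_values, key=q_values.get)

best = greedy_action(q_state1)
print(best)
```

Here `best` is `"RIGHT"`, the action with the highest expected value.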

(dummy) Q-Learning algorithm

1. Initialize Q(s, a) = 0 for all s, a.
2. Observe the current state s.
3. Select an action a and execute it.
4. Receive the reward r.
5. Observe the new state s': move to s'.
6. Update Q(s, a):
   a. Q(s, a) ← r + max_a' Q(s', a')
7. s ← s'
8. Go back to step 3 and repeat.
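The steps above can be sketched on a toy problem. The five-cell corridor environment, the function names, and the random tie-breaking in `random_argmax` are assumptions made for illustration; only the update in step 6 comes from the slide.

```python
import random

N_STATES, GOAL = 5, 4          # corridor 0..4, reward only on reaching state 4
ACTIONS = (0, 1)               # 0 = LEFT, 1 = RIGHT

def step(s, a):
    """Deterministic toy environment: returns (next_state, reward, done)."""
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def random_argmax(values):
    """argmax that breaks ties at random, so an all-zero table still moves around."""
    best = max(values)
    return random.choice([i for i, v in enumerate(values) if v == best])

def train(episodes=200, seed=0):
    random.seed(seed)
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]   # step 1
    for _ in range(episodes):
        s, done = 0, False                                   # step 2
        while not done:
            a = random_argmax(Q[s])                          # step 3
            s2, r, done = step(s, a)                         # steps 4-5
            Q[s][a] = r + max(Q[s2])                         # step 6
            s = s2                                           # step 7
    return Q

Q = train()
print(Q)
```

Without a discount factor every value that leads to the goal ends up equal to 1, so the table cannot tell a short path from a long one. That limitation is what the discounted reward addresses.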

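Exploit vs Exploration: decaying E-greedy

import random

# Q, s and actions come from the surrounding Q-learning loop
for i in range(1000):
    e = 0.1 / (i + 1)                      # exploration rate decays with i
    if random.random() < e:
        action = random.choice(actions)    # explore: random action
    else:
        action = max(actions, key=lambda a: Q[s][a])   # exploit: argmax over a of Q(s, a)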
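A self-contained version of the same idea, as a sketch. The helper names `epsilon` and `select_action` and the list-based Q row are assumptions for illustration; the decay schedule e = 0.1 / (i + 1) is the one from the slide.

```python
import random

def epsilon(i, e0=0.1):
    """Decayed exploration rate for iteration i: e0 / (i + 1)."""
    return e0 / (i + 1)

def select_action(q_row, e, rng=random):
    """With probability e pick a random action index, otherwise the greedy one."""
    if rng.random() < e:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

q_row = [0.0, 0.5, 0.2, 0.1]               # Q-values of one state, one per action
print([epsilon(i) for i in (0, 1, 9)])
print(select_action(q_row, e=0.0))         # e = 0 means pure exploitation
```

Early iterations explore more often; as i grows the agent almost always exploits what it has learned.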

Discounted reward

- …

where 0 < gamma < 1
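To illustrate what gamma does, the sketch below uses the textbook discounted Q-learning target, r + gamma * max Q(s', a'). It is assumed here that this is the form the slide refers to; the function names are hypothetical.

```python
def discounted_target(r, next_q_values, gamma=0.9):
    """Textbook discounted target: r + gamma * max over a' of Q(s', a')."""
    return r + gamma * max(next_q_values)

def discounted_return(rewards, gamma=0.9):
    """Sum of rewards in which the reward k steps ahead is weighted by gamma**k."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

print(discounted_target(0.0, [0.0, 1.0]))
print(discounted_return([0.0, 0.0, 1.0], gamma=0.5))
```

Because gamma is below 1, a reward that arrives later counts for less, so shorter paths to the same reward get higher Q-values.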

Deterministic vs Stochastic

- Deterministic
  - The output of the model is fully determined by the parameter values and the initial conditions.
  - An environment in which a given input always produces the same output.
  - Think of the Frozen Lake game with is_slippery set to False.

- Stochastic
  - The same set of parameter values and initial conditions will lead to an ensemble of different outputs.
  - Similar to Frozen Lake with is_slippery set to True.
  - An environment in which the same input does not always produce the same result.
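The difference can be seen in a toy step function. This is not Frozen Lake itself: the 1-D line, the `move` function, and the one-in-three slip rule are hypothetical stand-ins chosen to keep the sketch short.

```python
import random

def move(position, action, is_slippery, rng):
    """One step on a 1-D line; action is -1 (left) or +1 (right).

    Toy rule (an assumption): when slippery, the agent slips and moves
    the opposite way one third of the time.
    """
    if is_slippery and rng.random() < 1 / 3:
        action = -action
    return position + action

rng = random.Random(0)
deterministic = {move(0, +1, False, rng) for _ in range(100)}
stochastic = {move(0, +1, True, rng) for _ in range(100)}
print(deterministic, stochastic)
```

The same input (position 0, action +1) always lands on 1 in the deterministic case, while the slippery case produces both -1 and 1.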

Learning incrementally

- …

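Unlike the approach above, which accepts the newly computed Q value as-is, a learning rate is introduced.

- …

- The same concept as the learning rate in TensorFlow.
  - Instead of accepting the new Q value in full, only the fraction given by the learning rate is accepted.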
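A sketch of the incremental update, using the textbook form Q(s,a) ← (1 - alpha) Q(s,a) + alpha (r + gamma max Q(s',a')). It is assumed that this is the update the slide refers to; `q_update` and its default values are illustrative.

```python
def q_update(q_sa, r, next_q_values, alpha=0.1, gamma=0.9):
    """Blend the old estimate with the new target, weighted by the learning rate alpha."""
    target = r + gamma * max(next_q_values)
    return (1 - alpha) * q_sa + alpha * target

print(q_update(0.0, 1.0, [0.0, 0.0], alpha=1.0))   # alpha = 1: accept the target in full
print(q_update(0.5, 1.0, [0.0, 0.0], alpha=0.0))   # alpha = 0: keep the old value
```

With a small alpha a single noisy outcome only nudges the estimate, which matters in stochastic environments.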

Q Network

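- What about cases with too many, or too complex, situations for the Q-table approach to cover?

- Use a neural network instead of the Q-table.
  - Build the network so that it takes the state as input and outputs a value for each possible action.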
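A minimal sketch of that input/output shape: a single linear layer that maps a state vector to one Q-value per action. The class name, the weight initialization, and the sizes (8 state features, 3 actions, matching the stock setup described later) are assumptions for illustration, not the network used in the repository.

```python
import random

class LinearQNetwork:
    """Single linear layer: state vector in, one Q-value per action out."""

    def __init__(self, state_size, n_actions, seed=0):
        rng = random.Random(seed)
        # weights[a][i] connects state feature i to action a
        self.weights = [[rng.uniform(-0.01, 0.01) for _ in range(state_size)]
                        for _ in range(n_actions)]

    def predict(self, state):
        """Return a list with one estimated Q-value per action."""
        return [sum(w * x for w, x in zip(row, state)) for row in self.weights]

net = LinearQNetwork(state_size=8, n_actions=3)
q_values = net.predict([0.0] * 8)
print(q_values)
```

The agent then picks the index of the largest entry of `q_values`, just as it would read a row of the Q-table.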

Training the Q Network

- Train so that the difference between the predicted Q and the optimal Q is minimized.
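As a sketch of "minimize the difference", the code below runs plain gradient descent on the squared error between a linear prediction and a fixed target. The single-output linear model, the learning rate, and the target value standing in for the optimal Q are all assumptions made for illustration.

```python
def predict(weights, state):
    """Linear estimate of Q for one action."""
    return sum(w * x for w, x in zip(weights, state))

def sgd_step(weights, state, target_q, lr=0.1):
    """One gradient-descent step on the squared error (prediction - target)^2."""
    error = predict(weights, state) - target_q
    # d/dw_i of error^2 is 2 * error * x_i
    return [w - lr * 2 * error * x for w, x in zip(weights, state)]

weights = [0.0, 0.0]
state = [1.0, 0.5]
target_q = 1.0          # stand-in for the optimal Q of the chosen action
loss_before = (predict(weights, state) - target_q) ** 2
for _ in range(50):
    weights = sgd_step(weights, state, target_q)
loss_after = (predict(weights, state) - target_q) ** 2
print(loss_before, loss_after)
```

In Q-learning the target is not known in advance; it is itself built from the network's own estimates, which is one source of the instability discussed under DQN.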

Q-Learning Algorithm

DQN

- However, the Q-learning described so far does not work well on its own.
  - Correlations between samples
  - Non-stationary targets

- DeepMind solved these problems.
  - DQN paper: https://www.nature.com/nature/journal/v518/n7540/full/nature14236.html
  - How?

- Go deep.
- Experience replay: store experiences in a buffer, then sample from it at random to build each minibatch.
- Separate target network, updated by copying the main network.
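The last two ideas can be sketched in a few lines. The class and function names, the buffer capacity, and the list-of-lists weight format are assumptions for illustration; this is not the code from the DQN paper or the repository.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=50000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences drop out first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

def copy_weights(main_weights):
    """Target-network update: take a snapshot of the main network's weights."""
    return [row[:] for row in main_weights]

buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.add(i, 0, 0.0, i + 1, False)
batch = buf.sample(2)
print(batch)
```

Training targets are computed from the copied weights, which stay fixed between copies, so the target no longer shifts on every gradient step.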

Applying it to stock data

- Preprocessing with pandas, sqlite3, and sklearn
  - https://github.com/dspshin/DQN_examples/blob/master/stock.ipynb

- Source code
  - https://github.com/dspshin/DQN_examples

A brief walkthrough of the source code

- Based on the cartpole example.
  - https://github.com/dspshin/DQN_examples/blob/master/cart_pole_dqn2015.py

- gym is redefined to perform stock-related operations.
  - gym.reset() and gym.step() are redefined.

- To keep the logic simple, the trading unit is fixed at one share.
  - This needs to be made more realistic later.

- There are three actions: sell, buy, and do nothing.
- Components of the state:

- ['ratio', 'amount', 'ends', 'foreigner', 'insti', 'person', 'program', 'credit']

- https://github.com/dspshin/DQN_examples/blob/master/my_gym/gym.py

- Changes to the DQN network
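To make the reset()/step() idea concrete, here is a hypothetical, stripped-down environment. It is not the code in my_gym/gym.py: the class name, the two-element state, and the reward definition (change in value of the held share) are assumptions. Only the three actions and the one-share trading unit come from the slide.

```python
class ToyStockEnv:
    """Hypothetical gym-like stock environment with a one-share trading unit."""

    SELL, BUY, HOLD = 0, 1, 2

    def __init__(self, prices):
        self.prices = prices          # one closing price per day

    def reset(self):
        self.t = 0                    # index of the current day
        self.shares = 0               # 0 or 1, since the unit is fixed at one share
        self.cash = 0.0               # realized cash flow so far
        return self._state()

    def _state(self):
        return (self.prices[self.t], self.shares)

    def step(self, action):
        price = self.prices[self.t]
        if action == self.BUY and self.shares == 0:
            self.shares, self.cash = 1, self.cash - price
        elif action == self.SELL and self.shares == 1:
            self.shares, self.cash = 0, self.cash + price
        self.t += 1
        done = self.t == len(self.prices) - 1
        # Assumed reward: price change of the share held over this step.
        reward = self.shares * (self.prices[self.t] - price)
        return self._state(), reward, done

env = ToyStockEnv([10, 12, 11, 15])   # made-up prices
state = env.reset()
state, reward, done = env.step(ToyStockEnv.BUY)
print(state, reward, done)
```

A real state would carry the eight features listed above rather than just the price and the position.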

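Results

- Results for Samsung Electronics.
  - default profit: profit from buying at the start of the period and selling at the end
  - train based profit: profit from buying and selling as directed by the DQN on the training data
  - test based profit: profit from buying and selling as directed by the DQN on the test data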
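The three figures can be computed along these lines. This is a sketch with made-up prices, not the actual results. The function names, the string action labels, and the choice to close any open position at the last price are assumptions.

```python
def default_profit(prices):
    """Buy one share on the first day and sell it on the last day."""
    return prices[-1] - prices[0]

def policy_profit(prices, actions):
    """Profit from following 'buy' / 'sell' / 'hold' actions, one share at a time."""
    cash, holding = 0.0, False
    for price, action in zip(prices, actions):
        if action == "buy" and not holding:
            cash, holding = cash - price, True
        elif action == "sell" and holding:
            cash, holding = cash + price, False
    if holding:                       # assumption: close any open position at the end
        cash += prices[-1]
    return cash

prices = [10, 12, 11, 15]             # made-up prices
print(default_profit(prices))
print(policy_profit(prices, ["buy", "sell", "buy", "sell"]))
```

The train based and test based profits would use the same `policy_profit` calculation, with the actions chosen by the trained DQN on the respective data split.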

To do list

- Test on other stocks

- Add more input features

- Try different network architectures

The End

References

- ML + Stocks, part 1: https://docs.google.com/presentation/d/1GqpoxDd-AiBJ_FkhqUC8J-7iOijDZqJv7DVucRNFnlk

- http://hunkim.github.io/ml/
