Page 1

Tobias Jung, University of Mainz, Germany

Daniel Polani, University of Hertfordshire, U.K.

Least Squares SVM for Least Squares TD Learning — ECAI 2006 – p.1/19

Page 2

… approximate dynamic programming, approximate policy iteration

Methods:
- Least-squares policy evaluation (LS-TD)
- Kernel-based learning (LS-SVM)
- Subset of regressors approximation

Experiments: toy examples & a bigger "real-world" problem

Motivation: combine the best of two worlds:
- Least-squares TD learning (much faster convergence than plain TD(λ))
- Least-squares SVM (superior to fixed parametrized function approximators)

Page 3

A Markov decision process basically consists of:
- States: $S = \{s_1, \dots, s_N\}$
- Actions: $A = \{a_1, \dots, a_M\}$
- Rewards: $R(s'|s, a)$
- Transition probabilities: $P(s'|s, a)$ (Markov)

[Figure: control cycle t = 0, 1, 2, ... between the decision-maker (agent, robot, ...) and the environment. The decision-maker in state $s_t$ emits action $a_t$; the environment returns $s_{t+1}$ and $r_{t+1}$.]

Objective: choose actions to optimize performance
⟹ choose actions to optimize long-term reward

Page 4

Criterion: infinite-horizon discounted sum of rewards

Policy: $\pi : S \to A$ (deterministic, stationary)

Value function ($0 < \gamma < 1$ discount factor):
\[ V^\pi(s) = E_\pi\Big\{ \sum_{t \ge 0} \gamma^t R(s_{t+1}|s_t, a_t) \,\Big|\, s_0 = s,\ a_t = \pi(s_t) \Big\} \quad \forall s \]

Bellman equation: $V^\pi$ obeys the fixed-point relation $V^\pi = T_\pi V^\pi$, where
\[ (T_\pi V)(s) = \sum_{s'} P(s'|s, \pi(s)) \big[ R(s'|s, \pi(s)) + \gamma V(s') \big] \]

Goal: find an optimal policy $\pi^* = \arg\max_\pi V^\pi$, i.e. a policy $\pi^*$ satisfying $V^{\pi^*}(s) \ge V^\pi(s)$, $\forall s, \forall \pi$
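For a finite MDP the fixed-point relation $V^\pi = T_\pi V^\pi$ is a linear system, so $V^\pi$ can be computed directly. The following is a minimal sketch under that assumption; the function name `evaluate_policy`, the array layout `P[a, s, s2]` and the two-state toy MDP are illustrative choices, not part of the slides.

```python
import numpy as np

def evaluate_policy(P, R, policy, gamma):
    """Exact policy evaluation for a finite MDP.

    P[a, s, s2] = P(s2 | s, a), R[a, s, s2] = R(s2 | s, a),
    policy[s] = action taken in state s (deterministic, stationary).
    """
    n_states = P.shape[1]
    states = np.arange(n_states)
    P_pi = P[policy, states]                         # row s is P(. | s, pi(s))
    r_pi = np.sum(P_pi * R[policy, states], axis=1)  # expected one-step reward
    # V = r_pi + gamma * P_pi @ V  <=>  (I - gamma * P_pi) V = r_pi
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

# Hypothetical toy MDP: action a moves to state a; landing in state 1 pays 1.
P = np.zeros((2, 2, 2))
P[0, :, 0] = 1.0
P[1, :, 1] = 1.0
R = np.zeros((2, 2, 2))
R[:, :, 1] = 1.0

V_always_1 = evaluate_policy(P, R, policy=np.array([1, 1]), gamma=0.5)
V_always_0 = evaluate_policy(P, R, policy=np.array([0, 0]), gamma=0.5)
print(V_always_1, V_always_0)
```

The policy that always moves to state 1 collects reward 1 on every step, so its value is the geometric sum $1/(1-\gamma)$ in both states.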

Page 5

Many ways to obtain $\pi^*$, e.g. methods based on policy iteration

Policy iteration: choose initial policy $\pi_0$.
Iterate for k = 0, 1, ...
- Policy evaluation: compute $V^{\pi_k}$
- Policy improvement: derive greedy policy $\pi_{k+1}$ from $V^{\pi_k}$:
\[ \pi_{k+1}(s) = \arg\max_a \Big\{ \sum_{s'} P(s'|s, a) \big[ R(s'|s, a) + \gamma V^{\pi_k}(s') \big] \Big\}, \quad \forall s \]

Problems:
- Model $P(s'|s, a)$, $R(s'|s, a)$ must be known → simulation
- Number of states large or states $\in \mathbb{R}^d$ → function approximation
⟹ approximate policy iteration, approximate policy evaluation (with least squares)
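The policy iteration loop above can be sketched for a small MDP with a known model. This is an illustrative implementation only: the function name `policy_iteration`, the array layout and the toy MDP are assumptions, and the evaluation step solves the linear system exactly rather than approximately.

```python
import numpy as np

def policy_iteration(P, R, gamma, max_iter=100):
    """Exact policy iteration; P[a, s, s2] and R[a, s, s2] as model arrays."""
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)   # arbitrary initial policy pi_0
    for _ in range(max_iter):
        # Policy evaluation: solve (I - gamma * P_pi) V = r_pi
        P_pi = P[policy, states]
        r_pi = np.sum(P_pi * R[policy, states], axis=1)
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: Q[a, s] = sum_s2 P(s2|s,a) [R(s2|s,a) + gamma V(s2)]
        Q = np.sum(P * (R + gamma * V[None, None, :]), axis=2)
        new_policy = np.argmax(Q, axis=0)
        if np.array_equal(new_policy, policy):
            break                            # greedy policy is stable: done
        policy = new_policy
    return policy, V

# Hypothetical toy MDP: action a moves to state a; landing in state 1 pays 1.
P = np.zeros((2, 2, 2))
P[0, :, 0] = 1.0
P[1, :, 1] = 1.0
R = np.zeros((2, 2, 2))
R[:, :, 1] = 1.0

policy, V = policy_iteration(P, R, gamma=0.5)
print(policy, V)
```

Both steps need the full model $P$, $R$ and a table over all states, which is exactly the limitation listed under "Problems".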

Page 6

Assume value function is linearly parametrized:
\[ V(s) = [\phi_m(s)]^T w \]
where
- $\phi_m(s) = [\phi_1(s), \dots, \phi_m(s)]^T$ is the $m \times 1$ feature vector
- $w$ is the $m \times 1$ weight vector
- $\phi_i(s) : S \to \mathbb{R}$ are basis functions

Approximate policy evaluation: choose initial policy $\pi_0$.
Iterate for k = 0, 1, 2, ...
- Observe a long trajectory under fixed $\pi_k$ (e.g. …):
  [Diagram: $s_0 \xrightarrow{a_0} s_1 \xrightarrow{a_1} s_2 \dots s_{t-1} \xrightarrow{a_{t-1}} s_t$, with rewards $r_1, r_2, \dots, r_t$]
  where $s_i \sim P(s_i|s_{i-1}, a_{i-1})$, $r_i = R(s_i|s_{i-1}, a_{i-1})$, $a_i = \pi(s_i)$
- Estimate $V^{\pi_k}$ using the trajectory (approximate policy evaluation)
- Derive $\pi_{k+1}$ as greedy policy from $\pi_k$

Page 7

Determine $w$ in $V^{\pi_k}(s) = [\phi_m(s)]^T w$ from trajectory $s_0, s_1, \dots, s_t$ and rewards $r_1, \dots, r_t$.

Bellman residual minimization approach: determine weights $w$ by
\[ w = \arg\min_w \sum_{i=0}^{t-1} \Big( [\phi_m(s_i)]^T w - \sum_{s'} P(s'|s_i, \pi_k(s_i)) \big[ R(s'|s_i, \pi_k(s_i)) + \gamma [\phi_m(s')]^T w \big] \Big)^2 \]

For deterministic transitions we can use the observed trajectory:
\[ w = \arg\min_w \sum_{i=0}^{t-1} \Big( [\phi_m(s_i)]^T w - \big[ r_{i+1} + \gamma [\phi_m(s_{i+1})]^T w \big] \Big)^2 \]

Fixed-point approximation approach (LSTD): determine weights $w$ by solving
\[ w = \arg\min_u \sum_{i=0}^{t-1} \Big( [\phi_m(s_i)]^T u - \big[ r_{i+1} + \gamma [\phi_m(s_{i+1})]^T w \big] \Big)^2 \]
(here $w$ appears on both sides: the minimization runs over $u$ while the target on the right is held fixed).
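Both approaches reduce to small linear-algebra problems once the trajectory is stacked into feature matrices. The sketch below assumes deterministic transitions and one-hot (tabular) basis functions; the function names, the three-state cycle and the reward convention are illustrative assumptions. Setting the gradient of the fixed-point objective to zero gives the standard LSTD system $\Phi^T(\Phi - \gamma\Phi')w = \Phi^T r$.

```python
import numpy as np

def lstd_weights(Phi, Phi_next, rewards, gamma):
    """Fixed-point (LSTD) solution of Phi^T (Phi - gamma * Phi_next) w = Phi^T r."""
    A = Phi.T @ (Phi - gamma * Phi_next)
    b = Phi.T @ rewards
    return np.linalg.solve(A, b)

def bellman_residual_weights(Phi, Phi_next, rewards, gamma):
    """Bellman residual minimization: least squares on (Phi - gamma * Phi_next) w = r."""
    H = Phi - gamma * Phi_next
    w, *_ = np.linalg.lstsq(H, rewards, rcond=None)
    return w

# Hypothetical deterministic cycle 0 -> 1 -> 2 -> 0; entering state 0 pays 1.
gamma = 0.9
trajectory = [0, 1, 2, 0, 1, 2, 0]
features = np.eye(3)                      # one-hot basis functions
Phi = features[trajectory[:-1]]           # rows phi(s_i),     i = 0..t-1
Phi_next = features[trajectory[1:]]       # rows phi(s_{i+1}), i = 0..t-1
rewards = np.array([1.0 if s == 0 else 0.0 for s in trajectory[1:]])

w_lstd = lstd_weights(Phi, Phi_next, rewards, gamma)
w_brm = bellman_residual_weights(Phi, Phi_next, rewards, gamma)
print(w_lstd, w_brm)
```

With deterministic transitions and a basis that can represent the true value function, the two solutions coincide; they generally differ for stochastic transitions or a restricted basis.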

Page 8

Approximate policy evaluation

What we are looking for:
[Figure: the set of approximate value functions, with $V^\pi_{true}$ and $V^\pi_{optimal}$]

Bellman residual minimization:
[Figure: the set of approximate value functions, with $V^\pi$ and $T_\pi V^\pi$]

Fixed-point approximation:
[Figure: the set of approximate value functions, with $V^\pi$ and $T_\pi V^\pi$]

Bellman residual minimization approach:
\[ \| V - T_\pi V \|^2 \to \min_V \]

Fixed-point approximation approach (LSTD):
\[ V = \arg\min_U \| U - T_\pi V \|^2 \]
where $U$ ranges over the approximate value functions.

Page 9

How LS-SVM (regularization networks, kernel ridge regression, Gaussian process regression) are applied to function approximation.

The short story (using subset of regressors approximation):
- Given: $t$ training data $\{x_i, y_i\}_{i=1}^t$
- Choose: kernel function $k$ that generates RKHS $H_k$, the function space of possible solutions (e.g. polynomials, Gaussians, etc.)
- Select: a subset $\{x_i\}_{i=1}^m$ of the training data, where $m \ll t$
- Represent: the solution by $f(\cdot) = \sum_{i=1}^m k(x_i, \cdot) w_i$
- Solve: the quadratic m-by-m problem to obtain the weights $w$
\[ \min_{w \in \mathbb{R}^m} \| K_{tm} w - y \|^2 + \lambda w^T K_{mm} w \]
where
- $[K_{tm}]_{ij} = k(x_i, x_j)$ is a $t \times m$ matrix
- $[K_{mm}]_{ij} = k(x_i, x_j)$ is an $m \times m$ matrix
- $\lambda$ is the regularization parameter
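Setting the gradient of the quadratic problem to zero gives $(K_{tm}^T K_{tm} + \lambda K_{mm}) w = K_{tm}^T y$. The sketch below solves that system for a one-dimensional regression task. The Gaussian kernel, its width, the value of $\lambda$, the evenly spaced subset and the sine target are all illustrative assumptions rather than settings from the talk.

```python
import numpy as np

def gaussian_kernel(x, z, sigma):
    """Kernel matrix k(x_i, z_j) for 1-D inputs x and z."""
    sq_dist = (x[:, None] - z[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def fit_subset_of_regressors(x, y, x_sub, sigma, lam):
    """Minimize ||K_tm w - y||^2 + lam * w^T K_mm w over w."""
    K_tm = gaussian_kernel(x, x_sub, sigma)        # t x m
    K_mm = gaussian_kernel(x_sub, x_sub, sigma)    # m x m
    return np.linalg.solve(K_tm.T @ K_tm + lam * K_mm, K_tm.T @ y)

def predict(x_new, x_sub, w, sigma):
    """Evaluate f(.) = sum_i k(x_i, .) w_i at the points x_new."""
    return gaussian_kernel(x_new, x_sub, sigma) @ w

# t = 181 training points, m = 10 of them kept as the subset.
x = np.linspace(0.0, 1.0, 181)
y = np.sin(2.0 * np.pi * x)
x_sub = x[::20]
w = fit_subset_of_regressors(x, y, x_sub, sigma=0.2, lam=1e-4)
max_err = np.max(np.abs(predict(x, x_sub, w, sigma=0.2) - y))
print(len(x_sub), max_err)
```

Only an $m \times m$ system is solved, so the cost of the solve depends on the subset size rather than on the number of training points.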

Page 10

Application to Bellman residuals

- Training data: observed trajectory $s_0, s_1, s_2, \dots, s_t$ along with rewards $r_1, r_2, \dots, r_t$
- Represent value function by $V^\pi(\cdot) = \sum_{i=1}^m k(s_i, \cdot) w_i$, where $\{s_i\}_{i=1}^m$ is a subset
- Solve the quadratic m-by-m problem
\[ \min_{w \in \mathbb{R}^m} \| H_{tm} w - r \|^2 + \lambda w^T K_{mm} w \]
where
\[ k_m(\cdot) = \begin{bmatrix} k(s_1, \cdot) \\ \vdots \\ k(s_m, \cdot) \end{bmatrix}, \quad H_{tm} = \begin{bmatrix} [k_m(s_0) - \gamma k_m(s_1)]^T \\ \vdots \\ [k_m(s_{t-1}) - \gamma k_m(s_t)]^T \end{bmatrix}, \quad r = \begin{bmatrix} r_1 \\ \vdots \\ r_t \end{bmatrix} \]
- Obtaining generalized normal equations:
\[ w = \big( H_{tm}^T H_{tm} + \lambda K_{mm} \big)^{-1} H_{tm}^T r \]
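The generalized normal equations can be checked on a tiny deterministic problem. In this sketch the states are scalars on a three-state cycle, the kernel is Gaussian, and the subset contains every distinct state, so the true value function is representable and a very small $\lambda$ should recover it. All names and parameter values here are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, z, sigma):
    """Kernel matrix k(x_i, z_j) for 1-D inputs x and z."""
    sq_dist = (x[:, None] - z[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_bellman_residual_fit(states, rewards, subset, gamma, sigma, lam):
    """Solve w = (H^T H + lam * K_mm)^{-1} H^T r with rows k_m(s_i) - gamma * k_m(s_{i+1})."""
    K = gaussian_kernel(states, subset, sigma)   # row i is k_m(s_i)^T, i = 0..t
    H_tm = K[:-1] - gamma * K[1:]                # t x m
    K_mm = gaussian_kernel(subset, subset, sigma)
    return np.linalg.solve(H_tm.T @ H_tm + lam * K_mm, H_tm.T @ rewards)

# Hypothetical deterministic cycle 0 -> 1 -> 2 -> 0; entering state 0 pays 1.
gamma, sigma = 0.9, 0.5
states = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0])
rewards = (states[1:] == 0.0).astype(float)      # r_1, ..., r_t
subset = np.array([0.0, 1.0, 2.0])

w = kernel_bellman_residual_fit(states, rewards, subset, gamma, sigma, lam=1e-8)
V_hat = gaussian_kernel(subset, subset, sigma) @ w   # value estimates at states 0, 1, 2
print(V_hat)
```

The estimates can be compared with the closed form for this cycle: $V(2) = 1/(1-\gamma^3)$, $V(1) = \gamma V(2)$, $V(0) = \gamma^2 V(2)$.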

Page 11

Online selection: assume data becomes available sequentially at times t = 1, 2, ...
- Start with an empty subset ("dictionary" of basis functions)
- At time $t$: try to approximate the new data $s_t$ from the current dictionary
- Criterion: if
\[ k(s_t, s_t) - [k_m(s_t)]^T K_{mm}^{-1} k_m(s_t) > TOL \]
  then $s_t$ is added to the subset
- Overall costs: $O(m^2)$, where $m$ is the current size of the subset
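The selection test can be sketched as follows. For clarity this version rebuilds and solves with $K_{mm}$ at every step, which costs more than the $O(m^2)$ quoted above; an incremental update of $K_{mm}^{-1}$ would be needed to reach that bound. The function names, the scalar states and the Gaussian kernel are illustrative assumptions.

```python
import numpy as np

def gaussian(a, b, sigma=1.0):
    """Gaussian kernel between two scalar states."""
    return float(np.exp(-(a - b) ** 2 / (2.0 * sigma ** 2)))

def select_dictionary(states, kernel, tol):
    """Add s_t whenever k(s_t,s_t) - k_m(s_t)^T K_mm^{-1} k_m(s_t) > tol."""
    dictionary = []
    for s in states:
        if dictionary:
            K_mm = np.array([[kernel(u, v) for v in dictionary] for u in dictionary])
            k_m = np.array([kernel(u, s) for u in dictionary])
            residual = kernel(s, s) - k_m @ np.linalg.solve(K_mm, k_m)
        else:
            residual = kernel(s, s)   # nothing to approximate with yet
        if residual > tol:
            dictionary.append(s)
    return dictionary

dictionary = select_dictionary([0.0, 0.0, 1.0, 1.0, 5.0], gaussian, tol=0.01)
print(dictionary)
```

Repeated states have zero residual and are skipped, so the dictionary grows only when a genuinely new region of the state space is visited.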

Page 12

Recursive implementation

At time $t$ observe $s_{t-1} \to s_t$ under reward $r_t$. Need to compute
\[ w_{tm} = \left( \begin{bmatrix} H_{t-1,m} \\ h_t^T \end{bmatrix}^T \begin{bmatrix} H_{t-1,m} \\ h_t^T \end{bmatrix} + \lambda K_{mm} \right)^{-1} \begin{bmatrix} H_{t-1,m} \\ h_t^T \end{bmatrix}^T \begin{bmatrix} r_{t-1} \\ r_t \end{bmatrix} \]

With $P_{t-1,m} = (H_{t-1,m}^T H_{t-1,m} + \lambda K_{mm})^{-1}$ and $s_{t-1,m} = H_{t-1,m}^T r_{t-1}$, updates are possible via:
- formula of Sherman-Morrison-Woodbury ("recursive least squares")
- update of the Cholesky factorization $P_{t-1,m} = \Phi^{1/2}_{t-1,m} \Phi^{T/2}_{t-1,m}$

Case 1: current example is represented well by current dictionary
- update $\{P_{t-1,m}, s_{t-1,m}, w_{t-1,m}\} \to \{P_{tm}, s_{tm}, w_{tm}\}$
- costs $O(m^2)$

Case 2: current example is not represented well by current dictionary
- add $s_t$ as new basis function to the dictionary
- update $\{P_{t-1,m}, s_{t-1,m}, w_{t-1,m}\} \to \{P_{t,m+1}, s_{t,m+1}, w_{t,m+1}\}$
- costs $O(m^2)$
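Case 1 (dictionary unchanged) can be sketched with the Sherman-Morrison rank-one update, which is one way to realize the "recursive least squares" option. The sketch covers only this case and does not handle growing the dictionary. The function name `rls_update`, the identity matrix standing in for $K_{mm}$ and the random data are assumptions for illustration.

```python
import numpy as np

def rls_update(P, s, h, r):
    """One O(m^2) step: add row h^T with reward r to the regularized problem.

    P is (H^T H + lam * K_mm)^{-1} before the step (symmetric), s is H^T r.
    """
    Ph = P @ h
    P_new = P - np.outer(Ph, Ph) / (1.0 + h @ Ph)   # Sherman-Morrison
    s_new = s + r * h
    return P_new, s_new, P_new @ s_new

rng = np.random.default_rng(0)
m, t, lam = 3, 8, 0.1
K_mm = np.eye(m)                    # placeholder for the kernel matrix
H = rng.normal(size=(t, m))         # rows play the role of h_1^T, ..., h_t^T
r = rng.normal(size=t)

P = np.linalg.inv(lam * K_mm)       # state before any data has been seen
s = np.zeros(m)
for h_t, r_t in zip(H, r):
    P, s, w = rls_update(P, s, h_t, r_t)

w_batch = np.linalg.solve(H.T @ H + lam * K_mm, H.T @ r)
print(np.max(np.abs(w - w_batch)))
```

After all rows have been processed, the recursive weights agree with the batch solution of the same regularized problem up to rounding.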

Page 13

Task: policy evaluation …, from observed transitions under the optimal policy

Compare: tilecoding CMAC (…) vs. our approach (…) vs. fixed RBF-net (…)

[Figure: Puddleworld (101x101 cells), showing the goal and an optimal trajectory]

[Plot: "Puddleworld LSTD Policy Evaluation". x-axis: trials (0 to 500). y-axis: mean approximation error per trial (0 to 30). Curves: CMAC 10x10x10; RBFnet 12x12; LSSVM (nu=0.1, sigma=1/50); LSSVM (nu=0.01, sigma=1/50)]

Required resources: CMAC (… weights), RBF-net (… weights), our approach (… weights)

Page 14

Episodic tasks …

Compare: … Sarsa(λ) & tilecoding vs. our approach

[Plot: "Puddleworld OPI". x-axis: trials (0 to 2000). y-axis: total reward per trial, smoothed average of 100 runs (−160 to 0). Curves: online sarsa(lambda) CMAC 20x20x10; LSSVM-LSTD (nu=0.01, sigma=1/50)]

[Plot: "Puck-on-hill OPI". x-axis: trials (0 to 1000). y-axis: total reward per trial, smoothed average of 100 runs (−0.8 to 1). Curves: LSSVM-LSTD (nu=0.01, sigma=1/50); online sarsa(lambda) CMAC 10x10x10; online sarsa(lambda) CMAC 7x7x7]

Page 15

… improvements

- Allow stochastic transitions: fixed-point approximation LSTD(λ) instead of Bellman residuals
- Allow model-free learning: … state-action values (Q-function instead of V-function)
- Consider supervised criterion during basis selection ⟹ reduces size of dictionary …

Page 16

Problem: how to maximize the time the keepers control the ball (…)

[Figure: keepaway field with boundary (20m x 20m) and center, showing Keeper #1, Keeper #2, Taker #1 and Taker #2. The acting keeper with the ball may hold the ball or pass to keeper #1 or #2.]

Challenges:
- dimensionality of the state space (… dimensions)
- stochastic transitions (noise in actuators and sensors, multiple fully autonomous agents …)
- real-time learning (uses "real" soccer server)

Page 17

Compare: … Sarsa(λ) & tilecoding vs. our approach

[Plot: "3vs2 keepaway (field size 20m x 20m)". x-axis: training time (hours), 0 to 40. y-axis: episode duration (secs), 4 to 22. Curves: our approach; Stone, Sutton & Kuhlmann (2005); random behavior; optimized handcoded behavior]

Page 18

Demonstration

Page 19

Problem: … approximate dynamic programming

Idea: combine LS-SVM (superior generalization) with LS-TD (very fast convergence)

Methods: approximate policy evaluation with least-squares methods
- deterministic & stochastic
- model-based & model-free
- LS-SVM & subset of regressors
- online selection of subset (unsupervised & supervised)

Results: …