Chapter 10 - Machine Learning
-
7/23/2019 Chapter 10 - Machine Learning
Artificial Intelligence
Nguyễn Nhật Quang
School of Information and Communication Technology
Hanoi University of Science and Technology
Academic year 2009-2010
-
Course contents:
Introduction to Artificial Intelligence
Agents
Problem solving: Search, Constraint satisfaction
Logic and inference
Knowledge representation
Reasoning with uncertain knowledge
Machine learning
  Introduction to machine learning
  Naïve Bayes classification
  Nearest-neighbor learning
Planning
-
Introduction to Machine Learning
Definitions of machine learning:
  A process by which a system improves its performance (how effectively it operates) [Simon, 1983]
  A process by which a computer program improves its performance at a task through experience [Mitchell, 1997]
  Programming computers to optimize a performance criterion using example data or past experience [Alpaydin, 2004]
Representation of a machine learning problem [Mitchell, 1997]: Machine learning = improving performance at a task through experience
  At a task T
  With respect to a performance measure P
  Through (using) experience E
-
Examples of machine learning problems (1)
Filtering Web pages by user interest
T: Predict (filter) which Web pages a given user is interested in reading
P: Percentage (%) of Web pages predicted correctly
E: A set of Web pages the user marked as interesting and a set of Web pages the user marked as not interesting
[Figure: Interested?]
-
Examples of machine learning problems (2)
Classifying Web pages by topic
T: Classify Web pages into predefined topics
P: Percentage (%) of Web pages classified correctly
E: A set of Web pages, each associated with a topic
[Figure: Which cat.?]
-
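Examples of machine learning problems (3)
Handwriting recognition
T: Recognize and classify the words in images of handwriting
P: Percentage (%) of words recognized and classified correctly
E: A set of handwritten images, each associated with the identifier of a word
[Figure: Which word? Handwritten samples of the words "we", "do", "in", "the", "right", "way"]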
-
Examples of machine learning problems (4)
Autonomous driving robot
T: A robot (equipped with observation cameras) drives itself on highways
P: Average distance the robot drives on its own before an error (accident) occurs
E: A set of examples recorded while observing a human driver; each example consists of a sequence of images and the corresponding steering commands
[Figure: Which steering command? Go straight / Move left / Move right / Slow down / Speed up]
-
The machine learning process
Dataset, split into:
  Training set: train the system
  Validation set: optimize the system's parameters
  Test set: test the trained system
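A minimal sketch of the three-way split in Python. The function name `split_dataset`, the 60/20/20 proportions and the fixed random seed are illustrative assumptions, not part of the lecture; the point is only that the three sets are disjoint and drawn from one shuffled dataset.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(data: Sequence[T], train_frac: float = 0.6, val_frac: float = 0.2,
                  seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle `data` once and cut it into disjoint training, validation and test sets."""
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed: the same split on every run
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]                # used to train the system
    val = items[n_train:n_train + n_val]   # used to tune the system's parameters
    test = items[n_train + n_val:]         # held out to test the final system
    return train, val, test


train, val, test = split_dataset(range(10))
print(len(train), len(val), len(test))  # 6 2 2
```

The test set is only used once the parameters have been fixed on the validation set; otherwise its accuracy would no longer estimate performance on future examples.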
-
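Supervised vs. unsupervised learning
Supervised learning
  Each training example contains a description (representation) of the example and its class label (or desired output value)
  Classification problem: D_train = {(<example representation>, <class label>)}
  Prediction/regression problem: D_train = {(<example representation>, <desired real-valued output>)}
Unsupervised learning
  Each training example contains only a description (representation) of the example, without any class label or desired output value
  Clustering problem: D_train = {(<example representation>)}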
-
Machine learning problems: main components (1)
Choosing the training examples (training/learning examples)
  Is the information that guides the learning process (training feedback) contained in the training examples themselves, or is it provided indirectly (e.g., by the operating environment)?
  Are the training examples supervised or unsupervised?
  The training examples must be consistent with (representative of) the examples the system will be used on in the future (future test examples)
Determining the target function (hypothesis, concept) to be learned
  F: X → {0,1}
  F: X → {a set of class labels}
  F: X → R+ (the domain of positive real values)
-
Machine learning problems: main components (2)
Choosing a representation for the target function to be learned
  A set of rules
  A decision tree
Choosing a machine learning algorithm that can learn (approximate) the target function
  Regression-based methods
  Rule induction
  Decision tree learning (ID3 or C4.5)
  Back-propagation learning
-
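Issues in machine learning (1)
Learning algorithm
  Which machine learning algorithms can learn (approximate) a given target function?
  Under what conditions will a chosen learning algorithm converge to (approximate) the target function?
  For a given problem domain and a given representation of the examples (objects), which learning algorithm performs best?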
-
Issues in machine learning (2)
Training examples
  How many training examples are enough?
  How does the size of the training set affect the accuracy of the learned target function?
  How do erroneous (noisy) examples and/or examples with missing attribute values affect the accuracy?
-
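Issues in machine learning (3)
Learning process
  What is the best strategy for choosing the order in which the training examples are used (exploited)?
  How do these selection strategies change the complexity of the learning problem?
  How can problem-specific knowledge (beyond the training examples) contribute to the learning process?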
-
Issues in machine learning (4)
Learning capability
  Which target function should the system learn?
  Representation of the target function: expressive power (e.g., linear vs. nonlinear functions) vs. the complexity of the learning algorithm and of the learning process
  What are the (theoretical) limits on the learning capability of machine learning algorithms?
  How well can the system generalize from the training examples? How can over-fitting be avoided (high accuracy on the training set but low accuracy on the test set)?
  Can the system automatically change (adapt) its internal representation (structure) to improve its ability to represent and learn the target function?
-
The over-fitting problem (1)
A learned target function (hypothesis) h is said to over-fit the training set if there exists another target function h' such that:
  h' fits the training set less well (lower accuracy) than h, but
  h' is more accurate than h on the whole dataset (including the examples used after training)
Over-fitting is usually caused by:
  Errors (noise) in the training set, introduced while collecting/building the dataset
  Too few training examples, which are not representative of the whole dataset
-
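The over-fitting problem (2)
Let D be the set of all examples, and D_train the training set.
Let $\mathrm{Err}_{D}(h)$ be the error that hypothesis h makes on D, and $\mathrm{Err}_{D_{train}}(h)$ the error that h makes on D_train.
Hypothesis h over-fits the training set D_train if there exists another hypothesis h' such that:
$$\mathrm{Err}_{D_{train}}(h) < \mathrm{Err}_{D_{train}}(h') \quad \text{and} \quad \mathrm{Err}_{D}(h) > \mathrm{Err}_{D}(h')$$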
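The definition can be checked mechanically. In this sketch (Python; the data, the `error`/`overfits` helpers and both hypotheses are hypothetical, and the 0/1 classification error rate stands in for Err), h memorizes a training set that contains one noisy label, while h' is a simpler threshold rule:

```python
from typing import Callable, List, Tuple

Example = Tuple[int, int]  # (attribute value x, class label y)


def error(h: Callable[[int], int], data: List[Example]) -> float:
    """Err(h): fraction of the examples in `data` that h misclassifies."""
    return sum(h(x) != y for x, y in data) / len(data)


def overfits(h, h_prime, d_train: List[Example], d_all: List[Example]) -> bool:
    """True if h over-fits d_train with respect to the alternative hypothesis h_prime."""
    return (error(h, d_train) < error(h_prime, d_train)
            and error(h, d_all) > error(h_prime, d_all))


# Hypothetical toy data: the underlying rule is "label 1 iff x >= 5",
# but the training example (4, 1) carries a noisy label.
d_train = [(1, 0), (3, 0), (4, 1), (6, 1), (8, 1)]
d_all = d_train + [(0, 0), (2, 0), (5, 1), (7, 1), (9, 1)]
memory = dict(d_train)


def h(x: int) -> int:
    """Memorizes D_train (noise included) and answers 0 for unseen x."""
    return memory.get(x, 0)


def h_prime(x: int) -> int:
    """A simple threshold rule."""
    return int(x >= 5)


print(error(h, d_train), error(h_prime, d_train))  # 0.0 0.2
print(error(h, d_all), error(h_prime, d_all))      # 0.3 0.1
print(overfits(h, h_prime, d_train, d_all))        # True
```

h is perfect on D_train but worse than h' on D, which is exactly the pattern in the definition; the noisy training label is one of the causes of over-fitting listed for this problem.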
-
The over-fitting problem (3)
Among the learned hypotheses (target functions), which one generalizes best from the training examples?
Note: the goal of machine learning is high prediction accuracy on future examples, not on the training examples
Occam's razor: prefer the simplest target function that fits (not necessarily perfectly) the training examples
  It generalizes better
  It is easier to explain/interpret
  It has lower computational complexity
[Figure: which target function f(x) achieves the highest accuracy on future examples?]
-
The over-fitting problem: example
Continuing to grow a decision tree lowers its accuracy on the test set even though its accuracy on the training set keeps increasing
[Mitchell, 1997]
-
Naïve Bayes classification
A supervised, probability-based classification method
  Based on a probabilistic model (function)
  Classification is based on the probabilities of the possible hypotheses
  Used in practical, real-world problems
-
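Bayes' theorem
$$P(h|D) = \frac{P(D|h)\,P(h)}{P(D)}$$
P(h): prior probability that hypothesis (class) h holds
P(D): prior probability that dataset D is observed
P(D|h): probability of observing dataset D, given that hypothesis h holds
P(h|D): probability that hypothesis h holds, given that dataset D is observed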
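A minimal sketch of Bayes' theorem applied to every hypothesis at once, in Python. It assumes the hypotheses are mutually exclusive and exhaustive, so that P(D) can be obtained by summing P(D|h)P(h) over them; the function name and the numbers are illustrative only.

```python
from typing import Dict


def posteriors(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """P(h|D) = P(D|h) P(h) / P(D) for every hypothesis h.

    Assumes the hypotheses are mutually exclusive and exhaustive, so that
    P(D) = sum over h of P(D|h) P(h) (law of total probability).
    """
    evidence = sum(likelihood[h] * prior[h] for h in prior)  # P(D)
    return {h: likelihood[h] * prior[h] / evidence for h in prior}


# Hypothetical numbers: equal priors, D is three times as likely under h2.
print(posteriors({"h1": 0.5, "h2": 0.5}, {"h1": 0.25, "h2": 0.75}))
# {'h1': 0.25, 'h2': 0.75}
```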
-
Bayes' theorem: example (1)
Consider the following dataset:
Day  Outlook   Temperature  Humidity  Wind    Play Tennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
[Mitchell, 1997]
-
Bayes' theorem: example (2)
Dataset D: the set of days on which Outlook is Sunny and Wind is Strong
Hypothesis (class) h: he plays tennis
Prior probability P(h): the probability that he plays tennis (regardless of Outlook and Wind)
Prior probability P(D): the probability of a day on which Outlook is Sunny and Wind is Strong
P(D|h): the probability of a day on which Outlook is Sunny and Wind is Strong, given that he plays tennis
P(h|D): the probability that he plays tennis, given that Outlook is Sunny and Wind is Strong
Naïve Bayes classification is based on this conditional (posterior) probability!
-
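Maximizing the posterior probability
Given a set H of possible hypotheses (classes), the learning system looks for the most probable hypothesis h (∈ H) given the observed data D
This hypothesis h is called the maximum a posteriori (MAP) hypothesis
$$h_{MAP} = \arg\max_{h \in H} P(h|D)$$
$$h_{MAP} = \arg\max_{h \in H} \frac{P(D|h)\,P(h)}{P(D)} \quad \text{(by Bayes' theorem)}$$
$$h_{MAP} = \arg\max_{h \in H} P(D|h)\,P(h) \quad \text{(}P(D)\text{ is the same for all hypotheses } h\text{)}$$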
-
MAP: example
The set H contains 2 possible hypotheses:
  h1: He plays tennis
  h2: He does not play tennis
Compute the two conditional probabilities P(h1|D) and P(h2|D)
The most probable hypothesis: h_MAP = h1 if P(h1|D) ≥ P(h2|D); otherwise h_MAP = h2
Because P(D) = P(D,h1) + P(D,h2) is the same for both hypotheses h1 and h2, the quantity P(D) can be ignored
So we only need to compute the two expressions P(D|h1)·P(h1) and P(D|h2)·P(h2) and decide accordingly:
  If P(D|h1)·P(h1) ≥ P(D|h2)·P(h2), conclude that he plays tennis
  Otherwise, conclude that he does not play tennis
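A minimal sketch of this MAP decision on the 12-day PlayTennis table, in Python. It estimates P(h) and P(D|h) as relative frequencies in the table (an assumption of this sketch; the slide does not say how the probabilities are estimated) and uses `fractions.Fraction` so the comparison is exact. On these 12 days the two scores come out equal, so the decision falls to the ≥ in the rule.

```python
from fractions import Fraction

# (Outlook, Wind, Play Tennis) for days D1..D12 of the PlayTennis table.
DAYS = [
    ("Sunny", "Weak", "No"), ("Sunny", "Strong", "No"), ("Overcast", "Weak", "Yes"),
    ("Rain", "Weak", "Yes"), ("Rain", "Weak", "Yes"), ("Rain", "Strong", "No"),
    ("Overcast", "Strong", "Yes"), ("Sunny", "Weak", "No"), ("Sunny", "Weak", "Yes"),
    ("Rain", "Weak", "Yes"), ("Sunny", "Strong", "Yes"), ("Overcast", "Strong", "Yes"),
]


def prior(label: str) -> Fraction:
    """P(h): fraction of days with the given Play Tennis label."""
    return Fraction(sum(day[2] == label for day in DAYS), len(DAYS))


def likelihood(outlook: str, wind: str, label: str) -> Fraction:
    """P(D|h): among the days with this label, fraction with this Outlook and Wind."""
    same_label = [day for day in DAYS if day[2] == label]
    hits = sum(day[0] == outlook and day[1] == wind for day in same_label)
    return Fraction(hits, len(same_label))


score_h1 = likelihood("Sunny", "Strong", "Yes") * prior("Yes")  # P(D|h1) P(h1)
score_h2 = likelihood("Sunny", "Strong", "No") * prior("No")    # P(D|h2) P(h2)
h_map = "plays tennis" if score_h1 >= score_h2 else "does not play tennis"
print(score_h1, score_h2, h_map)  # 1/12 1/12 plays tennis
```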
-
Maximum likelihood estimation (MLE)
MAP method: given a set H of possible hypotheses, find the hypothesis that maximizes P(D|h)·P(h)
Assumption in maximum likelihood estimation (MLE): all hypotheses are equally probable a priori: P(h_i) = P(h_j), ∀ h_i, h_j ∈ H
The MLE method finds the hypothesis that maximizes P(D|h), where P(D|h) is called the likelihood of the data D given h
The maximum likelihood hypothesis:
$$h_{MLE} = \arg\max_{h \in H} P(D|h)$$
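A minimal sketch of the MLE decision for the tennis data in Python, again assuming that P(D|h) is estimated as a relative frequency in the 12-day PlayTennis table. Only the likelihoods are compared; the priors P(h) are dropped, unlike in MAP.

```python
from fractions import Fraction

# (Outlook, Wind, Play Tennis) for days D1..D12 of the PlayTennis table.
DAYS = [
    ("Sunny", "Weak", "No"), ("Sunny", "Strong", "No"), ("Overcast", "Weak", "Yes"),
    ("Rain", "Weak", "Yes"), ("Rain", "Weak", "Yes"), ("Rain", "Strong", "No"),
    ("Overcast", "Strong", "Yes"), ("Sunny", "Weak", "No"), ("Sunny", "Weak", "Yes"),
    ("Rain", "Weak", "Yes"), ("Sunny", "Strong", "Yes"), ("Overcast", "Strong", "Yes"),
]


def likelihood(outlook: str, wind: str, label: str) -> Fraction:
    """P(D|h) estimated as a relative frequency among the days labelled `label`."""
    same_label = [day for day in DAYS if day[2] == label]
    hits = sum(day[0] == outlook and day[1] == wind for day in same_label)
    return Fraction(hits, len(same_label))


hypotheses = {"h1": "Yes", "h2": "No"}  # h1: plays tennis, h2: does not
scores = {h: likelihood("Sunny", "Strong", label) for h, label in hypotheses.items()}
h_mle = max(scores, key=scores.get)  # argmax over h of P(D|h); the priors P(h) play no role
print(scores["h1"], scores["h2"], h_mle)  # 1/8 1/4 h2
```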
-
MLE: example
The set H contains 2 possible hypotheses:
  h1: He plays tennis
  h2: He does not play tennis
D: the set of days on which Outlook is Sunny and Wind is Strong
Compute the two likelihood values of the data D under the two hypotheses: P(D|h1) and P(D|h2)
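  P(Outlook=Sunny, Wind=Strong | h1) = 1/8
  P(Outlook=Sunny, Wind=Strong | h2) = 1/4
The MLE hypothesis: h_MLE = h1 if P(D|h1) ≥ P(D|h2); otherwise h_MLE = h2
Because P(Outlook=Sunny, Wind=Strong | h1) < P(Outlook=Sunny, Wind=Strong | h2), h_MLE = h2: he does not play tennis
-
[…] the k nearest training examples to the example to be classified, and assign that example to the class that holds the majority among these k nearest training examples
k is usually chosen as an odd number, to avoid ties in classification
Examples: k = 3, 5, 7, …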
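A minimal majority-vote k-NN sketch in Python. Euclidean distance (via `math.dist`, Python 3.8+) is an assumption here, since the choice of distance function is a separate design decision; the two-class toy data are hypothetical. With two classes and an odd k, the vote cannot end in a tie.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_classify(train: List[Tuple[Sequence[float], str]], z: Sequence[float], k: int = 3) -> str:
    """Assign z to the majority class among its k nearest training examples."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], z))[:k]  # Euclidean distance
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B")]
print(knn_classify(train, (1.1, 1.0), k=3))  # A
print(knn_classify(train, (4.8, 5.1), k=3))  # B
```

Note that "training" here is just storing `train`; all the work happens when a new example is classified.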
-
Distance functions (1)
The distance function d
  Plays a very important role in nearest-neighbor learning
  Is usually fixed in advance and does not change throughout the learning process
Choosing the distance function d
  Geometric distance functions: for problems whose input attributes are real-valued (x_i ∈ R)
  Hamming distance: for problems whose input attributes are binary (x_i ∈ {0,1})
  Cosine similarity: for text classification problems
-
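Distance functions (2)
Geometric distance functions:
Manhattan:
$$d(x,z) = \sum_{i=1}^{n} |x_i - z_i|$$
Euclidean:
$$d(x,z) = \sqrt{\sum_{i=1}^{n} (x_i - z_i)^2}$$
Minkowski (p-norm):
$$d(x,z) = \left( \sum_{i=1}^{n} |x_i - z_i|^p \right)^{1/p}$$
Chebyshev:
$$d(x,z) = \lim_{p \to \infty} \left( \sum_{i=1}^{n} |x_i - z_i|^p \right)^{1/p} = \max_{i} |x_i - z_i|$$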
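A minimal sketch of the four geometric distances in Python (function names are our own). Manhattan and Euclidean are written as the Minkowski distance with p = 1 and p = 2, and Chebyshev is computed directly as the limit value, the largest per-attribute difference.

```python
from typing import Sequence


def minkowski(x: Sequence[float], z: Sequence[float], p: float) -> float:
    """p-norm distance: (sum_i |x_i - z_i|^p)^(1/p)."""
    if len(x) != len(z):
        raise ValueError("x and z must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, z)) ** (1.0 / p)


def manhattan(x: Sequence[float], z: Sequence[float]) -> float:
    return minkowski(x, z, 1)  # p = 1


def euclidean(x: Sequence[float], z: Sequence[float]) -> float:
    return minkowski(x, z, 2)  # p = 2


def chebyshev(x: Sequence[float], z: Sequence[float]) -> float:
    # Limit p -> infinity: only the largest per-attribute difference survives.
    return max(abs(a - b) for a, b in zip(x, z))


x, z = (0.0, 3.0, 1.0), (4.0, 0.0, 1.0)
print(manhattan(x, z), euclidean(x, z), chebyshev(x, z))  # 7.0 5.0 4.0
print(round(minkowski(x, z, 50), 4))  # 4.0: a large p already approaches Chebyshev
```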
-
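Distance functions (3)
Hamming distance, for binary input attributes:
$$d(x,z) = \sum_{i=1}^{n} \mathrm{Difference}(x_i, z_i), \qquad \mathrm{Difference}(a,b) = \begin{cases} 1, & \text{if } a \neq b \\ 0, & \text{if } a = b \end{cases}$$
Example: x = (0,1,0,1,1)
Cosine similarity:
$$d(x,z) = \frac{x \cdot z}{\lVert x \rVert \, \lVert z \rVert} = \frac{\sum_{i=1}^{n} x_i z_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \, \sqrt{\sum_{i=1}^{n} z_i^2}}$$
x_i, z_i: the weights (TF/IDF) of the keywords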
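A minimal sketch of the Hamming distance and cosine similarity in Python. The second binary vector and the short real-valued vectors (standing in for TF/IDF keyword weights) are hypothetical; the slide's example only gives x. Note that cosine is a similarity: larger values mean closer examples.

```python
import math
from typing import Sequence


def hamming(x: Sequence[int], z: Sequence[int]) -> int:
    """Sum of Difference(x_i, z_i): the number of positions where x and z differ."""
    if len(x) != len(z):
        raise ValueError("x and z must have the same length")
    return sum(1 if a != b else 0 for a, b in zip(x, z))


def cosine_similarity(x: Sequence[float], z: Sequence[float]) -> float:
    """x . z / (||x|| ||z||); assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, z))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in z))
    return dot / norms


print(hamming((0, 1, 0, 1, 1), (1, 1, 1, 0, 1)))  # 3
print(cosine_similarity((3.0, 4.0), (6.0, 8.0)))  # 1.0: same direction
print(cosine_similarity((3.0, 4.0), (0.0, 5.0)))  # 0.8
```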
-
Normalizing attribute value ranges
Euclidean distance:
$$d(x,z) = \sqrt{\sum_{i=1}^{n} (x_i - z_i)^2}$$
Suppose each example is represented by 3 attributes: Age, Income (per month), and Height (in meters)
  x = (Age=20, Income=12000, Height=1.68)
  z = (Age=40, Income=1300, Height=1.75)
The distance between x and z:
$$d(x,z) = \left[ (20-40)^2 + (12000-1300)^2 + (1.68-1.75)^2 \right]^{1/2} \approx 10700.02$$
This distance is determined almost entirely by the difference between x and z on the Income attribute
  Because the Income attribute has a much larger value range than the other attributes
The value ranges must be normalized (brought into the same range); the range [0,1] is commonly used
  For each attribute i: $x_i \leftarrow \dfrac{x_i}{\text{maximum value of attribute } i}$
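A minimal sketch of this normalization in Python, using the slide's x and z. Two assumptions: attribute values are non-negative (so dividing by the maximum lands in [0, 1]), and the maximum of each attribute is taken over the examples passed in, here just x and z.

```python
import math
from typing import List, Sequence


def normalize_by_max(examples: List[Sequence[float]]) -> List[List[float]]:
    """x_i <- x_i / (maximum value of attribute i over `examples`).

    Assumes non-negative attribute values, so every result lies in [0, 1].
    """
    maxima = [max(column) for column in zip(*examples)]
    # Guard against an all-zero attribute, which would otherwise divide by zero.
    return [[value / m if m else 0.0 for value, m in zip(ex, maxima)] for ex in examples]


x = (20, 12000, 1.68)  # (Age, Income, Height)
z = (40, 1300, 1.75)
print(round(math.dist(x, z), 2))  # 10700.02, almost entirely the Income difference

xn, zn = normalize_by_max([x, z])
print(round(math.dist(xn, zn), 2))  # 1.02 after normalization
```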
-
Attribute weights
Euclidean distance:
$$d(x,z) = \sqrt{\sum_{i=1}^{n} (x_i - z_i)^2}$$
  All attributes have the same (equal) influence on the distance value
Different attributes may (and should) have different degrees of influence on the distance value
Attribute weights must be incorporated into the distance function, where w_i is the weight of attribute i:
$$d(x,z) = \sqrt{\sum_{i=1}^{n} w_i (x_i - z_i)^2}$$
Ways to determine the weights:
  Based on domain-specific knowledge of the problem (e.g., specified by experts in the problem's domain)
  By an optimization process over the weight values (e.g., using a training set to learn an optimal set of weights)
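A minimal sketch of the weighted Euclidean distance in Python; the function name and the weight vectors are hypothetical. With all weights equal to 1 it reduces to the ordinary Euclidean distance, and a weight of 0 removes an attribute entirely.

```python
import math
from typing import Sequence


def weighted_euclidean(x: Sequence[float], z: Sequence[float], w: Sequence[float]) -> float:
    """sqrt(sum_i w_i * (x_i - z_i)^2), where w_i >= 0 is the weight of attribute i."""
    if not len(x) == len(z) == len(w):
        raise ValueError("x, z and w must have the same length")
    return math.sqrt(sum(wi * (a - b) ** 2 for a, b, wi in zip(x, z, w)))


x, z = (1.0, 2.0), (4.0, 6.0)
print(weighted_euclidean(x, z, (1.0, 1.0)))  # 5.0: equal weights give the plain Euclidean distance
print(weighted_euclidean(x, z, (1.0, 0.0)))  # 3.0: weight 0 removes the second attribute
```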
-
Distances of the neighbors (1)
Consider the set NB(z) of the k training examples nearest to the test instance z
  Each of these nearest neighbors is at a different distance from z
Should these neighbors have the same influence on the classification/prediction for z? NO!
Assign each nearest neighbor an influence (contribution) that depends on its distance to z
  Closer neighbors get a higher influence!
-
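Distances of the neighbors (2)
Let v be a function that assigns a weight according to distance: for a distance value d(x,z), the weight v(x,z) is inversely proportional to d(x,z)
For classification:
$$c(z) = \arg\max_{c_j \in C} \sum_{x \in NB(z)} v(x,z) \cdot \mathrm{Identical}(c_j, c(x)), \qquad \mathrm{Identical}(a,b) = \begin{cases} 1, & \text{if } a = b \\ 0, & \text{if } a \neq b \end{cases}$$
For prediction (regression):
$$f(z) = \frac{\sum_{x \in NB(z)} v(x,z) \, f(x)}{\sum_{x \in NB(z)} v(x,z)}$$
Choosing a distance-based weighting function (α, σ > 0 are constants):
$$v(x,z) = \frac{1}{\alpha + d(x,z)} \qquad v(x,z) = \frac{1}{\alpha + [d(x,z)]^2} \qquad v(x,z) = e^{-\frac{d(x,z)^2}{\sigma^2}}$$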
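A minimal sketch of distance-weighted k-NN for both classification and regression, in Python. It assumes Euclidean distance and a weight of the form v(x,z) = 1/(α + d(x,z)) with a hypothetical α = 1e-6; the data are made up for illustration. The classification example is chosen so that the weighted vote disagrees with a plain majority vote.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

ALPHA = 1e-6  # hypothetical small constant; keeps the weight finite when d(x, z) = 0


def v(d: float) -> float:
    """Weight v(x, z) = 1 / (alpha + d(x, z)): the closer the neighbor, the larger its weight."""
    return 1.0 / (ALPHA + d)


def nearest(train: List[Tuple[Sequence[float], object]], z: Sequence[float], k: int):
    """NB(z): (distance, target) pairs for the k training examples nearest to z."""
    return sorted(((math.dist(x, z), y) for x, y in train), key=lambda p: p[0])[:k]


def weighted_classify(train, z, k=3):
    """argmax over classes c_j of the sum of v(x, z) * Identical(c_j, c(x)) over NB(z)."""
    votes: Dict[object, float] = defaultdict(float)
    for d, label in nearest(train, z, k):
        votes[label] += v(d)  # each neighbor adds its weight to its own class only
    return max(votes, key=votes.get)


def weighted_regress(train, z, k=3):
    """Weighted mean of f(x) over NB(z), with weights v(x, z)."""
    nb = nearest(train, z, k)
    return sum(v(d) * y for d, y in nb) / sum(v(d) for d, _ in nb)


train_cls = [((0.1,), "A"), ((1.0,), "B"), ((1.1,), "B")]
print(weighted_classify(train_cls, (0.0,), k=3))  # A (a plain majority vote would give B)

train_reg = [((0.0,), 1.0), ((1.0,), 2.0), ((3.0,), 4.0)]
print(round(weighted_regress(train_reg, (0.5,), k=3), 2))  # 1.73
```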
-
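k-NN: when to use it?
Examples are represented as vectors in a real space (R^n)
The number of input attributes (the dimension of the space) is not large
The training set is fairly large (many training examples)
Advantages:
  Low training cost (training only stores the training examples)
  Works well for problems with many classes: there is no need to learn n separate classifiers for n classes
  k-NN with k >> 1 can also work with noisy data, since classification/prediction is based on the k nearest neighbors
Disadvantages:
  A suitable distance function must be chosen
  High computational cost (in time and memory) at classification/prediction time
  May classify/predict wrongly because of irrelevant attributes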
-
References
E. Alpaydin. Introduction to Machine Learning. The MIT Press, 2004.
T. M. Mitchell. Machine Learning. McGraw-Hill, 1997.
H. A. Simon. Why Should Machines Learn? In R. S. Michalski, J. Carbonell, and T. M. Mitchell (Eds.): Machine Learning: An Artificial Intelligence Approach, chapter 2, pp. 25-38. Morgan Kaufmann, 1983.