Chapter 7: Association Rule Mining for Web Databases


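Chapter 7: Association Rule Mining for Web Databases. Outline: Introduction; Association Rule Mining; Multilevel Association Rule Mining; Quantitative Association Rule Mining; Correlation Analysis; Summary. Introduction (1): A single shopping cart tells us about one customer's purchasing behavior, but once a large volume of shopping-cart data has accumulated, the purchasing habits of customers as a whole can be analyzed.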

Transcript of Chapter 7: Association Rule Mining for Web Databases

  • Outline: Introduction; Association Rule Mining; Multilevel Association Rule Mining; Quantitative Association Rule Mining; Correlation Analysis; Summary

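  • Introduction (1): A single shopping cart tells us about one customer's purchasing behavior, but once a large volume of shopping-cart data has accumulated, the purchasing habits of customers as a whole can be analyzed. The slide's example involves IBM PC and ViewSonic products.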

  • (2)80%

  • 7-1

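  • Basic concepts (1): A set of items is called an itemset. A transaction T supports an itemset X when X ⊆ T.

  • Basic concepts (2): The number of transactions that support X is the support count of X; dividing the support count by the total number of transactions gives the support of X. In Table 7-1, item 2 has support count 5, so its support is 5/10 = 0.5; the itemset {2,5} has support count 3, so its support is 3/10 = 0.3. An association rule is written X ⇒ Y [support, confidence], where X and Y are itemsets with X ∩ Y = ∅, and the support of the rule is the support of X ∪ Y.

  • Basic concepts (3): The confidence of X ⇒ Y is the support of X ∪ Y divided by the support of X.

  • Basic concepts (4): The user specifies a minimum support and a minimum confidence; the minimum support can equivalently be stated as a minimum support count. With Table 7-1, minimum support 0.2, and minimum confidence 0.5, consider {1,3} ⇒ {5}: its support count is 2, so its support is 0.2; the support of {1,3} is 0.3, so the confidence of {1,3} ⇒ {5} is 0.2/0.3 = 0.67. The rule meets both thresholds.

  • Basic concepts (5): An itemset whose support is at least the minimum support is a large itemset. Rules are generated from a large itemset Z by splitting it into X and Y with X ∪ Y = Z.

  • Basic concepts (6): With Table 7-1, minimum support 0.2, and minimum confidence 0.7, the large itemset {1,3} splits into {1} and {3}, giving the candidate rules {1} ⇒ {3} and {3} ⇒ {1}. The confidence of {1} ⇒ {3} is 0.3/0.4 = 0.75, and the confidence of {3} ⇒ {1} is 0.3/0.5 = 0.6, so only {1} ⇒ {3} is kept.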

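  • Apriori algorithm: An itemset with k items is a k-itemset, and Lk denotes the set of large k-itemsets. Apriori works level by level: it first finds the large 1-itemsets L1, uses L1 to find L2, uses L2 to find L3, and so on.

  • Apriori property: Every subset of a large itemset is also large. If {A,B} is large, then {A} and {B} are large; conversely, if {A} is not large, then {A,B} cannot be large, whatever the support of {B}.

  • Apriori uses this property to generate candidate itemsets in two steps: join and prune.

  • Join step: Large (k-1)-itemsets are joined to produce the candidate k-itemsets, collected in Ck. Let X1 and X2 be large (k-1)-itemsets, and let Xi[j] denote the j-th item of Xi. X1 and X2 are joined when their first k-2 items are equal and X1[k-1] < X2[k-1].
  • Example 7-1: X1 and X2 are 3-itemsets with X1={1,3,5} and X2={1,3,6}. Since X1[1]=X2[1]=1, X1[2]=X2[2]=3, and X1[3] < X2[3], X1 and X2 can be joined.
  • Prune step: Ck is a superset of Lk, so a member X of Ck may or may not be large. By the Apriori property, if any (k-1)-item subset of a candidate k-itemset X is not a large (k-1)-itemset, X cannot be large and is removed from Ck.

  • Example 7-2: X1 and X2 are 3-itemsets with X1={1,3,5} and X2={1,3,6}. Joining X1 and X2 gives the 4-itemset {1,3,5,6}. By the Apriori property, {1,3,5,6} can be large only if all of its 3-item subsets {1,3,5}, {1,3,6}, {1,5,6}, and {3,5,6} are large. {1,3,5} and {1,3,6} are large 3-itemsets. If {1,5,6} and {3,5,6} are also large 3-itemsets, {1,3,5,6} stays a candidate 4-itemset; if {1,5,6} or {3,5,6} is not a large 3-itemset, {1,3,5,6} is removed from the candidate 4-itemsets.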

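  • Apriori algorithm:
    L1 = {large 1-itemsets};
    for (k = 2; Lk-1 ≠ ∅; k++) do begin
      Ck = Candidate_gen(Lk-1)
      for each transaction t in the database
        for each candidate c in Ck contained in t, increase the support count of c by 1
      Lk = {candidates in Ck whose support count is at least the minimum support count}
    end
    return L = the union of all Lk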

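  • Candidate_gen procedure:
    for each X1 ∈ Lk-1   /* X1[1], X1[2], ..., X1[k-1] are the k-1 items of X1 */
      for each X2 ∈ Lk-1   /* X2[1], X2[2], ..., X2[k-1] are the k-1 items of X2 */
        if (X1[1]=X2[1]) ∧ (X1[2]=X2[2]) ∧ ... ∧ (X1[k-2]=X2[k-2]) ∧ (X1[k-1] < X2[k-1]) then
          join X1 and X2 into a candidate k-itemset, and add it to Ck unless one of its (k-1)-item subsets is not in Lk-1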
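
The join and prune steps and the level-by-level loop described above can be sketched in Python as follows. This is a minimal illustration, not the deck's own implementation: the function names `candidate_gen` and `apriori`, the tuple representation of itemsets, and the toy transactions are all assumptions made for the example (Table 7-1 itself is not reproduced in this transcript).

```python
from itertools import combinations


def candidate_gen(prev_large, k):
    """Join + prune: build candidate k-itemsets from large (k-1)-itemsets."""
    prev = sorted(sorted(x) for x in prev_large)  # each itemset as a sorted list
    prev_set = {tuple(x) for x in prev}
    candidates = set()
    for x1 in prev:
        for x2 in prev:
            # Join: first k-2 items equal, and last item of x1 < last item of x2.
            if x1[:k - 2] == x2[:k - 2] and x1[k - 2] < x2[k - 2]:
                c = tuple(x1 + [x2[k - 2]])
                # Prune: every (k-1)-item subset of c must itself be large.
                if all(s in prev_set for s in combinations(c, k - 1)):
                    candidates.add(c)
    return candidates


def apriori(transactions, min_count):
    """Return {itemset (sorted tuple): support count} for every large itemset."""
    counts = {}
    for t in transactions:
        for item in set(t):
            counts[(item,)] = counts.get((item,), 0) + 1
    large = {c: n for c, n in counts.items() if n >= min_count}
    result = dict(large)
    k = 2
    while large:  # stop when L(k-1) is empty
        ck = candidate_gen(large.keys(), k)
        counts = {c: 0 for c in ck}
        for t in transactions:
            items = set(t)
            for c in ck:
                if items.issuperset(c):
                    counts[c] += 1
        large = {c: n for c, n in counts.items() if n >= min_count}
        result.update(large)
        k += 1
    return result


# Hypothetical toy transactions (not Table 7-1).
toy = [[1, 3, 5], [1, 3, 5], [1, 3], [2, 5], [2, 5], [2, 3, 5]]
large_itemsets = apriori(toy, min_count=2)

# The situation of Example 7-2: {1,3,5,6} survives only if all 3-item subsets are large.
pruned = candidate_gen([(1, 3, 5), (1, 3, 6)], 4)
kept = candidate_gen([(1, 3, 5), (1, 3, 6), (1, 5, 6), (3, 5, 6)], 4)
```

Representing itemsets as sorted tuples keeps the join condition a simple prefix comparison, which mirrors the X1[j] = X2[j] notation used in the slides.
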
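  • Example 7-3 (1): Apply Apriori to Table 7-1 with a minimum support count of 3. Candidate 1-itemsets C1.

  • Example 7-3 (2): Large 1-itemsets L1.

  • Example 7-3 (3): Candidate 2-itemsets C2 generated from L1.

  • Example 7-3 (4): The 2-itemsets.

  • Example 7-3 (5): Large 2-itemsets L2.

  • Example 7-3 (6): Candidate 3-itemsets C3 generated from L2.

  • Example 7-3 (7): The large itemsets are {{1},{2},{3},{4},{5},{6},{1,3},{1,5},{2,5},{3,5}}. With a minimum confidence of 0.7, the candidate rules are:
    (1) {1} ⇒ {3}, confidence = 3/4 = 0.75
    (2) {3} ⇒ {1}, confidence = 3/5 = 0.6
    (3) {1} ⇒ {5}, confidence = 3/4 = 0.75
    (4) {5} ⇒ {1}, confidence = 3/6 = 0.5
    (5) {2} ⇒ {5}, confidence = 3/5 = 0.6
    (6) {5} ⇒ {2}, confidence = 3/6 = 0.5
    (7) {3} ⇒ {5}, confidence = 3/5 = 0.6
    (8) {5} ⇒ {3}, confidence = 3/6 = 0.5
    Only rules 1 and 3 meet the minimum confidence.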
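The confidence arithmetic of Example 7-3 (7) can be checked with a short Python sketch. The support counts below are read off the fractions in the example (for instance, 3/4 means {1,3} has count 3 and {1} has count 4); the counts of {4} and {6} do not appear in the transcript and are not needed here. The function name `generate_rules` is an assumption for illustration.

```python
from itertools import combinations

# Support counts implied by the fractions in Example 7-3 (7).
support_count = {
    frozenset({1}): 4, frozenset({2}): 5, frozenset({3}): 5, frozenset({5}): 6,
    frozenset({1, 3}): 3, frozenset({1, 5}): 3,
    frozenset({2, 5}): 3, frozenset({3, 5}): 3,
}


def generate_rules(support_count, min_conf):
    """Split each large itemset Z into X => Z - X and keep rules with enough confidence."""
    rules = []
    for z, z_count in support_count.items():
        if len(z) < 2:
            continue
        for r in range(1, len(z)):
            for x in combinations(sorted(z), r):
                x = frozenset(x)
                conf = z_count / support_count[x]  # support(X ∪ Y) / support(X)
                if conf >= min_conf:
                    rules.append((tuple(sorted(x)), tuple(sorted(z - x)), conf))
    return sorted(rules)


rules = generate_rules(support_count, min_conf=0.7)
for lhs, rhs, conf in rules:
    print(f"{set(lhs)} => {set(rhs)}  confidence = {conf:.2f}")
```

Because confidence is a ratio of support counts over the same database, the total number of transactions cancels out and the counts alone are sufficient.
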

  • 80%PC70%IBM PCViewSonic (lower concept level)

  • 7-5 IBM COMPAQ ASUS HP IBM Acer IBM Acer Toshiba

  • 7-14 CRT LCD 17 19 15 17

  • 7-15 A4 A3+

  • (lower) (higher) HP ViewSonic=0.01 =0.95

  • ViewSonic=0.7=0.9 (multilevel association rules)

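  • Multilevel association rules are mined top-down: large itemsets are found first at level 1 (level-1), then at level 2 (level-2), and so on down the concept hierarchy, applying Apriori at each level.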

  • (1)ixxi-1x

  • (2)ix i-1 1-x [=0.2]

    ()

    ()

    = 0.25

  • (1) [=0.2]

    [=0.12]

    [=0.08]

    = 0.3

    = 0.06

  • (2)k-ik-X i-1 k-k-X {,LCD}[ = 0.2]

    {,15LCD}[= 0.12]

    {,17LCD}[= 0.02]

    {,15LCD}[= 0.03]

    {,17LCD}[= 0.03]

    = 0.15 = 0.03

  • IBM 1122 1 1 1 2 2 3 2 4 IBM

  • 7-2

  • 7-3

  • 7-4T[1]

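  • Notation: T is the transaction table and T[1] is its encoded form. L[j,k] is the set of large k-itemsets at level j, LL[j] is the set of all large itemsets at level j, and minsup[j] is the minimum support at level j.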

  • 7-4 1 2 3 4 5 T[1](7-4)7-3 1600 7-2IBM 1111

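  • Multilevel algorithm (1):
    1  for (j = 1; L[j,1] ≠ ∅ and j is within the number of levels; j++) do begin   /* process level j */
    2    if j = 1 then {
    3      L[j,1] = Large_item_gen(T[1], j)   /* scan T[1] for the level-1 large 1-itemsets */
    4      T[2] = Filtered_table(T[1], L[1,1])   /* filter T[1] using L[1,1] */
    5    }
    6    else L[j,1] = Large_item_gen(T[2], j)

  • Multilevel algorithm (2):
    7    for (k = 2; L[j,k-1] ≠ ∅; k++) do begin   /* find the large k-itemsets at level j */
    8      Ck = Candidate_gen(L[j,k-1])
    9      for each transaction t in T[2]
    10       for each candidate c in Ck contained in t, increase the support count of c by 1
    11     L[j,k] = the candidates in Ck that meet minsup[j], i.e. the large k-itemsets
    12   end
    13   return LL[j] = the union of all L[j,k] at level j
    14 end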

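  • Multilevel algorithm (3): In line 3, Large_item_gen(T[1], j) scans T[1] for the large 1-itemsets at level j, L[j,1]; at level 1, Large_item_gen(T[1], 1) produces the large 1-itemsets L[1,1]. In line 6, for a level j (j>1), Large_item_gen(T[2], j) produces the large 1-itemsets L[j,1], using L[j-1,1] to decide which items to examine for L[j,1]. For example, if 11** is a large 1-itemset at level 2, its level-3 descendants (111*) and (112*) are examined at level 3.

  • Multilevel algorithm (4): In line 4, Filtered_table(T[1], L[1,1]) uses L[1,1] to filter T[1]: for each transaction t, the items of t that do not belong to a large level-1 itemset are removed from t. The result of Filtered_table(T[1], L[1,1]) is T[2].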
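The encoded-item idea behind Large_item_gen and Filtered_table can be sketched in Python as below. This is an illustration under stated assumptions: items are 4-digit strings such as "1212" whose first j digits identify the level-j ancestor, the helper names mirror the slide's procedure names, the table contents are hypothetical (Table 7-4 is not reproduced in this transcript), and dropping transactions that become empty is an assumption of this sketch. The sketch also omits the slide's restriction of level-j items to descendants of L[j-1,1].

```python
def generalize(code, level):
    """Keep the first `level` digits of an encoded item and mask the rest with '*'."""
    return code[:level] + "*" * (len(code) - level)


def large_item_gen(table, level, min_count):
    """Large 1-itemsets at `level`: generalized codes with support count >= min_count."""
    counts = {}
    for items in table.values():
        # Count each generalized code at most once per transaction.
        for g in {generalize(i, level) for i in items}:
            counts[g] = counts.get(g, 0) + 1
    return {g for g, n in counts.items() if n >= min_count}


def filtered_table(table, large_level1):
    """Remove items whose level-1 ancestor is not large (assumption: drop emptied transactions)."""
    out = {}
    for tid, items in table.items():
        kept = [i for i in items if generalize(i, 1) in large_level1]
        if kept:
            out[tid] = kept
    return out


# Hypothetical encoded table T[1]: transaction id -> encoded items.
t1 = {
    1: ["1111", "2112"],
    2: ["1212", "2211", "3214"],
    3: ["1211", "2112"],
    4: ["3111"],
}
l11 = large_item_gen(t1, level=1, min_count=3)   # level-1 large 1-itemsets
t2 = filtered_table(t1, l11)                     # filtered table T[2]
l21 = large_item_gen(t2, level=2, min_count=2)   # level-2 large 1-itemsets
```

Masking digits with `*` reproduces the slide notation (1***, 12**, 121*), so the same Apriori routine can be reused at every level on the generalized codes.
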

  • 7-57-4114T[1]11-L[1,1]{4***} 235 8 4 * Filtered_table L[1,1]T[1]T[2]T[1] 2 3214 4 9 10

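  • Example 7-6 (1): Continuing Example 7-5, the level-1 large 1-itemsets L[1,1].

  • Example 7-6 (2): T[1] is filtered into T[2].

  • Example 7-6 (3): Candidate_gen is used to find the level-1 large 2-itemsets; the level-1 minimum support count is 4. From L[1,1], C2 = {{1***,2***}, {1***,4***}, {2***,4***}}. {2***,4***} has support count 3, so L[1,2] = {{1***,2***}, {1***,4***}}. From L[1,2], C3 = {{1***,2***,4***}}. {1***,2***,4***} has support count 3, so L[1,3] = ∅.

  • Example 7-6 (4): The level-1 large 2-itemsets L[1,2].

  • Example 7-6 (5): Level 2, with a level-2 minimum support count of 2. Scanning T[2] gives the level-2 large 1-itemsets L[2,1]. For example, {41**} appears in transactions 2, 3, and 8, so its support count is 3 and it is a level-2 large 1-itemset. Candidate_gen applied to L[2,1] gives C2 = {{11**,12**}, {11**,21**}, {11**,22**}, {11**,41**}, {12**,21**}, {12**,22**}, {12**,41**}, {21**,22**}, {21**,41**}, {22**,41**}}. With a minimum support count of 2, L[2,2] = {{11**,41**}, {12**,21**}, {12**,22**}}. From L[2,2], C3 = {{12**,21**,22**}}. {12**,21**,22**} has support count 0, so L[2,3] = ∅.

  • Example 7-6 (6): The level-2 large itemsets L[2,1] and L[2,2].

  • Example 7-6 (7): Level 3, with a level-3 minimum support count of 3. Scanning T[2] gives the level-3 large 1-itemsets L[3,1]. Candidate_gen applied to L[3,1] gives C2 = {{111*,121*}, {111*,211*}, {111*,411*}, {121*,211*}, {121*,411*}, {211*,411*}}. With a minimum support count of 3, L[3,2] = {{121*,211*}}. L[3,2] cannot produce any candidate 3-itemset, so L[3,3] = ∅.

  • Example 7-6 (8): The level-3 large itemsets L[3,1] and L[3,2].

  • Example 7-6 (9): Level 4, with a level-4 minimum support count of 2. Scanning T[2] gives the level-4 large 1-itemsets L[4,1]. Candidate_gen applied to L[4,1] gives C2; with a minimum support count of 2, L[4,2] = {{1212,2112}}. L[4,2] cannot produce any candidate 3-itemset, so L[4,3] = ∅.

  • Example 7-6 (10): The level-4 large itemsets L[4,1] and L[4,2].

  • Example 7-7 (1): Continuing Example 7-6, the minimum confidences for levels 1, 2, 3, and 4 are 0.8, 0.7, 0.7, and 0.6. Level 1:
    (1) {1***} ⇒ {2***}, confidence = 7/8 = 0.875
    (2) {2***} ⇒ {1***}, confidence = 7/7 = 1
    (3) {1***} ⇒ {4***}, confidence = 4/8 = 0.5
    (4) {4***} ⇒ {1***}, confidence = 4/4 = 1
    Rules 1, 2, and 4 meet the minimum confidence.

  • Example 7-7 (2): Level 2:
    (1) {11**} ⇒ {41**}, confidence = 2/3 = 0.67
    (2) {41**} ⇒ {11**}, confidence = 2/3 = 0.67
    (3) {12**} ⇒ {21**}, confidence = 3/5 = 0.6
    (4) {21**} ⇒ {12**}, confidence = 3/4 = 0.75
    (5) {12**} ⇒ {22**}, confidence = 2/5 = 0.4
    (6) {22**} ⇒ {12**}, confidence = 2/3 = 0.67
    Only rule 4 meets the minimum confidence.

  • Example 7-7 (3): Level 3:
    (1) {121*} ⇒ {211*}, confidence = 3/3 = 1
    (2) {211*} ⇒ {121*}, confidence = 3/4 = 0.75
    Rules 1 and 2 meet the minimum confidence. Level 4:
    (1) {1212} ⇒ {2112}, confidence = 2/2 = 1
    (2) {2112} ⇒ {1212}, confidence = 2/3 = 0.67
    Rules 1 and 2 meet the minimum confidence.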

  • 7-7 (4) 1 (7-5) 2 (7-14) 4 (7-15) =0.875 =1 =1 CRT =0.75 17CRT = 1 17CRT = 0.75 IBM 17CRT = 1 17CRT IBM = 0.67

  • (1) 40% (quantitative association rule)

  • (2) (intervals)

  • q_ (q_item) q_ i qq_ q_ (q_itemset) q_ x q_x

  • q_ q_ q_

  • (1)i q_ , , ... , , ... q_

  • (2) T s 1 2 3 4

  • 7-8{,,,,}iq_q_5030100204050010%[1][2..3][4..5]123q_ ( )

  • (1)Xq_Xq_ttq_Xq_X q_X q_q_ (large q_itemset)kq_k-q_ (k-q_itemset)

  • (2) X Y [, ] X Y q_ Z q_XY

  • Algorithm for generating large q_itemsets: LqiTid (large q_itemset generation using Tids)

  • 7-6DB

  • 7-6DB37-17DBDB7-7

  • 7-17 ABCDEFG

  • 7-7DB

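  • Generating large q_itemsets (1): TS({x}) is the set of transaction identifiers (Tids) of the transactions in DB that contain the q_item x. For example, for two q_items a and b of DB, TS({a}) = {5,12,14} and TS({b}) = {1,4,5,8}. TS({x1,x2}) is the set of Tids of the transactions that contain both q_items x1 and x2, which is the intersection of TS({x1}) and TS({x2}): TS({x1,x2}) = TS({x1}) ∩ TS({x2}). For the same two q_items, TS({a,b}) = TS({a}) ∩ TS({b}) = {5}.

  • Generating large q_itemsets (2): Let x1, x2, ..., xk be q_items, so that TS({x1,x2,...,xk}) is the Tid set of the q_itemset {x1,x2,...,xk}. SP({x1,x2,...,xk}) is the number of elements of TS({x1,x2,...,xk}): SP({x1,x2,...,xk}) = Card(TS({x1,x2,...,xk})) = Card(TS({x1}) ∩ TS({x2}) ∩ ... ∩ TS({xk})), where Card(S) is the number of elements of the set S.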

  • 7-8q_ 7-7DBq_

  • q_(3)LqiTidq_SP({x1,x2,...,xk}) {x1,x2,...,xk} k-q_ q_{x1,x2,...,xk} k-q_q_ Candidate_gen(k-1)-q_k-q_ (candidate k-q_itemset)k-q_

  • q_(4)x[1]x[2]x[k-1](k-1)-q_ x k-1 q_Lkk-q_item(x[j]) q_x[j] q_{x[1],x[2],...,x[k-1]} item(x[1])
  • LqiTid LqiTid :q_TSSP1-q_q_SP1-q_k-q_k-q_CkTSSPk-q_

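  • LqiTid algorithm:
    for each q_item x, compute TS({x}) and SP({x})   /* scan the database */
    L1 = {x | x is a q_item and SP({x}) ≥ minimum support count}   /* large 1-q_itemsets */
    for (k = 2; |Lk-1| > 1; k++) do begin   /* find the large k-q_itemsets */
      generate the candidate k-q_itemsets Ck from Lk-1
      for each q_itemset c ∈ Ck do begin   /* c is joined from the (k-1)-q_itemsets S1 and S2 */
        TS(c) = TS(S1) ∩ TS(S2); SP(c) = Card(TS(c))
        if SP(c) ≥ minimum support count then
          Lk = Lk ∪ {c}
      end
    end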
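A minimal Python sketch of the Tid-set approach follows. It assumes q_items are represented as (item, interval) tuples and that the database is a dictionary from Tid to a set of q_items; the function name `lqitid` and the sample database are hypothetical. The join here simply unions any two large (k-1)-q_itemsets whose union has k members, which is a simplification of the ordered join condition on the slides.

```python
def lqitid(db, min_count):
    """Return {q_itemset (frozenset): Tid set} for every large q_itemset."""
    # One database scan: build TS({x}) for every q_item x.
    ts = {}
    for tid, q_items in db.items():
        for x in q_items:
            ts.setdefault(frozenset([x]), set()).add(tid)
    # Large 1-q_itemsets: SP({x}) = Card(TS({x})) >= min_count.
    level = {s: tids for s, tids in ts.items() if len(tids) >= min_count}
    large = dict(level)
    k = 2
    while len(level) > 1:  # at least two (k-1)-q_itemsets are needed to join
        nxt = {}
        keys = list(level)
        for i, s1 in enumerate(keys):
            for s2 in keys[i + 1:]:
                c = s1 | s2
                if len(c) != k or c in nxt:
                    continue
                tids = level[s1] & level[s2]  # TS(c) = TS(S1) ∩ TS(S2)
                if len(tids) >= min_count:    # SP(c) = Card(TS(c))
                    nxt[c] = tids
        large.update(nxt)
        level = nxt
        k += 1
    return large


# Hypothetical database: Tid -> q_items written as (item, quantity interval).
A12, A34, B1, C34 = ("A", "[1..2]"), ("A", "[3..4]"), ("B", "[1]"), ("C", "[3..4]")
db = {1: {A12, B1}, 2: {A12, B1, C34}, 3: {A34, C34}, 4: {A12, C34}}
large_q = lqitid(db, min_count=2)
```

After the first scan, support is computed purely by intersecting Tid sets, so the database does not need to be read again for larger q_itemsets.
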

  • 7-1027-81-q_7-9 2-q_7-9q_C22-q_7-102-q_ 3-q_7-10q_C33-q_7-113-q_C4=L4=

  • 7-91-q_

  • 7-102-q_

  • 7-113-q_

  • 7-117-100.657-17

    {} {} =2/3=0.67 {} {} =3/3=1 {} {C,[1..2]} =2/3=0.67 {} {} =2/2=1 {,} {} =2/3=0.67 {,} {} =2/2=1 {,} {} =2/2=1 {,} {} =2/2=1 {,} {} =2/2=1 {,} {} =2/3=0.67

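  • Correlation analysis (1): Of 10,000 transactions, 6,000 contain a first item, 7,500 contain a second item, and 4,000 contain both. With a minimum support of 30% and a minimum confidence of 60%, the rule from the first item to the second [support = 40%, confidence = 67%] qualifies. Yet the second item appears in 75% of all transactions, which is higher than 67%, so the rule is misleading.

  • Correlation analysis (2): Let P(A ∩ B) be the probability that a transaction contains both A and B. If P(A ∩ B) = P(A) P(B), then A and B are independent; otherwise A and B are dependent and correlated. The correlation between A and B is P(A ∩ B) / (P(A) P(B)).

  • Correlation analysis (3): If correlation < 1, A and B are negatively correlated: the occurrence of A makes B less likely. If correlation > 1, A and B are positively correlated: the occurrence of A makes B more likely. If correlation = 1, A and B are independent.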
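The correlation measure can be evaluated for the numbers in the example above (10,000 transactions, 6,000 with the first item, 7,500 with the second, 4,000 with both). The following Python sketch assumes the measure is P(A and B) / (P(A) P(B)), which matches the thresholds around 1 described in the slides; the function name `correlation` is an assumption for illustration.

```python
def correlation(n_total, n_a, n_b, n_both):
    """P(A and B) / (P(A) * P(B)), computed from transaction counts."""
    p_a = n_a / n_total
    p_b = n_b / n_total
    p_both = n_both / n_total
    return p_both / (p_a * p_b)


support = 4000 / 10000       # support of the rule: 0.40
confidence = 4000 / 6000     # confidence of the rule: about 0.67
corr = correlation(10000, 6000, 7500, 4000)  # 0.40 / (0.60 * 0.75), about 0.89

print(f"support={support:.2f} confidence={confidence:.2f} correlation={corr:.2f}")
```

The value is below 1, so under this measure the two items are negatively correlated even though the rule passes both the support and the confidence thresholds.
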

  • Apriori (hash) (cache)