8/3/2019 10 Assrule Eng
Association rules Introduction
Association rules
A binary database is a database where each row is composed of binary attributes
     A  B  C  D  ...
T1   1  0  0  1  ...
T2   0  1  1  1  ...
T3   1  0  1  0  ...
T4   0  0  1  0  ...
From this database frequent patterns of occurrence can be discovered
Javier Bejar cbea (LSI - Master AI) Association rules TAVA - Term 2010/2011 1 / 39
This kind of database appears, for example, in transactional data (market basket analysis)
An association rule represents the co-occurrence of events in the database
TID  Items
1    Bread, Chips, Beer, Yogurt, Eggs
2    Flour, Beer, Eggs, Butter
3    Bread, Ham, Eggs, Butter, Milk
4    Flour, Eggs, Butter, Chocolate
5    Beer, Chips, Bacon, Eggs
{Flour} → {Eggs}
{Beer} → {Chips}
{Bacon} → {Eggs}
Association rules Definitions
Definitions (I)
We define R as the set of attributes of the database
We define X as a subset of attributes from R.
We say that X is a pattern of the DB if there is any row where all the attributes of X are 1
We define the support (frequency) of a pattern X as the function:
fr(X) = |M(X, r)| / |r|

where |M(X, r)| is the number of times that X appears in the DB and |r| is the size of the DB.
Definitions (II)
Given a minimum support (min_sup ∈ [0, 1]), we say that the pattern X
is frequent if: fr(X) ≥ min_sup
F(r, min_sup) is the set of patterns that are frequent in r:

F(r, min_sup) = {X ⊆ R / fr(X) ≥ min_sup}
Definitions (III)
Given a DB with attributes R, and X and Y attribute subsets with X ∩ Y = ∅, an association rule is the expression:

X → Y
conf(X → Y) is the confidence of an association rule, computed as:

conf(X → Y) = fr(X ∪ Y) / fr(X)
We consider a minimum value of confidence (min_conf ∈ [0, 1])
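As an illustration, the definitions of support and confidence can be sketched in Python over the market-basket example above (a minimal sketch; the function and variable names are mine, not from the slides):

```python
# Transactions from the market-basket example, one set per row
db = [
    {"Bread", "Chips", "Beer", "Yogurt", "Eggs"},
    {"Flour", "Beer", "Eggs", "Butter"},
    {"Bread", "Ham", "Eggs", "Butter", "Milk"},
    {"Flour", "Eggs", "Butter", "Chocolate"},
    {"Beer", "Chips", "Bacon", "Eggs"},
]

def fr(pattern, db):
    """Support: fraction of rows containing every item of the pattern."""
    return sum(1 for row in db if pattern <= row) / len(db)

def conf(x, y, db):
    """Confidence of X -> Y: fr(X u Y) / fr(X)."""
    return fr(x | y, db) / fr(x, db)

print(fr({"Eggs"}, db))               # Eggs occurs in every row -> 1.0
print(conf({"Flour"}, {"Eggs"}, db))  # both Flour rows contain Eggs -> 1.0
print(conf({"Beer"}, {"Chips"}, db))  # 2 of the 3 Beer rows contain Chips
```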
Apriori algorithm Finding association rules
Finding association rules
Given a minimum confidence and a minimum support, an association rule (X → Y) exists in a DB if:

(fr(X ∪ Y) ≥ min_sup) ∧ (conf(X → Y) ≥ min_conf)
Trivial approach:
It is enough to find all frequent subsets of R
We have to explore, for all X and Y with (X ⊆ R) ∧ (Y ⊂ X), the association rule (X − Y) → Y
There are 2^|R| candidates
Properties of frequent sets
Given X and Y with Y ⊆ X:
Since fr(Y) ≥ fr(X), if X is frequent then Y is also frequent
If any subset Y of X is not frequent, then X is not frequent
The feasible approach is to find the frequent sets starting from size 1 and increasing the size
Finding frequent sets
F_l(r, min_sup) is the set of frequent sets from R of size l
Given the frequent patterns of size l, the only candidates of size l + 1 that can be frequent are those whose size-l subsets are all frequent:

C(F_{l+1}(r)) = {X ⊆ R / |X| = l + 1 ∧ ∀Y ⊆ X with |Y| = l, Y ∈ F_l(r)}
The computation of association rules can be done iteratively, starting with the smallest frequent subsets, until the complete set of attributes is analyzed
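The candidate-generation rule above can be sketched directly (an illustrative sketch, not the slides' code; the F2 set is the one from the worked example later in the deck):

```python
from itertools import combinations

def candidates(freq_l, l):
    """Size l+1 candidates whose size-l subsets are all in freq_l
    (the pruning rule defining C(F_{l+1}(r)) above)."""
    freq_l = set(freq_l)
    items = sorted(set().union(*freq_l))
    cands = set()
    for c in combinations(items, l + 1):
        c = frozenset(c)
        # Apriori pruning: every size-l subset must already be frequent
        if all(frozenset(s) in freq_l for s in combinations(c, l)):
            cands.add(c)
    return cands

# F2 from the worked example later in the deck
f2 = [frozenset(p) for p in
      [("A", "B"), ("A", "C"), ("A", "E"), ("B", "C"), ("B", "E"), ("C", "F")]]
print(candidates(f2, 2))   # only ABC and ABE survive the pruning
```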
Apriori algorithm Example: Properties
[Figure: full itemset lattice over the attributes]
L=1: A   B   C   D
L=2: AB  AC  AD  BC  BD  CD
L=3: ABC ABD ACD BCD
L=4: ABCD
[Figure: the lattice after computing supports at L=1; only A, C and D are frequent]
L=1: A   C   D
L=2: AB  AC  AD  BC  BD  CD
L=3: ABC ABD ACD BCD
L=4: ABCD
[Figure: every size-2 candidate containing the infrequent item B is pruned]
L=1: A   C   D
L=2: AC  AD  CD
L=3: ABC ABD ACD BCD
L=4: ABCD
[Figure: at L=3 the only candidate whose size-2 subsets are all frequent is ACD]
L=1: A   C   D
L=2: AC  AD  CD
L=3: ACD
L=4: ABCD
[Figure: no size-4 candidate remains; the frequent sets are A, C, D, AC, AD, CD and ACD]
L=1: A   C   D
L=2: AC  AD  CD
L=3: ACD
Apriori algorithm Algorithm
Association Rules Algorithm
Algorithm: Apriori (R, min_sup, min_conf)
  C, CCan, CTemp: set of frequent subsets
  RAs: set of association rules; L: integer
  L = 1
  CCan = Frequent_sets_1(R, min_sup)
  RAs = ∅
  while CCan ≠ ∅ do
    L = L + 1
    CTemp = Candidate_sets(L, R, CCan)    (the size-L candidates C(F_L(r)))
    C = Frequent_sets(CTemp, min_sup)
    RAs = RAs ∪ Confidence_rules(C, min_conf)
    CCan = C
  end
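An executable version of the pseudocode above can be sketched as follows (a minimal sketch under my own naming; the rule-generation step Confidence_rules is omitted for brevity, and the toy DB is mine, not the slides'):

```python
from itertools import combinations

def apriori(db, min_sup):
    """Return all frequent itemsets of a list-of-sets DB."""
    n = len(db)
    sup = lambda s: sum(1 for t in db if s <= t) / n
    # F1: frequent individual items
    current = {frozenset([i]) for i in set().union(*db)
               if sup(frozenset([i])) >= min_sup}
    frequent = set(current)
    l = 1
    while current:
        l += 1
        # Candidates of size l built from frequent sets of size l-1 ...
        cands = {a | b for a in current for b in current if len(a | b) == l}
        # ... pruned: every size l-1 subset must be frequent
        cands = {c for c in cands
                 if all(frozenset(s) in current for s in combinations(c, l - 1))}
        current = {c for c in cands if sup(c) >= min_sup}
        frequent |= current
    return frequent

db = [set("ABE"), set("ACEF"), set("BCDF"), set("ABC"), set("BEF")]
freq = apriori(db, 0.4)   # min_sup = 40% (2 of 5 transactions)
# 13 frequent sets: the 5 frequent items and 8 frequent pairs
print(sorted("".join(sorted(s)) for s in freq))
```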
Effect of Support
The specific value used as min_sup has an effect on the computational cost of the search
If it is too high, only a few patterns will appear (we could miss interesting rare occurrences)
If it is too low, the computational cost will be too high (too many associations will appear)
Unfortunately, the threshold value cannot be known beforehand
Sometimes multiple minimum supports are needed (different for different groups of items)
Apriori algorithm Measures of interestingness
Measures for association rules interestingness
Sometimes the confidence of a rule does not represent its interestingness well
        Y    ¬Y
X       15    5    20
¬X      75    5    80
        90   10   100

conf(X → Y) = 0.75, but conf(¬X → Y) = 0.9375: Y is even more frequent when X does not hold, so the rule X → Y is misleading despite its high confidence
Other measures can be computed using probabilistic independence
Lift(X → Y) = P(Y|X) / P(Y)

Interest(X → Y) = P(X, Y) / (P(X) · P(Y))

PS(X → Y) = P(X, Y) − P(X) · P(Y)
There are plenty of measures in the literature, some good for certain applications but not for others
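Using the counts of the contingency table above (n(X,Y) = 15, n(X) = 20, n(Y) = 90, N = 100), the three measures can be computed as a quick sketch (function name is mine):

```python
def measures(n_xy, n_x, n_y, n):
    """Lift, Interest and Piatetsky-Shapiro leverage from raw counts."""
    p_xy, p_x, p_y = n_xy / n, n_x / n, n_y / n
    lift = (p_xy / p_x) / p_y          # P(Y|X) / P(Y)
    interest = p_xy / (p_x * p_y)      # same value, written with P(X, Y)
    ps = p_xy - p_x * p_y              # leverage: 0 under independence
    return lift, interest, ps

lift, interest, ps = measures(15, 20, 90, 100)
print(lift, interest, ps)
```

Here lift ≈ 0.83 < 1 and PS = −0.03 < 0, confirming that X and Y are negatively correlated even though conf(X → Y) = 0.75 looks high.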
Apriori algorithm Example
Example (1)
     A  B  C  D  E  F
 1   1  1  0  0  1  0
 2   1  0  1  0  1  1
 3   0  1  1  1  0  1
 4   1  1  1  0  0  0
 5   0  1  0  0  1  1
 6   1  0  1  1  0  0
 7   1  1  0  0  1  0
 8   1  1  0  0  0  0
 9   1  0  1  1  0  1
10   1  1  1  0  1  1
11   1  1  0  0  1  0
12   1  1  0  1  1  0
13   1  1  1  1  1  1
14   0  1  1  0  0  0
15   1  1  0  1  0  0
16   1  1  0  0  1  0
17   0  1  0  0  1  0
18   0  1  1  0  1  0
19   1  0  0  0  1  0
20   1  1  0  0  1  0

fr([A]) = 15/20
fr([B]) = 16/20
fr([C]) =  9/20
fr([D]) =  6/20
fr([E]) = 13/20
fr([F]) =  6/20
Example (2)
Supposing a minimum support of 25%. All the columns have a frequency above this threshold, thus:
F1 = {[A],[B],[C],[D],[E],[F]}

From this we can compute the candidates for frequent sets (all combinations):

C(F1) = {[A,B],[A,C],[A,D],[A,E],[A,F],[B,C],[B,D],[B,E],[B,F],[C,D],[C,E],[C,F],[D,E],[D,F],[E,F]}
[Figure: lattice at min_sup = 25%]
Items: A B C D E F
Size-2 candidates: AB AC AD AE AF BC BD BE BF CD CE CF DE DF EF
Example (3)
fr([A,B]) = 11/20   fr([B,C]) =  6/20   fr([C,D]) = 4/20   fr([D,E]) = 2/20
fr([A,C]) =  6/20   fr([B,D]) =  4/20   fr([C,E]) = 4/20   fr([D,F]) = 2/20
fr([A,D]) =  4/20   fr([B,E]) = 11/20   fr([C,F]) = 5/20   fr([E,F]) = 4/20
fr([A,E]) = 10/20   fr([B,F]) =  4/20
fr([A,F]) =  4/20
Of the candidates, the ones that have a support greater than 25 % are:
F2={[A,B],[A,C],[A,E],[B,C],[B,E],[C,F]}
[Figure: lattice at min_sup = 25%]
Items: A B C D E F
Frequent pairs: AB AC AE BC BE CF (pruned: AD AF BD BF CD CE DE DF EF)
Size-3 candidates: ABC ABE
Example (4)
The confidences of the association rules are:

conf(A → B) = 11/15   conf(A → E) = 10/15   conf(B → E) = 11/16
conf(B → A) = 11/16   conf(E → A) = 10/13   conf(E → B) = 11/13
conf(A → C) =  6/15   conf(B → C) =  6/16   conf(C → F) =  5/9
conf(C → A) =  6/9    conf(C → B) =  6/9    conf(F → C) =  5/13
If we choose 2/3 as the confidence threshold, the significant rules are:

{A → B, B → A, C → A, A → E, E → A, C → B, B → E, E → B}
Example (5)
From the set F2, we can compute the set of candidates for F3
C(F2)={[A,B,C],[A,B,E]}
fr([A,B,C])= 3/20 fr([A,B,E])= 8/20
Of the candidates, the ones that have support greater than 25 % are:
F3={[A,B,E]}
[Figure: lattice at min_sup = 25%]
Items: A B C D E F
Frequent pairs: AB AC AE BC BE CF (pruned: AD AF BD BF CD CE DE DF EF)
Frequent triple: ABE (ABC pruned)
Example (6)
The confidences of the corresponding association rules are:

conf(A,B → E) = 8/11
conf(A,E → B) = 8/10
conf(B,E → A) = 8/11
If we choose 2/3 as the confidence threshold, the significant rules are:

{A,B → E; A,E → B; B,E → A}
Example (7)
Now we choose a minimum support of 50%. The only columns with a frequency above the threshold are A, B and E. Therefore:
F1= {[A],[B],[E]}
From this we can compute the candidate frequent sets:
C(F1)={[A,B],[A,E],[B,E]}
Example (8)
All have a support greater than 50 %, therefore
F2={[A,B],[A,E],[B,E]}
And the candidates are:
C(F2)={[A,B,E]}
That also have a support greater than 50 %, thus:
F3={[A,B,E]}
FP-Growth Introduction
FP-Growth
Han, Pei, Yin, Mao: "Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach" (2004)
The problem with the Apriori approach is that the number of candidates to explore in order to find long patterns can be very large
This approach obtains the patterns from transaction databases without candidate generation
It is based on a specialized data structure (FP-Tree) that summarizesthe frequency of the patterns in the DB
The patterns are explored incrementally, adding prefixes, without candidate generation
FP-Growth FP-tree
FP-Tree
The goal of this structure is to avoid querying the DB to compute the frequency of patterns
It assumes that there is an order among the elements of a transaction
This order allows obtaining common prefixes from the transactions
The transactions with common prefixes are merged in the structuremaintaining the frequency of each subprefix
FP-Tree - Algorithm
1. Compute the frequency of each individual item in the DB
2. Create the tree root (empty prefix)
3. For each transaction of the DB:
   1. Keep only the most frequent items
   2. Order the items by their global frequency
   3. Insert the items one by one into the tree:
      - If the current node has a descendant equal to the current item, increase its frequency
      - Otherwise, create a new node
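The construction steps above can be sketched in Python (a minimal sketch; the Node class and function names are illustrative, and the header-table node-links used later for mining are omitted). The toy DB is the one from the example on the next slides:

```python
from collections import Counter

class Node:
    """One FP-tree node: an item, its count, and its children by item."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(db, min_count):
    # 1. Frequency of each individual item
    freq = Counter(i for t in db for i in t)
    # 2. Tree root: empty prefix
    root = Node(None, None)
    # 3. Insert each transaction: frequent items only, most frequent first
    for t in db:
        items = sorted((i for i in t if freq[i] >= min_count),
                       key=lambda i: (-freq[i], i))
        node = root
        for i in items:
            if i not in node.children:     # new branch
                node.children[i] = Node(i, node)
            node.children[i].count += 1    # shared prefix: bump the count
            node = node.children[i]
    return root

db = [set("aef"), set("abcfg"), set("abde"), set("abcde"),
      set("bcdf"), set("acdf")]
root = build_fp_tree(db, 4)        # e and g fall below the threshold
print(root.children["a"].count)    # a starts 5 of the 6 transactions -> 5
```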
FP-Tree - Example
DB Transactions

1: a, e, f
2: a, c, b, f, g
3: b, a, e, d
4: d, e, a, b, c
5: c, d, f, b
6: c, f, a, d
Item  frequency
a     5
b     4
c     4
d     4
e     3
f     4
g     1
DB Transactions (fr ≥ 4)

1: a, f
2: a, c, b, f
3: a, b, d
4: a, b, c, d
5: b, c, d
6: a, c, d, f
FP-Tree - Example
[Figure: FP-tree after inserting Trans(a,f)]
Header table: a:5 b:4 c:4 d:4 f:4
root → a:1 → f:1
[Figure: FP-tree after inserting Trans(a,b,c,f); shared prefixes are shown repeated]
a:2 → f:1
a:2 → b:1 → c:1 → f:1
[Figure: FP-tree after inserting Trans(a,b,d)]
a:3 → f:1
a:3 → b:2 → c:1 → f:1
a:3 → b:2 → d:1
[Figure: FP-tree after inserting Trans(a,b,c,d)]
a:4 → f:1
a:4 → b:3 → c:2 → f:1
a:4 → b:3 → c:2 → d:1
a:4 → b:3 → d:1
[Figure: FP-tree after inserting Trans(b,c,d,f); a new branch starts at the root]
a:4 → f:1
a:4 → b:3 → c:2 → f:1
a:4 → b:3 → c:2 → d:1
a:4 → b:3 → d:1
b:1 → c:1 → d:1 → f:1
[Figure: final FP-tree after inserting Trans(a,c,d,f)]
a:5 → f:1
a:5 → b:3 → c:2 → f:1
a:5 → b:3 → c:2 → d:1
a:5 → b:3 → d:1
a:5 → c:1 → d:1 → f:1
b:1 → c:1 → d:1 → f:1
FP-Growth Pattern extraction
Pattern extraction
From this data structure we can obtain all the frequent patterns using the properties of the FP-tree:
For any frequent item a_i, we can obtain all the frequent patterns that contain it by following the links of a_i in the tree
To compute the frequent patterns with a_i as suffix, only the prefix subpaths of the a_i nodes have to be used, and their frequency corresponds to the frequency of the a_i node
The frequent patterns can be obtained by a recursive traversal of the paths in the FP-tree and the combination of the subpatterns
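As a sketch of the first step, the prefix subpaths of an item (its conditional pattern base) can be collected from the final tree of the example; the nested-dict tree encoding and function name are mine, not from the slides:

```python
# Final tree from the example, as item -> (count, children) nested dicts
tree = {
    "a": (5, {"f": (1, {}),
              "b": (3, {"c": (2, {"f": (1, {}), "d": (1, {})}),
                        "d": (1, {})}),
              "c": (1, {"d": (1, {"f": (1, {})})})}),
    "b": (1, {"c": (1, {"d": (1, {"f": (1, {})})})}),
}

def prefix_paths(tree, item, prefix=()):
    """Collect (path, count) for every node holding `item`:
    the conditional pattern base of `item`."""
    paths = []
    for i, (count, children) in tree.items():
        if i == item:
            if prefix:                 # a node's own prefix subpath
                paths.append((prefix, count))
        else:
            paths.extend(prefix_paths(children, item, prefix + (i,)))
    return paths

# 4 paths whose counts sum to fr(d) = 4
print(prefix_paths(tree, "d"))
```

From these paths a conditional FP-tree for "d" would be built and mined recursively, which is the core of the FP-Growth traversal described above.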