CanTree: a tree structure for efficient incremental mining of frequent patterns
![Page 1: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/1.jpg)
CanTree: a tree structure for efficient incremental mining of frequent patterns
Carson Kai-Sang Leung, Quamrul I. Khan, Tariqul Hoque
ICDM’05
Presenter: 林靜怡, 2006/11/15
![Page 2: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/2.jpg)
Introduction
Many existing incremental mining algorithms are Apriori-based.
They are not easily adaptable to FP-tree-based frequent-pattern mining.
![Page 3: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/3.jpg)
Related Work
The FELINE Algorithm with the CATS Tree
The AFPIM Algorithm
![Page 4: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/4.jpg)
The FELINE Algorithm with the CATS Tree
CATS tree (Compressed and Arranged Transaction Sequences tree)
Allows frequent-pattern mining without the generation of candidate itemsets
Requires one database scan to build the tree
![Page 5: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/5.jpg)
CATS Tree
New transactions are added at the root level.
At each level, items of the new transaction are compared with children (or descendant) nodes.
If the same items exist in both:
1. The transaction is merged with the node at the highest frequency level.
2. The remainder of the transaction is then added to the merged nodes.
This is repeated recursively until all common items are found.
![Page 6: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/6.jpg)
CATS Tree
Any remaining items of the transaction are added as a new branch to the last merged node.
The frequency of a node is lower than or equal to the frequencies of its ancestors.
If the frequency of a node becomes higher than that of its ancestors, it has to swap with them.
![Page 7: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/7.jpg)
![Page 8: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/8.jpg)
Weaknesses
Tree construction could be computationally expensive.
It checks existing tree paths one by one until a mergeable one is found.
Extra cost is required for the swapping or merging of nodes.
![Page 9: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/9.jpg)
The AFPIM Algorithm
Adjusting FP-tree for Incremental Mining
All the “frequent” items are arranged in descending order of their global frequency.
When the ordering is changed, items in the tree need to be adjusted.
When a previously infrequent item becomes “frequent” in the updated database, the database has to be rescanned to build a new FP-tree.
![Page 10: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/10.jpg)
preMinsup: 35%, minsup: 55%
4 × 0.35 = 1.4
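The arithmetic on this slide can be spelled out in a short sketch. It assumes that the 4 is the number of transactions in the slide's example database, and that preMinsup acts as a second, lower threshold below minsup. The helper `min_count`, the labels returned by `classify`, and the item counts are hypothetical and serve only as illustration.

```python
def min_count(num_transactions, percent):
    """Smallest integer occurrence count that reaches `percent`% support."""
    # Integer ceiling of num_transactions * percent / 100 (no float rounding).
    return -(-num_transactions * percent // 100)

NUM_TRANSACTIONS = 4  # assumed: size of the slide's example database
PRE_MINSUP = 35       # percent, from the slide
MINSUP = 55           # percent, from the slide

pre_threshold = min_count(NUM_TRANSACTIONS, PRE_MINSUP)  # 4 x 0.35 = 1.4, rounded up
threshold = min_count(NUM_TRANSACTIONS, MINSUP)          # 4 x 0.55 = 2.2, rounded up

def classify(count):
    """Label an item by its occurrence count (labels are illustrative)."""
    if count >= threshold:
        return "frequent"
    if count >= pre_threshold:
        return "pre-frequent"
    return "infrequent"

item_counts = {"a": 3, "b": 2, "c": 1}  # hypothetical counts
labels = {item: classify(count) for item, count in item_counts.items()}
print(pre_threshold, threshold, labels)
```

Under these assumptions an item needs at least 2 occurrences to reach preMinsup and at least 3 to reach minsup, because a count cannot be fractional.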
![Page 11: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/11.jpg)
Weaknesses
The amount of computation spent on swapping, merging, and splitting tree nodes
The requirement for an additional mining parameter, preMinsup
Finding an appropriate value for this parameter is not easy
![Page 12: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/12.jpg)
Weaknesses
When the database is updated, item frequencies may change, which in turn changes the ordering of items.
Both the FELINE and AFPIM algorithms need a lot of swapping, merging, and splitting of tree nodes.
![Page 13: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/13.jpg)
Canonical-Order Tree (CanTree)
Requires one database scan
Items are arranged according to some canonical order:
in lexicographic order or alphabetical order
some specific order depending on the item properties
![Page 14: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/14.jpg)
Property
Property 1: The ordering of items is unaffected by the changes in frequency caused by incremental updates.
Property 2: The frequency of a node in the CanTree is at least as high as the sum of frequencies of its children.
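Property 1 can be illustrated by comparing a frequency-descending item order with a canonical (here alphabetical) order before and after an incremental update. This is a minimal sketch: the transactions and the helper names `frequency_order` and `canonical_order` are hypothetical, and ties in frequency are broken alphabetically only to keep the output deterministic.

```python
from collections import Counter

def frequency_order(transactions):
    """Items in descending order of frequency (ties broken alphabetically)."""
    counts = Counter(item for t in transactions for item in set(t))
    return sorted(counts, key=lambda item: (-counts[item], item))

def canonical_order(transactions):
    """Items in a fixed alphabetical order, independent of frequency."""
    return sorted({item for t in transactions for item in t})

# Hypothetical original database and incremental update.
original = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
update = [{"c", "d"}, {"c", "d"}, {"c"}]

freq_before = frequency_order(original)
freq_after = frequency_order(original + update)
canon_before = canonical_order(original)
canon_after = canonical_order(original + update)

print(freq_before, freq_after)    # the frequency-based order changes
print(canon_before, canon_after)  # the canonical order does not
```

With a frequency-based order, the update would force tree nodes to be rearranged. With a canonical order, the relative position of any two items stays fixed.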
![Page 15: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/15.jpg)
CanTree
Transactions can be easily added to the CanTree without any extensive searches for mergeable paths.
Frequent patterns are mined from the tree in a fashion similar to FP-growth (a divide-and-conquer approach).
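The insertion step described above can be sketched as a plain prefix tree whose paths follow a canonical item order. This is an illustrative sketch, not the authors' implementation: the class `Node`, the functions `insert`, `property_2_holds`, and `prefix_paths`, and the sample transactions are all hypothetical. `prefix_paths` only collects the prefix paths leading to an item, which is the starting point of an FP-growth-style projection; it does not perform the full mining.

```python
class Node:
    """One tree node: an item label, a frequency, and child nodes by item."""
    def __init__(self, item=None):
        self.item = item      # None marks the root
        self.count = 0
        self.children = {}

def insert(root, transaction):
    """Add one transaction along the path given by the canonical order."""
    node = root
    for item in sorted(set(transaction)):   # canonical order: alphabetical
        child = node.children.get(item)
        if child is None:                    # no search for mergeable paths:
            child = Node(item)               # the path is fixed by the ordering
            node.children[item] = child
        child.count += 1
        node = child

def property_2_holds(node, is_root=True):
    """Check that each node's count is >= the sum of its children's counts."""
    child_total = sum(c.count for c in node.children.values())
    if not is_root and node.count < child_total:
        return False
    return all(property_2_holds(c, False) for c in node.children.values())

def prefix_paths(node, item, path=()):
    """Yield (prefix, count) for every node labelled `item`."""
    for child in node.children.values():
        if child.item == item:
            yield path, child.count
        else:
            yield from prefix_paths(child, item, path + (child.item,))

# Hypothetical original database followed by an incremental update.
root = Node()
for t in [{"a", "d", "b"}, {"d", "c", "e"}, {"c", "a", "b"}]:
    insert(root, t)
for t in [{"a", "b"}, {"e", "a", "d"}]:      # the update needs no restructuring
    insert(root, t)

print(root.children["a"].count)
print(sorted(prefix_paths(root, "d")))
print(property_2_holds(root))
```

Because the path of a transaction depends only on the item order, an update is just more calls to `insert`; no node is swapped, merged, or split.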
![Page 16: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/16.jpg)
g: eg, deg, cdeg, bcdeg, abcdeg
e: de, cde, bcde, abcde, ce, bce, abce, de, bde, abde
f: ef, def, bdef, abdef
d: cd, bcd, abcd, bd, abd
c: bc, abc
b: ab
![Page 17: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/17.jpg)
Discussion
CanTrees can be used for incremental constrained mining.
Efficiency and Memory Issues
On the surface, it appears that CanTree may take a large amount of memory.
CanTree may not be as compact as the CATS tree, but it significantly reduces computation and time.
We assume that enough main memory space is available.
![Page 18: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/18.jpg)
Experiment
Database: generated by the program developed at IBM Almaden Research Center
Consists of 1M records with an average transaction length of 10 items and a domain of 1000 items
Time-sharing environment on a 1 GHz machine
![Page 19: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/19.jpg)
Experiment
![Page 20: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/20.jpg)
Experiment
![Page 21: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/21.jpg)
Experiment
![Page 22: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/22.jpg)
Experiment
![Page 23: CanTree: a tree structure for efficient incremental mining of frequent patterns](https://reader035.fdocument.pub/reader035/viewer/2022062806/56814f45550346895dbce7c8/html5/thumbnails/23.jpg)
Conclusion
Provides the user with a simple, but powerful, tree structure for efficient FP-tree-based incremental mining
CanTree can be easily maintained
Can be used for efficient incremental constrained mining