Weighted Cluster Ensembles: Methods and analysis
Intelligent Database Systems Lab
國立雲林科技大學 (National Yunlin University of Science and Technology)
Presenter: Chien-Hsing Chen
Authors: Carlotta Domeniconi, Muna Al-Razgan
2009.TKDD.40..
Outline
• Motivation
• Objective
• Overview of clustering ensembles
• Method
• Experiments
• Conclusion
• Comment
Motivation
High-dimensional data: a dimension (feature) can be highly relevant to one cluster but irrelevant to another. Common global dimensionality reduction techniques cannot capture such local structure of the data: they may weight the dimensions differently instead of using an equal weight for all of $w_1, w_2, \dots, w_D$, but they use the same weight $w_i$ for dimension $i$ in every cluster ($i = 1, \dots, D$).
Clustering ensemble: an ensemble can combine different algorithms (e.g., K-means, SOM, …) or one algorithm run with different settings (e.g., 3-means, 5-means, 7-means).
How can a single technique combine these two aspects?
Example: with the attributes (baseball, homerun, shopping), cluster c1 = {sport} has weight vector $\mathbf{w}_1 = (0.9, 0.8, 0.1)^t$, while cluster c2 = {auction} has $\mathbf{w}_2 = (0.1, 0.2, 0.9)^t$, so $w_{1,i} \neq w_{2,i}$.
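To make $w_{1,i} \neq w_{2,i}$ concrete, here is a minimal Python sketch of a cluster-specific weighted Euclidean distance, $\sqrt{\sum_i w_{j,i}(x_i - c_{j,i})^2}$, using the example weight vectors for c1 and c2. The function name `weighted_distance`, the sample point, and the sample center are hypothetical, chosen only for illustration.

```python
import math

# Example per-cluster weights over the attributes (baseball, homerun, shopping):
# c1 = {sport} emphasizes the sports terms, c2 = {auction} emphasizes "shopping".
WEIGHTS = {
    "c1": (0.9, 0.8, 0.1),
    "c2": (0.1, 0.2, 0.9),
}


def weighted_distance(x, center, w):
    """Weighted Euclidean distance sqrt(sum_i w_i * (x_i - center_i)**2)."""
    return math.sqrt(sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, center, w)))


# Hypothetical point that differs from a hypothetical center only in "shopping".
x, center = (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)
for cluster, w in WEIGHTS.items():
    print(cluster, round(weighted_distance(x, center, w), 3))
```

The same deviation along "shopping" costs about 0.316 under c1's weights but about 0.949 under c2's, which a single global weight per dimension cannot express.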
Objective
• High-dimensional data: the LAC-h approach provides a first attempt to capture the local structure of the data, allowing per-cluster weights ($w_{1,i} \neq w_{2,i}$).
• Clustering ensemble: LAC-1, LAC-3, LAC-29, … (LAC with different values of h).
• Combining the two aspects: the WSPA, WBPA, and WSBPA approaches.
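Clustering ensemble: overall work
1. A new clustering approach is discussed: it handles high-dimensional data.
2. Three ensemble techniques are introduced as consensus functions.
3. A graph cut turns the combined clusterings into the final partition.
[Figure: ensemble clusterings combined into a final partition, with pairwise similarities compared as s(·,·) > s(·,·)]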
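Clustering ensemble: LAC (locally adaptive clustering)
LAC measures the distance along each attribute i within each cluster j and uses it to weight attribute i for cluster j: attributes along which the cluster is tight receive larger weights.
[Figure: clusters c1 ($|n_{c1}| = 4$) and c2 ($|n_{c2}| = 3$) with weight vectors $\mathbf{w}_1 = (0.9, 0.5, 0.1)^t$ and $\mathbf{w}_2 = (0.1, 0.5, 0.9)^t$]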
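The slides name LAC's key quantity (the distance along attribute i within cluster j) but not the weighting formula. The sketch below assumes an exponential scheme: with $X_{ji}$ the average squared distance from the centroid along attribute i in cluster j, set $w_{ji} \propto \exp(-X_{ji}/h)$. Normalizing the weights to sum to 1 is an assumption (other normalizations are possible), and `lac_weights` is a hypothetical helper, not the authors' code.

```python
import math


def lac_weights(points, h):
    """Sketch of LAC-style attribute weights for one cluster.

    points: members of cluster j, as equal-length tuples
    h: positive parameter; smaller h concentrates weight on the tightest attributes
    Returns one weight per attribute, normalized to sum to 1 (assumed normalization).
    """
    n, dims = len(points), len(points[0])
    centroid = [sum(p[i] for p in points) / n for i in range(dims)]
    # X_ji: average squared distance from the centroid along attribute i
    spread = [sum((p[i] - centroid[i]) ** 2 for p in points) / n for i in range(dims)]
    # Exponential weighting: a small spread along attribute i gives a large weight
    scores = [math.exp(-x / h) for x in spread]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical cluster: spread out along attribute 0, tight along attribute 1.
cluster = [(0.0, 1.0), (1.0, 1.1), (2.0, 0.9), (3.0, 1.0)]
for h in (1.0, 0.1):
    print(h, [round(w, 3) for w in lac_weights(cluster, h)])
```

The tight attribute gets about 0.78 of the weight at h = 1 and nearly all of it at h = 0.1, so running the same routine with several h values is one way to obtain differently weighted ensemble members.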
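WSPA (1/2)
The similarity s(·,·) of two points is computed from their cluster-membership probability vectors, e.g. $P = (0.94, 0.04, 0.02)^t$ and $P = (0.90, 0.06, 0.04)^t$; such closely matching vectors yield a high similarity.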
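The slides show the probability vectors but not how they are obtained or compared. A minimal sketch under two assumptions: a point's membership probabilities come from its weighted distances $D_l$ to the k clusters via $P_l = (D_{\max} - D_l)/(k D_{\max} - \sum_l D_l)$, and two points are compared by the cosine of their probability vectors. Both formulas and the helper names `membership_probs` and `cosine` are assumptions for illustration.

```python
import math


def membership_probs(dists):
    """Map weighted distances D_l (one per cluster) to membership probabilities.

    Assumed form: P_l = (D_max - D_l) / (k * D_max - sum(D)). Closer clusters
    get larger values, the entries sum to 1, and the farthest cluster gets 0.
    """
    k = len(dists)
    d_max = max(dists)
    denom = k * d_max - sum(dists)
    if denom == 0:  # every cluster is equally far away: fall back to uniform
        return [1.0 / k] * k
    return [(d_max - d) / denom for d in dists]


def cosine(p, q):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


# Hypothetical weighted distances from one point to three clusters.
p = membership_probs([0.1, 2.0, 2.2])
print([round(v, 3) for v in p])
# Two probability vectors from the slides that favor different clusters.
print(round(cosine((0.94, 0.04, 0.02), (0.03, 0.91, 0.06)), 3))
```

For $(0.94, 0.04, 0.02)^t$ and $(0.03, 0.91, 0.06)^t$ the cosine is roughly 0.08, so such a pair would be joined by a weak edge.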
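WSPA (2/2): clustering ensemble
Two points have a high similarity score if they often appear in the same cluster across the ensemble's partitions. These scores weight the edges of an instance-based graph, which a graph cut then partitions.
[Figure: instance-based graph with edge weights such as 0.95, 0.91, 0.25, 0.20, 0.15, 0.13, 0.01]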
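Building the instance-based graph can be sketched as averaging a per-member similarity over the whole ensemble, so pairs that repeatedly land in the same cluster get heavy edges. Averaging the cosine of membership vectors is an assumed choice, and `ensemble_similarity` is hypothetical; the resulting matrix would still need a graph partitioner (for example METIS) to perform the cut, which is not shown.

```python
import math


def cosine(p, q):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


def ensemble_similarity(members):
    """Average pairwise similarity over all ensemble members.

    members: one entry per clustering; each entry lists a probability vector
    per point, with points in the same order in every member.
    Returns an n x n matrix usable as edge weights of the instance graph.
    """
    m, n = len(members), len(members[0])
    sim = [[0.0] * n for _ in range(n)]
    for probs in members:
        for a in range(n):
            for b in range(n):
                sim[a][b] += cosine(probs[a], probs[b]) / m
    return sim


# Two hypothetical ensemble members, each with 2-cluster probabilities for 3 points.
members = [
    [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9)],
    [(0.95, 0.05), (0.9, 0.1), (0.2, 0.8)],
]
S = ensemble_similarity(members)
print(round(S[0][1], 3), round(S[0][2], 3))
```

Points 0 and 1 end up with an edge weight close to 1, while point 2 is only weakly linked to point 0.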
WBPA (1/3): problem definition
When two points $x_a$ and $x_b$ are never clustered together, their instance-based similarity is ≡ 0, even if the groups to which $x_a$ and $x_b$ belong share the same instances. Example: $P_a = (0.94, 0.04, 0.02)^t$ and $P_b = (0.03, 0.91, 0.06)^t$.
[Figure: graph]
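The problem can be reproduced with a plain co-association count, a simpler unweighted stand-in for an instance-based similarity (an illustration, not the paper's measure). In the hypothetical clusterings below, point 1 sits with point 0 in one clustering and with point 3 in the other, so the groups of points 0 and 3 share an instance, yet the pairwise score for (0, 3) is still 0. `co_association` is a hypothetical helper.

```python
def co_association(labelings, a, b):
    """Fraction of ensemble members that put points a and b in the same cluster."""
    together = sum(1 for labels in labelings if labels[a] == labels[b])
    return together / len(labelings)


# Two hypothetical hard clusterings of four points (cluster ids per point).
labelings = [
    [0, 0, 1, 1],  # point 1 is grouped with point 0
    [0, 1, 1, 1],  # point 1 is grouped with point 3
]
print(co_association(labelings, 0, 3))  # points 0 and 3 never share a cluster
```

This prints 0.0: the pairwise view discards the link through point 1, which is the information an instance-cluster graph keeps.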
WBPA (2/3)
The graph connects instances to clusters rather than instances to each other.
[Figure: instance-cluster graph]
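A minimal sketch of such an instance-cluster graph, assuming each instance is linked to every cluster of every ensemble member with its membership probability as the edge weight (zero entries simply mean no edge). `bipartite_adjacency` is hypothetical; partitioning the resulting graph would again be left to a graph-cut tool.

```python
def bipartite_adjacency(members):
    """Adjacency matrix of an instance-cluster bipartite graph.

    members: one entry per clustering; each entry lists a probability vector
    per point. The clusters of all members are stacked, so k_total is the
    total number of clusters in the ensemble. Nodes 0..n-1 are instances and
    nodes n..n+k_total-1 are clusters; only instance-cluster edges are nonzero.
    """
    n = len(members[0])
    # A[a][c]: assumed weight of the edge between instance a and cluster c
    A = [[p for probs in members for p in probs[a]] for a in range(n)]
    k_total = len(A[0])
    size = n + k_total
    W = [[0.0] * size for _ in range(size)]
    for a in range(n):
        for c in range(k_total):
            W[a][n + c] = W[n + c][a] = A[a][c]
    return W


# Two hypothetical ensemble members, each with 2-cluster probabilities for 3 points.
members = [
    [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9)],
    [(0.95, 0.05), (0.9, 0.1), (0.2, 0.8)],
]
W = bipartite_adjacency(members)
print(len(W))  # 3 instances + 4 clusters = 7 nodes
```

Because two instances are connected through shared cluster nodes, instances that are never clustered together can still be linked indirectly.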
WBPA (3/3)
[Figure: instance-cluster graph with edge weights such as 0.94 and 0.64]
WSBPA (1/2)
[Figure: graph with edge weights such as 0.94, 0.93, 0.91, 0.89, 0.86, 0.85, 0.64]
WSBPA (2/2)
[Figure: graph with edge weights such as 0.94, 0.89, 0.86, 0.85, 0.64, 0.04, 0.03, 0.01]
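The slides give no text for WSBPA. One plausible reading, consistent with the mix of edge weights in its figures, is a single graph that keeps WBPA's instance-cluster edges and adds WSPA-style instance-instance similarity edges. The sketch below builds that combined adjacency matrix under this assumption; `combined_adjacency` is hypothetical and the block layout is not confirmed by the slides.

```python
import math


def cosine(p, q):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


def combined_adjacency(members):
    """Instance-cluster graph augmented with instance-instance edges.

    Block layout (assumed): [[S, A], [A^T, 0]], where S holds averaged cosine
    similarities between instances and A holds membership probabilities.
    """
    m, n = len(members), len(members[0])
    A = [[p for probs in members for p in probs[a]] for a in range(n)]
    k_total = len(A[0])
    W = [[0.0] * (n + k_total) for _ in range(n + k_total)]
    # Instance-instance block S: averaged similarity, no self-loops
    for a in range(n):
        for b in range(n):
            if a != b:
                W[a][b] = sum(cosine(probs[a], probs[b]) for probs in members) / m
    # Instance-cluster blocks A and A^T; the cluster-cluster block stays 0
    for a in range(n):
        for c in range(k_total):
            W[a][n + c] = W[n + c][a] = A[a][c]
    return W


# Two hypothetical ensemble members, each with 2-cluster probabilities for 3 points.
members = [
    [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9)],
    [(0.95, 0.05), (0.9, 0.1), (0.2, 0.8)],
]
W = combined_adjacency(members)
print(round(W[0][1], 3), W[0][3])
```

Leaving the diagonal at zero avoids self-loops, which most graph partitioners ignore or reject anyway.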
Experiment
Experiment: $w_{1,i} \neq w_{2,i}$
Conclusion
• High-dimensional data: the LAC-h approach.
• Clustering ensemble: LAC-1, LAC-3, LAC-29, …
• Combining the two aspects: the WSPA, WBPA, and WSBPA approaches.
Comment
• Advantage: consensus function
• Drawback:
• Application: ensemble clustering on SOM