
Similarity/Clustering

Artificial Intelligence Laboratory, Moon Hong-gu (문홍구)

2006. 1. 17

2

Content

What is Clustering

Clustering Methods

Distance-based
- Hierarchical
- Flat

Geometric embedding approaches
- Self-organizing maps
- Multidimensional scaling
- Latent semantic indexing

3

Formulations and Approaches

Partitioning Approaches

One possible goal that we can set up for a clustering algorithm is to partition the document collection into k subsets or clusters D1, ···, Dk so as to minimize the intracluster distance or, equivalently, maximize the intracluster resemblance.

Bottom-up clustering

Top-down clustering

4

Formulations and Approaches

5

Distance based

Hierarchical clustering

- The tree of a hierarchical clustering can be produced either bottom-up or top-down.

Bottom-up (agglomerative clustering)
– start with the individual objects, grouping the most similar ones
– repeatedly join the pair of clusters with maximum similarity

Top-down (divisive clustering)
– start with all objects and divide them into groups so as to maximize within-group similarity
– repeatedly split the least coherent cluster

6

Three methods in hierarchical clustering

Single-link: similarity of the two most similar members

Complete-link: similarity of the two least similar members

Group-average: average similarity between members
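The three linkage criteria can be sketched in a few lines of Python. This is a minimal illustration only — the function name `agglomerative` and the toy points are my own, and a real implementation would cache inter-cluster distances instead of recomputing them at every merge:

```python
import math

def agglomerative(points, k, linkage="single"):
    """Bottom-up clustering: start with singleton clusters, then repeatedly
    merge the pair of clusters with the smallest linkage distance until
    only k clusters remain. Returns clusters as lists of point indices."""
    def cluster_dist(c1, c2):
        ds = [math.dist(points[i], points[j]) for i in c1 for j in c2]
        if linkage == "single":    # distance of the two closest members
            return min(ds)
        if linkage == "complete":  # distance of the two farthest members
            return max(ds)
        return sum(ds) / len(ds)   # group average

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(
            ((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
            key=lambda ab: cluster_dist(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

On four toy points forming two tight pairs, all three linkages recover the two pairs; the criteria only start to disagree on data where chains of close points connect otherwise distant groups.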

7

Single link Clustering

Similarity of the two most similar members => O(n²)

Locally coherent: close objects end up in the same cluster

Chaining effect: because it follows a chain of large similarities without taking the global context into account, single-link can string distant objects together => low global cluster quality

8

Complete link Clustering

Similarity of the two least similar members => O(n³)

The criterion focuses on global cluster quality: it avoids elongated clusters — a/f or b/e is tighter than a/d (tight clusters are better than 'straggly' ones)

9

Group average agglomerative clustering

Averages the similarity between members; the complexity of computing the average similarity is O(n²)

Average similarities are recomputed each time a new group is formed

A compromise between single-link and complete-link

10

Comparison

Single-link: relatively efficient; long straggly clusters rather than ellipsoidal ones; loosely bound clusters

Complete-link: tightly bound clusters

Group-average: intermediate between single-link and complete-link

11

Distance based

Flat clustering

- k-means

- Unlike hierarchical clustering, the k-means clustering method is a mutually exclusive (hard) clustering method in which each object belongs to exactly one cluster.

- The number of clusters is fixed in advance, and the method then determines which cluster each object belongs to; this makes it a useful method for clustering large amounts of data.
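The procedure just described — fix k up front, then alternate between assigning each object to its nearest centroid and recomputing centroids — can be sketched as follows (a minimal illustration; the function name, seed, and toy data are my own choices):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Flat, mutually exclusive clustering: each point belongs to exactly
    one cluster. Alternates nearest-centroid assignment with centroid
    recomputation for a fixed number of iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initialize from the data
    for _ in range(iters):
        # assignment step: each point goes to its nearest centroid
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[j].append(p)
        # update step: each centroid becomes the mean of its group
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids, groups   # groups reflect the last assignment step
```

On two well-separated blobs of three points each, any initialization converges within a few iterations to the two obvious groups and their exact means.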

12

Distance based

k-means

13

Geometric Embedding Approaches

Self - organizing maps

Multidimensional scaling

Latent semantic indexing

★ A different form of partition-based clustering is to identify dense regions in space.

14

Geometric Embedding Approaches

Self - organizing maps(SOMs)

- Self-organizing maps are a close cousin of k-means, except that, unlike k-means, which is concerned only with determining the association between clusters and documents, the SOM algorithm also embeds the clusters in a low-dimensional space right from the beginning and proceeds in a way that places related clusters close together in that space.
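A minimal 1-D SOM sketch: units live on a line, and each input pulls not only its best-matching unit but also that unit's grid neighbours toward it, which is what places related clusters adjacently in the low-dimensional grid. The learning-rate and neighbourhood schedules below are illustrative choices, not from the slides:

```python
import math
import random

def train_som(data, n_units, epochs=50, seed=0):
    """Train a 1-D self-organizing map: a line of n_units weight vectors.
    Each input updates the best-matching unit (BMU) and, more weakly, its
    grid neighbours, with learning rate and radius decaying over time."""
    rng = random.Random(seed)
    units = [list(rng.choice(data)) for _ in range(n_units)]
    for t in range(epochs):
        lr = 0.5 * (1 - t / epochs)                         # decaying learning rate
        radius = max(1.0, (n_units / 2) * (1 - t / epochs))  # shrinking neighbourhood
        for x in data:
            bmu = min(range(n_units), key=lambda u: math.dist(units[u], x))
            for u in range(n_units):
                # Gaussian kernel over GRID distance, not data distance
                h = math.exp(-((u - bmu) ** 2) / (2 * radius ** 2))
                units[u] = [w + lr * h * (xi - w) for w, xi in zip(units[u], x)]
    return units
```

Because every update moves a weight part-way toward a data point, the learned units always stay inside the data's bounding box.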

15

SOM : Example

SOM computed from over a million documents taken from 80 Usenet newsgroups. Light

areas have a high density of documents.

16

Geometric Embedding Approaches

Multidimensional scaling (MDS)

- The goal of MDS is to represent documents as points in a low-dimensional space (often 2D or 3D) such that the Euclidean distance between any pair of points is as close as possible to the distance between them specified by the input.
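Classical (metric) MDS has a closed-form solution: double-center the squared input distances and take the top eigenvectors of the result. A NumPy sketch (the function name is illustrative):

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: given an n x n matrix D of pairwise distances,
    recover dim-dimensional coordinates whose Euclidean distances
    approximate D as closely as possible."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]         # keep the dim largest
    L = np.sqrt(np.maximum(vals[idx], 0.0))    # clamp tiny negatives from noise
    return vecs[:, idx] * L                    # n x dim coordinates
```

When the input distances really are Euclidean distances of points in dim dimensions, this recovers them exactly (up to rotation and reflection); for non-Euclidean inputs it gives the best low-rank approximation.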

17

Geometric Embedding Approaches

Latent semantic indexing (LSI)

- The latent semantic indexing (LSI) method is an attempt to solve the synonymy problem while staying within the vector-space model framework.

18

Latent semantic indexing (LSI)

- Truncated SVD of the term-document matrix A (terms × documents):

  A ≈ U_k · D_k · V_k^T   (k-dim vectors)

  The rows of U_k embed the terms and the rows of V_k embed the documents in the same k-dimensional latent space, so that synonymous terms such as "car" and "auto", which co-occur with similar documents, end up close together.
19

EM algorithm

A soft version of k-means clustering

① both clusters move towards the centroid of all three objects
② they reach the stable final state

20

EM algorithm(2)

We want to calculate the probability P(c_j | x_i)

Assume that cluster j has a normal distribution:

  n(x; μ_j, Σ_j) = 1 / ((2π)^(m/2) |Σ_j|^(1/2)) · exp( −(1/2) (x − μ_j)^T Σ_j^(−1) (x − μ_j) )

Maximum likelihood of the mixture form:

  P(x_i) = Σ_{j=1..k} π_j · n(x_i; μ_j, Σ_j)

21

Procedure of EM

Expectation Step (E): compute h_ij, the expectation of z_ij (the probability that object i belongs to cluster j):

  h_ij = E(z_ij | x_i; Θ) = π_j · P(x_i | n_j; Θ) / Σ_{l=1..k} π_l · P(x_i | n_l; Θ)

Maximization Step (M): re-estimate the parameters from the soft assignments:

  μ_j = Σ_{i=1..n} h_ij · x_i / Σ_{i=1..n} h_ij

  Σ_j = Σ_{i=1..n} h_ij (x_i − μ_j)(x_i − μ_j)^T / Σ_{i=1..n} h_ij

  π_j = (1/n) · Σ_{i=1..n} h_ij
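The E and M steps above can be sketched for the one-dimensional case, where each cluster is a scalar Gaussian (the initialization, toy data, and function name are illustrative choices):

```python
import math

def em_gmm_1d(xs, k=2, iters=50):
    """EM for a 1-D Gaussian mixture -- a 'soft k-means'. The E-step
    computes responsibilities h[i][j] = P(cluster j | x_i); the M-step
    re-fits each cluster's mean, std-dev and weight from them."""
    n = len(xs)
    mus = [min(xs), max(xs)] if k == 2 else list(xs[:k])  # crude init
    sigmas = [1.0] * k
    pis = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities h_ij = pi_j * n(x_i; mu_j, sigma_j) / sum_l ...
        h = []
        for x in xs:
            p = [pis[j]
                 * math.exp(-(x - mus[j]) ** 2 / (2 * sigmas[j] ** 2))
                 / (sigmas[j] * math.sqrt(2 * math.pi))
                 for j in range(k)]
            s = sum(p)
            h.append([pj / s for pj in p])
        # M-step: weighted mean, std-dev and mixing weight per cluster
        for j in range(k):
            nj = sum(h[i][j] for i in range(n))
            mus[j] = sum(h[i][j] * xs[i] for i in range(n)) / nj
            sigmas[j] = math.sqrt(
                sum(h[i][j] * (xs[i] - mus[j]) ** 2 for i in range(n)) / nj
            ) or 1e-6   # guard against a collapsed cluster
            pis[j] = nj / n
    return mus, sigmas, pis
```

Unlike k-means, every point contributes to every cluster's parameters, weighted by its responsibility; hard k-means is the limit where each h_ij is forced to 0 or 1.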