Improving Text Categorization Bootstrapping via Unsupervised Learning
Presenter: Bo-Sheng Wang
Authors: Alfio Gliozzo, Ido Dagan
ACM Transactions on Speech and Language Processing (TSLP), 2009
Outline
• Motivation
• Objectives
• Methodology
• Evaluation
• Experiments
• Conclusions
• Comments
Motivation
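• Supervised systems for text categorization require large amounts of hand-labeled texts.
• Intensional Learning (IL), which bootstraps a categorizer from a few seed terms per category instead of labeled texts, inherently suffers from a score scaling problem, and its seeds provide very little information about the intension of a category.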
Objectives
• Investigate and address two specific weaknesses that inherently affect the IL scheme, using:
  – Latent Semantic Indexing (LSI)
  – a Gaussian Mixture (GM) algorithm
Methodology-Latent Semantic Indexing
Vector Semantic Model
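A minimal sketch, not the paper's implementation, contrasting seed-based similarity in the plain vector space model with similarity after LSI. It assumes raw term-frequency vectors, a hypothetical toy corpus, seeds consisting only of the category name, k = 2 latent dimensions, and standard SVD folding-in (S_k^-1 U_k^T x); the paper's exact term weighting may differ, and all names in the code are hypothetical.

```python
import numpy as np

# Hypothetical toy corpus with two topics, sport and finance (not data from the paper).
VOCAB = ["sport", "hockey", "team", "game", "stock", "market", "trade"]
DOCS = ["sport team game", "hockey team game", "stock market trade",
        "market trade stock", "sport game"]
# Intensional seeds: only the category name for each (assumed) category.
SEEDS = {"Sport": ["sport"], "Finance": ["market"]}
INDEX = {t: i for i, t in enumerate(VOCAB)}


def bow(words):
    """Raw term-frequency vector over VOCAB."""
    x = np.zeros(len(VOCAB))
    for w in words:
        x[INDEX[w]] += 1.0
    return x


def cosine(u, v):
    """Cosine similarity; defined as 0 when either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return float(u @ v / (nu * nv))


# Term-by-document matrix: one row per term, one column per document.
T = np.column_stack([bow(d.split()) for d in DOCS])

# Truncated SVD T ~ U_k S_k V_k^T; k = 2 is an assumed value for this toy corpus.
k = 2
U, s, _ = np.linalg.svd(T, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]


def to_lsi(x):
    """Fold a term vector into the k-dimensional LSI space: S_k^-1 U_k^T x."""
    return (U_k.T @ x) / s_k


def similarities(doc, use_lsi):
    """Similarity between a document and each category's seed vector."""
    d = bow(doc.split())
    scores = {}
    for cat, seeds in SEEDS.items():
        c = bow(seeds)
        scores[cat] = cosine(to_lsi(c), to_lsi(d)) if use_lsi else cosine(c, d)
    return scores


vsm = similarities("hockey team game", use_lsi=False)
lsi = similarities("hockey team game", use_lsi=True)
print("VSM:", vsm)
print("LSI:", lsi)
```

The document "hockey team game" shares no term with either seed, so both vector space model similarities are 0. In the LSI space it lands next to the "sport" seed, because "sport" and "hockey" co-occur with the same terms ("team", "game").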
Methodology-Gaussian Mixture Algorithm
• This paper proposes mapping the similarity values into class posterior probabilities through unsupervised estimation of Gaussian mixtures.
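A minimal sketch of that mapping, assuming two Gaussian components per category (one for documents in the category, one for the rest) fitted by plain EM to that category's similarity scores, with the higher-mean component treated as "in category". The scores and function names are hypothetical, and the paper's exact estimation details may differ.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def _variance(xs, mean):
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def fit_two_gaussians(scores, iters=100, min_var=1e-4):
    """EM for a two-component 1-D Gaussian mixture over one category's scores."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    xs = sorted(scores)
    n, half = len(xs), len(xs) // 2
    # Initialise one component on the lower half of the scores, one on the upper half.
    low, high = xs[:half], xs[half:]
    mu = [sum(low) / len(low), sum(high) / len(high)]
    var = [max(_variance(low, mu[0]), min_var), max(_variance(high, mu[1]), min_var)]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each score.
        resp = []
        for x in xs:
            p = [w[j] * normal_pdf(x, mu[j], var[j]) for j in range(2)]
            total = p[0] + p[1]
            resp.append([0.5, 0.5] if total == 0.0 else [p[0] / total, p[1] / total])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(2):
            nj = max(sum(r[j] for r in resp), 1e-12)
            w[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    return w, mu, var


def posterior_in_category(score, w, mu, var):
    """P(in category | score): posterior of the higher-mean component."""
    pos = 0 if mu[0] > mu[1] else 1
    p = [w[j] * normal_pdf(score, mu[j], var[j]) for j in range(2)]
    total = p[0] + p[1]
    if total == 0.0:  # both densities underflowed: fall back to the nearer mean
        return 1.0 if abs(score - mu[pos]) < abs(score - mu[1 - pos]) else 0.0
    return p[pos] / total


# Hypothetical similarity scores of ten documents to one category's seeds.
scores = [0.02, 0.05, 0.04, 0.07, 0.03, 0.06, 0.55, 0.62, 0.58, 0.65]
w, mu, var = fit_two_gaussians(scores)
for sc in (0.05, 0.30, 0.60):
    print(sc, round(posterior_in_category(sc, w, mu, var), 3))
```

Under these assumptions every category's raw similarities end up on the same probability scale, so scores for different categories can be compared directly, which is the score scaling problem the method targets.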
Seeds
Evaluation-Impact of LSI Similarity and GM on IL Performance
Evaluation-Extensional vs. Intensional Learning
• A major aspect of the comparison between IL and Extensional Learning (EL) is the amount of supervision each requires to reach a given level of performance.
Experiments
Conclusions
• We obtained competitive performance using only the category names as initial seeds.
• The approach drastically reduces the number of seeds while significantly improving performance.
Comments
• Advantages
  – Performance
• Disadvantage
  – Time
• Applications
  – Text mining