Extracting Why Text Segment from Web Based on Grammar-gram

Iulia Nagy, Master student, 2010-02-27


Page 1: Extracting Why Text Segment from Web Based on Grammar-gram

Extracting Why Text Segment from Web Based on Grammar-gram

Iulia Nagy, Master student, 2010-02-27

Page 2: Extracting Why Text Segment from Web Based on Grammar-gram

www.***.com

Summary

• Introduction
• Related work
  • Rule Based Methods
  • Machine Learning Approach
• “Bag of Function Words” method
  • Method outline
  • Adaptation of “Bag of Function Words” to English
• Experiments and Evaluation
• Conclusion and Remarks

Page 3: Extracting Why Text Segment from Web Based on Grammar-gram

Problem

• Tremendous growth of the Internet
• Information is hard to find

Page 4: Extracting Why Text Segment from Web Based on Grammar-gram

Solution

Create a QA system
• A system capable of giving an exact answer to an exact question
• Detects the answer in arbitrary corpora

Purpose
• Obtain viable information rapidly

Page 5: Extracting Why Text Segment from Web Based on Grammar-gram

Purpose of our research
• Create a why-QA system with an automatically built classifier

Classifier
• Uses a model presented in the Japanese literature
• Created using machine learning, based on the Bag of Grammar approach

Purpose of this paper
• Adapt the Japanese method to English
• Test the effectiveness of the method on English

Page 6: Extracting Why Text Segment from Web Based on Grammar-gram

Related work

Two main trends

Rule Based methods
• Preprocess text
• Detect patterns
• Create set of rules
• Apply rules to identify the why-answer in the text

Machine Learning methods
• Preprocess text
• Identify and extract relevant features
• Create classification scheme
• Classify
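
The rule-based steps above can be illustrated with a minimal Python sketch. The cue patterns and the function name `is_why_answer` are hypothetical and chosen only for illustration; they are not taken from any of the systems cited in this presentation, which rely on much richer syntactic rules.

```python
import re

# Hypothetical hand-written rules (assumption): simple causal cue phrases.
CAUSAL_RULES = [
    re.compile(r"\bbecause\b", re.IGNORECASE),
    re.compile(r"\bdue to\b", re.IGNORECASE),
    re.compile(r"\bas a result of\b", re.IGNORECASE),
]

def is_why_answer(segment: str) -> bool:
    """Apply the rules: a segment is a why-answer candidate if any rule fires."""
    return any(rule.search(segment) for rule in CAUSAL_RULES)

print(is_why_answer("The sky is blue because air scatters short wavelengths."))
print(is_why_answer("A violin is a string instrument."))
```

Every new pattern has to be written by hand, which is the labour cost that motivates the machine learning alternative.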

Page 7: Extracting Why Text Segment from Web Based on Grammar-gram

Rule based method in why-QA

Suzan Verberne’s Approach
• Improve performance by re-ranking
• Method: weight the score assigned to a QA-pair by QAP with a number of syntactic features

+
• Importance of syntax
• Effective

-
• Hardly adaptable to various languages
• Deep grammar knowledge
• Labour intensive

Page 8: Extracting Why Text Segment from Web Based on Grammar-gram

Machine Learning method

Higashinaka and Isozaki’s Approach
• Acquire causal expressions from the Japanese EDR dictionary
• Method: train a ranker based on clause structures extracted from EDR

+
• Partially automated
• Effective

-
• Hardly adaptable to various languages
• Not fully automated: based on EDR
• EDR rather high priced

Page 9: Extracting Why Text Segment from Web Based on Grammar-gram

Machine Learning method

Tanaka’s Approach
• Build a why-classifier with function words as features
• Method: Bag of function words

+
• Adaptable to different languages
• Domain independent
• Scalable
• Effective
• Fully automated

Page 10: Extracting Why Text Segment from Web Based on Grammar-gram

Bag of function words

Machine learning approach to automatically build a domain-independent why-classifier based on function words

Conditions to obtain domain independence
• Convergence and reasonable size of the feature space
• Generality of the features in the feature space
• Ability of the features to discriminate causality

Class fulfilling the conditions
• Function words

Page 11: Extracting Why Text Segment from Web Based on Grammar-gram

Bag of function words

Method – same baseline for Japanese and English

• Input: text segments Ts 1, Ts 2, …, Ts n
• Extract function words
  • Tag: label all words with a POS tagger
  • Classify: determine the POS for function words
• Create feature space, e.g. for, because, at, after, in, under, which, that, why, to, therefore
• Create feature vectors Fv 1, Fv 2, …, Fv n
  • Mapping using “tf-idf” on function words
• Trainer: LogitBoost with weak learners
• Output: classification scheme

Vectors' format: {(x⃗_i, y_i)} ∈ …
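
The extraction and feature-space steps can be sketched in Python as follows. This is a simplified sketch under one loud assumption: instead of a POS tagger, it uses a closed list of function words, namely the example feature space shown on this slide. The function names are hypothetical.

```python
import re

# Feature space copied from the example on the slide. Using a fixed list
# instead of POS tags is an assumption made to keep the sketch self-contained.
FEATURE_SPACE = ["for", "because", "at", "after", "in", "under",
                 "which", "that", "why", "to", "therefore"]

def extract_function_words(segment):
    """Tokenise a text segment and keep only the function words."""
    tokens = re.findall(r"[a-z']+", segment.lower())
    return [t for t in tokens if t in FEATURE_SPACE]

def count_vector(segment):
    """Map a text segment onto the feature space as raw counts."""
    words = extract_function_words(segment)
    return [words.count(f) for f in FEATURE_SPACE]

ts = "Birds migrate because food is scarce in winter."
print(extract_function_words(ts))  # ['because', 'in']
print(count_vector(ts))            # [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

A system following the slide would replace the fixed list with the words whose POS tags mark them as function words.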

Page 12: Extracting Why Text Segment from Web Based on Grammar-gram

Adaptation to English

Differences

Japanese
• Forms phrases by adding new words at the end of the phrase
• Uses particles to define syntactic roles in a phrase

English
• Forms phrases by adding new words at the beginning of the phrase
• Words do not belong to a single grammatical category

Adjustments
• Identify eligible function words in English

Page 13: Extracting Why Text Segment from Web Based on Grammar-gram

Experiment

Data
• Dataset: 432 text segments
  • 216 why answers
  • 216 definitions

Processing
• Label all words with POS and extract function words
• Calculate tf-idf for each function word
• Map features from the feature set into feature vectors
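
The tf-idf mapping can be sketched as below. The slides do not state which tf-idf variant was used, so this sketch assumes relative term frequency and idf = log(N / df); the function name `tf_idf_vectors` and the toy data are hypothetical.

```python
import math

def tf_idf_vectors(docs, feature_space):
    """docs: one list of extracted function words per text segment.

    Returns one tf-idf vector per segment, ordered like feature_space.
    Assumed variant: tf = count / segment length, idf = log(N / df).
    """
    n = len(docs)
    df = {f: sum(1 for d in docs if f in d) for f in feature_space}
    vectors = []
    for d in docs:
        vec = []
        for f in feature_space:
            tf = d.count(f) / len(d) if d else 0.0
            idf = math.log(n / df[f]) if df[f] else 0.0
            vec.append(tf * idf)
        vectors.append(vec)
    return vectors

docs = [["because", "in"], ["in", "which"], ["to"]]
features = ["because", "in", "to"]
for vector in tf_idf_vectors(docs, features):
    print([round(v, 3) for v in vector])
```

A function word that occurs in every segment gets idf = 0 under this variant and so carries no weight.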

Page 14: Extracting Why Text Segment from Web Based on Grammar-gram

Experiment

Classifier
• Used LogitBoost (Weka) with decision stumps
• Created 5 classifiers (50, 100, 150, 200, 250 iterations)
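
To show what boosting with decision stumps involves, here is a minimal pure-Python sketch of two-class LogitBoost as described by Friedman, Hastie and Tibshirani. It is not Weka's implementation and was not used in the experiment; the function names, the clamping constants and the toy data are assumptions made for illustration.

```python
import math

def fit_stump(X, z, w):
    """Fit a regression stump (one feature, one threshold) by weighted least squares."""
    mean = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    best = (float("inf"), 0, float("inf"), mean, mean)  # fallback: constant output
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            sides = {True: [], False: []}
            for i, row in enumerate(X):
                sides[row[j] <= t].append(i)
            if not sides[True] or not sides[False]:
                continue
            err, out = 0.0, {}
            for side, idx in sides.items():
                c = sum(w[i] * z[i] for i in idx) / sum(w[i] for i in idx)
                out[side] = c
                err += sum(w[i] * (z[i] - c) ** 2 for i in idx)
            if err < best[0]:
                best = (err, j, t, out[True], out[False])
    return best[1:]  # (feature index, threshold, left value, right value)

def stump_output(stump, row):
    j, t, left, right = stump
    return left if row[j] <= t else right

def train_logitboost(X, y, iterations=50):
    """y holds 0/1 labels; returns the list of fitted stumps."""
    F = [0.0] * len(X)
    stumps = []
    for _ in range(iterations):
        # Clamp F before exponentiating to avoid overflow on separable data.
        p = [1.0 / (1.0 + math.exp(-2.0 * max(-15.0, min(15.0, f)))) for f in F]
        w = [max(pi * (1.0 - pi), 1e-10) for pi in p]
        z = [max(-4.0, min(4.0, (yi - pi) / wi)) for yi, pi, wi in zip(y, p, w)]
        stump = fit_stump(X, z, w)
        stumps.append(stump)
        F = [f + 0.5 * stump_output(stump, row) for f, row in zip(F, X)]
    return stumps

def predict(stumps, row):
    score = sum(0.5 * stump_output(s, row) for s in stumps)
    return 1 if score > 0 else 0

# Toy feature vectors (hypothetical): label 1 = why text segment.
X = [[0.0, 1.0], [0.1, 0.9], [0.9, 0.2], [1.0, 0.0]]
y = [0, 0, 1, 1]
model = train_logitboost(X, y, iterations=50)
print([predict(model, row) for row in X])
```

The `iterations` parameter plays the role of the 50 to 250 boosting rounds compared in the experiment.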

Evaluation
• 10-fold cross-validation
• Models trained on 9 folds and tested on 1
• Measured precision, recall and F-measure
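
The evaluation protocol can be sketched as follows. This is a simplified illustration with hypothetical function names: the folds are interleaved without shuffling or stratification, which may differ from the splitting actually used in the experiment.

```python
def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs: train on k-1 folds, test on the remaining one."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def precision_recall_f(gold, predicted, positive=1):
    """Precision, recall and balanced F-measure for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f

splits = list(k_fold_indices(432, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 388 44
print(precision_recall_f([1, 1, 0, 0], [1, 0, 1, 0]))     # (0.5, 0.5, 0.5)
```

Calling `precision_recall_f` once with `positive=1` and once with `positive=0` gives the separate figures for why and non-why text segments.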

Page 15: Extracting Why Text Segment from Web Based on Grammar-gram

Results – why text segments

[Figure: Evaluation of classifiers for why-TS. Precision, recall and F-measure plotted against the number of iterations (50, 100, 150, 200, 250); vertical axis from 0.62 to 0.78.]

Page 16: Extracting Why Text Segment from Web Based on Grammar-gram

Results – non why text segments (NWTS)

[Figure: Evaluation of classifiers for NWTS. Precision, recall and F-measure plotted against the number of iterations (50, 100, 150, 200, 250); vertical axis from 0.66 to 0.8.]

Page 17: Extracting Why Text Segment from Web Based on Grammar-gram

Conclusion

Results
• 321 instances out of 432 correctly classified
• 76.1% precision and 70.6% recall on WTS
• 72.6% precision and 77.9% recall on NWTS

[Figure: Global results. Precision, recall and F-measure by type of TS (WTS, NWTS); vertical axis from 0.65 to 0.8.]

The method is effective on English
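
As a quick arithmetic check, the overall accuracy and the F-measures implied by the reported figures can be derived as below. The F values are computed here, assuming the balanced F-measure (harmonic mean of precision and recall); they are not read from the slides.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

accuracy = 321 / 432                 # correctly classified / total instances
f_wts = f_measure(0.761, 0.706)      # why text segments
f_nwts = f_measure(0.726, 0.779)     # non-why text segments

print(f"accuracy={accuracy:.3f} F(WTS)={f_wts:.3f} F(NWTS)={f_nwts:.3f}")
# accuracy=0.743 F(WTS)=0.732 F(NWTS)=0.752
```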

Page 18: Extracting Why Text Segment from Web Based on Grammar-gram

Future works

• Experiment with an increased dataset (> 5000)
  • Use the Yahoo! Answers database to extract the dataset
  • Interest
    • to identify the optimal number of iterations
    • to make a better selection of the function words to be used in English
• Include causative constructions in the analysis
  • English often expresses cause by a closed set of verbs or nouns
  • Increase accuracy of the classifier

Page 19: Extracting Why Text Segment from Web Based on Grammar-gram

Questions and remarks

Thank you for your attention!