슬라이드 1

Survey of Semantic Annotation Platform CILAB Seminar 2008/03/21 [email protected]


Transcript of 슬라이드 1

Page 1: 슬라이드 1

Survey of Semantic Annotation Platform

CILAB Seminar 2008/03/21

[email protected]

Page 2: 슬라이드 1

Contents

Paper Overview

Wrapper Induction

Pattern-based Approach

Rule-based Approach

Conclusion

Page 3: 슬라이드 1

Introducing Paper

Survey of Semantic Annotation Platforms
  Authors: Lawrence Reeve, Hyoil Han
  Affiliation: Drexel University (Philadelphia)
  Venue: ACM Symposium on Applied Computing

They examined Semantic Web annotation platforms
  ▪ Platform classification, overview, evaluation comparison

What I want to get …
  Annotation hints for our project
  Term unification: pattern, rule, …
  For my research

Page 4: 슬라이드 1

Platform Classification

Pattern-based
  Discovery: Seed expansion
  Rules: Taxonomy label matching

Machine Learning-based
  Probabilistic: HMM, N-gram analysis
  Induction: Linguistic, Structural

Page 5: 슬라이드 1

Platform Overview

Platform            | Method            | Machine Learning | Manual Rules | Bootstrap Ontology
AeroDAML            | Rule              | N                | Y            | WordNet
Armadillo           | Pattern Discovery | N                | Y            | User
KIM                 | Rule              | N                | Y            | KIMO
MnM                 | Wrapper Induction | Y                | N            | KMi
MUSE                | Rule              | N                | Y            | User
Ont-O-Mat: Amilcare | Wrapper Induction | Y                | N            | User
Ont-O-Mat: PANKOW   | Pattern Discovery | N                | N            | User
SemTag              | Rule              | N                | N            | TAP

Page 6: 슬라이드 1

Platform Overview - 2

Platform            | Method            | Machine Learning | Manual Rules | Bootstrap Ontology
AeroDAML            | Rule              | N                | Y            | WordNet
Armadillo           | Pattern Discovery | N                | Y            | User
KIM                 | Rule              | N                | Y            | KIMO
MnM                 | Wrapper Induction | Y                | N            | KMi
MUSE                | Rule              | N                | Y            | User
Ont-O-Mat: Amilcare | Wrapper Induction | Y                | N            | User
Ont-O-Mat: PANKOW   | Pattern Discovery | N                | N            | User
SemTag              | Rule              | N                | N            | TAP

Page 7: 슬라이드 1

Wrapper Induction

Page 8: 슬라이드 1

What is Wrapper? - 1

A frame for analyzing semi-structured data (mostly on the web)

Page 9: 슬라이드 1

What is Wrapper? - 2

Page 10: 슬라이드 1

Wrapper Induction

Information extraction from semi-structured data by creating wrappers automatically

“Wrapper Induction for Information Extraction” - Nicholas Kushmerick (264p)
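As a rough illustration of the idea on this slide, the sketch below induces a single-field wrapper made of one left and one right delimiter from a labeled page, loosely in the spirit of the left/right delimiter wrappers discussed by Kushmerick. The function names, the sample HTML, and the restriction to one field are assumptions made for illustration, not the method of any platform in this survey.

```python
import os
import re

def common_suffix(strings):
    """Longest string that every input ends with."""
    reversed_prefix = os.path.commonprefix([s[::-1] for s in strings])
    return reversed_prefix[::-1]

def induce_lr_wrapper(examples):
    """Learn (left, right) delimiters from (page, [labeled values]) examples."""
    lefts, rights = [], []
    for page, values in examples:
        pos = 0
        for value in values:
            start = page.index(value, pos)   # labeled value, searched left to right
            end = start + len(value)
            lefts.append(page[:start])       # everything before the value
            rights.append(page[end:])        # everything after the value
            pos = end
    # The wrapper keeps only the context shared by all labeled values.
    return common_suffix(lefts), os.path.commonprefix(rights)

def apply_wrapper(wrapper, page):
    """Extract every string found between the learned delimiters."""
    left, right = wrapper
    return re.findall(re.escape(left) + r"(.*?)" + re.escape(right), page)

# Hypothetical training page with two labeled values.
train_page = ("<HTML><BODY><B>Congo</B> <I>242</I><BR>"
              "<B>Spain</B> <I>34</I><BR></BODY></HTML>")
wrapper = induce_lr_wrapper([(train_page, ["Congo", "Spain"])])

# Hypothetical unseen page with the same layout.
new_page = ("<HTML><BODY><B>Belize</B> <I>501</I><BR>"
            "<B>Egypt</B> <I>20</I><BR></BODY></HTML>")
print(wrapper)
print(apply_wrapper(wrapper, new_page))
```

With so few examples the learned delimiters can overfit to accidental shared characters, which is one reason real systems need more labeled pages.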

Page 11: 슬라이드 1

Wrapper Induction

High precision
Useful bootstrapping method

Many other semantic annotation platforms use this method
  Amilcare: wrapper induction tool
  ▪ MnM
  ▪ OntoMat
  ▪ Armadillo

Page 12: 슬라이드 1

Pattern-based Approach

Page 13: 슬라이드 1

OntoMat PANKOW

PANKOW
  Pattern-based Annotation through Knowledge on the Web
  Plugin for OntoMat
  Institute AIFB, University of Karlsruhe

Page 14: 슬라이드 1

Patterns in PANKOW

Hearst Patterns

Page 15: 슬라이드 1

Patterns in PANKOW

Definites

Apposition and Copula

Page 16: 슬라이드 1

The Process of PANKOW

Proper Noun Extraction (Term Extraction)

Hypothesis Phrase Construction Using Patterns

Page 17: 슬라이드 1

PANKOW Example - 1

Web Page:
  The Extensible Markup Language ( XML ) is a general-purpose markup language. It is classified as an extensible language because it allows its users to define their own tags.

Web Page with Proper Noun Phrases:
  The |Extensible Markup Language|( |XML| ) is a general-purpose |markup language|. It is classified as an |extensible language| because it allows its users to define their own |tags|.

Hypothesis Phrases:
  H1: <CONCEPT>s such as <INSTANCE>
    Extensible Markup Languages such as XML
    Extensible Markup Languages such as markup language
    XMLs such as markup language
    markup languages such as XML
    …
  DEFINITE1: the <INSTANCE> <CONCEPT>
    the markup language XML
    …
    the tags markup language
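The hypothesis construction step on this slide can be sketched as follows: every pattern template is filled with every ordered pair of extracted terms. The two templates are the ones shown on the slide; the function name, the shortened term list, and the naive pluralisation (simply appending "s") are assumptions for illustration.

```python
from itertools import permutations

# Templates as shown on the slide; appending "s" to pluralise is a simplification.
PATTERNS = {
    "H1": "{concept}s such as {instance}",
    "DEFINITE1": "the {instance} {concept}",
}

def hypothesis_phrases(terms, patterns=PATTERNS):
    """Fill each template with every ordered (concept, instance) pair of terms."""
    phrases = []
    for name, template in patterns.items():
        for concept, instance in permutations(terms, 2):
            phrases.append((name, template.format(concept=concept, instance=instance)))
    return phrases

# A subset of the proper noun phrases marked on the slide.
terms = ["Extensible Markup Language", "XML", "markup language"]
for name, phrase in hypothesis_phrases(terms):
    print(name, "->", phrase)
```

Most generated phrases are nonsense on purpose; the later hit-counting step is what filters them.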

Page 18: 슬라이드 1

PANKOW Example - 2

Hypothesis Phrases:
  H1: <CONCEPT>s such as <INSTANCE>
    Extensible Markup Languages such as XML
    Extensible Markup Languages such as markup language
    XMLs such as markup language
    markup languages such as XML
    …
  DEFINITE1: the <INSTANCE> <CONCEPT>
    the markup language XML
    …
    the tags markup language

Number of hits for phrase:
  Extensible Markup Languages such as XML -- 3
  Extensible Markup Languages such as markup language -- 0
  XMLs such as markup language -- 0
  markup languages such as XML -- 834

Page 19: 슬라이드 1

PANKOW Example - 3

Number of hits for phrase:
  Extensible Markup Languages such as XML -- 3
  Extensible Markup Languages such as markup language -- 0
  XMLs such as markup language -- 0
  markup languages such as XML -- 834

Annotated Document:
  The Extensible Markup Language ( <Term id="2" instanceOf="3">XML</Term> ) is a general-purpose <Term id="3" conceptOf="2">markup language</Term>. It is classified as an extensible language because it allows its users to define their own tags.

ComputerLanguage

MarkupLanguage

Programming Language
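The selection step in this example can be sketched as picking, for each candidate instance, the concept whose hypothesis phrase received the most hits. The hit counts below are the ones on the slide; the function name and the `min_hits` threshold are assumptions, and the sketch ignores how counts from several patterns would be combined.

```python
# (concept, instance) -> hit count of the H1 phrase, as shown on the slide.
HITS = {
    ("Extensible Markup Language", "XML"): 3,
    ("Extensible Markup Language", "markup language"): 0,
    ("XML", "markup language"): 0,
    ("markup language", "XML"): 834,
}

def best_concepts(hits, min_hits=1):
    """Map each instance to the concept with the highest hit count."""
    best = {}
    for (concept, instance), count in hits.items():
        if count < min_hits:
            continue  # no web evidence for this hypothesis
        if instance not in best or count > best[instance][1]:
            best[instance] = (concept, count)
    return {instance: concept for instance, (concept, _) in best.items()}

print(best_concepts(HITS))
```

Here "XML" is assigned to "markup language", which matches the annotated document on the slide; "markup language" itself gets no concept because neither of its hypotheses has any hits.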

Page 20: 슬라이드 1

Rule-based Approach

Page 21: 슬라이드 1

Semantic Annotation in KIM

Upper Ontology

Named Entity Recognition

Mapping

Page 22: 슬라이드 1

Key Points in KIM

A semantic annotation system requires a light-weight upper-level ontology focused on named entity classes

RDF(S), with compliance and possible extensions to OWL Lite, is the best choice of knowledge representation language for the ontology and the KB
  More power will unnecessarily degrade scale and performance

The documents and the metadata (annotations) should be kept decoupled from each other and separate from the ontology and the knowledge base
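One way to read the decoupling point is as stand-off annotation: the document text is stored untouched, and each annotation only points into it by character offsets and names an ontology class. The sketch below is an assumption about what such a layout could look like, not KIM's actual storage format; the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    doc_id: str        # which document the annotation refers to
    start: int         # character offset where the mention starts
    end: int           # character offset just past the mention
    entity_class: str  # name of a class in the (separate) ontology

# Documents and annotations live in separate stores.
DOCUMENTS = {"doc1": "XML is a general-purpose markup language."}
ANNOTATIONS = [Annotation("doc1", 0, 3, "MarkupLanguage")]

def annotated_text(annotation, documents):
    """Resolve an annotation back to the text span it covers."""
    return documents[annotation.doc_id][annotation.start:annotation.end]

for a in ANNOTATIONS:
    print(a.entity_class, "->", annotated_text(a, DOCUMENTS))
```

Because the text is never modified, annotations can be replaced or re-generated without touching the documents or the ontology.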

Page 23: 슬라이드 1

Rules in KIM

Lists of mapping rules
  80,000 mapping rules already
  ▪ Date, Person, Organization, Location, Percent, Money
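To make the notion of a mapping rule concrete, here is a minimal sketch in which each rule pairs a regular expression with one of the entity classes named on the slide. The three rules are hypothetical and far simpler than anything in KIM; they only show the general shape of rule-based annotation.

```python
import re

# Hypothetical rules: (entity class, pattern). Class names come from the slide.
RULES = [
    ("Date", re.compile(r"\b\d{4}/\d{2}/\d{2}\b")),
    ("Percent", re.compile(r"\b\d+(?:\.\d+)?%")),
    ("Money", re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")),
]

def annotate(text):
    """Return (start, end, entity class, matched text) for every rule match."""
    found = []
    for entity_class, pattern in RULES:
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), entity_class, match.group()))
    return sorted(found)  # order by position in the text

sample = "On 2008/03/21 sales rose 12.5% to $1,200."
for start, end, entity_class, mention in annotate(sample):
    print(entity_class, mention)
```

Classes such as Person, Organization, and Location usually need name lists rather than regular expressions, which is where a large rule or entity list comes in.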

Page 24: 슬라이드 1

Evaluation

Page 25: 슬라이드 1

Platform Evaluation

Framework         | Precision | Recall | F-Measure
Armadillo         | 91.0      | 74.0   | 87.0
KIM               | 86.0      | 82.0   | 84.0
MnM               | 95.0      | 90.0   | n/a
MUSE              | 93.5      | 92.3   | 92.9
Ont-O-Mat: PANKOW | 65.0      | 28.2   | 24.9
SemTag            | 82.0      | n/a    | n/a
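For reading the table, it helps to recall how the F-measure relates to precision and recall. The sketch below recomputes the balanced F1 from the precision and recall columns; the function name is an assumption, and the values are copied from the table. The recomputed F1 agrees with the reported figure for KIM and MUSE but not for Armadillo and PANKOW, which may mean those F-measures come from a different weighting or a different run than the precision and recall shown; that reading is a guess, not something the slides state.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# (precision, recall, reported F-measure), copied from the table.
REPORTED = {
    "Armadillo": (91.0, 74.0, 87.0),
    "KIM": (86.0, 82.0, 84.0),
    "MUSE": (93.5, 92.3, 92.9),
    "Ont-O-Mat: PANKOW": (65.0, 28.2, 24.9),
}

for name, (p, r, reported) in REPORTED.items():
    print(f"{name}: reported F = {reported}, F1 from P and R = {f_measure(p, r):.1f}")
```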

Page 26: 슬라이드 1

Unfairness in Evaluation

Definitions and scope of semantic annotation are different
  PANKOW: concept, instance annotation
  Armadillo: restricted NE annotation (Human, Paper)
  KIM: NE annotation (Date, Person, Organization, Location, Percent, Money)

To the best of our knowledge there is no well established term for this task; Neither there is a well established meaning for the term “semantic annotation”
  From “KIM – Semantic Annotation Platform”

Page 27: 슬라이드 1

Conclusion

Terms like pattern, rule, and semantic annotation are very ambiguous
  Defining these terms in a way that suits our project is important

Wrapper induction for bootstrapping data
PANKOW term extraction method
Upper ontology is important
  Every annotation tool has an upper ontology and maps extracted entities to it
  KIMO is well-defined
Separation of relation extraction from concept gathering

Page 28: 슬라이드 1

The end

Page 29: 슬라이드 1
Page 30: 슬라이드 1
Page 31: 슬라이드 1

Conclusion

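Named Entity (narrowing down what you want to extract makes things easier)

Separate concept registration from relation building
Use an upper ontology
Use an annotation tool that fits your own purpose

Tools that use the same terms do not necessarily do the same thing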

Page 32: 슬라이드 1

The meaning of Semantic Annotation in each paper

Page 33: 슬라이드 1

Named Entity Recognition

Page 34: 슬라이드 1

Term unification

Pattern, Rule, Machine Learning

Extracting patterns from new triples is not machine learning

Page 35: 슬라이드 1

Pattern

Example of Ont-O-Mat: PANKOW
  PANKOW
  ▪ Pattern-based Annotation through Knowledge on the Web

Patterns in PANKOW
  Linguistic patterns (similar to our patterns)
  ▪ Hearst Patterns
  ▪ Definites
  ▪ Apposition and Copula

They use patterns to extract concepts and instances from text

Page 36: 슬라이드 1

Pattern Discovery in PANKOW

Page 37: 슬라이드 1

Evaluation method: Precision, Recall

Where did they obtain the evaluation set?

Page 38: 슬라이드 1

Example of a major program: KIM?

The form in which the program stores its annotations

Page 39: 슬라이드 1

Annotation Tools Not Covered

MMAX2 EML

OntoNote

Page 40: 슬라이드 1

Differences from our annotation

Page 41: 슬라이드 1

Which program would be most useful for our project?

Page 42: 슬라이드 1

Reference