Survey of Semantic Annotation Platforms
CILAB Seminar, 2008/03/21
Contents
Paper Overview
Wrapper Induction
Pattern-based Approach
Rule-based Approach
Conclusion
Introducing Paper
Survey of Semantic Annotation Platforms
▪ Authors: Lawrence Reeve, Hyoil Han
▪ Affiliation: Drexel University (Philadelphia)
▪ Venue: ACM Symposium on Applied Computing
They examined Semantic Web annotation platforms
▪ Platform classification, overview, evaluation comparison
What I want to get from it
▪ Annotation hints for our project
▪ Term unification: pattern, rule, …
▪ For my research
Platform Classification
Pattern-based
▪ Discovery: Seed expansion
▪ Rules: Taxonomy label matching
Machine Learning-based
▪ Probabilistic: HMM, N-gram analysis
▪ Induction: Linguistic, Structural
Platform Overview
Platform             Method             Machine Learning  Manual Rules  Bootstrap Ontology
AeroDAML             Rule               N                 Y             WordNet
Armadillo            Pattern Discovery  N                 Y             User
KIM                  Rule               N                 Y             KIMO
MnM                  Wrapper Induction  Y                 N             KMi
MUSE                 Rule               N                 Y             User
Ont-O-Mat: Amilcare  Wrapper Induction  Y                 N             User
Ont-O-Mat: PANKOW    Pattern Discovery  N                 N             User
SemTag               Rule               N                 N             TAP
Wrapper Induction
What is a Wrapper? - 1
A frame for analyzing semi-structured data (mostly on the Web)
What is a Wrapper? - 2
Wrapper Induction
▪ Information extraction from semi-structured data by creating wrappers automatically
▪ “Wrapper Induction for Information Extraction” - Nicholas Kushmerick (264p)
Wrapper Induction
▪ High precision
▪ Useful bootstrapping method
Many other semantic annotation platforms use this method
▪ Amilcare: Wrapper Induction Tool
  ▪ MnM
  ▪ OntoMat
  ▪ Armadillo
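To make "creating a wrapper automatically" concrete, here is a minimal sketch of delimiter-based induction, loosely in the spirit of a left/right (LR) wrapper. It is not Amilcare's algorithm. It assumes a single field per record, and the function names and sample HTML are hypothetical.

```python
import os


def common_suffix(strings):
    """Longest string that every input ends with."""
    reversed_prefix = os.path.commonprefix([s[::-1] for s in strings])
    return reversed_prefix[::-1]


def induce_lr_wrapper(examples):
    """Learn one (left, right) delimiter pair from labeled pages.

    examples: list of (page_text, target_value) pairs, where target_value
    occurs in page_text. The single-field setup is a simplification.
    """
    lefts, rights = [], []
    for page, value in examples:
        start = page.index(value)
        lefts.append(page[:start])                 # text before the value
        rights.append(page[start + len(value):])   # text after the value
    left = common_suffix(lefts)
    right = os.path.commonprefix(rights)
    if not left or not right:
        raise ValueError("no common delimiters found in the examples")
    return left, right


def apply_wrapper(wrapper, page):
    """Extract every value found between the learned delimiters."""
    left, right = wrapper
    values, pos = [], 0
    while True:
        start = page.find(left, pos)
        if start == -1:
            break
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            break
        values.append(page[start:end])
        pos = end + len(right)
    return values


# Hypothetical training pages with the target value labeled by hand.
training = [
    ("<li><b>Congo</b> <i>242</i></li>", "Congo"),
    ("<li><b>Spain</b> <i>34</i></li>", "Spain"),
]
wrapper = induce_lr_wrapper(training)
page = "<ul><li><b>Egypt</b> <i>20</i></li><li><b>Belize</b> <i>501</i></li></ul>"
print(wrapper, apply_wrapper(wrapper, page))
```

Two labeled examples are enough here because the markup is regular. On noisier pages the common delimiters shrink, which is why wrapper induction tends to trade recall for high precision.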
Pattern-based Approach
OntoMat PANKOW
PANKOW
▪ Pattern-based Annotation through Knowledge on the Web
▪ Plugin for OntoMat
▪ Institute AIFB, University of Karlsruhe
Patterns in PANKOW
Hearst Patterns
Patterns in PANKOW
Definites
Apposition and Copula
The Process of PANKOW
Proper Noun Extraction (Term Extraction) → Hypothesis Phrase Construction Using Patterns
PANKOW Example - 1
Web Page
The Extensible Markup Language (XML) is a general-purpose markup language. It is classified as an extensible language because it allows its users to define their own tags.
Web Page with Proper Noun Phrases
The |Extensible Markup Language| (|XML|) is a general-purpose |markup language|. It is classified as an |extensible language| because it allows its users to define their own |tags|.
Hypothesis Phrases
H1: <CONCEPT>s such as <INSTANCE>
▪ Extensible Markup Languages such as XML
▪ Extensible Markup Languages such as markup language
▪ XMLs such as markup language
▪ markup languages such as XML
▪ …
DEFINITE1: the <INSTANCE> <CONCEPT>
▪ the markup language XML
▪ …
▪ the tags markup language
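The hypothesis-phrase step of this example can be sketched as follows. The two templates are the ones on the slide. Pairing every marked phrase with every other one, the naive "s" plural, and the function names are assumptions for illustration, not PANKOW's actual implementation.

```python
from itertools import permutations

# Two of the pattern templates shown on the slide.
PATTERNS = {
    "H1": "{concept}s such as {instance}",
    "DEFINITE1": "the {instance} {concept}",
}


def extract_marked_phrases(text):
    """Return the phrases delimited by |...| in a pre-marked page."""
    parts = text.split("|")
    # Odd-indexed parts lie between a pair of bars.
    return [p.strip() for p in parts[1::2]]


def build_hypotheses(phrases, patterns=PATTERNS):
    """Instantiate every pattern for every ordered pair of phrases."""
    hypotheses = []
    for instance, concept in permutations(phrases, 2):
        for name, template in patterns.items():
            phrase = template.format(instance=instance, concept=concept)
            hypotheses.append((name, instance, concept, phrase))
    return hypotheses


marked = ("The |Extensible Markup Language| (|XML|) is a "
          "general-purpose |markup language|.")
phrases = extract_marked_phrases(marked)
hypotheses = build_hypotheses(phrases)
for name, _, _, phrase in hypotheses:
    print(f"{name}: {phrase}")
```

Most generated phrases are nonsense ("XMLs such as markup language"). That is intended: the phrases are only candidates, and a later step decides which ones are plausible.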
PANKOW Example - 2
Hypothesis Phrases
H1: <CONCEPT>s such as <INSTANCE>
▪ Extensible Markup Languages such as XML
▪ Extensible Markup Languages such as markup language
▪ XMLs such as markup language
▪ markup languages such as XML
▪ …
DEFINITE1: the <INSTANCE> <CONCEPT>
▪ the markup language XML
▪ …
▪ the tags markup language
Number of hits for phrase
▪ Extensible Markup Languages such as XML -- 3
▪ Extensible Markup Languages such as markup language -- 0
▪ XMLs such as markup language -- 0
▪ markup languages such as XML -- 834
PANKOW Example - 3
Number of hits for phrase
▪ Extensible Markup Languages such as XML -- 3
▪ Extensible Markup Languages such as markup language -- 0
▪ XMLs such as markup language -- 0
▪ markup languages such as XML -- 834
Annotated Document
The Extensible Markup Language (<Term id="2" instanceOf="3">XML</Term>) is a general-purpose <Term id="3" conceptOf="2">markup language</Term>. It is classified as an extensible language because it allows its users to define their own tags.
[Diagram: ontology concepts ComputerLanguage, MarkupLanguage, Programming Language]
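The last two steps, choosing a concept by hit count and writing the annotation, might look like the sketch below. The counts are the four from the slide. A real run would query a web search engine instead of a dictionary, and would normally combine evidence from several patterns rather than H1 alone. The function names and default ids are hypothetical.

```python
# Hit counts copied from the slide; a real run would query a search engine.
SLIDE_HITS = {
    "Extensible Markup Languages such as XML": 3,
    "Extensible Markup Languages such as markup language": 0,
    "XMLs such as markup language": 0,
    "markup languages such as XML": 834,
}


def slide_hit_count(phrase):
    """Stand-in for a web search: unknown phrases count as zero hits."""
    return SLIDE_HITS.get(phrase, 0)


def best_concept(instance, candidates, hit_count=slide_hit_count):
    """Pick the concept whose H1 phrase with `instance` has the most hits."""
    scored = []
    for concept in candidates:
        phrase = f"{concept}s such as {instance}"
        scored.append((hit_count(phrase), concept))
    hits, concept = max(scored)
    return (concept, hits) if hits > 0 else (None, 0)


def annotate(text, instance, concept, instance_id=2, concept_id=3):
    """Wrap the first occurrence of each term in a <Term> element,
    following the notation of the annotated document on the slide."""
    text = text.replace(
        instance,
        f'<Term id="{instance_id}" instanceOf="{concept_id}">{instance}</Term>',
        1,
    )
    text = text.replace(
        concept,
        f'<Term id="{concept_id}" conceptOf="{instance_id}">{concept}</Term>',
        1,
    )
    return text


concept, hits = best_concept("XML", ["Extensible Markup Language", "markup language"])
print(concept, hits)
print(annotate("XML is a markup language.", "XML", concept))
```

With the slide's counts, "markup language" wins for the instance "XML" because its phrase has 834 hits against 3.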
Rule-based Approach
Semantic Annotation in KIM
[Diagram: Named Entity Recognition, Mapping, Upper Ontology]
Key Points in KIM
A semantic annotation system requires a light-weight upper-level ontology focused on named entity classes
RDF(S), with compliance and possible extensions to OWL Lite, is the best choice of knowledge representation language for the ontology and the KB
▪ More expressive power would unnecessarily degrade scale and performance
The documents and the metadata (annotations) should be kept decoupled from each other and separate from the ontology and the knowledge base
Rules in KIM
Lists of mapping rules
▪ 80,000 mapping rules already
▪ Date, Person, Organization, Location, Percent, Money
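As a rough illustration of rule-based mapping with annotations kept apart from the document, the sketch below matches a few hand-written rules and returns stand-off annotations (offsets plus an entity class). The rules, the `Annotation` type, and the sample sentence are hypothetical. KIM's real rules and storage format are not shown on the slides.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Stand-off annotation: offsets into a document plus an entity class."""
    doc_id: str
    start: int
    end: int
    entity_class: str


# Hypothetical rules: (entity class, pattern). Real rule sets are far larger.
RULES = [
    ("Date", re.compile(r"\b\d{4}/\d{2}/\d{2}\b")),
    ("Percent", re.compile(r"\b\d+(?:\.\d+)?%")),
    ("Money", re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")),
    ("Organization",
     re.compile(r"\b(?:Drexel University|University of Karlsruhe)\b")),
]


def annotate(doc_id, text, rules=RULES):
    """Return annotations sorted by position; the text itself is not modified."""
    found = []
    for entity_class, pattern in rules:
        for match in pattern.finditer(text):
            found.append(
                Annotation(doc_id, match.start(), match.end(), entity_class)
            )
    return sorted(found, key=lambda a: (a.start, a.end))


text = "Drexel University raised fees by 5% on 2008/03/21."
annotations = annotate("doc-1", text)
for a in annotations:
    print(a.entity_class, repr(text[a.start:a.end]))
```

Because each annotation only points into the text, the document, the annotations, and the ontology classes they refer to can be stored and updated independently.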
Evaluation
Platform Evaluation
Framework          Precision (%)  Recall (%)  F-Measure (%)
Armadillo          91.0           74.0        87.0
KIM                86.0           82.0        84.0
MnM                95.0           90.0        n/a
MUSE               93.5           92.3        92.9
Ont-O-Mat: PANKOW  65.0           28.2        24.9
SemTag             82.0           n/a         n/a
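A quick way to read the table is to recompute the balanced F-measure, F1 = 2PR / (P + R), from each row's precision and recall. The sketch below does that for the rows that report all three values. The figures are copied from the table, and the function name is hypothetical.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure; beta=1 gives the balanced harmonic mean of P and R."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall, reported F-measure) rows from the table, in percent.
REPORTED = {
    "Armadillo": (91.0, 74.0, 87.0),
    "KIM": (86.0, 82.0, 84.0),
    "MUSE": (93.5, 92.3, 92.9),
    "Ont-O-Mat: PANKOW": (65.0, 28.2, 24.9),
}

for name, (p, r, reported) in REPORTED.items():
    print(f"{name}: reported {reported}, balanced F1 {f_measure(p, r):.1f}")
```

The KIM and MUSE rows agree with the balanced F1. The Armadillo and PANKOW rows do not (about 81.6 and 39.3 when recomputed), which suggests the reported figures were not all produced under the same definition or experimental setup.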
Unfairness in Evaluation
Definitions and scope of semantic annotation are different
▪ PANKOW: concept and instance annotation
▪ Armadillo: restricted NE annotation (Human, Paper)
▪ KIM: NE annotation (Date, Person, Organization, Location, Percent, Money)
“To the best of our knowledge there is no well established term for this task; Neither there is a well established meaning for the term ‘semantic annotation’” - From “KIM – Semantic Annotation Platform”
Conclusion
Terms like pattern, rule, and semantic annotation are very ambiguous
▪ Defining these terms in a way that suits our project is important
Wrapper Induction for bootstrapping data
PANKOW term extraction method
Upper ontology is important
▪ Every annotation tool has an upper ontology and maps extracted entities to it
▪ KIMO is well-defined
Separation of relation extraction from concept gathering
The end
Conclusion
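Named Entity (narrowing down what you want to extract makes the task easier)
Separate concept registration from relation building
Use an upper ontology
Use an annotation tool that fits your own purpose
Tools that use the same term do not necessarily do the same thing
The meaning of Semantic Annotation in each paper
Named Entity Recognition
Term unification
Pattern, Rule, Machine Learning
Extracting patterns from new triples is not machine learning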
Pattern
Example of Ont-O-Mat: PANKOW
PANKOW
▪ Pattern-based Annotation through Knowledge on the Web
Patterns in PANKOW
Linguistic patterns (similar to our patterns)
▪ Hearst Patterns
▪ Definites
▪ Apposition and Copula
They use patterns to extract concepts and instances from text
Pattern Discovery in PANKOW
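Evaluation method: Precision, Recall
Where did they get the evaluation set?
Examples of major programs: KIM?
The form of the annotations stored by the program
Annotation Tools Not Covered
MMAX2 EML
OntoNote
How they differ from our annotation
Which program would be most useful for our project?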