NEVER-ENDING LANGUAGE LEARNER
Students:
Nguyễn Hữu Thành
Phạm Xuân Khoái
Vũ Mạnh Cầm
Instructor:
Dr. Lê Hồng Phương
Hà Nội, January 11, 2014
Idea: Build a structured knowledge base (KB).
What is a KB?
- Categories: cities, companies, sports teams, ...
- Relations: hasOfficeIn(organisation, location)
- Noun phrases
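As a rough illustration of these three ingredients, the sketch below stores category and relation instances as plain Python sets keyed by noun phrases. The class name `KnowledgeBase` and its methods are hypothetical, and the sample `hasOfficeIn` fact is invented for illustration; this is not NELL's actual storage format.

```python
from collections import defaultdict


class KnowledgeBase:
    """Toy KB: categories hold noun phrases, relations hold pairs of noun phrases."""

    def __init__(self):
        self.categories = defaultdict(set)  # category name -> {noun phrase}
        self.relations = defaultdict(set)   # relation name -> {(arg1, arg2)}

    def add_category_instance(self, category, noun_phrase):
        self.categories[category].add(noun_phrase)

    def add_relation_instance(self, relation, arg1, arg2):
        self.relations[relation].add((arg1, arg2))

    def is_a(self, noun_phrase, category):
        # .get avoids creating empty entries on lookup.
        return noun_phrase in self.categories.get(category, set())

    def holds(self, relation, arg1, arg2):
        return (arg1, arg2) in self.relations.get(relation, set())


kb = KnowledgeBase()
kb.add_category_instance("city", "Toronto")
kb.add_category_instance("company", "Toyota")
# Illustrative fact only, not taken from NELL's KB.
kb.add_relation_instance("hasOfficeIn", "Toyota", "Toronto")

print(kb.is_a("Toronto", "city"))
print(kb.holds("hasOfficeIn", "Toyota", "Toronto"))
```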
What is a structured KB?
[Figure: "Idea: Structured Knowledge Base", a fragment of a knowledge graph.
Entity nodes: Globe and Mail, Stanley Cup, hockey, NHL, Toronto, CFRB, Wilson, Maple Leafs, Sundin, Milson, Maple Leaf Gardens, Canada, Miller, Toskala, Pearson, Skydome, Connaught, Sunnybrook, skates, helmet, Red Wings, Detroit, GM, Toyota, Prius, Corolla, Hino, climbing, football.
Edge labels: play, hired, won, home town, city paper, league, writer, radio, team stadium, city stadium, politician, country, airport, member, hospital, city company, uses equipment, competes with, plays in league, created, acquired, automobile, economic sector.]
Ideas: using Machine Learning
Machine learning: a branch of artificial intelligence concerned with the construction and study of systems that can learn from data.
Ideas
[Figure: diagram with the elements Initial ontology, Seed examples, Web, Human trainers, NELL, and Knowledge Base (KB).]
Ideas: the tasks run 24x7, forever. Each day:
1. Reading task: extract more facts from the web to populate the initial ontology.
2. Learning task: learn to read (perform #1) better than yesterday.
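NELL Architecture
[Figure: NELL architecture diagram. Labeled elements: Knowledge Base (Beliefs, Candidate facts), Knowledge Integrator, the components CPL, RL, CMC, and CSEAL, and Data Resources; three steps are numbered 1 to 3.]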
Subsystem Components
Coupled Pattern Learner (CPL)
- Learns to extract category and relation instances and patterns from unstructured text.
- Learns contextual patterns that act as high-precision extractors for each predicate.
- E.g.: + "Trang An is the name of a girl." + "Trang An is the name of a company."
  The surrounding context is used to improve precision.
Input/Output
- Input: + A large text corpus
         + The initial ontology (categories, relations, and seed examples).
- Output: + Proposed instances and contextual patterns for each predicate.
Input: An ontology O, and a text corpus C
Output: Trusted instances/patterns for each predicate
for i = 1, 2, ..., ∞ do
    foreach predicate p in O do
        EXTRACT candidate instances/contextual patterns using recently promoted patterns/instances;
        FILTER candidates that violate coupling;
        RANK candidate instances/patterns;
        PROMOTE top candidates;
    end
end
Example:
"Samsung has just released a clip mocking Nokia's new product."
City: Ha Noi, Ho Chi Minh, Da Nang, ...
Company: Son Ha, Kinh Do, ...
competesWith: (AMD, Intel), (Google, Microsoft), (Samsung, Nokia), ...
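The CPL loop (extract, filter, rank, promote) can be sketched on toy data as follows. This is a minimal sketch under several assumptions: the "corpus" is a hand-made list of (pattern, noun phrase) pairs, ranking is a plain co-occurrence count, and coupling is reduced to a single mutual-exclusion constraint between `city` and `company`. The function name `cpl` and all data are hypothetical and not taken from NELL.

```python
from collections import defaultdict

# Toy "corpus": (contextual pattern, noun phrase filling the "_" slot). Made-up data.
OCCURRENCES = [
    ("cities such as _", "Ha Noi"),
    ("cities such as _", "Da Nang"),
    ("_ is a city in", "Da Nang"),
    ("_ is a city in", "Hue"),
    ("companies such as _", "Kinh Do"),
    ("companies such as _", "Son Ha"),
    ("shares of _", "Son Ha"),
    ("shares of _", "Vinamilk"),
    ("news about _", "Ha Noi"),
    ("news about _", "Kinh Do"),
]
SEED_INSTANCES = {"city": {"Ha Noi"}, "company": {"Kinh Do"}}
MUTEX = {"city": {"company"}, "company": {"city"}}  # coupling constraint


def top_new(scores, exclude, top_k):
    """RANK candidates by score (ties broken alphabetically) and keep the top k."""
    ranked = sorted((c for c in scores if c not in exclude),
                    key=lambda c: (-scores[c], c))
    return ranked[:top_k]


def cpl(occurrences, seeds, mutex, iterations=3, top_k=2):
    instances = {p: set(s) for p, s in seeds.items()}
    patterns = {p: set() for p in seeds}
    for _ in range(iterations):
        for p in seeds:
            # Instances of mutually exclusive predicates act as negative evidence.
            rivals = set().union(*(instances[q] for q in mutex[p]))
            # EXTRACT candidate patterns that co-occur with trusted instances.
            pat_score = defaultdict(int)
            for pat, phrase in occurrences:
                if phrase in instances[p]:
                    pat_score[pat] += 1
            # FILTER patterns that also match a rival predicate's instances.
            bad = {pat for pat, phrase in occurrences if phrase in rivals}
            # PROMOTE the best new patterns.
            patterns[p].update(top_new(pat_score, bad | patterns[p], top_k))
            # EXTRACT candidate instances using the promoted patterns.
            inst_score = defaultdict(int)
            for pat, phrase in occurrences:
                if pat in patterns[p]:
                    inst_score[phrase] += 1
            # FILTER rival instances, then PROMOTE the best new instances.
            instances[p].update(top_new(inst_score, rivals | instances[p], top_k))
    return instances, patterns


instances, patterns = cpl(OCCURRENCES, SEED_INSTANCES, MUTEX)
print(sorted(instances["city"]))
print(sorted(instances["company"]))
```

In this toy run the ambiguous pattern "news about _" is never promoted, because it matches both a city and a company; that is the role the coupling filter plays in the pseudocode.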
Coupled SEAL
[Figure: diagram with the elements Beliefs, Internet, CSEAL, and New candidate facts.]
SEAL (Set Expander for Any Language): expands sets of entities automatically by utilizing resources from the Web.
CSEAL adds mutual-exclusion and type-checking constraints.
Coupled SEAL: a semi-structured extractor
- Queries the internet with sets of beliefs from each category or relation; mines lists and tables for instances.
- Uses mutual-exclusion relationships to provide negative examples for filtering overly general lists and tables.
- 5 queries per category, 10 queries per relation; fetches 50 web pages per query.
- Probabilities are assigned as in CPL.
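The mutual-exclusion filtering step can be illustrated with a small sketch. It assumes the lists have already been mined from fetched pages (here they are hard-coded, made-up data with no network access) and it skips SEAL's wrapper learning entirely; the function name `filter_lists` and the `min_positive` threshold are hypothetical.

```python
def filter_lists(candidate_lists, positives, negatives, min_positive=2):
    """Keep only lists with enough known positives and no known negatives."""
    new_candidates = set()
    for items in candidate_lists:
        items = set(items)
        if len(items & positives) < min_positive:
            continue  # the list is not clearly about this category
        if items & negatives:
            continue  # overly general list: it mixes in a mutually exclusive category
        new_candidates |= items - positives
    return new_candidates


city_beliefs = {"Ha Noi", "Da Nang"}
company_beliefs = {"Kinh Do", "Son Ha"}  # mutually exclusive with city

# Stand-ins for lists mined from fetched web pages (made-up data).
mined_lists = [
    ["Ha Noi", "Da Nang", "Hue", "Can Tho"],
    ["Ha Noi", "Da Nang", "Kinh Do", "Weather", "Contact"],
    ["Hue", "Paris"],
]

new_cities = filter_lists(mined_lists, city_beliefs, company_beliefs)
print(sorted(new_cities))
```

The second list is rejected because it contains a company belief, which serves as a negative example for the city category; the third is rejected because it contains no known cities.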
Coupled Morphological Classifier
[Figure: diagram with the elements KB, Data Resources, CMC, and New candidate facts.]
CMC classifies noun phrases (NPs) based on various morphological features (words, capitalization, affixes).
Ex1: Bach Mai hotel → hotel(Bach Mai)
Ex2: Mai → person(Mai)
Ex3: tradition → noun(tradition)
Beliefs from the KB are used as training instances.
CMC examines candidate facts proposed by other components and classifies up to 30 new beliefs per predicate.
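A minimal sketch of morphological feature extraction is shown below, with KB beliefs used as training data. The scoring is a simple feature-count overlap chosen to keep the example dependency-free; it is a stand-in and not the classifier NELL actually uses. All function names and training phrases are hypothetical.

```python
from collections import Counter


def morph_features(noun_phrase, affix_len=3):
    """Word, capitalization, and affix features of a noun phrase."""
    feats = set()
    for word in noun_phrase.split():
        lower = word.lower()
        feats.add("word=" + lower)
        feats.add("capitalized=" + str(word[0].isupper()))
        if len(lower) > affix_len:
            feats.add("prefix=" + lower[:affix_len])
            feats.add("suffix=" + lower[-affix_len:])
    return feats


def train(beliefs):
    """Count how often each feature occurs with each category."""
    return {
        category: Counter(f for phrase in phrases for f in morph_features(phrase))
        for category, phrases in beliefs.items()
    }


def classify(noun_phrase, counts):
    """Pick the category whose training features overlap most with the phrase."""
    feats = morph_features(noun_phrase)
    # sorted() makes tie-breaking deterministic (alphabetical).
    return max(sorted(counts), key=lambda c: sum(counts[c][f] for f in feats))


# Made-up beliefs standing in for KB contents.
beliefs = {
    "hotel": ["Metropole hotel", "Daewoo hotel"],
    "person": ["Mai", "Lan"],
}
counts = train(beliefs)
print(classify("Bach Mai hotel", counts))
print(classify("Mai", counts))
```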
Rule Learner
[Figure: diagram with the elements Candidate facts, Beliefs, RL, and New candidate facts.]
RL takes the categories and relations in the KB as input and proposes new relation instances for the KB.
Example 1: playSport(Rooney, football) → athlete(Rooney), sport(football)
Example 2: isCapital(Hanoi, Vietnam), liveIn(Thanh, Hanoi), roommate(Thanh, Khoai), roommate(Khoai, Cam) → liveIn(Thanh, Vietnam), roommate(Thanh, Cam), liveIn(Khoai, Hanoi), ...
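The inferences in the two examples can be reproduced with a small forward-chaining sketch. Note the assumption: the rules here are written by hand, whereas RL would learn them from the KB; the sketch only shows how such rules, once available, yield new candidate facts. The rule encoding (tuples, with variables prefixed by `?`) and the function names are hypothetical.

```python
def match(atom, fact, binding):
    """Unify one rule atom with one fact; return the extended binding or None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    binding = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return binding


def forward_chain(facts, rules):
    """Apply rules repeatedly until no new fact can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            bindings = [{}]
            for atom in body:
                next_bindings = []
                for b in bindings:
                    for fact in facts:
                        b2 = match(atom, fact, b)
                        if b2 is not None:
                            next_bindings.append(b2)
                bindings = next_bindings
            for b in bindings:
                new_fact = (head[0],) + tuple(b.get(t, t) for t in head[1:])
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts


FACTS = {
    ("playSport", "Rooney", "football"),
    ("isCapital", "Hanoi", "Vietnam"),
    ("liveIn", "Thanh", "Hanoi"),
    ("roommate", "Thanh", "Khoai"),
    ("roommate", "Khoai", "Cam"),
}
# Hand-written rules as (body atoms, head atom).
RULES = [
    ([("playSport", "?x", "?y")], ("athlete", "?x")),
    ([("playSport", "?x", "?y")], ("sport", "?y")),
    ([("liveIn", "?x", "?c"), ("isCapital", "?c", "?n")], ("liveIn", "?x", "?n")),
    ([("roommate", "?x", "?y"), ("roommate", "?y", "?z")], ("roommate", "?x", "?z")),
    ([("roommate", "?x", "?y"), ("liveIn", "?x", "?c")], ("liveIn", "?y", "?c")),
]

derived = forward_chain(FACTS, RULES) - FACTS
for fact in sorted(derived):
    print(fact)
```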
Some kinds of rule learner systems: OneR, Ridor, PART, JRip, ConjunctiveRule.
Clip: https://www.youtube.com/watch?v=5On-tDeu2ic
Initial result
Running 24x7 since January 12, 2010.
Inputs:
• ontology defining >600 categories and relations
• 10-20 seed examples of each
• 100,000 web search queries per day
• ~5 minutes/day of human guidance
Result:
• KB with >15 million candidate beliefs, growing daily
• learning to reason, as well as read
• automatically extending its ontology
Demo: http://rtw.ml.cmu.edu/rtw/kbbrowser/beverage:beer
References
- NELL article: http://www.cs.cmu.edu/~acarlson/papers/carlson-aaai10.pdf
- http://rtw.ml.cmu.edu/rtw/kbbrowser/beverage:beer
- http://videolectures.net/akbcwekex2012_mitchell_language_learning/
- Tom Mitchell's seminar: http://www.youtube.com/watch?v=51q2IajH94A
- RL: http://mydatamining.wordpress.com/2008/04/14/rule-learner-or-rule-induction/