NEVER-ENDING LANGUAGE LEARNER Student: Nguyễn Hữu Thành Phạm Xuân Khoái Vũ Mạnh Cầm...

Post on 31-Dec-2015

NEVER-ENDING LANGUAGE LEARNER

Student:

Nguyễn Hữu Thành

Phạm Xuân Khoái

Vũ Mạnh Cầm

Instructor:

PhD Lê Hồng Phương

Hà Nội, January 11 2014

Idea: Build a structured knowledge base (KB).

What is a KB?
- Categories: cities, companies, sports teams, ...
- Relations: hasOfficeIn(organisation, location)
- Noun phrases

What is a structured KB?

[Figure, slide "Idea: Structuring Knowledge Base": an example KB drawn as a graph. Node labels include Toronto, Canada, Detroit, Maple Leafs, Red Wings, NHL, hockey, Stanley Cup, Maple Leaf Gardens, Skydome, Pearson, Sunnybrook, Connaught, CFRB, Globe and Mail, Sundin, Toskala, Wilson, Miller, GM, Toyota, Prius, Hino, automobile, skates, helmet, climbing, and football. Edge labels include play, hired, won, home town, city paper, league, plays in league, writer, radio, team stadium, city stadium, politician, country, airport, member, hospital, city company, uses equipment, competes with, created, acquired, and economic sector.]
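As a rough illustration of the structured KB idea above, the categories and relations can be held as sets of noun phrases and sets of argument pairs. This is only a minimal sketch: the class name `KnowledgeBase` and its methods are hypothetical and do not reflect NELL's actual data model, and the `hasOfficeIn` fact is a made-up value used purely for illustration.

```python
from collections import defaultdict


class KnowledgeBase:
    """Toy structured KB (hypothetical): category and relation instances."""

    def __init__(self):
        self.categories = defaultdict(set)  # category name -> noun phrases
        self.relations = defaultdict(set)   # relation name -> (arg1, arg2) pairs

    def add_category_instance(self, category, noun_phrase):
        self.categories[category].add(noun_phrase)

    def add_relation_instance(self, relation, arg1, arg2):
        self.relations[relation].add((arg1, arg2))

    def categories_of(self, noun_phrase):
        """Return the sorted names of all categories containing the noun phrase."""
        return sorted(
            name for name, members in self.categories.items()
            if noun_phrase in members
        )


kb = KnowledgeBase()
kb.add_category_instance("city", "Toronto")
kb.add_category_instance("company", "Toyota")
# Illustrative fact only; not taken from the slides.
kb.add_relation_instance("hasOfficeIn", "Toyota", "Toronto")
```

Keeping categories and relations in separate maps mirrors the split on the slide: a category is a unary predicate over noun phrases, while a relation is a binary predicate over pairs of them.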

Idea: use machine learning

Machine learning: a branch of artificial intelligence concerned with the construction and study of systems that can learn from data.

Ideas

[Diagram: NELL takes an initial ontology, seed examples, the Web, and human trainers as inputs, and builds a knowledge base (KB).]

Ideas: the tasks run 24x7, forever. Each day:

1. Reading task: extract more facts from the web to populate the initial ontology.
2. Learning task: learn to read (perform #1) better than yesterday.

NELL Architecture

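[Diagram: the Knowledge Base holds beliefs and candidate facts. The subsystem components (CPL, CSEAL, CMC, RL) read data resources and propose candidate facts, and the Knowledge Integrator promotes candidate facts to beliefs. The steps are numbered 1 to 3 in the diagram.]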

Subsystem Components

Coupled Pattern Learner (CPL)

- Learns to extract category and relation instances and patterns from unstructured text.
- Learns contextual patterns that act as high-precision extractors for each predicate.
- E.g.:
  + "Trang An is the name of a girl."
  + "Trang An is the name of a company."

These contextual patterns are used to keep extraction precision high.

Input/Output

- Input:
  + A large text corpus
  + An initial ontology containing the predicates to learn
- Output:
  + Proposed instances and contextual patterns for each predicate

Input: An ontology O, and a text corpus C
Output: Trusted instances/patterns for each predicate

for i = 1, 2, ..., ∞ do
    foreach predicate p in O do
        EXTRACT candidate instances/contextual patterns using recently promoted patterns/instances;
        FILTER candidates that violate coupling;
        RANK candidate instances/patterns;
        PROMOTE top candidates;
    end
end

Example:

"Samsung has just released a clip mocking Nokia's new product."

City: Ha Noi, Ho Chi Minh, Da Nang, ...
Company: Son Ha, Kinh Do, ...
competesWith: (AMD, Intel), (Google, Microsoft), (Samsung, Nokia), ...
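The EXTRACT / FILTER / RANK / PROMOTE loop can be sketched in a few lines for categories only. This is a simplified illustration, not NELL's implementation: the function name `cpl`, the `(pattern, noun phrase)` occurrence format, the co-occurrence-count ranking, and the `top_k` cutoff are all assumptions, the corpus is invented toy data, and the loop is bounded instead of running forever. Coupling is reduced to one constraint: the two categories are mutually exclusive.

```python
from collections import Counter


def cpl(occurrences, seeds, mutex, iterations=2, top_k=2):
    """Toy coupled pattern learner for categories (hypothetical sketch).

    occurrences: (pattern, noun_phrase) pairs observed in the corpus.
    seeds: predicate -> set of seed instances.
    mutex: predicate -> set of mutually exclusive predicates.
    """
    instances = {p: set(s) for p, s in seeds.items()}
    patterns = {p: set() for p in seeds}

    for _ in range(iterations):  # the slide loops forever; bounded here
        staged = {}
        for p in seeds:
            rivals = mutex.get(p, set())
            rival_instances = set().union(*(instances[q] for q in rivals))

            # EXTRACT: patterns seen with known instances, and vice versa.
            cand_patterns = Counter(
                pat for pat, phrase in occurrences
                if phrase in instances[p] and pat not in patterns[p])
            cand_instances = Counter(
                phrase for pat, phrase in occurrences
                if pat in patterns[p] and phrase not in instances[p])

            # FILTER: drop candidates that violate mutual exclusion.
            rival_patterns = {pat for pat, phrase in occurrences
                              if phrase in rival_instances}
            for pat in list(cand_patterns):
                if pat in rival_patterns:
                    del cand_patterns[pat]
            for phrase in list(cand_instances):
                if phrase in rival_instances:
                    del cand_instances[phrase]

            # RANK by co-occurrence count, then PROMOTE the top candidates.
            staged[p] = (
                [phrase for phrase, _ in cand_instances.most_common(top_k)],
                [pat for pat, _ in cand_patterns.most_common(top_k)],
            )
        for p, (new_instances, new_patterns) in staged.items():
            instances[p].update(new_instances)
            patterns[p].update(new_patterns)
    return instances, patterns


# Invented toy corpus; "_" marks the noun-phrase slot of a pattern.
occurrences = [
    ("mayor of _", "Ha Noi"), ("mayor of _", "Da Nang"), ("mayor of _", "Hue"),
    ("flights to _", "Ha Noi"), ("flights to _", "Hai Phong"),
    ("shares of _", "Kinh Do"), ("shares of _", "Son Ha"),
    ("shares of _", "Samsung"),
    ("news about _", "Ha Noi"), ("news about _", "Kinh Do"),
    ("news about _", "Nokia"),
]
seeds = {"city": {"Ha Noi", "Da Nang"}, "company": {"Son Ha", "Kinh Do"}}
mutex = {"city": {"company"}, "company": {"city"}}
instances, patterns = cpl(occurrences, seeds, mutex)
```

With this toy data, Hue and Hai Phong are promoted as cities and Samsung as a company. The pattern "news about _" is never promoted because it co-occurs with instances of both categories, which is the kind of overly general extractor the coupling filter is meant to remove.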

Coupled SEAL

[Diagram: CSEAL takes beliefs from the KB, queries the Internet, and outputs new candidate facts.]

SEAL (Set Expander for Any Language): expands sets of entities automatically by utilizing resources from the Web.

CSEAL adds mutual-exclusion and type-checking constraints.

Coupled SEAL: a semi-structured extractor

- Queries the Internet with sets of beliefs from each category or relation, and mines lists and tables for instances.
- Uses mutual-exclusion relationships to provide negative examples for filtering overly general lists and tables.
- 5 queries per category, 10 queries per relation; fetches 50 web pages per query.
- Probabilities are assigned as in CPL.
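The mutual-exclusion filter on mined lists can be illustrated with a small sketch. It assumes the lists have already been mined from web pages (the wrapper learning that SEAL performs is not shown), and the function name `expand_category`, the `min_positive` threshold, and the sample lists are hypothetical.

```python
def expand_category(mined_lists, positives, negatives, min_positive=2):
    """Propose new instances from mined lists (hypothetical sketch).

    A list is kept only if it contains at least `min_positive` known
    instances of the category and no known instance of a mutually
    exclusive category.
    """
    candidates = set()
    for items in mined_lists:
        items = set(items)
        if len(items & positives) < min_positive:
            continue  # too little evidence that the list is about this category
        if items & negatives:
            continue  # overly general list: mixes in mutually exclusive items
        candidates |= items - positives
    return candidates


# Invented lists standing in for lists and tables mined from web pages.
mined_lists = [
    ["Ha Noi", "Da Nang", "Hue"],
    ["Ha Noi", "Da Nang", "Kinh Do", "Nokia"],  # mixes cities and companies
    ["Hue", "Can Tho"],                         # contains no known city
]
city_beliefs = {"Ha Noi", "Da Nang", "Ho Chi Minh"}
company_beliefs = {"Kinh Do", "Son Ha"}  # negative examples for "city"
new_cities = expand_category(mined_lists, city_beliefs, company_beliefs)
```

Only the first list survives both checks, so Hue is the single proposed candidate. Nokia is not proposed as a city, because the list that contains it also contains a known company.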


Coupled Morphological Classifier

[Diagram: CMC takes the KB and data resources as input and outputs new candidate facts.]

CMC classifies noun phrases (NPs) based on various morphological features (words, capitalization, affixes).

Ex1: Bach Mai hotel → hotel(Bach Mai)

Ex2: Mai → person(Mai)

Ex3: tradition → noun(tradition)

Beliefs from the KB are used as training instances.

CMC examines candidate facts proposed by other components and classifies up to 30 new beliefs per predicate.
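The kind of morphological features mentioned above (words, capitalization, affixes) might be extracted as in the sketch below. The feature names, the three-character affix length, and the function `morphological_features` are assumptions made for illustration; the classifier that would be trained on these features is not shown.

```python
def morphological_features(noun_phrase):
    """Return a set of string features for a noun phrase (hypothetical sketch)."""
    words = noun_phrase.split()
    if not words:
        return set()

    features = set()
    for word in words:
        features.add("word=" + word.lower())
        features.add("prefix3=" + word[:3].lower())   # affix features
        features.add("suffix3=" + word[-3:].lower())
    # Capitalization: does every word start with an upper-case letter?
    all_caps = all(word[0].isupper() for word in words)
    features.add("all_capitalized=" + str(all_caps))
    features.add("last_word=" + words[-1].lower())
    return features


hotel_features = morphological_features("Bach Mai hotel")
person_features = morphological_features("Mai")
```

For "Bach Mai hotel" the feature set includes `last_word=hotel`, which is the sort of cue that would support hotel(Bach Mai), while the single capitalized word "Mai" yields `all_capitalized=True`.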

Rule Learner

[Diagram: RL takes beliefs from the KB as input and outputs new candidate facts.]

RL uses the categories and relations in the KB as input and infers new relation instances for the KB.

Example 1: playSport(Rooney, football) → athlete(Rooney), sport(football)

Example 2: isCapital(Hanoi, Vietnam), liveIn(Thanh, Hanoi), roommate(Thanh, Khoai), roommate(Khoai, Cam) → liveIn(Thanh, Vietnam), roommate(Thanh, Cam), liveIn(Khoai, Hanoi), ...
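The inferences in Example 2 can be reproduced by forward chaining over three hand-written rules. This sketch only applies rules; learning them from the KB, which is the Rule Learner's actual job, is not shown. The rules themselves and the function name `infer` are assumptions chosen so that the slide's conclusions follow from its premises.

```python
def infer(facts):
    """Apply three hand-written rules until no new fact appears.

    Facts are (predicate, arg1, arg2) tuples. Assumed rules:
      liveIn(x, c)   and isCapital(c, n) -> liveIn(x, n)
      roommate(x, y) and roommate(y, z)  -> roommate(x, z), for x != z
      roommate(x, y) and liveIn(x, c)    -> liveIn(y, c)
    """
    facts = set(facts)
    while True:
        new = set()
        for p1, a, b in facts:
            for p2, c, d in facts:
                if p1 == "liveIn" and p2 == "isCapital" and b == c:
                    new.add(("liveIn", a, d))
                if p1 == "roommate" and p2 == "roommate" and b == c and a != d:
                    new.add(("roommate", a, d))
                if p1 == "roommate" and p2 == "liveIn" and a == c:
                    new.add(("liveIn", b, d))
        if new <= facts:  # fixed point reached
            return facts
        facts |= new


premises = {
    ("isCapital", "Hanoi", "Vietnam"),
    ("liveIn", "Thanh", "Hanoi"),
    ("roommate", "Thanh", "Khoai"),
    ("roommate", "Khoai", "Cam"),
}
derived = infer(premises) - premises
```

The derived set contains the three conclusions listed in Example 2, plus further consequences of the same rules, such as liveIn(Cam, Hanoi).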

Some rule learner systems: OneR, Ridor, PART, JRip, ConjunctiveRule.

Clip: https://www.youtube.com/watch?v=5On-tDeu2ic

Initial result

Running 24x7 since January 12, 2010.

Inputs:
• ontology defining >600 categories and relations
• 10-20 seed examples of each
• 100,000 web search queries per day
• ~5 minutes/day of human guidance

Result:
• KB with >15 million candidate beliefs, growing daily
• learning to reason, as well as read
• automatically extending its ontology

Demo: http://rtw.ml.cmu.edu/rtw/kbbrowser/beverage:beer