NEVER-ENDING LANGUAGE LEARNER Student: Nguyễn Hữu Thành Phạm Xuân Khoái Vũ Mạnh Cầm Instructor: PhD Lê Hồng Phương Hà Nội, January 11 2014

Page 1:

NEVER-ENDING LANGUAGE LEARNER

Students: Nguyễn Hữu Thành, Phạm Xuân Khoái, Vũ Mạnh Cầm

Instructor: PhD Lê Hồng Phương

Hà Nội, January 11, 2014

Page 2:

Idea: build a structured knowledge base (KB).

What is a KB?

- Categories: cities, companies, sports teams, ...
- Relations: hasOfficeIn(organisation, location)
- Noun phrases

What is a structured KB?
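One way to picture a structured KB is as category and relation instances stored by name. The following is a minimal sketch under that assumption; the class `KnowledgeBase` and its method names are hypothetical and do not reflect NELL's actual storage format, and the facts added at the end are illustrative values only.

```python
from collections import defaultdict


class KnowledgeBase:
    """Toy structured KB (hypothetical layout, not NELL's real one)."""

    def __init__(self):
        # category name -> set of noun phrases, e.g. "city" -> {"Toronto"}
        self.categories = defaultdict(set)
        # relation name -> set of (arg1, arg2) pairs
        self.relations = defaultdict(set)

    def add_category_instance(self, category, noun_phrase):
        self.categories[category].add(noun_phrase)

    def add_relation_instance(self, relation, arg1, arg2):
        self.relations[relation].add((arg1, arg2))

    def has_fact(self, relation, arg1, arg2):
        # .get avoids creating an empty entry for an unknown relation
        return (arg1, arg2) in self.relations.get(relation, set())


# Illustrative values only, chosen to match the relation named on the slide.
kb = KnowledgeBase()
kb.add_category_instance("city", "Toronto")
kb.add_category_instance("company", "Toyota")
kb.add_relation_instance("hasOfficeIn", "Toyota", "Toronto")
```

Relation arguments are ordered, so `has_fact("hasOfficeIn", "Toronto", "Toyota")` is false in this sketch.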

Page 3:

Idea: Structured Knowledge Base

[Figure: a fragment of the KB drawn as a graph. Nodes are entities such as Toronto, Canada, Maple Leafs, Maple Leaf Gardens, NHL, hockey, Stanley Cup, Globe and Mail, CFRB, Red Wings, Detroit, GM, Toyota, and Prius. Edges are relations such as home town, city paper, city stadium, team stadium, city company, plays in league, won, uses equipment, competes with, acquired, and created.]

Page 4:

Ideas: using Machine Learning

Machine learning: a branch of artificial intelligence concerned with the construction and study of systems that can learn from data.

Page 5:

Ideas

[Figure: NELL draws on seed examples, an initial ontology, the Web, and human trainers, and builds the Knowledge Base (KB).]

Page 6:

Ideas: the tasks run 24x7, forever.

Each day:

1. Reading task: extract more facts from the web to populate the initial ontology.

2. Learning task: learn to read (perform #1) better than yesterday.

Page 7:

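NELL Architecture

[Figure: the Knowledge Base holds beliefs and candidate facts. The subsystem components (CPL, CSEAL, CMC, RL) draw on data resources and propose candidate facts; the Knowledge Integrator sits between candidate facts and beliefs. The arrows are numbered 1, 2, 3.]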

Page 8:

Coupled Pattern Learner (CPL)

- Learns to extract category and relation instances and patterns from unstructured text.
- Learns contextual patterns that serve as high-precision extractors for each predicate.
- E.g.:
  + "Trang An is the name of a girl."
  + "Trang An is the name of a company."

Contextual patterns like these are used to keep extraction precision high.

Page 9:

Input/Output

- Input:
  + A large text corpus
  + An initial ontology containing the categories, relations, and seed examples
- Output:
  + Proposed instances and contextual patterns for each predicate

Page 10:

Input: An ontology O, and a text corpus C

Output: Trusted instances/patterns for each predicate

for i = 1, 2, ..., ∞ do
    foreach predicate p in O do
        EXTRACT candidate instances/contextual patterns using recently promoted patterns/instances;
        FILTER candidates that violate coupling;
        RANK candidate instances/patterns;
        PROMOTE top candidates;
    end
end
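The loop above can be sketched in Python on a toy corpus. This is a heavily simplified illustration, not the actual CPL implementation: the helper names (`contexts`, `fillers`, `cpl`), the two-word pattern shape, the frequency-based ranking, and the coupling check (a candidate is dropped if it already belongs to another predicate, all predicates being treated as mutually exclusive) are assumptions made for this example. The loop is also bounded by `iterations` instead of running forever.

```python
from collections import Counter


def contexts(sentence, instance):
    """Patterns such as 'lives in _' built from the two words before `instance`."""
    words = sentence.split()
    return [" ".join(words[i - 2:i]) + " _"
            for i, w in enumerate(words) if w == instance and i >= 2]


def fillers(sentence, pattern):
    """Words that fill the '_' slot of `pattern` in `sentence`."""
    prefix = pattern.split()[:-1]
    words = sentence.split()
    n = len(prefix)
    return [words[i + n] for i in range(len(words) - n) if words[i:i + n] == prefix]


def cpl(seeds, corpus, iterations=2, top_k=2):
    instances = {p: set(s) for p, s in seeds.items()}
    patterns = {p: set() for p in seeds}
    for _round in range(iterations):
        for p in seeds:
            other_patterns = set().union(*(patterns[q] for q in seeds if q != p))
            other_instances = set().union(*(instances[q] for q in seeds if q != p))
            # EXTRACT candidate patterns from the instances promoted so far.
            pattern_counts = Counter(
                c for s in corpus for inst in instances[p] for c in contexts(s, inst))
            # FILTER candidates that violate coupling (simplified check).
            for c in other_patterns:
                pattern_counts.pop(c, None)
            # RANK by frequency and PROMOTE the top candidates.
            patterns[p].update(c for c, _n in pattern_counts.most_common(top_k))
            # Same four steps for instances, using the promoted patterns.
            instance_counts = Counter(
                w for s in corpus for pat in patterns[p] for w in fillers(s, pat))
            for w in other_instances:
                instance_counts.pop(w, None)
            instances[p].update(w for w, _n in instance_counts.most_common(top_k))
    return instances, patterns


# Toy data, invented for illustration.
corpus = [
    "She lives in Hanoi",
    "He lives in Danang",
    "He works at Intel",
    "She works at Samsung",
]
seeds = {"city": {"Hanoi"}, "company": {"Intel"}}
instances, patterns = cpl(seeds, corpus)
```

Starting from one seed per category, the sketch promotes the pattern "lives in _" for cities and "works at _" for companies, and through them the new instances Danang and Samsung.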

Page 11:

Example:

"Samsung has just released a clip mocking Nokia's new product."

- City: Ha Noi, Ho Chi Minh, Da Nang, ...
- Company: Son Ha, Kinh Do, ...
- competesWith: (AMD, Intel), (Google, Microsoft), (Samsung, Nokia), ...

Page 12:

Coupled SEAL

[Figure: CSEAL takes beliefs and the Internet as inputs and produces new candidate facts.]

Page 13:

Coupled SEAL

SEAL (Set Expander for Any Language): expands sets of entities automatically by utilizing resources from the Web.

CSEAL adds mutual-exclusion and type-checking constraints.

Page 14:

Coupled SEAL: a semi-structured extractor

- Queries the internet with sets of beliefs from each category or relation; mines lists and tables for instances.
- Uses mutual-exclusion relationships to provide negative examples for filtering overly general lists and tables.
- 5 queries/category, 10 queries/relation; fetches 50 web pages/query.
- Probabilities are assigned as in CPL.
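The mutual-exclusion filtering step can be illustrated with a short Python sketch. It assumes the lists have already been mined from web pages and shows only the filtering idea; the function `filter_lists`, the `min_pos` threshold, and the sample data are hypothetical and are not part of SEAL or CSEAL.

```python
def filter_lists(web_lists, positives, negatives, min_pos=2):
    """Propose new instances from mined lists, skipping overly general lists.

    positives: current beliefs for the target category
    negatives: beliefs of mutually exclusive categories (negative examples)
    """
    candidates = set()
    for items in web_lists:
        items = set(items)
        if len(items & positives) < min_pos:
            continue  # too little evidence that the list is about this category
        if items & negatives:
            continue  # overly general: it also contains known negatives
        candidates |= items - positives
    return candidates


# Hypothetical lists mined from three web pages.
web_lists = [
    ["Ha Noi", "Da Nang", "Hue"],
    ["Ha Noi", "Da Nang", "Intel", "Google"],
    ["Intel", "AMD"],
]
new_cities = filter_lists(
    web_lists,
    positives={"Ha Noi", "Da Nang"},
    negatives={"Intel", "Google"},
)
```

Only the first list survives: the second mixes cities with known companies, and the third contains no known cities at all.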

Page 15:

Coupled SEAL

Example:

Page 16:

Coupled Morphological Classifier

[Figure: CMC takes the KB and data resources as inputs and produces new candidate facts.]

CMC classifies noun phrases (NPs) based on various morphological features (words, capitalization, affixes).

Page 17:

Coupled Morphological Classifier

Ex1: Bach Mai hotel → hotel(Bach Mai)

Ex2: Mai → person(Mai)

Ex3: tradition → noun(tradition)
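The kinds of features named for CMC (words, capitalization, affixes) can be made concrete with a small feature extractor. This is only a sketch of the feature side: the function `morphological_features`, the feature naming scheme, and the affix length of 3 are assumptions for illustration, and no classifier is trained here.

```python
def morphological_features(noun_phrase, affix_len=3):
    """Return a set of string features describing a noun phrase."""
    words = noun_phrase.split()
    feats = set()
    for w in words:
        lw = w.lower()
        feats.add("word=" + lw)                  # the word itself
        feats.add("prefix=" + lw[:affix_len])    # leading characters
        feats.add("suffix=" + lw[-affix_len:])   # trailing characters
    # Capitalization: does every word start with an upper-case letter?
    feats.add("all_capitalized=" + str(all(w[0].isupper() for w in words)))
    return feats


hotel_feats = morphological_features("Bach Mai hotel")
person_feats = morphological_features("Mai")
```

In this sketch "Bach Mai hotel" yields the feature `word=hotel`, which a classifier could associate with the hotel category, while the single capitalized word "Mai" yields `all_capitalized=True`.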

Page 18:

Coupled Morphological Classifier

Beliefs from KB are used as training instances

CMC examines candidate facts proposed by other components and classifies up to 30 new beliefs per predicate per iteration.

Page 19:

Rule Learner

[Figure: RL takes beliefs and candidate facts from the KB as inputs and produces new candidate facts.]

RL uses the categories and relations in the KB as its input and infers new relation instances for the KB.

Page 20:

Rule Learner

Example 1: playSport(Rooney, football) → athlete(Rooney), sport(football)

Example 2: isCapital(Hanoi, Vietnam), liveIn(Thanh, Hanoi), roommate(Thanh, Khoai), roommate(Khoai, Cam) → liveIn(Thanh, Vietnam), roommate(Thanh, Cam), liveIn(Khoai, Hanoi), ...
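The inferences in Example 2 can be reproduced with a few rules applied repeatedly until nothing new is derived. In the sketch below the three rules are written by hand purely for illustration, whereas RL is meant to learn its rules from the KB; the function name `apply_rules` and the triple encoding of facts are assumptions.

```python
def apply_rules(facts):
    """Forward-chain three hand-written rules until no new fact appears."""
    facts = set(facts)
    while True:
        new = set()
        for (r1, a, b) in facts:
            for (r2, c, d) in facts:
                # liveIn(x, city) and isCapital(city, country) -> liveIn(x, country)
                if r1 == "liveIn" and r2 == "isCapital" and b == c:
                    new.add(("liveIn", a, d))
                # roommate(x, y) and roommate(y, z) -> roommate(x, z)
                if r1 == "roommate" and r2 == "roommate" and b == c and a != d:
                    new.add(("roommate", a, d))
                # roommate(x, y) and liveIn(x, place) -> liveIn(y, place)
                if r1 == "roommate" and r2 == "liveIn" and a == c:
                    new.add(("liveIn", b, d))
        if new <= facts:
            return facts
        facts |= new


derived = apply_rules({
    ("isCapital", "Hanoi", "Vietnam"),
    ("liveIn", "Thanh", "Hanoi"),
    ("roommate", "Thanh", "Khoai"),
    ("roommate", "Khoai", "Cam"),
})
```

Besides the three conclusions listed on the slide, repeated application also derives facts such as liveIn(Cam, Hanoi), which is one reading of the trailing "..." in the example.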

Page 21:

Rule Learner

Some kinds of rule learner systems: OneR, Ridor, PART, JRip, ConjunctiveRule.

Clip: https://www.youtube.com/watch?v=5On-tDeu2ic

Page 22:

Initial result

Running 24x7 since January 12, 2010.

Inputs:

• ontology defining >600 categories and relations
• 10-20 seed examples of each
• 100,000 web search queries per day
• ~5 minutes/day of human guidance

Result:

• KB with >15 million candidate beliefs, growing daily
• learning to reason, as well as read
• automatically extending its ontology

Page 23:

Initial result

Demo: http://rtw.ml.cmu.edu/rtw/kbbrowser/beverage:beer