A Structured Model for Joint Learning of Argument Roles and Predicate Senses

Yotaro Watanabe Masayuki Asahara Yuji Matsumoto

ACL 2010, Uppsala, Sweden, July 12, 2010

Tohoku University; Nara Institute of Science and Technology

Predicate-Argument Structure Analysis (Semantic Role Labeling)

Task of analyzing predicates and their arguments

– A predicate represents a state or an event, and its arguments have relations to the predicate

– Each argument has a particular semantic role (Agent, Theme, etc.)

In recent years, predicate sense disambiguation has been included in predicate-argument structure analysis [Surdeanu+ 08, Hajič+ 09]

– ‘sell.01’ means that ‘sold’ is an instance of the first sense of ‘sell’

Important for many NLP applications

– MT, QA, RTE, etc.

[Figure: the sentence "The luxury auto maker last year sold 1,214 cars in the U.S." annotated with the predicate senses maker.01 and sell.01 and the argument roles Product, Agent, Theme, Temporal, and Location]

drive.01: drive a vehicle A0: driver A1: vehicle

drive.02: cause to move A0: driver A1: things in motion

Two Types of Dependencies of Elements in Predicate-Argument Structures

(1) Inter-dependencies between a predicate and its arguments

– A1: car => we can infer that the correct sense is drive.01

(2) Non-local dependencies among arguments

• Two or more arguments do not have the same role

• Basically, obligatory roles of the predicate should appear in sentences

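[Figure: "Paul drove his car" with the dependency relations SBJ, OBJ, and NMOD; the predicate "drove" is labeled drive.01, with arguments A0 (Paul) and A1 (car)]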

In order to realize robust predicate-argument structure analysis, it is necessary to deal with these types of dependencies
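The non-local dependencies in (2) can be made concrete with a small check over a candidate role assignment. This is only an illustrative sketch: the `OBLIGATORY` table and both function names are hypothetical, and treating A0 and A1 as obligatory for both senses of "drive" is an assumption made for the example.

```python
from collections import Counter

# Hypothetical frame lexicon. Treating A0 and A1 as obligatory for both
# senses is an assumption made for this sketch.
OBLIGATORY = {"drive.01": {"A0", "A1"}, "drive.02": {"A0", "A1"}}


def duplicate_roles(roles):
    """Return the roles assigned to more than one argument (NONE is ignored)."""
    counts = Counter(r for r in roles if r != "NONE")
    return {role for role, count in counts.items() if count > 1}


def missing_obligatory(sense, roles):
    """Return the obligatory roles of `sense` that no argument received."""
    return OBLIGATORY[sense] - set(roles)


# Labeling both "Paul" and "car" as A0 violates both properties at once.
print(duplicate_roles(["A0", "A0"]))                 # the repeated role
print(missing_obligatory("drive.01", ["A0", "A0"]))  # the role that is absent
```

Neither property can be checked by looking at one argument in isolation, which is why such features are called non-local.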

Previous Work

(1) Non-local dependencies among arguments: Re-ranking [Johansson and Nugues 2008, etc.]

• Generate N-best assignments of argument roles, then obtain global features for each assignment, finally select the argmax using the re-ranker

• Cannot explicitly capture inter-dependencies between a predicate and its arguments

(2) Inter-dependencies between a predicate and its arguments: Markov Logic Networks [Meza-Ruiz and Riedel 2009, etc.]

• Jointly learn and classify predicate senses and argument roles

• MLNs cannot deal with particular types of global features

Currently, no existing (discriminative) approach sufficiently handles both types of dependencies

We propose a structured model that can capture both types of dependencies simultaneously

The proposed model

[Figure: dependency parse of "Paul drove his car" with the relations SBJ, OBJ, and NMOD]



[Figure: candidate labels for each element: the senses drive.01 and drive.02 for the predicate "drove", and the roles A0, A1, …, NONE for the arguments "Paul" and "car"]

Expand the possible labels of predicate senses and argument roles

We use four types of factors, which score the labels of elements in predicate-argument structures

Each factor is defined by a linear model
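One conventional way to write a linear-model factor is shown below. The notation is an assumption introduced here, not taken from the slides: x is the input sentence, y the labels the factor looks at, w a weight vector, and Φ_k the feature function of factor k.

```latex
F_k(x, y) = \mathbf{w} \cdot \Phi_k(x, y), \qquad k \in \{P,\ A,\ PA,\ G\}
```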

[Figure: factor FP attached to the predicate "drove", with the example scores 1.4754 and 0.7268 for the candidate senses drive.01 and drive.02]

FP: a factor which scores sense labels of the predicate

[Figure: factor FA attached to each argument, with the example role scores 1.784, 0.238, -1.665, -1.235, 0.876, and -1.482]

FA: a factor which scores role labels of each argument

[Figure: factor FPA connecting the predicate sense with an argument role, with the example scores 0.764 and 0.261]

FPA: a factor which scores label pairs of a predicate sense and a semantic role of an argument

[Figure: factor FG attached to the whole structure; the entry "A0, drive.01, A1, …" has the example score 1.865]

FG: a factor which captures the plausibility of the whole predicate-argument structure (uses global features)

The predicate ‘drive’ has both of its obligatory roles, A0 and A1 => the weight of the corresponding global feature is high, so FG gives this structure a higher score

[Figure: all four factors FP, FA, FPA, and FG attached to the candidate labels of "Paul drove his car"]

The proposed model combines these types of factors

[Figure: the selected assignment: drive.01 for "drove", A0 for "Paul", and A1 for "car"]

The highest scoring assignment is returned by the proposed model
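How four factors can combine into one score, and how the best assignment is then picked, can be sketched as follows. This is a minimal illustration that enumerates every assignment by brute force. The score values, the `OBLIGATORY` table, and all names are invented for the example; they are not the values on the slides or the inference procedure used in the talk.

```python
from itertools import product

SENSES = ["drive.01", "drive.02"]
ROLES = ["A0", "A1", "NONE"]
ARGS = ["Paul", "car"]

# Toy scores, invented for illustration only.
F_P = {"drive.01": 1.5, "drive.02": 0.7}
F_A = {("Paul", "A0"): 1.8, ("Paul", "A1"): 0.2, ("Paul", "NONE"): -1.7,
       ("car", "A0"): -1.2, ("car", "A1"): 0.9, ("car", "NONE"): -1.5}
F_PA = {("drive.01", "A1"): 0.8, ("drive.02", "A1"): 0.3}
OBLIGATORY = {"drive.01": {"A0", "A1"}, "drive.02": {"A0", "A1"}}


def f_g(sense, roles):
    """Global factor: reward structures with all obligatory roles and no repeats."""
    core = [r for r in roles if r != "NONE"]
    complete = OBLIGATORY[sense] <= set(core)
    no_duplicates = len(core) == len(set(core))
    return 1.9 if complete and no_duplicates else 0.0


def score(sense, roles):
    """Sum of the four factor scores for one sense and one role per argument."""
    total = F_P[sense]
    total += sum(F_A[(arg, role)] for arg, role in zip(ARGS, roles))
    total += sum(F_PA.get((sense, role), 0.0) for role in roles)
    total += f_g(sense, roles)
    return total


def best_assignment():
    """Return the (sense, roles) pair with the highest combined score."""
    candidates = product(SENSES, product(ROLES, repeat=len(ARGS)))
    return max(candidates, key=lambda c: score(*c))


print(best_assignment())
```

With these toy numbers the winner is drive.01 with A0 for "Paul" and A1 for "car". Exhaustive enumeration is only feasible for a toy example of this size.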

Dealing with global (non-local) features

Introduce the fundamental idea of [Kazama and Torisawa 2007]

– Features are divided into local features and global features

– Inference: N-best based approach

(1) Generate N-best assignments using only local features

(2) Obtain global features in the N-best assignments

(3) Select the argmax

– Learning: train parameters with two margin constraints

• All: train parameters so as to ensure a sufficient margin using all features (both local features and global features)

• Local only: when the constraint All is satisfied, train parameters so as to ensure a sufficient margin using only local features

• K&T proposed a Margin-Perceptron Learning Algorithm
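The three inference steps can be sketched as follows. This is a simplified illustration, not the authors' implementation: it takes an explicit list of candidates and picks the N best with `heapq.nlargest`, where a real system would use an efficient N-best search. The scoring functions and toy scores are hypothetical.

```python
import heapq
from itertools import product


def infer(candidates, local_score, global_score, n):
    """N-best inference: local N-best, then rescore with global features."""
    # (1) Generate the N-best assignments using only local features.
    n_best = heapq.nlargest(n, candidates, key=local_score)
    # (2) Add the global score for each of the N-best assignments.
    rescored = [(local_score(y) + global_score(y), y) for y in n_best]
    # (3) Select the argmax.
    return max(rescored, key=lambda pair: pair[0])[1]


# Toy local scores per (argument index, role), invented for illustration.
LOCAL = {(0, "A0"): 1.0, (0, "A1"): 0.2, (1, "A0"): 0.6, (1, "A1"): 0.5}


def local_score(roles):
    return sum(LOCAL[(i, r)] for i, r in enumerate(roles))


def global_score(roles):
    # Penalize assignments that give the same role to two arguments.
    return -1.0 if len(set(roles)) < len(roles) else 0.0


candidates = list(product(["A0", "A1"], repeat=2))
print(infer(candidates, local_score, global_score, n=2))
```

In this toy setup the locally best assignment repeats A0, and the global feature corrects it only if the right answer is inside the N-best list. With `n=1` the duplicate assignment would be returned unchanged.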

Inference and Learning Algorithm of the Proposed Model

Inference: generate N-best assignments for each predicate sense

Learning: the online Passive-Aggressive Algorithm [Crammer 2006]

• The parameters are trained by solving the optimization problem used in PA with the two margin constraints: All (local + global) and Local only

[Figure: the two margin constraints, (1) All (local + global) and (2) Local only, each separating the positive assignment from the other assignments by a margin]
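A rough sketch of how a passive-aggressive step could be paired with the two margin constraints is given below. It uses the standard PA-I step size and applies the "Local only" constraint only when the "All" constraint already holds. This is an assumption-laden illustration, not necessarily the exact optimization problem solved in the talk: the function names, the fixed margin `cost`, and the layout of the feature vectors (local features first, global features last) are all hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pa_step(w, phi_gold, phi_rival, cost=1.0, C=1.0):
    """One PA-I step. Returns (new weights, whether an update happened)."""
    delta = [g - r for g, r in zip(phi_gold, phi_rival)]
    loss = max(0.0, cost - dot(w, delta))  # hinge loss on the margin
    norm2 = dot(delta, delta)
    if loss == 0.0 or norm2 == 0.0:
        return w, False                    # margin satisfied: stay passive
    tau = min(C, loss / norm2)             # PA-I step size, capped by C
    return [wi + tau * di for wi, di in zip(w, delta)], True


def two_margin_update(w, gold, rival_all, rival_local, n_local):
    """Apply the 'All' constraint first, and 'Local only' if 'All' holds.

    Assumed layout: the first n_local entries of each feature vector are
    local features, and the remaining entries are global features.
    """
    w, updated = pa_step(w, gold, rival_all)  # (1) All (local + global)
    if updated:
        return w

    def local_part(phi):
        return phi[:n_local] + [0.0] * (len(phi) - n_local)

    # (2) Local only: global features are masked out.
    w, _ = pa_step(w, local_part(gold), local_part(rival_local))
    return w


w = two_margin_update([0.0, 0.0, 2.0], [1.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0], [0.0, 1.0, 0.0], n_local=2)
print(w)
```

In the usage line the global weight already satisfies the "All" margin, so only the two local weights move.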

Results on the CoNLL-2009 ST Dataset (average)

Model        Feature selection   Overall (Sem. F1)   WSD (Acc.)   SRL (Lab. F1)
FP+FA        no                  79.17               89.65        72.20
FP+FA+FPA    no                  79.58               89.78        72.74
FP+FA+FG     no                  80.42               89.83        74.11
ALL          no                  80.75               90.15        74.46
Björkelund   yes                 80.80
Zhao         yes                 80.47
Meza-Ruiz    no                  77.46

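[Figure: factor graph legend: FP on the sense, FA on each of role1, role2, …, roleN, FPA between the sense and each role, and FG over the whole structure]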

The best performance is obtained by using all of the factors

Our model achieved results competitive with the top system in the CoNLL-2009 Shared Task without any feature selection procedure

By adding two types of factors FPA and FG, we obtained performance improvements in both tasks (predicate sense disambiguation and argument role labeling)

=> Succeeded in joint learning
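The size of these improvements can be read off the results table with a few subtractions. The sketch below only re-computes differences from the reported numbers; the `RESULTS` dictionary and the `gain` helper are hypothetical names introduced for the example.

```python
# (Overall Sem. F1, WSD Acc., SRL Lab. F1), as reported in the results table.
RESULTS = {
    "FP+FA": (79.17, 89.65, 72.20),
    "FP+FA+FPA": (79.58, 89.78, 72.74),
    "FP+FA+FG": (80.42, 89.83, 74.11),
    "ALL": (80.75, 90.15, 74.46),
}


def gain(model, baseline="FP+FA"):
    """Point differences between `model` and `baseline`, rounded to 2 places."""
    return tuple(round(a - b, 2)
                 for a, b in zip(RESULTS[model], RESULTS[baseline]))


for name in ("FP+FA+FPA", "FP+FA+FG", "ALL"):
    print(name, gain(name))
```

Relative to FP+FA, the full model gains 1.58 points in overall semantic F1, 0.50 in sense accuracy, and 2.26 in labeled role F1.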

Summary

We proposed a structured model that can capture two types of dependencies

(1) Non-local dependencies among arguments

(2) Inter-dependencies between a predicate and its arguments

The proposed model achieved results competitive with state-of-the-art SRL systems without any feature selection procedure

By adding two types of factors, we obtained performance improvements on both predicate sense disambiguation and argument role labeling

=> succeeded in joint learning

Future Work

– exploiting unlabeled data (unsupervised or semi-supervised predicate-argument structure analysis)
