Dependency Parsing: Parsing Algorithms
Peng Huang (peng.huangp@alibaba-inc.com)


Outline

• Introduction
  – Phrase Structure Grammar
  – Dependency Grammar
  – Comparison and Conversion
• Dependency Parsing
  – Formal definition
• Parsing Algorithms
  – Introduction
  – Dynamic programming
  – Constraint satisfaction
  – Deterministic search

Introduction

• Syntactic parsing: finding the correct syntactic structure of a sentence in a given formalism/grammar
• Two formalisms: Phrase Structure Grammar (PSG) and Dependency Grammar (DG)

Phrase Structure Grammar (PSG)

• Breaks a sentence into constituents (phrases), which are then broken into smaller constituents
• Describes phrase structure and clause structure, e.g. NP, PP, VP
• Structures are often recursive: the clever tall blue-eyed old … man

Tree Structure

(constituency tree diagram omitted)

Dependency Grammar

• Syntactic structure consists of lexical items linked by binary asymmetric relations called dependencies
• Interested in grammatical relations between individual words (governing & dependent words)
• Does not propose a recursive structure, but rather a network of relations
• These relations can also have labels

Comparison

• Dependency structures explicitly represent:
  – Head-dependent relations (directed arcs)
  – Functional categories (arc labels)
  – Possibly some structural categories (parts of speech)
• Phrase structures explicitly represent:
  – Phrases (non-terminal nodes)
  – Structural categories (non-terminal labels)
  – Possibly some functional categories (grammatical functions)

Conversion … PSG to DG

• The head of a phrase governs/dominates all its siblings
• Heads are calculated using heuristics
• Dependency relations are established between the head of each phrase (as parent) and its siblings (as children)
• The tree thus formed is the unlabeled dependency tree
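The head-percolation procedure above can be sketched in code. This is an illustrative sketch, not from the slides: the head-rule table, the tree encoding, and the function names are all invented for the example.

```python
# Hypothetical sketch of PSG-to-DG conversion via head rules.
# A constituency node is (label, children); a preterminal is (POS, word).
HEAD_RULES = {
    "S": ["VP", "NP"],            # prefer VP as the head child of S
    "VP": ["VBD", "VB", "VP"],
    "NP": ["NN", "NNS", "NP"],
    "PP": ["IN"],
}

def head_word(tree):
    """Return the lexical head of a subtree by percolating head rules."""
    label, children = tree
    if isinstance(children, str):          # preterminal: (POS, word)
        return children
    for wanted in HEAD_RULES.get(label, []):
        for child in children:
            if child[0] == wanted:
                return head_word(child)
    return head_word(children[0])          # fallback: leftmost child

def to_dependencies(tree, deps=None):
    """Collect (head, dependent) arcs: every non-head child's head word
    becomes a dependent of the phrase's head word."""
    if deps is None:
        deps = []
    label, children = tree
    if isinstance(children, str):
        return deps
    h = head_word(tree)
    for child in children:
        ch = head_word(child)
        if ch != h:
            deps.append((h, ch))
        to_dependencies(child, deps)
    return deps

sent = ("S", [
    ("NP", [("NN", "John")]),
    ("VP", [("VBD", "saw"), ("NP", [("NN", "Mary")])]),
])
print(to_dependencies(sent))   # [('saw', 'John'), ('saw', 'Mary')]
```

The resulting arc list is exactly the unlabeled dependency tree the slide describes; labels would be added in a separate step.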

PSG to DG

(conversion example diagram omitted)

Conversion … DG to PSG

• Each head together with its dependents (and their dependents) forms a constituent of the sentence
• It is difficult to assign structural categories (NP, VP, S, PP, etc.) to these derived constituents
• Every projective dependency grammar has a strongly equivalent context-free grammar, but not vice versa [Gaifman 1965]


Dependency Tree: Formal Definition

• An input word sequence w1 … wn
• A dependency graph D = (W, E), where:
  – W is the set of nodes, i.e. the word tokens in the input sequence
  – E is the set of unlabeled tree edges (wi, wj), with wi, wj ∈ W
  – (wi, wj) indicates an edge from wi (parent) to wj (child)
• The task of mapping an input string to a dependency graph satisfying certain conditions is called dependency parsing

Well-formedness

A dependency graph is well-formed iff:
• Single head: each word has exactly one head
• Acyclic: the graph contains no cycles
• Connected: the graph is a single tree covering all the words in the sentence
• Projective: if word A depends on word B, then all words between A and B are also subordinate to B (i.e. dominated by B)
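These four conditions can be checked mechanically. The sketch below assumes the common head-array encoding (heads[i-1] is the head of word i, with 0 denoting an artificial root); the encoding and the function name are illustrative, not from the slides.

```python
# Sketch: check the four well-formedness conditions for a dependency graph
# given as a head array. Words are numbered 1..n; 0 is the artificial root.

def is_well_formed(heads):
    n = len(heads)
    # Single head: the encoding gives one head per word, but it must be valid.
    if any(not (0 <= heads[i - 1] <= n) for i in range(1, n + 1)):
        return False
    # Acyclic + connected: every word must reach the root (0) via heads.
    for i in range(1, n + 1):
        seen, j = set(), i
        while j != 0:
            if j in seen:
                return False           # cycle detected
            seen.add(j)
            j = heads[j - 1]
    # Projective: for every arc (h, d), all words strictly between h and d
    # must be dominated by h.
    def dominated_by(h, d):
        while d != 0:
            if d == h:
                return True
            d = heads[d - 1]
        return h == 0                  # the root dominates everything
    for d in range(1, n + 1):
        h = heads[d - 1]
        for k in range(min(h, d) + 1, max(h, d)):
            if not dominated_by(h, k):
                return False
    return True

# "Red figures on the screen indicated falling stocks"
#  1    2      3   4    5       6          7       8
heads = [2, 6, 2, 5, 3, 0, 8, 6]
print(is_well_formed(heads))  # True
```

A head array with a cycle (e.g. `[2, 1]`) or a crossing arc fails the check, which is exactly how non-projective trees like the Yorkshire Terrier example below are detected.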

Non-projective Dependency Tree

• Example: Ram saw a dog yesterday which was a Yorkshire Terrier
• Non-projectivity shows up as crossing arcs when the dependencies are drawn above the sentence
• English has very few non-projective cases


Dependency Parsing (P. Mannem)

Dependency Parsing

• Dependency-based parsers can be broadly categorized into:
  – Grammar-driven approaches: parsing is done using grammars
  – Data-driven approaches: parsing by training on annotated/unannotated data

May 28, 2008


Parsing Methods

• Three main traditions:
  – Dynamic programming: CYK, Eisner, McDonald
  – Constraint satisfaction: Maruyama, Foth et al., Duchier
  – Deterministic search: Covington, Yamada and Matsumoto, Nivre

Dynamic Programming

• Basic idea: treat dependencies as constituents
• Use, e.g., a CYK parser (with minor modifications)

Eisner 1996

• Two novel aspects:
  – Modified parsing algorithm
  – Probabilistic dependency parsing
• Time requirement: O(n³)
• Modification: instead of storing subtrees, store spans
• Span: a substring such that no interior word links to any word outside the span
• Idea: in a span, only the boundary words are active, i.e. still need a head or a child
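The span-based dynamic program can be sketched as follows. This is an illustrative first-order formulation (the usual complete/incomplete-span presentation of Eisner's algorithm); the arc-score matrix `score[h][d]` is assumed as input, as Eisner's probability model would supply.

```python
# Sketch of Eisner's O(n^3) dynamic program for projective dependency
# parsing. score[h][d] is an assumed arc score (head h, dependent d);
# token 0 is the artificial root. Returns the best projective tree score.

def eisner(score):
    n = len(score)
    # C = complete spans, I = incomplete spans; last index is direction:
    # 1 = head at the left end, 0 = head at the right end.
    C = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    I = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    for k in range(1, n):                  # span width
        for s in range(n - k):
            t = s + k
            # build an incomplete span by adding the arc s -> t or t -> s
            best = max(C[s][r][1] + C[r + 1][t][0] for r in range(s, t))
            I[s][t][1] = best + score[s][t]
            I[s][t][0] = best + score[t][s]
            # complete a span by absorbing a finished sub-span
            C[s][t][1] = max(I[s][r][1] + C[r][t][1] for r in range(s + 1, t + 1))
            C[s][t][0] = max(C[s][r][0] + I[r][t][0] for r in range(s, t))
    return C[0][n - 1][1]                  # root 0 heads the whole sentence

# Tiny 3-token example (root + 2 words); the best tree scores 1 + 5 = 6.
score = [[0, 1, 1], [0, 0, 5], [0, 5, 0]]
print(eisner(score))  # 6.0
```

Only the two boundary words of each span are consulted when spans are combined, which is what brings the complexity down to O(n³).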

Example

Red figures on the screen indicated falling stocks _ROOT_

(dependency diagram omitted)

Example

Red figures on the screen indicated falling stocks _ROOT_

Spans: [indicated falling stocks], [Red figures]

Assembly of correct parse

Red figures on the screen indicated falling stocks _ROOT_

Start by combining adjacent words into minimal spans:
[Red figures], [figures on], [on the]

Assembly of correct parse

Red figures on the screen indicated falling stocks _ROOT_

Combine spans which overlap in one word; this word must be governed by a word in the left or right span.
[on the] + [the screen] → [on the screen]

Assembly of correct parse

Red figures on the screen indicated falling stocks _ROOT_

Combine spans which overlap in one word; this word must be governed by a word in the left or right span.
[figures on] + [on the screen] → [figures on the screen]

Assembly of correct parse

Red figures on the screen indicated falling stocks _ROOT_

Combine spans which overlap in one word; this word must be governed by a word in the left or right span.
[Red figures] + [figures on the screen] → [Red figures on the screen] is an invalid span: the interior word "figures" still needs its head ("indicated") outside the span.

McDonald’s Maximum Spanning Trees

• Score of a dependency tree = sum of the scores of its dependencies
• Scores are independent of the other dependencies
• If scores are available, parsing can be formulated as a maximum spanning tree problem
• Uses online learning to determine the weight vector w
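Under this edge-factored scoring, the best tree is a maximum spanning tree over the arc-score graph. McDonald's parsers solve this efficiently (with the Chu-Liu/Edmonds algorithm); the sketch below instead solves the same objective by brute force on a tiny input, purely to make the objective explicit. The score matrix is invented for illustration.

```python
# Illustrative sketch: parsing as a maximum spanning tree under an
# edge-factored score. Real parsers use Chu-Liu/Edmonds; brute-force
# enumeration below just demonstrates the objective on a toy input.
from itertools import product

def tree_score(heads, score):
    # heads[d-1] is the head of word d; score[h][d] is the arc score
    return sum(score[h][d] for d, h in enumerate(heads, start=1))

def is_tree(heads):
    # every word must reach the root (0) without revisiting a node
    for i in range(1, len(heads) + 1):
        seen, j = set(), i
        while j != 0:
            if j in seen:
                return False
            seen.add(j)
            j = heads[j - 1]
    return True

def brute_force_mst(score):
    n = len(score) - 1                     # index 0 is the root
    best, best_s = None, float("-inf")
    for heads in product(range(n + 1), repeat=n):
        if is_tree(heads):
            s = tree_score(heads, score)
            if s > best_s:
                best, best_s = heads, s
    return best, best_s

score = [[0, 1, 1], [0, 0, 5], [0, 5, 0]]  # score[h][d]
print(brute_force_mst(score))  # ((0, 1), 6): root -> w1, w1 -> w2
```

Note that nothing here requires projectivity: the MST formulation naturally covers non-projective trees as well, which is one of its attractions.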

McDonald’s Algorithm

• Scores the entire dependency tree
• Online learning


Constraint Satisfaction

• Uses Constraint Dependency Grammar
• The grammar consists of a set of boolean constraints, i.e. logical formulas that describe well-formed trees
• A constraint is a logical formula with variables that range over a set of predefined values
• Parsing is defined as a constraint satisfaction problem
• Constraint satisfaction removes values that contradict constraints
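As a toy illustration of how value removal works: treat each word's head position as a variable, and let boolean constraints prune its domain. Everything below (the sentence, the POS constraints, the function name) is invented for the example and is far simpler than Maruyama's actual grammar.

```python
# Toy sketch of constraint-based dependency parsing: each word's head is a
# variable over candidate positions; boolean constraints prune the domains.

def prune(domains, constraints):
    """Remove values that contradict some constraint, until a fixpoint."""
    changed = True
    while changed:
        changed = False
        for d, dom in domains.items():
            keep = {h for h in dom if all(c(d, h) for c in constraints)}
            if keep != dom:
                domains[d] = keep
                changed = True
    return domains

pos = {1: "ADJ", 2: "NOUN", 3: "VERB"}   # "Red figures indicated"; 0 = root

constraints = [
    lambda d, h: h != d,                                   # no self-loop
    lambda d, h: not (h == 0 and pos[d] != "VERB"),        # only a verb is root
    lambda d, h: not (pos[d] == "ADJ" and (h == 0 or pos[h] != "NOUN")),
]

domains = {d: set(range(4)) for d in pos}
print(prune(domains, constraints))
# {1: {2}, 2: {1, 3}, 3: {0, 1, 2}}
```

The adjective's head is fully determined, while the other variables keep several values: in a real system the remaining ambiguity is resolved by further constraints and search.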


Deterministic Parsing

• Basic idea: derive a single syntactic representation (dependency graph) through a deterministic sequence of elementary parsing actions
• Sometimes combined with backtracking or repair

Shift-Reduce Type Algorithms

• Data structures:
  – A stack [. . . , wi]S of partially processed tokens
  – A queue [wj, . . .]Q of remaining input tokens
• Parsing actions are built from atomic actions:
  – Adding arcs (wi → wj, wi ← wj)
  – Stack and queue operations
• Left-to-right parsing

Yamada’s Algorithm

Three parsing actions:

  Shift:  [. . .]S [wi, . . .]Q  ⇒  [. . . , wi]S [. . .]Q
  Left:   [. . . , wi, wj]S [. . .]Q  ⇒  [. . . , wi]S [. . .]Q   adding arc wi → wj
  Right:  [. . . , wi, wj]S [. . .]Q  ⇒  [. . . , wj]S [. . .]Q   adding arc wi ← wj

Multiple passes over the input give time complexity O(n²).

Yamada and Matsumoto

• Parsing in several rounds: deterministic, bottom-up, O(n²)
• Looks at pairs of words
• Three actions: shift, left, right
• Shift: shifts focus to the next word pair

Yamada and Matsumoto

• Right: decides that the right word depends on the left word
• Left: decides that the left word depends on the right one

Nivre’s Algorithm

Four parsing actions:

  Shift:     [. . .]S [wi, . . .]Q  ⇒  [. . . , wi]S [. . .]Q
  Reduce:    [. . . , wi]S [. . .]Q  ⇒  [. . .]S [. . .]Q   precondition: ∃wk : wk → wi
  Left-Arc:  [. . . , wi]S [wj, . . .]Q  ⇒  [. . .]S [wj, . . .]Q   adding arc wi ← wj; precondition: ¬∃wk : wk → wi
  Right-Arc: [. . . , wi]S [wj, . . .]Q  ⇒  [. . . , wi, wj]S [. . .]Q   adding arc wi → wj; precondition: ¬∃wk : wk → wj

Nivre’s Algorithm

• Characteristics:
  – Arc-eager processing of right-dependents
  – A single pass over the input (at most 2n transitions) gives worst-case time complexity O(n)
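The four actions can be sketched directly in code. Below, the oracle action sequence for the deck's running example (Red figures on the screen indicated falling stocks) is written out by hand, as a trained classifier would predict it. One assumption to flag: this sketch puts the artificial root at position 0, whereas the slides draw _ROOT_ at the end of the sentence.

```python
# Sketch of Nivre's arc-eager transition system. Arcs are (head, dependent)
# word indices; 0 is an artificial root node. The preconditions from the
# slides are enforced with assertions.

def parse(n, actions):
    stack, queue, arcs = [0], list(range(1, n + 1)), []
    has_head = set()
    for act in actions:
        if act == "SH":                    # Shift: queue front onto stack
            stack.append(queue.pop(0))
        elif act == "RE":                  # Reduce: pop stack top (has a head)
            assert stack[-1] in has_head
            stack.pop()
        elif act == "LA":                  # Left-Arc: stack top <- queue front
            dep = stack.pop()
            assert dep not in has_head
            arcs.append((queue[0], dep))
            has_head.add(dep)
        elif act == "RA":                  # Right-Arc: stack top -> queue front
            dep = queue.pop(0)
            arcs.append((stack[-1], dep))
            has_head.add(dep)
            stack.append(dep)
    return arcs

# Red(1) figures(2) on(3) the(4) screen(5) indicated(6) falling(7) stocks(8)
actions = ["SH", "LA", "SH", "RA", "SH", "LA", "RA", "RE", "RE",
           "LA", "RA", "SH", "LA", "RA", "RE", "RE"]
print(parse(8, actions))
# [(2, 1), (2, 3), (5, 4), (3, 5), (6, 2), (0, 6), (8, 7), (6, 8)]
```

This action sequence is exactly the one traced slide by slide in the example that follows, and it runs in a single left-to-right pass.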

Example

Red figures on the screen indicated falling stocks _ROOT_

One action per slide (stack/queue diagrams omitted):
Shift, Left-arc, Shift, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce, Left-arc, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce

The End

• Q&A