
  • Robust Subgraph Generation Improves Abstract Meaning Representation Parsing [Werling et al.]

    2015/8/24 ACL Reading Group @ Suzukakedai

    The University of Tokyo, Tsuruoka Lab, D1 Akiko Eriguchi

  • Abstract Meaning Representation; AMR [Banarescu et al., 2013]

    • Describes the meaning of natural language as a graph

    • Based on neo-Davidsonian semantics

    • Structure of AMR
      • node: concept
      • edge: relation

    • Applications: machine translation, QA, etc.


    Robust Subgraph Generation Improves Abstract Meaning Representation Parsing

    Keenon Werling, Stanford University ([email protected])
    Gabor Angeli, Stanford University ([email protected])
    Christopher D. Manning, Stanford University ([email protected])

    Abstract

    The Abstract Meaning Representation (AMR) is a representation for open-domain rich semantics, with potential use in fields like event extraction and machine translation. Node generation, typically done using a simple dictionary lookup, is currently an important limiting factor in AMR parsing. We propose a small set of actions that derive AMR subgraphs by transformations on spans of text, which allows for more robust learning of this stage. Our set of construction actions generalize better than the previous approach, and can be learned with a simple classifier. We improve on the previous state-of-the-art result for AMR parsing, boosting end-to-end performance by 3 F1 on both the LDC2013E117 and LDC2014T12 datasets.

    1 Introduction

    The Abstract Meaning Representation (AMR) (Banarescu et al., 2013) is a rich, graph-based language for expressing semantics over a broad domain. The formalism is backed by a large data-labeling effort, and it holds promise for enabling a new breed of natural language applications ranging from semantically aware MT to rich broad-domain QA over text-based knowledge bases. Figure 1 shows an example AMR for "he gleefully ran to his dog Rover," and we give a brief introduction to AMR in Section 2. This paper focuses on AMR parsing, the task of mapping a natural language sentence into an AMR graph.

    We follow previous work (Flanigan et al., 2014) in dividing AMR parsing into two steps. The first step is concept identification, which generates AMR nodes from text, and which we'll refer to as NER++ (Section 3.1). The second step is relation

    Figure 1: The AMR graph for He gleefully ran to his dog Rover. We show that improving the generation of low-level subgraphs (e.g., Rover generating name --op1--> "Rover") significantly improves end-to-end performance.

    identification, which adds arcs to link these nodes into a fully connected AMR graph, which we'll call SRL++ (Section 3.2).

    We observe that SRL++ is not the hard part of AMR parsing; rather, much of the difficulty in AMR is generating high accuracy concept subgraphs from the NER++ component. For example, when the existing AMR parser JAMR (Flanigan et al., 2014) is given a gold NER++ output, and must only perform SRL++ over given subgraphs, it scores 80 F1 – nearly the inter-annotator agreement of 83 F1, and far higher than its end-to-end accuracy of 59 F1.

    SRL++ within AMR is relatively easy given a perfect NER++ output, because so much pressure is put on the output of NER++ to carry meaningful information. For example, there's a strong type-check feature for the existence and type of any arc just by looking at its end-points, and syntactic dependency features are very informative for removing any remaining ambiguity. If a system is


    “He gleefully ran to his dog Rover.”

    AMR Parsing

  • AMR parsing process

    Figure 2: A graphical explanation of our method. We represent the derivation process for He gleefully ran to his dog Rover. First the tokens in the sentence are labeled with derivation actions, then these actions are used to generate AMR subgraphs, which are then stitched together to form a coherent whole.

    considering how to link the node run-01 in Figure 1, the verb-sense frame for "run-01" leaves very little uncertainty for what we could assign as an ARG0 arc. It must be a noun, which leaves either he or dog, and this is easily decided in favor of he by looking for an nsubj arc in the dependency parse.

    The primary contribution of this paper is a novel approach to the NER++ task, illustrated in Figure 2. We notice that the subgraphs aligned to lexical items can often be generated from a small set of generative actions which generalize across tokens. For example, most verbs generate an AMR node corresponding to the verb sense of the appropriate PropBank frame – e.g., run generates run-01 in Figure 1. This allows us to frame the NER++ task as the task of classifying one of a small number of actions for each token, rather than choosing a specific AMR subgraph for every token in the sentence.

    Our approach to the end-to-end AMR parsing task is therefore as follows: we define an action space for generating AMR concepts, and create a classifier for classifying lexical items into one of these actions (Section 4). This classifier is trained from automatically generated alignments between the gold AMR trees and their associated sentences (Section 5), using an objective which favors alignment mistakes which are least harmful to the NER++ component. Finally, the concept subgraphs are combined into a coherent AMR parse using the maximum spanning connected subgraph

    algorithm of Flanigan et al. (2014). We show that our approach provides a large

    boost to recall over previous approaches, and that end-to-end performance is improved from 59 to 62 smatch (an F1 measure of correct AMR arcs; see Cai and Knight (2013)) when incorporated into the SRL++ parser of Flanigan et al. (2014). When evaluating the performance of our action classifier in isolation, we obtain an action classification accuracy of 84.1%.

    2 The AMR Formalism

    AMR is a language for expressing semantics as a rooted, directed, and potentially cyclic graph, where nodes represent concepts and arcs are relationships between concepts. AMR is based on neo-Davidsonian semantics (Davidson, 1967; Parsons, 1990). The nodes (concepts) in an AMR graph do not have to be explicitly grounded in the source sentence, and while such an alignment is often generated to train AMR parsers, it is not provided in the training corpora. The semantics of nodes can represent lexical items (e.g., dog), sense-tagged lexical items (e.g., run-01), type markers (e.g., date-entity), and a host of other phenomena.

    The edges (relationships) in AMR describe one of a number of semantic relationships between concepts. The most salient of these is semantic role labels, such as the ARG0 and destination arcs in Figure 2. However, often these arcs define


    AMR Parsing

    Text → STEP1: concept identification (NER++)

    STEP2: relation identification (SRL++)


  • AMR parsing process

    AMR Parsing

    Text → STEP1: concept identification (NER++)
    STEP1: normalize words etc. and construct subgraphs; map words to subgraphs (Viterbi sequence model)

    STEP2: relation identification (SRL++)

  • AMR parsing process


    AMR Parsing

    Text → STEP1: concept identification (NER++)

    STEP2: relation identification (SRL++)
    STEP2: connect the constructed subgraphs into a single spanning graph while satisfying linguistic constraints

  • AMR parsing process


    AMR Parsing

    Text → STEP1: concept identification

    STEP2: relation identification
    Figure annotations: verb, nsubj (from the dependency parse)

    JAMR [Flanigan et al., 2014]: existing end-to-end AMR parser

  • AMR parsing process


    AMR Parsing

    Text → STEP1: concept identification

    STEP2: relation identification

    STEP1 is the crucial step for extracting semantic information from text

    Proposal: construct AMR subgraphs

  • AMR subgraph

    • Tokens and AMR nodes are not in one-to-one correspondence

    • A single token, or a span of tokens, can generate an AMR subgraph containing multiple nodes
      • subgraph = the expression of a single concept

    Figure 3: AMR representation of the word sailor, which is notable for breaking the word up into a self-contained multi-node unit unpacking the derivational morphology of the word.

    a semantics more akin to syntactic dependencies (e.g., mod standing in for adjective and adverbial modification), or take on domain-specific meaning (e.g., the month, day, and year arcs of a date-entity).

    To introduce AMR and its notation in more detail, we'll unpack the translation of the sentence "he gleefully ran to his dog Rover." We show in Figure 1 the interpretation of this sentence as an AMR graph.

    The root node of the graph is labeled run-01, corresponding to the PropBank (Palmer et al., 2005) definition of the verb ran. run-01 has an outgoing ARG0 arc to a node he, with the usual PropBank semantics. The outgoing mod edge from run-01 to glee takes a general purpose semantics corresponding to adjective, adverbial, or other modification of the governor by the dependent. We note that run-01 has a destination arc to dog. The label for destination is taken from a finite set of special arc sense tags similar to the preposition senses found in (Srikumar, 2013). The last portion of the figure parses dog to a node which serves as a type marker similar to named entity types, and Rover into the larger subgraph indicating a concept with name "Rover."
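    Written as (parent, relation, child) triples – the view of the graph that smatch (Section 7.1) later scores – the Figure 1 graph looks roughly like the following sketch; arcs not mentioned in the walkthrough (e.g., the possessive in his) are omitted.

```python
# Figure 1, "He gleefully ran to his dog Rover", as concept triples.
amr_triples = {
    ("run-01", "ARG0",        "he"),       # PropBank-style agent
    ("run-01", "mod",         "glee"),     # adverbial modification
    ("run-01", "destination", "dog"),      # special arc sense tag
    ("dog",    "name",        "name"),     # structured name construction
    ("name",   "op1",         '"Rover"'),  # op arcs carry the name tokens
}
```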

    2.1 AMR Subgraphs

    The mapping from tokens of a sentence to AMR nodes is not one-to-one. A single token or span of tokens can generate a subgraph of AMR consisting of multiple nodes. These subgraphs can logically be considered the expression of a single concept, and are useful to treat as such (e.g., see Section 3.1).

    Many of these multi-node subgraphs capture structured data such as time expressions, as in Fig-

    Figure 4: AMR representation of the span January 1, 2008, an example of how AMR can represent structured data by creating additional nodes such as date-entity to signify the presence of special structure.

    ure 4. In this example, a date-entity node is created to signify that this cluster of nodes is part of a structured sub-component representing a date, where the nodes and arcs within the component have specific semantics. This illustrates a broader recurring pattern in AMR: an artificial node may, based on its title, have expected children with special semantics. A particularly salient example of this pattern is the name node (see "Rover" in Figure 1) which signifies that all outgoing arcs with label op comprise the tokens of a name object.

    The ability to decouple the meaning representation of a lexical item from its surface form allows for rich semantic interpretations of certain concepts in a sentence. For example, the token sailor is represented in Figure 3 by a concept graph representing a person who performs the action sail-01. Whereas often the AMR node aligned to a span of text is a straightforward function of the text, these cases remain difficult to capture in a principled way beyond memorizing mappings between tokens and subgraphs.

    3 Task Decomposition

    To the best of our knowledge, the JAMR parser is the only published end-to-end AMR parser at the time of publication. An important insight in JAMR is that AMR parsing can be broken into two distinct tasks: (1) NER++ (concept identification): the task of interpreting what entities are being referred to in the text, realized by generating the best AMR subgraphs for a given set of tokens, and (2) SRL++ (relation identification): the task of discovering what relationships exist between entities, realized by taking the disjoint subgraphs generated by NER++ and creating a fully-connected graph. We describe both tasks in more detail below.
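    This two-stage decomposition can be pictured as a simple pipeline. The sketch below is schematic only: the function bodies are stubs, and none of the names come from the released JAMR code.

```python
def concept_identification(tokens):
    """NER++ (stub): map each token span to a small AMR subgraph.
    The real component classifies tokens into one of nine actions
    (Section 4) and executes the chosen action."""
    return [[(tok, "instance", tok.lower())] for tok in tokens]

def relation_identification(subgraphs):
    """SRL++ (stub): add arcs joining the disjoint subgraphs into one
    connected graph; JAMR uses a maximum spanning connected subgraph
    algorithm for this step."""
    return [triple for graph in subgraphs for triple in graph]

def parse_amr(tokens):
    return relation_identification(concept_identification(tokens))
```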


    January 1, 2008: AMR subgraph (Figure 4)

  • A Novel NER++ Method

    • Current situation:
      • AMR training data is scarce
      • Tokens are mapped to subgraphs via memorized correspondences

    • Proposed method: action-based NER++
      • Define 9 actions that generate subgraphs
      • Build a classifier that decides the action for each token

  • 9 actions

    ACTION    Description
    IDENTITY  The node title is identical to the token
    NONE      The token is not assigned to any node
    VERB      Take the most similar verb sense from PropBank
    VALUE     Interpret the token as a numeric value
    LEMMA     Lemmatize the token
    NAME      Attach a name node
    PERSON    Add a person node on top of a name node
    DATE      Use the output of SUTime [Chang and Manning, 2012] and convert it to a date-entity
    DICT      Back-off for the other actions (dictionary lookup)
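    A sketch of how the simpler actions might be executed for a single token. The helper names are hypothetical, difflib's ratio stands in for the Jaro-Winkler similarity the paper uses, and the multi-node actions (NAME, PERSON, DATE, DICT) are omitted.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # stand-in for the Jaro-Winkler similarity used by the paper
    return SequenceMatcher(None, a, b).ratio()

def execute_action(action, token, lemma=None, propbank_frames=()):
    """Apply an NER++ action to one token and return the generated
    node title (None for the NONE action). Sketch only."""
    if action == "NONE":
        return None                 # the token generates no node
    if action == "IDENTITY":
        return token.lower()        # node title identical to the token
    if action == "LEMMA":
        return lemma                # node title is the token's lemma
    if action == "VALUE":
        return str(int(token))      # numeric interpretation of the token
    if action == "VERB":
        # most similar PropBank frame, e.g. running -> run-01
        return max(propbank_frames, key=lambda f: similarity(token, f))
    raise ValueError("multi-node action not sketched here: " + action)

print(execute_action("VERB", "running",
                     propbank_frames=["run-01", "walk-01"]))  # run-01
```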

  • Action Reliability

    • Even a correctly predicted action can still produce a wrong subgraph

    • The reliability of each action is measured on the development data

    there are often discrepancies between the lemma generated by the lemmatizer and the correct AMR lemma.

    NAME: AMR often references names with a special structured data type: the name construction. For example, Rover in Figure 1. We can capture this phenomenon on unseen names by attaching a created name node to the top of a span.

    PERSON: A variant of the NAME action, this action produces a subgraph identical to the NAME action, but adds a node person as a parent. This is, in effect, a name node with an implicit entity type of person. Due to discrepancies between the output of our named entity tagger and the richer AMR named entity ontology, we only apply this tag to the person named entity tag.

    DATE: The most frequent of the structured data types in the data, after name, is the date-entity construction (for an example see Figure 4). We deterministically take the output of SUTime (Chang and Manning, 2012) and convert it into the date-entity AMR representation.

    DICT: This class serves as a back-off for the other classes, implementing an approach similar to Flanigan et al. (2014). In particular, we memorize a simple mapping from spans of text (such as sailor) to their corresponding most frequently aligned AMR subgraphs in the training data (i.e., the graph in Figure 3). See Section 5 for details on the alignment process. At test time we can do a lookup in this dictionary for any element that gets labeled with a DICT action. If an entry is not found in the mapping, we back off to the second most probable class proposed by the classifier.
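    The lookup-with-back-off behavior of DICT might look like the following sketch; `span_to_graph` (the memorized span-to-subgraph table) and `action_probs` (the classifier's posterior over actions) are hypothetical names, and the subgraph here is just a toy string.

```python
def apply_dict(span, span_to_graph, action_probs):
    """DICT action with back-off: return the memorized subgraph for the
    span, or fall back to the classifier's next most probable action."""
    if span in span_to_graph:
        return "DICT", span_to_graph[span]
    # entry missing: back off to the classifier's second choice
    ranked = sorted(action_probs, key=action_probs.get, reverse=True)
    fallback = next(a for a in ranked if a != "DICT")
    return fallback, None            # the caller executes the fallback

# usage with toy inputs
table = {("sailor",): "person <-ARG0- sail-01"}
probs = {"DICT": 0.5, "LEMMA": 0.3, "VERB": 0.2}
print(apply_dict(("sailor",), table, probs))  # ('DICT', 'person <-ARG0- sail-01')
print(apply_dict(("dogs",), table, probs))    # ('LEMMA', None)
```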

    It is worth observing at this point that our actions derive much of their power from the similarity between English words and their AMR counterparts; creating an analogue of these actions for other languages remains an open problem.

    4.2 Action Reliability

    In many cases, multiple actions could yield the same subgraph when applied to a node. In this section we introduce a method for resolving this ambiguity based on comparing the reliability with which actions generate the correct subgraph, and discuss implications.

    Even given a perfect action classification for a token, certain action executions can introduce

    Figure 5: Reliability of each action. The top row are actions which are deterministic; the second row occasionally produce errors. DICT is the least preferred action, with a relatively high error rate.

    errors. Some of our actions are entirely deterministic in their conversion from the word to the AMR subgraph (e.g., IDENTITY), but others are prone to making mistakes in this conversion (e.g., VERB, DICT). We define the notion of action reliability as the probability of deriving the correct node from a span of tokens, conditioned on having chosen the correct action.

    To provide a concrete example, our dictionary lookup classifier predicts the correct AMR subgraph 67% of the time on the dev set. We therefore define the reliability of the DICT action as 0.67. In contrast to DICT, correctly labeling a node as IDENTITY, NAME, PERSON, and NONE has action reliability of 1.0, since there is no ambiguity in the node generation once one of those actions has been selected, and we are guaranteed to generate the correct node given the correct action.

    We can therefore construct a hierarchy of reliability (Figure 5) – all else being equal, we prefer to generate actions from higher in the hierarchy, as they are more likely to produce the correct subgraph. This hierarchy is useful in resolving ambiguity throughout our system. During the creation of training data for our classifier (Section 4.3) from our aligner, when two actions could both generate the aligned AMR node we prefer the more reliable one. In turn, in our aligner we bias alignments towards those which generate more reliable action sequences as training data (see Section 5).
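    Resolving that ambiguity is a one-line preference over the hierarchy. In the sketch below, the 1.0 entries and the 0.67 for DICT are the values stated in the text; the remaining numbers are illustrative placeholders, not published figures.

```python
# P(correct subgraph | correct action), estimated on dev data.
RELIABILITY = {
    "IDENTITY": 1.0, "NONE": 1.0, "NAME": 1.0, "PERSON": 1.0,  # deterministic
    "VERB": 0.94, "DATE": 0.92, "LEMMA": 0.90, "VALUE": 0.90,  # placeholders
    "DICT": 0.67,                                              # from the paper
}

def most_reliable(candidates):
    """Prefer the action highest in the reliability hierarchy (Figure 5)."""
    return max(candidates, key=RELIABILITY.__getitem__)

# the Section 4.3 example: running/run-01 is generable by DICT or VERB
assert most_reliable({"DICT", "VERB"}) == "VERB"
```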

    The primary benefit of this action-based NER++ approach is that we can reduce the usage of low reliability actions, like DICT. The approach taken in Flanigan et al. (2014) can be


  • Action Reliability

    • Distribution of actions in the automatically aligned data

    • 74% of all tokens (the non-DICT actions) are generated with reliability ≥ 0.9

    Action     # Tokens   % Total
    NONE       41538      36.2
    DICT       30027      26.1
    IDENTITY   19034      16.6
    VERB       11739      10.2
    LEMMA       5029       4.5
    NAME        4537       3.9
    DATE        1418       1.1
    PERSON      1336       1.1
    VALUE        122       0.1

    Table 1: Distribution of action types in the proxy section of the newswire section of the LDC2014T12 dataset, generated from automatically aligned data.

    Input token; word embedding
    Left+right token / bigram
    Token length indicator
    Token starts with "non"
    POS; Left+right POS / bigram
    Dependency parent token / POS
    Incoming dependency arc
    Bag of outgoing dependency arcs
    Number of outgoing dependency arcs
    Max Jaro-Winkler to any lemma in PropBank
    Output tag of the VERB action if applied
    Output tag of the DICT action if applied
    NER; Left+right NER / bigram
    Capitalization
    Incoming prep_* or appos + parent has NER
    Token is pronoun
    Token is part of a coref chain
    Token pronoun and part of a coref chain

    Table 2: The features for the NER++ maxent classifier.

    thought of as equivalent to classifying every token as the DICT action.

    We analyze the empirical distribution of actions in our automatically aligned corpus in Table 1. The cumulative frequency of the non-DICT actions is striking: we can generate 74% of the tokens with high reliability (p ≥ 0.9) actions. In this light, it is unsurprising that our results demonstrate a large gain in recall on the test set.

    4.3 Training the Action Classifier

    Given a set of AMR training data, in the form of (graph, sentence) pairs, we first induce alignments from the graph nodes to the sentence (see Section 5). Formally, for every node n_i in the AMR graph, alignment gives us some token s_j (at the jth index in the sentence) that we believe generated the node n_i.

    Then, for each action type, we can ask whether or not that action type is able to take token s_j and correctly generate n_i. For concreteness, imagine the token s_j is running, and the node n_i has the title run-01. The two action types we find that are able to correctly generate this node are DICT and VERB. We choose the most reliable action type of those available (see Figure 5) to generate the observed node – in this case, VERB.

    In cases where an AMR subgraph is generated from multiple tokens, we assign the action label to each token which generates the subgraph. Each of these tokens is added to the training set; at test time, we collapse sequences of adjacent identical action labels, and apply the action once to the resulting token span.
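    Collapsing adjacent identical labels into a single span application is a run-length grouping; a minimal sketch (the two-token name is an invented example):

```python
from itertools import groupby

def collapse(tokens, actions):
    """Merge runs of adjacent identical action labels so that each
    action is applied once to the whole token span at test time."""
    spans, i = [], 0
    for action, run in groupby(actions):
        n = sum(1 for _ in run)
        spans.append((action, tokens[i:i + n]))
        i += n
    return spans

print(collapse(["ran", "to", "New", "York"],
               ["VERB", "NONE", "NAME", "NAME"]))
# [('VERB', ['ran']), ('NONE', ['to']), ('NAME', ['New', 'York'])]
```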

    Inducing the most reliable action (according to the alignments) for every token in the training corpus provides a supervised training set for our action classifier, with some noise introduced by the automatically generated alignments. We then train a simple maxent classifier¹ to make action decisions at each node. At test time, the classifier takes as input a pair ⟨i, S⟩, where i is the index of the token in the input sentence, and S is a sequence of tokens representing the source sentence. It then uses the features in Table 2 to predict the actions to take at that node.
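    A maxent (multinomial logistic regression) classifier over sparse indicator features can be sketched with scikit-learn; the features below are a simplified subset of Table 2 (the real system also draws POS, NER, dependency, and coreference information from a full NLP pipeline), and the toy training set is invented.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def features(i, S):
    """A few Table 2 features for token i of sentence S (simplified)."""
    tok = S[i]
    return {
        "token=" + tok.lower(): 1.0,
        "left=" + (S[i - 1].lower() if i > 0 else "<s>"): 1.0,
        "right=" + (S[i + 1].lower() if i + 1 < len(S) else "</s>"): 1.0,
        "len=" + str(len(tok)): 1.0,                    # token length indicator
        "starts_non": float(tok.lower().startswith("non")),
        "capitalized": float(tok[:1].isupper()),
    }

# toy training triples of (sentence, token index, gold action)
data = [(["He", "ran", "to", "Rover"], 1, "VERB"),
        (["He", "ran", "to", "Rover"], 3, "NAME"),
        (["He", "ran", "to", "Rover"], 0, "IDENTITY")]
vec = DictVectorizer()
X = vec.fit_transform([features(i, S) for S, i, _ in data])
y = [action for _, _, action in data]
clf = LogisticRegression(max_iter=1000).fit(X, y)   # maxent stand-in
```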

    5 Automatic Alignment of Training Data

    AMR training data is in the form of bi-text, where we are given a set of (sentence, graph) pairs, with no explicit alignments between them. We would like to induce a mapping from each node in the AMR graph to the token it represents. It is perfectly possible for multiple nodes to align to the same token – this is the case with sailors, for instance.

    It is not possible, within our framework, to represent a single node being sourced from multiple tokens. Note that a subgraph can consist of many individual nodes; in cases where a subgraph should align to multiple tokens, we generate an alignment from the subgraph's nodes to the associated tokens in the sentence. It is empirically very rare for a subgraph to have more nodes than the token span it should align to.

    There have been two previous attempts at producing automatic AMR alignments. The first was

    ¹ A sequence model was tried and showed no improvement over a simple maxent classifier.


  • Training the action classifier

    • Training data: (graph, sentence) pairs
      • Every AMR node n_i is aligned to some token s_j

    • For every action, check whether it can correctly generate n_i from s_j; ties are broken by reliability
      • e.g. "running" -> {DICT, VERB} -> VERB

    • Train a maximum-entropy classifier

    • Input (i, S); S = token sequence

    • 18 features (Table 2)

  • Automatic Alignment of Training Data

    • Maximize the log-likelihood of the actions

    • Q = {0, 1}^(|N|×|S|); |N|: number of nodes in the AMR graph, |S|: number of tokens in the sentence

    • V = T^(|N|×|S|); V_i,j = the action generating node n_i from token s_j

    • E_i,j: Jaro-Winkler similarity, α: hyperparameter (= 0.8)

    published as a component of JAMR, and used a rule-based approach to perform alignments. This was shown to work well on the sample of 100 hand-labeled sentences used to develop the system. Pourdamghani et al. (2014) approached the alignment problem in the framework of the IBM alignment models. They rendered AMR graphs as text, and then used traditional machine translation alignment techniques to generate an alignment.

    We propose a novel alignment method, since our decomposition of the AMR node generation process into a set of actions provides an additional objective for the aligner to optimize, in addition to the accuracy of the alignment itself. We would like to produce the most reliable sequence of actions for the NER++ model to train from, where reliable is taken in the sense defined in Section 4.2. To give an example, a sequence of all DICT actions could generate any AMR graph, but is very low reliability. A sequence of all IDENTITY actions could only generate one set of nodes, but does it with absolute certainty.

    We formulate this objective as a Boolean LP problem. Let Q be a matrix in {0, 1}^(|N|×|S|) of Boolean constrained variables, where N are the nodes in an AMR graph, and S are the tokens in the sentence. The meaning of Q_i,j = 1 can be interpreted as node n_i having been aligned to token s_j. Furthermore, let V be a matrix T^(|N|×|S|), where T is the set of NER++ actions from Section 4. Each matrix element V_i,j is assigned the most reliable action which would generate node n_i from token s_j. We would like to maximize the probability of the actions collectively generating a perfect set of nodes. This can be formulated linearly by maximizing the log-likelihood of the actions. Let the function REL(l) be the reliability of action l (the probability of generating the intended node). Our objective can then be formulated as follows:

        max_Q  Σ_{i,j}  Q_{i,j} [ log(REL(V_{i,j})) + α E_{i,j} ]            (1)

        s.t.   Σ_j  Q_{i,j} = 1          ∀ i                                 (2)

               Q_{k,j} + Q_{l,j} ≤ 1     ∀ k, l, j  with  n_k ≁ n_l          (3)

    where E_{i,j} is the Jaro-Winkler similarity between the title of node n_i and token s_j, α is a hyperparameter (set to 0.8 in our experiments), and the relation n_k ≁ n_l denotes that the two nodes in the AMR graph are both not adjacent and do not have the same title.

    Constraint (2), combined with the binary constraint on Q, ensures that every node in the graph is aligned to exactly one token in the source sentence. Constraint (3) ensures that only adjacent nodes or nodes that share a title can refer to the same token.

    The objective value penalizes alignments which map to the unreliable DICT tag, while rewarding alignments with high overlap between the title of the node and the token. Note that most incorrect alignments fall into the DICT class by default, as no other action could generate the correct AMR subgraph. Therefore, if there exists an alignment that would consume the token using another action, the optimization prefers that alignment. The Jaro-Winkler similarity term, in turn, serves as a tie-breaker between equally (un)reliable alignments.

    There are many packages which can solve this Boolean LP efficiently. We used Gurobi (Gurobi Optimization, 2015). Given a matrix Q that maximizes our objective, we can decode our solved alignment as follows: for each i, align n_i to the j such that Q_i,j = 1. By our constraints, exactly one such j must exist.

    6 Related Work

    Prior work in AMR and related formalisms includes Jones et al. (2012) and Flanigan et al. (2014). Jones et al. (2012), motivated by applications in Machine Translation, proposed a graphical semantic meaning representation that predates AMR, but is intimately related. They propose a hyper-edge replacement grammar (HRG) approach to parsing into and out of this graphical semantic form. Flanigan et al. (2014) forms the basis of the approach of this paper. Their system introduces the two-stage approach we use: they implement a rule-based alignment to learn a mapping from tokens to subgraphs, and train a variant of a maximum spanning tree parser adapted to graphs and with additional constraints for their relation identification (SRL++) component. Wang et al. (2015) uses a transition-based algorithm to transform dependency trees into AMR parses. They achieve 64/62/63 P/R/F1 with contributions roughly orthogonal to our own. Their transformation action set could be easily augmented by the robust subgraph generation we propose here, although we leave this to future work.

    Beyond the connection of our work with


  • Experiments

    • Datasets
      • Newswire (LDC2014T12, LDC2013E117)

    • Comparison using the end-to-end JAMR parser
      • JAMR parser (default) [Flanigan et al., 2014]
      • JAMR SRL++ with the proposed NER++ approach

    • Evaluation metric
      • smatch = s(emantic) match [Cai and Knight, 2013]
      • Evaluated over (parent, edge, child) triples

  • Experimental results

    • Recall improves substantially

    • The proposed NER++ constructs better subgraphs

    • SRL++ itself is not improved

    Flanigan et al. (2014), we note that the NER++ component of AMR encapsulates a number of lexical NLP tasks. These include named entity recognition (Nadeau and Sekine, 2007; Finkel et al., 2005), word sense disambiguation (Yarowsky, 1995; Banerjee and Pedersen, 2002), lemmatization, and a number of more domain-specific tasks. For example, a full understanding of AMR requires normalizing temporal expressions (Verhagen et al., 2010; Strötgen and Gertz, 2010; Chang and Manning, 2012).

    In turn, the SRL++ facet of AMR takes many insights from semantic role labeling (Gildea and Jurafsky, 2002; Punyakanok et al., 2004; Srikumar, 2013; Das et al., 2014) to capture the relations between verbs and their arguments. In addition, many of the arcs in AMR have nearly syntactic interpretations (e.g., mod for adjective/adverb modification, op for compound noun expressions). These are similar to representations used in syntactic dependency parsing (de Marneffe and Manning, 2008; McDonald et al., 2005; Buchholz and Marsi, 2006).

    More generally, parsing to a semantic representation has been explored in depth for when the representation is a logical form (Kate et al., 2005; Zettlemoyer and Collins, 2005; Liang et al., 2011). Recent work has applied semantic parsing techniques to representations beyond lambda calculus expressions. For example, work by Berant et al. (2014) parses text into a formal representation of a biological process. Hosseini et al. (2014) solves algebraic word problems by parsing them into a structured meaning representation. In contrast to these approaches, AMR attempts to capture open-domain semantics over arbitrary text.

    Interlinguas (Mitamura et al., 1991; Carbonell et al., 1999; Levin et al., 1998) are an important inspiration for decoupling the semantics of the AMR language from the surface form of the text being parsed, although AMR has a self-admitted English bias.

    7 Results

    We present improvements in end-to-end AMR parsing on two datasets using our NER++ component. Action type classifier accuracy on an automatically aligned corpus and alignment accuracy on a small hand-labeled corpus are also reported.

    Dataset     System       P     R     F1
    2014T12     JAMR         67.1  53.2  59.3
                Our System   66.6  58.3  62.2
    2013E117    JAMR         66.9  52.9  59.1
                Our System   65.9  59.0  62.3

    Table 3: Results on two AMR datasets for JAMR and our NER++ embedded in the JAMR SRL++ component. Note that recall is consistently higher across both datasets, with only a small loss in precision.

    7.1 End-to-end AMR Parsing

    We evaluate our NER++ component in the context of end-to-end AMR parsing on two corpora: the newswire section of LDC2014T12 and the split given in Flanigan et al. (2014) of LDC2013E117, both consisting primarily of newswire. We compare two systems: the JAMR parser (Flanigan et al., 2014),² and the JAMR SRL++ component with our NER++ approach.

    AMR parsing accuracy is measured with a metric called smatch (Cai and Knight, 2013), which stands for "s(emantic) match." The metric is the F1 of a best match between triples implied by the target graph and triples in the parsed graph – that is, the set of (parent, edge, child) triples in the graph.
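    Stripped of the search over variable mappings (which real smatch performs by hill-climbing), the metric reduces to an F1 over triple sets; a sketch under that simplifying assumption:

```python
def triple_f1(gold, predicted):
    """F1 over (parent, edge, child) triples, assuming node names in the
    two graphs are already matched (real smatch searches this matching)."""
    gold, predicted = set(gold), set(predicted)
    if not gold or not predicted:
        return 0.0
    correct = len(gold & predicted)
    p = correct / len(predicted)     # precision
    r = correct / len(gold)          # recall
    return 2 * p * r / (p + r) if p + r else 0.0
```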

    Our results are given in Table 3. We report much higher recall numbers on both datasets, with only a small (~1 point) loss in precision. This is natural considering our approach. A better NER++ system allows for more correct AMR subgraphs to be generated – improving recall – but does not in itself necessarily improve the accuracy of the SRL++ system it is integrated in.

    7.2 Component Accuracy

    We evaluate our aligner on a small set of 100 hand-labeled alignments, and evaluate our NER++ classifier on automatically generated alignments over the whole corpus.

    On a hand-annotated dataset of 100 AMR parses from the LDC2014T12 corpus,³ our aligner achieves an accuracy of 83.2. This is a measurement of the percentage of AMR nodes that are aligned to the correct token in their source sentence. Note that this is a different metric than the

    ² Available at https://github.com/jflanigan/jamr.

    ³ Our dataset is publicly available at http://nlp.stanford.edu/projects/amr


  • Summary

    • Improved AMR parsing performance

    • Proposed action-based NER++
      • Defined 9 actions
      • A classifier selects an action, which generates the subgraph

    • Experiments: improved recall and F1 (= improvement of NER++)

    • Future work: action coverage and improving the lemmatizer