Segment-Level Sequence Tagging using Gated Recursive Semi-Markov Conditional Random Fields
Jingwei Zhuo¹,², Yong Cao², Jun Zhu¹, Bo Zhang¹, Zaiqing Nie²
¹Dept. of Comp. Sci. & Tech., Tsinghua University, Beijing
²Microsoft Research, Beijing
June 16, 2016
Outline
• Motivations
• Our Model
• Experiments
• Conclusion and Future Work
Motivations
• Sequence tagging problems
  ◦ Given a sentence $x = (x_1, \ldots, x_T)$, assign each word or each set of words (i.e., segment) a tag.
  ◦ E.g., [NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only 1.8 billion] [PP in] [NP September]
• Word-level modeling: Conditional Random Fields (CRFs)
  ◦ Represent tags as $y = (y_1, \ldots, y_T)$

$$p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{t=1}^{T} F(y_t, x) + A(y_{t-1}, y_t) \right), \qquad (1)$$

$$F(y_t, x) = v_{y_t}^{\top} f(y_t, x) + b_{y_t}, \qquad (2)$$

  ◦ $F(y_t, x)$: tag score; $f(y_t, x)$: features.
  ◦ $f(y_t, x)$ can be hand-crafted or automatically extracted (neural networks).
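To make Eq. (1) concrete, the following is a minimal pure-Python sketch of how the normaliser Z(x) of a linear-chain CRF can be computed with the forward algorithm and checked against brute-force enumeration. The function names and the nested-list layout (`tag_scores[t][y]` for F, `transitions[a][b]` for A) are illustrative assumptions, as is the choice to apply no transition term at the first position; the slides do not specify how the boundary is handled.

```python
import itertools
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def crf_path_score(tags, tag_scores, transitions):
    """Unnormalised score of one tag sequence, i.e. the exponent in Eq. (1).

    tag_scores[t][y] plays the role of F(y_t = y, x); transitions[a][b] of A(a, b).
    Assumption: the first position carries no transition term (no start tag).
    """
    score = tag_scores[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + tag_scores[t][tags[t]]
    return score

def crf_log_partition(tag_scores, transitions):
    """log Z(x) via the forward algorithm, O(T * K^2) for K tags."""
    num_tags = len(transitions)
    alpha = list(tag_scores[0])
    for t in range(1, len(tag_scores)):
        alpha = [
            tag_scores[t][b]
            + log_sum_exp([alpha[a] + transitions[a][b] for a in range(num_tags)])
            for b in range(num_tags)
        ]
    return log_sum_exp(alpha)

def crf_log_prob(tags, tag_scores, transitions):
    """log p(y | x) as in Eq. (1)."""
    return crf_path_score(tags, tag_scores, transitions) - crf_log_partition(
        tag_scores, transitions
    )

def crf_log_partition_brute_force(tag_scores, transitions):
    """Reference value of log Z(x) by enumerating all K^T tag sequences."""
    num_tags = len(transitions)
    return log_sum_exp([
        crf_path_score(y, tag_scores, transitions)
        for y in itertools.product(range(num_tags), repeat=len(tag_scores))
    ])
```

The brute-force variant is only practical for tiny inputs; it is included because comparing it with the forward recursion is a quick way to confirm that the probabilities in Eq. (1) sum to one.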
Motivations
• Segment-level modeling: Semi-Markov CRFs (Semi-CRFs)
  ◦ Represent tags as $s = (s_1, \ldots, s_{|s|})$, where each segment $s_j = \langle h_j, d_j, y_j \rangle$ has start position $h_j$, length $d_j$ and tag $y_j$

$$p(s \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{j=1}^{|s|} F(s_j, x) + A(y_{j-1}, y_j) \right), \qquad (3)$$

$$F(s_j, x) = v_{y_j}^{\top} f(s_j, x) + b_{y_j}, \qquad (4)$$

  ◦ Pros: models segments directly.
  ◦ Cons: features are hard to design; no automatic feature extractor.
• Can we combine the advantages of Semi-CRFs and neural networks?
  ◦ Fully leverage segment-level information.
  ◦ Extract features for Semi-CRFs automatically.
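As an illustration of Eq. (3), here is a hedged pure-Python sketch of the semi-Markov forward recursion that sums over all segmentations with a bounded segment length. The callable `seg_score(h, d, y)` stands in for F(s_j, x); its signature, the 0-based start positions, and the absence of a transition term for the first segment are assumptions made for the example, not details taken from the slides.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def semicrf_log_partition(seg_score, transitions, num_words, max_len):
    """log Z(x) over all segmentations whose segments span at most max_len words.

    seg_score(h, d, y): score of a segment starting at word h (0-based),
    spanning d words, with tag y. transitions[a][b] plays the role of A(a, b).
    """
    num_tags = len(transitions)
    # alpha[t][y]: log-sum of the scores of all segmentations of words [0, t)
    # whose last segment carries tag y.
    alpha = [None] * (num_words + 1)
    for t in range(1, num_words + 1):
        alpha[t] = []
        for y in range(num_tags):
            terms = []
            for d in range(1, min(max_len, t) + 1):
                h = t - d
                if h == 0:
                    # First segment: no transition term (assumption).
                    terms.append(seg_score(h, d, y))
                else:
                    prev = log_sum_exp(
                        [alpha[h][a] + transitions[a][y] for a in range(num_tags)]
                    )
                    terms.append(prev + seg_score(h, d, y))
            alpha[t].append(log_sum_exp(terms))
    return log_sum_exp(alpha[num_words])

def all_segmentations(num_words, max_len, num_tags, start=0):
    """Yield every tagged segmentation as a list of (h, d, y) triples."""
    if start == num_words:
        yield []
        return
    for d in range(1, min(max_len, num_words - start) + 1):
        for y in range(num_tags):
            for rest in all_segmentations(num_words, max_len, num_tags, start + d):
                yield [(start, d, y)] + rest

def segmentation_score(segments, seg_score, transitions):
    """Exponent of Eq. (3) for one segmentation."""
    total, prev_tag = 0.0, None
    for h, d, y in segments:
        total += seg_score(h, d, y)
        if prev_tag is not None:
            total += transitions[prev_tag][y]
        prev_tag = y
    return total
```

Under this 0-based convention, the chunk [NP the current account deficit] from the earlier example sentence would be the triple (2, 4, NP). The recursion costs O(T · L · K²) for T words, maximum length L and K tags, whereas the enumerator is exponential and only meant for checking small cases.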
Our Model
• Key idea: extract features for all the segments in a single propagation using gated recursive convolutional neural networks (grConvs) (Cho et al., 2014)
Our Model
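• Local building block (Cho et al., 2014)

$$z^{(d)}_k = \theta_L \circ z^{(d-1)}_k + \theta_R \circ z^{(d-1)}_{k+1} + \theta_M \circ \hat{z}^{(d)}_k, \qquad (5)$$

$$\hat{z}^{(d)}_k = g\left( W_L z^{(d-1)}_k + W_R z^{(d-1)}_{k+1} + b_W \right), \qquad (6)$$

  ◦ Intuition: sources for the construction of one segment $z^{(d)}_k$
    • prefix $z^{(d-1)}_k$
    • suffix $z^{(d-1)}_{k+1}$
    • interaction of both, i.e., the candidate $\hat{z}^{(d)}_k$ of Eq. (6)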
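The pyramid built from Eq. (5) and Eq. (6) can be sketched in pure Python as follows. This is a toy illustration under several assumptions that the slides do not state: the nonlinearity g is taken to be tanh, the gates θ_L, θ_R, θ_M are produced by a per-dimension softmax over logits from hypothetical matrices `GL`, `GR` and bias `bG`, and the same parameters are shared across all levels. The helper `init_params` and all parameter names are made up for the example.

```python
import math
import random

def init_params(dim, seed=0):
    """Random toy parameters; names and shapes are illustrative assumptions."""
    rng = random.Random(seed)
    mat = lambda rows: [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(rows)]
    return {"WL": mat(dim), "WR": mat(dim), "bW": [0.0] * dim,
            "GL": mat(3 * dim), "GR": mat(3 * dim), "bG": [0.0] * (3 * dim)}

def matvec(W, v):
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax3(a, b, c):
    """Softmax over three logits, returned as a tuple that sums to one."""
    m = max(a, b, c)
    ea, eb, ec = math.exp(a - m), math.exp(b - m), math.exp(c - m)
    s = ea + eb + ec
    return ea / s, eb / s, ec / s

def combine(left, right, p):
    """One grConv unit: merge a prefix and a suffix feature into their parent."""
    dim = len(left)
    # Eq. (6): candidate built from both children (g = tanh is an assumption).
    pre = [l + r + b for l, r, b in
           zip(matvec(p["WL"], left), matvec(p["WR"], right), p["bW"])]
    cand = [math.tanh(v) for v in pre]
    # Gate logits; this parameterisation is an assumption, not from the slides.
    g = [l + r + b for l, r, b in
         zip(matvec(p["GL"], left), matvec(p["GR"], right), p["bG"])]
    out = []
    for i in range(dim):
        t_left, t_right, t_mid = softmax3(g[i], g[dim + i], g[2 * dim + i])
        # Eq. (5): gated mix of prefix, suffix and candidate.
        out.append(t_left * left[i] + t_right * right[i] + t_mid * cand[i])
    return out

def grconv_pyramid(words, p, max_len):
    """Return levels[d][k]: feature of the d-word segment starting at word k (0-based)."""
    levels = {1: [list(w) for w in words]}
    for d in range(2, min(max_len, len(words)) + 1):
        prev = levels[d - 1]
        levels[d] = [combine(prev[k], prev[k + 1], p) for k in range(len(prev) - 1)]
    return levels

# Toy usage: 6 "words" with 3-dimensional inputs, segments of up to 4 words.
params = init_params(3, seed=1)
toy_rng = random.Random(2)
toy_words = [[toy_rng.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
toy_levels = grconv_pyramid(toy_words, params, max_len=4)
```

Level d of the pyramid holds one vector per segment of length d, so a single bottom-up pass yields a feature for every segment up to the maximum length, which is the property the key idea relies on.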
Our Model
• Connection to Semi-CRFs
  ◦ For segment $s_j = \langle h_j, d_j, y_j \rangle$,
    • feature $f(s_j, x) = z^{(d_j)}_{h_j}$,
    • tag score $F(s_j, x) = {v^{(d_j)}_{y_j}}^{\top} f(s_j, x) + b_{y_j}$.
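A small sketch of how the tag score F(s_j, x) might be tabulated for every candidate segment once segment features are available. The data layout is an assumption for illustration: `levels[d][h]` holds the feature of the d-word segment starting at word h (0-based), `V[d][y]` is a hypothetical weight vector for tag y and length d, and `b[y]` is the tag bias.

```python
def segment_tag_scores(levels, V, b):
    """Return scores[(h, d, y)] = v^{(d)}_y . z^{(d)}_h + b_y for every segment.

    levels[d][h]: feature vector of the d-word segment starting at word h.
    V[d][y]: weight vector for tag y at segment length d (hypothetical layout).
    b[y]: bias for tag y.
    """
    scores = {}
    for d, feats in levels.items():
        for h, z in enumerate(feats):
            for y, bias in enumerate(b):
                scores[(h, d, y)] = sum(v * x for v, x in zip(V[d][y], z)) + bias
    return scores

# Toy usage with 2 words, 2-dimensional features and 2 tags (made-up numbers).
toy_levels = {1: [[1.0, 0.0], [0.0, 1.0]], 2: [[0.5, 0.5]]}
toy_V = {1: [[1.0, 2.0], [0.0, -1.0]], 2: [[2.0, 2.0], [1.0, 0.0]]}
toy_b = [0.1, -0.1]
toy_scores = segment_tag_scores(toy_levels, toy_V, toy_b)
```

Keeping a separate weight vector per segment length mirrors the superscript $(d_j)$ on $v$; the resulting table is exactly what a Semi-CRF inference routine needs as its segment scores.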
Experiments
• Settings
  ◦ Datasets
    • CONLL-2000 dataset (text chunking) and CONLL-2003 dataset (named entity recognition, NER)
  ◦ Compared models
    • Neural models: Senna (Collobert et al., 2011) and BI-LSTM-CRF (Huang et al., 2015)
    • Non-neural models: JESS-CM (Suzuki and Isozaki, 2008), etc.
  ◦ Hyperparameters

| Hyperparameters | CONLL 2000 | CONLL 2003 |
|---|---|---|
| Segment length | 15 | 10 |
| Dropout | 0.3 | 0.3 |
| Learning rate | 0.3 | 0.3 |
| Epochs | 15 | 20 |
| Minibatches | 10 | 10 |
| Window width | 2 | 2 |
Experiments
• Comparison with the state-of-the-art.
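| Group | Models | CONLL 2000 | CONLL 2003 |
|---|---|---|---|
| Ours | grSemi-CRF (Random embeddings) | 93.92 | 84.66 |
| Ours | grSemi-CRF (Senna embeddings) | 95.01 | 89.44 (90.87) |
| Neural Models | Senna (Random embeddings) | 90.33 | 81.47 |
| Neural Models | Senna (Senna embeddings) | 94.32 | 88.67 (89.59) |
| Neural Models | BI-LSTM-CRF (Random embeddings) | 94.13 | 84.26 |
| Neural Models | BI-LSTM-CRF (Senna embeddings) | 94.46 | 88.83 (90.10) |
| Non-Neural Models | JESS-CM (Suzuki and Isozaki, 2008), 15M | 94.67 | 89.36 |
| Non-Neural Models | JESS-CM (Suzuki and Isozaki, 2008), 1B | 95.15 | 89.92 |
| Non-Neural Models | Ratinov and Roth (2009) | – | 90.57 |
| Non-Neural Models | Lin and Wu (2009) | – | 90.90 |
| Non-Neural Models | Passos et al. (2014) | – | 90.90 |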
Experiments
• Impact of external information
• Embeddings, Brown clusters and gazetteers

| Input Features | CONLL 2000 | CONLL 2003 |
|---|---|---|
| None | 93.92 | 84.66 |
| Brown(NYT) | 94.18 | 86.57 |
| Brown(RCV1) | 94.05 | 88.22 |
| Emb | 94.73 | 88.12 |
| Gaz | – | 87.94 |
| Emb + Brown(NYT) | 95.01 | 88.86 |
| Emb + Brown(RCV1) | 94.87 | 89.44 |
| Emb + Gaz | – | 89.88 |
| Brown(NYT) + Gaz | – | 88.69 |
| Brown(RCV1) + Gaz | – | 89.82 |
| All(NYT) | – | 90.00 |
| All(RCV1) | – | 90.87 |
Experiments
• Visualization of segment-level features learnt on the CONLL 2003 dataset.

| Queried Segments | Filippo Inzaghi | AC Milan | Central African Republic | Asian Cup |
|---|---|---|---|---|
| Nearest Neighbour Results | Pierluigi Casiraghi | FC Hansa | From Central African Republic | Scottish Cup |
| | Fabrizio Ravanelli | SC Freiburg | Southeast Asian Nations | European Cup |
| | Bogdan Stelea | FC Cologne | In Central African Republic | African Cup |
| | Francesco Totti | Aston Villa | The Central African Republic | World Cup |
| | Predrag Mijatovic | Red Cross | South African Breweries | UEFA Cup |
| | Fausto Pizzi | Yasuto Honda | Of Southeast Asian Nations | Europoean Cup |
| | Pierre Laigle | NAC Breda | New South Wales | Asian Games |
| | Pavel Nedved | La Plagne | Central African Republic . | Europa Cup |
| | Anghel Iordanescu | Sporting Gijon | Papua New Guinea | National League |
| | Zeljko Petrovic | NEC Nijmegen | Central Africa | F.A. Cup |
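A nearest-neighbour lookup of this kind could be sketched as below. The slides do not say which similarity measure was used, so cosine similarity is an assumption here, and the function names and the `(text, feature_vector)` input layout are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_segments(query, candidates, k=10):
    """Return the texts of the k candidates most similar to the query feature.

    candidates: list of (text, feature_vector) pairs, e.g. one entry per
    segment seen in the corpus together with its learnt segment-level feature.
    """
    ranked = sorted(candidates, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

In practice the candidate list would hold the learnt feature of every segment in the dataset, and the query would be the feature of a segment such as the ones in the first row of the table.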
Conclusions and Future Work
• We proposed grSemi-CRF, a neural network-based model for segment-level sequence tagging that
  ◦ models segments explicitly as Semi-CRFs,
  ◦ extracts segment-level features automatically,
  ◦ and achieves high performance compared to CRF models (e.g., 95.01 vs. 94.46 for BI-LSTM-CRF on CONLL 2000 with Senna embeddings).
• Future Work
  ◦ Exploring better ways to utilize unlabelled data, e.g., learning segment-level embeddings in an unsupervised way
  ◦ Extending to other sequence tagging tasks
Questions and Answers
• Motivations
  ◦ Segment-level sequence tagging
  ◦ CRFs and Semi-CRFs
• Our Model
  ◦ Feature extractor
  ◦ Connection to Semi-CRFs
• Experiments
  ◦ Comparison with state-of-the-art
  ◦ Impact of external information
  ◦ Visualization of learnt features
• Conclusion and Future Work