NAIST Big Data Symposium - Information Science - Prof. Matsumoto
![Page 1: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/1.jpg)
Scientific Paper Analysis
Yuji Matsumoto
Computational Linguistics Lab
Graduate School of Information Science
March 6, 2015
Big Data Symposium at NAIST
![Page 2: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/2.jpg)
Large Scale Text Data
- Data on the Web
  - SNS: twitter, blog
  - Wikipedia
  - News, …
- Scientific/Technical documents
  - Scientific Papers
  - Legal documents: law reports, casebooks
  - Patent documents
![Page 3: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/3.jpg)
Knowledge Bases
- Constructed manually
  - WordNet, Domain ontologies
- Constructed by community (Wikipedia)
  - Freebase
- Constructed automatically
  - NELL: Never-Ending Language Learning
  - MindNet
![Page 4: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/4.jpg)
Applications
- Knowledge Graph (Google)
  - Knowledge extracted from Freebase, Wikipedia, …
- Watson (IBM)
  - Extracted from Wikipedia
  - Deep QA
![Page 5: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/5.jpg)
Structures of KB
- Linked structure: entities and relations
  - PDF
- Entity: person, country, products, etc.
- Relation:
  - born_in(Barack Obama, Honolulu)
  - locates_in(Honolulu, Hawaii)
  - state_of(Hawaii, USA)
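The linked structure of entities and relations can be sketched as a set of (relation, subject, object) triples. This is a minimal illustration only, not how any particular knowledge base stores its data; the `Triple` type and the `objects_of` and `follow` helpers are hypothetical names introduced for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One relation instance, e.g. born_in(Barack Obama, Honolulu)."""
    relation: str
    subject: str
    obj: str


# The three relations listed on the slide.
KB = {
    Triple("born_in", "Barack Obama", "Honolulu"),
    Triple("locates_in", "Honolulu", "Hawaii"),
    Triple("state_of", "Hawaii", "USA"),
}


def objects_of(kb, relation, subject):
    """Return every object linked to `subject` by `relation`."""
    return {t.obj for t in kb if t.relation == relation and t.subject == subject}


def follow(kb, start, relations):
    """Follow a chain of relations from `start`, one hop per relation name."""
    frontier = {start}
    for relation in relations:
        frontier = {o for s in frontier for o in objects_of(kb, relation, s)}
    return frontier


# Chain the three links: person -> city -> state -> country.
print(follow(KB, "Barack Obama", ["born_in", "locates_in", "state_of"]))
```

Because the entities are shared between triples, following the links from "Barack Obama" reaches "USA" even though no single triple states that connection directly.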
![Page 6: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/6.jpg)
Natural Language Analysis
- How text is analyzed
  - Word segmentation, Part-of-speech tagging
  - Named entity recognition
  - Syntactic parsing
  - Semantic disambiguation
  - Semantic parsing
  - Discourse analysis
![Page 7: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/7.jpg)
Linked Knowledge Extraction
- Named entity recognition
  - Extraction of entities, concepts
- Syntactic dependency parsing
  - direct dependency between entities
- Semantic parsing
  - predicate argument structure analysis
  - subject-predicate-object, relation between entities
- Discourse analysis
  - co-reference: the same entity by different mentions
  - relation between facts: temporal, causal
![Page 8: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/8.jpg)
Semantic Parsing: Example
- We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
- TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
- TPA induction increases the binding of AP-1 factors to this element.
[Figure: the three sentences (S1, S2, S3) annotated with Cause and Theme arcs]
Katsumasa Yoshikawa, Sebastian Riedel, Tsutomu Hirao, Masayuki Asahara, Yuji Matsumoto, "Coreference Based Event-Argument Relation Extraction on Biomedical Text," Journal of Biomedical Semantics, Volume 2, Supplement 5, S6, October 2011
![Page 9: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/9.jpg)
Co-reference analysis
"this element" in S2 is coreferent to "a regulatory element" in S1.
- We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
- TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
- TPA induction increases the binding of AP-1 factors to this element.
[Figure: the three sentences (S1, S2, S3) annotated with Cause and Theme arcs, plus a Corefer arc]
![Page 10: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/10.jpg)
Information conflation
- The true argument (Theme) of binding is "a regulatory element", and "this element" is just an anaphor of it.
- Transitivity enables us to conflate the information.
Sentences:
- We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
- TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
- TPA induction increases the binding of AP-1 factors to this element.
[Figure: the three sentences (S1, S2, S3) annotated with Cause and Theme arcs; arc (A) Theme, arc (B) Corefer, and derived arc (C) Theme]
(A) Theme & (B) Corefer => (C) Theme
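The rule (A) Theme & (B) Corefer => (C) Theme can be sketched as a small closure over labeled edges. This only illustrates the transitivity step and is not the model from the cited paper; the `Edge` type, the `conflate` function, and the choice to point Corefer edges from anaphor to antecedent are assumptions made for this example.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    label: str  # "Theme", "Cause", or "Corefer"
    head: str   # event trigger, or the anaphor for a Corefer edge
    tail: str   # argument mention, or the antecedent for a Corefer edge


def conflate(edges):
    """Add label(head, antecedent) whenever label(head, anaphor) and
    Corefer(anaphor, antecedent) are both present; repeat until stable."""
    result = set(edges)
    changed = True
    while changed:
        changed = False
        corefs = [e for e in result if e.label == "Corefer"]
        args = [e for e in result if e.label != "Corefer"]
        for arg in args:
            for coref in corefs:
                if arg.tail == coref.head:
                    new = Edge(arg.label, arg.head, coref.tail)
                    if new not in result:
                        result.add(new)
                        changed = True
    return result


edges = {
    Edge("Theme", "binding", "this element"),                 # (A)
    Edge("Corefer", "this element", "a regulatory element"),  # (B)
}
derived = conflate(edges) - edges  # holds only the derived edge (C)
print(derived)
```

Repeating the loop until nothing changes lets the same rule also pass an argument along a chain of several coreferent mentions.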
![Page 11: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/11.jpg)
Discourse analysis
- We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
- TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
- TPA induction increases the binding of AP-1 factors to this element.
[Figure: the three sentences (S1, S2, S3) linked by Cause, Theme, and Corefer arcs]
![Page 12: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/12.jpg)
NLP Technologies for Document Analysis
[Figure: diagram with the following components]
- Part-of-Speech (POS) tagging
- NE chunking
- Syntactic parsing
- Predicate-argument structure analysis
- Coreference resolution
- Relation extraction
- Semantic/context processing
- Machine Learning / Knowledge Acquisition
- Document Structure Analysis
- Knowledge Bases (Domain Ontologies)
![Page 13: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/13.jpg)
What we can do with Scientific Papers
- Knowledge extraction (domain knowledge)
- New fact discovery
- Content-aware paper search
- Summarization
  - Automatic generation of abstracts
  - Keyword generation
  - Survey generation
- Recommendation of related papers
- Similar article/case search
  - Structural similarity: papers, law reports, patents
![Page 14: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/14.jpg)
Example: Structured Abstract Generation
![Page 15: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/15.jpg)
Related Project
- Big Mechanism (2014.07-, by DARPA)
  - http://www.darpa.mil/Our_Work/I2O/Programs/Big_Mechanism.aspx
  - The Big Mechanism program aims to develop technology to read research abstracts and papers to extract pieces of causal mechanisms, assemble these pieces into more complete causal models, and reason over these models to produce explanations. The domain of the program is cancer biology with an emphasis on signaling pathways.
![Page 16: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/16.jpg)
Architecture of Big Mechanism
from Paul Cohen, “DARPA’s Big Mechanism Program”
![Page 17: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/17.jpg)
Deep Language Analysis
- Complex sentence structure analysis
- Robust Semantic Parsing
- Discourse Analysis
  - Co-reference
  - Causal / Temporal relation
- Representation and Reasoning
  - Explanation / Anticipation
- Confidence/credibility (of extracted facts / what is written in documents)
![Page 18: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/18.jpg)
Language Processing and Document Analysis Layers
[Figure: layered diagram with the following labels]
- Large-scale Text Data
- POS tags, phrase/NE chunking
- syntactic dependency structure
- argument structure, coreference
- rhetorical / document structure
- relations (temporal, causal, entailment)
- Document Analysis (Document Understanding, Similarity-based Search, Knowledge Discovery/Assembling)
- Side labels: Knowledge Base, Ontology
![Page 19: NAISTビッグデータシンポジウム - 情報 松本先生](https://reader033.fdocument.pub/reader033/viewer/2022051006/58eefc9c1a28ab25588b46bf/html5/thumbnails/19.jpg)
We may be able to do more
- Research Trend Survey
- Research (paper) Evaluation
  - Content-aware citation analysis
- Innovation Foresight
  - E.g.: Foresight and Understanding from Scientific Exposition (FUSE) Project
    http://www.iarpa.gov/index.php/research-programs/fuse
- Collaboration with people in application areas who need to read/understand documents