Post on 17-Jan-2018
DBpedia - A Crystallization Point for the Web of Data
2011.10.05
Junghee - Han
Outline
- The DBpedia Project
- Understanding Linked Data
- The DBpedia Knowledge Extraction Framework
- The DBpedia Knowledge Base
- Accessing the DBpedia Knowledge Base
- Applications facilitated by DBpedia
The DBpedia Project
DBpedia is a community effort to:
- Extract structured information from Wikipedia
- Make this information available on the Web under an open license
- Interlink the DBpedia dataset with other open datasets on the Web
The DBpedia knowledge base currently describes more than 2.6 million entities, including:
- 198,000 persons
- 328,000 places
- 101,000 musical works
- 34,000 films
- 20,000 companies
The knowledge base contains 3.1 million links to external web pages and 4.9 million RDF links into other Web data sources.
Linked Data
[Figure: Web browsers and search engines accessing content over HTTP]
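RDF stands for
Resource: anything that has a URI (web pages, images, videos, etc.)
Description: a description of the attributes, characteristics, and relationships of resources
Framework: the model, language, and syntax used to describe the above
RDF has a graph model.
Reference: [KSWC2010] Linked Data That Increases the Value of Data (데이터의 가치를 높이는 Linked Data)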
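As a minimal sketch of the graph model, each RDF statement can be held as a (subject, predicate, object) triple, and a set of such triples can be searched by pattern. The `ex:` names below are hypothetical placeholders, not terms from a real vocabulary, and the `match` helper is an illustration rather than part of any RDF library.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# Hypothetical statements; the ex: names are placeholders.
TRIPLES = [
    Triple("ex:Page1", "ex:title", "Home"),
    Triple("ex:Page1", "ex:embeds", "ex:Image1"),
    Triple("ex:Image1", "ex:format", "PNG"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

# All statements about ex:Page1, i.e. the outgoing edges of that node.
page_statements = match(TRIPLES, subject="ex:Page1")
print(page_statements)
```

Leaving a field as None plays the role of a variable, which is the same idea a SPARQL triple pattern expresses.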
[Figure: example of the graph model, shown in RDF syntax and as triples]
SPARQL (SPARQL Protocol and RDF Query Language): the RDF query language created by the W3C
1. Use URIs (Uniform Resource Identifiers) as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful RDF information
4. Include RDF statements that link to other URIs so that they can discover related things
Tim Berners-Lee 2007 http://www.w3.org/DesignIssues/LinkedData.html
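Principles 2 and 3 can be sketched as an HTTP request that asks for RDF rather than HTML through the Accept header. This is a rough illustration only: it builds the request without sending it, and it assumes the server supports content negotiation for the `application/rdf+xml` media type. The helper name `rdf_request` is hypothetical.

```python
from urllib.request import Request

def rdf_request(uri, media_type="application/rdf+xml"):
    """Build (but do not send) an HTTP GET request that asks for RDF instead of HTML."""
    return Request(uri, headers={"Accept": media_type})

# The DBpedia resource URI for Berlin serves as the name being looked up.
req = rdf_request("http://dbpedia.org/resource/Berlin")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; that step is left out here so the sketch runs without network access.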
[Figures: each of the four principles illustrated with the example resource http://bibleontology.com/page/Bilhah]
[Figure: example RDF graph. The resource HongGilDong has the literals [name] "Hong, Gil Dong" and [age] 35, and the links [residences] Seoul and [researches] SemanticWeb. Seoul and SemanticWeb are connected through [sameAs], [nearbyFeatures], and [hasPhotoCollection] links to the external URIs below.]
http://dbpedia.org/resource/Semantic_Web
http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Semantic_Web
http://dbpedia.org/resource/Seoul
http://sws.geonames.org/1835848/
http://sws.geonames.org/1835848/nearby.rdf
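The fourth principle, following links to discover related things, can be sketched with the example graph above. The edge list is one plausible reading of the figure; which node each [sameAs] link starts from is an assumption, and the `reachable` helper is hypothetical.

```python
# One plausible reading of the example graph; the exact edges are an assumption.
TRIPLES = [
    ("HongGilDong", "name", "Hong, Gil Dong"),
    ("HongGilDong", "age", 35),
    ("HongGilDong", "residences", "Seoul"),
    ("HongGilDong", "researches", "SemanticWeb"),
    ("Seoul", "sameAs", "http://dbpedia.org/resource/Seoul"),
    ("Seoul", "sameAs", "http://sws.geonames.org/1835848/"),
    ("http://sws.geonames.org/1835848/", "nearbyFeatures",
     "http://sws.geonames.org/1835848/nearby.rdf"),
    ("SemanticWeb", "sameAs", "http://dbpedia.org/resource/Semantic_Web"),
    ("http://dbpedia.org/resource/Semantic_Web", "hasPhotoCollection",
     "http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Semantic_Web"),
]

def reachable(start, triples):
    """Return every value reachable from start by following links depth-first."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for subject, _, obj in triples:
            if subject == node and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen

found = reachable("HongGilDong", TRIPLES)
print(len(found))
```

Starting from a single local resource, the traversal ends up at data published by other sources, which is the point of including outgoing RDF links.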
[Figure: SPARQL and SQL]
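[Figure: national public information before and after linking]
Disconnected national public information: spatial, travel, transport, real estate, cultural heritage, bibliographic, land, environmental, product, job, and XXX information
Connected national public information: the same datasets, linked to one another
Private-sector information: portals and media, universities, others
Overseas information: DBpedia, BBC, etc.
National public information: travel, spatial, bibliographic, environmental, and XXX information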
Wikipedia Content
Title
Description
Languages
Web Links
Categorization
Domain-specific data
Images
Infoboxes
Until March 2010, the DBpedia project used a PHP-based extraction framework to extract different kinds of structured information from Wikipedia. This framework has been superseded by the new Scala-based extraction framework, and the old PHP framework is no longer maintained.
The DBpedia Knowledge Extraction Framework (1/2)
Currently 19 extractors:
- Labels (title, rdfs:label)
- Abstracts (first paragraph, rdfs:comment)
- Interlanguage links
- Images
- Redirects
- Disambiguation (dbpedia:disambiguates)
- External links (dbpedia:reference)
- Page links (dbpedia:wikilink)
- Homepages (foaf:homepage)
- Geo-coordinates
- Person data
- PND
- SKOS categories
- Page ID
- Revision ID
- Category label
- Article categories
- Mappings
- Infobox
The DBpedia Knowledge Extraction Framework (2/2)
Two workflows:
Dump-based extraction
- The Wikimedia Foundation publishes SQL dumps of all Wikipedia editions on a monthly basis.
- The dump-based workflow uses the DatabaseWikipedia page collection as the source of article texts and the N-Triples serializer as the output destination.
Live extraction
- Uses the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to receive Wikipedia updates.
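The N-Triples output of the dump-based workflow can be illustrated with a small serializer. This is a simplified sketch, not the framework's own Scala serializer: it handles only URIs and plain string literals, and it omits language tags, datatypes, and non-ASCII escaping. The function names are hypothetical.

```python
def escape_literal(text):
    """Escape the characters that must be escaped inside an N-Triples string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def to_ntriples_line(subject, predicate, obj, obj_is_uri):
    """Serialize one triple as a single N-Triples line."""
    obj_term = f"<{obj}>" if obj_is_uri else f'"{escape_literal(obj)}"'
    return f"<{subject}> <{predicate}> {obj_term} ."

line = to_ntriples_line(
    "http://dbpedia.org/resource/Berlin",
    "http://www.w3.org/2000/01/rdf-schema#label",
    "Berlin",
    obj_is_uri=False,
)
print(line)
```

One statement per line makes the format easy to stream and to split across files, which suits large dump-based extractions.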
Infobox Extraction
dbpedia:BBC p:network_name "British Broadcasting Corporation (BBC)"
dbpedia:BBC p:country dbpedia:United_Kingdom
dbpedia:BBC p:key_people dbpedia:Michael_Lyons dbpedia:Mark_Thompson
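The triples above can be reproduced by a toy extractor. The wikitext input below is a hypothetical, simplified infobox written to match those triples, and the regular expressions handle only flat `key = value` lines; the real extraction framework deals with far messier markup.

```python
import re

# Hypothetical, simplified infobox markup (an assumption, not the real BBC article).
WIKITEXT = """{{Infobox Network
| network_name = British Broadcasting Corporation (BBC)
| country = [[United Kingdom]]
| key_people = [[Michael Lyons]], [[Mark Thompson]]
}}"""

LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")
FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.+)$", re.MULTILINE)

def extract(subject, wikitext):
    """Yield (subject, property, value) triples from 'key = value' infobox lines."""
    for key, value in FIELD.findall(wikitext):
        links = LINK.findall(value)
        if links:
            # Each wiki link becomes a reference to another DBpedia resource.
            for target in links:
                yield (subject, f"p:{key}", "dbpedia:" + target.strip().replace(" ", "_"))
        else:
            # Plain text is kept as a string literal.
            yield (subject, f"p:{key}", value.strip())

triples = list(extract("dbpedia:BBC", WIKITEXT))
for triple in triples:
    print(triple)
```

Turning wiki links into resource references rather than strings is what lets infobox values connect entities to one another.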
The DBpedia Knowledge Base
Identifying Entities
Resources are assigned a URI according to the pattern http://dbpedia.org/resource/Name, where Name is taken from the URL of the source Wikipedia article, which has the form http://en.wikipedia.org/wiki/Name.
Classifying Entities
DBpedia entities are classified within four classification schemata in order to fulfill different application requirements:
- Wikipedia Categories
- YAGO
- UMBEL (Upper Mapping and Binding Exchange Layer)
- DBpedia Ontology
Describing Entities
Every DBpedia entity is described by a set of general properties.
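The URI pattern used for identifying entities can be sketched as a small mapping function. The function name `dbpedia_uri` is hypothetical, and the sketch covers only English Wikipedia article URLs of the form given above.

```python
from urllib.parse import urlsplit

WIKIPEDIA_PREFIX = "/wiki/"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"

def dbpedia_uri(wikipedia_url):
    """Map an English Wikipedia article URL to the DBpedia resource URI with the same Name."""
    parts = urlsplit(wikipedia_url)
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith(WIKIPEDIA_PREFIX):
        raise ValueError(f"not an English Wikipedia article URL: {wikipedia_url}")
    name = parts.path[len(WIKIPEDIA_PREFIX):]
    if not name:
        raise ValueError("missing article name")
    return DBPEDIA_RESOURCE + name

print(dbpedia_uri("http://en.wikipedia.org/wiki/Berlin"))
```

Reusing the article name keeps the identifiers predictable: anyone who knows the Wikipedia URL can derive the DBpedia URI without a lookup.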
Accessing the DBpedia Knowledge Base over the Web
- Linked Data: DBpedia resource identifiers (e.g. http://dbpedia.org/resource/Berlin)
- SPARQL Endpoint: http://dbpedia.org/sparql
- RDF Dumps: http://wiki.dbpedia.org/Downloads32
- Lookup Index: http://lookup.dbpedia.org/api/search.asmx
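Access through the SPARQL endpoint can be sketched by building a query URL. Following the SPARQL protocol, the query text is sent in the `query` parameter of a GET request. The query itself is an illustrative assumption (it asks for the rdfs:label values of the Berlin resource), and the sketch only builds the URL without contacting the endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query: labels of the Berlin resource.
QUERY = """
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 5
"""

def sparql_url(endpoint, query):
    """Build a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})

url = sparql_url(ENDPOINT, QUERY)
print(url)
```

Fetching `url` with any HTTP client would return the result set; the response format depends on the endpoint and on the Accept header sent with the request.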
Interlinked Web Content
DBpedia currently contains 4.9 million outgoing RDF links.
Applications facilitated by DBpedia (1/3)
Browsing and Exploration: DBpedia Mobile
Applications facilitated by DBpedia (2/3)
Querying and Search: DBpedia Query Builder
http://querybuilder.dbpedia.org
Applications facilitated by DBpedia (3/3)
Querying and Search: Relationship Finder
Conclusions and Future Work
Conclusion
The resulting DBpedia knowledge base covers a wide range of different domains and connects entities across these domains.
Future Work
Cross-language infobox knowledge fusion
- Derive an astonishingly detailed multi-domain knowledge base
Wikipedia article augmentation
- Develop a MediaWiki extension that augments Wikipedia articles with additional information as well as media items (pictures, audio) from interlinked data sources
Wikipedia consistency checking
- Improve the overall quality of Wikipedia