China 2009 http://www.larkc.eu/ 1
An Introduction to the Semantic Web and Ontology Technology
Zhisheng Huang (黄智生)
Vrije University Amsterdam
The Netherlands

Lecture Series on the Semantic Web and Ontology Technology
• Part 1: Introduction. Wednesday, 9 September 2009, 14:00-15:30
• Part 2: Logical Foundations. Saturday, 12 September 2009, 10:00-11:30
• Part 3: Special Topics. Sunday, 13 September 2009, 14:00-15:30

WWW: Its Impact and Vision

Starting from Google

Existing Problems

Can We Do Better?
• Semantics-based search
• Concept combination specification
• Domain-specific search
• Approximate search
• Search agents

The Semantic Web
• Core idea: give information on the Web a well-defined meaning, that is, semantics.
"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in co-operation." [Berners-Lee et al., 2001]

What Is Semantics?
• Frege (1848-1925): reference and sense
• Syntax, semantics, pragmatics
• Denotational semantics vs. operational semantics
Main features
• Denotation
• Uniqueness
• Relatedness

What the Semantic Web Wants to Do
• Make content automatically processable by machines
• Make content understandable by machines
Content is machine-understandable if it is bound to some formal description of itself (i.e. metadata).

HTML Markup
……
<h2>Zhisheng Huang</h2>
<b>Affiliation</b>: Department of Computer Science<br>
Faculty of Sciences<br>
Vrije University Amsterdam
<p><b>Email</b>: huang @ cs.vu.nl<br>
<b>Phone</b>: 31-20-4447740 (office)
……
</html>

XML Annotations
<researcher>
  <name>Zhisheng Huang</name>
  <affiliation>
    <department>Department of Computer Science</department>
    <faculty>Faculty of Sciences</faculty>
    <university>Vrije University Amsterdam</university>
  </affiliation>
  <email>huang @ cs.vu.nl</email>
  <phone id="office">(31)-20-4447740</phone>
  ……
</researcher>
</html>

Data Structures
• Structured data: databases
• Semi-structured data: HTML, XML, BibTeX
• Unstructured data: text

XML Representation of a Relational Database
AI group
member id   name   phone
001         John   1234567
002         Mary   7654321
…           …      …

<group name="AI">
  <member id="001">
    <name>John</name>
    <phone>1234567</phone>
  </member>
  <member id="002">
    <name>Mary</name>
    <phone>7654321</phone>
  </member>
  …..
</group>
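
The mapping from table rows to XML elements can be sketched in a few lines of Python. This is only an illustration using the standard library module xml.etree.ElementTree; the function name rows_to_xml and the dictionary layout of the rows are assumptions made for this sketch, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Rows of the member table for the AI group, as shown on the slide.
rows = [
    {"id": "001", "name": "John", "phone": "1234567"},
    {"id": "002", "name": "Mary", "phone": "7654321"},
]

def rows_to_xml(group_name, rows):
    """Build a <group> element with one <member> child per table row.

    Hypothetical helper: the key column becomes the id attribute,
    the other columns become child elements.
    """
    group = ET.Element("group", {"name": group_name})
    for row in rows:
        member = ET.SubElement(group, "member", {"id": row["id"]})
        ET.SubElement(member, "name").text = row["name"]
        ET.SubElement(member, "phone").text = row["phone"]
    return group

group = rows_to_xml("AI", rows)
xml_text = ET.tostring(group, encoding="unicode")
print(xml_text)
```

Each row becomes one member element, so the XML carries the same information as the table while also naming every field explicitly.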

Document Type Definition (DTD)
<!DOCTYPE researcher [
  <!ELEMENT researcher (name, affiliation, email, phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ATTLIST phone id CDATA #REQUIRED>
  <!ELEMENT affiliation (department, faculty, university)>
  …
]>

XML Schema
• The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.
Why XML Schemas
• XML Schemas are extensible to future additions
• XML Schemas are richer and more useful than DTDs
• XML Schemas are written in XML
• XML Schemas support data types
• XML Schemas support namespaces

Name Conflicts
• Element names in XML are not fixed, so a name conflict often occurs when two different documents use the same name for two different types of elements.
• If two such XML documents were merged, there would be an element name conflict, because both documents contain an element with the same name but different content and definition.

XML Namespaces
• Namespaces are used to solve name conflicts
• General form: xmlns:namespace-prefix="namespace"
• Example: xmlns:xsd="http://www.w3.org/2001/XMLSchema"
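
A small Python sketch shows how namespaces keep two elements with the same local name apart. The two namespace URIs under example.org and the element contents are made up for this illustration; only the xmlns:prefix="namespace" mechanism comes from the slide.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define an element called "table".
# The namespace URIs below are hypothetical placeholders.
doc = """
<root xmlns:h="http://example.org/html-table"
      xmlns:f="http://example.org/furniture">
  <h:table><h:td>Apples</h:td></h:table>
  <f:table><f:name>Coffee table</f:name></f:table>
</root>
"""

root = ET.fromstring(doc)
ns = {
    "h": "http://example.org/html-table",
    "f": "http://example.org/furniture",
}

# The prefix selects the namespace, so the two "table" elements never clash.
html_tables = root.findall("h:table", ns)
furniture_tables = root.findall("f:table", ns)

print(html_tables[0].tag)
print(furniture_tables[0].tag)
```

ElementTree expands each prefixed name to the form {namespace}localname, which is why the two table elements end up with different tags.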

XML Schema
<xsd:element name="researcher">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="affiliation" type="affil" minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="phone" type="xsd:string"/>
      <xsd:element name="email" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:complexType name="affil">
  <xsd:sequence>
    <xsd:element name="department" type="xsd:string"/>
    <xsd:element name="faculty" type="xsd:string"/>
    <xsd:element name="university" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>

Resource Description Framework (RDF)
• Metadata is machine-understandable information about web resources, or about anything that has a URI. It is represented as a set of independent assertions.
• Triple: T(subject, attribute, value)
[Figure: the resource http://wasp.cs.vu.nl/sekt/dig/dig.pdf with two Creator arcs, one to Zhisheng and one to Cees]
<rdf:Description rdf:about="http://wasp.cs.vu.nl/sekt/dig/dig.pdf">
  <dc:Creator rdf:resource="http://www.cs.vu.nl/~huang"/>
  <dc:Creator rdf:resource="mailto:[email protected]"/>
</rdf:Description>
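
To make the "set of independent assertions" concrete, the sketch below reads an RDF/XML description with Python's standard xml.etree.ElementTree and lists its triples. It is not a full RDF parser and handles only this simple shape. The second creator URI (http://example.org/cees) is a hypothetical placeholder, and the property is written as dc:creator, the lowercase name used in the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# The second creator URI is a made-up placeholder for this sketch.
rdf_xml = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://wasp.cs.vu.nl/sekt/dig/dig.pdf">
    <dc:creator rdf:resource="http://www.cs.vu.nl/~huang"/>
    <dc:creator rdf:resource="http://example.org/cees"/>
  </rdf:Description>
</rdf:RDF>
"""

def triples(xml_text):
    """Return (subject, predicate, object) tuples for each rdf:Description."""
    root = ET.fromstring(xml_text)
    result = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # The object is either a resource reference or literal text.
            obj = prop.get(f"{{{RDF}}}resource")
            if obj is None:
                obj = (prop.text or "").strip()
            result.append((subject, prop.tag, obj))
    return result

result = triples(rdf_xml)
for t in result:
    print(t)
```

Each printed tuple is one assertion about the document, and any one of them can be removed or added without affecting the others.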
RDF: Dublin Core
• The Dublin Core provides properties for describing network objects, suitable for use by network search engines.
• The Dublin Core is a set of predefined properties for describing documents.
• The first Dublin Core properties were defined at the Metadata Workshop in Dublin, Ohio in 1995 and are currently maintained by the Dublin Core Metadata Initiative.
Dublin Core Metadata Initiative
• The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models.
• http://dublincore.org/

Annotating Metadata
<rdf:Description rdf:about="…dc-rdf/">
  <dc:title>Guidance on expressing the Dublin Core within the Resource Description Framework (RDF)</dc:title>
  <dc:creator>Eric Miller</dc:creator>
  <dc:creator>Paul Miller</dc:creator>
  <dc:creator>Dan Brickley</dc:creator>
  <dc:subject>Dublin Core; RDF; XML</dc:subject>
  <dc:publisher>Dublin Core Metadata Initiative</dc:publisher>
  <dc:contributor>Dublin Core Data Model Working Group</dc:contributor>
  <dc:date>1999-07-01</dc:date>
  <dc:format>text/html</dc:format>
  <dc:language>en</dc:language>
</rdf:Description>

RDF Schema (RDFS)
• RDFS defines a vocabulary for RDF
• It organizes this vocabulary in a typed hierarchy:
  • Class, subClassOf, type
  • Property, subPropertyOf
  • domain, range

RDFS
[Figure: RDFS example. Classes Person, PhDStudent and Professor, with PhDStudent and Professor each subClassOf Person; instances Zeng, Yi and Prof. Zhong attached by type; property hasSuperVisor with a domain and a range.]
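
The RDFS vocabulary above (type, subClassOf, domain, range) already supports simple inference. The sketch below applies three RDFS-style rules to a handful of triples until nothing new can be derived. The identifiers ZengYi and ProfZhong, the supervision triple, and the choice of PhDStudent as domain and Professor as range are assumptions read off the figure labels, and the function is a toy closure, not a complete RDFS reasoner.

```python
# Schema and instance triples based on the labels in the RDFS figure.
# The supervision triple and the domain/range targets are assumed.
triples = {
    ("PhDStudent", "subClassOf", "Person"),
    ("Professor", "subClassOf", "Person"),
    ("hasSuperVisor", "domain", "PhDStudent"),
    ("hasSuperVisor", "range", "Professor"),
    ("ZengYi", "hasSuperVisor", "ProfZhong"),
}

def rdfs_closure(triples):
    """Apply domain, range and subclass rules until a fixed point is reached."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # (p domain C) and (s p o) give (s type C)
                if p2 == "domain" and s2 == p:
                    new.add((s, "type", o2))
                # (p range C) and (s p o) give (o type C)
                if p2 == "range" and s2 == p:
                    new.add((o, "type", o2))
                # (s type C) and (C subClassOf D) give (s type D)
                if p == "type" and p2 == "subClassOf" and s2 == o:
                    new.add((s, "type", o2))
        if new <= facts:
            return facts
        facts |= new

closure = rdfs_closure(triples)
for fact in sorted(closure - triples):
    print(fact)
```

From the single supervision triple, the rules derive that ZengYi is a PhDStudent, that ProfZhong is a Professor, and that both are Persons.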

Concepts and Ontologies
• Philosophical discipline, branch of philosophy that deals with the nature and the organisation of reality.
• Science of Being (Aristotle, Metaphysics, IV,1)
• What is being?
• What are the features common to all beings?

Vocabulary and Ontology
• Controlled vocabulary (Jernst 2003):
  • a list of controlled terms
  • unambiguous
  • non-redundant definition
• Ontology: a controlled vocabulary expressed in an ontology representation language (Jernst 2003)

In computer science …
• An ontology is an explicit specification of a conceptualization. [Gruber93]
• An ontology is a shared understanding of some domain of interest. [Uschold, Gruninger96]
• There are many definitions:
  • a formal specification (EXECUTABLE)
  • of a conceptualization of a domain (COMMUNITY)
  • of some part of the world that is of interest (APPLICATION)
• An ontology defines:
  • A common vocabulary of terms
  • Some specification of the meaning of the terms
  • A shared understanding for people and machines

Why develop an ontology?
• To make domain assumptions explicit
  • Easier to change domain assumptions
  • Easier to understand and update legacy data
• To separate domain knowledge from operational knowledge
  • Re-use domain and operational knowledge separately
• A community reference for applications
• To share a consistent understanding of what information means

Key Features of an Ontology
• Concept hierarchy
• Concept subsumption
• InstanceOf relation, from the specific to the general (instances)
• PartOf relation, from the part to the whole (property)

Why Not Other Alternatives?
• First-order predicate logic
• Set theory
• Programming languages

Web Ontology Language (OWL)
• OWL is built on top of RDF
• OWL is for processing information on the web
• OWL was designed to be interpreted by computers
• OWL was not designed for being read by people
• OWL is written in XML
• OWL is a web standard
Design Goals for OWL

Layered Language
• OWL Lite:
  • Classification hierarchy
  • Simple constraints
• OWL DL:
  • Maximal expressiveness
  • While maintaining tractability
  • Standard formalisation
• OWL Full:
  • Very high expressiveness
  • Losing tractability
  • Non-standard formalisation
  • All syntactic freedom of RDF (self-modifying)
[Figure: nested language layers Lite, DL and Full, labeled with syntactic layering and semantic layering]

OWL Example: Animals
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xml:base="http://wasp.cs.vu.nl/sekt/ontology/animal">
  <owl:Ontology rdf:about="animal"/>
  <owl:Class rdf:ID="Eagle">
    <rdfs:subClassOf>
      <owl:Class rdf:about="#Bird"/>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:ID="Animal"/>
  <owl:Class rdf:ID="Fly">
    <owl:disjointWith>
      <owl:Class rdf:about="#Penguin"/>
    </owl:disjointWith>
    <rdfs:subClassOf rdf:resource="#Animal"/>
  </owl:Class>
  <owl:Class rdf:ID="Bird">
    <rdfs:subClassOf rdf:resource="#Fly"/>
  </owl:Class>
  <owl:Class rdf:ID="Penguin">
    <rdfs:subClassOf rdf:resource="#Bird"/>
    <owl:disjointWith rdf:resource="#Fly"/>
  </owl:Class>
</rdf:RDF>
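
In this ontology Penguin is a subclass of Bird, Bird is a subclass of Fly, and yet Penguin is declared disjoint with Fly, so no individual can consistently be a Penguin. The Python sketch below makes that visible by hand: it copies the subClassOf and disjointWith axioms into plain dictionaries and sets (a representation chosen for this sketch) and looks for classes whose superclasses include a disjoint pair. It is an illustration of the idea, not a substitute for an OWL reasoner.

```python
# subClassOf axioms from the animal ontology: class -> direct superclasses.
sub_class_of = {
    "Eagle": {"Bird"},
    "Fly": {"Animal"},
    "Bird": {"Fly"},
    "Penguin": {"Bird"},
}

# disjointWith axioms, stored in both directions.
disjoint_with = {("Fly", "Penguin"), ("Penguin", "Fly")}

def ancestors(cls):
    """Return cls together with all of its direct and indirect superclasses."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in sub_class_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def unsatisfiable_classes():
    """Find classes that inherit from two classes declared disjoint."""
    classes = set(sub_class_of) | {p for ps in sub_class_of.values() for p in ps}
    bad = set()
    for cls in classes:
        supers = ancestors(cls)
        if any((a, b) in disjoint_with for a in supers for b in supers):
            bad.add(cls)
    return bad

print(unsatisfiable_classes())
```

Only Penguin is reported, because it is the only class that sits below both members of the disjoint pair (itself and Fly).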

Web 1.0

Web 2.0
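
Expectations for Web 3.0
What the name "Web 3.0" literally leads us to expect:
• Novelty: it differs from the existing Web 1.0 and Web 2.0 technologies and offers an entirely new generation of web services (that is, why it is not Web 1.0 or Web 2.0).
• Achievability: with effort it can be realized in the existing network environment, and it faces no insurmountable technical barrier (that is, why it is not Web 4.0 or higher).
• Urgency: the web services it provides are urgently needed by society today, and introducing its technology can have a major impact on society (that is, why it can only be Web 3.0).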

Web 3.0

Web 1.0, Web 2.0, Web 3.0
• Web 1.0: Web of documents
• Web 2.0: Web of persons (social web)
• Web 3.0: Web of data (semantics)

A Holistic View of Web Development

Advantages of Linked Data: Starting from an Example
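
Advantages of Linked Data: Summary
• Existing web pages are written for people to read and are hard for machines to process automatically; linked data is easy for machines to process automatically.
• Document links allow only one link on a given piece of text, whereas data links support multiple links on the same piece of text.
• Document links cover only part of the text, whereas data links guarantee that the full text is linked.
• Keyword-based search engines such as Google appear to support full-text retrieval, but they cannot distinguish the different meanings of the same word. This is especially hard in domains with frequently repeated terms such as personal names and place names, and words with multiple meanings are common in many application domains.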
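
A Uniform Conceptual Format for Linked Data
• The triple approach: <subject, predicate, object>
  Example: <zhishengHuang, isStaffof, VrijeUnivAm>
• Able to describe web resources
  Example: <http://wasp.cs.vu.nl/~huang, isStaffof, http://www.vu.nl>
• Provides unique identification of semantics
• Makes data content independent of its presentation
• Provides basic semantic reasoning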

Why Is Reasoning Support Necessary?
Example: from the facts that ZhishengHuang is an employee of the Vrije University and that the Vrije University is in Amsterdam, we can infer that ZhishengHuang works in Amsterdam.
<ZhishengHuang, isStaffof, VrijeUnivAm>
<VrijeUnivAm, inCity, Amsterdam>
<?x, isStaffof, ?y>, <?y, inCity, ?z> -> <?x, worksin, ?z>
=> <ZhishengHuang, worksin, Amsterdam>
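
The rule above can be run mechanically. The following Python sketch stores the two facts as tuples and applies the single rule by joining on the shared variable ?y. It assumes that both facts use the same identifier, VrijeUnivAm, for the university; the function name apply_works_in_rule is made up for this illustration.

```python
# The two known facts, written as (subject, predicate, object) tuples.
# Both use the identifier VrijeUnivAm so that the join on ?y succeeds.
facts = {
    ("ZhishengHuang", "isStaffof", "VrijeUnivAm"),
    ("VrijeUnivAm", "inCity", "Amsterdam"),
}

def apply_works_in_rule(facts):
    """Apply <?x, isStaffof, ?y>, <?y, inCity, ?z> -> <?x, worksin, ?z>."""
    derived = set()
    for x, p1, y in facts:
        if p1 != "isStaffof":
            continue
        for y2, p2, z in facts:
            # The object of the first triple must be the subject of the second.
            if p2 == "inCity" and y2 == y:
                derived.add((x, "worksin", z))
    return derived

inferred = apply_works_in_rule(facts)
print(inferred)
```

If the two facts named the university differently, the join would fail and nothing would be inferred, which is one reason unique identifiers matter for linked data.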
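
Semantic Web and Ontologies

The Main Ideas of the Semantic Web in Five Sentences: Why the Semantic Web?
• Every information system needs data.
• Data representation should be independent of specific applications and platforms, to maximize reuse.
• A uniform conceptual representation of data keeps the representation independent of any specific system (for example, the triple/tuple form).
• Data should be able to describe web resources (for example, using RDF/RDFS or a similar language).
• Data should offer basic reasoning support (for example, using OWL or another knowledge representation language).
(Note: RDF, RDFS and OWL all use the triple semantic model.)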

Trends
According to a May 2007 report by Gartner, the well-known US market research firm:
"By 2012, 70% of public Web pages will have some level of semantic markup, 20% will use more extensive Semantic Web-based ontologies"

Part of the Massive Semantic Data
Ontologies and Metadata: Billion Triples dataset
• Yahoo data
• Southeast University data
• University of Maryland
• Open University (UK)
• SemWebBase (DERI)
• Wikipedia
• Geographic names
• Publications
• English semantic dictionary
• Freebase
• US government data
Linked Data 2009

A Concrete Linked Data Example
http://sindice.com/apiv2/search?q=%22zhisheng%20huang%22&format=atom&page=1&qt=term

A Concrete Linked Data Example (continued)
http://sindice.com/apiv2/search?q=%22zhisheng%20huang%22&format=atom&page=5&qt=term
Falcons

Making Data Content Independent of Its Presentation
Semantic Web Layers

Logical Foundation of the Semantic Web
The debate between description logic and frame logic (Description Logic vs. Frame-Logic):
• Closed world assumption vs. open world assumption
• Unique name assumption vs. non-unique name assumption
• Object-oriented vs. non-object-oriented
• …..
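
An Example
• Teacher Wang has children: Wang Yi, Wang Er, Wang San. Question: how many children does Teacher Wang have?
• Closed world with unique names: 3
• Open world with unique names: at least 3
• Open world with non-unique names: at least 1
• To think about: which approach is more suitable for the web environment?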
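
The three answers in this example follow from two independent switches, which the small Python sketch below spells out. The function child_count_bounds and its (lower, upper) return convention are made up for this illustration; an upper bound of None means that the known facts do not limit the number of children.

```python
def child_count_bounds(named_children, closed_world, unique_names):
    """Return (lower, upper) bounds on the number of children.

    closed_world: if True, anything not stated is false, so no
        unnamed children exist.
    unique_names: if True, different names denote different people.
    """
    names = set(named_children)
    if not names:
        lower = 0
    elif unique_names:
        # Every distinct name is a distinct child.
        lower = len(names)
    else:
        # All the names might refer to the same child.
        lower = 1
    # Only a closed world rules out additional, unmentioned children.
    upper = len(names) if closed_world else None
    return lower, upper

children = ["Wang Yi", "Wang Er", "Wang San"]
print(child_count_bounds(children, closed_world=True, unique_names=True))
print(child_count_bounds(children, closed_world=False, unique_names=True))
print(child_count_bounds(children, closed_world=False, unique_names=False))
```

The three calls correspond to the three cases on the slide: exactly 3, at least 3, and at least 1.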

Some Semantic Web Applications: DBpedia Mobile
• http://beckr.org/DBpediaMobile/?location=Beijing
• http://beckr.org/DBpediaMobile

HealthFinland: Health Information on the Semantic Web
• http://www.seco.tkk.fi/applications/tervesuomi/
• Provides a new kind of solution approach to these problems on a national Finnish level. The system consists of three main components:
  • Metadata, ontology, and service infrastructure.
  • Semantic content creation process. A content creation and harvesting system has been implemented for producing semantically annotated contents, based on the shared metadata model and ontologies.
  • Semantic portal HealthFinland (TerveSuomi) and its services. The material is published via a semantic portal that creates a single national entry-point for health information, health promotion and health-related news.

National Semantic Web Ontology Project in Finland (FinnONTO)
• National Semantic Web Ontology Project in Finland (FinnONTO), 2003-2007
• A large national continuation project of FinnONTO, called Semantic Web 2.0 (FinnONTO 2.0), started at the beginning of 2008.
• The research is directed and mostly carried out by the Semantic Computing Research Group (SeCo) at the Helsinki University of Technology (TKK) and the University of Helsinki. The University of Tampere is also contributing to the work.
• The consortium behind the project included 37 public organizations and companies funding the research during the final year, 2007. This consortium represents a wide range of functions of society, including museums, libraries, business, health organizations, government, media, and education. Public organizations, companies, and universities are participating in the project.

The Dutch Cultural Heritage Eculture Project STiTCH-Catch Chip Project
Project E-Culture http://e-culture.multimedian.nl/
Timeline

Winner of the 2006 International Semantic Web Challenge
http://www.ontology-advisory.org/
Questions and Discussions