The Web of Data: The W3C Semantic Web Initiative
The Web of Data
NISO Virtual Conference
19 February 2014
Ralph Swick, W3C
Agenda
• Data is changing our lives
• W3C’s traditional focus
• Expanding scope of W3C’s data activities
The Web has transformed our relationship
to computers and to data
• A computer in every pocket
• Apps leveraging context
– geolocation and other sensors
– social context (“I’m at the conference, too!”)
• Change in the use of search
– people search for answers, not sites
– answers from aggregated data
(Siri, Google Now, Wolfram Alpha)
Apps are using data from many
sources
• Social networking
• Mobile devices
• Sensors
• Open data
Imagine…
• A “Web” where
– documents are available for download
on the Internet
– but there would be no hyperlinks
among them
Data on the Web is not enough…
• We need a proper infrastructure for a
real Web of Data where:
– data are available on the Web
• accessible via standard Web technologies
– data are interlinked over the Web
– data can be integrated over the Web
• This is Linked Data
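The interlinking and integration points can be illustrated with a minimal sketch. It assumes two hypothetical sources that happen to use the same URI for the same thing; the example.org property names and values are made up for illustration, and plain Python tuples stand in for a real RDF store.

```python
# Two hypothetical data sources, each a set of (subject, predicate, object) triples.
# Only the DBpedia resource URI appears in the slides; everything else is illustrative.
KOLKATA = "http://dbpedia.org/resource/Kolkata/"
INDIA = "http://example.org/id/India"

source_a = {
    (KOLKATA, "http://example.org/vocab/country", INDIA),
}
source_b = {
    (KOLKATA, "http://example.org/vocab/label", "Kolkata"),
    (INDIA, "http://example.org/vocab/label", "India"),
}


def merge(*graphs):
    """Integrating graphs is a set union: shared URIs line up automatically."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def describe(graph, subject):
    """Return every (predicate, object) pair known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


web_of_data = merge(source_a, source_b)
facts = describe(web_of_data, KOLKATA)
```

Because both sources identify the city with the same URI, no record matching is needed: the statements about it simply accumulate.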
Agenda
• Data is changing our lives
• W3C’s traditional focus
• Expanding scope of W3C’s data activities
Semantic Web Core
• RDF data model
• RDF Schema vocabulary design
• RDB2RDF relational DB export
• SPARQL query
• SKOS vocabulary description
• OWL ontological inference
• RIF rules interchange
• LDP read-write Web of Data
• POWDER description resources
• GRDDL app-specific XML
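To make the first and fourth items concrete, here is a toy sketch of the RDF data model (a graph as a set of triples) and of the kind of pattern matching a SPARQL engine performs. It is not SPARQL itself, and every URI and value is hypothetical.

```python
# RDF data model in miniature: a graph is a set of (subject, predicate, object) triples.
# All URIs below are hypothetical.
EX = "http://example.org/"

graph = {
    (EX + "book1", EX + "title", "An Example Book"),
    (EX + "book1", EX + "author", EX + "person1"),
    (EX + "person1", EX + "name", "Jane Doe"),
}


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None plays the role of a query variable."""
    return [
        t for t in sorted(graph)
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def author_names(graph):
    """Join two patterns: ?book author ?person . ?person name ?name ."""
    names = []
    for _, _, person in match(graph, p=EX + "author"):
        for _, _, name in match(graph, s=person, p=EX + "name"):
            names.append(name)
    return names
```

The join in `author_names` works only because the object of one triple is the subject of another, which is the graph structure the data model is built around.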
Need for RDF schemas
• First step towards the “extra knowledge”:
– define the terms we can use
– what restrictions apply
– what extra relationships are there?
• “RDF Vocabulary Description Language”
– the term “Schema” is retained for historical
reasons…
Vocabularies
• There is a need for “languages” for
such vocabularies
– to define those vocabularies
– to assign clear “semantics” on how new
relationships can be deduced
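As one illustration of deducing new relationships, the sketch below applies the two RDF Schema subclass rules (type propagation and transitivity of rdfs:subClassOf) to a tiny graph. The rdf: and rdfs: URIs are the standard ones; the example.org classes and the naive fixed-point loop are assumptions made for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical vocabulary namespace

graph = {
    (EX + "Novel", SUBCLASS_OF, EX + "Book"),
    (EX + "Book", SUBCLASS_OF, EX + "Work"),
    (EX + "item1", RDF_TYPE, EX + "Novel"),
}


def infer_types(graph):
    """Apply the RDFS subclass rules until no new triples appear."""
    closed = set(graph)
    while True:
        new = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                if p2 != SUBCLASS_OF or o != s2:
                    continue
                if p == RDF_TYPE:        # x type C, C subClassOf D => x type D
                    new.add((s, RDF_TYPE, o2))
                elif p == SUBCLASS_OF:   # subClassOf is transitive
                    new.add((s, SUBCLASS_OF, o2))
        if new <= closed:
            return closed
        closed |= new
```

Nothing in the input says that `item1` is a `Work`; that statement follows from the vocabulary's declared semantics.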
SKOS
• SKOS provides a simple bridge
between the “print world” and the
(Semantic) Web
• Thesauri, glossaries, etc., from the
library community can be made
available
• SKOS can also be used to organize,
e.g., tags, annotate other vocabularies,
…
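A small sketch of how a thesaurus might look once expressed in SKOS: concepts carry skos:prefLabel and are arranged with skos:broader. The SKOS namespace is the real one; the concept scheme is hypothetical, and the helper assumes each concept has at most one broader concept.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/thesaurus/"  # hypothetical concept scheme

triples = {
    (EX + "cats", SKOS + "prefLabel", "Cats"),
    (EX + "mammals", SKOS + "prefLabel", "Mammals"),
    (EX + "animals", SKOS + "prefLabel", "Animals"),
    (EX + "cats", SKOS + "broader", EX + "mammals"),
    (EX + "mammals", SKOS + "broader", EX + "animals"),
}


def ancestors(concept):
    """Follow skos:broader links upward and return labels, nearest first."""
    labels = {s: o for s, p, o in triples if p == SKOS + "prefLabel"}
    parent = {s: o for s, p, o in triples if p == SKOS + "broader"}
    chain = []
    seen = {concept}  # guards against accidental cycles in the data
    while concept in parent and parent[concept] not in seen:
        concept = parent[concept]
        seen.add(concept)
        chain.append(labels.get(concept, concept))
    return chain
```

Walking the chain this way computes what SKOS calls skos:broaderTransitive; skos:broader on its own only asserts the direct link.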
Semantic Web/Linked Data Today
• Standards are mature
– some level of maintenance work is always needed
• Server-side applications dominate
• Commercial applications exist, e.g.:
– direct integration/usage of linked data on the Web
– consumption of other formats converted internally to a
common format (RDF)
Challenge: leverage data in
interoperable apps
• Public, private, behind enterprise firewalls
• From informal to highly curated
• From machine readable to human readable
– HTML tables, twitter feeds, local vocabularies,
spreadsheets, …
• Expressed in diverse data models
– tree, graph, table, …
• Serialized in many ways
– XML, CSV, RDF, PDF, JSON, HTML Tables,…
The Linking Open Data Project
Linked Data Principles
Is your data 5 Star?
• 1 star: Available on the Web in some format (i.e., use a URI to access the data)
• 2 stars: Available as machine-readable structured data (e.g., Excel instead of an image scan)
• 3 stars: As before, but using a non-proprietary format (e.g., CSV instead of Excel)
• 4 stars: All the above, plus use open standards (RDF & Co.) to identify things, so that people could point at your stuff
• 5 stars: All the above, plus link your data to other people’s data to provide context
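The five levels are cumulative: each star requires everything before it. A minimal sketch of that rule follows; the `Dataset` fields are hypothetical names chosen here, not part of any W3C specification.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web: bool               # 1 star: reachable via a URI
    machine_readable: bool     # 2 stars: structured data
    open_format: bool          # 3 stars: non-proprietary format
    uses_uris_and_rdf: bool    # 4 stars: open standards identify things
    links_to_other_data: bool  # 5 stars: linked to other people's data


def stars(d: Dataset) -> int:
    """Count stars cumulatively; a missing level stops the count."""
    levels = [
        d.on_web,
        d.machine_readable,
        d.open_format,
        d.uses_uris_and_rdf,
        d.links_to_other_data,
    ]
    count = 0
    for ok in levels:
        if not ok:
            break
        count += 1
    return count


published_csv = Dataset(True, True, True, False, False)
```

Under this reading, a CSV file published on the Web rates three stars, while a dataset that skips a level is capped at the last level it fully satisfies.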
A Three Star Example
The importance of Linked Data
• Provide a core set of data that
applications can build on
– stable references for “things”,
• e.g., http://dbpedia.org/resource/Kolkata/
– many many relationships that applications
may reuse
– a “nucleus” for a larger, semantically
enabled Web!
Linked Data Platform (LDP)
• Define an HTTP/RESTful based infrastructure to publish, read, write, or modify linked data
– typical usage: data intensive application in a browser, application integration using shared data…
• The infrastructure should be easy to implement and install
– provides an “entry point” for Linked Data applications!
• The work is nearing completion
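A rough sketch of what the HTTP interaction could look like from a client: a POST to a container to create a resource and a GET to read one back, both exchanging Turtle. The container URL is hypothetical, the requests are only built and never sent, and details such as the Slug naming hint should be checked against the LDP specification as finally published.

```python
from urllib.request import Request

CONTAINER = "http://example.org/ldp/books/"  # hypothetical LDP container

# Turtle body; <> refers to the resource the server is asked to create.
TURTLE = b"""@prefix dcterms: <http://purl.org/dc/terms/> .
<> dcterms:title "An Example Book" .
"""


def create_request(container, body, slug):
    """Build (but do not send) a POST asking a container to create a resource."""
    req = Request(container, data=body, method="POST")
    req.add_header("Content-Type", "text/turtle")
    req.add_header("Slug", slug)  # naming hint; a server may ignore it
    return req


def read_request(resource):
    """Build a GET asking for the Turtle representation of a resource."""
    req = Request(resource, method="GET")
    req.add_header("Accept", "text/turtle")
    return req


post = create_request(CONTAINER, TURTLE, "example-book")
get = read_request(CONTAINER)
```

Passing either object to `urllib.request.urlopen` would send it; that step is left out because no real server is assumed here.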
RDF with HTML: RDFa
• By adding some “meta” information, the same source can be reused
– typical example: your personal information, like address, should be readable by humans and processable by machines
• Some solutions have emerged:
– add extra statements in microdata or RDFa that can be converted to RDF
• microdata can be used for a (useful) subset of RDF
• RDFa is, essentially, a complete serialization of RDF
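The idea of one source serving both humans and machines can be sketched with a fragment of HTML carrying RDFa attributes and a toy extractor. The extractor below handles only `vocab` and flat `property` elements; it is an illustrative assumption, not a conforming RDFa processor, and the person and address are made up.

```python
from html.parser import HTMLParser

# Human-readable markup; the vocab/typeof/property attributes carry the data.
HTML = """
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Jane Doe</span>,
  <span property="address">1 Example Street</span>
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect (property URI, text) pairs from flat RDFa-style markup."""

    def __init__(self):
        super().__init__()
        self.vocab = ""
        self.collecting = False  # True while inside an element with @property
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "vocab" in attrs:
            self.vocab = attrs["vocab"]
        if "property" in attrs:
            self.collecting = True
            self.pairs.append([self.vocab + attrs["property"], ""])

    def handle_data(self, data):
        if self.collecting:
            self.pairs[-1][1] += data

    def handle_endtag(self, tag):
        self.collecting = False  # toy rule: no nested markup inside a property


def extract(html):
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return [(prop, text.strip()) for prop, text in parser.pairs]
```

A browser renders the fragment as ordinary text, while `extract(HTML)` recovers the same facts as property/value pairs.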
schema.org
• Schema.org is a cooperation of search engines (Bing, Google, Yahoo!, and Yandex)
• It is a large vocabulary that they all understand
• The terms are extracted from HTML5+microdata or HTML5+RDFa
– the various partners use it for different purposes
– it can be used by anyone outside of the search world!
Some things to remember when
you publish data
• Publish your data first, do user interfaces later!
– the “raw data” can become useful in its own right and others may use it
– you can add value later by providing nice user access
• If possible, publish your data in RDF but if you cannot, others may help you in conversions
– trust the community…
• Add links to other data. “Just” publishing isn’t enough…
Some things to remember when
you publish data (2)
• Think about persistence and versioning
– others may depend on the data you publish…
• Be thoughtful about the URIs you choose
• Try to avoid reinventing the wheel when
choosing vocabularies
Some things to remember when
you publish data (3)
• Document your data, i.e., provide
metadata
– there are vocabularies to do this
• Data Catalog Vocabulary (DCAT)
• Vocabulary of Interlinked Datasets (VoID)
• DCTERMS
• vocabularies for licensing (Open Data Commons,
government licenses)
– this area is still very much in development…
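A sketch of what such metadata might look like, using DCAT and DCTERMS terms and emitting N-Triples. The vocabulary URIs are the published ones; the dataset URI, the license URI, the `#csv` naming of the distribution, and the hand-rolled serializer are assumptions for illustration.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Format a URI as an N-Triples term."""
    return "<" + value + ">"


def lit(value):
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def describe_dataset(dataset, title, license_uri, download_url):
    """Return N-Triples describing a dataset and one CSV distribution."""
    s = uri(dataset)
    dist = uri(dataset + "#csv")  # hypothetical naming convention
    triples = [
        (s, uri(RDF_TYPE), uri(DCAT + "Dataset")),
        (s, uri(DCT + "title"), lit(title)),
        (s, uri(DCAT + "distribution"), dist),
        (dist, uri(RDF_TYPE), uri(DCAT + "Distribution")),
        (dist, uri(DCT + "license"), uri(license_uri)),
        (dist, uri(DCAT + "downloadURL"), uri(download_url)),
    ]
    return "\n".join(" ".join(t) + " ." for t in triples)


metadata = describe_dataset(
    "http://example.org/data/books",
    'Books ("sample")',
    "http://example.org/licenses/open",
    "http://example.org/data/books.csv",
)
```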
Agenda
• Data is changing our lives
• W3C’s work on data integration
• Expanding scope of W3C’s data activities
New work underway
• CSV on the Web
• Data on the Web Best Practices
• Vocabulary management
What we are hearing
• CSV is everywhere
– can be huge data sets, not easily readable in a spreadsheet
or Google Refine
– meaning of data not in machine-readable form
– data is not necessarily used for web-scale integration but
rather immediate usage
• Metadata is essential
• Conversion is an issue
• European Commission Study on business models
for Linked Open Government Data (BM4LOGD)
Linked Data Benefits (BM4LOGD)
• Flexible data integration
– Streamlined internal processes
– Where working relationships already exist, much easier to
share
– Linking reference collections; discovery of new relationships
• Increase in data quality
– More use of data internally brings errors to light
– Use of open standards increases quality of system
• New services
• Cost reduction
– Increased efficiency
– Increase in data usage due to LOD enrichment
CSV on the Web
• How W3C can help
– metadata vocabulary to describe CSV data (structure,
reference to access rights, annotations, etc.)
– metadata discovery (e.g., part of an HTTP header, special
rows and columns, packaging formats…)
– mapping content to RDF, JSON, XML
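The first and third items can be sketched together: a small metadata description says which column identifies each row and which property each column maps to, and a converter uses it to turn rows into triples. The metadata layout below is invented for illustration and is not the format the W3C work will define; the data is made up.

```python
import csv
import io

CSV_TEXT = "id,title,pages\nb1,An Example Book,120\nb2,Another Book,\n"

# Hypothetical metadata: which column names the row, and what each column means.
METADATA = {
    "base": "http://example.org/books/",
    "key": "id",
    "columns": {
        "title": "http://purl.org/dc/terms/title",
        "pages": "http://example.org/vocab/pages",
    },
}


def csv_to_triples(text, metadata):
    """Map each non-empty cell to a (subject, predicate, object) triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = metadata["base"] + row[metadata["key"]]
        for column, predicate in metadata["columns"].items():
            if row.get(column):  # skip empty cells rather than invent values
                triples.append((subject, predicate, row[column]))
    return triples


triples = csv_to_triples(CSV_TEXT, METADATA)
```

Without the metadata, the header `pages` is just a string; with it, the column's meaning is machine-readable and the same rows could equally be mapped to JSON or XML.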
Best practices
• Document best practices for the data publishers
– URI design, management of persistence, versioning
– business models
– use of core metadata vocabularies (provenance, access
control, ownership)
• Specific vocabularies
– quality, application descriptions, …
Vocabulary management:
challenge
• Interoperable vocabularies are key for (meta)data
• At the moment, it is a fairly chaotic world…
– many, possibly overlapping vocabularies
– difficult to locate the one that is needed
– vocabularies may not be properly managed, maintained,
versioned, provided persistence…
Vocabulary management: how
W3C can help
• Provide a space where
– communities can develop vocabularies (through, e.g.,
CGs, possibly WGs)
– host vocabularies at W3C if requested
– annotate vocabularies with a proper set of metadata terms
– establish a vocabulary directory
• The exact structure is still being discussed
Summary
• Data-driven smart apps are one of the major growth
engines for the worldwide software market.
• We need to meet developers where they are.
• 5 Star Benefits of LOD
– Greater efficiency, better provision of the task
– Greater flexibility leads to lower costs for future projects
– New services, new connections, new discoveries
– Improved navigation within and between datasets
– Others can build apps based on your data
Available specifications:
Primers, Guides
• Primers:
– RDF Primer
– OWL Guide
– SKOS Primer
– GRDDL Primer
– RDFa Primer
• The W3C Semantic Web Activity Wiki has links to all
the specifications
These slides are on the Web at http://www.w3.org/2014/Talks/0219-NISO-RRS
With thanks to Ivan Herman, W3C and Phil Archer, W3C