The Web of Data: The W3C Semantic Web Initiative

The Web of Data, NISO Virtual Conference, 19 February 2014, Ralph Swick, W3C


From the Feb 19 2014 NISO Virtual Conference “The Semantic Web Coming of Age: Technologies and Implementations”: The Web of Data, Ralph Swick, Domain Lead of the Information and Knowledge Domain at W3C

Transcript of The Web of Data: The W3C Semantic Web Initiative

Page 1: The Web of Data: The W3C Semantic Web Initiative

The Web of Data

NISO Virtual Conference

19 February 2014

Ralph Swick, W3C

Page 2

Agenda

• Data is changing our lives

• W3C’s traditional focus

• Expanding scope of W3C’s data activities

Page 3

Web has transformed our relation to computers and to data

• A computer in every pocket

• Apps leveraging context

– geolocation and other sensors

– social context (“I’m at the conference, too!”)

• Change in the use of search

– people search for answers, not sites

– answers from aggregated data (Siri, Google Now, Wolfram Alpha)

Page 4

Apps are using data from many sources

• Social networking

• Mobile devices

• Sensors

• Open data

Page 5

Imagine…

• A “Web” where

– documents would be available for download on the Internet

– but there would be no hyperlinks among them

Page 6

Data on the Web is not enough…

• We need a proper infrastructure for a real Web of Data where:

– data are available on the Web

• accessible via standard Web technologies

– data are interlinked over the Web

– data can be integrated over the Web

• This is Linked Data
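Because RDF names things with URIs, integrating two datasets can be as simple as taking the union of their statements: anything said about the same URI lines up automatically. The sketch below illustrates this with plain Python tuples standing in for RDF triples. The example.org URIs, the book title and the "country" property are hypothetical, literals and URIs are both plain strings here (real RDF distinguishes them), and blank nodes are ignored.

```python
# A minimal sketch: RDF triples as plain (subject, predicate, object) tuples.
# URIs and literals are both plain strings here; real RDF distinguishes them.
Triple = tuple[str, str, str]

DCT = "http://purl.org/dc/terms/"
KOLKATA = "http://dbpedia.org/resource/Kolkata"
BOOK = "http://example.org/book/42"          # hypothetical URI
COUNTRY = "http://example.org/geo/country"   # hypothetical property

# Source A: a hypothetical library catalogue that links a book to a DBpedia URI.
source_a: set[Triple] = {
    (BOOK, DCT + "title", "Tales of Kolkata"),
    (BOOK, DCT + "subject", KOLKATA),
}

# Source B: a hypothetical geographic dataset that describes the same URI.
source_b: set[Triple] = {
    (KOLKATA, COUNTRY, "India"),
}

def merge(*graphs: set[Triple]) -> set[Triple]:
    """Merge graphs by set union (blank nodes are not handled in this sketch)."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged

def objects(graph: set[Triple], subject: str, predicate: str) -> set[str]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}

combined = merge(source_a, source_b)
# Follow book -> subject -> country, a path that only exists after the merge.
for topic in objects(combined, BOOK, DCT + "subject"):
    print(topic, objects(combined, topic, COUNTRY))
```

Neither source alone can answer "in which country is the subject of this book?"; the shared URI is what makes the merged data answer it.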

Page 7

Agenda

• Data is changing our lives

• W3C’s traditional focus

• Expanding scope of W3C’s data activities

Page 8

Semantic Web Core

• RDF data model

• RDF Schema vocabulary design

• RDB2RDF relational DB export

• SPARQL query

• SKOS vocabulary description

• OWL ontological inference

• RIF rules interchange

• LDP read-write Web of Data

• POWDER description resources

• GRDDL app-specific XML
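At the heart of SPARQL is the basic graph pattern: a set of triple patterns whose variables (written `?x`) must bind consistently across all of them. The toy matcher below shows only that idea. It is not a SPARQL engine (no FILTER, OPTIONAL, datatypes or URI/literal distinction), and the example.org data is hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]

def is_var(term: str) -> bool:
    """Variables are written SPARQL-style, with a leading '?'."""
    return term.startswith("?")

def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Match one triple pattern against one triple, extending an existing binding."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in extended and extended[p_term] != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended

def query(graph: set[Triple], patterns: list[Triple]) -> list[Binding]:
    """All bindings that satisfy every pattern at once (a basic graph pattern)."""
    results: list[Binding] = [{}]
    for pattern in patterns:
        results = [
            new
            for binding in results
            for triple in graph
            if (new := match_pattern(pattern, triple, binding)) is not None
        ]
    return results

DCT = "http://purl.org/dc/terms/"
KOLKATA = "http://dbpedia.org/resource/Kolkata"
graph = {
    ("http://example.org/book/1", DCT + "subject", KOLKATA),
    ("http://example.org/book/1", DCT + "title", "Tales of Kolkata"),
    ("http://example.org/book/2", DCT + "title", "Another Book"),
}

# Roughly: SELECT ?title WHERE { ?book dct:subject <...Kolkata> . ?book dct:title ?title }
rows = query(graph, [("?book", DCT + "subject", KOLKATA), ("?book", DCT + "title", "?title")])
print([row["?title"] for row in rows])  # ['Tales of Kolkata']
```

The shared `?book` variable is what joins the two patterns: book 2 has a title but no matching subject, so it drops out.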

Page 9

Need for RDF schemas

• First step towards the “extra knowledge”:

– define the terms we can use

– what restrictions apply

– what extra relationships are there?

• “RDF Vocabulary Description Language”

– the term “Schema” is retained for historical reasons…

Page 10

Vocabularies

• There is a need for “languages”

– to define those vocabularies

– to assign clear “semantics” on how new relationships can be deduced
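One concrete example of such “semantics” comes from RDF Schema: if a resource has type C and C is an rdfs:subClassOf D, the resource also has type D (entailment rule rdfs9), and rdfs:subClassOf is transitive (rule rdfs11). The sketch below applies only those two rules until nothing new appears; it is not a complete RDFS reasoner, and the example.org classes are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/vocab#"  # hypothetical vocabulary namespace

Triple = tuple[str, str, str]

def rdfs_subclass_closure(graph: set[Triple]) -> set[Triple]:
    """Apply two RDFS entailment rules to a fixpoint:
    rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
    rdfs9:  x type A, A subClassOf B        =>  x type B
    """
    inferred = set(graph)
    while True:
        sub = {(s, o) for s, p, o in inferred if p == SUBCLASS}
        new: set[Triple] = set()
        for a, b in sub:                       # rdfs11 (transitivity)
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))
        for s, p, o in inferred:               # rdfs9 (type propagation)
            if p == RDF_TYPE:
                for c, d in sub:
                    if o == c:
                        new.add((s, RDF_TYPE, d))
        if new <= inferred:                    # nothing new: fixpoint reached
            return inferred
        inferred |= new

graph = {
    (EX + "Novel", SUBCLASS, EX + "Book"),
    (EX + "Book", SUBCLASS, EX + "CreativeWork"),
    ("http://example.org/book/1", RDF_TYPE, EX + "Novel"),
}
closed = rdfs_subclass_closure(graph)
print(("http://example.org/book/1", RDF_TYPE, EX + "CreativeWork") in closed)  # True
```

The data only says the book is a Novel; the vocabulary definitions let an application deduce that it is also a Book and a CreativeWork.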

Page 11

SKOS

• SKOS provides a simple bridge between the “print world” and the (Semantic) Web

• Thesauri, glossaries, etc., from the library community can be made available

• SKOS can also be used, e.g., to organize tags or to annotate other vocabularies
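In a SKOS thesaurus, hierarchy is usually stated with skos:broader between concepts. The SKOS Reference does not make skos:broader transitive; instead it defines skos:broaderTransitive as a transitive super-property, so an application can derive every ancestor of a concept. A minimal sketch of that derivation, assuming a small hypothetical thesaurus that uses plain labels in place of concept URIs:

```python
from collections import deque

# Hypothetical thesaurus: concept -> directly broader concepts (skos:broader).
BROADER: dict[str, set[str]] = {
    "Siamese cats": {"Cats"},
    "Cats": {"Mammals", "Pets"},
    "Mammals": {"Animals"},
    "Pets": set(),
    "Animals": set(),
}

def broader_transitive(concept: str, broader: dict[str, set[str]]) -> set[str]:
    """Every concept reachable via skos:broader, i.e. what skos:broaderTransitive
    would entail. Breadth-first search; `seen` also guards against cycles."""
    seen: set[str] = set()
    queue = deque(broader.get(concept, set()))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(broader.get(current, set()))
    return seen

print(sorted(broader_transitive("Siamese cats", BROADER)))
# ['Animals', 'Cats', 'Mammals', 'Pets']
```

A search interface could use this to expand a query for “Animals” to items tagged with any narrower concept.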

Page 12

Semantic Web/Linked Data Today

• Standards are mature

– some level of maintenance work is always needed

• Server-side applications dominate

• Commercial applications exist, e.g.:

– direct integration/usage of linked data on the Web

– consumption of other formats converted internally to a common format (RDF)

Page 13

Challenge: leverage data in interoperable apps

• Public, private, behind enterprise firewalls

• From informal to highly curated

• From machine readable to human readable

– HTML tables, Twitter feeds, local vocabularies, spreadsheets, …

• Expressed in diverse data models

– tree, graph, table, …

• Serialized in many ways

– XML, CSV, RDF, PDF, JSON, HTML tables, …

Page 14

The Linking Open Data Project

Page 15

Linked Data Principles

Is your data 5 Star?

• ★ Available on the Web in some format (i.e., use a URI to access the data)

• ★★ Available as machine-readable structured data (e.g., Excel instead of an image scan)

• ★★★ As before, but using a non-proprietary format (e.g., CSV instead of Excel)

• ★★★★ All the above, plus use open standards (RDF & Co.) to identify things, so that people can point at your stuff

• ★★★★★ All the above, plus link your data to other people’s data to provide context
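The star levels are cumulative: each star requires all the previous ones. That rule fits in a few lines; the `Dataset` fields below are a hypothetical checklist invented for this sketch, not an official W3C schema.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    """Hypothetical checklist for the cumulative 5-star scheme."""
    on_web: bool            # 1 star: available on the Web (URI to access the data)
    machine_readable: bool  # 2 stars: structured data, e.g. Excel, not an image scan
    open_format: bool       # 3 stars: non-proprietary format, e.g. CSV, not Excel
    uses_uris_rdf: bool     # 4 stars: open standards (RDF & Co.) to identify things
    linked: bool            # 5 stars: linked to other people's data

def stars(d: Dataset) -> int:
    """Count stars, stopping at the first unmet level (levels are cumulative)."""
    levels = [d.on_web, d.machine_readable, d.open_format, d.uses_uris_rdf, d.linked]
    count = 0
    for met in levels:
        if not met:
            break
        count += 1
    return count

# A CSV file on the Web, without URIs or links: three stars.
print(stars(Dataset(True, True, True, False, False)))  # 3
# Links without an open format earn nothing extra under the cumulative rule.
print(stars(Dataset(True, True, False, True, True)))   # 2
```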

Page 16

A Three Star Example

Page 17

The importance of Linked Data

• Provide a core set of data that applications can build on

– stable references for “things”

• e.g., http://dbpedia.org/resource/Kolkata/

– many, many relationships that applications may reuse

– a “nucleus” for a larger, semantically enabled Web!

Page 18

Linked Data Platform (LDP)

• Define an HTTP/RESTful infrastructure to publish, read, write, or modify Linked Data

– typical usage: data-intensive applications in a browser, application integration using shared data…

• The infrastructure should be easy to implement and install

– it provides an “entry point” for Linked Data applications!

• The work is nearing completion
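In LDP, a client typically creates a resource by POSTing an RDF representation (for example Turtle, media type `text/turtle`) to a container URL, optionally suggesting a name with the `Slug` header; on success the server is expected to answer 201 Created with a `Location` header for the new resource. The sketch below only builds such a request with Python's standard `urllib` and does not send it. The container URL, the slug and the Turtle body are hypothetical, and interaction-model details should be checked against the LDP specification itself.

```python
import urllib.request

CONTAINER = "http://example.org/ldp/books/"  # hypothetical LDP container URL

def build_create_request(container: str, turtle: str, slug: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking an LDP container to create
    a new member resource from a Turtle body."""
    return urllib.request.Request(
        container,
        data=turtle.encode("utf-8"),
        headers={
            "Content-Type": "text/turtle",  # RDF representation of the new resource
            "Slug": slug,                   # client's hint for the new resource's name
        },
        method="POST",
    )

# "<>" is the relative URI of the resource being created.
body = """@prefix dct: <http://purl.org/dc/terms/> .
<> dct:title "Tales of Kolkata" .
"""
req = build_create_request(CONTAINER, body, "book-1")
print(req.get_method(), req.full_url)  # POST http://example.org/ldp/books/
# Sending it would be urllib.request.urlopen(req); reading the new resource
# back is then an ordinary HTTP GET on the URL from the Location header.
```

Everything here is plain HTTP, which is what keeps an LDP server easy to implement and an LDP client easy to write.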

Page 19

RDF with HTML: RDFa

• By adding some “meta” information, the same source can be reused

– typical example: your personal information, like your address, should be readable by humans and processable by machines

• Some solutions have emerged:

– add extra statements in microdata or RDFa that can be converted to RDF

• microdata can be used for a (useful) subset of RDF

• RDFa is, essentially, a complete serialization of RDF
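To show how RDFa attributes turn human-readable HTML into triples, here is a deliberately tiny extractor built on Python's standard `html.parser`. It handles only a flat subset of RDFa Lite (`vocab`, a single-term `typeof`, and `property` with plain text content) about one blank-node subject; nesting, prefixes, `about`/`resource`/`content` and datatypes are all ignored, so it is an illustration, not an RDFa processor. The HTML snippet and the person in it are made up.

```python
from html.parser import HTMLParser
from typing import Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

class TinyRDFaParser(HTMLParser):
    """Flat RDFa Lite subset only: vocab, one-term typeof, property with text."""

    def __init__(self) -> None:
        super().__init__()
        self.vocab = ""
        self.types: list[str] = []
        self.pairs: list[tuple[str, str]] = []
        self._prop: Optional[str] = None  # property whose text is being collected
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("vocab"):
            self.vocab = a["vocab"]
        if a.get("typeof"):
            self.types.append(self.vocab + a["typeof"])
        if a.get("property"):
            self._prop = self.vocab + a["property"]
            self._text = []

    def handle_data(self, data):
        if self._prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes the property element has no child elements.
        if self._prop is not None:
            self.pairs.append((self._prop, "".join(self._text).strip()))
            self._prop = None

def extract(html: str, subject: str = "_:s") -> list[tuple[str, str, str]]:
    """Turn the snippet into triples about one blank-node subject."""
    parser = TinyRDFaParser()
    parser.feed(html)
    parser.close()
    triples = [(subject, RDF_TYPE, t) for t in parser.types]
    triples += [(subject, p, v) for p, v in parser.pairs]
    return triples

SNIPPET = """
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Jane Doe</span>
  <span property="address">1 Example Street, Springfield</span>
</div>
"""

for triple in extract(SNIPPET):
    print(triple)
```

A browser renders the same snippet as two lines of text; the attributes are invisible to the reader but give a machine three statements.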

Page 20

schema.org

• Schema.org is a cooperation of search engines (Bing, Google, Yahoo!, and Yandex)

• It is a large vocabulary that they all understand

• The terms are extracted from HTML5+microdata or HTML5+RDFa

– the various partners use it for different purposes

– it can be used by anyone outside of the search world!
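On the publishing side, schema.org terms can be embedded with HTML5 microdata (`itemscope`, `itemtype`, `itemprop`). A minimal sketch of a helper that renders a few schema.org Person properties as microdata, escaping values with the standard `html.escape`; the helper name and the sample person are hypothetical, and `name` and `jobTitle` are ordinary schema.org Person properties.

```python
from html import escape

def person_microdata(props: dict[str, str]) -> str:
    """Render schema.org Person properties as HTML5 microdata: the same markup
    is readable by people and extractable by search engines."""
    items = "\n".join(
        f'  <span itemprop="{escape(name)}">{escape(value)}</span>'
        for name, value in props.items()
    )
    return f'<div itemscope itemtype="http://schema.org/Person">\n{items}\n</div>'

print(person_microdata({"name": "Jane Doe", "jobTitle": "Librarian"}))
```

Escaping matters because property values often come from user-entered data; an ampersand in a name must not break the page or the extracted value.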

Page 21

Page 22

Some things to remember when you publish data

• Publish your data first, do user interfaces later!

– the “raw data” can become useful in its own right, and others may use it

– you can add your own value later by providing nice user access

• If possible, publish your data in RDF; if you cannot, others may help you with conversions

– trust the community…

• Add links to other data. “Just” publishing isn’t enough…
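Adding links does not require heavy tooling. A minimal sketch that writes N-Triples (the line-based RDF syntax) for some local records plus owl:sameAs links to DBpedia URIs; the example.org URIs and the record are hypothetical, and literal escaping covers only backslash, double quote, CR and LF. Note that owl:sameAs asserts that two URIs denote the same thing, so it should only be used when that is really the case.

```python
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def iri(uri: str) -> str:
    """N-Triples IRI reference (assumes the URI needs no escaping)."""
    return f"<{uri}>"

def literal(text: str) -> str:
    """N-Triples string literal; escapes backslash, double quote, CR and LF only."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(labels: dict[str, str], links: dict[str, str]) -> str:
    """One rdfs:label triple per record, plus one owl:sameAs triple per link."""
    lines = [f"{iri(uri)} {iri(RDFS_LABEL)} {literal(label)} ." for uri, label in labels.items()]
    lines += [f"{iri(local)} {iri(OWL_SAMEAS)} {iri(ext)} ." for local, ext in links.items()]
    return "\n".join(lines) + "\n"

labels = {"http://example.org/place/kolkata": "Kolkata"}           # hypothetical record
links = {"http://example.org/place/kolkata": "http://dbpedia.org/resource/Kolkata"}
print(to_ntriples(labels, links), end="")
```

The second output line is the link: it lets anyone who already uses the DBpedia URI find your data, and vice versa.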

Page 23

Some things to remember when you publish data (2)

• Think about persistence and versioning

– others may depend on the data you publish…

• Be thoughtful about the URIs you choose

• Try to avoid reinventing the wheel when choosing vocabularies

Page 24

Some things to remember when you publish data (3)

• Document your data, i.e., provide metadata

– there are vocabularies to do this

• Data Catalog Vocabulary (DCAT)

• Vocabulary of Interlinked Datasets (VoID)

• DCTERMS

• vocabularies for licensing (Open Data Commons, government licenses)

– this area is still very much in development…
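A DCAT record is itself RDF, so documenting a dataset can be as simple as emitting a few triples in Turtle. The helper below is a minimal sketch describing one dcat:Dataset with one dcat:Distribution (title, download URL, media type, license). The dataset, download and license URIs are hypothetical placeholders, the IANA media-type URI follows a convention from later DCAT guidance, and which properties a catalogue actually requires depends on the profile it follows.

```python
def turtle_literal(text: str) -> str:
    """Quote a Turtle string literal (escapes only backslash and double quote)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def dcat_turtle(dataset: str, title: str, license_uri: str,
                download_url: str, media_type: str) -> str:
    """Minimal DCAT record: one dataset with one downloadable distribution."""
    iana = f"https://www.iana.org/assignments/media-types/{media_type}"
    return f"""@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<{dataset}> a dcat:Dataset ;
    dct:title {turtle_literal(title)} ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:downloadURL <{download_url}> ;
        dcat:mediaType <{iana}> ;
        dct:license <{license_uri}>
    ] .
"""

print(dcat_turtle(
    dataset="http://example.org/dataset/libraries",       # hypothetical
    title="Public libraries",
    license_uri="http://example.org/licence/open",        # hypothetical
    download_url="http://example.org/data/libraries.csv", # hypothetical
    media_type="text/csv",
))
```

Because the record is ordinary RDF, it can be merged, queried and linked like any other data, which is what lets catalogues aggregate each other.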

Page 25

Agenda

• Data is changing our lives

• W3C’s work on data integration

• Expanding scope of W3C’s data activities

Page 26

New work underway

• CSV on the Web

• Data on the Web Best Practices

• Vocabulary management

Page 27

What we are hearing

• CSV is everywhere

– can be huge data sets, not easily readable in a spreadsheet or Google Refine

– meaning of data not in machine-readable form

– data is not necessarily used for Web-scale integration but rather for immediate use

• Metadata is essential

• Conversion is an issue

• European Commission Study on business models for Linked Open Government Data (BM4LOGD)

Page 28

Linked Data Benefits (BM4LOGD)

• Flexible data integration

– Streamlined internal processes

– Where working relationships already exist, much easier to share

– Linking reference collections; discovery of new relationships

• Increase in data quality

– More use of data internally brings errors to light

– Use of open standards increases quality of system

• New services

• Cost reduction

– Increased efficiency

– Increase in data usage due to LOD enrichment

Page 29

CSV on the Web

• How W3C can help

– metadata vocabulary to describe CSV data (structure, reference to access rights, annotations, etc.)

– metadata discovery (e.g., part of an HTTP header, special rows and columns, packaging formats…)

– mapping content to RDF, JSON, XML
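The combination of a metadata description and a mapping can be illustrated with the standard `csv` and `json` modules. The `METADATA` dictionary below is a hypothetical stand-in invented for this sketch (it is not the CSV on the Web metadata format, which was still being designed); it says how to mint a URI for each row and which property URI each column denotes. The CSV content, the row URI template and the `city` property are also hypothetical.

```python
import csv
import io
import json

CSV_TEXT = """id,name,city
1,Central Library,Kolkata
2,Harbour Branch,Chennai
"""

# Hypothetical metadata: how to mint row URIs and what each column means.
METADATA = {
    "row_uri": "http://example.org/library/{id}",
    "columns": {
        "name": "http://schema.org/name",
        "city": "http://example.org/vocab#city",
    },
}

def read_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))

def csv_to_json(text: str) -> str:
    """Plain JSON mapping: one object per row; the meaning of keys stays implicit."""
    return json.dumps(read_rows(text), indent=2)

def csv_to_ntriples(text: str, meta: dict) -> list[str]:
    """RDF mapping: the metadata turns each cell into a triple with explicit meaning."""
    out = []
    for row in read_rows(text):
        subject = meta["row_uri"].format(id=row["id"])
        for column, prop in meta["columns"].items():
            value = row[column].replace("\\", "\\\\").replace('"', '\\"')
            out.append(f'<{subject}> <{prop}> "{value}" .')
    return out

for line in csv_to_ntriples(CSV_TEXT, METADATA):
    print(line)
```

The CSV file itself stays untouched; all of the added meaning lives in the separate metadata, which is why discovering that metadata is part of the problem.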

Page 30

Best practices

• Document best practices for the data publishers

– URI design, management of persistence, versioning

– business models

– use of core metadata vocabularies (provenance, access control, ownership)

• Specific vocabularies

– quality, application descriptions, …

Page 31

Vocabulary management: challenge

• Interoperable vocabularies are key for (meta)data

• At the moment, it is a fairly chaotic world…

– many, possibly overlapping vocabularies

– difficult to locate the one that is needed

– vocabularies may not be properly managed, maintained, versioned, or kept persistent…

Page 32

Vocabulary management: how W3C can help

• Provide a space where

– communities can develop vocabularies (e.g., through Community Groups, possibly Working Groups)

– host vocabularies at W3C if requested

– annotate vocabularies with a proper set of metadata terms

– establish a vocabulary directory

• The exact structure is still being discussed

Page 33

Summary

• Data-driven smart apps are one of the major growth engines for the worldwide software market.

• We need to meet developers where they are.

• 5 Star Benefits of LOD

– Greater efficiency, better provision of the task

– Greater flexibility leads to lower costs for future projects

– New services, new connections, new discoveries

– Improved navigation within and between datasets

– Others can build apps based on your data

Page 34

Available specifications: Primers, Guides

• Primers:

– RDF Primer

– OWL Guide

– SKOS Primer

– GRDDL Primer

– RDFa Primer

• The W3C Semantic Web Activity Wiki has links to all the specifications

Page 35

These slides are on the Web at http://www.w3.org/2014/Talks/0219-NISO-RRS, with thanks to Ivan Herman, W3C, and Phil Archer, W3C