Open statistics Belgium

Open Statistics Open Belgium 6 March 2017 Statistics Belgium Lucia Decuyper Youri Baeyens

Transcript of Open statistics Belgium

Page 1: Open statistics Belgium

Open Statistics

Open Belgium 6 March 2017

Statistics Belgium

Lucia Decuyper

Youri Baeyens

Page 2: Open statistics Belgium

Open Statistics – Agenda

Statistics Belgium => Open Data

• Statistics Belgium

• Open Data Start

• Statbel Open Data Portal

• Statistics Belgium in the EU

Open Data => Linked Open Data

• 5*****?

• RDF

• LOD

• Semantic Web

• Ontologies for statisticians

• LOD in the NSIs

• RDF@statbel

Questions

Contact

Page 3: Open statistics Belgium

Open Statistics – Statistics Belgium 1

Statistics Belgium?

– National Statistical Institute (formerly NIS)

• largest producer of official statistics in Belgium

What do we do?

– Collect data: administrative sources (registers) or surveys

– Process and analyse data:

• common methodology, definitions (national, European)

– Publish data

• => approximately 400 releases per year on Statbel

Page 4: Open statistics Belgium

Open Statistics – Statistics Belgium 2

One of the core tasks is to make all produced statistics available to everyone (European Statistics Code of Practice)

– Website Statbel since 1997

– Free (re-)use => cite the source

– ‘open by default’

+/- 100 statistics

– The main fields covered are population, society, work, economy, real estate, construction, mobility and transport.

– Census

Page 5: Open statistics Belgium

Open Statistics – Open Data Start?

Why?

• 2nd PSI Directive

• Belgian Federal Open Data strategy 2015

• Digital agenda (EU)

• Eurostat => EU Open Data Portal

• Crossroads Bank for Enterprises (KBO) company register

• Users

Benefits

Page 6: Open statistics Belgium

Open Statistics – Statbel Open Data Portal 1

Open Data Portal on the Statbel website since Q4 2015: www.statbel.fgov.be/opendata

– Population & Census

– Labour market & living conditions

• Fiscal statistics on income

– Environment

– Prices

• CPI

– Tools

• Geography

• Codes and Classifications

Page 7: Open statistics Belgium

Open Statistics – Statbel Open Data Portal 2

+/- 110 datasets

Formats

• XLSX => Excel pivot tables

• CSV, TXT => R, SAS, …, PostgreSQL

• GML, SHP => QGIS, ArcGIS, …

• JSON, XML, CSV, XLSX => be.STAT (dynamic databank of Statbel)

Special care

– Privacy

– Continuity

Goal: 1 new dataset per month

– Next : population, households, real estate

Page 8: Open statistics Belgium

Open Statistics – Statistics Belgium in the EU

European Statistical System = Eurostat + NSIs

– Key provider of public open data

– Draft Open Data Strategy (Feb 2017)

Statistics Belgium

• Statbel.fgov.be/opendata

Eurostat

• Key contributor to the open data portals

EU Open Data Portal

• Data.europa.eu/euodp

Belgium

• Data.gov.be

European Data Portal (metadata harvesting)

• www.europeandataportal.eu

Page 9: Open statistics Belgium

Open Statistics – 5***** ?

Statistics Belgium => Open Data

Statbel: Current situation

Statbel: Ambition

Page 10: Open statistics Belgium

Open Statistics – RDF

Resource description framework (RDF)

Page 11: Open statistics Belgium

Open Statistics – RDF - Uniform resource identifier URI

Use URIs to identify things, so that people can point at your stuff

– A URI identifies a concept.

– Example of a URI for the commune of Rixensart: http://vocab.belgif.be/refnis/25091#id

– In general, a URI is associated with a web page that documents the concept. For Rixensart: http://vocab.belgif.be/refnis/25091
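
The relationship between the two Rixensart addresses can be sketched in Python: dropping the fragment (`#id`) from the concept URI gives the address of the page that documents it. This is only an illustration using the standard library; the helper name `document_url` is hypothetical, and the convention is the one visible in the REFNIS example, not a general rule for all URIs.

```python
from urllib.parse import urldefrag


def document_url(concept_uri: str) -> str:
    """Return the URI without its fragment, i.e. the documenting page."""
    url, _fragment = urldefrag(concept_uri)
    return url


# Concept URI for Rixensart, taken from the slide.
rixensart = "http://vocab.belgif.be/refnis/25091#id"
print(document_url(rixensart))
```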

Page 12: Open statistics Belgium

Open Statistics – Resource description framework (RDF)

RDF files store triples of the form “subject-predicate-object”

In RDF files,

– subjects are URIs.

– predicates are URIs.

– objects are URIs or literals.

Example (nomenclature):

<http://vocab.belgif.be/refnis/25091#id>

<http://www.w3.org/2004/02/skos/core#prefLabel> "Rixensart"@fr .

There are "standard vocabularies" (rules for forming triples). SKOS is one of them.
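
As a rough illustration of the subject-predicate-object structure, the Rixensart statement can be modelled with plain Python types. The class names `Triple` and `Literal` and the helper `is_literal` are hypothetical, chosen for this sketch only; real RDF libraries provide their own richer types.

```python
from typing import NamedTuple, Optional, Union


class Literal(NamedTuple):
    """A literal value, optionally tagged with a language such as "fr"."""
    value: str
    lang: Optional[str] = None


class Triple(NamedTuple):
    subject: str                # always a URI
    predicate: str              # always a URI
    obj: Union[str, Literal]    # a URI (str) or a Literal


def is_literal(term) -> bool:
    """True when the term is a Literal rather than a URI string."""
    return isinstance(term, Literal)


triple = Triple(
    subject="http://vocab.belgif.be/refnis/25091#id",
    predicate="http://www.w3.org/2004/02/skos/core#prefLabel",
    obj=Literal("Rixensart", "fr"),
)
print(triple.obj.value, triple.obj.lang)
```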

Page 13: Open statistics Belgium

Open Statistics – Resource description framework (RDF)

It’s possible to use "prefixes" to "abbreviate" URIs in RDF files

Example:

@prefix refnis: <http://vocab.belgif.be/refnis/> .

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# In Turtle, a "#" inside a prefixed name must be escaped as "\#".

refnis:25091\#id skos:prefLabel "Rixensart"@fr .

refnis:25091\#id skos:broader refnis:25000\#id .
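
What a prefix does can be sketched as simple string substitution: the part before the colon is looked up and replaced by the full namespace. The function `expand` and the `PREFIXES` table are hypothetical names for this illustration; this is not a Turtle parser and it ignores escaping rules.

```python
# Prefix table mirroring the two @prefix declarations on the slide.
PREFIXES = {
    "refnis": "http://vocab.belgif.be/refnis/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(name: str, prefixes: dict = PREFIXES) -> str:
    """Expand an abbreviated name such as "skos:prefLabel" to a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {name!r}")
    return prefixes[prefix] + local


print(expand("skos:prefLabel"))
print(expand("refnis:25091#id"))
```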

Page 14: Open statistics Belgium

Open Statistics – Resource description framework (RDF)

Sample RDF file to describe a study (metadata):

– ddi:Study_1 a disco:Study .

– ddi:Study_1 dcterms:title "National Population and Housing Census, 1980"@en .

– ddi:Study_1 dcterms:identifier "ARG_1980_PHC_v01_A_IPUMS" .

This description uses the DDI-RDF Discovery vocabulary (disco):

– DDI-RDF is “A vocabulary for publishing metadata about data sets (research and survey data) into the Web of Linked Data”

– Described here: http://rdf-vocabulary.ddialliance.org/discovery.html

Page 15: Open statistics Belgium

Open Statistics – Resource description framework (RDF)

RDF = forming triples

There are several syntaxes for writing them

– Turtle,

– N-Triples,

– RDF/XML,

– …
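
N-Triples is the simplest of these syntaxes: one triple per line, full URIs in angle brackets, a final dot. The sketch below writes such lines by hand to show the shape of the format; the function names are hypothetical, and only the basic escapes are handled, so a real project would more likely rely on an RDF library.

```python
def escape_literal(text: str) -> str:
    """Escape the characters that N-Triples requires to be escaped in literals."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def uri_triple(subject: str, predicate: str, obj: str) -> str:
    """Serialize a triple whose object is a URI."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def literal_triple(subject: str, predicate: str, value: str, lang: str = "") -> str:
    """Serialize a triple whose object is a literal, with optional language tag."""
    suffix = f"@{lang}" if lang else ""
    return f'<{subject}> <{predicate}> "{escape_literal(value)}"{suffix} .'


SKOS = "http://www.w3.org/2004/02/skos/core#"
rixensart = "http://vocab.belgif.be/refnis/25091#id"

print(literal_triple(rixensart, SKOS + "prefLabel", "Rixensart", "fr"))
print(uri_triple(rixensart, SKOS + "broader", "http://vocab.belgif.be/refnis/25000#id"))
```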

Page 16: Open statistics Belgium

Open Statistics – Linked Open Data (LOD)

Linked Open Data (LOD)

Page 17: Open statistics Belgium

Open Statistics – Linked Open Data (LOD)

It’s possible to link several RDF sources. This is referred to as Linked Open Data (LOD).

Examples of LOD sites to link to:

– DBpedia

– Wikidata

– GeoNames

A simple way to link to another database is to re-use its URIs

Page 18: Open statistics Belgium

Open Statistics – Linked Open Data (LOD)

Example of LOD (nomenclature):

@prefix refnis: <http://vocab.belgif.be/refnis/> .

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

refnis:25091\#id skos:prefLabel "Rixensart"@fr .

refnis:25091\#id skos:broader refnis:25000\#id .

refnis:25091\#id skos:exactMatch <http://sws.geonames.org/2787990> .

refnis:25091\#id skos:exactMatch <http://www.wikidata.org/entity/Q630478> .
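
Once such links exist, a program can follow them to other databases. The sketch below holds the example's triples as plain tuples with full URIs and collects the `skos:exactMatch` targets per external site. The function name `external_matches` is hypothetical, and the in-memory list stands in for what would normally be a parsed RDF file.

```python
from urllib.parse import urlparse

SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOS_EXACT_MATCH = SKOS + "exactMatch"
RIXENSART = "http://vocab.belgif.be/refnis/25091#id"

# The URI-valued triples of the example above, written out in full.
TRIPLES = [
    (RIXENSART, SKOS + "broader", "http://vocab.belgif.be/refnis/25000#id"),
    (RIXENSART, SKOS_EXACT_MATCH, "http://sws.geonames.org/2787990"),
    (RIXENSART, SKOS_EXACT_MATCH, "http://www.wikidata.org/entity/Q630478"),
]


def external_matches(triples, subject):
    """Map each external host to the URI declared equivalent to `subject`.

    If a host appears more than once, the last match wins.
    """
    return {
        urlparse(obj).netloc: obj
        for subj, pred, obj in triples
        if subj == subject and pred == SKOS_EXACT_MATCH
    }


for host, uri in external_matches(TRIPLES, RIXENSART).items():
    print(host, "->", uri)
```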

Page 19: Open statistics Belgium

Open Statistics – Semantic web

Semantic web

Page 20: Open statistics Belgium

Open Statistics – Semantic web

All the "subject-predicate-object" sentences of the different LOD sources form a giant "knowledge graph" whose size increases rapidly

Page 21: Open statistics Belgium

Open Statistics – Semantic web

Page 22: Open statistics Belgium

Open Statistics – Ontologies for statisticians

Standard vocabularies

Page 25: Open statistics Belgium

Open Statistics – Standard vocabularies

Other interesting vocabularies recommended by Eurostat

– The Organization Ontology

– The PROV ontology

– Time Ontology in OWL

– Dublin Core

– ISA Core Vocabularies in RDF (Person, Public Organisation, Business, Public Service, Location)

– Vocabulary of Interlinked Datasets (VoID)

Page 26: Open statistics Belgium

Open Statistics – Nomenclatures

Some nomenclatures, "controlled vocabularies" & thesauri recommended by Eurostat:

– INSPIRE code lists

– EuroVoc thesaurus

– Named Authority Lists (NAL)

Page 27: Open statistics Belgium

Open Statistics – LOD IN THE NSIs

Some NSIs already have LOD:

– Insee: Some code tables + legal population

– Istat

– ONS + Geoportal UK

– Census 2011 in Ireland

Page 28: Open statistics Belgium

Open Statistics – RDF@Statbel

What to publish as LOD?

Priorities for publication as LOD:

– Nomenclatures (create URIs for NACEBEL, REFNIS, … + create files that expose hierarchies, …)

– Catalog of the data (to let the ‘machines’ all over the world know that our datasets are available in CSV, …)

– Metadata

– A selection of datasets (for example: legal population of municipalities)
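
The first priority, creating URIs for a nomenclature and exposing its hierarchy, could look roughly like the sketch below. It turns rows of a code table into SKOS statements in N-Triples form. The row layout (code, label, language, parent code) and the function names are assumptions made for this illustration; the URI pattern simply follows the REFNIS example for Rixensart and is not a description of how Statbel actually generates its files.

```python
REFNIS_BASE = "http://vocab.belgif.be/refnis/"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def refnis_uri(code: str) -> str:
    """Build a concept URI following the pattern of the Rixensart example."""
    return f"{REFNIS_BASE}{code}#id"


def nomenclature_to_ntriples(rows):
    """Turn (code, label, lang, parent_code) rows into N-Triples lines.

    Labels are assumed to contain no quotes or backslashes (no escaping here).
    """
    lines = []
    for code, label, lang, parent in rows:
        lines.append(f'<{refnis_uri(code)}> <{SKOS}prefLabel> "{label}"@{lang} .')
        if parent is not None:
            # skos:broader exposes the hierarchy: the code points to its parent.
            lines.append(f"<{refnis_uri(code)}> <{SKOS}broader> <{refnis_uri(parent)}> .")
    return lines


# One row, using only the values that appear in the slides.
rows = [("25091", "Rixensart", "fr", "25000")]
for line in nomenclature_to_ntriples(rows):
    print(line)
```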

Page 29: Open statistics Belgium

Open Statistics – Questions

Page 30: Open statistics Belgium

Open Statistics – Contact

Check out our websites

Explore our datasets

Re-use our data

and

Contact us!

For questions please contact :

[email protected]

[email protected]

[email protected]

To find out more check:

http://statbel.fgov.be

https://bestat.statbel.fgov.be

http://statbel.fgov.be/opendata/

http://statbel.fgov.be/en/statistics/opendata/licence/

Follow Statbel on Twitter