How to Publish Open Data

Post on 11-May-2015

A practical guide to publishing open data, presented at the Galway event of Irish Open Data Week 2011. Introducing the “five-shamrock scheme”!

Copyright 2010 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

How to publish Open Data

Richard Cyganiak
Opening Up Government Data – Galway, 8 Nov 2011

Stefan.Decker@deri.org
http://www.StefanDecker.org/

TimBL’s 5-star plan for open data

★ Make your stuff available on the Web

★★ Make it available as structured data (e.g., an Excel sheet instead of image scan of a table)

★★★ Use a non-proprietary format (e.g., a CSV file instead of an Excel sheet)

★★★★ Use linked data format (i.e., URIs to identify things, and RDF to represent data)

★★★★★ Link your data to other people’s data to provide context

Source: http://inkdroid.org/journal/2010/06/04/the-5-stars-of-open-linked-data/

Five-shamrock scheme

1. Publish data on the web

2. Publish data in a machine-processable format

3. Use an open standard format

4. Publish under an open license

5. List your data in a data catalog

1. Publish data on the web

Why?

The web is where people look for it first
Google can index it
Fewer phone calls and emails (and FoI requests) to answer

Lots of data is already there

Databases
Reports
Spreadsheets
Maps

2. Publish data in a machine-processable format

Why?

Allow others to do their own processing, analysis and visualisation of your data

New services, new ideas

Examples

CSO Quarterly National Household Survey
  http://cso.ie/qnhs/calendar_quarters_qnhs.htm

EPA enforcement files and ScraperWiki
  http://www.epa.ie/whatwedo/enforce/lic/info/
  https://views.scraperwiki.com/run/irish-epa-visuals/

Galway and Fingal planning applications
  http://lab.linkeddata.deri.ie/2010/planning-apps/
  Getting the data: 210 lines of code vs. 30 lines of code

Symptom: screenscraping

People use tools like ScraperWiki to get at data that isn't machine-readable
  https://scraperwiki.com/tags/ireland

Scraping is not the right way of doing this
  Expensive
  Brittle
  Strain on computing resources
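
To illustrate why scraping is costlier and more brittle than reading a published data file, here is a minimal sketch in Python. The HTML page, the CSV text and the record values are all hypothetical and not taken from any of the sites above. Both routes recover the same rows, but the scraper needs far more code and depends on the page markup staying exactly as it is.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical records, published once as a web page and once as CSV.
HTML_PAGE = """
<html><body>
<h1>Planning applications</h1>
<table>
  <tr><th>Reference</th><th>Status</th></tr>
  <tr><td>11/001</td><td>Granted</td></tr>
  <tr><td>11/002</td><td>Refused</td></tr>
</table>
</body></html>
"""

CSV_FILE = "Reference,Status\n11/001,Granted\n11/002,Refused\n"


class TableScraper(HTMLParser):
    """Collects the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def rows_from_html(page):
    """Scraping route: breaks as soon as the page layout changes."""
    scraper = TableScraper()
    scraper.feed(page)
    header, *body = scraper.rows
    return [dict(zip(header, row)) for row in body]


def rows_from_csv(text):
    """Published-data route: one standard-library call."""
    return list(csv.DictReader(io.StringIO(text)))


scraped = rows_from_html(HTML_PAGE)
published = rows_from_csv(CSV_FILE)
print(scraped == published)
```

If the publisher wraps the table in a new layout or adds a second table, the scraper has to be rewritten, while the CSV reader keeps working.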

Formats

Good: MS Excel, CSV, XML, JSON, Microdata
Not so good: Pure websites, MS Word
Bad: PDF
Really bad: Only charts/maps without numbers

Good practices

Publish in multiple formats, at least one machine-readable
  Publish Excel files alongside large PDF reports
  Publish CSV alongside database-backed web applications
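
As a rough sketch of publishing CSV alongside a database-backed application, the following Python snippet dumps a query result to CSV text. The table name, columns and rows are hypothetical, and an in-memory SQLite database stands in for whatever database the application really uses.

```python
import csv
import io
import sqlite3


def export_query_as_csv(conn, query):
    """Run a query and return the result as CSV text with a header row."""
    cursor = conn.execute(query)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # cursor.description holds one entry per result column; item 0 is the name.
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


# Hypothetical data standing in for the application's own database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE licences (reference TEXT, holder TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO licences VALUES (?, ?, ?)",
    [("W0001", "Example Ltd", 2010), ("W0002", "Sample, Co.", 2011)],
)

csv_text = export_query_as_csv(
    conn, "SELECT reference, holder, year FROM licences ORDER BY reference"
)
print(csv_text)
```

Using the `csv` module rather than joining strings by hand means values that contain commas or quotes are escaped correctly.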

3. Use an open standard format

Why?

Not all formats are created equal
Some formats bring many tools and applications that people can already use

Quick tour of formats

CSV – Comma-Separated Values
  More open (and simpler) alternative to Excel format
  Can be opened in and exported from Excel, Google Spreadsheets, Google Refine, …
KML – Keyhole Markup Language
  Simple format for presenting geographic data
  Can be opened in Google Maps
RSS – Really Simple Syndication
  Notifications of updates of any kind
  Can be opened in RSS readers and many email clients
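
For a sense of how little is needed to produce KML, here is a minimal sketch that writes one placemark per location using Python's standard library. The place name and coordinates are illustrative approximations, and the helper name `build_kml` is made up for this example.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def build_kml(places):
    """Build a KML document with one Placemark per (name, lon, lat) tuple."""
    # Serialise KML elements without a namespace prefix.
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lon, lat in places:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML lists longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Approximate, illustrative coordinates only.
kml_text = build_kml([("Galway (approximate)", -9.05, 53.27)])
print(kml_text)
```

Saved with a `.kml` extension, output like this should open in mapping tools that support KML.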

Developer-oriented formats

XML – Extensible Markup Language
  W3C (World Wide Web Consortium) standard, 1998
  established, reliable, ubiquitous
JSON – JavaScript Object Notation
  IETF (Internet Engineering Task Force) standard, 2006
  great for web APIs
  very simple; very fashionable right now
RDF – Resource Description Framework
  W3C standard, 2004
  great for data integration
  steeper learning curve
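
To make the comparison concrete, here is a small sketch that writes the same record as JSON and as XML and reads both back. The record, its field names and the `school` element name are hypothetical. RDF is left out because it would need a third-party library.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; field names are made up for illustration.
record = {"roll_number": "12345A", "name": "Example National School", "pupils": 120}

# JSON: one call each way, and numbers stay numbers.
json_text = json.dumps(record, sort_keys=True)
from_json = json.loads(json_text)

# XML: one child element per field; every value becomes text.
school = ET.Element("school")
for key, value in record.items():
    ET.SubElement(school, key).text = str(value)
xml_text = ET.tostring(school, encoding="unicode")
from_xml = {child.tag: child.text for child in ET.fromstring(xml_text)}

print(json_text)
print(xml_text)
```

Reading the data back shows one practical difference: JSON returns `pupils` as a number, while plain XML returns it as text unless a schema or convention says otherwise.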

Also: standard classifications

Within your data, use the same categories as everybody else
CSO
  http://www.cso.ie/surveysandmethodologies/classifications_stan.htm
StatCentral list of classifications
  http://www.statcentral.ie/classifications.asp

Also: standard identifiers

Example: School roll numbers
  Department of Education publishes an Excel file with all school roll numbers
  Can be used to Google the same school on other websites, school evaluation reports etc.
Example: Ordnance Survey UK geo identifiers
  Uses URIs (web addresses) as identifiers
  http://data.ordnancesurvey.co.uk/doc/7000000000037256
  Great for use in RDF
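
The value of a shared identifier shows up when two independently published datasets need to be combined. The sketch below joins two hypothetical tables on a roll number. The roll numbers, school names and figures are all invented for illustration.

```python
# Two hypothetical datasets from different publishers, both keyed by roll number.
enrolment = [
    {"roll_number": "00001A", "school": "Example National School", "pupils": 120},
    {"roll_number": "00002B", "school": "Sample Community School", "pupils": 450},
]
inspections = [
    {"roll_number": "00002B", "report_year": 2010},
    {"roll_number": "00003C", "report_year": 2011},
]


def join_on(key, left, right):
    """Inner join: merge each left row with every right row sharing the key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


joined = join_on("roll_number", enrolment, inspections)
print(joined)
```

Without a common identifier, the same join would have to match on school names, which vary in spelling between publishers.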

Linked Open Data Cloud

Summary

Prefer open, widely used standards
But: also prefer what you know best
Support multiple formats for different audiences where it makes sense
Great: CSV, KML, RSS, XML, JSON

4. Publish under an open license

Why?

Regulates what others can and cannot do with the data

For re-users, uncertainty about rights is a major concern

A good way to ensure that your organisation gets acknowledged

You need some non-discriminatory policy for giving rights to the data anyway (PSI directive)

Complex topic

Destroying a potential income stream?
Content licenses vs database licenses
Mixing and compatibility of licenses
  Wikipedia, OpenStreetMap

Irish PSI License

Created in response to PSI Directive
Available at http://psi.gov.ie/
Problems:
  Documents may not be used “for the principal purpose of advertising or promoting a particular product or service”
  Can't be combined with Wikipedia or OpenStreetMap
Not an open license according to Open Definition
  http://opendefinition.org/

Open database licenses

http://opendefinition.org/licenses/

License features

You're allowed to do pretty much anything, provided you…
  Attribution (“By”) – give credit
  ShareAlike (“SA”) – publish adapted data under the same license

Does Open Data have to be free?

Many would say yes
A matter of terminology and definitions
Either way there is nothing wrong with charging for certain data

Data protection

Personal information is not open data
Freedom of Information legislation
  http://foi.gov.ie/

Summary

Stating an explicit license is important
Irish PSI License: It's readily available, but not “open enough” for some applications
Open Data Commons licenses with various constraints

5. List your data in a data catalog

Why?

So that people know it exists
This is how the world learns about available data
This is how you learn what they do and need

Some key information about a dataset

What data is being published?
What's the license?
When was the data collected?
When will it be updated, if at all?
How was/is this data collected?
What was/is the data used for?
Contact person?
Where to give feedback?
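
The questions above can be captured as a small machine-readable record that sits next to the dataset. The sketch below is one possible shape, not a catalog standard: the field names, the example values and the `missing_fields` helper are all assumptions made for illustration.

```python
import json

# One field per question on the slide; the names are assumptions.
REQUIRED_FIELDS = [
    "title",              # What data is being published?
    "license",            # What's the license?
    "collected",          # When was the data collected?
    "update_frequency",   # When will it be updated, if at all?
    "collection_method",  # How was/is this data collected?
    "purpose",            # What was/is the data used for?
    "contact",            # Contact person?
    "feedback",           # Where to give feedback?
]


def missing_fields(entry):
    """Return the required fields that are absent or empty in a catalog entry."""
    return [field for field in REQUIRED_FIELDS if not entry.get(field)]


# Hypothetical entry; every value here is a placeholder.
entry = {
    "title": "Example planning applications",
    "license": "Open Data Commons Attribution License",
    "collected": "2011",
    "update_frequency": "monthly",
    "collection_method": "Exported from the planning register",
    "purpose": "Processing planning applications",
    "contact": "Open data officer",
    "feedback": "opendata@example.org",
}

print(json.dumps(entry, indent=2))
print(missing_fields(entry))
```

A check like `missing_fields` can run before a dataset is listed, so that incomplete entries are caught early.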

How to do this in practice?

Have a simple page on your website
Use an open community data catalog
Set up your own catalog
Use a national Irish data catalog???

Open community catalogs

The Data Hub http://thedatahub.org

Irish CKAN http://ie.ckan.net

Set up your own catalog

Requires a budget
Roll your own software?
  data.fingal.ie
Use open source, e.g., CKAN?
  data.gov.uk
  Berlin Open Data
  …

National Irish data catalog?

CSO's StatCentral?
Marine Institute's ISDE?
Who publishes the catalog in other countries?
  UK: Cabinet Office
  US: White House
  Australia: Dept of Finance and Deregulation
  New Zealand: Dept of Internal Affairs

Summary

Data catalogs make it easy to find data
Basic metadata, how to give feedback etc.
Important: How often are datasets accessed?
“Request a dataset” feature
Also: Open Data Ireland Google Group
  http://groups.google.com/group/open-data-ireland

Five-shamrock scheme

1. Publish data on the web

2. Publish data in a machine-processable format

3. Use an open standard format

4. Publish under an open license

5. List your data in a data catalog