STI Summit 2011 - Mlr-sm





Page 1: STI Summit 2011 - Mlr-sm
Page 2: STI Summit 2011 - Mlr-sm

My Semantics of “Train”

<owl:Thing rdf:about="#LevisTrain">
    <rdf:type rdf:resource="#Train"/>
    <rdfs:label>Levi’s Train</rdfs:label>
    <madeOf rdf:resource="#Plastic"/>
</owl:Thing>

<owl:Thing rdf:about="#LevisTrain">
    <rdf:type rdf:resource="#Train"/>
    <rdfs:label>Levi’s Train</rdfs:label>
    <madeOf rdf:resource="#Wood"/>
</owl:Thing>

Page 3: STI Summit 2011 - Mlr-sm


A Usage-dependent Life Cycle

• toy train
• made of plastic

Enter the room

• SELECT * WHERE { ?t a:madeOf a:Plastic }
• SELECT * WHERE { ?t b:madeOf b:Wood }

Request to put away the “train”

• toy train
• made of wood

Negotiate understanding
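The mismatch on this slide can be sketched in a few lines of Python. This is only an illustration, not the speaker's code: the namespace URIs behind the `a:` and `b:` prefixes and the `match` helper are assumptions, and the triple store is a plain set of tuples rather than a real RDF library.

```python
# Hypothetical namespaces standing in for the slide's a: and b: prefixes.
A = "http://example.org/a#"
B = "http://example.org/b#"

# Each party describes "the train" with its own vocabulary.
triples = {
    (A + "LevisTrain", A + "madeOf", A + "Plastic"),
    (B + "LevisTrain", B + "madeOf", B + "Wood"),
}

def match(pattern, data):
    """Return triples matching an (s, p, o) pattern; None acts as a variable."""
    return sorted(
        t for t in data
        if all(want is None or want == got for want, got in zip(pattern, t))
    )

# Each query only sees the description written in its own vocabulary.
plastic_trains = match((None, A + "madeOf", A + "Plastic"), triples)
wooden_trains = match((None, B + "madeOf", B + "Wood"), triples)

# Mixing the vocabularies finds nothing until the terms are aligned.
cross_vocab = match((None, A + "madeOf", B + "Wood"), triples)
```

Each party's query succeeds against its own description, while the mixed query returns nothing, which is the point where the two sides have to negotiate what “train” means.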

Page 4: STI Summit 2011 - Mlr-sm

Make it less a methodology but support the people to get their “Things” done!

Yet another…

The Maintenance Black Box

METHONTOLOGY
OTK
NeOn


Page 5: STI Summit 2011 - Mlr-sm

Who is hurt by that?

• rather small/simple ontologies
– min. effort for OE
– “under-engineered”
• unknown user requirements

Page 6: STI Summit 2011 - Mlr-sm

Hey “LOD people”, do you think that ontology engineering matters?

Usage-based ontology engineering

Page 7: STI Summit 2011 - Mlr-sm

Publishers of 99% of the datasets do not feel responsible for their data?

Survey ran in October 2010

Survey covering approx. 25% of all cloud datasets

• size
• complexity
• engineering methodology
• …

Publishers of 75% of the datasets do not feel responsible for their data?

Page 8: STI Summit 2011 - Mlr-sm

Concrete Example of Usage-based Approach

digging in log files

Page 9: STI Summit 2011 - Mlr-sm



• SELECT * WHERE { ?t a:madeOf a:Plastic }
• SELECT * WHERE { ?t b:madeOf b:Wood }

Request to put away the “train”

Yes*! But beyond?

• What about the future of SPARQL endpoints on the WoD?

* W.r.t. an architecture proposed by a famous “Web-Extremist”

Page 10: STI Summit 2011 - Mlr-sm

You should have a query endpoint!

• You get something valuable out of it which helps you to play your role on the WoD!

Effort Distribution between Publisher and Consumer

Effort distribution:
• Consumer generates / data mines links
• Links as hints
• Publisher provides links

The overall data integration effort is split between the data publisher, the data consumer and third parties.

Data Publisher
• publishes data as RDF
• reuses terms from common vocabularies
• sets links and publishes mappings

Third Parties
• set links pointing at your data
• publish mappings to the Web

Data Consumer
• has to do the rest
• using data mining techniques for identity resolution and schema matching

Christian Bizer: Pay-as-you-go Data Integration (21/9/2010)

Page 11: STI Summit 2011 - Mlr-sm

Usage Analysis

• queries
• patterns
• triples
• primitives

zoom in and see details

visualize heat maps
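As a rough illustration of going from logged queries down to triple patterns and primitives, the following Python sketch counts how often each predicate appears in a small query log. The log lines, the `triple_patterns` and `predicate_usage` helpers, and the regex-based splitting are all assumptions made for this example; a real analysis would use a proper SPARQL parser, and this is not the tool shown in the talk.

```python
import re
from collections import Counter

# Hypothetical log excerpt: one SPARQL query per entry.
LOG = [
    "SELECT * WHERE { ?b a ns:Band . ?b ns:genre ?g }",
    "SELECT * WHERE { ?b a ns:Band . ?b ns:knownFor ?x }",
    "SELECT ?g WHERE { ?b ns:genre ?g }",
]

# Captures everything between the first "{" and the last "}".
GROUP = re.compile(r"\{(.*)\}", re.S)

def triple_patterns(query):
    """Split the outermost group graph pattern into (s, p, o) tuples.

    Deliberately naive: it only handles flat patterns joined by " . ".
    """
    found = GROUP.search(query)
    if not found:
        return []
    patterns = []
    for part in found.group(1).split(" . "):
        terms = part.split()
        if len(terms) == 3:
            patterns.append(tuple(terms))
    return patterns

def predicate_usage(log):
    """Count how many triple patterns in the log use each predicate."""
    counts = Counter()
    for query in log:
        for _, predicate, _ in triple_patterns(query):
            counts[predicate] += 1
    return counts

usage = predicate_usage(LOG)
```

Counts like these are the kind of raw material a heat map over a vocabulary could be built from: frequently queried terms stand out, and rarely used ones can be inspected in detail.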

Page 12: STI Summit 2011 - Mlr-sm

Some Results (DBpedia Analysis)

Complete analysis can be found at http://page.mi.fu--analysis-of-web-of-data-usage-dbpedia33/

missing facts

inconsistent data

• ns:Band ns:knownFor ?x
• ns:Band ns:nationality ?y

• ns:Band ns:instrument ?x
• ns:Band ns:genre ?y
• ns:Band ns:associatedBand ?z
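One way to read “missing facts” is: predicates that users ask for but that the dataset never uses. The sketch below shows that idea under loudly stated assumptions. The usage counts, the `ex:` resources and the `unanswered_predicates` helper are invented for illustration and do not reproduce the actual DBpedia findings.

```python
# Hypothetical usage counts (predicate -> number of logged queries).
queried = {"ns:knownFor": 40, "ns:genre": 25, "ns:instrument": 10}

# Hypothetical dataset excerpt: triples about instances of ns:Band.
dataset = [
    ("ex:BandA", "ns:genre", "ex:Rock"),
    ("ex:BandB", "ns:genre", "ex:Jazz"),
]

def unanswered_predicates(queried, dataset):
    """List queried predicates that never occur in the data, most requested first."""
    present = {predicate for _, predicate, _ in dataset}
    missing = [(p, n) for p, n in queried.items() if p not in present]
    return sorted(missing, key=lambda item: -item[1])

candidates = unanswered_predicates(queried, dataset)
```

The resulting list is only a set of candidates for a maintainer to review: a frequently requested but absent predicate may point to facts worth adding, or to a term that consumers expect from a different vocabulary.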

Page 13: STI Summit 2011 - Mlr-sm

Some Thoughts about Benefit

• usage analysis helps to acquire new knowledge
– links between data
– external schema
• helps to increase the quality of data on the Web
• lightweight approach helps to bootstrap linked data

It is not necessary to automate everything if the result has enough (business) value in a problem domain anyway.

Page 14: STI Summit 2011 - Mlr-sm

Take Away

• LOD vocabularies are specific ontologies and need specific life cycle support
• usage analysis can help to maintain them (and the data)
• this is a benefit for the dataset publisher and the Web of data as a whole
• make it less a methodology
• provide (query) access to your data endpoint and play your role on the WoD
• it is not implicitly necessary to automate things that enable automation

Hey “LOD people”, do you think that dataset maintenance matters?

Markus Luczak-Rösch, Freie Universität Berlin, Networked Information Systems

Page 15: STI Summit 2011 - Mlr-sm

Actual Addition

• “15.500.000 people in Germany are not willing to use the internet”
– emphasis on the ESWC discussion: bridging the gap (directly or indirectly) between these people and the internet/Web has a high potential to influence societal transformation (they are not going to use a browser or an iPhone and they do not care for semantics)

Source: ARD-Morgenmagazin, 08-07-2011