8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics


Page 1: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

Integrating Data using Graphs and Semantics

Juan F. Sequeda

Page 2: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

IT Biz

Total net sales of all Orders today

Reports

Page 3: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

What do you mean by …

How many orders were placed in May 2016?

317,595
317,124
316,899

Billing
Shipping
E-Commerce

Page 4: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

What do you mean by …

What is an Order?

• When a user clicks “Order” on the website
• When the customer has received the product
• When it comes out of the billing system and the CC has been charged

Billing
Shipping
E-Commerce

Data resides in different sources
Ambiguity
No Shared Understanding
Lack of Semantics

Page 5: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

Status Quo 1

IT, Biz, Data Architect

Total net sales of all Orders today

SELECT .. FROM …

[Diagram labels: csv, csv, csv; MS Access; XLS; T=1, T=2, T=3; Reports; XLS; XLS]

Did the Biz User communicate the correct message to IT?
Did IT understand correctly what the Biz User wanted?
Did IT deliver the correct/precise results?

Page 6: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

Status Quo 2

IT, Biz, Data Architect

Total net sales of all Orders today

Total net sales of all Orders today with FX

[Diagram labels: ETL, ETL, ETL; Enterprise Data Warehouse; Reports; Time and $]

Page 7: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

Cross Organizational Data Integration

Organization 1
Organization 2
Organization n

Page 8: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics


Page 9: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

GRAPHS ARE COOL!

Page 10: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

Flexible

[Graph, written as subject, predicate, object]
:US_Constitution_1992/section/123 :text “Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments inflicted.”
:US_Constitution_1992/section/123 :isSectionOf :US_Constitution_1992
:US_Constitution_1992/section/123 :hasTopic :Cruelty
:US_Constitution_1992 :text “United States of America 1789 (rev. 1992)”
:Cruelty :label “Prohibition of cruel or degrading treatment”
:Cruelty :keyword “inhumane treatment”

Page 11: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

Integration

[Graph, written as subject, predicate, object]
:US_Constitution_1992/section/123 :text “Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments inflicted.”
:US_Constitution_1992/section/123 :isSectionOf :US_Constitution_1992
:US_Constitution_1992/section/123 :hasTopic :Cruelty
:US_Constitution_1992 :text “United States of America 1789 (rev. 1992)”
:Cruelty :label “Prohibition of cruel or degrading treatment”
:Cruelty :keyword “inhumane treatment”
:EighthAmendment_USConstitution :sameAs :US_Constitution_1992/section/123
:Farmer_vs_Brennan :lawsApplied :EighthAmendment_USConstitution
:Farmer_vs_Brennan :holding “A prison official’s ‘deliberate indifference’ to a substantial risk of a serious harm to an inmate violates the Eighth Amendment”
:Farmer_vs_Brennan :subject :Prisons_in_Indiana
:Farmer_vs_Brennan :subject :LGBT_right_case_laws

Page 12: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

Data and Metadata are One

[Graph, instance data written as subject, predicate, object]
:US_Constitution_1992/section/123 :text “Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments inflicted.”
:US_Constitution_1992/section/123 :isSectionOf :US_Constitution_1992
:US_Constitution_1992/section/123 :hasTopic :Cruelty
:US_Constitution_1992 :text “United States of America 1789 (rev. 1992)”
:Cruelty :label “Prohibition of cruel or degrading treatment”
:Cruelty :keyword “inhumane treatment”

[Schema shown in the same graph]
Nodes: :Section, :Constitution, :Topic, :Rights_and_Duties, :Physical_Integrity_Rights
Edge labels: :hasTopic, :isSectionOf, :subClass (×3), :type (×2)

Page 13: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

Common denominator

XML
<constitution id="US_Constitution_1992">
<section id="US_Constitution_1992/section/123">
<text>Excessive bail shall ...</text>
</section>
<topic>Cruelty</topic>
</constitution>

Text
“Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments inflicted.”

Tabular
id    text                     topic
123   Excessive bail shall…    Cruelty

Graph
:US_Constitution_1992/section/123 :text “Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments inflicted.”
:US_Constitution_1992/section/123 :hasTopic :Cruelty
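The "common denominator" idea on this slide can be illustrated with a small Python sketch that turns an XML document and a tabular row into the same set of triples. This is only an illustration, not the speaker's tooling: the sample inputs, the function names, and the choice to nest `<topic>` inside `<section>` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample inputs modeled on the slide. Nesting <topic> inside
# <section> is an assumption made for this sketch.
XML_DOC = """<constitution id="US_Constitution_1992">
  <section id="US_Constitution_1992/section/123">
    <text>Excessive bail shall ...</text>
    <topic>Cruelty</topic>
  </section>
</constitution>"""

ROW = {"id": "123", "text": "Excessive bail shall ...", "topic": "Cruelty"}


def triples_from_xml(xml_doc):
    """Emit (subject, predicate, object) triples for every <section>."""
    root = ET.fromstring(xml_doc)
    triples = set()
    for section in root.iter("section"):
        subject = ":" + section.get("id")
        triples.add((subject, ":text", section.findtext("text")))
        triples.add((subject, ":hasTopic", ":" + section.findtext("topic")))
    return triples


def triples_from_row(row, constitution="US_Constitution_1992"):
    """Emit the same triples from one row of the tabular form."""
    subject = f":{constitution}/section/{row['id']}"
    return {
        (subject, ":text", row["text"]),
        (subject, ":hasTopic", ":" + row["topic"]),
    }


if __name__ == "__main__":
    for triple in sorted(triples_from_xml(XML_DOC)):
        print(triple)
```

Both functions produce identical triples for these inputs, which is the point of using the graph as the shared target representation.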

Page 14: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

Traversal, Navigation, Reachability

[Graph, written as subject, predicate, object]
:US_Constitution_1992/section/123 :text “Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments inflicted.”
:US_Constitution_1992/section/123 :isSectionOf :US_Constitution_1992
:US_Constitution_1992/section/123 :hasTopic :Cruelty
:US_Constitution_1992 :text “United States of America 1789 (rev. 1992)”
:Cruelty :label “Prohibition of cruel or degrading treatment”
:Cruelty :keyword “inhumane treatment”
:EighthAmendment_USConstitution :sameAs :US_Constitution_1992/section/123
:Farmer_vs_Brennan :lawsApplied :EighthAmendment_USConstitution
:Farmer_vs_Brennan :holding “A prison official’s ‘deliberate indifference’ to a substantial risk of a serious harm to an inmate violates the Eighth Amendment”
:Farmer_vs_Brennan :subject :Prisons_in_Indiana
:Farmer_vs_Brennan :subject :LGBT_right_case_laws
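Reachability over such a graph can be sketched as a breadth-first traversal in Python. The edge list is a subset of the slide's graph, and the edge directions used here (for example, from the court case to the amendment) are assumptions for the example; the function name `reachable` is hypothetical.

```python
from collections import deque

# A subset of the slide's graph; edge directions are assumed for illustration.
TRIPLES = [
    (":Farmer_vs_Brennan", ":lawsApplied", ":EighthAmendment_USConstitution"),
    (":EighthAmendment_USConstitution", ":sameAs", ":US_Constitution_1992/section/123"),
    (":US_Constitution_1992/section/123", ":hasTopic", ":Cruelty"),
    (":US_Constitution_1992/section/123", ":isSectionOf", ":US_Constitution_1992"),
    (":Farmer_vs_Brennan", ":subject", ":Prisons_in_Indiana"),
]


def reachable(start, triples):
    """Return every node reachable from `start` by following edges forward."""
    adjacency = {}
    for subject, _predicate, obj in triples:
        adjacency.setdefault(subject, []).append(obj)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen


if __name__ == "__main__":
    print(sorted(reachable(":Farmer_vs_Brennan", TRIPLES)))
```

Starting from the court case, the traversal reaches the topic `:Cruelty` through the amendment and the constitution section, without any join having to be declared in advance.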

Page 15: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

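Semantics

[Graph, written as subject, predicate, object]
:US_Constitution_1992/section/123 :text “Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments inflicted.”
:US_Constitution_1992/section/123 :hasTopic :Cruelty
:Cruelty :label “Prohibition of cruel or degrading treatment”
:Cruelty :keyword “inhumane treatment”
:Cruelty :subClass :Physical_Integrity_Rights
:US_Constitution_1992/section/123 :hasTopic :Physical_Integrity_Rights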
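One plausible reading of this slide is that the second `:hasTopic` edge follows from the `:subClass` edge. A minimal Python sketch of that kind of inference is shown here; the rule itself (propagating `:hasTopic` up `:subClass` links) and the function name are assumptions, not something the slide states.

```python
TRIPLES = {
    (":US_Constitution_1992/section/123", ":hasTopic", ":Cruelty"),
    (":Cruelty", ":subClass", ":Physical_Integrity_Rights"),
}


def infer_topics(triples):
    """Add (s, :hasTopic, parent) whenever s has a topic that is a :subClass of parent."""
    parents = {}
    for subject, predicate, obj in triples:
        if predicate == ":subClass":
            parents.setdefault(subject, set()).add(obj)

    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple appears (handles subclass chains)
        changed = False
        for subject, predicate, obj in list(inferred):
            if predicate != ":hasTopic":
                continue
            for parent in parents.get(obj, ()):
                candidate = (subject, ":hasTopic", parent)
                if candidate not in inferred:
                    inferred.add(candidate)
                    changed = True
    return inferred


if __name__ == "__main__":
    for triple in sorted(infer_topics(TRIPLES)):
        print(triple)
```

A query for sections about `:Physical_Integrity_Rights` would then also return the section that was only tagged with `:Cruelty`.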

Page 16: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

(Summary) Why are Graphs Cool?

• Flexible
• Integration
• Data and Metadata are one
• Common Denominator
• Traversal, Navigation, Reachability
• Semantics

ACM Computing Surveys 2008

Page 17: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

Integrating Data using Graphs and Semantics

[Diagram] HIVE, Impala, etc; Oracle; SQL Server; Postgres; Unstructured; Semi-Structured → Mappings → Enterprise Knowledge Graph → Search, Reports, API, Dashboard

Page 18: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

MAPPING RELATIONAL DATABASES TO GRAPHS

Page 19: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

Relational Database to RDF (RDB2RDF)

Person
ID   NAME    AGE    CID
1    Alice   25     100
2    Bob     NULL   100

City
CID   NAME
100   Austin
200   Madrid

Mapping

[Resulting graph, written as subject, predicate, object]
<Person/1> foaf:name Alice
<Person/1> foaf:age 25
<Person/1> foaf:based_near <City/100>
<Person/2> foaf:name Bob
<City/100> rdfs:label Austin
<City/200> rdfs:label Madrid

Page 20: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

W3C RDB2RDF Standards

• Standards to map Relational Data to RDF
• A Direct Mapping of Relational Data to RDF
  – Default automatic mapping of relational data to RDF
• R2RML: RDB to RDF Mapping Language
  – Customizable language to map relational data to RDF

Page 21: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

W3C Direct Mapping

Relational Database → Direct Mapping Engine → RDF

Input: Database (Schema and Data), Primary Keys, Foreign Keys
Output: RDF graph

Page 22: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

W3C Direct Mapping Result

Person
ID   NAME    AGE    CID
1    Alice   25     100
2    Bob     NULL   100

City
CID   NAME
100   Austin
200   Madrid

Direct Mapping

[Resulting graph, written as subject, predicate, object]
<Person/ID=1> Person#Name Alice
<Person/ID=1> Person#Age 25
<Person/ID=1> Person#ref-CID <City/CID=100>
<Person/ID=2> Person#Name Bob
<City/CID=100> City#Name Austin
<City/CID=200> City#Name Madrid
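The shape of the Direct Mapping output on this slide can be approximated with a short Python sketch. It is a simplification and not the W3C algorithm in full: it skips the primary key column, emits only a reference triple for a foreign key column, emits no type triples, and uses the stored column names (NAME, AGE) rather than the capitalization shown in the diagram. The function name `direct_map` and the in-memory table layout are hypothetical.

```python
# Tables from the slide, held as lists of dicts for illustration.
PERSON = [
    {"ID": 1, "NAME": "Alice", "AGE": 25, "CID": 100},
    {"ID": 2, "NAME": "Bob", "AGE": None, "CID": 100},
]
CITY = [
    {"CID": 100, "NAME": "Austin"},
    {"CID": 200, "NAME": "Madrid"},
]


def direct_map(table, rows, primary_key, foreign_keys=None):
    """Simplified direct mapping: one subject per row, one triple per non-NULL column.

    foreign_keys maps a column name to (referenced_table, referenced_column).
    """
    foreign_keys = foreign_keys or {}
    triples = []
    for row in rows:
        subject = f"<{table}/{primary_key}={row[primary_key]}>"
        for column, value in row.items():
            if value is None or column == primary_key:
                continue  # NULL values produce no triple
            if column in foreign_keys:
                ref_table, ref_column = foreign_keys[column]
                obj = f"<{ref_table}/{ref_column}={value}>"
                triples.append((subject, f"<{table}#ref-{column}>", obj))
            else:
                triples.append((subject, f"<{table}#{column}>", value))
    return triples


if __name__ == "__main__":
    person_triples = direct_map("Person", PERSON, "ID", {"CID": ("City", "CID")})
    city_triples = direct_map("City", CITY, "CID")
    for triple in person_triples + city_triples:
        print(triple)
```

Bob's NULL age yields no triple at all, which is how missing values are usually represented in RDF.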

Page 23: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

R2RML

Relational Database + R2RML File → R2RML Engine → RDF

Target Schema
Nodes: :Section, :Constitution, :Topic, :Rights_and_Duties, :Physical_Integrity_Rights, :Cruelty
Edge labels: :hasTopic, :isSectionOf, :subClass (×3)

Page 24: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

Example R2RML

<TriplesMap1>
    a rr:TriplesMap;
    rr:logicalTable [ rr:tableName "Person" ];
    rr:subjectMap [ rr:template "http://www.ex.com/Person/{ID}";
                    rr:class foaf:Person ];
    rr:predicateObjectMap [
        rr:predicate foaf:based_near;
        rr:objectMap [
            rr:parentTriplesMap <TriplesMap2>;
            rr:joinCondition [ rr:child "CID"; rr:parent "CID" ]
        ]
    ].

<TriplesMap2>
    a rr:TriplesMap;
    rr:logicalTable [ rr:tableName "City" ];
    rr:subjectMap [ rr:template "http://ex.com/City/{CID}"; rr:class ex:City ];
    rr:predicateObjectMap [
        rr:predicate foaf:name;
        rr:objectMap [ rr:column "TITLE" ]
    ].
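What an R2RML engine does with the foaf:based_near part of this mapping can be sketched in Python: expand the subject templates per row and join Person to City on CID, as the join condition specifies. This is an illustrative sketch only, not an R2RML processor; the function names and the in-memory rows (taken from the Person and City tables on the earlier slides) are assumptions.

```python
import re

# Subject templates copied from the slide's two TriplesMaps.
PERSON_TEMPLATE = "http://www.ex.com/Person/{ID}"
CITY_TEMPLATE = "http://ex.com/City/{CID}"

# Rows assumed from the Person and City tables shown earlier in the deck.
PERSON = [
    {"ID": 1, "NAME": "Alice", "AGE": 25, "CID": 100},
    {"ID": 2, "NAME": "Bob", "AGE": None, "CID": 100},
]
CITY = [
    {"CID": 100, "NAME": "Austin"},
    {"CID": 200, "NAME": "Madrid"},
]


def expand(template, row):
    """Replace each {COLUMN} placeholder with that column's value from the row."""
    return re.sub(r"\{(\w+)\}", lambda match: str(row[match.group(1)]), template)


def based_near_triples(persons, cities):
    """Join child column CID (Person) with parent column CID (City)."""
    triples = []
    for person in persons:
        for city in cities:
            if person["CID"] == city["CID"]:
                triples.append(
                    (
                        expand(PERSON_TEMPLATE, person),
                        "foaf:based_near",
                        expand(CITY_TEMPLATE, city),
                    )
                )
    return triples


if __name__ == "__main__":
    for triple in based_near_triples(PERSON, CITY):
        print(triple)
```

A real engine would push this join into the database as SQL rather than looping in application code; the nested loop is used here only to keep the sketch self-contained.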

Page 25: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

Graph Data Virtualization

RDBMS, Graph
R2RML Mapping
SPARQL → SQL
SQL Results → SPARQL Results
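The virtualization flow on this slide (a graph query is rewritten to SQL, and the SQL results are returned as graph results) can be sketched with Python's built-in sqlite3 module. This is a toy illustration and not how Ultrawrap or any particular product works: it handles a single triple pattern with a fixed predicate, and the predicate-to-column table, the function names, and the subject template are assumptions for the example.

```python
import sqlite3

# Hypothetical mapping from a predicate to a column of the Person table.
PREDICATE_TO_COLUMN = {"foaf:name": "NAME", "foaf:age": "AGE"}
SUBJECT_TEMPLATE = "http://www.ex.com/Person/{}"


def build_demo_db():
    """Create an in-memory Person table with the rows from the earlier slides."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Person (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, CID INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Person VALUES (?, ?, ?, ?)",
        [(1, "Alice", 25, 100), (2, "Bob", None, 100)],
    )
    return conn


def pattern_to_sql(predicate):
    """Rewrite the pattern `?s <predicate> ?o` into SQL over the Person table."""
    column = PREDICATE_TO_COLUMN[predicate]  # only known column names are used
    return f"SELECT ID, {column} FROM Person WHERE {column} IS NOT NULL ORDER BY ID"


def evaluate(conn, predicate):
    """Run the rewritten SQL and return (subject IRI, object value) bindings."""
    rows = conn.execute(pattern_to_sql(predicate)).fetchall()
    return [(SUBJECT_TEMPLATE.format(pk), value) for pk, value in rows]


if __name__ == "__main__":
    connection = build_demo_db()
    print(pattern_to_sql("foaf:age"))
    print(evaluate(connection, "foaf:age"))
```

The data never leaves the relational database; only the query and the result bindings are translated.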

Page 26: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

NoETL Architecture

[Diagram labels: SPARQL Federator; Ultrawrap NoETL (×4); R2RML (×4); RDBMS (×4)]

Page 27: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

Hybrid NoETL and ETL Architecture

[Diagram labels: SPARQL Federator; Ultrawrap NoETL (×2); Ultrawrap ETL; RDF Triplestore; R2RML (×4); RDBMS (×4)]

Page 28: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

Scalability

• Seconds vs Months
• Reuse existing relational infrastructure
  – 30+ years of optimizations
  – Semantic Query Optimizations
• Result: SPARQL as fast as SQL under mappings

Sequeda & Miranker. Ultrawrap: SPARQL Execution on Relational Data. J. of Web Semantics 2013

Page 29: 8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and Semantics

The Tipping Point Problem

Relational Database

Graphs

• Flexible
• Integration
• Data and Metadata are One
• Common Denominator
• Traversal, Navigation, Reachability
• Semantics

Sequeda (2015) Integrating Relational Databases with the Semantic Web

An overarching theme is the need to create systematic and real-world benchmarks in order to evaluate different solutions for these features.