Post on 13-Apr-2017
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Virtualizing Relational Databases as Graphs: a multi-model approach
Juan F. Sequeda, Ph.DCo-FounderCapsenta
1
(i.e. Want Graphs? Have Relational? No Problem!)
Smart Data/Graphorum Conference – February 1, 2017
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Background
• Ph.D. from UT Austin Computer Science• Research on Integrating Relational Databases with Semantics and Graphs
• Editor W3C Standard on Mapping Relational Databases to Graphs
• Co-‐Founder Capsenta, spinout from UTCS
2
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Takeaway
• Relational Databases can be virtualized as Graphs!
• Do you really need to create another database?
3
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Flexible
6
:US_Constitution_1992/section/123
“Excessive bail shall not be required, nor
excessive fines imposed, nor cruel and unusual punishments inflicted.”
:text
:US_Constitution_1992 “United States of America 1789 (rev. 1992)”
:text
:isSectionOf
:Cruelty:hasTopic
“Prohibition of cruel or degrading treatment”
:label
“inhumane treatment”
:keyword
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Integration
7
:US_Constitution_1992/section/123
“Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments
inflicted.”
:text
:US_Constitution_1992 “United States of America 1789 (rev. 1992)”
:isSectionOf
:Cruelty:hasTopic
“Prohibition of cruel or degrading treatment”
:label
“inhumane treatment”
:keyword
:text
:EighthAmendment_USConstitution :Farmer_vs_Brennan
:lawsApplied
“A prison official’s ‘deliberate indifference’ to a substantial risk of a serious harm to an inmate
violates the Eighth Amendment”
:holding:sameAs
:Prisons_in_Indiana
:LGBT_right_case_laws
:subject :subject
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Data and Metadata are One
8
:US_Constitution_1992/section/123
“Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments
inflicted.”
:text
:US_Constitution_1992 “United States of America 1789 (rev. 1992)”
:isSectionOf
:Cruelty:hasTopic
“Prohibition of cruel or degrading treatment”
:label
“inhumane treatment”
:keyword
:text
:Section :Constitution:Topic
:Rights_and_Duties
:Physical_Integrity_Rights
:subClass
:subClass
:subClass
:hasTopic :isSectionOf
:type
:type
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Common denominator
9
<constitution id=“US_Constitution_1992”><section id="US_Constitution_1992/section/123">
<text>Excessive bail shall ...</text></section><topic>Cruelty</topic>
</constitution>
“Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments inflicted.”
id text topic123 Excessive bail shall… Cruelty
:US_Constitution_1992/section/123
“Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments
inflicted.”
:text:Cruelty
:hasTopic
XML Text
Tabular
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Traversal, Navigation, Reachability
10
:US_Constitution_1992/section/123
“Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments
inflicted.”
:text
:US_Constitution_1992 “United States of America 1789 (rev. 1992)”
:isSectionOf
:Cruelty:hasTopic
“Prohibition of cruel or degrading treatment”
:label
“inhumane treatment”
:keyword
:text
:EighthAmendment_USConstitution :Farmer_vs_Brennan
:lawsApplied
“A prison official’s ‘deliberate indifference’ to a substantial risk of a serious harm to an inmate
violates the Eighth Amendment”
:holding:sameAs
:Prisons_in_Indiana
:LGBT_right_case_laws
:subject :subject
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Semantics
11
:US_Constitution_1992/section/123
“Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments
inflicted.”
:text:Cruelty
:hasTopic
“Prohibition of cruel or degrading treatment”
:label
“inhumane treatment”
:keyword
:Physical_Integrity_Rights
:subClass
:hasTopic
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
(Summary) Why are Graphs Cool?
12
• Flexible• Integration• Data and Metadata are one
• Common Denominator• Traversal, Navigation, Reachability
• Semantics
ACM Computing Surveys 2008
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Flexible
14
id attr1 attr2 attr3 attr4 … attrn …
id attribute value
id attr1 val1 attr2 val2 attr3 val3
id valueattr1
id valueattr2
id valueattr3
Copeland and Khoshafian. A decomposition storage model. SIGMOD 1985
Agrawal et al. Storage and Querying of E-‐Commerce Data. VLDB 2001
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Integration
15
ExtractTransformLoad
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Data and Metadata are One
16
CREATE TABLE Person (id int primary key,Name varchar not null,…
)
Company closePrice closeDate
IBM 130 1/15/2016
MSFT 50.99 1/15/2016
IBM MSFT closeDate
130 50.99 1/15/2016
closePrice closeDate130 1/15/2016
closePrice closeDate
50.99 1/15/2016
IBM MSFT
-‐ Krishnamurthy et al. Language features for interoperability of databases with schematic discrepancies. SIGMOD 1991-‐ Lakshmanan et al. SchemaSQL -‐ A Language for Interoperability in Relational Multi-‐database Systems. VLDB 1996
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Common denominator
• Social Network• Hierarchical Data
17
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Traversal, Navigation, Reachability
• Write a bunch of Joins• Recursion
18
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
(Summary) Are Relational Databases Cool?
• Flexible• Integration• Data and Metadata are one• Common Denominator• Traversal, Navigation, Reachability• Semantics
20
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Ditch your Relational Database and move to Graphs!?
Feasible?
21
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Query Federation
What if …
22
Virtualize Relational Databases as Graphs
Keep your legacy data in the RDBMS
Run graph queries over the virtual graph data
Add new data that doesn’t fit into the schema into a separate graph
Federate queries over Virtualized Graph and the Real Graph
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
What type of graphs are we talking about?
24
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Property Graphs vs RDF Graphs
25
:Bob :Alicefoaf:knows
“Bob Smith”
foaf:name
“Alice Smith”
foaf:name
id1 id2
knowskey value
name Bob Smith
key value
name AliceSmith
key value
since 2005
:g1
2005
:since
http://db-‐engines.com/en/ranking/graph+dbms http://db-‐engines.com/en/ranking/rdf+store
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
RDF: Resource Description Framework
• Graph Data Model• Subject (Node) – Predicate (Edge) – Object (Node)• W3C Standard for data on the web• URIs
26
:US_Constitution_1992/section/123
:US_Constitution_1992 “United States of America 1789 (rev. 1992)”
:isSectionOf
:text
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Schemas, Taxonomies, Ontologies
27
:Cruelty
:Section :Constitution:Topic
:Rights_and_Duties
:Physical_Integrity_Rights
:subClass
:subClass
:subClass
:hasTopic :isSectionOf
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
SELECT ?x ?titleWHERE{ ?x :hasTopic :cruelty.?x :text ?t
}
SPARQL Graph
RDF Graph
:USConst/146
:cruelty
:prerel
“Excesses bail shall not be required … “
:text
?x :cruelty:hasTopic
?text
SPARQL Protocol and RDF Query Language
28
SPARQL is a Query Language• Graph query language for RDF• Match SPARQL graph with RDF graph• Much more features in SPARQL 1.1: Property Paths
SPARQL is a Protocol• Send query over HTTP GET or POST• Response to a query is either in XML, JSON or CSV format
:text
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
SPARQL Graph
RDF Graph
:USConst/146
:cruelty
:prerel
“Excesses bail shall not be required … “
:text
?x :cruelty:hasTopic
?text
:text
SPARQL Protocol and RDF Query Language
29
?x ?text
:USConst/146 “Excesses bail shall not be required … “
SELECT ?x ?titleWHERE{ ?x :hasTopic :cruelty.?x :text ?t
}
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Virtualizing Relational Databases as Graphs
30
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Relational Database to RDF (RDB2RDF)
31
ID NAME AGE CID
1 Alice 25 100
2 Bob NULL 100
Person
CID NAME
100 Austin
200 Madrid
City
<Person/1>
<City/100>
Alice25
Austin
<Person/2>
Bob
<City/200> Madrid
foaf:namefoaf:name foaf:age
rdfs:label
rdfs:label
foaf:based_near
Mapping
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
W3C RDB2RDF Standards
• W3C Standards to map Relational Data to RDF
• A Direct Mapping of Relational Data to RDF– Default automatic mapping of relational data to RDF
• R2RML: RDB to RDF Mapping Language– Customizable language to map relational data to RDF
32
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
RDF
W3C Direct Mapping
33
RelationalDatabase
Direct MappingEngine
Input: Database (Schema and Data)Primary KeysForeign Keys
OutputRDF graph
https://www.w3.org/TR/rdb-‐direct-‐mapping/
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
W3C Direct Mapping Result
34
ID NAME AGE CID
1 Alice 25 100
2 Bob NULL 100
Person
CID NAME
100 Austin
200 Madrid
City
<Person/ID=1>
<City/CID=100>
Alice25
Austin
<Person/ID=2>
Bob
<City/CID=200> Madrid
Person#Name Person#Age
City#Name
City#Name
Person#ref-‐CID
Direct Mapping
Person#Name
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
R2RML
35
R2RMLEngine
R2RMLFile
:Cruelty
:Section :Constitution:Topic
:Rights_and_Duties
:Physical_Integrity_Rights
:subClass:subClass
:subClass
:hasTopic :isSectionOf
RDF
RelationalDatabase
Target Schema
https://www.w3.org/TR/r2rml/
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
<TriplesMap1>a rr:TriplesMap;
rr:logicalTable [ rr:tableName”Person" ];
rr:subjectMap [ rr:template"http://www.ex.com/Person/{ID}";
rr:classfoaf:Person ];
rr:predicateObjectMap [ rr:predicate foaf:based_near ; rr:objectMap [
rr:parentTripelMap <TripleMap2>;rr:joinCondition [
rr:child “CID”;rr:parent “CID”;
]]
].
<TriplesMap2>a rr:TriplesMap;
rr:logicalTable [ rr:tableName ”City" ];
rr:subjectMap [ rr:template "http://ex.com/City/{CID}";rr:class ex:City ];
rr:predicateObjectMap [ rr:predicate foaf:name; rr:objectMap [ rr:column ”TITLE" ]
].
Example R2RML
36
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Graph Data Virtualization
37
SPARQL
RDBMS Graph
SQL
SQL Results
SPARQLResults
R2RML Mappingby
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Performance of Ultrawrap
• Reuse existing relational infrastructure– 30+ years of optimizations– Semantic Query Optimizations
• Result: SPARQL as fast as SQL
38
Sequeda J. Integrating Relational Databases with the Semantic Web. IOS Press. 2016http://www.iospress.nl/book/integrating-‐relational-‐databases-‐with-‐the-‐semantic-‐web/
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Semantic Data Virtualization as a Bridge for BI
40
HIVEImpala, etc
OracleSQL
Server
PostgresUnstructured
Semi-‐Structured
Enterprise Knowledge Graph
Search ReportsAPI BI Connectors(Tableau, …)
Smart Data for Smarter Business | © 2016 Capsenta | capsenta.com
Takeaway: Tipping Point
41
Relational Database
Graphs
• Flexible• Integration• Data and Metadata are One• Common Denominator• Traversal, Navigation, Reachability • Semantics
Do you really need another database?
Relational Databases can be virtualized as Graphs