Propagation of Policies in Rich Data Flows
Enrico Daga† Mathieu d’Aquin† Aldo Gangemi‡ Enrico Motta†
† Knowledge Media Institute, The Open University (UK)
‡ Université Paris13 (France) and ISTC-CNR (Italy)
The 8th International Conference on Knowledge Capture (K-CAP 2015)
October 10th, 2015 - Palisades, NY (USA)
http://www.k-cap2015.org/
Feedback welcome: @enridaga #kmiou
Motivation
• Governing the life cycle of data on the web is a challenging issue for organisations and users.
• Assessing what policies on input data propagate to the output of a process is a crucial problem.
Policies
Foundations
Constraints and permissions set by the data owner regarding the reuse of the data, i.e. mostly licences.
We can describe licences/policies in RDF using ODRL.
RDF License Database
http://datahub.io/dataset/rdflicense
Describes ~140 licenses using RDF and the Open Digital Rights Language (ODRL).
Reports 113 policies.
lic:cc-by-nc4.0 a odrl:Policy ;
    rdfs:label "CC-BY-NC" ;
    odrl:permission [
        odrl:action cc:Distribution , ldr:extraction , ldr:reutilization , cc:DerivativeWorks , cc:Reproduction ;
        odrl:duty [ odrl:action cc:Attribution , cc:Notice ]
    ] ;
    odrl:prohibition [ odrl:action cc:CommercialUse ] .
Data flows
What are the semantic relations between input and output?
Data flows can be thought of as graphs of data objects.
Datanode is an ontology for the data-centric description of applications.
Built as a hierarchy of relations.
More than 100 relations so far.
http://purl.org/datanode/ns/
http://purl.org/datanode/docs/
Policy Propagation Rule (PPR)
Relying on Datanode for an enumeration of the possible relations and on the RDF License Database for the possible policies, we can set up rules as Horn clauses.
For example, the rule stating that the duty cc:Attribution propagates across dn:isCopyOf can be simplified as:
propagates(odrl:duty cc:Attribution, dn:isCopyOf)
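As a minimal sketch of how such a fact might be applied to a data flow, assume that propagates(p, r) means: if data object X has policy p and r(X, Y) holds, then Y also has p. The transcript does not preserve the general clause, so this reading, and in particular the direction from X to Y, is an assumption. The function `propagate` and the `ex:` data objects are hypothetical.

```python
from collections import deque

def propagate(policies, flow, propagates):
    """Apply propagation facts over a data flow until nothing changes.

    policies:   dict mapping a data object to its set of policies
    flow:       iterable of (subject, relation, object) triples
    propagates: set of (policy, relation) pairs, as in propagates(p, r)
    """
    flow = list(flow)
    result = {node: set(ps) for node, ps in policies.items()}
    queue = deque(result)
    while queue:
        x = queue.popleft()
        for subject, relation, y in flow:
            if subject != x:
                continue
            # Policies of x that are allowed to cross this relation.
            inherited = {p for p in result[x] if (p, relation) in propagates}
            new = inherited - result.setdefault(y, set())
            if new:
                result[y] |= new
                queue.append(y)  # y changed, so revisit its outgoing edges
    return result

facts = {("odrl:duty cc:Attribution", "dn:isCopyOf")}
flow = [("ex:a", "dn:isCopyOf", "ex:b")]
start = {"ex:a": {"odrl:duty cc:Attribution",
                  "odrl:prohibition cc:CommercialUse"}}
out = propagate(start, flow, facts)
```

In this toy flow only the attribution duty reaches `ex:b`, because no fact states that the commercial-use prohibition propagates across dn:isCopyOf.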
Problem
A description of policies and data flows implies a huge number of Policy Propagation Rules to be specified and computed (number of possible policies times number of possible relations between data objects).
How can we abstract this knowledge base (KB) to make managing and reasoning on Policy Propagation Rules easier?
Contributions
(1) A methodology to obtain an abstraction that significantly reduces the number of rules (using an ontology).
(2) An evaluation of how effective this methodology is when using the Datanode ontology.
(3) A demonstration of how this ontology can evolve in order to better represent the behaviour of Policy Propagation Rules.
(A)AAAA Methodology
1. Acquire rules
2. Analyse the rules: find clusters using Formal Concept Analysis
3. Abstract the rules: match clusters & ontology hierarchy
4. Assess the compression, and diagnose errors or refinements in the rules or the ontology
5. Adjust the rules or the ontology
repeat from 2.
Preparing the Rule Base
• This phase required manual supervision of all associations between policies and relations in order to establish the initial set of propagation rules.
• Using Contento, it was possible to prepare the rule base manually with reasonable effort.
• The initial knowledge base was then composed of 3363 Policy Propagation Rules.
Acquisition
Formal Concept Analysis (FCA)
• Input is a Formal Context (a binary matrix of objects/attributes)
• Basic unit is a Closed Concept:
  – (O, A) => (Extension, Intension)
  – Closure operator ’ … (O, A) is a concept when O’ = A and A’ = O
• Classifies concepts hierarchically in a concept lattice
  – Top: all objects, no attributes; bottom: all attributes, no objects
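The definitions above can be sketched in a few lines of Python. This is a brute-force illustration for tiny contexts only, not the algorithm used by Contento. It assumes that relations act as objects and policies as attributes; the three-row context is hypothetical toy data.

```python
from itertools import combinations

def intent(objects, context):
    """Attributes shared by all the given objects (O')."""
    shared = set().union(*context.values())
    for o in objects:
        shared &= context[o]
    return shared

def extent(attributes, context):
    """Objects that have all the given attributes (A')."""
    return {o for o, attrs in context.items() if attributes <= attrs}

def concepts(context):
    """Enumerate formal concepts by closing every subset of objects.

    Exponential in the number of objects: acceptable for a toy context only.
    """
    found = set()
    objects = sorted(context)
    for k in range(len(objects) + 1):
        for subset in combinations(objects, k):
            a = intent(subset, context)
            o = extent(a, context)
            found.add((frozenset(o), frozenset(a)))  # O' = A and A' = O
    return found

# Hypothetical formal context: relation -> policies it propagates.
context = {
    "dn:hasPart": {"duty cc:Attribution", "duty cc:Notice"},
    "dn:hasSection": {"duty cc:Attribution", "duty cc:Notice"},
    "dn:isCopyOf": {"duty cc:Attribution"},
}
lattice = concepts(context)
```

Here the lattice has two concepts: one grouping all three relations under the attribution duty, and one grouping dn:hasPart and dn:hasSection under both duties.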
Analysis
Applying FCA to the Rule Base
80 concepts
Clusters of rules
Relations that have a common behaviour: they propagate the same policies.
But why do they do it?
Detect matches with the ontology
Abstraction
Search for matches between concepts and branches taken from the ontology hierarchy.
When found, subtract the rules from the KB accordingly.
Example/1
dn:hasPart and dn:isVocabularyOf are valid abstractions
dn:isPartOf is not, because dn:isSelectionOf apparently does not propagate (all) the policies…
Example/2
dn:hasPart (49 rules)
propagates(dn:hasPart, duty cc:Attribution)
propagates(dn:hasPart, duty cc:Copyleft)
propagates(dn:hasPart, duty cc:Notice)
propagates(dn:hasPart, duty cc:SourceCode)
propagates(dn:hasPart, duty odrl:attachPolicy)
propagates(dn:hasPart, duty odrl:attachSource)
propagates(dn:hasPart, duty odrl:attribute)
propagates(dn:hasSection, duty cc:Attribution)
propagates(dn:hasSection, duty cc:Copyleft)
propagates(dn:hasSection, duty cc:Notice)
propagates(dn:hasSection, duty cc:SourceCode)
propagates(dn:hasSection, duty odrl:attachPolicy)
propagates(dn:hasSection, duty odrl:attachSource)
propagates(dn:hasSection, duty odrl:attribute)
propagates(dn:hasSelection, duty cc:Attribution)
propagates(dn:hasSelection, duty cc:Copyleft)
propagates(dn:hasSelection, duty cc:Notice)
propagates(dn:hasSelection, duty cc:SourceCode)
propagates(dn:hasSelection, duty odrl:attachPolicy)
propagates(dn:hasSelection, duty odrl:attachSource)
propagates(dn:hasSelection, duty odrl:attribute)
propagates(dn:hasSample, duty cc:Attribution)
propagates(dn:hasSample, duty cc:Copyleft)
propagates(dn:hasSample, duty cc:Notice)
propagates(dn:hasSample, duty cc:SourceCode)
propagates(dn:hasSample, duty odrl:attachPolicy)
propagates(dn:hasSample, duty odrl:attachSource)
propagates(dn:hasSample, duty odrl:attribute)
propagates(dn:hasPortion, duty cc:Attribution)
propagates(dn:hasPortion, duty cc:Copyleft)
propagates(dn:hasPortion, duty cc:Notice)
propagates(dn:hasPortion, duty cc:SourceCode)
propagates(dn:hasPortion, duty odrl:attachPolicy)
propagates(dn:hasPortion, duty odrl:attachSource)
propagates(dn:hasPortion, duty odrl:attribute)
propagates(dn:hasIdentifiers, duty cc:Attribution)
propagates(dn:hasIdentifiers, duty cc:Copyleft)
propagates(dn:hasIdentifiers, duty cc:Notice)
propagates(dn:hasIdentifiers, duty cc:SourceCode)
propagates(dn:hasIdentifiers, duty odrl:attachPolicy)
propagates(dn:hasIdentifiers, duty odrl:attachSource)
propagates(dn:hasIdentifiers, duty odrl:attribute)
propagates(dn:hasExample, duty cc:Attribution)
propagates(dn:hasExample, duty cc:Copyleft)
propagates(dn:hasExample, duty cc:Notice)
propagates(dn:hasExample, duty cc:SourceCode)
propagates(dn:hasExample, duty odrl:attachPolicy)
propagates(dn:hasExample, duty odrl:attachSource)
propagates(dn:hasExample, duty odrl:attribute)
7 rules!
dn:isVocabularyOf (42 rules)
propagates(dn:isVocabularyOf, duty cc:Attribution)
propagates(dn:isVocabularyOf, duty cc:Copyleft)
propagates(dn:isVocabularyOf, duty cc:Notice)
propagates(dn:isVocabularyOf, duty cc:SourceCode)
propagates(dn:isVocabularyOf, duty odrl:attachPolicy)
propagates(dn:isVocabularyOf, duty odrl:attachSource)
propagates(dn:isVocabularyOf, duty odrl:attribute)
propagates(dn:attributesOf, duty cc:Attribution)
propagates(dn:attributesOf, duty cc:Copyleft)
propagates(dn:attributesOf, duty cc:Notice)
propagates(dn:attributesOf, duty cc:SourceCode)
propagates(dn:attributesOf, duty odrl:attachPolicy)
propagates(dn:attributesOf, duty odrl:attachSource)
propagates(dn:attributesOf, duty odrl:attribute)
propagates(dn:datatypesOf, duty cc:Attribution)
propagates(dn:datatypesOf, duty cc:Copyleft)
propagates(dn:datatypesOf, duty cc:Notice)
propagates(dn:datatypesOf, duty cc:SourceCode)
propagates(dn:datatypesOf, duty odrl:attachPolicy)
propagates(dn:datatypesOf, duty odrl:attachSource)
propagates(dn:datatypesOf, duty odrl:attribute)
propagates(dn:descriptorsOf, duty cc:Attribution)
propagates(dn:descriptorsOf, duty cc:Copyleft)
propagates(dn:descriptorsOf, duty cc:Notice)
propagates(dn:descriptorsOf, duty cc:SourceCode)
propagates(dn:descriptorsOf, duty odrl:attachPolicy)
propagates(dn:descriptorsOf, duty odrl:attachSource)
propagates(dn:descriptorsOf, duty odrl:attribute)
propagates(dn:typesOf, duty cc:Attribution)
propagates(dn:typesOf, duty cc:Copyleft)
propagates(dn:typesOf, duty cc:Notice)
propagates(dn:typesOf, duty cc:SourceCode)
propagates(dn:typesOf, duty odrl:attachPolicy)
propagates(dn:typesOf, duty odrl:attachSource)
propagates(dn:typesOf, duty odrl:attribute)
propagates(dn:relationsOf, duty cc:Attribution)
propagates(dn:relationsOf, duty cc:Copyleft)
propagates(dn:relationsOf, duty cc:Notice)
propagates(dn:relationsOf, duty cc:SourceCode)
propagates(dn:relationsOf, duty odrl:attachPolicy)
propagates(dn:relationsOf, duty odrl:attachSource)
propagates(dn:relationsOf, duty odrl:attribute)
7 rules!
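The dn:hasPart example can be reproduced with a short sketch. It assumes that a branch counts as a valid abstraction when every relation in it lies in the extent of the concept, and that the rules of the sub-relations can then be dropped because they follow from the rules of the top relation. The function `abstract_branch` is hypothetical, the hierarchy is flattened to one level, and the concept is simplified to contain only this branch.

```python
POLICIES = [
    "duty cc:Attribution", "duty cc:Copyleft", "duty cc:Notice",
    "duty cc:SourceCode", "duty odrl:attachPolicy",
    "duty odrl:attachSource", "duty odrl:attribute",
]
TOP = "dn:hasPart"
SUB_RELATIONS = [
    "dn:hasSection", "dn:hasSelection", "dn:hasSample",
    "dn:hasPortion", "dn:hasIdentifiers", "dn:hasExample",
]

def abstract_branch(rules, top, sub_relations, extent, intent):
    """Drop the sub-relations' rules if the whole branch is in the concept."""
    branch = {top, *sub_relations}
    if not branch <= extent:
        return set(rules)  # not a valid abstraction: keep everything
    implied = {(r, p) for r in sub_relations for p in intent}
    return set(rules) - implied

# Rules are (relation, policy) pairs, as in propagates(relation, policy).
rules = {(r, p) for r in [TOP, *SUB_RELATIONS] for p in POLICIES}
concept_extent = {TOP, *SUB_RELATIONS}
concept_intent = set(POLICIES)
kept = abstract_branch(rules, TOP, SUB_RELATIONS, concept_extent, concept_intent)
print(len(rules), "->", len(kept))  # 49 -> 7
```

If one sub-relation is missing from the extent, as with dn:isSelectionOf under dn:isPartOf in Example/1, the function returns the rule set unchanged.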
Compression Factor
Assessment
We calculate the compression factor (CF) as the number of abstracted rules divided by the total number of rules.
By applying the original Datanode ontology, 1925 rules out of 3363 could be removed, for a compression factor of 0.572.
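The definition can be checked against the reported figures with a small sketch; the function name `compression_factor` is hypothetical, and the numbers are those given on the slide.

```python
def compression_factor(abstracted, total):
    """CF = number of abstracted rules / total number of rules."""
    if total <= 0:
        raise ValueError("total must be positive")
    return abstracted / total

# 1925 rules abstracted out of 3363, rounded to three decimals.
cf_initial = round(compression_factor(1925, 3363), 3)
```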
Considerations
1. The size of the matrix that was manually supervised is large, and it is possible that errors have been made at that stage of the process.
2. The Datanode ontology has not been designed for the purpose of representing a common behaviour of relations in terms of propagation of policies. It is possible to refine the ontology in order to make it cover this use case better (and possibly reduce the number of rules even more).
Observing the measures
concepts × relations ≈ 9040 (partial) matches. Hard to explore!
Inspecting a partial match with high precision and low recall highlights a problem that might be easy to fix, as the number of relations and policies to compare will be low.
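The measures are not defined in this transcript, so the following sketch rests on an assumption: precision (Pre) is the share of a branch that falls inside the concept extent, which agrees with the description of Fill pushing Pre up to 1, and recall is the share of the extent covered by the branch. The function `match_measures`, the extent, and the `ex:` relations are hypothetical.

```python
def match_measures(branch, extent):
    """Return (precision, recall) of a branch against a concept extent.

    Assumed definitions:
      precision = |branch ∩ extent| / |branch|
      recall    = |branch ∩ extent| / |extent|
    """
    common = branch & extent
    pre = len(common) / len(branch) if branch else 0.0
    rec = len(common) / len(extent) if extent else 0.0
    return pre, rec

# Toy data: one relation of the branch is outside the concept extent.
branch = {"dn:isPartOf", "dn:isSelectionOf", "ex:relA", "ex:relB"}
extent = {"dn:isPartOf", "ex:relA", "ex:relB", "ex:relC", "ex:relD", "ex:relE"}
pre, rec = match_measures(branch, extent)
```

With these toy sets the branch is three quarters inside the concept, while it covers only half of the extent.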
Operations
Adjustment
We try to make adjustments in order to improve the compression factor. We defined a set of operations, targeted to (a) fix errors in the initial rule base and (b) refine the ontology.
• Fill: makes a branch fall fully within the cluster of a concept c, attempting to push Pre up to 1.
• Group: some relations that share a concept, but belong to different branches, are abstracted by a new relation.
• Merge: two distinct branches are abstracted by a new relation.
• Wedge: changes the top relation of a branch to make it fully match the concept.
(and we run our process again from the Analysis phase to the Assessment)
Example: Fill
… dn:isSelectionOf should indeed propagate all the policies listed in this concept. Fill adds the following rules:
propagates(dn:isSelectionOf, duty cc:Attribution)
propagates(dn:isSelectionOf, duty cc:Copyleft)
propagates(dn:isSelectionOf, duty cc:Notice)
propagates(dn:isSelectionOf, duty cc:SourceCode)
propagates(dn:isSelectionOf, duty odrl:attachPolicy)
propagates(dn:isSelectionOf, duty odrl:attachSource)
propagates(dn:isSelectionOf, duty odrl:attribute)
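A minimal sketch of Fill, assuming rules are stored as (relation, policy) pairs and that the operation simply adds every pair the branch is missing with respect to the policies of the concept. The function `fill` is hypothetical, and the branch is reduced to two relations for illustration.

```python
POLICIES = {
    "duty cc:Attribution", "duty cc:Copyleft", "duty cc:Notice",
    "duty cc:SourceCode", "duty odrl:attachPolicy",
    "duty odrl:attachSource", "duty odrl:attribute",
}

def fill(rules, branch, policies):
    """Add the rules a branch lacks so that it fully matches the concept.

    Returns the updated rule set and the set of rules that were added.
    """
    missing = {(r, p) for r in branch for p in policies} - rules
    return rules | missing, missing

# dn:isPartOf already propagates all seven duties; dn:isSelectionOf does not.
rules = {("dn:isPartOf", p) for p in POLICIES}
filled, added = fill(rules, {"dn:isPartOf", "dn:isSelectionOf"}, POLICIES)
```

The `added` set then holds the seven dn:isSelectionOf rules listed above.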
Example: Wedge
… dn:sameCapabilityAs does *not* propagate all the policies listed in this concept. We Wedge dn:sameIdentityAs.
Evaluation
As a result, we obtained 3865 rules in total and 78 concepts, with 2817 rules abstracted and 1048 rules remaining, for a compression factor of 0.729.
Thanks to this methodology we have been able to fix many errors in the initial data, and to refine Datanode by clarifying the semantics of many properties and adding new ones.
Conclusions
• We presented a method to abstract Policy Propagation Rules by applying an ontology, the Datanode ontology, a hierarchical organisation of the possible relations between data objects.
• Datanode allowed us to reduce the number of rules with a compression factor of 0.572.
• Applying the ontology and the method, we were able to find and correct errors in the rules.
• Moreover, we were able to analyse the ontology in relation to this task and enhance it, obtaining not only a further reduction of rules (compression factor of 0.729) but also a better ontology.
Future work
• Apply the rule base to specific use cases (perform a practical evaluation), and publish it for reuse.
• Extend the Assessment phase of the methodology to also include a consistency check between the hierarchy of the FCA lattice and the ontology.
• Define additional operations to support the Adjustment phase.
• Study the rule base evolution via continuous integration of new policies/actions/rules.
• Apply the methodology to other use cases.
Should we integrate the measures and operations of the methodology in the Contento tool?
76 Bottom-Up Ontology Construction with CONTENTO
http://bit.ly/contento-tool
Contento supports the user in the generation and curation of concept lattices to produce Semantic Web Ontologies from data. In the demo, we show how to use CONTENTO with Linked Data.
[Figure: Contento workflow. Labels: Formal Context, Concept Lattice, Modeling (Naming & Pruning), SPARQL, Export as OWL Ontology]
(advertisement)
@ISWC 2015 Poster and Demo session, next week…
Operations: Fill
Annex
Observation: the branch is close to being fully in the concept (high Pre).
Diagnosis: the branch must be fully in the concept…
Effect: the Fill operation makes a branch fall fully within the cluster of a concept c, attempting to push Pre up to 1.
Operations: Merge
Observation: two distinct branches match the same concept.
Diagnosis: the respective top relations can be abstracted by a new common relation.
Effect: a new relation is added to the ontology.
Operations: Group
Observation: a set of relations are all together in the extent of a concept, but belong to different branches.
Diagnosis: they are actually related, and a common parent is missing.
Effect: a new relation is added to the ontology.
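Both Merge and Group end by adding a new relation to the ontology. A minimal sketch of that effect follows, assuming the hierarchy is stored as a mapping from each relation to its set of parent relations, so that grouped relations keep their existing parents. The function `add_common_parent` and all `ex:` names are hypothetical.

```python
def add_common_parent(parents, relations, new_relation):
    """Return a copy of the hierarchy with `new_relation` as an extra parent.

    parents: dict mapping a relation to the set of its parent relations
    """
    if new_relation in parents:
        raise ValueError(f"{new_relation} already exists")
    updated = {r: set(ps) for r, ps in parents.items()}
    updated[new_relation] = set()  # the new relation starts as a top relation
    for r in relations:
        updated.setdefault(r, set()).add(new_relation)
    return updated

# Two relations from different branches that share a concept.
hierarchy = {
    "ex:branchA": set(),
    "ex:branchB": set(),
    "ex:relA": {"ex:branchA"},
    "ex:relB": {"ex:branchB"},
}
grouped = add_common_parent(hierarchy, ["ex:relA", "ex:relB"], "ex:newParent")
```

After the call, `ex:relA` and `ex:relB` each keep their original branch and also sit under `ex:newParent`, so their shared rules can be abstracted to the new relation in the next iteration.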