PalGov © 2011
أكاديمية الحكومة الإلكترونية الفلسطينية
The Palestinian eGovernment Academy
www.egovacademy.ps
Tutorial II: Data Integration and Open Information Systems
Session 12.2
Architectural Solutions for the Integration Issues
Dr. Mustafa Jarrar
University of Birzeit
www.jarrar.info
About
This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the
Commission of the European Communities, grant agreement
511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps
Project Consortium:
Birzeit University, Palestine (Coordinator)
University of Trento, Italy
University of Namur, Belgium
Vrije Universiteit Brussel, Belgium
TrueTrust, UK
Palestine Polytechnic University, Palestine
Palestine Technical University, Palestine
Université de Savoie, France
Ministry of Local Government, Palestine
Ministry of Telecom and IT, Palestine
Ministry of Interior, Palestine
Coordinator:
Dr. Mustafa Jarrar
Birzeit University, P.O.Box 14- Birzeit, Palestine
Telefax: +972 2 2982935 [email protected]
© Copyright Notes
Everyone is encouraged to use this material, or part of it, but should
properly cite the project (logo and website), and the author of that part.
No part of this tutorial may be reproduced or modified in any form or by
any means, without prior written permission from the project, who have
the full copyrights on the material.
Attribution-NonCommercial-ShareAlike
CC-BY-NC-SA
This license lets others remix, tweak, and build upon your work
non-commercially, as long as they credit you and license their new creations
under the identical terms.
Tutorial Map
Topic                                                                  Hours
Session 1: XML Basics and Namespaces                                   3
Session 2: XML DTDs                                                    3
Session 3: XML Schemas                                                 3
Session 4: Lab-XML Schemas                                             3
Session 5: RDF and RDFs                                                3
Session 6: Lab-RDF and RDFs                                            3
Session 7: OWL (Web Ontology Language)                                 3
Session 8: Lab-OWL                                                     3
Session 9: Lab-RDF Stores -Challenges and Solutions                    3
Session 10: Lab-SPARQL                                                 3
Session 11: Lab-Oracle Semantic Technology                             3
Session 12_1: The problem of Data Integration                          1.5
Session 12_2: Architectural Solutions for the Integration Issues       1.5
Session 13_1: Data Schema Integration                                  1
Session 13_2: GAV and LAV Integration                                  1
Session 13_3: Data Integration and Fusion using RDF                    1
Session 14: Lab-Data Integration and Fusion using RDF                  3
Session 15_1: Data Web and Linked Data                                 1.5
Session 15_2: RDFa                                                     1.5
Session 16: Lab-RDFa                                                   3
Intended Learning Objectives
A: Knowledge and Understanding
2a1: Describe tree and graph data models.
2a2: Understand the notation of XML, RDF, RDFS, and OWL.
2a3: Demonstrate knowledge about querying techniques for data
models, such as SPARQL and XPath.
2a4: Explain the concepts of identity management and Linked data.
2a5: Demonstrate knowledge about integration and fusion of
heterogeneous data.
B: Intellectual Skills
2b1: Represent data using tree and graph data models (XML &
RDF).
2b2: Describe data semantics using RDFS and OWL.
2b3: Manage and query data represented in RDF, XML, OWL.
2b4: Integrate and fuse heterogeneous data.
C: Professional and Practical Skills
2c1: Using Oracle Semantic Technology and/or Virtuoso to store
and query RDF stores.
D: General and Transferable Skills
2d1: Working with a team.
2d2: Presenting and defending ideas.
2d3: Use of creativity and innovation in problem solving.
2d4: Develop communication skills and logical reasoning abilities.
Module ILOs
After completing this module students will be able to:
- Explain different architectural solutions to the problem of data
integration.
Architectural Solutions for the Integration Issues
• Two families of solutions for the integration issue:
  – Application-driven Integration
    • Various types of middleware (e.g. Web Services, Remote
      Procedure Call (RPC), Publish & Subscribe) that achieve
      reconciliation through application-to-middleware communication
  – Data-driven Integration
    • Various types of data reconciliation and integration
      – Consolidation
      – Data Warehouse
      – Data Integration
Architectures of application-driven Integration
e.g., Service Oriented Architecture
[Figure: data messages MSG-1 ... MSG-N exchanged between nodes, each shown with an Adapter Server (AS) and a Security Server (SS).]
Legend: SS = Security Server, AS = Adapter Server, MSG = Data Message
Architectures of application-driven Integration
e.g., Publish-Subscribe Architecture
[Figure: Source 1, Source 2, ..., Source n and Application 1, Application 2, ..., Application n connected through a Middleware layer. An update of an object O is published to the middleware and delivered to the subscribers (steps 1 to 7).]
Typical application-driven integration architecture for integration of updates.
Source: Carlo Batini
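The publish-subscribe idea above can be sketched in a few lines of Python. This is a minimal, in-memory illustration only: the `Middleware` and `Application` classes, the topic name `object-O`, and the sample update are all hypothetical names chosen for this sketch, not part of the original slides or of any real middleware product.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Update = Dict[str, str]


class Middleware:
    """Hypothetical in-memory broker standing in for the middleware layer."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Update], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Update], None]) -> None:
        # An application registers its interest in updates on a topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, update: Update) -> int:
        # Forward the update to every subscriber; return how many were notified.
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(update)
        return len(handlers)


class Application:
    """Hypothetical application that keeps its own local copy of objects."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.local_copy: Dict[str, str] = {}

    def on_update(self, update: Update) -> None:
        # Reconcile the local copy with the published change.
        self.local_copy[update["id"]] = update["value"]


broker = Middleware()
app1 = Application("Application 1")
app2 = Application("Application 2")
broker.subscribe("object-O", app1.on_update)
broker.subscribe("object-O", app2.on_update)

# A source publishes an update of object O; the broker fans it out.
delivered = broker.publish("object-O", {"id": "O", "value": "new address"})
```

The publisher never addresses the applications directly; it only knows the topic, which is what keeps sources and applications loosely coupled in this style of integration.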
Information Integration Architectures
Consolidation
[Figure: Source 1, Source 2, ..., Source n are replaced by a unique DB. New architecture, built once and for all.]
Source: Carlo Batini
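As a rough illustration of consolidation, the sketch below copies the records of two sources into one unique database in a single pass. The table name `client`, the sample records, and the rule of keeping the first record seen for a duplicated key are assumptions made for this example; a real consolidation would need a proper reconciliation policy.

```python
import sqlite3

# Hypothetical source contents; client 2 appears in both sources.
source_1 = [{"client_id": 1, "name": "Ahmad"}, {"client_id": 2, "name": "Lina"}]
source_2 = [{"client_id": 2, "name": "Lina"}, {"client_id": 3, "name": "Omar"}]


def consolidate(sources):
    """Load all sources once into a single, unique database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE client (client_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    for records in sources:
        for record in records:
            # Assumption: on a duplicate key, the first record seen is kept.
            db.execute(
                "INSERT OR IGNORE INTO client (client_id, name) VALUES (?, ?)",
                (record["client_id"], record["name"]),
            )
    db.commit()
    return db


unique_db = consolidate([source_1, source_2])
rows = unique_db.execute(
    "SELECT client_id, name FROM client ORDER BY client_id"
).fetchall()
```

After this one-time load, applications would work against `unique_db` only, and the original sources would no longer be maintained.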
Information Integration Architectures
Data Warehouse
[Figure: Source 1, Source 2, ..., Source n feed, through middleware, a new database (a unique DB), the Data Warehouse. New architecture: periodically updated.]
Source: Carlo Batini
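A data warehouse differs from consolidation in that the sources keep running and the warehouse is refreshed periodically. The sketch below is one possible illustration: the `sales_history` table, the source names, and the monthly load dates are hypothetical, and each load appends rows rather than overwriting them, so that older data stays available.

```python
import sqlite3


def create_warehouse():
    """Create an empty, hypothetical warehouse table that keeps history."""
    wh = sqlite3.connect(":memory:")
    wh.execute(
        "CREATE TABLE sales_history ("
        "load_date TEXT, source TEXT, product TEXT, amount REAL)"
    )
    return wh


def periodic_load(wh, load_date, sources):
    """Append a snapshot of every source, tagged with the load date."""
    loaded = 0
    for source_name, rows in sources.items():
        for product, amount in rows:
            wh.execute(
                "INSERT INTO sales_history VALUES (?, ?, ?, ?)",
                (load_date, source_name, product, amount),
            )
            loaded += 1
    wh.commit()
    return loaded


warehouse = create_warehouse()
# Two periodic refreshes; the January rows are not deleted by the February load.
periodic_load(warehouse, "2011-01-31", {"retail": [("pen", 10.0)], "online": [("pen", 4.0)]})
periodic_load(warehouse, "2011-02-28", {"retail": [("pen", 12.0)], "online": [("pen", 6.0)]})

totals = warehouse.execute(
    "SELECT load_date, SUM(amount) FROM sales_history "
    "GROUP BY load_date ORDER BY load_date"
).fetchall()
```

Queries such as the one producing `totals` run against the warehouse only, so they see data as of the last load rather than the current state of the sources.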
Information Integration Architectures
Virtual Data Integration
[Figure: Source 1, Source 2, ..., Source n, each with its own local schema, are accessed through a Mediator that exposes a global schema. New architecture, no new database!]
Source: Carlo Batini
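To make the mediator idea concrete, here is a minimal sketch, assuming two sources with different local schemas and a global schema with the attributes `name` and `city`. The `Mediator` class, the attribute names, and the sample rows are all hypothetical. The mediator stores no data of its own; it translates each source row into the global schema at query time.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]

# Two hypothetical sources with different local schemas.
source_1: List[Row] = [{"cust_name": "Lina", "town": "Ramallah"}]
source_2: List[Row] = [{"fullName": "Omar", "city": "Nablus"}]

# Mappings from each local schema to the assumed global schema (name, city).
to_global_1 = {"cust_name": "name", "town": "city"}
to_global_2 = {"fullName": "name", "city": "city"}


class Mediator:
    """Answers queries on the global schema without copying any data."""

    def __init__(self) -> None:
        self._sources: List[Tuple[List[Row], Dict[str, str]]] = []

    def register(self, rows: List[Row], mapping: Dict[str, str]) -> None:
        # Keep a reference to the live source, not a copy of its rows.
        self._sources.append((rows, mapping))

    def query(self, **conditions: str) -> List[Row]:
        results: List[Row] = []
        for rows, mapping in self._sources:
            for row in rows:
                # Rename local attributes to global ones on the fly.
                global_row = {mapping[k]: v for k, v in row.items() if k in mapping}
                if all(global_row.get(k) == v for k, v in conditions.items()):
                    results.append(global_row)
        return results


mediator = Mediator()
mediator.register(source_1, to_global_1)
mediator.register(source_2, to_global_2)

before = mediator.query(city="Nablus")
# A change in a source is visible to the next query, since nothing was copied.
source_2.append({"fullName": "Sara", "city": "Nablus"})
after = mediator.query(city="Nablus")
```

Because every query goes back to the sources, answers are always current, at the price of doing the schema translation on each request.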
The integration problem…
[Figure: Sources 1 to n (for example Registry of clients 1, Registry of clients 2, Retail sales, On line sales, Other) are to be brought into a new architecture.]
Which kind of integration? How to decide?
Source: Carlo Batini
Additional Reading
Criteria to be adopted
• autonomy: the degree of independence between the different database
administrators in their design choices;
• relevance of historical data: the consequent need to periodically store
new data without deleting the old data;
• query complexity: the amount of data and number of tables visited, the
number of operators applied to them, and the consequent time complexity of
query execution;
• relevance of currency in queries: the need for queries to extract current
data;
• economic value of integration: the relevance of having integrated
information as input to operational and decision-making business processes
in order to produce effective outputs;
Source: Carlo Batini
Additional Reading
Criteria to be adopted
• volatility of sources: the frequency of adding or deleting sources, and
the frequency of change of source schemas;
• relevance of queries with respect to transactions: the relative importance
and frequency of queries compared with changes in data;
• management complexity: the effort to be spent on management
activities related to databases and hardware and software infrastructures,
due to the corresponding complexity of the organizations using the databases;
• costs of heterogeneity: hidden and explicit costs related to business
processes that are due to making use of heterogeneous data.
Source: Carlo Batini
Additional Reading
References
• Carlo Batini: Course on Data Integration. BZU IT Summer School
2011.
• Stefano Spaccapietra: Information Integration. Presentation at the IFIP
Academy. Porto Alegre. 2005.
• Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI
International, Artificial Intelligence Center. Menlo Park, USA. 2009.