Transcript: LinkedIn 4th Roundtable, June 1, 2012
Network of Business Intelligence Professionals in Suisse Romande
4th Roundtable
Lausanne, June 1, 2012
Dario Mangano, Head of Knowledge Management, Nestlé Nespresso S.A. HQ
AGENDA
14:00 Welcome
14:30 The LinkedIn group
14:45 Load metadata
15:30 Coffee break
15:45 Loading by time zones
16:30 Future roundtables
16:45 Coffee
17:30 End
The LinkedIn group
AGENDA
Dario Mangano, Head of Knowledge Management, Nestlé Nespresso S.A.
The LinkedIn group
Load metadata management
AGENDA
Anthony Brouard, Business Intelligence Expert, Capital International
Load metadata
Anthony's slides
Data Management Services
Data Integration Framework Overview
Data Integration: Best Practices oriented
Data integration is a family of techniques, most commonly ETL (extract, transform, and load), but also many related techniques that are unavoidable when dealing with data integration: metadata, change data capture, file loading, publication, data quality, and so on. Moreover, it always involves different technologies: database servers, database scripting, shell scripting, etc.
All these techniques and technologies require developing and supporting a wide range of interfaces, using solutions that can be hand-coded, based on vendor tools, or a mix of both.
With such complexity in data integration systems, developing and supporting these solutions becomes very challenging.
Having best practices and standards ensures that all systems are developed in a way that is much easier to support, and also much safer and more scalable to meet future needs and data volumes.
The Data Integration Framework is a metadata-driven development environment that provides turnkey solutions for all of these data integration tasks:
- Metadata
- Change Data Capture
- File loading
- Data quality
- Publication
- Archiving, …
It ensures that all those tasks are performed in an efficient and standard way, which keeps the development team focused on the real added value of the data integration system: making data from the different source systems available to business users, and applying the required business rules.
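The "framework does the plumbing, the project does the business rules" idea above can be sketched as a wrapper that runs any project-specific load step while recording its operational metadata. This is a minimal illustration, not the actual DIF API: the names `run_load` and `metadata_repository` are invented for the example.

```python
import datetime

# Stand-in for the operational metadata tables the framework writes to.
metadata_repository = []

def run_load(interface_name, load_step):
    """Execute one project-specific load step, capturing start/end time,
    row count, and status regardless of how the step itself is coded."""
    run = {"interface": interface_name,
           "start": datetime.datetime.now(datetime.timezone.utc),
           "status": "RUNNING", "rows": 0}
    try:
        run["rows"] = load_step()        # the project team's business logic
        run["status"] = "SUCCESS"
    except Exception as exc:             # failures are recorded, not lost
        run["status"] = f"FAILED: {exc}"
        raise
    finally:
        run["end"] = datetime.datetime.now(datetime.timezone.utc)
        metadata_repository.append(run)  # monitoring reports query this
    return run["rows"]

# A fake load step standing in for ETL code that loads 1250 rows.
rows = run_load("SALES_DAILY", lambda: 1250)
```

The development team only supplies `load_step`; timing, row counts and status come for free from the wrapper.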
Metadata Management oriented
Metadata is a key feature of data integration and data warehousing.
It is the only way to get answers to questions such as:
- Which column did this data come from?
- When was this data populated in the system?
- How is this result on my report calculated?
- Is my report up to date?
- Is my system scalable?
Having these answers increases trust in the data, enables proactive monitoring of data integration processes, ensures that data is loaded efficiently, and ultimately prevents the system from losing value over time by reducing the costs of understanding, maintenance, and repair.
The Data Integration Framework provides a metadata management solution with no development effort required from the project team:
- Collecting operational metadata in real time
- Capturing business and technical metadata related to data integration processes
- Integrating all this metadata in a metadata repository
- Providing reports to access the integrated metadata with user-friendly navigation capabilities (drill down, drill through, direct access to log files from monitoring reports, impact analysis, …)
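A question like "which column did this data come from?" becomes mechanical once technical metadata records field-to-field mappings. The sketch below follows such links from a report field back to its source column; the mapping entries are invented for the example and do not come from the slides.

```python
# Hypothetical lineage metadata: each entry maps a field to the field it
# was derived from (report column -> DWH column -> staging file column).
lineage = {
    "report.total_sales": "dwh.f_sales.amount",
    "dwh.f_sales.amount": "staging.sales_file.col_07",
}

def trace_to_source(field):
    """Follow lineage links until the original source column is reached."""
    path = [field]
    while path[-1] in lineage:
        path.append(lineage[path[-1]])
    return path

path = trace_to_source("report.total_sales")
```

The same repository, queried in the other direction, gives the impact analysis mentioned above: which reports are affected when a source column changes.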
Data Integration Framework (DIF): Data Management Framework
[Architecture diagram: DIF components (File loading, Notification, Change Data Capture, Archiving, Data publication, Business Rules, Metadata, Monitoring, Graphical User Interface) connect source systems to downstream systems; development teams develop and use the components, support teams monitor them. See slide notes for comments.]
Designs, methods and tools to perform data integration services
Reference Architecture | Methods | DIF Components
Event-Triggered ETL, Batch ETL | ETL Development Methodology; Standard Integration Methods by Subject Area (Parsing, Matching & Merging, Consolidation, …) | Wrapper; File Loader; Change Data Capture; Reject Management
Pub/Sub Event | Bulk Pub/Sub Pattern | Publisher Module
Metadata Management (Operational & Technical) | Op. & Technical Metadata Collection Standards | Metadata data model; Metadata collection daemon
Quality Auditing | Data Quality Control Methods | DQ module
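Change Data Capture, listed among the DIF components above, can be implemented in several ways; one simple technique is snapshot comparison. The sketch below illustrates that technique only, and is not a description of how the actual DIF component works.

```python
def capture_changes(previous, current):
    """Diff two {key: row} table snapshots into insert/update/delete sets,
    so downstream loads only process what actually changed."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    return inserts, updates, deletes

# Yesterday's and today's snapshots of a small reference table.
prev = {1: ("Alice", 10), 2: ("Bob", 20)}
curr = {1: ("Alice", 15), 3: ("Carol", 30)}
ins, upd, dels = capture_changes(prev, curr)
```

Row 1 changed, row 2 disappeared, and row 3 is new, so the three sets hold exactly one entry each.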
DIF: Back-end modular architecture
DIF minimal installation: Metadata Repository, Scheduler, Wrapper module
DIF available modules/services/reusable components:
- File Loader module
- Notification module
- Purge module
- Archiver module
- Publisher module
- Data Quality module
- HP OV metrics collection services
- PowerCenter metrics collection services
- Reusable includes (logging routines, mail-sending routines, …)
Project-specific code to apply business rules and requirements (PowerCenter, shell scripts, SQL scripts, stored procedures, …)
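A "reusable include" such as the logging routine mentioned above typically enforces one uniform line format so that monitoring reports can parse logs from every module the same way. This is an illustrative sketch; the function name and line format are assumptions, not the actual DIF include.

```python
import datetime
import sys

def log(level, module, message, stream=sys.stdout):
    """Write one uniformly formatted, timestamped log line.
    Every module calling this produces machine-parseable logs."""
    ts = datetime.datetime.now(datetime.timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S")
    line = f"{ts} [{level:5}] {module}: {message}"
    stream.write(line + "\n")
    return line

line = log("INFO", "FileLoader", "loaded 1250 rows from sales.csv")
```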
Potential for Global Monitoring
[Diagram: application environments (APP1, APP2, APP3), each with its own metadata repository (Autosys wrapper, file loader, CDC, rejects recycling, archiver, publisher, …), feed a shared metadata repository. Key metadata is retrieved from infrastructure and middleware components (PowerCenter, scheduler, Unix servers, Oracle DBs, HP OpenView, web server). Project, support and middleware teams use reports through a data access layer built on reusable components and engines; external systems receive publication extracts.]
Reporting services (Cognos/BO reports), example 1
Using the reporting layer, we can access the integrated metadata repository for any kind of report or ad hoc query:
- monitoring reports
- capacity planning
- impact analysis
Example of a monitoring report with embedded navigation capabilities: a drill-down button, and a link to open the log file for more details.
Reporting services (Cognos/BO reports), example 2
The added value of integrated metadata is having reports that show correlated metadata on the same view.
For example, this Gantt-view execution report shows whether there is a correlation between a given interface execution and the server workload. This is very useful for understanding performance issues, but also for capacity-planning purposes.
Drill through to this interface's details report; drill down to the interface-step Gantt view for this interface.
Reporting services (Cognos/BO reports), example 3
Another example of the details we can get from the reports.
Using the publication module, the metadata will tell you which XML files were produced, how many rows were extracted from the database, and also to which downstream applications the package was pushed.
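A publication step that records exactly those three facts (files produced, rows extracted, downstream targets) can be sketched as below. The function name, file name and subscriber names are invented for the example; only the kind of metadata captured comes from the slide.

```python
import xml.etree.ElementTree as ET

def publish(rows, filename, subscribers):
    """Serialize rows to an XML package and return the publication
    metadata the monitoring reports would display."""
    root = ET.Element("package")
    for r in rows:
        ET.SubElement(root, "row", id=str(r["id"]), value=str(r["value"]))
    payload = ET.tostring(root, encoding="unicode")
    return {"file": filename,
            "rows_extracted": len(rows),
            "pushed_to": list(subscribers),   # downstream applications
            "bytes": len(payload)}

meta = publish([{"id": 1, "value": "A"}, {"id": 2, "value": "B"}],
               "sales_20120601.xml", ["APP1", "APP2"])
```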
Appendix
Data Integration Application Architecture
[Diagram: data flows from source systems (flat files, DB tables) through a Staging Layer, an Integration Layer (DWH) and a Data Publication Layer to the Publication Layer. Each layer runs interfaces made up of steps and tasks (ETL, FileLoader, Publisher, scripts, Oracle stored procedures, SQL scripts), driven by a scheduler. Cross-cutting Data Integration Framework services (modules, monitoring services) provide archiving, exception management, rejection recycling, auditing, notification, data movement, workflow, data quality, publication and change data capture, all recording into an Operational Metadata Repository (metrics, metadata, exception logs, configuration metadata). Reporting services (Cognos, Business Objects) serve Level 2 support, the DEV team (L3 support) and business users.]
Reports & Dashboards
1st-level support: an application/service-level view enables the Service Desk to rapidly intercept fatal alerts and communicate service outages to affected users.
2nd-level support: identify the root cause of an issue and take effective action. Data is available for analysis to anticipate issues and bottlenecks.
3rd-level support: perform complex analysis, troubleshooting and storage capacity planning; improve efficiency (identify weak points and alarming trends).
Monitoring Implementation
[Diagram: PowerCenter, DB servers, application & web servers, the scheduler, and the DIF modules and services all feed the metadata repository, which provides a service-level view of the data integration application.]
Loading by time zones
AGENDA
Dario Mangano, Head of Knowledge Management, Nestlé Nespresso S.A.
Cedric Zbinden, BI Architect, Nestlé Nespresso S.A.
Loading by time zones
Question: how do we manage the consistency and integrity of DWH data when the data is loaded by geographical zone and by time zone?
Debate: proposals?
Summary of the discussions:
- HQ and the markets do not have the same needs in terms of data refresh; review whether HQ could be satisfied with D-2?
- Load into separate schemas so as not to query the schema currently being loaded, then do a drop partition at the end?
- Use Cognos master cubes
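The separate-schema proposal above can be sketched as a dual-copy swap: loads write into an inactive copy while all queries read the active one, and a single pointer switch exposes the new data once the load completes. In Oracle this role is played by synonyms or partition exchange; here a plain dict stands in for the schema pointer, and all names are illustrative.

```python
# Two copies of the published schema; "active" is what queries read.
schemas = {"A": {"loaded_until": "2012-05-30"}, "B": {}}
active = "A"

def load_and_swap(new_data):
    """Load the inactive copy, then switch the pointer atomically,
    so readers never see a half-loaded schema."""
    global active
    inactive = "B" if active == "A" else "A"
    schemas[inactive] = dict(new_data)   # queries still hit the active copy
    active = inactive                    # single switch once the load is done
    return active

# A time zone's nightly load lands in copy B, then B becomes active.
load_and_swap({"loaded_until": "2012-05-31"})
```

Per-time-zone loads can each go through this cycle independently; consistency across zones then reduces to deciding when the pointer switch is allowed.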
Future roundtables
AGENDA
Future Roundtables
- Topics
- Venues
- Communication
Future Roundtables, proposals:
- Big Data
- Demonstration of a QlikView POC (Mutuelle group)
- Examples of governance to better frame business demands (business case, demand management committee, etc.)
- In-memory appliances
- Mobile BI
- Column-based DB / NoSQL
- Data virtualization
- BI SaaS
THANK YOU! Coffee!