Introduction to ETL - Pentaho

Post on 05-Jul-2015

2.155 views 10 download

description

สมาคมศูนบ์วิชาการไทย-ออสเตรเลีย การอบรม รุ่นที่ 1

Transcript of Introduction to ETL - Pentaho

ETL

Pentaho Data Integration

ruangsak@swu.ac.th

ETL

stands for extract, transform and load

a process in database usage and especially

in data warehousing that involves:

Extracting data from outside sources

Transforming it to fit operational needs (which can include quality levels)

Loading it into the end target (database or data warehouse)

Example

Input

Transform

Load

Hands-on Experience

DEMO

Data Integrationrepository

ETL

course2550

input

course2550

course2550

Create

import

Browse

02course2550_forETL.sql

Data Integration

Programs > Pentaho Enterprise Edition

> Design Tools > Data Integration

Repository Connection

Repository

Login

File > New > Transformation

?

?

File > New > Database Connection

ETL

Input > Table input

Double Click Table input

Table input > at_course_enrollment

Get SQL select statement…

at_course_enrollment

ORDER BY course_id

Statistics > Group by

Shift

Double Click Group by

Get Fields

Get lookup fields

Order By ?

Group by ?

Aggregates /

Transform > Select values

Double Click Select values

Get fields to select

Rename to

member_id num_enroll

Selected Value ?

Field

Input > Table input

Double Click Table input

Table input > at_courses

Get SQL select statement…

at_courses

ORDER BY course_id

Joins > Merge Join

Double Click Merge Join

Get key fields Keys

Order By ?

Transform > Select values

Double Click Select values 2

Remove

Fieldname course_id_1

Selected Value 2 ?

Field

Output > Table output

?

o_course

course2550

Double Click Table output

Browse Target Table

o_course

at_courses ( )

DDL > Use Current Connection

SQL

Execute

Refresh

o_course

Truncate table

OK

Action > Run

Launch

Save

MySQL

o_course

feature

Excel Output

Google Docs