Managing etl with microsoft sql server integration services

Post on 03-Sep-2014

357 views 2 download

Tags:

description

 

Transcript of Managing etl with microsoft sql server integration services

Why ColdFusion isn’t the best tool in your toolbelt

• Manager of Data and Development for Data Diver Technologies in Chandler, AZ

• Managed the Phoenix CFUG

• I’ve worked with ColdFusion and SQL Server since the late 90’s

• Hates CCS front end work and CSS hates me

• How to find me on the Interwebs:• Email: jeff@garzasixpack.com Twitter: @cf_coder

• Extract, Transform and Load

• Can be data from many types of sources

• Flat Text files, Excel, XML, Other Databases

• Destinations can also vary

• Cubes, Flat Text Files, Excel, XML, Other Databases

• Basic Process• Get Data, Mess with it and Stuff it somewhere else

• Familiarity with platform is tempting

• Inherently Slow

• Reads and Transforms data Line by line

• “Line by line” kills performance

• Doesn’t take advantage of processes inherent in the DB platform for bulk data processing

• Native database functionality for bulk data

• BCP : Bulk Copy Process

• Clunky

• Requires Command Text files

• Run from command line or Powershell

• Import Data Wizard (SQL Server)• One off Imports, no transformations

• SQL Server Integration Services (SSIS)

• Runs in BIDS

• Business Intelligence Development Studio

• Multiple inputs and outputs

• Workflow and process based

• Scriptable and dynamic inputs

• Starting up the environment

• Project Based

• Can open just files, but projects allow for more options.

• Integration with source control

• SVN (Tortise and VisualSVN)/GIT/Others

Project Deployment Models

•Package Deployment Model

• Stores packages in a disk based folder structure

•Project Deployment Model

• Deploys packages to a special database

•Either model can be run outside of BIDS.

• Connection Managers

• These will set up the input/output files and database connections.

• Can be set at project or package level.

• Create DB Connection

• Create Connections to any Files

• Can create new files for output or input

• Workflow using Pre-Built Components• Lots of options• Get Files, run commands, loop over things• Primarily Execute SQL and Data Flow Tasks

• Connectors• Go/No Go, evaluation, data viewers• Red connectors re-direct on error.

• Variables• Create at the package level• Set at the component level

• Uses most Basic Types of variable classes

• Ints, Strings, Booleans, Dates and Objects

• Objects are typically Recordsets but can also be Arrays and COM objects.

• File Connections

• Multi-File pulls from Directories

• QUERY Result sets

• Web Services

• OLEDB/ODBC Connections

Getting Data Into Your Package

• FTP Component

• Built in and very simple

• Does not do advanced FTP functions

• HTTP File Requests

• No built in component for this

• Use Script or Command Line (wget)

4/10/14

• Data Flow Task

• Heavy lifting of getting data from A to B

• Utilizes the pre-defined connections

• Can create them on the fly if necessary

• Can transform data in the pipeline en-route to the destination tables/files

4/10/14

• Change the data in the pipeline

• Can be helpful if you’re not a SQL person

• Editors make choosing transformations easy

• Performance can be impacted

• Anything that impedes the pipeline slows the data transfer process down.

• Becomes a larger issue with larger datasets

4/10/14

• Component Level Errors

• Can Redirect Flow to another path

• Allows for cleanup of temp tables/files, send Email to administrator

• Data Flow Errors

• Redirection of Bad Rows to an Error File

• Can trap General or Truncation Errors.

• For and For Each Containers

• For Containers are basically count loopers

• For Each Containers

• Loops over files, arrays, and objects

• Must be used with Variables

• Great way to deal with sequential files

• The Marketing Database of Choice!

• Not terribly difficult to work with

• Data is in UTF-8 format, Nvarchar → Varchar

• Need special connection parameters to skip top level rows

• Make Sure project Debug Options set to 32-bit

• Report Format (multi-row records)

• Most difficult format to work with

• Conditional Split Transformer

• Built in and works with the pipeline

• Cursor and String Parsing

• Allows for keeping rows grouped with IDs

• XML

• You must have a DTD

• SSIS will create one for you

• Data is in UTF-8 format

• Either Transform to VARCHAR or use NVARCHAR for result

• Cannot call packages directly from ColdFusion• Run manually through BIDS (easiest)• SQL Agent - Scheduler for SQL Server• Options for the Agent

• Run the Agent on a schedule and look for work • Set up the Agent task and then execute via TSQL

Executing a SQL Agent Job from TSQLYou need to be in MSDB to run the Agent.Access to MSDB limited to SysAdmin roles.USE msdb ;GOEXEC dbo.sp_start_job N'AgentTaskName' ;GO

• Data Validation – Delay Validation “YES”• SSIS will check for existence of Tables/Files.• Create temp tables beforehand or turn off.

• Excel Files• Must use 32-bit debug execution mode.

• Data Type conversions• Try importing to VARCHARs and convert

afterwards if needed.

• Learn SQL String Manipulation!• This is the toughest aspect of dealing with

text files and textual data.• Very limited number of functions to use• Left(), Right, Substring(), Len()• CharIndex() and PatIndex()• String Length Math

• Temp Tables• Where is TempDB?• #Table vs. ##Table vs. Temp_Table• Usage as part of aggregate queries

• Make sure you have enough space for TempDB operations.

• When dealing with large data sets, TempDB can get very large very fast.