Managing etl with microsoft sql server integration services

39
Why ColdFusion isn’t the best tool in your toolbelt

description

 

Transcript of Managing etl with microsoft sql server integration services

Page 1: Managing etl with microsoft sql server integration services

Why ColdFusion isn’t the best tool in your toolbelt

Page 2: Managing etl with microsoft sql server integration services

• Manager of Data and Development for Data Diver Technologies in Chandler, AZ

• Managed the Phoenix CFUG

• I’ve worked with ColdFusion and SQL Server since the late 90’s

• Hates CCS front end work and CSS hates me

• How to find me on the Interwebs:• Email: [email protected] Twitter: @cf_coder

Page 3: Managing etl with microsoft sql server integration services
Page 4: Managing etl with microsoft sql server integration services
Page 5: Managing etl with microsoft sql server integration services

• Extract, Transform and Load

• Can be data from many types of sources

• Flat Text files, Excel, XML, Other Databases

• Destinations can also vary

• Cubes, Flat Text Files, Excel, XML, Other Databases

• Basic Process• Get Data, Mess with it and Stuff it somewhere else

Page 6: Managing etl with microsoft sql server integration services

• Familiarity with platform is tempting

• Inherently Slow

• Reads and Transforms data Line by line

• “Line by line” kills performance

• Doesn’t take advantage of processes inherent in the DB platform for bulk data processing

Page 7: Managing etl with microsoft sql server integration services
Page 8: Managing etl with microsoft sql server integration services

• Native database functionality for bulk data

• BCP : Bulk Copy Process

• Clunky

• Requires Command Text files

• Run from command line or Powershell

• Import Data Wizard (SQL Server)• One off Imports, no transformations

Page 9: Managing etl with microsoft sql server integration services

• SQL Server Integration Services (SSIS)

• Runs in BIDS

• Business Intelligence Development Studio

• Multiple inputs and outputs

• Workflow and process based

• Scriptable and dynamic inputs

Page 10: Managing etl with microsoft sql server integration services

• Starting up the environment

• Project Based

• Can open just files, but projects allow for more options.

• Integration with source control

• SVN (Tortise and VisualSVN)/GIT/Others

Page 11: Managing etl with microsoft sql server integration services

Project Deployment Models

•Package Deployment Model

• Stores packages in a disk based folder structure

•Project Deployment Model

• Deploys packages to a special database

•Either model can be run outside of BIDS.

Page 12: Managing etl with microsoft sql server integration services
Page 13: Managing etl with microsoft sql server integration services

• Connection Managers

• These will set up the input/output files and database connections.

• Can be set at project or package level.

• Create DB Connection

• Create Connections to any Files

• Can create new files for output or input

Page 14: Managing etl with microsoft sql server integration services
Page 15: Managing etl with microsoft sql server integration services
Page 16: Managing etl with microsoft sql server integration services

• Workflow using Pre-Built Components• Lots of options• Get Files, run commands, loop over things• Primarily Execute SQL and Data Flow Tasks

• Connectors• Go/No Go, evaluation, data viewers• Red connectors re-direct on error.

Page 17: Managing etl with microsoft sql server integration services

• Variables• Create at the package level• Set at the component level

• Uses most Basic Types of variable classes

• Ints, Strings, Booleans, Dates and Objects

• Objects are typically Recordsets but can also be Arrays and COM objects.

Page 18: Managing etl with microsoft sql server integration services

• File Connections

• Multi-File pulls from Directories

• QUERY Result sets

• Web Services

• OLEDB/ODBC Connections

Getting Data Into Your Package

Page 19: Managing etl with microsoft sql server integration services

• FTP Component

• Built in and very simple

• Does not do advanced FTP functions

• HTTP File Requests

• No built in component for this

• Use Script or Command Line (wget)

Page 20: Managing etl with microsoft sql server integration services

4/10/14

• Data Flow Task

• Heavy lifting of getting data from A to B

• Utilizes the pre-defined connections

• Can create them on the fly if necessary

• Can transform data in the pipeline en-route to the destination tables/files

Page 21: Managing etl with microsoft sql server integration services

4/10/14

• Change the data in the pipeline

• Can be helpful if you’re not a SQL person

• Editors make choosing transformations easy

• Performance can be impacted

• Anything that impedes the pipeline slows the data transfer process down.

• Becomes a larger issue with larger datasets

Page 22: Managing etl with microsoft sql server integration services
Page 23: Managing etl with microsoft sql server integration services
Page 24: Managing etl with microsoft sql server integration services

4/10/14

• Component Level Errors

• Can Redirect Flow to another path

• Allows for cleanup of temp tables/files, send Email to administrator

• Data Flow Errors

• Redirection of Bad Rows to an Error File

• Can trap General or Truncation Errors.

Page 25: Managing etl with microsoft sql server integration services
Page 26: Managing etl with microsoft sql server integration services

• For and For Each Containers

• For Containers are basically count loopers

• For Each Containers

• Loops over files, arrays, and objects

• Must be used with Variables

• Great way to deal with sequential files

Page 27: Managing etl with microsoft sql server integration services
Page 28: Managing etl with microsoft sql server integration services

• The Marketing Database of Choice!

• Not terribly difficult to work with

• Data is in UTF-8 format, Nvarchar → Varchar

• Need special connection parameters to skip top level rows

• Make Sure project Debug Options set to 32-bit

Page 29: Managing etl with microsoft sql server integration services

• Report Format (multi-row records)

• Most difficult format to work with

• Conditional Split Transformer

• Built in and works with the pipeline

• Cursor and String Parsing

• Allows for keeping rows grouped with IDs

Page 30: Managing etl with microsoft sql server integration services

• XML

• You must have a DTD

• SSIS will create one for you

• Data is in UTF-8 format

• Either Transform to VARCHAR or use NVARCHAR for result

Page 31: Managing etl with microsoft sql server integration services
Page 32: Managing etl with microsoft sql server integration services

• Cannot call packages directly from ColdFusion• Run manually through BIDS (easiest)• SQL Agent - Scheduler for SQL Server• Options for the Agent

• Run the Agent on a schedule and look for work • Set up the Agent task and then execute via TSQL

Page 33: Managing etl with microsoft sql server integration services

Executing a SQL Agent Job from TSQLYou need to be in MSDB to run the Agent.Access to MSDB limited to SysAdmin roles.USE msdb ;GOEXEC dbo.sp_start_job N'AgentTaskName' ;GO

Page 34: Managing etl with microsoft sql server integration services
Page 35: Managing etl with microsoft sql server integration services

• Data Validation – Delay Validation “YES”• SSIS will check for existence of Tables/Files.• Create temp tables beforehand or turn off.

• Excel Files• Must use 32-bit debug execution mode.

• Data Type conversions• Try importing to VARCHARs and convert

afterwards if needed.

Page 36: Managing etl with microsoft sql server integration services

• Learn SQL String Manipulation!• This is the toughest aspect of dealing with

text files and textual data.• Very limited number of functions to use• Left(), Right, Substring(), Len()• CharIndex() and PatIndex()• String Length Math

Page 37: Managing etl with microsoft sql server integration services

• Temp Tables• Where is TempDB?• #Table vs. ##Table vs. Temp_Table• Usage as part of aggregate queries

• Make sure you have enough space for TempDB operations.

• When dealing with large data sets, TempDB can get very large very fast.

Page 38: Managing etl with microsoft sql server integration services
Page 39: Managing etl with microsoft sql server integration services