Informatica DW Consolidated


session properties -> general tab -> treat input link as 'AND'/'OR'
session partition with aggregator transformation
preview data
********************************************************
High availability. You can use the high availability option to eliminate single points of failure in the PowerCenter environment and reduce service interruptions in the event of failure. High availability provides resilience, failover, and recovery for services.

infacmd, infasetup, pmrep

    Versioned Objects

    Workflow recovery

Custom transformation. The Custom transformation has the following enhancements: procedures with thread-specific operations.

Java transformation.

***********************************************************
Error codes
BR - errors related to the reader process, including ERP, flat file, and relational sources
CMN - errors related to databases, memory allocation, lookup, joiner, and internal errors
DBGR - errors related to the debugger
SE, TM, WID - errors related to transformations
PMF - errors related to caching in the aggregator, lookup, joiner, and rank
RR - errors related to relational sources
EP - errors related to external procedures
LM - errors related to the Load Manager
REP - errors related to repository functions
VAR - errors related to mapping variables
WRT - errors related to the writer

***********************************************************
Netezza and Oracle - LIMIT and ROWNUM, respectively, for restricting the number of rows returned.
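For illustration only, a minimal SQL sketch of the two idioms (the EMPLOYEES table is hypothetical):

Oracle:
SELECT * FROM EMPLOYEES WHERE ROWNUM <= 10

Netezza:
SELECT * FROM EMPLOYEES LIMIT 10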

************************************************************
Aggregator - Active & Connected
Source Qualifier - Active & Connected
Filter - Active & Connected
Expression - Passive & Connected
Joiner - Active & Connected
Lookup - Passive & Connected/Unconnected
HTTP - Passive & Connected
Normalizer - Active & Connected
Rank - Active & Connected
Router - Active & Connected
Sequence Generator - Passive & Connected
Sorter - Active & Connected
Stored Procedure - Passive & Connected/Unconnected
Union - Active & Connected


*************************************************************************
A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data.

Lookup transformation:
Be sure to delete unwanted columns from the lookup, as they significantly affect the lookup cache.
If the Lookup transformation is after the Source Qualifier and there is no active transformation in between, you can as well go for the SQL override of the Source Qualifier (see the sketch below).

When the cache that you assigned for the lookup is not sufficient to hold the data or index of the lookup, whatever data does not fit into the cache is spilled into the cache files designated in $PMCacheDir. When PowerCenter does not find the data you are looking up in the cache, it swaps data from the file into the cache and keeps doing this until it finds the data. This is quite expensive, for the obvious reason that it is an I/O operation. Increase the cache so that the whole data set resides in memory.

Sequential and concurrent caches: The 8.x version of PowerCenter gives us the option to build the caches of the lookups either concurrently or sequentially, depending on the business rule. If no business rule dictates otherwise, concurrent cache building is a very handy option.
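A minimal sketch of that Source Qualifier SQL override, assuming hypothetical ORDERS and CUSTOMERS tables (the ports in the Source Qualifier must match the SELECT list and its order):

SELECT ORDERS.ORDER_ID, ORDERS.CUST_ID, ORDERS.AMOUNT, CUSTOMERS.CUST_NAME
FROM ORDERS, CUSTOMERS
WHERE ORDERS.CUST_ID = CUSTOMERS.CUST_ID

The join retrieves the lookup column directly, so a separate Lookup transformation and its cache are no longer needed.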

    [HINTS]

Difference b/w Aggregator and Expression Transformation? The Expression transformation permits you to perform calculations on a row-by-row basis only. In the Aggregator you can perform calculations on groups.
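For illustration, assuming hypothetical ports PRICE, QUANTITY, and STORE_ID:

Expression transformation (one output value per input row):
LINE_TOTAL = PRICE * QUANTITY

Aggregator transformation (one output value per group, with STORE_ID as the group-by port):
STORE_TOTAL = SUM(PRICE * QUANTITY)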

    HTTP Transformation

Passive & Connected. It allows you to connect to an HTTP server to use its services and applications. With an HTTP transformation, the Integration Service connects to the HTTP server and issues a request to retrieve data from or post data to the target or a downstream transformation in the mapping.

****************************************************************************

    Q. What type of repositories can be created using Informatica Repository Manager?

A. Informatica PowerCenter includes the following types of repositories:

Standalone Repository: A repository that functions individually and is unrelated to any other repositories.

    Global Repository : This is a centralized repository in a domain. This repository can contain shared objects across the repositories in a domain. The objects are shared through global shortcuts.


Local Repository: A local repository is within a domain and is not the global repository. A local repository can connect to a global repository using global shortcuts and can use objects in its shared folders.
Versioned Repository: This can be either a local or a global repository, but it allows version control for the repository. A versioned repository can store multiple copies, or versions, of an object. This feature allows you to efficiently develop, test, and deploy metadata into the production environment.

    Q. What is a code page?

A. A code page contains encoding to specify characters in a set of one or more languages. The code page is selected based on the source of the data. For example, if the source contains Japanese text, then a code page that supports Japanese text should be selected.

When a code page is chosen, the program or application for which the code page is set refers to a specific set of data that describes the characters the application recognizes. This influences the way that application stores, receives, and sends character data.

Q. Which databases can PowerCenter Server on Windows connect to?

A. PowerCenter Server on Windows can connect to the following databases:

IBM DB2
Informix
Microsoft Access
Microsoft Excel
Microsoft SQL Server
Oracle
Sybase
Teradata

Q. Which databases can PowerCenter Server on UNIX connect to?

A. PowerCenter Server on UNIX can connect to the following databases:

IBM DB2
Informix
Oracle
Sybase
Teradata

Informatica Mapping Designer

Q. How can a PL/SQL script be executed from an Informatica mapping?

A. A Stored Procedure (SP) transformation can be used to execute PL/SQL scripts. In the SP transformation, the PL/SQL procedure name can be specified. Whenever the session is executed, the session calls the PL/SQL procedure.
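As a sketch, an unconnected Stored Procedure transformation can also be invoked from an Expression transformation using the :SP reference qualifier; the transformation name SP_GET_CREDIT_LIMIT and the ports below are hypothetical:

:SP.SP_GET_CREDIT_LIMIT(CUST_ID, PROC_RESULT)

PROC_RESULT captures the return value of the procedure.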

    Q. How can you define a transformation? What are different types of transformations available in Informatica?

    A. A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data. Below are the various transformations available in Informatica:

Aggregator
Application Source Qualifier


Custom
Expression
External Procedure
Filter
Input
Joiner
Lookup
Normalizer
Output
Rank
Router
Sequence Generator
Sorter
Source Qualifier
Stored Procedure
Transaction Control
Union
Update Strategy
XML Generator
XML Parser
XML Source Qualifier

    Q. What is a source qualifier? What is meant by Query Override?

A. Source Qualifier represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session. When a relational or a flat file source definition is added to a mapping, it is connected to a Source Qualifier transformation.

The PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs the session. The default query is a SELECT statement containing all the source columns. The Source Qualifier can override this default query by changing the default settings of the transformation properties. The list of selected ports, and the order in which they appear in the default query, should not be changed in the overridden query.

    Q. What is aggregator transformation?

A. The Aggregator transformation allows performing aggregate calculations, such as averages and sums. Unlike the Expression transformation, the Aggregator transformation can be used to perform calculations on groups. The Expression transformation permits calculations on a row-by-row basis only.

The Aggregator transformation contains group-by ports that indicate how to group the data. While grouping the data, the Aggregator transformation outputs the last row of each group unless otherwise specified in the transformation properties.

Various aggregate functions available in Informatica are: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.

    Q. What is Incremental Aggregation?

A. Whenever a session is created for a mapping that contains an Aggregator transformation, the session option for Incremental Aggregation can be enabled. When PowerCenter performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally.

    Q. How Union Transformation is used?

A. The Union transformation is a multiple input group transformation that can be used to merge data from various sources (or pipelines). This transformation works just like the UNION ALL statement in SQL, which is used to combine the result sets of two SELECT statements.

    Q. Can two flat files be joined with Joiner Transformation?

A. Yes, the Joiner transformation can be used to join data from two flat file sources.

    Q. What is a look up transformation?

A. This transformation is used to look up data in a flat file or a relational table, view, or synonym. It compares Lookup transformation ports (input ports) to the source column values based on the lookup condition. The returned values can then be passed to other transformations.

    Q. Can a lookup be done on Flat Files?

    A. Yes.

    Q. What is the difference between a connected look up and unconnected look up?

A. A connected lookup takes input values directly from other transformations in the pipeline.

An unconnected lookup doesn't take inputs directly from any other transformation, but it can be used in any transformation (such as an Expression) and can be invoked as a function using the :LKP expression. So, an unconnected lookup can be called multiple times in a mapping.
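For illustration, a call from an Expression transformation to a hypothetical unconnected Lookup transformation named LKP_GET_CUST_NAME:

CUST_NAME = :LKP.LKP_GET_CUST_NAME(CUST_ID)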

    Q. What is a mapplet?

A. A mapplet is a reusable object that is created using the Mapplet Designer. The mapplet contains a set of transformations and allows us to reuse that transformation logic in multiple mappings.

    Q. What does reusable transformation mean?

A. Reusable transformations can be used multiple times in a mapping. The reusable transformation is stored as metadata, separate from any mapping that uses the transformation. Whenever any changes to a reusable transformation are made, all the mappings where the transformation is used are invalidated.

    Q. What is update strategy and what are the options for update strategy?

A. Informatica processes the source data row by row. By default, every row is marked to be inserted into the target table. If the row has to be updated or inserted based on some logic, the Update Strategy transformation is used. The condition can be specified in the Update Strategy to mark the processed row for update or insert.

The following options are available for the update strategy:

DD_INSERT: If this is used, the Update Strategy flags the row for insertion. The equivalent numeric value of DD_INSERT is 0.
DD_UPDATE: If this is used, the Update Strategy flags the row for update. The equivalent numeric value of DD_UPDATE is 1.
DD_DELETE: If this is used, the Update Strategy flags the row for deletion. The equivalent numeric value of DD_DELETE is 2.
DD_REJECT: If this is used, the Update Strategy flags the row for rejection. The equivalent numeric value of DD_REJECT is 3.
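For illustration, a typical Update Strategy expression, assuming a hypothetical flag port NEW_REC that marks new records:

IIF( NEW_REC = 'Y', DD_INSERT, DD_UPDATE )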

*******************************************************

    SESSION LOGS

Information that resides in a session log:
- Allocation of system shared memory
- Execution of pre-session commands / post-session commands
- Session initialization
- Creation of SQL commands for reader/writer threads
- Start/end timings for target loading
- Errors encountered during the session
- Load summary of Reader/Writer/DTM statistics

Other information:
- By default, the server generates log files based on the server code page.

Thread Identifier
Ex: CMN_1039
Reader and writer thread codes have 3 digits and transformation codes have 4 digits.
The number following a thread name indicates the following:
(a) Target load order group number
(b) Source pipeline number
(c) Partition number
(d) Aggregate/Rank boundary number

Log File Codes
Error Code - Description
BR - Related to the reader process, including ERP, relational, and flat file sources
CMN - Related to databases, memory allocation
DBGR - Related to the debugger
EP - External Procedure
LM - Load Manager
TM - DTM
REP - Repository
WRT - Writer

Load Summary
(a) Inserted
(b) Updated
(c) Deleted
(d) Rejected

Statistics details
(a) Requested rows shows the number of rows the writer actually received for the specified operation
(b) Applied rows shows the number of rows the writer successfully applied to the target (without error)
(c) Rejected rows shows the number of rows the writer could not apply to the target
(d) Affected rows shows the number of rows affected by the specified operation

Detailed transformation statistics
The server reports the following details for each transformation in the mapping:
(a) Name of transformation
(b) Number of input rows and name of the input source
(c) Number of output rows and name of the output target
(d) Number of rows dropped

Tracing Levels
Normal - Initialization and status information, errors encountered, transformation errors, rows skipped, summarized session details (not at the level of individual rows)
Terse - Initialization information as well as error messages, and notification of rejected data
Verbose Init - In addition to normal tracing, names of the index and data files used and detailed transformation statistics
Verbose Data - In addition to Verbose Init, each row that passes into the mapping, with detailed transformation statistics

    NOTE

When you enter a tracing level in the session property sheet, you override the tracing levels configured for transformations in the mapping.

MULTIPLE SERVERS
With PowerCenter, we can register and run multiple servers against a local or global repository. Hence you can distribute the repository session load across available servers to improve overall performance. (You can use only one PowerMart server in a local repository.)

Issues in server organization:
- Moving the target database onto the appropriate server machine may improve efficiency
- All sessions/batches using data from other sessions/batches need to use the same server and be incorporated into the same batch
- Servers of different speeds/sizes can be used for handling the most complicated sessions

Session/batch behavior:
- By default, every session/batch runs on its associated Informatica server, as selected in the property sheet
- In batches that contain sessions with various servers, the property goes to the server of the outermost batch

Session Failures and Recovering Sessions
Two types of errors occur in the server:
- Non-fatal
- Fatal

    (a) Non-Fatal Errors

A non-fatal error is an error that does not force the session to stop on its first occurrence. Establish the error threshold in the session property sheet with the Stop On option. When you enable this option, the server counts non-fatal errors that occur in the reader, writer, and transformations.
Reader errors can include alignment errors while running a session in Unicode mode.
Writer errors can include key constraint violations, loading NULL into a NOT NULL field, and database errors.
Transformation errors can include conversion errors and any condition set up as an ERROR, such as NULL input.

    (b) Fatal Errors

A fatal error occurs when the server cannot access the source, target, or repository. This can include loss of connection or target database errors, such as lack of database space to load data.
If the session uses Normalizer or Sequence Generator transformations, the server cannot update the sequence values in the repository, and a fatal error occurs.

Others:
- Usage of the ABORT function in mapping logic, to abort a session when the server encounters a transformation error
- Stopping the server using pmcmd or the Server Manager

Performing Recovery
- When the server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the rowid of the last row committed to the target database. The server then reads all sources again and starts processing from the next rowid.
- By default, Perform Recovery is disabled in setup. Hence it won't make entries in the OPB_SRVR_RECOVERY table.
- The recovery session moves through the states of a normal session: scheduled, waiting to run, initializing, running, completed, and failed. If the initial recovery fails, you can run recovery as many times as needed.
- The normal reject loading process can also be done in the session recovery process.
- The performance of recovery might be low if:
  o the mapping contains mapping variables
  o the commit interval is high

Unrecoverable Sessions
Under certain circumstances, when a session does not complete, you need to truncate the target and run the session from the beginning.

Commit Intervals
A commit interval is the interval at which the server commits data to relational targets during a session.

(a) Target-based commit
- The server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
- During a session, the server continues to fill the writer buffer after it reaches the commit interval. When the buffer block is full, the Informatica server issues a commit command. As a result, the amount of data committed at the commit point generally exceeds the commit interval.
- The server commits data to each target based on primary key / foreign key constraints.

(b) Source-based commit
- The server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties.
- During a session, the server commits data to the target based on the number of rows from an active source in a single pipeline. The rows are referred to as source rows.
- A pipeline consists of a source qualifier and all the transformations and targets that receive data from the source qualifier.
- Although the Filter, Router, and Update Strategy transformations are active transformations, the server does not use them as active sources in a source-based commit session.
- When a server runs a session, it identifies the active source for each pipeline in the mapping. The server generates a commit row from the active source at every commit interval.
- When each target in the pipeline receives the commit rows, the server performs the commit.

Reject Loading
During a session, the server creates a reject file for each target instance in the mapping. If the writer of the target rejects data, the server writes the rejected row into the reject file.
You can correct the rejected data and reload it to relational targets using the reject loading utility. (You cannot load rejected data into a flat file target.)
Each time you run a session, the server appends the rejected data to the reject file.

Locating the bad files:
$PMBadFileDir
Filename.bad
When you run a partitioned session, the server creates a separate reject file for each partition.

Reading rejected data
Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00
To help us find the reason for rejecting, there are two main things:
(a) Row indicator

The row indicator tells the writer what to do with the row of wrong data.

Row indicator   Meaning   Rejected by
0               Insert    Writer or target
1               Update    Writer or target
2               Delete    Writer or target
3               Reject    Writer

If a row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject.

(b) Column indicator

The column indicator is followed by the first column of data, and another column indicator. Column indicators appear after every column of data and define the type of data preceding them.

Column indicator   Meaning     Writer treats as
D                  Valid data  Good data. The target accepts it unless a database error occurs, such as finding a duplicate key.
O                  Overflow    Bad data.
N                  Null        Bad data.
T                  Truncated   Bad data.

NOTE
NULL columns appear in the reject file with commas marking their column.

Correcting the Reject File
Use the reject file and the session log to determine the cause of the rejected data. Keep in mind that correcting the reject file does not necessarily correct the source of the reject.
Correct the mapping and target database to eliminate some of the rejected data when you run the session again.
Trying to correct target-rejected rows before correcting writer-rejected rows is not recommended since they may contain misleading column indicators. For example, a series of 'N' indicators might lead you to believe the target database does not accept NULL values, so you decide to change those NULL values to zero. However, if those rows also had a 3 in the row indicator column, the rows were rejected by the writer because of an update strategy expression, not because of a target database restriction.

    If you try to load the corrected file to target, the writer will again reject those rows, and they will contain inaccurate 0 values, in place of NULL values.

Why can the writer reject?
- Data overflowed column constraints
- An update strategy expression

Why can the target database reject?
- Data contains a NULL column
- Database errors, such as key violations

    Steps for loading reject file:

- After correcting the rejected data, rename the rejected file to reject_file.in
- The reject loader uses the data movement mode configured for the server. It also uses the code page of the server/OS. Hence do not change these in the middle of the reject loading.
- Use the reject loader utility:
  pmrejldr pmserver.cfg [folder name] [session name]

Other points:
The server does not perform the following options when using the reject loader:
(a) Source-based commit
(b) Constraint-based loading
(c) Truncate target table
(d) FTP targets
(e) External loading

Multiple reject loaders
You can run the session several times and correct rejected data from the several sessions at once. You can correct and load all of the reject files at once, or work on one or two reject files, load them, and work on the others at a later time.

External Loading
You can configure a session to use Sybase IQ, Teradata, and Oracle external loaders to load session target files into the respective databases. The External Loader option can increase session performance since these databases can load information directly from files faster than they can process SQL commands to insert the same data into the database.

Method:
When a session uses an external loader, the session creates a control file and a target flat file. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader. The control file has an extension of *.ctl and you can view the file in $PmTargetFilesDir.

For using an external loader, the following must be done:
- Configure an external loader connection in the Server Manager
- Configure the session to write to a target flat file local to the server
- Choose an external loader connection for each target file in the session property sheet

Issues with the external loader:
- Disable constraints
- Performance issues
  o Increase commit intervals
  o Turn off database logging
- Code page requirements
- The server can use multiple external loaders within one session (e.g., you have a session with two target files, one with the Oracle external loader and another with the Sybase external loader)

Other information:
- External loader performance depends upon the platform of the server
- The server loads data at different stages of the session
- The server writes external loader initialization and completion messages in the session log. However, details about external loader performance are generated in the external loader log, which is stored in the same target directory.
- If the session contains errors, the server continues the external loader process. If the session fails, the server loads partial target data using the external loader.
- The external loader creates a reject file for data rejected by the database. The reject file has an extension of *.ldr.
- The external loader saves the reject file in the target file directory
- You can load corrected data from the file using the database reject loader, and not through the Informatica reject load utility (for external loader reject files only)

Configuring the external loader in a session:
- In the Server Manager, open the session property sheet
- Select File Target, and then click Flat File Options

Caches
- The server creates index and data caches in memory for the Aggregator, Rank, Joiner, and Lookup transformations in a mapping.
- The server stores key values in index caches and output values in data caches; if the server requires more memory, it stores overflow values in cache files.
- When the session completes, the server releases cache memory and, in most circumstances, deletes the cache files.

Cache storage overflow:

Transformation   Index cache                                   Data cache
Aggregator       Stores group values, as configured in the     Stores calculations based on the
                 group-by ports.                               group-by ports.
Rank             Stores group values, as configured in the     Stores ranking information based
                 group-by ports.                               on the group-by ports.
Joiner           Stores index values for the master source     Stores master source rows.
                 table, as configured in the join condition.
Lookup           Stores lookup condition information.          Stores lookup data that is not
                                                               stored in the index cache.

    Determining cache requirements

To calculate the cache size, you need to consider column and row requirements as well as processing overhead.
- The server requires processing overhead to cache data and index information. Column overhead includes a null indicator, and row overhead can include row-to-key information.

Steps:


- First, add the total column size in the cache to the row overhead.
- Multiply the result by the number of groups (or rows) in the cache; this gives the minimum cache requirement.
- For the maximum requirement, multiply the minimum requirement by 2.
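A worked example with assumed figures: if the cached columns total 50 bytes per row, the row overhead is 7 bytes, and the Aggregator holds 1,000 groups, the minimum cache requirement is 1,000 * (50 + 7) = 57,000 bytes, and the maximum is about 2 * 57,000 = 114,000 bytes.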

Location:
- By default, the server stores the index and data files in the directory $PMCacheDir.
- The server names the index files PMAGG*.idx and the data files PMAGG*.dat. If the size exceeds 2 GB, you may find multiple index and data files in the directory. The server appends a number to the end of the filename (PMAGG*.id*1, id*2, etc.).

Aggregator Caches
- When the server runs a session with an Aggregator transformation, it stores data in memory until it completes the aggregation.
- When you partition a source, the server creates one memory cache and one disk cache for each partition. It routes data from one partition to another based on the group key values of the transformation.
- The server uses memory to process an Aggregator transformation with sorted ports. It doesn't use cache memory, so you don't need to configure cache memory for Aggregators that use sorted ports.

Aggregator index cache: #Groups * ((sum of column sizes) + 7)
Aggregator data cache: #Groups * ((sum of column sizes) + 7)

Rank Cache
- When the server runs a session with a Rank transformation, it compares an input row with rows in the data cache. If the input row out-ranks a stored row, the Informatica server replaces the stored row with the input row.
- If the Rank transformation is configured to rank across multiple groups, the server ranks incrementally for each group it finds.

Rank index cache: #Groups * ((sum of column sizes) + 7)
Rank data cache: #Groups * [(#Ranks * (sum of column sizes + 10)) + 20]

Joiner Cache
- When the server runs a session with a Joiner transformation, it reads all rows from the master source and builds memory caches based on the master rows.
- After building these caches, the server reads rows from the detail source and performs the joins.
- The server creates the index cache as it reads the master source into the data cache. The server uses the index cache to test the join condition. When it finds a match, it retrieves row values from the data cache.
- To improve Joiner performance, the server aligns all data for the Joiner cache on an eight-byte boundary.

Joiner index cache: #Master rows * ((sum of column sizes) + 16)
Joiner data cache: #Master rows * ((sum of column sizes) + 8)

Lookup Cache
- When the server runs a Lookup transformation, it builds a cache in memory when it processes the first row of data in the transformation.
- The server builds the cache and queries it for each row that enters the transformation.
- If you partition the source pipeline, the server allocates the configured amount of memory for each partition. If two Lookup transformations share the cache, the server does not allocate additional memory for the second Lookup transformation.
- The server creates index and data cache files in the lookup cache directory and uses the server code page to create the files.

Lookup index cache: #Rows in lookup table * ((sum of column sizes) + 16)
Lookup data cache: #Rows in lookup table * ((sum of column sizes) + 8)

Transformations
A transformation is a repository object that generates, modifies, or passes data.
(a) Active transformation: can change the number of rows that pass through it (Filter, Normalizer, Rank, ...)

(b) Passive transformation: does not change the number of rows that pass through it (Expression, Lookup, ...)

NOTE:
- Transformations can be connected to the data flow, or they can be unconnected
- An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation and returns a value to that transformation.

Reusable Transformations:
When you add a reusable transformation to a mapping, the definition of the transformation exists outside the mapping while an instance appears within the mapping. All the changes you make to the transformation immediately reflect in its instances.

You can create a reusable transformation by two methods:
(a) Designing it in the Transformation Developer
(b) Promoting a standard transformation

Changes that reflect in mappings are changes such as expressions. If port names, etc., are changed, they won't reflect.

Handling High-Precision Data:
- The server processes decimal values as doubles or decimals.
- When you create a session, you choose to enable the decimal datatype or let the server process the data as double (precision of 15).

Example:
- You may have a mapping with a decimal (20,0) port that the value passes through. The value may be 40012030304957666903.
If you enable decimal arithmetic, the server passes the number as it is. If you do not enable decimal arithmetic, the server passes 4.00120303049577 x 10^19.
If you want to process a decimal value with a precision greater than 28 digits, the server automatically treats it as a double value.

Mapplets
When the server runs a session using a mapplet, it expands the mapplet. The server then runs the session as it would any other session, passing data through each transformation in the mapplet. If you use a reusable transformation in a mapplet, changes to it can invalidate the mapplet and every mapping using the mapplet.

You can create a non-reusable instance of a reusable transformation.

Mapplet Objects:
(a) Input transformation


(b) Source qualifier
(c) Transformations, as you need
(d) Output transformation

Mapplets won't support:
- Joiner
- Normalizer
- Pre/post-session stored procedures
- Target definitions
- XML source definitions

Types of Mapplets:
(a) Active mapplets - contain one or more active transformations
(b) Passive mapplets - contain only passive transformations

Copied mapplets are not instances of the original mapplets. If you make changes to the original, the copy does not inherit your changes. You can use a single mapplet more than once in a mapping.

Ports
Default value for input port - NULL
Default value for output port - ERROR
Default value for variables - does not support default values

Session Parameters
These parameters represent values you might want to change between sessions, such as a DB connection or source file.
We can use a session parameter in a session property sheet, then define the parameters in a session parameter file.
The user-defined session parameters are:
(a) DB connection
(b) Source file directory
(c) Target file directory
(d) Reject file directory

Description:
Use session parameters to make sessions more flexible. For example, you have the same type of transactional data written to two different databases, and you use the database connections TransDB1 and TransDB2 to connect to the databases. You want to use the same mapping for both tables.
Instead of creating two sessions for the same mapping, you can create a database connection parameter, like $DBConnectionSource, and use it as the source database connection for the session. When you create a parameter file for the session, you set $DBConnectionSource to TransDB1 and run the session. After it completes, set the value to TransDB2 and run the session again.
NOTE:
You can use several parameters together to make session management easier. Session parameters do not have default values; when the server cannot find a value for a session parameter, it fails to initialize the session.
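A minimal sketch of such a parameter file (the folder and session names are hypothetical, and the exact section-header format can vary by PowerCenter version):

[MyFolder.s_m_load_sales]
$DBConnectionSource=TransDB1
$InputFile1=sales_jan.dat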

Session Parameter File
- A parameter file is created with a text editor.
- In it, we can specify the folder and session name, then list the parameters and variables used in the session and assign each a value.
- Save the parameter file in any directory local to the server.
- We can define the following values in a parameter file:
  o Mapping parameters
  o Mapping variables
  o Session parameters
- You can include parameter and variable information for more than one session in a single parameter file by creating separate sections for each session within the parameter file.
- You can override the parameter file for sessions contained in a batch by using a batch parameter file. A batch parameter file has the same format as a session parameter file.

Locale
The Informatica server can transform character data in two modes:

(a) ASCII
    a. The default mode
    b. Passes 7-bit, US-ASCII character data
(b) UNICODE
    a. Passes 8-bit, multibyte character data
    b. It uses 2 bytes for each character to move data and performs additional checks at the session level to ensure data integrity.

Code pages contain the encoding to specify characters in a set of one or more languages. We can select a code page based on the type of character data in the mappings. Compatibility between code pages is essential for accurate data movement.

The various code page components are:
- Operating system locale settings
- Operating system code page
- Informatica server data movement mode
- Informatica server code page
- Informatica repository code page

Locale
(a) System locale - system default
(b) User locale - settings for date, time, display
(c) Input locale

Mapping Parameters and Variables

These represent values in mappings/mapplets. If we declare mapping parameters and variables in a mapping, you can reuse the mapping by altering the parameter and variable values of the mapping in the session. This can reduce the overhead of creating multiple mappings when only certain attributes of a mapping need to be changed.
Use a mapping parameter when you want to use the same value each time you run the session. Unlike a mapping parameter, a mapping variable represents a value that can change through the session. The server saves the value of a mapping variable to the repository at the end of each successful run and uses that value the next time you run the session (see the sketch below).

Mapping objects: Source, Target, Transformation, Cubes, Dimension
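For illustration, assuming a mapping parameter $$SourceState and a mapping variable $$MaxLoadDate (both hypothetical names):

Filter condition using the mapping parameter:
STATE = $$SourceState

Expression that advances the mapping variable to the latest date processed, so the next run can start from it:
SETMAXVARIABLE($$MaxLoadDate, LAST_UPDATED)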

    Debugger

We can run the Debugger in two situations:
(a) Before a session: after saving the mapping, we can run some initial tests.
(b) After a session: the real debugging process.

Metadata Reporter:
- A web-based application that allows you to run reports against repository metadata
- Reports include executed sessions, lookup table dependencies, mappings, and source/target schemas.


Repository
Types of Repository
(a) Global Repository
    This is the hub of the domain. Use the global repository to store common objects that multiple developers can use through shortcuts. These may include operational or application source definitions, reusable transformations, mapplets, and mappings.
(b) Local Repository
    A local repository is within a domain and is not the global repository. Use the local repository for development.
(c) Standalone Repository
    A repository that functions individually, unrelated and unconnected to other repositories.

NOTE:
- Once you create a global repository, you cannot change it to a local repository
- However, you can promote a local repository to a global repository

Batches
- Provide a way to group sessions for either serial or parallel execution by the server
- Batches:
  o Sequential (runs sessions one after another)
  o Concurrent (runs sessions at the same time)

Nesting Batches
Each batch can contain any number of sessions/batches. We can nest batches several levels deep, defining batches within batches. Nested batches are useful when you want to control a complex series of sessions that must run sequentially or concurrently.

Scheduling
When you place sessions in a batch, the batch schedule overrides the session schedule by default. However, we can configure a batched session to run on its own schedule by selecting the 'Use Absolute Time Session' option.

Server Behavior
A server configured to run a batch overrides the server configuration to run sessions within the batch. If you have multiple servers, all sessions within a batch run on the Informatica server that runs the batch. The server marks a batch as failed if one of its sessions is configured to run if 'Previous completes' and that previous session fails.

Sequential Batch
If you have sessions with dependent source/target relationships, you can place them in a sequential batch, so that the Informatica server can run them in consecutive order. There are two ways of running sessions under this category:
(a) Run the session only if the previous session completes successfully
(b) Always run the session (this is the default)

Concurrent Batch
In this mode, the server starts all of the sessions within the batch at the same time. Concurrent batches take advantage of the resources of the Informatica server, reducing the time it takes to run the sessions separately or in a sequential batch.

Concurrent batch in a sequential batch
If you have concurrent batches with source-target dependencies that benefit from running those batches in a particular order, just like sessions, place them into a sequential batch.


Server Concepts
The Informatica server uses three system resources:
(a) CPU
(b) Shared memory
(c) Buffer memory

The Informatica server uses shared memory, buffer memory, and cache memory for session information and to move data between session threads.

LM Shared Memory
The Load Manager uses both process and shared memory. The LM keeps the list of sessions and batches, and the schedule queue, in process memory. Once a session starts, the LM uses shared memory to store session details for the duration of the session run or session schedule. This shared memory appears as the configurable parameter (LMSharedMemory) and the server allots 2,000,000 bytes as default. This allows you to schedule or run approximately 10 sessions at one time.

DTM Buffer Memory
The DTM process allocates buffer memory to the session based on the DTM buffer pool size setting in the session properties. By default, it allocates 12,000,000 bytes of memory to the session. The DTM divides memory into buffer blocks as configured in the buffer block size setting. (Default: 64,000 bytes per block.)

Running a Session
The following tasks are done during a session:
1. LM locks the session and reads session properties
2. LM reads the parameter file
3. LM expands server/session variables and parameters
4. LM verifies permissions and privileges
5. LM validates source and target code pages
6. LM creates the session log file
7. LM creates the DTM process
8. DTM process allocates DTM process memory
9. DTM initializes the session and fetches the mapping
10. DTM executes pre-session commands and procedures
11. DTM creates reader, writer, and transformation threads for each pipeline
12. DTM executes post-session commands and procedures
13. DTM writes historical incremental aggregation/lookup data to the repository
14. LM sends post-session emails

Stopping and aborting a session
- If the session you want to stop is part of a batch, you must stop the batch
- If the batch is part of a nested batch, stop the outermost batch
- When you issue the stop command, the server stops reading data. It continues processing, writing data, and committing data to targets
- If the server cannot finish processing and committing data, you can issue the ABORT command. It is similar to the stop command, except it has a 60-second timeout. If the server cannot finish processing and committing data within 60 seconds, it kills the DTM process and terminates the session.

Recovery:
- After a session is stopped/aborted, the session results can be recovered. When recovery is performed, the session continues from the point at which it stopped.
- If you do not recover the session, the server runs the entire session the next time.
- Hence, after stopping/aborting, you may need to manually delete targets before the session runs again.

NOTE: The ABORT command and the ABORT function are different.
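For illustration, the ABORT function inside a mapping expression (port names are hypothetical), as opposed to the ABORT command issued against a running session:

IIF( ISNULL(CUST_ID), ABORT('Customer ID is NULL. Aborting session.'), CUST_ID )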


When can a session fail?
- The server cannot allocate enough system resources
- The session exceeds the maximum number of sessions the server can run concurrently
- The server cannot obtain an execute lock for the session (the session is already locked)
- The server is unable to execute post-session shell commands or post-load stored procedures
- The server encounters database errors
- The server encounters transformation row errors (e.g., a NULL value in a non-null field)
- Network-related errors

When pre/post shell commands are useful:
- To delete a reject file
- To archive target files before the session begins

Session Performance
- Minimum logging (Terse)
- Partitioning source data
- Performing ETL for each partition in parallel (for this, multiple CPUs are needed)
- Adding indexes
- Changing the commit level
- Using a Filter transformation to remove unwanted data movement
- Increasing buffer memory when there is a large volume of data
- Multiple lookups can reduce performance. Verify the largest lookup table and tune the expressions.
- At the session level, the causes are small cache size, low buffer memory, and a small commit interval
- At the system level:
  o WIN NT/2000 - use the Task Manager
  o UNIX: VMSTAT, IOSTAT

Hierarchy of optimization:
- Target
- Source
- Mapping
- Session
- System

Optimizing target databases:
- Drop indexes/constraints
- Increase checkpoint intervals
- Use bulk loading/external loading
- Turn off recovery
- Increase database network packet size

Source level:
- Optimize the query (using group by, order by)
- Use conditional filters
- Connect to the RDBMS using the IPC protocol

Mapping:
- Optimize datatype conversions
- Eliminate transformation errors
- Optimize transformations/expressions


Session:
- Use concurrent batches
- Partition sessions
- Reduce error tracing
- Remove staging areas
- Tune session parameters

System:
- Improve network speed
- Use multiple PowerCenter servers on separate systems
- Reduce paging

Session Process
The Informatica server uses both process memory and system shared memory to perform the ETL process. It runs as a daemon on UNIX and as a service on Windows NT.
The following processes are used to run a session:
(a) Load Manager process: starts a session and creates the DTM process, which creates the session.
(b) DTM process: creates threads to initialize the session, read, write, and transform data, and handle pre/post-session operations.

Load Manager process:
- Manages session/batch scheduling
- Locks the session
- Reads the parameter file
- Expands server/session variables and parameters
- Verifies permissions/privileges
- Creates the session log file

DTM process:
The primary purpose of the DTM is to create and manage threads that carry out the session tasks. The DTM allocates process memory for the session and divides it into buffers. This is known as buffer memory. The default memory allocation is 12,000,000 bytes. It creates the main thread, which is called the master thread; this manages all other threads.

Various thread functions:

Master thread - handles stop and abort requests from the Load Manager.
Mapping thread - one thread for each session. Fetches session and mapping information, compiles the mapping, and cleans up after execution.
Reader thread - one thread for each partition. Relational sources use relational threads and flat files use file threads.
Writer thread - one thread for each partition; writes to the target.
Transformation thread - one or more transformation threads for each partition.

Note:
When you run a session, the threads for a partitioned source execute concurrently. The threads use buffers to move/transform data.

*************************************************************************

What are the output files that the Informatica server creates while running a session?

Informatica server log: The Informatica server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.

Session log file: The Informatica server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, creation of SQL commands for reader and writer threads, errors encountered, and the load summary. The amount of detail in the session log file depends on the tracing level that you set.

Session detail file: This file contains load statistics for each target in the mapping. Session details include information such as table name and number of rows written or rejected. You can view this file by double-clicking on the session in the Monitor window.

Performance detail file: This file contains information known as session performance details, which helps you see where performance can be improved. To generate this file, select the performance detail option in the session property sheet.

Reject file: This file contains the rows of data that the writer does not write to targets.

Control file: The Informatica server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader.

Post-session email: Post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completes successfully, the other if the session fails.

Indicator file: If you use a flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.

Output file: If the session writes to a target file, the Informatica server creates the target file based on the file properties entered in the session property sheet.

Cache files: When the Informatica server creates a memory cache, it also creates cache files. The Informatica server creates index and data cache files for the following transformations:

Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation

    ******************************************************************************


******************************************************************************

Add an Expression transformation after a Source Qualifier and before the target. If the source or target definition changes, reconnecting ports is much easier.

Denormalization using the Aggregator

LAST function in the Aggregator
Returns the last row in the selected port. Optionally, you can apply a filter to limit the rows the Integration Service reads. You can nest only one other aggregate function within LAST.

    LAST(AMOUNT, MONTH)

    MQ Source qualifier transformation

    ***********************************************

Test load takes the number of rows from the Source Qualifier as specified in the test rows setting, and shows to which instance each row goes. The rows are not loaded to the target. Test load does not work for a flat file target.
***********************************************

    EVENTWAIT-EVENTRAISE tasks

First create an event: right-click the workflow and choose Edit -> Events tab -> create an event.

Place an EventRaise task in the workflow space and give the created event name in the "User Defined Event" property (Properties tab).

Place an EventWait task in the workflow space -> 2 options in the Events tab:
Predefined - this is a file watch
User-defined - this is an event created on the workflow properties Events tab; choose the user-defined event to watch and give the event created.

When the EventRaise is executed, it creates the event -> this triggers the EventWait to continue.

    ***********************************************

    Timer Task

    You can specify the period of time to wait before the Integration Service runs the next task in the workflow with the Timer task.

Two options:
Absolute time - give the exact time when to start the next task, or refer to a date/time variable
Relative time - give the hours, minutes, seconds:
- from the start time of this task
- from the start time of the parent workflow/worklet
- from the start time of the top-level workflow

    ***********************************************

    Decision Task

Use the Decision task instead of multiple link conditions in a workflow. Instead of specifying multiple link conditions, use the predefined condition variable in a Decision task to simplify link conditions.

This can also be achieved without the Decision task, by using link conditions directly.
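A sketch of the Decision task approach (task and session names are hypothetical): set the Decision task condition, then test its predefined variable in the outgoing link conditions.

Decision task condition:
$s_load_dim.Status = SUCCEEDED AND $s_load_fact.Status = SUCCEEDED

Link condition on the path taken when the decision is true:
$Decision_check_loads.Condition = TRUE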

***********************************************
Assignment Task

    You can assign a value to a user-defined workflow variable with the Assignment task.

    **********************************************

    Control Task


Fail me
Fail parent
Abort parent
Fail parent level workflow
Stop parent level workflow
Abort parent level workflow

    **********************************************

DECODE( ITEM_ID, 10, 'Flashlight',
                 14, 'Regulator',
                 20, 'Knife',
                 40, 'Tank',
                 'NONE' )

DECODE( CONST_NAME, 'Five', 5,
                    'Pythagoras', '1.414213562',
                    'Archimedes', '3.141592654',
                    'Pi', 3.141592654 )

**********************************************
If you use an output default value other than ERROR, the default value overrides the ERROR function in an expression. For example, you use the ERROR function in an expression, and you assign the default value 1234 to the output port. Each time the Integration Service encounters the ERROR function in the expression, it overrides the error with the value 1234 and passes 1234 to the next transformation. It does not skip the row, and it does not log an error in the session log.

IIF( SALARY < 0, ERROR('Error. Negative salary found. Row skipped.'), EMP_SALARY )

SALARY    RETURN VALUE
10000     10000
-15000    'Error. Negative salary found. Row skipped.'
NULL      NULL
150000    150000
1005      1005

    **********************************************

    Returns e raised to the specified power (exponent), where e=2.71828183. For example, EXP(2) returns 7.38905609893065.

    **********************************************

Returns the first value found within a port or group. Optionally, you can apply a filter to limit the rows the Integration Service reads. You can nest only one other aggregate function within FIRST.
FIRST( value [, filter_condition ] )

FIRST( ITEM_NAME, ITEM_PRICE > 10 )
LAST( ITEM_NAME, ITEM_PRICE > 10 )
LPAD( PART_NUM, 6, '0' )
RTRIM( LAST_NAME, 'S.' )

    *********************************************

  • 7/28/2019 Informatica DW Consolidated

    24/40

    Returns the future value of an investment, where you make periodic, constant payments and the investment earns a constant interest rate.

Syntax:
FV( rate, terms, payment [, present value, type] )

    FV(0.0075, 12, -250, -2000, TRUE)

    *********************************************

GET_DATE_PART( DATE_SHIPPED, 'HH12' )
GET_DATE_PART( DATE_SHIPPED, 'DD' )
GET_DATE_PART( DATE_SHIPPED, 'MON' )
*********************************************

    GREATEST( QUANTITY1, QUANTITY2, QUANTITY3 )

    *********************************************

INDEXOF( ITEM_NAME, 'diving hood', 'flashlight', 'safety knife' )

1 if the input value matches string1, 2 if the input value matches string2, and so on.

    0 if the input value is not found.

    NULL if the input is a null value.

    *********************************************

    INITCAP( string )

    *********************************************

    INSTR( COMPANY, 'a', 1, 2 )

    *********************************************

    Is_Date() returns 0 or 1

    *********************************************

Mapping -> Debugger -> use the existing session instance -> click Next Instance [shows one row at a time as it moves throughout the mapping; displays the next row]

    *********************************************

IS_NUMBER()
IS_SPACES() - Returns whether a string value consists entirely of spaces.
LAST_DAY() - Returns the date of the last day of the month for each date in a port.

    ********************************************

    MAX( ITEM_NAME, MANUFACTURER_ID='104' ) - The MAX function uses the same sort order that the Sorter transformation uses. However, the MAX function is case sensitive, and the Sorter transformation may not be case sensitive.

You can also use MAX to return the latest date or the largest numeric value in a port or group.

    *******************************************

    MD5 - Message-Digest algorithm 5

METAPHONE - Encodes string values. You can specify the length of the string that you want to encode.

    *******************************************

The following expression returns the average order for a Stabilizing Vest, based on the first five rows in the Sales port, and thereafter returns the average for the last five rows read:

MOVINGAVG( SALES, 5 )
MOVINGSUM( SALES, 5 )

    *******************************************

    POWER( NUMBERS, EXPONENT )

*******************************************
RAND(1) - Returns a random number between 0 and 1. This is useful for probability scenarios.

    *******************************************

    REG_EXTRACT

REPLACESTR( CaseFlag, InputString, OldString1, [OldString2, ... OldStringN,] NewString )

    REVERSE( string )

    RPAD( ITEM_NAME, 16, '.')

    TO_FLOAT() - 0 if the value in the port is blank or a non-numeric character.

    ******************************************

What are global and local shortcuts?

Standalone Repository: A repository that functions individually and is unrelated to any other repositories.
Global Repository: This is a centralized repository in a domain. This repository can contain shared objects across the repositories in a domain. The objects are shared through global shortcuts.
Local Repository: A local repository is within a domain and is not the global repository. A local repository can connect to a global repository using global shortcuts and can use objects in its shared folders.

    **************************************************

When bulk loading, the Integration Service bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result, you may not be able to perform recovery. When you use bulk loading, weigh the importance of improved session performance against the ability to recover an incomplete session.


    *************************************************

IIF( FLG_A = 'Y' AND FLG_B = 'Y' AND FLG_C = 'Y', VAL_A + VAL_B + VAL_C,
  IIF( FLG_A = 'Y' AND FLG_B = 'Y' AND FLG_C = 'N', VAL_A + VAL_B,
    IIF( FLG_A = 'Y' AND FLG_B = 'N' AND FLG_C = 'Y', VAL_A + VAL_C,
      IIF( FLG_A = 'Y' AND FLG_B = 'N' AND FLG_C = 'N', VAL_A,
        IIF( FLG_A = 'N' AND FLG_B = 'Y' AND FLG_C = 'Y', VAL_B + VAL_C,
          IIF( FLG_A = 'N' AND FLG_B = 'Y' AND FLG_C = 'N', VAL_B,
            IIF( FLG_A = 'N' AND FLG_B = 'N' AND FLG_C = 'Y', VAL_C,
              IIF( FLG_A = 'N' AND FLG_B = 'N' AND FLG_C = 'N', 0.0 ))))))))

    This expression requires 8 IIFs, 16 ANDs, and at least 24 comparisons.

    If you take advantage of the IIF function, you can rewrite that expression as:

IIF(FLG_A='Y', VAL_A, 0.0) + IIF(FLG_B='Y', VAL_B, 0.0) + IIF(FLG_C='Y', VAL_C, 0.0)
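The same additive decomposition in plain Java, as a quick check of the equivalence (flag and value names mirror the expression above; the sample inputs are hypothetical):

public class FlagSumSketch {
    // Each flag contributes its value independently, so one conditional per flag suffices.
    static double total(char flgA, char flgB, char flgC, double valA, double valB, double valC) {
        return (flgA == 'Y' ? valA : 0.0)
             + (flgB == 'Y' ? valB : 0.0)
             + (flgC == 'Y' ? valC : 0.0);
    }

    public static void main(String[] args) {
        // 'Y','N','Y' picks up VAL_A and VAL_C only, matching the nested-IIF version.
        System.out.println(total('Y', 'N', 'Y', 10.0, 20.0, 30.0));  // prints 40.0
    }
}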

    *************************************************


    ******************************************************

    Look for performance bottlenecks in the following order:

    1. Target

    2. Source

    3. Mapping

    4. Transformation

    5. Session


6. Grid deployments

7. PowerCenter components

8. System

******************************************

    Run test sessions. You can configure a test session to read from a flat file source or to write to a flat file target to identify source and target bottlenecks.

Analyze performance details. Analyze performance details, such as performance counters, to determine where session performance decreases.

Analyze thread statistics. Analyze thread statistics to determine the optimal number of partition points.

Monitor system performance. You can use system monitoring tools to view the percentage of CPU use, I/O waits, and paging to identify system bottlenecks. You can also use the Workflow Monitor to view system resource usage.

    ******************************************

If the reader or writer thread is 100% busy, consider using string datatypes in the source or target ports. Non-string ports require more processing.

If a transformation thread is 100% busy, consider adding a partition point in the segment. When you add partition points to the mapping, the Integration Service increases the number of transformation threads it uses for the session. However, if the machine is already running at or near full capacity, do not add more threads.

If one transformation requires more processing time than the others, consider adding a pass-through partition point to the transformation.

    *****************************************

    Complete the following tasks to eliminate target bottlenecks:

    Have the database administrator optimize database performance by optimizing the query.

    Increase the database network packet size.

    Configure index and key constraints.

    ****************************************

Source Bottlenecks
Performance bottlenecks can occur when the Integration Service reads from a source database. An inefficient query or small database network packet sizes can cause source bottlenecks.

    Identifying Source Bottlenecks

You can read the thread statistics in the session log to determine if the source is the bottleneck. When the Integration Service spends more time on the reader thread than the transformation or writer threads, you have a source bottleneck.


    If the session reads from a relational source, use the following methods to identify source bottlenecks:

    Filter transformation

    Read test mapping

    Database query

If the session reads from a flat file source, you probably do not have a source bottleneck.

Using a Filter Transformation
You can use a Filter transformation in the mapping to measure the time it takes to read source data.

Add a Filter transformation after each source qualifier. Set the filter condition to false so that no data is processed past the Filter transformation. If the time it takes to run the new session remains about the same, you have a source bottleneck.

Using a Read Test Mapping
You can create a read test mapping to identify source bottlenecks. A read test mapping isolates the read query by removing the transformation in the mapping.

    To create a read test mapping, complete the following steps:

    1. Make a copy of the original mapping.

    2. In the copied mapping, keep only the sources, source qualifiers, and any custom joins or queries.

    3. Remove all transformations.

    4. Connect the source qualifiers to a file target.

    Run a session against the read test mapping. If the session performance is similar to the original session, you have a source bottleneck.

Using a Database Query
To identify source bottlenecks, execute the read query directly against the source database.

Copy the read query directly from the session log. Execute the query against the source database with a query tool such as isql. On Windows, you can load the result of the query in a file. On UNIX, you can load the result of the query in /dev/null.

    Measure the query execution time and the time it takes for the query to return the first row.
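To capture those two timings outside of a query tool, a minimal JDBC sketch along these lines can be used (the driver, connection URL, credentials and query are placeholders; substitute the read query copied from the session log):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReadQueryTimer {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:oracle:thin:@//dbhost:1521/ORCL";  // placeholder connection details
        String query = "SELECT * FROM SRC_ORDERS";            // placeholder read query

        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {
            long start = System.currentTimeMillis();
            ResultSet rs = stmt.executeQuery(query);
            long firstRow = 0;
            long rows = 0;
            while (rs.next()) {
                if (rows == 0) {
                    firstRow = System.currentTimeMillis() - start;  // time until the first row arrives
                }
                rows++;
            }
            long total = System.currentTimeMillis() - start;        // total time to fetch all rows
            rs.close();
            System.out.println("Time to first row (ms): " + firstRow);
            System.out.println("Total time for " + rows + " rows (ms): " + total);
        }
    }
}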

Eliminating Source Bottlenecks
Complete the following tasks to eliminate source bottlenecks:

    Set the number of bytes the Integration Service reads per line if the Integration Service reads from a flat file source.

    Have the database administrator optimize database performance by optimizing the query.


    Increase the database network packet size.

    Configure index and key constraints.

    If there is a long delay between the two time measurements in a database query, you can use an optimizer hint.

    ******************A simple source filter on the source database can sometimes negatively impact performance because of the lack of indexes. You can use the PowerCenter conditional filter in the Source Qualifier to improve performance.********

    *****************************************************

Mapping Bottlenecks
If you determine that you do not have a source or target bottleneck, you may have a mapping bottleneck.

    Generally, you reduce the number of transformations in the mapping and delete unnecessary links between transformations to optimize the mapping. Configure the mapping with the least number of transformations and expressions to do the most amount of work possible. Delete unnecessary links between transformations to minimize the amount of data moved.

    ****************************************************

Optimizing the Line Sequential Buffer Length
If the session reads from a flat file source, you can improve session performance by setting the number of bytes the Integration Service reads per line. By default, the Integration Service reads 1024 bytes per line. If each line in the source file is less than the default setting, you can decrease the line sequential buffer length in the session properties.

    ***************************************************

Single-pass reading allows you to populate multiple targets with one source qualifier. Consider using single-pass reading if you have multiple sessions that use the same sources. You can combine the transformation logic for each mapping in one mapping and use one source qualifier for each source.

    ***************************************************

    You can optimize performance for pass-through mappings. To pass directly from source to target without any other transformations, connect the Source Qualifier transformation directly to the target. If you use the Getting Started Wizard to create a pass-through mapping, the wizard creates an Expression transformation between the Source Qualifier transformation and the target.

    ***************************************************

Use integer values in place of other datatypes when performing comparisons using Lookup and Filter transformations. For example, many databases store U.S. ZIP code information as a Char or Varchar datatype. If you convert the zip code data to an Integer datatype, the lookup database stores the zip code 94303-1234 as 943031234. This helps increase the speed of the lookup comparisons based on zip code.
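A small Java sketch of the same idea (a hypothetical helper, not part of PowerCenter): strip the hyphen and compare ZIP codes as integers rather than as strings.

public class ZipCompareSketch {
    // Converts a "94303-1234" style ZIP+4 code to 943031234 for cheaper integer comparisons.
    static int toZipInt(String zip) {
        return Integer.parseInt(zip.replace("-", ""));
    }

    public static void main(String[] args) {
        System.out.println(toZipInt("94303-1234"));               // 943031234
        System.out.println(toZipInt("94303-1234") == 943031234);  // true
    }
}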


**************************************************
Optimizing Aggregator

Grouping By Simple Columns - use integer values.
Filtering Data Before You Aggregate.
Limiting Port Connections - Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
Using Sorted Input.
Using Incremental Aggregation - If you can capture changes from the source that affect less than half the target, you can use incremental aggregation to optimize the performance of Aggregator transformations.

    When you use incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. The Integration Service updates the target incrementally, rather than processing the entire source and recalculating thesame calculations every time you run the session.

    You can increase the index and data cache sizes to hold all data in memory without paging to disk.

************************************************
Optimizing Joiner

Designate the master source as the source with fewer duplicate key values.
Designate the master source as the source with fewer rows.
Perform joins in a database when possible - Create a pre-session stored procedure to join the tables in a database, or use the Source Qualifier transformation to perform the join.
Join sorted data when possible - minimizes disk input and output.

************************************************
Optimizing Lookup Transformations

Native Drivers

    Native Drivers

Caching Lookup Tables
If a mapping contains Lookup transformations, you might want to enable lookup caching. When you enable caching, the Integration Service caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the Integration Service queries the lookup table on a row-by-row basis.

The result of the Lookup query and processing is the same, whether or not you cache the lookup table. However, using a lookup cache can increase session performance for smaller lookup tables. In general, you want to cache lookup tables that need less than 300 MB.

    Use the following types of caches to increase performance:

Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping. You can share a named cache between transformations in the same or different mappings.

Persistent cache. To save and reuse the cache files, you can configure the transformation to use a persistent cache. Use this feature when you know the lookup table does not change between session runs. Using a persistent cache can improve performance because the Integration Service builds the memory cache from the cache files instead of from the database.

    Enabling Concurrent Caches - default Auto

    ***********************************************

To optimize Sequence Generator transformations, create a reusable Sequence Generator and use it in multiple mappings simultaneously. Also, configure the Number of Cached Values property.

The Number of Cached Values property determines the number of values the Integration Service caches at one time. Make sure that the Number of Cached Values is not too small. Consider configuring the Number of Cached Values to a value greater than 1,000.

If you do not have to cache values, set the Number of Cached Values to 0. Sequence Generator transformations that do not use the cache are faster than those that require the cache.

    **********************************************

Allocating Memory
For optimal performance, configure the Sorter cache size with a value less than or equal to the amount of available physical RAM on the Integration Service node. Allocate at least 16 MB of physical memory to sort data using the Sorter transformation. The Sorter cache size is set to 16,777,216 bytes by default. If the Integration Service cannot allocate enough memory to sort data, it fails the session.

If the amount of incoming data is greater than the Sorter cache size, the Integration Service temporarily stores data in the Sorter transformation work directory. The Integration Service requires disk space of at least twice the amount of incoming data when storing data in the work directory. If the amount of incoming data is significantly greater than the Sorter cache size, the Integration Service may require much more than twice the amount of disk space available to the work directory.

    Use the following formula to determine the size of incoming data:

# input rows * ([Sum(column size)] + 16)
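As a worked example with hypothetical volumes: for 1,000,000 input rows whose column sizes sum to 100 bytes,

1,000,000 * (100 + 16) = 116,000,000 bytes (about 111 MB)

so the Sorter cache would need roughly 111 MB to sort entirely in memory, and the work directory should have at least twice that (about 222 MB) free in case the data spills to disk.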

    *********************************************

Use the Select Distinct option for the Source Qualifier transformation if you want the Integration Service to select unique values from a source. Use the Select Distinct option to filter unnecessary data earlier in the data flow. This can improve performance.

    ********************************************

Transformation errors occur when the Integration Service encounters conversion errors, conflicting mapping logic, and any condition set up as an error, such as null input. Check the session log to see where the transformation errors occur. If the errors center around particular transformations, evaluate those transformation constraints.

    ********************************************


Incremental aggregation: When you enable both truncate target tables and incremental aggregation in the session properties, the Workflow Manager issues a warning that you cannot enable truncate target tables and incremental aggregation in the same session.

Test load: When you enable both truncate target tables and test load, the Integration Service disables the truncate table function, runs a test load session, and writes a message to the session log.

**********************************************
Java Transformation Examples

    if(isNull("LegacyAssociatedOrgRefno"))

    {IdentifyingType="Parent";

    LegacyAssociatedOrgRefno = LegacyIdentifyingRefno;

    generateRow();

    IdentifyingType="Child";

    LegacyAssociatedOrgRefno = LegacyIdentifyingRefno;

    generateRow();

    IdentifyingType="Self";

    LegacyAssociatedOrgRefno = LegacyIdentifyingRefno;

    generateRow();

    }

    if(isNull("LegacyAssociatedOrgRefno") && HOTYPE.equals("STHA"))

    {LegacyAssociatedOrgRefno=-1;Level=0;generateRow();

    }

Swapping columns example:

// Emit the row unchanged as "Parent", then emit a second row as "Child"
// with the two refno columns swapped.
IdentifyingType = "Parent";
generateRow();

IdentifyingType = "Child";
int x = LegacyAssociatedOrgRefno;
int y = LegacyIdentifyingRefno;
LegacyAssociatedOrgRefno = y;
LegacyIdentifyingRefno = x;
generateRow();


tar - to create a tar archive, run tar -cf archive.tar /directory; to extract the archive into the current directory, run tar -xf archive.tar. To use gzip, just add a z to the options: tar -czf archive.tar.gz /dir creates a tar.gz, and tar -xzf archive.tar.gz extracts it.

head, tail

head -n 100 file | tail -n 1   (prints line 100 of the file)

    IIF(NOT ISNULL(In_UniqueLegacyTreatmentFunctionIdentifier) AND v_TreatmentFnTypeCode_dom = 'CC_TRTFN', NULL, 'B61411(TreatmentFnTypeCode)')

IIF(ISNULL(v_UniqueLegacyTreatmentFunctionIdentifier_lkp), NULL, IIF(v_TreatmentFnTypeCode_dom = 'CC_TRTFN', NULL, 'B61411(TreatmentFnTypeCode)'))
IIF(ISNULL(v_UniqueLegacySpecialtyIdentifier_lkp), NULL, IIF(v_SpecTypeCode_dom = 'CC_CPSPE', NULL, 'B61413(v_SpecTypeCode)'))
IIF(ISNULL(v_UniqueLegacyCurrentSpecialtyIdentifier_lkp), NULL, IIF(v_CurrentSpecTypeCode_dom = 'CC_CPSPE', NULL, 'B61414(v_CurrentSpecTypeCode)'))

    *************************************************



    **********************************************************

DW
**************************************************************

Data Warehouse
A data warehouse is an architecture for organizing information systems. It is a process for building decision support systems and a knowledge management environment that supports both day-to-day tactical decision making and long-term business strategies.

    Bill Inmon "Subject-oriented, integrated, time variant, non-volatile collectionof data in support of management's decision making process."

    Ralph Kimball "a copy of transaction data specifically structured for query andanalysis."

Operational Data Store (ODS)

An operational data store is an integrated, subject-oriented, volatile (including update/deletion), current-valued structure designed to serve operational users as they do high-performance integrated processing.

OLTP (Online Transaction Processing)
OLTP is a class of program that facilitates and manages transaction-oriented applications, typically for data entry and retrieval transaction processing. OLTP systems are optimized for data entry operations, e.g. Order Entry, Banking, CRM, ERP applications, etc.

Data Warehouse                    | Operational/Transactional
Subject oriented                  | Application oriented
Summarized, refined               | Detailed
Represents value over time        | Accurate as of the moment
Supports managerial needs         | Supports day-to-day needs
Read-only data                    | Can be updated
Batch processing                  | Real-time transactions
Completely different life cycle   | Software Development Life Cycle
Analysis driven                   | Transaction driven
Dimensional model                 | Entity Relationship Diagram
Large amount of data              | Small amount of data
Relaxed availability              | High availability
Flexible structure                | Static structure


Business Intelligence (BI)
Business Intelligence is a set of business processes for collecting and analyzing business information. BI functions include trend analysis, aggregation of data, drilling down to complex levels of detail, slice-and-dice, and data rotation for comparative viewing.

OLAP (On-Line Analytical Processing) - Querying and presenting data from a data warehouse, represented as multiple dimensions.

ROLAP (Relational OLAP) - Applications and set of user interfaces that retrieve data from an RDBMS and present it as a dimensional model.

MOLAP (Multidimensional OLAP) - Applications, set of user interfaces, and database technologies that have a dimensional model.
DOLAP (Desktop OLAP) - Designed for low-end, single users. Data is stored/downloaded on the desktop.
HOLAP (Hybrid OLAP) - A combination of all the above OLAP methodologies.

Top-Down (Bill Inmon)
  Emphasis: Data warehouse
  Design: Enterprise-based normalized model; marts use a subject-oriented dimensional model
  Architecture: Multi-tier comprised of staging area and dependent data marts
  Data set: DW holds atomic-level data; marts hold summary data

Bottom-Up (Ralph Kimball)
  Emphasis: Data marts
  Design: Dimensional model of the data mart, consisting of star schemas
  Architecture: Staging area and data marts
  Data set: Contains both atomic and summary data

Hybrid (many practitioners)
  Emphasis: DW and data marts
  Design: Starts with enterprise and local models; one or more star schemas
  Architecture: High-level normalized enterprise model; initial marts
  Data set: Populates marts with atomic and summary data via a non-persistent staging area

Federated (Doug Hackney)
  Emphasis: Integrate heterogeneous BI environments
  Design: An architecture of architectures; share dimensions, facts, rules, and definitions across organizations
  Architecture: Reality of change in organizations and systems
  Data set: Use of whatever means possible to integrate business needs

*****************************************************************
E. F. Codd (father of the relational database)'s 12 rules for OLAP

1. Multidimensional conceptual view. This supports EIS "slice-and-dice" operations and is usually required in financial modeling.
2. Transparency. OLAP systems should be part of an open system that supports heterogeneous data sources. Furthermore, the end user should not have to be concerned about the details of data access or conversions.
3. Accessibility. The OLAP should present the user with a single logical schema of the data.
4. Consistent reporting performance. Performance should not degrade as the number of dimensions in the model increases.
5. Client/server architecture. Requirement for open, modular systems.
6. Generic dimensionality. Not limited to 3-D and not biased toward any particular dimension. A function applied to one dimension should also be able to be applied to another.
7. Dynamic sparse-matrix handling. Related both to the idea of nulls in relational databases and to the notion of compressing large files, a sparse matrix is one in which not every cell contains data. OLAP systems should accommodate varying storage and data-handling options.
8. Multiuser support. OLAP systems, like EISes, need to support multiple concurrent users, including their individual views or slices of a common database.
9. Unrestricted cross-dimensional operations. Similar to rule 6; all dimensions are created equal, and operations across data dimensions do not restrict relationships between cells.
10. Intuitive data manipulation. Ideally, users shouldn't have to use menus or perform complex multiple-step operations when an intuitive drag-and-drop action will do.
11. Flexible reporting. Save a tree: users should be able to print just what they need, and any changes to the underlying financial model should be automatically reflected in reports.
12. Unlimited dimensional and aggregation levels. A serious tool should support at least 15, and preferably 20, dimensions.

Dimensional data model is most often used in data warehousing systems. This is different from the 3rd normal form, commonly used for transactional (OLTP) type systems. As you can imagine, the same data would then be stored differently in a dimensional model than in a 3rd normal form model.

Dimension: A category of information. For example, the time dimension. e.g. demography, date, product, customer, HR, etc.

Attribute: A unique level within a dimension. For example, Month is an attribute in the Time dimension.

Hierarchy: The specification of levels that represents the relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year -> Quarter -> Month -> Day.

Dimensional Data Model: Dimensional data model is commonly used in data warehousing systems. This section describes this modeling technique, and the two common schema types, star schema and snowflake schema.

Fact Table: A fact table is a table that contains the measures of interest. For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate granularity. For example, it can be sales amount by store by day. In this case, the fact table would contain three columns: a date column, a store column, and a sales amount column.

    Attributes are the non-key columns in the lookup tables.

Lookup Table: The lookup table provides the detailed information about the attributes. For example, the lookup table for the Quarter attribute would include a list of all of the quarters available in the data warehouse. Each row (each quarter) may have several fields, one for the unique ID that identifies the quarter, and one or more additional fields that specify how that particular quarter is represented on a report (for example, the first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1").


    STAR SCHEMA

All measures in the fact table have the same level of granularity. A complex star can have more than one fact table.

A star schema is characterized by one or more very large fact tables that contain the primary information in the data warehouse, and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table.

    A star query is a join between a fact table and a number of dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other.

    The main advantages of star schemas are that they:

Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
Provide highly optimized performance for typical star queries.
Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data warehouse schema contain dimension tables.

The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. In a star schema, each dimension is represented by a single dimensional table, whereas in a snowflake schema, that dimensional table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.

For example, the Time dimension can consist of 2 different hierarchies:

1. Year -> Month -> Day
2. Week -> Day

Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table. For example, a location dimension table in a star schema might be normalized into a location table and a city table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance.

Fact Constellation Schema
This schema is used mainly for aggregate fact tables, or where we want to split a fact table for better comprehension. The split of a fact table is done only when we want to focus on aggregation over a few facts and dimensions.

    ********************************************************************