TalendOpenStudio BigData ReleaseNotes 5.4.1 En

  • 8/9/2019 TalendOpenStudio BigData ReleaseNotes 5.4.1 En

    Talend Open Studio for Big Data

    Release Notes

    5.4.1

    Talend Open Studio for Big Data

    Publication date December 12, 2013

    Copyleft

    This documentation is provided under the terms of the Creative Commons Public License (CCPL).

    For more information about what you can and cannot do with this documentation in accordance with the CCPL, please read: http://creativecommons.org/licenses/by-nc-sa/2.0/

    Notices

    All brands, product names, company names, trademarks and service marks are the properties of their respective owners.

    Table of Contents

    System Requirements
    Big Data: New Features
        1. Kerberos security
        2. Upgraded support for Hadoop distributions
        3. Hadoop file formats
        4. File management in HDFS
        5. NoSQL databases
        6. In-memory technology
        7. Cloud technology
        8. Demo project
        9. Other features
    Big Data: Bug Fixes / Change Log
        1. Bug Fixes
    Big Data: Known Issues
        1. Studio multi-instance starting issue
        2. Note for the developers of custom components
    Big Data: Hints and Notes
        1. Installing required third-party licences
    Documentation
        1. Talend Help Center
        2. Revised documents
        3. Known issues
        4. Open issues

    System Requirements

    Users should refer to the Installation and Upgrade Guides on the Talend Help Center (http://help.talend.com) for more information on Installation and System Requirements.

    Big Data: New Features

    1. Kerberos security

    1. The Kerberos kinit authentication mode has been enabled for all the Big Data components, including the Hive components.

    2. The Kerberos keytab authentication mode has been added to all the Big Data components except the HBase ones.

    2. Upgraded support for Hadoop distributions

    1. New versions of the following Hadoop distributions are supported:

    • Hortonworks Data Platform 1.3 and 2.0

    • Cloudera 4.3 and 4.4

    • MapR 2.1.3 and 3.0.1

    2. EMC Pivotal is now available.

    3. Hadoop file formats

    Support for Sequencefile, RC, ORC and Avro has been added to several components:

    1. The tHiveCreateTable and tHiveLoad components have been created. They support not only a wide range of commonly used file formats such as Sequencefile, RC, ORC and Avro, but also formats that are not officially supported by Talend.

    2. In addition to their existing functions, tPigLoad and tPigStoreResult can now process a Sequencefile, RC or Avro file.

    4. File management in HDFS

    1. The tSqoopMerge component has been created for merging two datasets, with newer records overwriting the older ones.

    2. Upgrade of HDFS components

    • The tHDFSCopy component can now merge the part files generated at the end of a MapReduce computation.

    • The input and the output components are enabled to handle header rows.

    • The tHDFSInput component can read sub-directories of a specified directory.
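
    The merge behaviour described for tSqoopMerge above (newer records overwriting older ones) can be illustrated with a small, self-contained Java sketch. It is not the component's implementation. It assumes that records are matched on a key, taken here to be the first semicolon-separated field, and the class name, method names and sample rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MergeSketch {

    /** Hypothetical key extractor: the key is the text before the first ';'. */
    static String keyOf(String row) {
        int separator = row.indexOf(';');
        return separator < 0 ? row : row.substring(0, separator);
    }

    /**
     * Merges two datasets. Rows from the newer dataset replace rows from the
     * older dataset that share the same key; all other rows are kept.
     */
    static List<String> merge(List<String> older, List<String> newer) {
        Map<String, String> byKey = new LinkedHashMap<>();
        for (String row : older) {
            byKey.put(keyOf(row), row);
        }
        for (String row : newer) {
            byKey.put(keyOf(row), row); // same key: the newer row overwrites the older one
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<String> older = Arrays.asList("1;alice", "2;bob");
        List<String> newer = Arrays.asList("2;robert", "3;carol");
        System.out.println(merge(older, newer));
    }
}
```

    In this sketch the row with key 2 is taken from the newer dataset, while the rows with keys 1 and 3 are carried over unchanged.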

    5. NoSQL databases

    1. The following components have been created to enable transactions with their related NoSQL databases:

    • tCassandraBulkLoad, tCassandraOutputBulk, tCassandraBulkExec and tCassandraOutputBulkExec

    • tMongoDBBulkLoad

    • The Riak components

    2. The 2.4 and the 2.5 versions of MongoDB are now supported by its related components.

    6. In-memory technology

    1. The newly added SAP Hana components help users easily configure the connection to a SAP Hana system and process transactions with this in-memory computing platform.

    7. Cloud technology

    1. With the addition of support for Amazon S3 (Simple Storage Service), users can use dedicated components to perform transactions with this data storage service.

    2. GS (Google Storage) components are now available for users to perform interactions with Google Storage and prepare their data before transferring the data to Google BigQuery.

    8. Demo project

    1. A Big Data demo project is provided with the Studio. The project includes a number of easy-to-use sample Jobs to help familiarize users with the various features and functions of Talend Studio with Big Data.

    9. Other features

    1. Support for OAuth2 security has been added to the Salesforce components.

    2. With the addition of support for Amazon S3 (Simple Storage Service), users can use dedicated components to perform transactions with this data storage service.

    3. The Vertica components now officially support Vertica 5.1 and Vertica 6.0.

    Big Data: Known Issues

    We encourage you to consult the JIRA bug tracking tool for a full list of open issues:

    https://jira.talendforge.org/secure/IssueNavigator.jspa?requestId=16599

    Note that this list shows issues from both Talend's Community and Subscription products.

    1. Studio multi-instance starting issue

    If you are using the open source version of the Studio and have launched two or more instances of it at the same time, the Studio might no longer be able to restart after you close all of its instances.

    2. Note for the developers of custom components

    A new finally component template, such as tFileOutputDelimited_finally.javajet, has been created for processing the finally block. This change might cause compilation errors in a custom component that has been migrated to 5.4.1 and is used there to process multiple outputs.

    Issue diagnostic:

    A custom component subject to this issue has typically been developed in either of the following ways:

    1. The custom component is written to open a try block in the begin part and close it in the end part.

    2. The custom component is based on a duplicate of any of the following components released between 4.2.3 (exclusive) and 5.4.1 (exclusive):

    • tFileOutputDelimited

    • tSAPOutput

    • tBigQueryOutputBulk, tCassandraOutput, tHBaseOutput, tMongoDBOutput, tMongoDBWriteConf, tNeo4jOutput, tNeo4jOutputRelationship, tNeo4jRow, tRiakOutput

    • tAccessOutputBulk, tBonitaInstantiateProcess, tGreenplumOutputBulk, tInformixOutputBulk, tIngresOutputBulk, tMSSqlOutputBulk, tMomOutput, tMysqlOutputBulk, tOracleBulkExec, tOracleOutputBulk, tParAccelOutputBulk, tPivotToColumnsDelimited, tPostgresPlusOutputBulk, tPostgresqlOutputBulk, tSalesforceOutputBulk, tSybaseOutputBulk, tVerticaOutputBulk

    • tGenKeyHadoopIn, tGenKeyHadoopOut, tMatchGroupHadoopIn, tMatchGroupHadoopOut

    • tCollector, tDepartitioner, tPartitioner, tRecollector

    Recommended solution:

    1. Remove any try, catch or finally blocks from your begin and end parts.

    2. Put any resources that you will need to use in your finally code in the new resourceMap variable. For example,

    resourceMap.put("resources_tFileOutputDelimited_1",object);

    3. Create a finally code template which will then be able to use objects from the resourceMap variable and close connections.
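
    The three steps above can be sketched in plain Java. The sketch only illustrates the pattern and is not code generated by the Studio. It assumes that the begin, main and end parts run inside a single try block that the component author does not write, and that the finally part runs afterwards. Only the resourceMap name and the "resources_tFileOutputDelimited_1" key come from the example above. The class, the TrackingWriter helper, the failInMain flag and the "finished_tFileOutputDelimited_1" key are hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

class ResourceMapSketch {

    /** Writer that records whether close() was called, so the sketch can be checked. */
    static class TrackingWriter extends StringWriter {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Mimics one component: begin, main and end parts, followed by the finally part. */
    static void runComponent(Writer out, boolean failInMain) throws IOException {
        // Stand-in for the resourceMap variable named in the release notes.
        Map<String, Object> resourceMap = new HashMap<>();
        try {
            // begin part: register the resource, with no try/catch/finally of its own.
            resourceMap.put("resources_tFileOutputDelimited_1", out);

            // main part: process rows; it may fail at any point.
            out.write("id;name\n");
            if (failInMain) {
                throw new IllegalStateException("simulated failure in main part");
            }

            // end part: normal close, then record that cleanup has already happened.
            out.close();
            resourceMap.put("finished_tFileOutputDelimited_1", Boolean.TRUE); // hypothetical key
        } finally {
            // finally part: release the resource only if the end part never completed.
            if (resourceMap.get("finished_tFileOutputDelimited_1") == null) {
                Object resource = resourceMap.get("resources_tFileOutputDelimited_1");
                if (resource instanceof Writer) {
                    ((Writer) resource).close();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingWriter ok = new TrackingWriter();
        runComponent(ok, false);

        TrackingWriter failed = new TrackingWriter();
        try {
            runComponent(failed, true);
        } catch (IllegalStateException expected) {
            // The failure still propagates; the finally part has already closed the writer.
        }
        System.out.println(ok.closed + " " + failed.closed);
    }
}
```

    In both runs the writer ends up closed: by the end part when processing succeeds, and by the finally part when the main part fails.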

    The following links present a complete example for implementing this solution:

    1. Modification of begin.javajet:

    http://talendforge.org/trac/tos/changeset/111049#file13.

    2. Modification of end.javajet:

    http://talendforge.org/trac/tos/changeset/111049#file14.

    3. Addition of a finally part:

    http://talendforge.org/trac/tos/browser/trunk/org.talend.designer.components.localprovider/components/tSAPOutput/tSAPOutput_finally.javajet?rev=111049.

    Big Data: Hints and Notes

    1. Installing required third-party licences

    Users must install certain required third-party libraries for all Talend products to work correctly. These libraries can be installed via the Modules View.

    Documentation

    1. Talend Help Center

    Find out more about how to get the most out of your Talend products on the Talend Help Center: http://help.talend.com.

    New articles for this release include:

    • A Knowledge Base article providing a full list of the different Map/Reduce components: https://help.talend.com/pages/viewpage.action?pageId=22525540

    2. Revised documents

    In addition to updates to the content across the documentation set, the following specific documentation changes have been made.

    • Talend Open Studio for MDM User Guide now includes parts describing how to work with the Integration and Profiling perspectives, as well as the MDM perspective. This guide merges the information contained in the Talend Open Studio for Data Integration User Guide and the Talend Open Studio for Data Quality User Guide with the previous standalone Talend Open Studio for MDM User Guide.

    • Talend Big Data Studio Getting Started Guide has been renamed to Talend Big Data Getting Started Guide.

    • A new chapter "Getting started with Talend Big Data using the demo project" has been added to the Talend Big Data Studio Getting Started Guide. This chapter provides short descriptions of the sample Jobs included in the demo project and introduces the necessary preparations to run the sample Jobs on a Hadoop platform.

    • Talend Open Studio for ESB Mediation Components Reference Guide and Talend ESB Mediation Components Reference Guide have been merged into one guide, Talend ESB Mediation Components Reference Guide.

    • In the ESB Getting Started Guide, the chapter "Downloading and installing Talend ESB software" is now called "Getting started with Talend ESB", and the demo chapters are now split into two categories ("Basic deployment and runtime use cases" and "Advanced deployment and runtime use cases with SOA Governance").

    • In the ESB Infrastructure Services Configuration Guide and the STS User Guide, some conceptual information has been added that was previously found in the ESB Getting Started Guide.

    3. Known issues

    In the Talend ESB Mediation Components Reference Guide, the documentation for the cMap component does not specify that this component is only available with Talend Platform products.

    4. Open issues

    We encourage you to consult the JIRA bug tracking tool for a full list of open issues:

    https://jira.talendforge.org/secure/IssueNavigator.jspa?requestId=16604
