This blog post compares Talend Open Studio (TOS) with Pentaho Data Integration (Kettle).

Talend Open Studio (TOS) and Pentaho Data Integration (Kettle) are two comprehensive and widely used Open Source ETL tools.

Pentaho Kettle

  1. The Kettle project started around 2001; Pentaho acquired it in 2006, when it became Pentaho Data Integration.
  2. It has a stand-alone Java engine that processes jobs and transformations, moving data between many different databases and files.
  3. Tasks can be scheduled, but this requires an external scheduler such as cron.
  4. It can run remote jobs on “slave servers” on other machines.
  5. It has data quality features, available through its own GUI, custom SQL queries, JavaScript and regular expressions.
  6. Its graphical designer, Spoon, makes it easy to build data integration procedures. The Kettle runtime can run these procedures in several ways: from the command line (Pan for transformations, Kitchen for jobs), on a small server (Carte), or directly from the IDE (Spoon). Procedures are saved in XML files or in a database repository. A Java library interprets them and is required to run the ETL tasks.
  7. Kettle is an interpreter of ETL procedures saved in XML format.
  8. Kettle comes with a very intuitive graphical tool that supports the entire ETL process, from design to testing and deployment.
  9. The Kettle IDE is slightly easier to get started with, but it is also less comprehensive.
  10. Kettle is flexible, and ETL procedures can be built quickly.
  11. Pentaho Data Integration (Kettle) is a Java (SWT) application and library. Kettle is an interpreter of procedures written in XML format. Its features and components are a little less comprehensive than Talend's. However, this does not restrict the complexity of the ETL procedures that can be implemented. Kettle provides a JavaScript engine (as well as a Java one) to fine-tune the data manipulation process.
  12. Kettle is also a good tool, with everything necessary to build even complex ETL procedures.
  13. Kettle is an interpreter of ETL procedures written in XML format. Kettle provides a Java or JavaScript engine to take control of data processing.
  14. Kettle (PDI) is the default tool in the Pentaho Business Intelligence Suite. Procedures can also be executed outside the Pentaho platform, provided that all the Kettle libraries and a Java runtime are installed.
  15. Kettle makes it easy to deploy procedures in clustered environments and to save them in a database table.
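
The "interpreter of XML procedures" idea can be made concrete with a minimal Java sketch. It assumes a made-up XML format (a `<procedure>` element containing `<step type="..."/>` elements) and hypothetical step types (`trim`, `upper`, `prefix`). It is not Kettle's real .ktr/.kjb file format or API. The point is that the procedure is read and executed at runtime, so changing the XML changes the behaviour without recompiling anything. The sketch requires Java 15 or later.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical interpreter for a made-up XML procedure format (not Kettle's).
public class XmlProcedureInterpreter {

    // Parse the procedure XML and build one row transformation per <step>.
    static List<UnaryOperator<String>> load(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList steps = doc.getElementsByTagName("step");
        List<UnaryOperator<String>> pipeline = new ArrayList<>();
        for (int i = 0; i < steps.getLength(); i++) {
            Element step = (Element) steps.item(i);
            String type = step.getAttribute("type");
            switch (type) {
                case "trim" -> pipeline.add(String::trim);
                case "upper" -> pipeline.add(String::toUpperCase);
                case "prefix" -> {
                    String prefix = step.getAttribute("value");
                    pipeline.add(s -> prefix + s);
                }
                default -> throw new IllegalArgumentException("Unknown step type: " + type);
            }
        }
        return pipeline;
    }

    // Interpret the procedure: apply every step, in order, to every row.
    static List<String> run(String xml, List<String> rows) throws Exception {
        List<UnaryOperator<String>> pipeline = load(xml);
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            String value = row;
            for (UnaryOperator<String> step : pipeline) {
                value = step.apply(value);
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = """
            <procedure>
              <step type="trim"/>
              <step type="upper"/>
              <step type="prefix" value="ID-"/>
            </procedure>
            """;
        System.out.println(run(xml, List.of("  alice ", "bob")));
    }
}
```

Running `main` prints `[ID-ALICE, ID-BOB]`. Reordering or removing `<step>` elements changes the result without touching the Java code.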

Talend TOS

  1. Talend Open Studio was first released around October 2006.
  2. It has a much smaller community than Pentaho, but it is backed by two investment firms.
  3. It generates Java or Perl code, which you can later run on your server.
  4. Tasks can be scheduled, again by using an external scheduler such as cron.
  5. It has data quality features, available through its own GUI, custom SQL queries and Java.
  6. Talend uses a user-friendly, comprehensive IDE (similar to Pentaho Kettle's) to design procedures. Procedures can be tested in the IDE and compiled into Java code. The generated Java code can be modified to achieve greater control and flexibility.
  7. Talend Open Studio is a Java code generator tool.
  8. Talend comes with a very intuitive graphical tool that supports the entire ETL process, from design to testing and deployment.
  9. The learning curve of Talend Open Studio is steeper. However, its flexibility and power more than compensate for the initial effort.
  10. Talend Open Studio requires you to define the correct schema for the data being processed, and the IDE helps a lot with this task. Metadata definition is an important Talend feature that improves the maintainability and reliability of procedures deployed in production.
  11. Talend Open Studio is an Eclipse-based Java tool. Procedures are compiled into Java bytecode during deployment, which means the entire Java ecosystem can potentially be used.
  12. Components and features are numerous, mixing general-purpose tools with very specific components. Talend provides vendor-specific sets of RDBMS, NoSQL and Big Data components alongside generic ones. This approach supports both vendor-specific and generic database features.
  13. With Talend, the full Java ecosystem can be used, and it is easy to use vendor-specific database features.
  14. As a code generator, Talend Open Studio translates procedures into compact, fast Java code.
  15. Talend Open Studio (TOS) is a generic ETL and data management tool that is also integrated into the SpagoBI and Jasper Server BI platforms. Procedures are compiled into small Java packages that are easy to deploy and run in any Java-enabled environment.
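
Code generation works the other way round from interpretation: the design is translated into Java source ahead of time. The toy generator below is a hypothetical sketch and bears no relation to the code Talend actually emits. It turns a list of step definitions into the source of a plain Java class. The `Step` record, the `CustomerJob` class name and the expression strings are illustrative assumptions. The sketch requires Java 16 or later.

```java
import java.util.List;

// Hypothetical toy code generator (not Talend's generator or its output).
public class JobCodeGenerator {

    // A step in a hypothetical job design: a name plus a Java expression over "row".
    record Step(String name, String expression) {}

    // Emit Java source for a class whose process() method applies every step in order.
    static String generate(String className, List<Step> steps) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(className).append(" {\n");
        src.append("    public static String process(String row) {\n");
        for (Step step : steps) {
            src.append("        // step: ").append(step.name()).append('\n');
            src.append("        row = ").append(step.expression()).append(";\n");
        }
        src.append("        return row;\n");
        src.append("    }\n");
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        String source = generate("CustomerJob", List.of(
                new Step("trim", "row.trim()"),
                new Step("upper", "row.toUpperCase()")));
        System.out.println(source);
    }
}
```

The generated string could then be compiled with the JDK's `javax.tools.JavaCompiler`, obtained via `ToolProvider.getSystemJavaCompiler()`. The resulting class runs with no interpreter in between. Because the generated code is ordinary Java source, it can also be read, debugged or edited by hand.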

Both ETL Tools

  1. Talend and Pentaho offer some of the most widely deployed open source ETL tools, used in several mission-critical implementations.
  2. Talend and Pentaho have strong community support and are healthy, well-known companies. Open source business intelligence is growing fast, and real-world applications are widespread.
  3. Talend Open Studio and Pentaho Kettle are both user-friendly and well documented, and both have strong community support. Talend Open Studio requires more initial effort to get started. However, its great potential is apparent from the beginning.
  4. Because Kettle interprets its procedures, it can be slower than Talend on some tasks.
  5. Talend is a code generator (Java or Perl) whose generated code is single-threaded, whereas Kettle uses a metadata-driven, multi-threaded engine. So the choice is yours: debug generated Java code (Talend) or debug a graphical data flow (Kettle).
  6. Pentaho Kettle is very easy to use and a good solution in Pentaho environments. Talend is a more general-purpose data management platform that can be used together with its companions Talend ESB, Talend Data Quality and Talend MDM.
  7. Other reports, in contrast, find Pentaho faster than Talend (perhaps up to twice as fast).
  8. Talend is more of a tool for people who already write Java programs and want to save a great deal of time with a tool that generates code for them.
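
The single-threaded versus multi-threaded contrast can be sketched in plain Java. In `runThreaded`, each step runs on its own thread and hands rows to the next step through a bounded `BlockingQueue`. This is loosely modelled on the way Kettle's engine runs steps concurrently and connects them with row buffers. `runSingleThreaded` is the kind of straight loop a code generator might emit. The class, method and sentinel names are hypothetical, and neither method is either tool's real engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.UnaryOperator;

// Hypothetical comparison of a threaded step pipeline and a single-threaded loop.
public class ThreadedSteps {

    // Sentinel row (compared by identity) telling the next step that no more data will arrive.
    private static final String END = new String("<END>");

    // Start one thread for a step: read rows from 'in', transform them, write them to 'out'.
    static Thread startStep(BlockingQueue<String> in, BlockingQueue<String> out,
                            UnaryOperator<String> fn) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String row = in.take();
                    if (row == END) {          // identity check on the sentinel
                        out.put(END);          // propagate end-of-data downstream
                        return;
                    }
                    out.put(fn.apply(row));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Multi-threaded: producer -> trim step -> upper step -> collector, linked by bounded queues.
    static List<String> runThreaded(List<String> rows) throws InterruptedException {
        BlockingQueue<String> source = new ArrayBlockingQueue<>(10);
        BlockingQueue<String> middle = new ArrayBlockingQueue<>(10);
        BlockingQueue<String> sink = new ArrayBlockingQueue<>(10);
        Thread producer = new Thread(() -> {
            try {
                for (String row : rows) source.put(row);
                source.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        Thread trim = startStep(source, middle, String::trim);
        Thread upper = startStep(middle, sink, String::toUpperCase);
        List<String> out = new ArrayList<>();
        for (String row = sink.take(); row != END; row = sink.take()) out.add(row);
        producer.join();
        trim.join();
        upper.join();
        return out;
    }

    // Single-threaded: one loop applies both steps to each row in turn.
    static List<String> runSingleThreaded(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) out.add(row.trim().toUpperCase());
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> rows = List.of("  alice ", "bob");
        System.out.println(runThreaded(rows));
        System.out.println(runSingleThreaded(rows));
    }
}
```

Both methods return the same rows in the same order, because each queue is FIFO and each step has a single thread. The threaded version typically helps only when the steps do enough work to benefit from running concurrently.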

Talend uses Eclipse as its GUI, while PDI uses SWT, the widget toolkit that Eclipse itself is built on. Depending on how you look at it, both approaches have advantages and disadvantages. PDI looks and feels different to Eclipse users, but it has fewer problems with Eclipse versions and upgrades.

Vishwanth Surapraju

Helical IT Solutions
