Pentaho Data Integration (PDI), also known as Kettle, is, along with Talend, one of the best-known open source ETL tools. The name KETTLE is an acronym: K for Kettle, E for Extract, T for Transform, T for Transport, L for Load, E for Environment.

It comes in two editions: the Community Edition (CE, free) and the Enterprise Edition (EE, paid). This post covers the additional features that are available only in the paid edition.

 

–          Access Control:  Lets an admin manage and control who has the right to create, modify, and delete PDI transformations and jobs.

PDI Repository

 

–          Version Control:  Lets an admin manage jobs and transformations and track their versions. The repository keeps multiple revisions of each job, so an accidental deletion can be undone and an earlier version restored easily.

PDI Repository Explorer

 

–          Production Control:  Schedule and monitor jobs on a centralized server. A user can inspect a job and execute it without modifying it. Transformations can also be scheduled to run at a predefined time.

PDI access control

PDI Job Scheduler

 

–          Integrated Security:  This component of PDI EE recognizes users and roles from external corporate systems (using the Spring Security framework).

–          Content Management Repository:  Also present only in PDI EE, this component stores multiple versions of content and applies rules to content access (Apache Jackrabbit).

–          Integrated Scheduling:  The integrated scheduling component in the EE version executes jobs on a centralized server at predetermined intervals (Quartz).

 

 

Need for the Security Feature

–          When multiple users have different access rights (business users, team leaders, developers, operations staff, and so on), the security feature restricts who can work with which transformations. This user-level security can be backed by LDAP or CAS.
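
The idea of role-based restrictions can be illustrated with a small Python sketch. This is not PDI code: the role names follow the examples given for this feature, while the permission sets and the `can` helper are assumptions invented purely for illustration. In PDI EE the actual users and roles would come from a system such as LDAP or CAS.

```python
# Hypothetical role-to-permission mapping, for illustration only.
# PDI EE manages this itself, with users and roles typically
# supplied by an external system such as LDAP or CAS.
ROLE_PERMISSIONS = {
    "business": {"read"},
    "team_leader": {"read", "execute"},
    "developer": {"read", "execute", "create", "modify", "delete"},
    "operation": {"read", "execute"},
}


def can(role: str, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    # Unknown roles get an empty permission set, so they are denied.
    return action in ROLE_PERMISSIONS.get(role, set())


if __name__ == "__main__":
    print(can("developer", "modify"))
    print(can("operation", "modify"))
```

In this sketch an operations user can run a transformation but cannot change it, which is the kind of restriction the security feature is meant to enforce.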

 

Repository Feature

–          If multiple ETL developers are working on a project, it is important to have a repository so that every developer has their own folder. Other developers can be given read access to a colleague’s folder, which means they can execute its contents but cannot modify them.

–          The repository also maintains version information and earlier copies.

–          In the CE version, copies of each ETL have to be saved separately, since it doesn’t have this versioned repository feature.
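
To make the versioning idea concrete, here is a minimal in-memory sketch in Python. It is not how the PDI EE repository is implemented; the `VersionedRepository` class, its method names, and the example path are all hypothetical. It only shows the behaviour described above: every save is kept as a revision, and a deleted job can be brought back from its history.

```python
class VersionedRepository:
    """Toy in-memory store that keeps every saved revision of a job."""

    def __init__(self):
        self._history = {}     # path -> list of revisions, oldest first
        self._deleted = set()  # paths currently marked as deleted

    def save(self, path, content):
        """Store a new revision and return its 1-based revision number."""
        self._history.setdefault(path, []).append(content)
        self._deleted.discard(path)
        return len(self._history[path])

    def delete(self, path):
        """Mark a job as deleted; its revision history is kept."""
        if path not in self._history:
            raise KeyError(path)
        self._deleted.add(path)

    def _revision(self, path, revision):
        revisions = self._history[path]
        if revision is None:
            return revisions[-1]
        if not 1 <= revision <= len(revisions):
            raise IndexError(f"no revision {revision} for {path}")
        return revisions[revision - 1]

    def get(self, path, revision=None):
        """Return the latest (or a specific) revision of a live job."""
        if path in self._deleted:
            raise KeyError(path)
        return self._revision(path, revision)

    def restore(self, path, revision=None):
        """Bring back a job by saving an old revision as the newest one."""
        return self.save(path, self._revision(path, revision))


if __name__ == "__main__":
    repo = VersionedRepository()
    job = "/home/alice/load_sales.kjb"  # hypothetical developer folder
    repo.save(job, "first draft")
    repo.save(job, "second draft")
    repo.delete(job)       # accidental deletion
    repo.restore(job, 1)   # recover the first revision
    print(repo.get(job))
```

Restoring writes the old content as a new revision instead of rewriting history, so nothing is lost even if the restore itself turns out to be a mistake.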

 

Data Integration Server Feature

–          It enables remote deployment and monitoring of transformations.

–          You don’t need to be logged in to execute jobs.

The Data Integration Server also includes a scheduler that lets you set up recurring schedules. A simple GUI lets you define the start date, end date, frequency, when to repeat job execution, and the logging level.
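
As a rough illustration of what a recurring schedule amounts to, the Python sketch below computes the run times implied by a start date, an end date, and a frequency. It is an assumption-laden toy, not the Data Integration Server's scheduler (which, as noted earlier, is based on Quartz); the function name `scheduled_runs` and the example dates are made up.

```python
from datetime import datetime, timedelta


def scheduled_runs(start, end, every):
    """Yield each run time from start to end (inclusive), spaced by `every`."""
    if every <= timedelta(0):
        raise ValueError("interval must be positive")
    current = start
    while current <= end:
        yield current
        current += every


if __name__ == "__main__":
    # Hypothetical schedule: run daily at 02:00 for three days.
    for run_at in scheduled_runs(
        start=datetime(2024, 1, 1, 2, 0),
        end=datetime(2024, 1, 3, 2, 0),
        every=timedelta(days=1),
    ):
        print(run_at.isoformat())
```

A real scheduler also has to handle missed runs, time zones, and calendar-based rules such as "every weekday", which this sketch leaves out.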

Please get in touch for more information at [email protected]

1 comment

  1. Surya

    Talend (or Pentaho PDI aka Kettle) is what you need if you intend to do serious heavy lifting of data, but to be fair to Apatar, their product has improved a lot since I wrote this (including the addition of an Excel component!). Even so, even for ‘simple’ mashups, I find Talend a more rewarding tool to use, particularly now that I’ve rediscovered Groovy as a scripting language for Talend. Tom
