Pentaho Data Integration (PDI), also known as Kettle, is, along with Talend, one of the best-known open source ETL tools. KETTLE is an acronym: K for Kettle, E for Extract, T for Transform, T for Transport, L for Load, E for Environment.
It comes in two editions: the Community Edition (CE, free) and the Enterprise Edition (EE, paid). In this blog, we cover the additional features that are available in the paid edition.
– Access Control: Lets an admin manage and control who has the right to create, modify and delete PDI transformations and jobs.
– Version Control: Lets an admin manage jobs and transformations and track their versions. It saves multiple copies and revisions of each job, so anything deleted by mistake can be restored easily.
– Production Control: Scheduling and monitoring jobs on a centralized server. A user can inspect a job and execute it without modifying it. Transformations can also be scheduled to run at a predefined time.
– Integrated Security: This component of PDI EE recognizes users and roles from external corporate systems (built on the Spring Security Framework).
– Content Management Repository: Also present only in PDI EE, this component stores multiple versions of content and applies rules to content access (Apache Jackrabbit).
– Integrated Scheduling: Also part of the EE version, this component executes jobs on a centralized server at predetermined intervals (Quartz).
Why the security features are needed:
– When there are multiple users with different access rights (business users, team leaders, developers, operations, etc.), the security features restrict who can work with which transformations. This user-level security can be set up via LDAP or CAS.
– When multiple ETL developers work on the same project, it is important to have a repository so that every developer has their own folder. Developers can be given read access to other developers’ folders, which means they can execute the ETLs there but can’t modify them.
– The repository also helps maintain version information and copies.
– In the CE version, copies of the ETL have to be saved separately, since it doesn’t have this repository feature.
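The per-developer folder arrangement described above can be pictured with a small model. This is only an illustrative sketch in Python and not Pentaho's API: the `Folder` class, the permission names and the user names are all hypothetical, and the actual permission model in PDI EE may differ.

```python
from dataclasses import dataclass

# Hypothetical permission names, used only for this illustration.
READ, EXECUTE, WRITE = "read", "execute", "write"


@dataclass
class Folder:
    """A repository folder that belongs to one ETL developer."""
    owner: str
    # Permissions granted to every user who is not the owner.
    others: frozenset = frozenset({READ, EXECUTE})

    def allows(self, user: str, action: str) -> bool:
        """The owner may do anything; others are limited to the granted set."""
        if user == self.owner:
            return True
        return action in self.others


alice_folder = Folder(owner="alice")
print(alice_folder.allows("alice", WRITE))   # True: owners can modify
print(alice_folder.allows("bob", EXECUTE))   # True: others can execute
print(alice_folder.allows("bob", WRITE))     # False: others cannot modify
```

The point of the sketch is the asymmetry: the owner has full rights, while everyone else gets a fixed, smaller set that includes execution but not modification.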
Data Integration Server features:
– It enables remote deployment and monitoring of transformations.
– You don’t need to be logged in to execute jobs.
The Data Integration Server also includes a scheduler that lets you set up recurring schedules. A simple GUI lets you define the start date, end date, frequency, when to repeat job execution, and the logging level.
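To make the idea of a recurring schedule concrete, here is a minimal Python sketch of how a start date, an end date and a frequency translate into execution times. It is not the Data Integration Server's scheduler or the Quartz API; the function name `run_times` and the dates are assumptions made up for the example.

```python
from datetime import datetime, timedelta


def run_times(start, end, every):
    """Yield each execution time from start up to and including end."""
    if every <= timedelta(0):
        raise ValueError("interval must be positive")
    current = start
    while current <= end:
        yield current
        current += every


# Hypothetical schedule: run daily at 02:00 for three days.
schedule = list(
    run_times(
        start=datetime(2024, 1, 1, 2, 0),
        end=datetime(2024, 1, 3, 2, 0),
        every=timedelta(days=1),
    )
)
for t in schedule:
    print(t.isoformat())
```

With these inputs the job would run three times, once on each of the three days at 02:00.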
Please get in touch for more information at [email protected]