Audit Logs in Pentaho Data Integration (PDI)


Audit logs at the job level and transformation level are very useful in ETL projects for tracking details such as job name, start date, end date, transformation name, errors, number of lines read, number of lines written, number of lines input, number of lines output, etc.

I have created a simple PDI job to track audit logs at the job level, job entry level, and transformation level in a MySQL database. The process is as follows:

I created a Kettle job, FileToDB.kjb, which contains two job entries (Start and Transformation) and one database connection (PSQL_Log). I created a separate database, 'PDI_Audit_Logs', in MySQL and configured its details as the database connection.


Right-click on the work area and select Job settings, as shown in the screenshot below.


This pops up the Job Properties window. Click the Log tab and select the 'Job entry log table' option; the available log fields are displayed under this section. You can enable all of them, or simply keep the defaults. Now configure the log table connection as shown in the screenshot below:


Log Connection –> The connection we already created (in this case, the MySQL DB).

Log Schema –> An existing database name.

Log Table –> The table name in which you want to store the tracked details. It can be an existing table; we will see the process for creating a new one in the later steps.

Log line timeout –> The number of days you want to keep the log details.

If you are creating the log table for the first time, click the SQL button at the bottom; otherwise click OK. The SQL button pops up the SQL statement that creates the table with the column names shown above. I gave the log table name as "FileToDB_JobEntryLog", after my job name and the logging level.
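For reference, the generated statement looks roughly like the following. This is a sketch only; the exact column list depends on which fields you enabled in the Log tab, so treat the columns below as typical rather than exact:

```sql
-- Hedged sketch of the DDL PDI proposes for a job entry log table;
-- the actual columns mirror the fields enabled in Job Properties > Log.
CREATE TABLE FileToDB_JobEntryLog (
  ID_BATCH       INT,
  CHANNEL_ID     VARCHAR(255),
  LOG_DATE       DATETIME,
  JOBNAME        VARCHAR(255),
  JOBENTRYNAME   VARCHAR(255),
  LINES_READ     BIGINT,
  LINES_WRITTEN  BIGINT,
  LINES_UPDATED  BIGINT,
  LINES_INPUT    BIGINT,
  LINES_OUTPUT   BIGINT,
  LINES_REJECTED BIGINT,
  ERRORS         BIGINT,
  RESULT         CHAR(1),
  LOG_FIELD      TEXT
);
```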


Click the Execute button; it will create the table in the given schema/database and prompt the window below. Click OK, then close the SQL editor.


The job entry level log configuration is now done. Next, select the Job log table section to configure the log connection at the job level. Configure the values as shown in the screenshot below.

For the log table name, follow the same steps as explained above.


The job audit log settings are now done. Open the transformation; it has System Info, File Input, and DB Output steps connected with hops. The DB output schema and the audit log schema are different, so we created two connections: PSQL and PSQL_Log.


Right click on the work area and select Transformation Settings.


This pops up the Transformation Properties window. Select the Logging tab, then click the Transformation option. It displays the variables that can be tracked.


For the fields LINES_READ, LINES_WRITTEN, LINES_UPDATED, LINES_INPUT, LINES_OUTPUT, and LINES_REJECTED, you must specify the step from which each metric should be tracked.

As with the job settings, configure the log connection here as well, and click the SQL button if you are creating the log table for the first time; otherwise click OK.

Audit logs are now configured for the transformation as well. Execute the job and check the database for the log tables. Three tables are created (job entry level, job level, and transformation level) under the given schema.

Job Entry Level Table:

As there are two job entries, it generated details for both of them.


Job Level Log Table:


Transformation Level Log Table:


Hope this topic will be useful.

Thank you.

Define JNDI for Pentaho BA server 5.0


Note: For illustration I'm showing an Oracle 11g configuration. Replace the resource name, username, password, driverClassName, and url parameters, or any other relevant connection settings, as needed.

Add Driver

Place the appropriate driver for Oracle 11g, which is ojdbc6.jar, into this directory of the BA Server: /pentaho/server/biserver-ee/tomcat/lib/.

Note: There should be only one driver per database in this directory. Ensure that there are no other versions of the same vendor's driver in this directory. If there are, back up the old driver files and remove them to avoid version conflicts.

Specify JNDI connections

  1. Stop Tomcat & the BA server.
  2. Edit the /tomcat/webapps/pentaho/WEB-INF/web.xml file.
  3. At the end of the <web-app> element, in the same part of the file where you see <!-- insert additional resource-refs -->, add this XML snippet:
  4. <resource-ref>
     <description>myDatasource</description>
     <res-ref-name>jdbc/myDatasource</res-ref-name>
     <res-type>javax.sql.DataSource</res-type>
     <res-auth>Container</res-auth>
     </resource-ref>

  5. Save & close web.xml.
  6. Open & modify the /tomcat/webapps/pentaho/META-INF/context.xml file.
  7. Add this XML snippet anywhere inside context.xml.
  8. Save & close context.xml.
  9. Open the simple-jndi directory.
  10. Open & Modify this file biserver-ce/pentaho-solutions/system/simple-jndi/
  11. Add these lines
  12. myDatasource/type=javax.sql.DataSource
    myDatasource/driver=oracle.jdbc.OracleDriver
    myDatasource/url=jdbc:oracle:thin:@//localhost:1521/sysdba
    myDatasource/user=dbuser
    myDatasource/password=password

  13. Save & Close
  14. Delete the pentaho.xml file located in the /tomcat/conf/catalina/ directory.
  15. Start Tomcat & the BA server.
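The context.xml snippet for step 7 can follow the standard Tomcat Resource form. The sketch below reuses the connection values from the simple-jndi lines above; the pool settings and factory are typical defaults to adapt, not values from the original post:

```xml
<!-- Sketch of a Tomcat JNDI resource for step 7; attribute values
     mirror the simple-jndi settings above and should be adapted. -->
<Resource name="jdbc/myDatasource" auth="Container" type="javax.sql.DataSource"
          factory="org.apache.commons.dbcp.BasicDataSourceFactory"
          driverClassName="oracle.jdbc.OracleDriver"
          url="jdbc:oracle:thin:@//localhost:1521/sysdba"
          username="dbuser" password="password"
          maxActive="20" maxIdle="5" maxWait="10000"/>
```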

Creating Report In iReport using Linear Gauge as component

This blog will teach the reader how to create a report in iReport using a linear gauge component and publish it on JasperReports Server.

Purpose: to compare the avg(salary) of male and female employees in an organization.

Database server: PostgreSQL

Database name: foodmart

Table name: employee

Below are the steps :

# 1: Create two datasets named "MaleSalary" & "FemaleSalary" for calculating avg(salary) for male and female employees respectively:

Dataset 1 (MaleSalary): select gender, avg(salary) from employee where gender like 'M' group by gender

Dataset 2 (FemaleSalary): select gender, avg(salary) from employee where gender like 'F' group by gender


# 2:- Drag and drop two "linear gauge" widgets from the WidgetPro palette in iReport.

# 3:- Assign the datasets above: MaleSalary for widget 1 and FemaleSalary for widget 2.

# 4:- Right-click on the widget chart -> Edit Widget Properties.

linear gauge jasper report

Here, each tab in the properties lets us customize the widget visualization.

Example: Suppose we need to add a % symbol after the widget pointer value. In that case, we go to the Advanced Properties of the Widget Configuration and add Property Name: number suffix and Value Expression: '%'.

linear gauge jasper report 2

Example 2: Suppose we need to add color ranges to the widget. The widget properties have a Color Range option; we only have to give our condition.

# 5:- After publishing the report to JasperReports Server, it will look like below:

linear gauge in iReport


Helical IT Solutions


Make Batch count for Incremental loading in Talend (TOS_5.3)

This blog will talk about how to make Batch count for Incremental loading in Talend (TOS_5.3).

First of all, we have the t_source and t_target tables.
Both tables (t_source, t_target) have data like this:



INSERT into t_source

We inserted one record into t_source:
Insert into t_source(id,name,city,s_date) values (111,'vic','del','2014-03-01 01:02:00');

UPDATE t_source

We updated t_source:
Update t_source set name='don', s_date='2014-02-01 01:02:00' where id = 109;

DELETE from t_source

We deleted from t_source:
Delete from t_source where id = 108;

Finally, we have these records in the t_source and t_target tables:


We want to compute a batch count in Talend (TOS).

We created one job:


Details of Job

Subjob (1)

We fetch max(id) from the t_target table and store it in a context variable:

context.last_id = input_row.max_id;

Subjob (2)

We fetch min(id) and max(id) from t_source and store them in context variables:

context.sr_max_id = input_row.max_id;

context.sr_min_id = input_row.min_id;

Subjob (3)

We select from t_source:

select * from t_source where id > "+context.last_id+" order by id

and insert into the t_target table, with id as the primary key.

Subjob (4)

We count the rows between the min and max primary key values from t_source:

select count(*) as batch_count from t_source where id between "+context.sr_min_id+" and "+context.sr_max_id+"

and store the result in a context variable, from which we calculate the batch count. We divide by the defined count (context.MT_COUNT = 5); context.max_count and context.min_count are 0 before the job executes.

context.count = input_row.count;

System.out.println("Count of primary key from source "+context.UPLOAD_FILE_NAME+" Table : "+context.count);


int x = (context.count / context.MT_COUNT) + 3;

context.batch_count = x;

System.out.println(“Batch Count : “+context.batch_count);

context.max_count = context.MT_COUNT;


context.min_count = context.sr_min_id ;

context.max_count = context.sr_min_id + context.max_count;
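The window arithmetic in the subjobs above can be sketched in plain Java. Variable names mirror the context variables in the job; the id range and values are illustrative, not taken from the actual tables:

```java
// Sketch of the Talend batch-window logic: compute a batch count from the
// row count and MT_COUNT, then slide a [min_count, max_count] window.
public class BatchCount {
    static int batchCount(int rowCount, int mtCount) {
        // the job uses (count / MT_COUNT) + 3 so the iteration safely covers the tail
        return (rowCount / mtCount) + 3;
    }

    public static void main(String[] args) {
        int srMinId = 100, srMaxId = 120, mtCount = 5; // illustrative values
        int count = srMaxId - srMinId + 1;             // 21 rows in range
        int batches = batchCount(count, mtCount);      // (21 / 5) + 3 = 7
        int minCount = srMinId;
        int maxCount = srMinId + mtCount;
        for (int b = 1; b <= batches; b++) {
            System.out.println("Batch " + b + ": Count of " + minCount + " to " + maxCount);
            minCount = maxCount;            // slide the window, as in the tJavaRow
            maxCount = minCount + mtCount;
        }
    }
}
```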

SubJob (5)

We iterate context.batch_count times, calling another job (test123) in each iteration.

1. Test123 Job

a.       SubJob (5.1)

We print the batch window, min_count to max_count:

System.out.println("Batch "+Numeric.sequence("s1",1,1)+": Count of "+context.min_count+" to "+context.max_count);

b.      Subjob (5.2)

We select from t_source between the primary key bounds:

select * from t_source where id >= "+context.min_count+" and id <= "+context.max_count+" order by id

and collect the data into a Buffer output.

c.       SubJob (5.3)

We compare the Buffer input (t_source) and t_target with an inner join in tMap. If there is any reject output, it is updated into t_target.

t_target SQL query: select * from t_target where id >= "+context.min_count+" and id <= "+context.max_count+" order by id

d.      SubJob (5.4)

We compare t_target and the Buffer input (t_source) with a left outer join in tMap, filter for rows with no match, and delete any resulting output from t_target.

t_target SQL query: select * from t_target where id >= "+context.min_count+" and id <= "+context.max_count+" order by id

And we have a tJavaRow (min and max of the primary key). In it, the window slides forward:

context.min_count = context.max_count;

context.max_count = context.min_count + context.MT_COUNT;


We executed the job with the property variable MT_COUNT = 5.



Finally, we have these records in t_source and t_target:


Thanks & regards

Vishwanth suraparaju

Senior ETL Developer

Overcome Jasper Reports Complexities using PLPGSQL

This document helps users overcome Jasper Reports complexities using PL/pgSQL (Postgres stored procedure functions). We will look into the complexity with input controls of type multi-select query and how to cascade them.

We create a stored procedure function in PL/pgSQL. As we are dealing with a multi-select query, we need to create the function with an array parameter; the VARIADIC feature is also available in PL/pgSQL, but using an array is much easier.

Complexity 1:

Example: How to create a stored procedure function for type multi-select query input control.

CREATE OR REPLACE FUNCTION multiple_m1(text[])
RETURNS TABLE(comp_id integer, comp_name text, comp_sort text) AS
BEGIN

IF 'Khosla' = ANY($1) THEN

    RETURN QUERY SELECT id::int as comp_id, ltrim(company.compname)::text as compname, ltrim(company.compname)::text as sort
    FROM company where deleteflag = '0' and active = '1'
    and id in (270,394,376,396,403);

ELSIF 'All' = ANY($1) THEN  -- the original branch condition was lost; 'All' is a plausible reconstruction

    RETURN QUERY (SELECT id::int as comp_id, ltrim(company.compname)::text as compname, ltrim(company.compname)::text as sort
    FROM company where deleteflag = '0' and active = '1'
    UNION SELECT -1::int AS id, 'All'::text AS compname, ' ' as sort) order by sort;

ELSE

    RETURN QUERY SELECT id::int as comp_id, ltrim(company.compname)::text as compname, ltrim(company.compname)::text as sort
    FROM company where deleteflag = '0' and active = '1'
    and (compname like ANY($1) or '' like ANY($1));

END IF;
$$ LANGUAGE plpgsql


How to call the function in an input control?

Select * from multiple_m1('{""}');

Here '{""}' acts as NULL (no selection).

Create another input control of type multi-select query. In this second input control, the comp_id column value of "multiple_m1" (the first input control) is the input, so the 2nd control runs based on the comp_id value of the 1st control. This is nothing but cascading.


CREATE OR REPLACE FUNCTION mulitple_m2(integer, integer[])
RETURNS TABLE(id integer, reference integer, job_title text, status text) AS
BEGIN

IF -1 = $1 THEN

RETURN QUERY select, jobs.reference::int, jobs.job_title::text, ltrim(substring(jobs.status,3,char_length(jobs.status)))::text as status
FROM jobs, company c
where jobs."DeleteFlag" = '0'
and = and c.DeleteFlag = '0' and = '1'  -- some column names on this line were lost in the original and are reconstructed
and ( = ANY($2) OR '-1' = ANY($2))
order by jobs.job_title;

ELSIF 1 = $1 THEN  -- the original branch condition was lost; 1 (open) is a plausible reconstruction

RETURN QUERY select, jobs.reference::int, jobs.job_title::text, ltrim(substring(jobs.status,3,char_length(jobs.status)))::text as status
from jobs, company c
where jobs."DeleteFlag" = '0' and jobs.status = '1: Open Req'
and = and c.DeleteFlag = '0' and = '1'
and ( = ANY($2) OR '-1' = ANY($2))
order by jobs.job_title;

ELSE

RETURN QUERY select, jobs.reference::int, jobs.job_title::text, ltrim(substring(jobs.status,3,char_length(jobs.status)))::text as status
from jobs, company c
where jobs."DeleteFlag" = '0' and jobs.status = '0: Inactive'
and = and c.DeleteFlag = '0' and = '1'
and ( = ANY($2) OR '-1' = ANY($2))
order by jobs.job_title;

END IF;
$$ LANGUAGE plpgsql

Cascading the value of parameter 1 into parameter 2:

SELECT * FROM mulitple_m2($1, $2);

SELECT * FROM mulitple_m2($1, ARRAY[$P!{mulitple_m1}]::integer[]);


Complexity 2:

Let's say your input control of type multi-select query has string values, and this value is passed to your SQL query. Your SQL query does not work with $X{IN,}:

$X{IN, column_name, parameter} does not work with PL/pgSQL. It throws an error message saying that column "column_name" does not exist. Hence $X{} does not work in JasperReports Server when dealing with a stored procedure.

One workaround is to pass $P{} in your SQL instead of $X{}, but it throws an error saying "parameter type not supported in query : parameter_name class java.util.Collection". Hence $P{} cannot be used either.

Another workaround is to pass $P{} with an exclamation mark, as $P!{parameter}, in your SQL instead of $X{}, but it throws an error saying "org.postgresql.util.PSQLException: syntax error at or near \"[\"".

What is that Error? Where is that Syntax Error?

The reason is that JasperReports Server passes string values like [AJ, alexandra, Amanda, Amy, andrea, Angie, API]. Your SQL statement does not accept string values without single quotes. Thus, in order to add single quotes to the parameter values, we need to do it at the Java end and then pass those values to the SQL query.

How to perform it?

1. Add another parameter on the report.

Jasper Reports using PLPGSQL

    2. In the expression field, using a join function at the Java end, add single quotes around the array elements and pass this parameter into the SQL query.

"ARRAY['" + $P{parameter_m1}.join("','") + "']"

    3. Now your SQL query looks like:

SELECT * FROM Nofsubmitted_report($P!{parameter_2});
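Outside of Jasper, the effect of that join expression can be sketched in plain Java. The class name and values are illustrative, not Jasper API; only the quoting logic matters:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: wrap each selected string value in single quotes and build a
// Postgres ARRAY literal, mirroring what the report expression produces.
public class QuoteArray {
    static String toArrayLiteral(List<String> values) {
        return "ARRAY['" + String.join("','", values) + "']";
    }

    public static void main(String[] args) {
        // e.g. the multi-select values [AJ, alexandra, Amanda]
        System.out.println(toArrayLiteral(Arrays.asList("AJ", "alexandra", "Amanda")));
        // ARRAY['AJ','alexandra','Amanda']
    }
}
```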

Complexity 3:

If let’s say your Input Control off type Multi-Select Query value is Integer and How we pass this Integer array (collection) to call plpgsql function.

As we know $X{} and $P{} does not works in Jasper Server when dealing with stored procedure function, we have also seen what error throws when we use it.

The solution i found is to pass value with in Array like

SELECT * FROM function_name(ARRAY[$P!{parameter_m1}]);

For any confusion, please get in touch with us at Helical IT Solutions

D3 Chord Diagram in Jaspersoft / iReport

This blog will talk about what a chord diagram is, its usage, and how to create and integrate a D3 chord diagram in Jaspersoft / iReport.

All the reports are developed using iReport 5.5 Professional and JasperReports Server 5.5.

As the HTML component of JasperReports Server does not load any scripts, we loaded the script in one of the decorator pages (a JSP page). The page is located at:


* Note: The above problem is specific to Jaspersoft 5.5, not the previous versions.

In that page we included the scripts we want to load, adding the following code at line no. 46:

<script type="text/javascript" language="JavaScript" src="${pageContext.request.contextPath}/scripts/d3.v3.min.js"></script>

The script to be added should be kept at location:



Chord Diagram:

Chord diagram is basically used to show the relation between multiple entities. Different entities are represented by different arcs, the size of the arc varies. Also these different arcs are connected to each other via chords or ribbons, which shows the relationship strength between them. There can also be hover information, colors on the arcs or relationships (ribbons) etc.

Chord diagrams in D3 are one of the more difficult chart types to produce from raw data.  Fundamentally, you have to construct both a matrix and a mapping that explains how the matrix is composed and contains other meta data in order to render a complete diagram with appropriate coloring and labels.

Chord diagrams in D3 present information using two sets of the SVG elements: (1) a set of arcs around the outside of the diagram and (2) a set of chords or ribbons on the inside.  The arcs are referred to as “groups” in the D3 code and their arc length (how much of the circle they occupy) is determined by aggregating one complete row of the matrix.  Thus, the circumference of the circle represents the sum of the entire matrix (in whatever units it is expressed in) and is sub-divided into arcs for each of its rows.  The chords on the other hand show relations between elements in the matrix.  Their width on either end is determined by a location in the matrix.

A chord always has a “source” and a “target” and the way those are interpreted depends on your data.

Chord Diagram D3 Jaspersoft

Integration with JasperServer:

The data used for the chord diagram can be fetched from any database. The data fetched is stored in a variable and then accessed in the HTML component using the same variable, which makes the report dynamic instead of static. A few parameters can also be added to the report and used in the query and/or the HTML component.

We require a matrix for creating a chord diagram, so the data can be stored in a variable of string type; when used in the script, it is first converted to an array of arrays and then used in the chord diagram.
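The string-to-matrix conversion can be sketched like this (the variable name and values are illustrative, not from the actual report):

```javascript
// The report variable arrives in the page as a string; JSON.parse turns it
// into the array of arrays (matrix) that D3's chord layout expects.
var arrStr = "[[0, 4000], [4000, 0]]"; // e.g. printed from a report variable
var matrix = JSON.parse(arrStr);
// matrix can now be passed to d3.layout.chord().matrix(matrix)
```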


Any variable/parameter can be accessed as shown below:

"<script> var arr ="+$V{variable1}+" </script>"

Parameter in query:

Select * from table_name

where date between $P{parameter1}  and $P{parameter2}


The sample code of static chart can be found at:

The steps on how to integrate it with JasperReports Server were discussed in my previous blog (D3 Integration with Jasperserver). Please refer to the link.

Debug Mondrian Cubes

This blog will educate the reader on how to debug Mondrian cubes.

Question: How do we look at the queries that Mondrian generates while the user navigates the OLAP cube? It is really useful to look at the Mondrian log files, because they give us a lot of information about how our system behaves. We can:
o look at SQL statements and MDX queries,
o get profiling information on the queries that are executed,
o get other useful debugging information.

The following steps illustrate how to enable Mondrian debug logs by adding some properties to the Mondrian configuration file. After that, we'll configure two new log4j appenders so the desired log files are properly written to our file system.

Step 1: Enable Mondrian debug log. Mondrian has a big set of configuration settings that can be modified. In our case, to enable Mondrian debug information, open the file located in <bi-server_home>/pentaho-solutions/system/mondrian and add the following line.

Example: file location.
D:\Installation Softwares\Pentaho\biserver-ce-4.8.0-stable\biserver-ce\pentaho-solutions\system\mondrian

Debug Mondrian Cube

Step 2: Update log4j configuration
At this point we're going to modify the log4j configuration file, adding the required appenders and categories so our logging information is displayed properly. Open the log4j.xml file located in <bi-server_home>/tomcat/webapps/pentaho/WEB-INF/classes. Based on what you want to log, add one or both of the following blocks to the file. They create two new RollingFileAppenders; you're free to use the kind of appender you prefer.

Example: Location of log4j.xml file

D:\Installation Softwares\Pentaho\biserver-ce-4.8.0-stable\biserver-ce\tomcat\webapps\pentaho\WEB-INF\classes — add the following code.

NOTE: The code is already available within the file; we just need to uncomment it.

<appender name="MDXLOG" class="org.apache.log4j.RollingFileAppender">
<param name="File" value="../logs/mondrian_mdx.log"/>
<param name="Append" value="false"/>
<param name="MaxFileSize" value="500KB"/>
<param name="MaxBackupIndex" value="1"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>

<category name="mondrian.mdx">
<priority value="DEBUG"/>
<appender-ref ref="MDXLOG"/>

<!-- ========================================================= -->
<!-- Special Log File specifically for Mondrian SQL Statements -->
<!-- ========================================================= -->

<appender name="SQLLOG" class="org.apache.log4j.RollingFileAppender">
<param name="File" value="../logs/mondrian_sql.log"/>
<param name="Append" value="false"/>
<param name="MaxFileSize" value="500KB"/>
<param name="MaxBackupIndex" value="1"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>

<category name="mondrian.sql">
<priority value="DEBUG"/>
<appender-ref ref="SQLLOG"/>


Step 3: Enable the new log settings. To enable them, restart the Pentaho BI server instance.

Log files location
D:\Installation Softwares\Pentaho\biserver-ce-4.8.0-stable\biserver-ce\tomcat\logs
After restarting the server, when you run an OLAP cube you will find the following two files in the above location:

i) mondrian_mdx.log
ii) mondrian_sql.log

Debug Mondrian Cube 2

Now, enjoy analysing the SQL queries generated while performing run actions in various tools like Pentaho Analyzer and Saiku.

Get in touch with us at Helical IT Solutions

Create Organization using REST Webservice in Jasperserver / Jaspersoft

This blog will talk about how to create organizations using REST webservice in JasperServer.

STEP 1:-

Put the following jar files in the lib folder:










STEP 2:-

Create a JSP page:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

<title>Insert title here</title>

<form action="TestWebService">

<input type="submit" value="submit">

</form>

STEP 3:-

Create the client:


package com.helical.restwebservices;


import java.util.List;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.DefaultHttpClient;

public class TestWebService extends HttpServlet {

    private static final long serialVersionUID = 1L;

    public void doGet(HttpServletRequest req, HttpServletResponse res) {
        String jSessionId = jWSLogin("[email protected]", "test", null);
        addOrganization(jSessionId, res);
    }

    public String jWSLogin(String username, String password, String organization) {
        if (username == null || password == null) {
            throw new RuntimeException("Failed: Username or Password can't be null");
        }
        String j_uname = organization == null ? username
                : (organization.trim().length() == 0) ? username
                : username + "%7C" + organization;
        String j_password = password;
        String jSessionIdCookie = "";
        try {
            // The login URL was omitted in the original post.
            URL url = new URL("");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setDoOutput(true);
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            String input = "j_username=" + j_uname + "&j_password=" + j_password;
            OutputStream os = conn.getOutputStream();
            os.write(input.getBytes());
            os.flush();

            if (conn.getResponseCode() != 200) {
                throw new RuntimeException("Failed : HTTP error code : " + conn.getResponseCode());
            }

            BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));

            List<String> cookies = conn.getHeaderFields().get("Set-Cookie");
            for (String string : cookies) {
                jSessionIdCookie = string;
            }
            conn.disconnect();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return jSessionIdCookie;
    }

    public void addOrganization(String jSessionId, HttpServletResponse res) {
        HttpClient httpClient = new DefaultHttpClient();
        try {
            // The organizations REST URL was omitted in the original post.
            HttpPost request = new HttpPost("");
            request.setHeader("Cookie", jSessionId);
            StringEntity params = new StringEntity("{\"id\":\"helicaltest123\",\"alias\":\"helicaltest123\",\"parentId\":\"organizations\",\"tenantName\":\"helicaltest123\",\"tenantDesc\":\"Audit Department of Finance\",\"theme\":\"default\"}");
            request.addHeader("content-type", "application/json");
            request.setEntity(params);
            HttpResponse response = httpClient.execute(request);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            httpClient.getConnectionManager().shutdown();
        }
    }
}

We have created two methods in

  1. jWSLogin()
  2. addOrganization()

jWSLogin() is used to authenticate and return the session, so that we can reuse that session in other tasks.

addOrganization() is used to add an organization. The data we want to insert should be in JSON or XML format; in this example I have used JSON.


STEP 4:-

Create web.xml:

<?xml version="1.0" encoding="UTF-8"?>

<web-app xmlns:xsi=""
  xmlns=""
  xsi:schemaLocation=" web-app_2_5.xsd"
  id="WebApp_ID" version="2.5">

<servlet>
<servlet-name>TestWebService</servlet-name>
<servlet-class>com.helical.restwebservices.TestWebService</servlet-class>
</servlet>

<servlet-mapping>
<servlet-name>TestWebService</servlet-name>
<url-pattern>/TestWebService</url-pattern>
</servlet-mapping>

</web-app>

For any other help on Jasper Server, please get in touch with us

Helical IT Solutions

Implement Load Balancing in Jaspersoft and Pentaho and Designing a Cluster



Load balancing is often done in BI infrastructure for good scalability and availability. There are a number of ways to implement it; one method is to use a load balancer, with the BI servers (JasperReports Server or Pentaho BA Server) running behind it. One very important factor to keep in mind is that the versions of the BI servers should be the same (either both Community edition or both Enterprise edition). It is also preferable that the BI suite version number is the same on both instances and that the configurations are exactly the same.

Jaspersoft Pentaho Load Balancing

A cluster is a group of servers, and a properly designed cluster can support many users and organizations, avoid downtime, provide fail-over and load balancing, and accommodate future growth and enhancements. End users are not aware of this architecture; they access the same URL.


Jaspersoft works on a sticky session mechanism, i.e. if switching happens from one server to another, user sessions are lost.


Load Balancer: hardware or software that spreads traffic across a cluster of servers. End users access the application via the same web URL; the load balancer gauges the load on the different servers and spreads the requests accordingly, so that end users get the best speed and response time. It is recommended that the Pentaho or Jaspersoft server hardware be exactly the same, which lets the load balancer work more effectively.

Different algorithms can be used for load balancing, such as round-robin or load-based strategies.



Shared repository database:

This holds all the reports, folders, roles, users, security settings, and other resources. Operations on the repository are thread-safe, which means many operations can run simultaneously; internal locks prevent conflicting operations.


Job Scheduler:

The job scheduler is responsible for accessing and executing jobs on a predefined schedule. Job schedulers also have a locking mechanism to make sure that simultaneous triggering does not happen.

Also, all the nodes need to be clock-synced.



Session Management and Failover

Client session information is stored in memory. After a user logs in or a web services client sends a request with credentials, the session contains the user profile, such as organization and role membership, for use in enforcing permissions. For browser users, the session also stores information about the state of the web interface, for example the last folder viewed in the repository, the last search term, or the data entered in a wizard. In commercial editions with the Ad Hoc Editor and Dashboard Designer, the user session also stores the on-screen state of the report or dashboard that the browser user creates interactively.


There are two types of session management which can happen, and each of them handles the failover in a different way

–  Replicated or persistent sessions: session information is continuously stored at a shared location. Whenever a failure happens, the load balancer automatically redirects to another server node. Since the other server also has access to all the information, this happens seamlessly, without the end user experience being affected.


–  Sticky or pinned sessions: here the sessions are managed and held by individual servers, so at the time of failure the sessions are lost. After the load balancer connects to another server, a new session is initiated.



For more information about Jaspersoft BI suite or Pentaho BI suite, get in touch with us at [email protected]

PDI Kettle – Difference between CE and EE

Pentaho Data Integration (Kettle), along with Talend, is one of the most famous open source ETL tools. KETTLE stands for Kettle Extraction, Transformation, Transportation and Loading Environment.

However, it comes in two variations: the Community edition (free) and the Enterprise edition (paid). In this blog, we cover the additional features present in the paid version.


–          Access Control:  Using this, the admin can manage and control who has the right to create, modify, and delete PDI transformations and jobs.

PDI Repository


–          Version Control:  This feature allows an admin to manage different jobs and transformations, tracking their versions. It saves multiple copies and revisions of a job, and hence protects against accidental deletion; restoration is thus very easy.

PDI Repository Explorer


–          Production Control:  Scheduling and monitoring jobs on a centralized server. A person can check a job and execute it without modifying it. Scheduling and running transformations at predefined times is also possible.

PDI access controlPDI Job Scheduler


–          Integrated Security:  The presence of this component in PDI EE helps in recognizing users and roles from external corporate systems (like the Spring Security framework).

–          Content Management Repository:  This component is also present only in the PDI EE version; it stores multiple versions of content and applies rules to content access (Apache Jackrabbit).

–          Integrated Scheduling:  Executing jobs on a centralized server at predetermined intervals (Quartz) is taken care of by the integrated scheduling component present in the EE version.



Security Feature necessity:

–          In case there are multiple users with different access rights (business, team leader, developer, operations, etc.), the security feature helps restrict access to transformations. This user-level security can be implemented via LDAP or CAS.


Repository Feature

–          If multiple ETL developers are working on the project, it is really important to have a repository, so that every developer has his own folder. Other developers can have read access to each other's folders, meaning they can execute but cannot modify.

–          The repository also helps in maintaining version information and copies.

–          In the CE version, the ETL copy has to be saved separately, since it doesn't have the repository feature.


Data Integration Server Feature

–          This helps in remote deployment and monitoring of transformations.

–          You don't need to be logged in to execute the jobs.

The Data Integration server also includes a scheduler that lets you set up recurring schedules. A simple GUI allows you to define things like start date, end date, logging level, frequency, and when to repeat job execution.

Please get in touch for more information at [email protected]