Analytic Query in Pentaho Data Integration [LAG & LEAD]

The could be a time we get a requirement as follows:

Fetch Every Purchases Sum for each Customer in a year, having the following columns

customerid, customername, productcost, purchasemonth

Now this looks easy, but the tricky part would be fetching the


he/she made a purchase

If your using a PostgresDB then this will be easy because there are various windows functions that will help you in acquiring your result with ease

But if your DB doesn’t support Windows Functions then it could be a problem.

So now Pentaho will help you reach your result in very less hustle with the help of the Analytic Query Component

Now lets build our query for one customer

sf97.customer_id as CustomerID,
co.fullname as CustomerName,
SUM(sf97.store_sales) as ProductCost,
tbd.the_month as PurchaseMonth
from sales_fact_1997 sf97
JOIN time_by_day tbd on tbd.time_id = sf97.time_id
JOIN customer co on co.customer_id = sf97.customer_id
WHERE sf97.customer_id = 4873

ORDER BY sf97.customer_id,tbd.month_of_year

Now we Open Up our PDI,  Create a new Transformation,

  • Select Table Input as your Source Table
  • Next Select the Analytic Query


  • Open up Analytic query
  • Now from our requirement we want to fetch the PreviousPurchaseMonth he/she made a purchase for each customer (What month he/she previously made a purchase)
    so the unique item here is the customerID
  • Then in the Group Field Select the customerid
  • In the Analytic Functions
    Enter the name of your new field
    Subject will be the purchasemonth Column because our new column is dependent on that

Type: This can be Lag (Behind) or Lead (Forward)
Click ok

Now right click on Analytic Query Component and Select Preview -> Quick Lunch


There you go.

Note make Sure in your query you have and order by based on your requirement.

Ehizogie Sohail Izebhijie

Leave a Reply