What is a lookup? How is it different from the Merge Join component in ETL?


A lookup is basically accessing data from a relational tables which are not sources in the mapping table. In SQL Terms, a lookup is a subquery of the main query. Lookups are used basically either for filtering data or for Joins.

In PDI, a lookup is used to get a related data, in case of joins and also it is used to implement SCD. Lookups are basically used for reference, based on the lookup condition. when u want some data based on target data ,will take lookup on that particular table and retrieve the corresponding fields from that table. There are various kinds of lookups that we can use in PDI, namely:

1. Stream Lookup

2. Database lookup

3. Insert/Update(Implementing SCD)

4. Update(Implementing SCD)

5. Dimensional Lookup/update(Implementing SCD)


Difference between Lookups and Merge join component in PDI?

In lookups, basically there are two types of joins namely inner join and left join. Meanwhile in the Merge Join component there are inner join,left outer,right outer and full outer join.

Stream lookup vs Merge Join:

In Stream lookup, it only performs the LEFT JOIN. However, since the whole data is cached first into memory, it supports one possible lookup value per key value.

In Merge Join, it performs the same way the Join mechanism works in SQL. However, the input should be sorted according to the key values in which the join is going to be performed. Without sorting, the merge join step doesnt work properly.

Database Lookup vs Merge Join:

The database lookup component in PDI can perform both the left join and also the inner join. If we set the caching as true, then it works like the Stream Lookup where it supports one possible lookup value per key value. If we set the cache as false, then it performs the same join mechanism as the join mechanism in SQL.

The main advantage with lookup is it consumes less memory as compared to Merge Join since external sort is not required in lookups whereas we need to sort the input before performing Merge Join. However, the Merge Join is faster than lookups while loading huge amount of data but it consumes more memory as compared to lookups.




Leave a Reply