Merge Join Vs Stream lookup in Pentaho DI

Posted on April 15, 2016 by Nikhilesh, in ETL

Merge Join:

The Merge Join step joins two data sets coming from two input steps, for example two Table Input steps. The following join types are available in this step:

FULL OUTER: all rows from both sources will be included in the result, with empty values for non-matching keys in both data streams
LEFT OUTER: all rows from the first source will be in the result, with empty values for non-matching keys in the second data stream
RIGHT OUTER: all rows from the second source will be in the result, with empty values for non-matching keys in the first data stream
INNER JOIN: only rows having the same key in both sources will be included in the result
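
To make the four join types concrete, here is a minimal Python sketch of which rows each type keeps. It is not Pentaho code: the function names `join_keys` and `join_rows` and the sample data are hypothetical, each key is assumed to occur at most once per source, and `None` stands in for an empty value.

```python
def join_keys(first, second, join_type):
    """Return the sorted keys that appear in the result for a join type."""
    first_keys, second_keys = set(first), set(second)
    if join_type == "INNER":
        keys = first_keys & second_keys   # keys present in both sources
    elif join_type == "LEFT OUTER":
        keys = first_keys                 # every key of the first source
    elif join_type == "RIGHT OUTER":
        keys = second_keys                # every key of the second source
    elif join_type == "FULL OUTER":
        keys = first_keys | second_keys   # every key of either source
    else:
        raise ValueError(f"unknown join type: {join_type}")
    return sorted(keys)


def join_rows(first, second, join_type):
    """Join two dicts keyed by the join key; None marks an empty value."""
    return [(key, first.get(key), second.get(key))
            for key in join_keys(first, second, join_type)]


# Hypothetical sample data: key -> value
customers = {1: "Asha", 2: "Ravi", 3: "Meena"}   # first source
orders = {2: "laptop", 3: "phone", 4: "tablet"}  # second source

for join_type in ("INNER", "LEFT OUTER", "RIGHT OUTER", "FULL OUTER"):
    print(join_type, join_rows(customers, orders, join_type))
```

Key 1 exists only in the first source and key 4 only in the second, so they show up with an empty value on the other side only in the outer joins that include their source.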

Note: This step expects the incoming rows of both streams to be sorted on the specified key fields. Sorting the streams with the Sort step beforehand works fine.
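
The reason for the sorting requirement is easier to see in code. The following is a rough Python sketch of a generic sort-merge inner join, not the actual Kettle implementation; the function name `merge_join_inner` and the sample rows are made up for illustration. Each input is read once from top to bottom, which only works when equal keys line up in the same order on both sides.

```python
def merge_join_inner(left, right):
    """Inner-join two lists of (key, value) tuples that are sorted by key."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1          # left key has no partner yet, advance left
        elif lkey > rkey:
            j += 1          # right key has no partner yet, advance right
        else:
            # Find the end of the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # Pair every left row of this key with every right row of it.
            while i < len(left) and left[i][0] == lkey:
                for k in range(j, j_end):
                    result.append((lkey, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return result


sorted_left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
sorted_right = [(2, "x"), (3, "y"), (4, "z")]
print(merge_join_inner(sorted_left, sorted_right))

# Unsorted input does not raise an error; it silently loses matches.
unsorted_left = [(2, "b"), (1, "a")]
other_right = [(1, "x"), (2, "y")]
print(merge_join_inner(unsorted_left, other_right))          # key 1 is missed
print(merge_join_inner(sorted(unsorted_left), other_right))  # both keys match
```

In this sketch the unsorted call returns only the row for key 2, because the right-hand pointer has already moved past key 1 by the time the left side reaches it.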

Stream Lookup:

The Stream lookup step type allows you to look up data using information coming from other steps in the transformation. The data coming from the Source step is first read into memory and is then used to look up data from the main stream.
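
As a rough illustration of this behaviour, the Python sketch below loads the whole lookup stream into an in-memory dictionary and then enriches the main stream row by row. It is a simplified model, not Kettle's code: the function name `stream_lookup`, the field names and the sample rows are hypothetical, and letting a later lookup row overwrite an earlier one with the same key is an assumption of this sketch.

```python
def stream_lookup(main_rows, lookup_rows, key_field, value_field, default=None):
    """Add value_field to each main row by looking up key_field in memory."""
    # Phase 1: read the complete lookup stream into memory.
    table = {}
    for row in lookup_rows:
        table[row[key_field]] = row[value_field]  # assumption: last row wins

    # Phase 2: stream the main rows and enrich each one.
    for row in main_rows:
        enriched = dict(row)  # copy so the input row is not modified
        enriched[value_field] = table.get(row[key_field], default)
        yield enriched


# Hypothetical sample data; neither stream is sorted.
sales = [
    {"country_code": "IN", "amount": 10},
    {"country_code": "XX", "amount": 5},
    {"country_code": "US", "amount": 7},
]
countries = [
    {"country_code": "US", "country_name": "United States"},
    {"country_code": "IN", "country_name": "India"},
]

for row in stream_lookup(sales, countries, "country_code", "country_name"):
    print(row)
```

Unlike the merge join sketch, neither input has to be sorted here, but the whole lookup stream must fit in memory, which is what the options below try to make cheaper.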

Preserve memory: Encodes rows of data to preserve memory while sorting. (Technical background: Kettle will store the lookup data as raw bytes in a custom storage object that uses a hashcode of the bytes as the key. More CPU cost related to calculating the hashcode, less memory needed.)

Key and value are exactly one integer field: Preserves memory while executing a sort. Note: Works only when “Preserve memory” is checked. Cannot be combined with the “Use sorted list” option.
(Technical background: The lookup data is stored in a custom storage object that is similar to the byte array hashmap, but it doesn’t have to convert the data to raw bytes. It just takes a hashcode of the long.)

Use sorted list: Enable this option to store values in a sorted list; this provides better memory usage when working with data sets containing wide rows. Note: Works only when “Preserve memory” is checked. Cannot be combined with the “Key and value are exactly one integer field” option. (Technical background: the lookup data is put into a tuple and stored in a sorted list. Lookups are done via a binary search.)
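
The idea behind the sorted list option can be sketched in a few lines of Python using the standard `bisect` module. This only illustrates the general technique of searching a sorted list of tuples; the class name `SortedListLookup` and the sample data are hypothetical and do not reflect Kettle's internal classes.

```python
import bisect


class SortedListLookup:
    """Store (key, value) tuples in a sorted list and look them up by key."""

    def __init__(self, pairs):
        # Sort once up front; every lookup afterwards is a binary search.
        self._entries = sorted(pairs, key=lambda pair: pair[0])
        self._keys = [key for key, _ in self._entries]

    def get(self, key, default=None):
        # bisect_left returns the first position where key could be inserted.
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return self._entries[pos][1]
        return default


lookup = SortedListLookup([(30, "c"), (10, "a"), (20, "b")])
print(lookup.get(20))
print(lookup.get(25))
```

A sorted list avoids the per-entry overhead of hash buckets, and each lookup costs a number of comparisons that grows with the logarithm of the number of entries rather than the roughly constant cost of a hash lookup.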

Thank you

Lalitha
