MapReduce in Hadoop :
MapReduce is designed for processing large volumes of data in parallel.
It is an execution model in the Hadoop framework that is divided into two separate phases :
- Mapper phase
- Reducer phase
Mapper Phase : During this phase, the input data is divided into splits, and each split is analyzed by a map task running in parallel across the Hadoop cluster. Each map task extracts the required output key and output value from its input and writes them to local disk.
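The map step can be sketched outside Hadoop. Below is a minimal Python simulation of a word-count mapper; the function name and records are illustrative, not part of the Hadoop API:

```python
def word_count_mapper(record):
    """Map task: take one input record (a line of text) and emit
    intermediate (key, value) pairs -- here, (word, 1) per word."""
    for word in record.split():
        yield (word.lower(), 1)

# Each map task processes one input split in parallel; here we simply
# run the mapper over a couple of records sequentially.
records = ["Hadoop runs map tasks", "map tasks run in parallel"]
intermediate = [pair for rec in records for pair in word_count_mapper(rec)]
```

The intermediate pairs produced here are what a real map task would write to local disk for the reducers to fetch.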
Reducer Phase : It has two responsibilities :
- Grouping the data based on key
- Aggregating the values for each key
Once the reducer output is written, the mapper's intermediate output is immediately deleted.
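The two reducer responsibilities can be sketched in plain Python; this is a simulation of grouping and aggregation, not the Hadoop API:

```python
from collections import defaultdict

def shuffle(intermediate):
    """Group the mapper's (key, value) pairs by key -- the first
    reducer responsibility."""
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    return groups

def sum_reducer(key, values):
    """Aggregate the grouped values for one key -- the second
    reducer responsibility (here, a simple sum)."""
    return (key, sum(values))

intermediate = [("map", 1), ("tasks", 1), ("map", 1)]
grouped = shuffle(intermediate)   # {'map': [1, 1], 'tasks': [1]}
result = dict(sum_reducer(k, v) for k, v in grouped.items())
```

In Hadoop the grouping (shuffle and sort) is performed by the framework between the map and reduce phases; only the aggregation logic is user code.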
If a job has no requirement for grouping or aggregating functionality, we can suspend the reducer (in Hadoop, by setting the number of reduce tasks to zero with job.setNumReduceTasks(0)). In that situation, the mapper output becomes the final, permanent output.
For both the mapper and the reducer, input and output must be in key/value pairs.
Identity Mapper : It is like the identity function in mathematics.
The Identity Mapper takes each input key/value pair and emits it unchanged, without any processing.
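The behaviour is easy to show; a plain-Python illustration of the identity mapper (not Hadoop's own IdentityMapper class):

```python
def identity_mapper(key, value):
    """Emit the input key/value pair unchanged -- no processing,
    like the identity function in mathematics."""
    yield (key, value)

# Keys here mimic Hadoop text input, where the key is the byte
# offset of the line and the value is the line itself.
pairs = [(0, "first line"), (11, "second line")]
out = [p for k, v in pairs for p in identity_mapper(k, v)]
```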
Identity Reducer :
With the Identity Reducer, the reduce step still takes place, so the related sorting and shuffling are performed, but there is no aggregation.
So if we want the data coming from the map phase to be sorted, do not need any grouping or aggregation, and are fine with the output being split across multiple reducers, we can use the Identity Reducer.
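The effect can be simulated in Python: the framework's sort still happens, but each value passes through the reducer untouched (an illustration, not Hadoop's IdentityReducer class itself):

```python
def identity_reducer(key, values):
    """Emit each (key, value) pair unchanged -- no aggregation."""
    for v in values:
        yield (key, v)

# The framework sorts intermediate data by key before the reduce
# step; we simulate that sort here.
intermediate = [("b", 2), ("a", 1), ("a", 3)]
sorted_pairs = sorted(intermediate, key=lambda kv: kv[0])
output = [p for k, v in sorted_pairs for p in identity_reducer(k, [v])]
```

The output is sorted by key but otherwise identical to the map output, which is exactly the use case described above.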
Combiner in MapReduce :
The combiner is used as an optimization for a MapReduce job. The combiner function runs on the output of the map phase and acts as a filtering or aggregating step to lessen the number of intermediate keys being passed to the reducer. In many cases the reducer class is also set as the combiner class (this is safe when the reduce function is commutative and associative, as with sums or counts). The output of the combiner is the intermediate data passed to the reducer, whereas the output of the reducer is written to the output file on disk.
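The combiner's effect can be sketched as a local aggregation on each map task's output before the shuffle; this is a Python simulation with illustrative names, not the Hadoop API:

```python
from collections import defaultdict

def combine(map_output):
    """Run a reducer-style aggregation locally on one map task's
    output, shrinking the number of intermediate pairs that must
    be sent over the network to the reducers."""
    partial = defaultdict(int)
    for key, value in map_output:
        partial[key] += value
    return list(partial.items())

# One map task emitted four pairs; after combining, only two
# intermediate pairs are passed on to the reduce phase.
map_output = [("map", 1), ("map", 1), ("tasks", 1), ("map", 1)]
combined = combine(map_output)
```

Because the reducer later sums these partial counts again, the final result is unchanged; only the volume of intermediate data shrinks.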