Passing Context Parameters from Command Line to Talend Job

Sometimes we need to pass context parameter values from the command line while executing a Talend job through the .bat or .sh file that gets generated when we export the job. We can use the "--context_param" option on the command line to pass context parameter values while executing the job.

Here is a simple job for passing the context parameters from commandline.

Place a tJava component on the design workspace and, under the Component tab, write code to print the context parameter values as shown in the screenshot below:

[Screenshot: tJava code printing the context parameter values]

Declare the context parameters and assign default values under the Contexts tab for this job.

Here I have declared two context parameters, param1 and param2, and assigned default values as shown in the screenshot below:

[Screenshot: context parameters param1 and param2 with default values]

Now, if we execute this job from the Studio, it displays these default values on the console as shown in the screenshot below.

[Screenshot: console output showing the default values]

If we execute this job from the command line without providing the context parameters, it displays the default values as follows:

[Screenshot: command-line output showing the default values]

Now execute the job, providing the context parameter values on the command line (e.g. --context_param param1=a1). It will display these values instead of the default values, as shown in the output below.

Here I passed 'a1' and 'a2' as the values for the param1 and param2 context parameters respectively.

[Screenshot: command-line output showing the passed values a1 and a2]
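Conceptually, the generated launcher forwards these arguments to the job's JVM, which overrides the declared defaults with any values supplied. Here is a minimal standalone sketch of that parsing (illustrative only; this is not Talend's actual generated code, and the default values are assumptions):

```java
import java.util.HashMap;
import java.util.Map;

public class ContextParamDemo {
    // Parse "--context_param name=value" pairs from the command line,
    // overriding the declared defaults.
    static Map<String, String> parseContextParams(String[] args) {
        Map<String, String> ctx = new HashMap<>();
        // Defaults, as declared under the job's Contexts tab (assumed values)
        ctx.put("param1", "default1");
        ctx.put("param2", "default2");
        for (int i = 0; i < args.length - 1; i++) {
            if ("--context_param".equals(args[i])) {
                String[] kv = args[++i].split("=", 2);
                if (kv.length == 2) {
                    ctx.put(kv[0], kv[1]);
                }
            }
        }
        return ctx;
    }

    public static void main(String[] args) {
        Map<String, String> ctx = parseContextParams(
            new String[] {"--context_param", "param1=a1",
                          "--context_param", "param2=a2"});
        System.out.println("param1: " + ctx.get("param1")); // a1
        System.out.println("param2: " + ctx.get("param2")); // a2
    }
}
```

Any parameter not supplied on the command line keeps its default, which matches the behaviour seen in the screenshots above.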

Thanks and Regards,

Lalitha

Creating Custom Component in Talend Open Studio

What is a component?

Graphically, it is an icon that you drag from the palette to the workspace. Technically, it is a folder with the same name as the component (component name = component folder name), so we will call this the component folder.

Where is this component folder located when you install Talend?

It is in: TOS_DI-Win32-r117820-<VERSIONNAME> \plugins\org.talend.designer.components.localprovider_<VERSIONNAME>\components

To start, we need the Component Designer, but the first task before that is to create a "Custom Components" folder somewhere handy on your disk.

Now we need to set this custom component folder path within Talend so that the Component Designer gets enabled.

[Screenshot: custom component folder setting in Talend]

Now you should be able to see two different perspectives: 1) Data Integration and 2) Component Designer. You can switch between them with the small table-like icon in the top-right part whenever you need. They are also accessible from "Window -> Perspective".

Select the Component Designer perspective; a pane should appear with a list of projects. Don't worry if it is empty, that's because we have not developed any component yet. You should, however, be able to see a "COMPONENT_PROJECT".

[Screenshot: Component Designer perspective]

First component:

Our first component's name is tTutorial1; that's the first piece of information we need, because we will create a directory with the same name in our Custom Components folder, directly in the file system. Now hit F5 or choose "refresh" from the menu to see the tTutorial1 component.

To install a custom component from the Component Designer, "push components to palette" is used. The Studio will attempt to install ALL the components visible under the COMPONENT_PROJECT root node.

Each (Java) component must have an XML descriptor file stored in the component folder. This file must have the same name as the component, plus a _java suffix and a .xml extension.

In our case we will create tTutorial1_java.xml. This file is the key element of a component: it lets the Studio understand what the component is supposed to do and how it can interact with other components.

The structure of the xml descriptor file is defined and some parts are mandatory, we will start with them.

<?xml version="1.0" encoding="UTF-8"?>
<COMPONENT>
  <HEADER AUTHOR="Dasari"
          COMPATIBILITY="ALL"
          PLATEFORM="ALL"
          SERIAL=""
          STARTABLE="true"
          STATUS="BETA"
          VERSION="0.1">
    <SIGNATURE/>
  </HEADER>
  <FAMILIES>
    <FAMILY>Tutorial</FAMILY>
  </FAMILIES>
  <DOCUMENTATION>
    <URL></URL>
  </DOCUMENTATION>
  <CONNECTORS>
    <CONNECTOR CTYPE="FLOW"/>
  </CONNECTORS>
  <PARAMETERS>
  </PARAMETERS>
  <CODEGENERATION/>
  <RETURNS>
  </RETURNS>
</COMPONENT>

Copy & paste the above code into a text file called tTutorial1_java.xml in your Custom Components/tTutorial1 folder.

You need at least one more file to make the component visible in the palette: the message properties file. It is a Java .properties file and it is mandatory. Initially it will look something like this:

#
#Wed July 30 12:03:50 IST 2014
HELP=org.talend.help.tTutorial1
LONG_NAME=Tutorial component 1

Copy & paste the above code into another text file in the same directory, that you will save as tTutorial1_messages.properties, and now push the components to the palette.
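Since this is a standard java.util.Properties file, its keys can be read like any other properties file. A small illustrative sketch (not the Studio's actual loading code) of how the key/value format resolves:

```java
import java.io.StringReader;
import java.util.Properties;

public class MessagesDemo {
    public static void main(String[] args) throws Exception {
        // Same key/value format as tTutorial1_messages.properties
        String messages = "HELP=org.talend.help.tTutorial1\n"
                        + "LONG_NAME=Tutorial component 1\n";
        Properties props = new Properties();
        props.load(new StringReader(messages));
        // LONG_NAME is the label the Studio shows for the component
        System.out.println(props.getProperty("LONG_NAME")); // Tutorial component 1
    }
}
```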

In the bottom-right part of the application window you should see a message saying "Generation engine initialization in progress"; get used to it, because you are going to see it quite a few times.

Switch to the "Data Integration" perspective and find the component in the palette. If you cannot see it there, simply create an empty project and a new job.

[Screenshot: "Problem Occurred" error dialog]

Now it will throw a NullPointerException. So, what's causing the null pointer error? What's missing? That should be visible directly in the palette: your component is missing an icon (which, for a graphical tool, is quite important).

 

Download the attached picture and save it as tTutorial1_icon32.png in the component folder. Now refresh again and push the component to the palette.

Finally, this time your component should be visible in the palette.

[Screenshot: tTutorial1 component in the palette]

To be successfully installed in the palette, a component needs at minimum these files:
1) The XML descriptor file
2) The messages properties file
3) An icon file

Reference:
[1] http://www.powerupbi.com/talend/componentCreation_3.html

Multi valued column processing using Talend Open Studio

Multi valued column processing using Talend

Introduction

Nowadays, source data files exported from different database structures often contain multiple values in a single column. These values need to be extracted and processed independently, and need to be tied to the key column of the corresponding row. This specific scenario, common in data migration activities, is hard to handle without an ETL tool. Talend provides a component called tNormalize, which normalizes the input flow following the SQL standard. To understand the business use case and its importance, let us take the example of patient records in a hospital.

Sample Data

Following is the sample data extract in which the last column, Percentile, holds value pairs separated by the "|" pipe character. These values (Percentile_1~Percentile_2|Percentile_1~Percentile_2|Percentile_1~Percentile_2) need to be extracted to another table with respect to Plan ID. This combination can be repeated any number of times.

Measurement Year,Payer,Plan ID,Plan Name,Domain,Sub Domain,Measure Description,Measure,Method of Data Collection,Eligible Population,Denominator,Numerator,Rate,Level of Significance,Percentile

2012,Child Health Plus, 1080382,Statewide Average,Child and Adolescent Health,Caring for Children and Adolescents with Illnesses,Follow-Up Care for Children Prescribed ADHD Medication:Continuation Phase,ADHD-Continuation,Administrative Review,633,633,355,56,,63~63|53~47

2012,Child Health Plus,1080383,MVP,Child and Adolescent Health,Caring for Children and Adolescents with Illnesses,Follow-Up Care for Children Prescribed ADHD Medication:Continuation Phase,ADHD-Continuation,Administrative Review,,,,,Sample size too small to report,~|~

2012,Child Health Plus,1090384,CDPHP,Child and Adolescent Health,Caring for Children and Adolescents with Illnesses,Follow-Up Care for Children Prescribed ADHD Medication:Continuation Phase,ADHD-Continuation,Administrative Review,54,54,32,59,,61~61|52~48

Job Design

In the above data, Plan ID is a unique primary key alongside a couple of other columns. When extracting to another table, we need all the percentile values for each Plan ID. This problem can be solved with the tNormalize component. Following is the job design to extract Plan ID and the multi-valued Percentile column:

Multi valued column Talend job

Pipe separator configuration

The separator needs to be escaped as "\\|" for the pipe (and for similar special characters), because the separator is interpreted as a regular expression. After executing the job, you will get records as follows.
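What the normalization does here can be sketched in plain Java: split the Percentile column on the escaped "|" first, then split each pair on "~" (illustrative only; the job itself uses tNormalize, and the sample values are taken from the data above):

```java
import java.util.ArrayList;
import java.util.List;

public class PercentileSplitDemo {
    // Turn one (planId, percentile) row into one row per "|"-separated pair.
    static List<String[]> normalize(String planId, String percentile) {
        List<String[]> rows = new ArrayList<>();
        // "|" is a regex metacharacter, so it must be escaped as "\\|"
        for (String pair : percentile.split("\\|", -1)) {
            String[] p = pair.split("~", -1);
            String p1 = p.length > 0 ? p[0] : "";
            String p2 = p.length > 1 ? p[1] : "";
            rows.add(new String[] {planId, p1, p2});
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : normalize("1080382", "63~63|53~47")) {
            System.out.println(String.join(",", row));
        }
        // 1080382,63,63
        // 1080382,53,47
    }
}
```

Note that splitting with limit -1 keeps the empty fields, so the "~|~" row in the sample still produces two (empty) output rows for its Plan ID.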

Expected output

PlanID, Percentile_1,Percentile_2

1080382,63,63

1080382,53,47

1080383,,

1080383,,

1090384,61,61

1090384,52,48

– Vaibhav Waghmare

Make Batch count for Incremental loading in Talend (TOS_5.3)

This blog will talk about how to make Batch count for Incremental loading in Talend (TOS_5.3).

First of all, we have the t_source and t_target tables.
Both tables (t_source, t_target) contain data like this:

[Screenshot: t_source and t_target data before executing the job]

Objective

INSERT into t_source

We inserted one record into t_source:
insert into t_source (id, name, city, s_date) values (111, 'vic', 'del', '2014-03-01 01:02:00');
UPDATE t_source
We updated a record in t_source:
update t_source set name = 'don', s_date = '2014-02-01 01:02:00' where id = 109;
DELETE from t_source
We deleted a record from t_source:
delete from t_source where id = 108;

Finally, we have these records in the t_source and t_target tables:

[Screenshot: t_source and t_target after the changes]

We want to make a batch count in Talend (TOS).

We created one job…

[Screenshot: job design]

Details of Job

Subjob (1)

We fetch max(id) from the t_target table and store it in a context variable:

context.last_id = input_row.id;

Subjob (2)

We fetch min(id) and max(id) from t_source and store them in context variables:

context.sr_max_id = input_row.max_id;

context.sr_min_id = input_row.min_id;

Subjob (3)

We select from t_source:

select * from t_source where id > "+context.last_id+" order by id

and insert into the t_target table with id as the primary key.

Subjob(4)

We count the rows in t_source between the min and max primary keys:

select count(*) as batch_count from t_source where id between "+context.sr_min_id+" and "+context.sr_max_id+"

and store the result in a context variable in order to calculate the batch count.

The batch count is derived by dividing this count by the batch size (context.MT_COUNT = 5); context.max_count and context.min_count are 0 before the job executes.

context.count = input_row.count;

System.out.println("Count of primary key from source " + context.UPLOAD_FILE_NAME + " Table : " + context.count);

int x = (context.count / context.MT_COUNT) + 3;

context.batch_count = x;

System.out.println("Batch Count : " + context.batch_count);

context.max_count = context.MT_COUNT;

context.min_count = context.sr_min_id;

context.max_count = context.sr_min_id + context.max_count;

SubJob (5)

We iterate context.batch_count times, running another job (test123) on each iteration.

1. Test123 Job

[Screenshot: test123 job design]

a. SubJob (5.1)

We are printing Batch count min_count to max_count

System.out.println("Batch " + Numeric.sequence("s1", 1, 1) + ": Count of " + context.min_count + " to " + context.max_count);

b. SubJob (5.2)

We select from t_source between the primary keys:

select * from t_source where id >= "+context.min_count+" and id <= "+context.max_count+" order by id

and collect the data into a buffer output.

c. SubJob (5.3)

We compare the buffer input (t_source) and the t_target table with an inner join in tMap. If there is any reject output, those rows are updated into t_target.

t_target SQL query: select * from t_target where id >= "+context.min_count+" and id <= "+context.max_count+" order by id

d. SubJob (5.4)

We compare t_target and the buffer input (t_source) with a left outer join in tMap. We filter on t_source.id == 0, and if there is any output, those rows are deleted from t_target.

t_target SQL query: select * from t_target where id >= "+context.min_count+" and id <= "+context.max_count+" order by id

And we have a tJavaRow (min and max of the primary key) in which:

context.min_count = input_row.id;

context.max_count = context.min_count + context.MT_COUNT;
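The batching arithmetic of subjobs 4 and 5 can be sketched as plain Java (the row count and ids are assumed values, and the context variables are simulated as local variables):

```java
public class BatchCountDemo {
    public static void main(String[] args) {
        int MT_COUNT = 5;   // batch size (context.MT_COUNT)
        int count = 12;     // assumed row count between min and max id
        int srMinId = 100;  // assumed min(id) from t_source

        // Same formula as subjob 4
        int batchCount = (count / MT_COUNT) + 3;
        int minCount = srMinId;
        int maxCount = srMinId + MT_COUNT;
        System.out.println("Batch Count : " + batchCount);

        // Subjob 5: each iteration processes one id window, then the
        // tJavaRow advances the window (as in subjob 5.4)
        for (int b = 1; b <= batchCount; b++) {
            System.out.println("Batch " + b + ": Count of "
                               + minCount + " to " + maxCount);
            minCount = maxCount;  // stand-in for input_row.id
            maxCount = minCount + MT_COUNT;
        }
    }
}
```

With these assumed values the formula yields (12 / 5) + 3 = 5 batches, each covering a window of MT_COUNT ids; the extra batches simply select empty windows past max(id).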

Results

We executed the job with the context variable MT_COUNT defined as 5.

[Screenshot: job execution output]

 

Finally, we have these records in t_source and t_target:

[Screenshot: t_source and t_target after execution]

Thanks & regards

Vishwanth Suraparaju

Senior ETL Developer