
Scheduling a Talend Job on Both Windows and Linux OS

To schedule a Talend Job, we first have to export it as a build job, which generates the runnable files for the required OS. All the child jobs in the project are also compiled into JAR files.

Right-click on the parent/main job of your project and select the Build Job option.

A window will open prompting for build options; the build will create a zip file.

Choose the destination path, and set the build type to Autonomous. Also enable the Extract the zip file option; it will unzip the generated zip file into the same path you have given.

Enable Shell Launcher and Context scripts. For Shell Launcher, choose All as the value to generate launchers for both Windows and Linux: a batch (.bat) file for Windows and a shell (.sh) script for Linux.

 

Schedule Talend Job on Windows OS:

On Windows OS, we can schedule the Talend job by using Windows Task Scheduler.

On the right side, under Actions, click Create Basic Task. Then select when the task should run, as shown in the image below:

 

You can also select the exact time to run:

 

 

Now select the ‘Start a Program’ option under Action.

 

 

Now browse to the batch file; you can also add any context parameters if needed, as shown below.

[Screenshot: Start a Program – selecting the batch file]
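For example, with the generated batch file selected as the program, context parameters can be supplied in the “Add arguments” field using Talend's --context_param syntax (the parameter names below are hypothetical):

--context_param db_host=localhost --context_param out_dir=C:\data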

 

 

Now click Finish. The task will be scheduled for the selected time.

[Screenshot: Finish – scheduled task summary]

Schedule Talend Job on Linux OS:

On Linux, we use cron to schedule tasks.

Type the below command to edit the crontab file:

crontab -e

Then add an entry for the job that needs to be scheduled, as shown below.

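A minimal sketch of a crontab entry, assuming the build was extracted to /opt/talend/MyJob (the path, job name, and schedule are illustrative); this runs the job every day at 2 AM and appends its output to a log file:

0 2 * * * /opt/talend/MyJob/MyJob_run.sh >> /var/log/talend/MyJob.log 2>&1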

Hope this post will be a useful reference when scheduling Talend Jobs.


Creating a Report in iReport Using a Linear Gauge Component

This blog will teach the reader how to create a report in iReport using a linear gauge component and publish it on the JasperReports Server.

Purpose: to compare the avg(salary) of male and female employees in an organization

Database server: PostgreSQL

Database name: foodmart

Table name: employee

Below are the steps:

#1: Create two datasets named “MaleSalary” and “FemaleSalary” for calculating the avg(salary) for male and female employees respectively:

Dataset 1 (MaleSalary): select gender, avg(salary) from employee where gender like 'M' group by gender

Dataset 2 (FemaleSalary): select gender, avg(salary) from employee where gender like 'F' group by gender

#2: Drag and drop two “linear gauge” widgets from the WidgetPro palette in iReport.

#3: Assign the above datasets: MaleSalary to widget 1 and FemaleSalary to widget 2.

#4: Right-click on the widget chart -> Edit Widget Properties.

[Screenshot: Edit Widget Properties dialog]

Here, each tab in the properties dialog lets us customize the widget visualization.

Example: Suppose we need to add a % symbol after the widget pointer value. In that case, we need to go to the Advanced Properties of the Widget Configuration and add Property Name: number suffix and Value Expression: '%'.

[Screenshot: Widget Configuration – Advanced Properties]

Example 2: Suppose we need to add color ranges for the widget. The widget properties include a Color Range option; we only have to supply our condition.

 

#5: After publishing the report to the JasperReports Server, the report will look like below:

[Screenshot: linear gauge report on JasperReports Server]

Rupam

Helical IT Solutions

 

Dimensional Modeling Process

Dimensional Modeling

Dimensional modeling is a technique, used in data warehouse design, for conceptualizing and visualizing data models as a set of measures that are described by common aspects of the business. It is especially useful for summarizing and rearranging the data and presenting views of the data to support data analysis.

Dimensional Modeling Vocabulary

Fact

A fact table is the primary table in a dimensional model where the numerical performance measurements of the business are stored. The term fact represents a business measure that can be used in analyzing the business or business processes. The most useful facts are numeric and additive.

For example: sales amount, quantity ordered.

Dimension

Dimension tables are integral companions to a fact table. The dimension tables contain the textual descriptors of the business. Each dimension is defined by its single primary key, which serves as the basis for referential integrity with any given fact table to which it is joined.

Dimensions are the parameters over which we want to perform Online Analytical Processing (OLAP). For example, in a database for analyzing all sales of products, common dimensions could be:

  • Time
  • Location/region
  • Customers
  • Salesperson

 

Dimensional Modeling Process

Identify business process

In dimensional modeling, the best unit of analysis is the business process in which the organization has the most interest. Select the business process for which the dimensional model will be designed. Based on the selection, the requirements for the business process are gathered.

At this phase we focus on business processes, rather than on business departments, so that we can deliver consistent information more economically throughout the organization. If we establish departmentally bound dimensional models, we’ll inevitably duplicate data with different labels and terminology.

For example, we’d build a single dimensional model to handle orders data rather than building separate models for the sales and marketing departments, which both want to access orders data.

Define grain

The granularity of a fact is the level of detail at which it is recorded. If data is to be analyzed effectively, it must all be at the same level of granularity. As a general rule, data should be kept at the most detailed (atomic) level of granularity.

For example, grain definitions can include the following items:

  • A line item on a grocery receipt
  • A monthly snapshot of a bank account statement

Identify dimensions & facts

Our next step in creating a model is to identify the measures and dimensions within our requirements.

A user typically needs to evaluate, or analyze, some aspect of the organization's business. The requirements that have been collected must represent the two key elements of this analysis: what is being analyzed, and the evaluation criteria for what is being analyzed. We refer to the evaluation criteria as measures and to what is being analyzed as dimensions.

 

If we are clear about the grain, then the dimensions typically can be identified quite easily. With the choice of each dimension, we will list all the discrete, text-like attributes that will flesh out each dimension table. Examples of common dimensions include date, product, customer, transaction type, and status.

 

Facts are determined by answering the question, “What are we measuring?” Business users are keenly interested in analyzing these business process performance measures.

 

Creating a dimension table

Now that we have identified the dimensions, we next need to identify the members, hierarchies, and properties or attributes of each dimension that we need to store in our table.

Dimension Members:

A dimension contains many dimension members. A dimension member is a distinct name or identifier used to determine a data item's position. For example, all months, quarters, and years make up a time dimension, and all cities, regions, and countries make up a geography dimension.

Dimension Hierarchies:

We can arrange the members of a dimension into one or more hierarchies, and each hierarchy can have multiple levels; for example, a time dimension might have the hierarchy year → quarter → month. Not every member of a dimension has to lie on a single hierarchy structure.

Creating a fact table

Together, one set of dimensions and its associated measures make up what we call a fact. Organizing the dimensions and measures into facts is the next step. This is the process of grouping dimensions and measures together in a manner that can address the specified requirements. All candidate facts in a design must be true to the grain defined in step 2. Facts that clearly belong to a different grain must be in a separate fact table.
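As an illustrative sketch only (the table and column names below are assumptions, not from any particular source system), a date dimension and a sales fact table at order-line grain could be defined in SQL like this:

create table dim_date (
    date_key     integer primary key,  -- surrogate key
    full_date    date,
    month_name   varchar(10),
    quarter_name varchar(2),
    year_number  integer
);

create table fact_sales (
    date_key         integer references dim_date (date_key),
    product_key      integer,        -- would reference dim_product
    customer_key     integer,        -- would reference dim_customer
    quantity_ordered integer,        -- additive measure
    sales_amount     numeric(12,2)   -- additive measure
);

Each row in fact_sales is a single order line (the grain), and both measures are additive across all of its dimensions.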

 

Archana Verma

Helical IT Solutions

Getting Started with MongoDB

Installation & Startup:

Download the MongoDB installer for the Windows platform from http://www.mongodb.org/downloads and run it. This simply extracts the binaries to your Program Files.

# Create DBPATH and log locations:

Allocate a folder in your system that can be used for holding the mongo databases, and also allocate a log file.

E.g., allocate “C:\mongo\data\db” for databases and “C:\mongo\logs\mongo.log” as the log file.

# Starting the mongo database

Below are different ways of starting the mongodb:

1.    From the command prompt

Execute the mongod.exe present in the bin folder to start the database.

On the command prompt: mongod --dbpath c:\mongo\data\db

There are other options that can also be specified along with dbpath. If dbpath is not provided, mongod looks for the c:\data\db folder and gives an error if it is not found.

To shut down, press CTRL+C.

 

2.    Starting with a config file

You can create a configuration file to define settings for the MongoDB server, like the dbpath, logpath, etc. Below is a sample file:

(This is the older format; version 2.6 introduced a new YAML-based format. The older format is still supported for backward compatibility.)

#This is an example config file for MongoDB

dbpath = C:\Mongo\data\db

port = 27017

logpath = C:\Mongo\logs\mongo.log

Now you can use the below command –

C:\Program Files\MongoDB 2.6 Standard\bin>mongod --config mongo.conf

2014-04-15T10:27:18.883+0530 log file "C:\Mongo\logs\mongo.log" exists; moved to

"C:\Mongo\logs\mongo.log.2014-04-15T04-57-18".

As we haven’t specified the “logappend” option in the config file, it allocates a new log file every time you start the db. You can check the log file if you are getting errors while connecting to the db.
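To keep appending to the same log file across restarts instead, add the following line to the config file (this option is part of the same older config format):

logappend = true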

To shut down, use the command “mongod --shutdown”.

 

3.    Installing as Windows service:

Start the command prompt as administrator

You can use the below command to create the service; edit it as per your settings:

sc create MongoDB binPath= "\"C:\Program Files\MongoDB 2.6 Standard\bin\mongod.exe\" --service --config=\"C:\Program Files\MongoDB 2.6 Standard\bin\mongo.conf\"" DisplayName= "MongoDB 2.6 Standard"

Please note that this is a single command line.

You can now simply start/stop the service to start/shutdown the mongo database.
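For example, from an administrator command prompt, using the service name created above:

net start MongoDB

net stop MongoDB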

 

Using Mongo command shell:

Run mongo.exe from the \bin folder and you will see the below:

MongoDB shell version: 2.6.0

connecting to: test      //This is the Default database

Welcome to the MongoDB shell.

For interactive help, type "help".

For more comprehensive documentation, see

http://docs.mongodb.org/

Questions? Try the support group

http://groups.google.com/group/mongodb-user

 

Some basic commands to get you started

> show dbs                  // show databases

admin  (empty)
local  0.078GB

> use names               // switch to a particular database/creates one if it does not exist

switched to db names

> db.mynames.insert({name: 'shraddha', email: '[email protected]'})           // Inserting document
WriteResult({ "nInserted" : 1 })

// Note that ‘db’ points to the current database in use. Here, the collection “mynames” is automatically created when you insert a document

> show dbs

admin  (empty)
local  0.078GB
names  0.078GB

> db.mynames.find()               //query the db, select operation

{ "_id" : ObjectId("534cbfd03dfb3fbd86d8029d"), "name" : "shraddha", "email" : "[email protected]" }

// One more way of inserting:

> a={"name":"test3","email":"test3.other"}

{ "name" : "test3", "email" : "test3.other" }

> b={"name":"test4",email:"test4.other"}

{ "name" : "test4", "email" : "test4.other" }

> db.othernames.insert(a)

WriteResult({ "nInserted" : 1 })

> db.othernames.insert(b)

WriteResult({ "nInserted" : 1 })

> db.othernames.insert(c)

2014-04-15T19:40:24.798+0530 ReferenceError: c is not defined

// In all the above inserts, the “_id” field, which holds the unique key, is auto-generated.

 

> coll=db.mynames

names.mynames

> coll.find()

{ "_id" : ObjectId("534cbfd03dfb3fbd86d8029d"), "name" : "shraddha", "email" : "[email protected]" }
{ "_id" : ObjectId("534d3b89f4d4b90697c205d6"), "name" : "test1", "email" : "test1.helical" }

> coll=db.othernames

names.othernames

> coll.find()

{ "_id" : ObjectId("534d3dc3f4d4b90697c205d7"), "name" : "test3", "email" : "test3.other" }
{ "_id" : ObjectId("534d3dcdf4d4b90697c205d8"), "name" : "test4", "email" : "test4.other" }

 

> coll.find({name:{$gt:"test3"}})                  //find documents where “name” is >”test3”

{ "_id" : ObjectId("534d3dcdf4d4b90697c205d8"), "name" : "test4", "email" : "test4.other" }

> coll.find({name:"test3"})

{ "_id" : ObjectId("534d3dc3f4d4b90697c205d7"), "name" : "test3", "email" : "test3.other" }

>

> coll.find({$or:[{name:{$gt:"test3"}},{name:"test3"}]})

{ "_id" : ObjectId("534d3dc3f4d4b90697c205d7"), "name" : "test3", "email" : "test3.other" }
{ "_id" : ObjectId("534d3dcdf4d4b90697c205d8"), "name" : "test4", "email" : "test4.other" }

> coll.find({$or:[{name:{$gt:"test3"}},{name:"test0"}]})

{ "_id" : ObjectId("534d3dcdf4d4b90697c205d8"), "name" : "test4", "email" : "test4.other" }

>
 
//Example - Manually specifying the _id field (key value)
 
> coll=db.testobjs

names.testobjs

> coll.insert({_id:1,fld1:"abc",fld2:123})

WriteResult({ "nInserted" : 1 })

> coll.insert({_id:2,fld1:"cde",fld2:345})

WriteResult({ "nInserted" : 1 })

> coll.insert({_id:2,fld1:"cde",fld2:345})       //trying to insert duplicate value in _id

WriteResult({
        "nInserted" : 0,
        "writeError" : {
                "code" : 11000,
                "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: names.testobjs.$_id_  dup key: { : 2.0 }"
        }
})

> coll.find()

{ "_id" : 1, "fld1" : "abc", "fld2" : 123 }
{ "_id" : 2, "fld1" : "cde", "fld2" : 345 }

>
 

Importing a CSV file into MongoDB:

Alter the below command as per your requirement and execute:

C:\Program Files\MongoDB 2.6 Standard\bin>mongoimport --db northwind --collection orders --type csv --file C:\Shraddha\Official\MongoDB\northwind-mongo-master\orders.csv --headerline

connected to: 127.0.0.1
2014-04-17T18:24:22.603+0530 check 9 831
2014-04-17T18:24:22.604+0530 imported 830 objects

 

Options used:

--db: name of the database
--collection: name of the collection
--type: type of the input file (we can also import TSV, JSON)
--file: path of the input file
--headerline: signifies that the first line in the CSV file contains the column names

 

Shraddha Tambe

Helical IT Solutions

Logging using Talend

Introduction: In this article, we will discuss the different methods of logging in Talend Open Studio. Talend is one of the most widely used open source data integration tools in the market. Talend mainly uses three types of logging:

  1. Statistics – Job execution statistics at component and job level
  2. Error – Job level errors, warnings and exceptions
  3. Meter logging – Data flow details inside job

The best approach is to configure logging at the project level. To enable project-level logging in Talend Open Studio, go to File > Project properties and enable or disable the check boxes to start default logging at the project level. See the screenshot below.

[Screenshot: Project properties – default logging check boxes]

If you enable logging at the project level, every new job created will inherit these settings. There are more settings and options available once project-level logging is enabled. See the screenshot below.

[Screenshot: project-level logging settings and options]

You can decide whether to log the information to the Console, a File, or a Database. If you select File, Database, or both, you then need to set a few more default parameters, like:

[Screenshot: File/Database logging parameters]

For file names, you can prefix or suffix the file name with Talend’s date/time stamp function; otherwise it will write to the same file on every execution and flush out the earlier data. In the case of databases, you must use an already-existing database; this scenario fails when there is no such database on the target server. For generic JDBC, you need to provide the above parameters; if instead you select a specific database such as MySQL, then provide the username, password, and other required parameter values.
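For instance, in the log file name field you could append Talend’s TalendDate routine so that each day gets its own file (the “stats_” base name is illustrative):

"stats_" + TalendDate.getDate("CCYYMMDD") + ".log"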

If you enable project-level logging, there is no need to separately use all of these components: tLogCatcher to log the errors, tFlowMeter to catch the data flow, and tStatCatcher to catch the statistics. Talend throws errors or exceptions whenever they occur and displays the complete trace on the console. tLogCatcher, if used with the help of tDie or tWarn, would catch those messages, which can then be redirected to the required database or file based on the requirement. To do this, we need to take care of all the above components ourselves, then implement and test the job.

The advantage of this approach is that you get brief error information in the logs table, which is automatically created by Talend. In addition, Talend also prints its error trace on the console. The downside is that the console trace is not stored in the log table.

Problem: In both of the above approaches, Talend does not store or redirect its console trace to a database or file.