Beginner’s Guide to E.T.L (Extract, Transform and Load) – Connection Set-Up

Connection Setup for connection type: PostgreSQL

[We are now setting up a connection to A database i.e if your source is a database]
There are 3 ways of access provided for Connections using PostgreSQL
a. Native (JBDC)
b. ODBC
c. JNDI

a. Native (JBDC)
I. Enter the Connection Name: Anyname
II. Select the connection type: PostgreSQL
III. Host Name: localhost [This can be an ip address]
Database Name: LearnETL[Name of the database you are using]
Port Number: 5432 or your required Port Number
User Name: Your database user name
Password: Your database password
IV. Test Connections and OK.

b. JNDI
Here we need to go to the data-integration folder and open up the sub-folder “simple-jndi” and edit jdbc.properties

Here we need to write the following code:

ETL_Connection/type=javax.sql.DataSource
ETL_Connection/driver=org.postgresql.Driver
ETL_Connection/url=jdbc:postgresql://localhost:5432/LearnETL
ETL_Connection/user=postgres
ETL_Connection/password=postgres

ETL_Connection: name of the connection
localhost:5432/LearnETL: localhost is the host name, 5432 is the port number and LearnETL is the Database name.
user: username
password: Password

Save and back to the Database connection
Restart your PDI.

and in the Setup, select JNDI and for
JNDI Name : name of your connection [ETL_Connection]

c. ODBC

This is not commonly used but what will be needed

are as follows:

1.Install the PostgreSQL ODBC driver which can be downloaded.
2.Select the PostgreSQL ODBC Unicode and then
3. Setup
enter Data source Source, Name, Server, Username, Password and Port. Test and Save if Connection is OK.!
4. a bundle of JAR files to be copied in your Java folder as this ODBC bundle files has been discontinued in JAVA 8 Bridge

Thank You
Sohail Izebhijie

Beginner’s Guide to E.T.L (Extract, Transform and Load) – A Basic Process

Loading Data from Source to Target

Before we proceed it’s important to identify the tool you might need to accomplish the process of ETL, in my case i would be using the Pentaho Data Integration Application (keep in mind irrespective of the tool you use the steps or procedures are similar but the approach might be differ).

The following Steps can be followed:

1. Identify your Source, it could be the following:
a. C.S.V file
b. Text file
c. A database
d. and so on

in my scenario a C.S.V (comma separated file)file.

2. Open Up your Spoon.bat
Select a new transformation and select a “Input” then select what type of input you require
we have Text File Input, C.S.V file input, SAS Input, Table Input and so on. In My case since i’ll be using C.S.V file as a Source i’ll select C.S.V file Input Component.

3. Set Up your Connection Based on your preferred connection type in my case i’ll be using the postgreSQL.

[Read my next Blog on setting Up a Connection using Connection type: PostgreSQL]

4. Once Connection has been Established you can Right-Click on the Connection and Select Share if that’s a common Connection to all your transformations will be using this will share the Transformation Connection to other transformation.

5. So we will be Sending data from Source to a Target to we need to to have a “Input” as the source
and an “Output” as the Target.

6. Input

download a C.S.V file from the internet

or even Create a TXT/C.S.V input file
as shown below.

Create a source if required
Text_Source (comma delimited)

Employee_Number,First_Name, Last_Name, Middle_Initial, Age, Gender, Title
101,Ehizogie,Izebhijie,Sohail,24,Male,Developer
102,Fahad,Anjum,WithClause,23,Male,Developer
103,Gayatri,Sharma,A,24,Female,Accountant

Save as txt or csv and this can be your input.

Here since our input is from a csv file
we open Up or C.S.V File Input component
Step Name: Anyname
File Name: Browse the selected path
Delimiter: , (comma)
Enclosure: ”

and then Select Get Fields and Select OK
Preview your Data

7. Output
Open Up Table output component
Select Target Schema
Select The Target Table

[Keeping in mind a Table exist in the DB]

Select OK!

Right-Click on the table output to MAP the columns from Source to Target

Now this is important as the to get the right data from the source to the column in the target
and Then Run

As a Beginner keep in mind that
Errors are a bound to occur
Such as Type of data from
Source Does not Match your Target Table format.
and so on.

Here a we can Have some little transformation step to convert and take care of such format errors
[In my next blog we can look into handling that]

Now Go to your Target Database and Run the SELECT * FROM table_name

there you go!

Thanks
Sohail Izebhijie

Steps to migrate oracle with pentaho

Step 1:-

Run script as DB admin.

Script is available in location:- biserver-ce\data\oracle10g.

Modify configuration file:-

  1. applicationContext-spring-security-hibernate.properties.

Location:-

pentaho-solutions\system\applicationContext-spring-security-hibernate.properties.

original code:-

jdbc.driver=org.hsqldb.jdbcDriver

jdbc.url=jdbc:hsqldb:hsql://localhost:9001/hibernate

jdbc.username=hibuser

jdbc.password=password

hibernate.dialect=org.hibernate.dialect.HSQLDialect

Modified code:-

dbc.driver=oracle.jdbc.OracleDriver

jdbc.url=jdbc:oracle:thin:@localhost:1521/sysdba

jdbc.username=hibuser

jdbc.password=password

hibernate.dialect=org.hibernate.dialect.Oracle10gDialect

  1. hibernate-settings.xml

Location:- pentaho-solutions\system\hibernate\hibernate-settings.xml.

Original code

<config-file>system/hibernate/hsql.hibernate.cfg.xml</config-file>

Modified code:-

<config-file>system/hibernate/oracle10g.hibernate.cfg.xml</config-file>

oracle10g.hibernate.cfg.xml:-

Location:-

pentaho-solutions\system\hibernate\ oracle10g.hibernate.cfg.xml

Do not need to change any code in this file.. Just need to check everything is perfect or not

 

   <!– Oracle 10g Configuration –>

<property name=”connection.driver_class”>oracle.jdbc.driver.OracleDriver</property>

<property name=”connection.url”>jdbc:oracle:thin:@localhost:1521/sysdba

</property>

<property name=”dialect”>org.hibernate.dialect.Oracle10gDialect</property>

<property name=”connection.username”>hibuser</property>

<property name=”connection.password”>password</property>

<property name=”connection.pool_size”>10</property>

<property name=”show_sql”>false</property>

<property name=”hibernate.jdbc.use_streams_for_binary”>true</property>

<!– replaces DefinitionVersionManager –>

<property name=”hibernate.hbm2ddl.auto”>update</property>

<!– load resource from classpath –>

<mapping resource=”hibernate/oracle10g.hbm.xml” />

 

  1. quartz.properties:-

Location:-

pentaho-solutions\system\quartz\quartz.properties

Original Code

org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate

Modified Code:-

org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.oracle.OracleDelegate

4.context.xml:-

Location:-

tomcat\webapps\pentaho\META-INF\context.xml

Original Code

<Resource name=”jdbc/Hibernate” auth=”Container” type=”javax.sql.DataSource”

factory=”org.apache.commons.dbcp.BasicDataSourceFactory” maxActive=”20″ maxIdle=”5″

maxWait=”10000″ username=”hibuser” password=”password”

driverClassName=”org.hsqldb.jdbcDriver” url=”jdbc:hsqldb:hsql://localhost/hibernate

validationQuery=”select count(*) from INFORMATION_SCHEMA.SYSTEM_SEQUENCES” />

 

<Resource name=”jdbc/Quartz” auth=”Container” type=”javax.sql.DataSource”

factory=”org.apache.commons.dbcp.BasicDataSourceFactory” maxActive=”20″ maxIdle=”5″

maxWait=”10000″ username=”pentaho_user” password=”password”

driverClassName=”org.hsqldb.jdbcDriver” url=”jdbc:hsqldb:hsql://localhost/quartz

validationQuery=”select count(*) from INFORMATION_SCHEMA.SYSTEM_SEQUENCES”/>

 

Modified Code:-

<Resource validationQuery=”select 1 from dual”

url=” jdbc:oracle:thin:@localhost:1521/sysdba

driverClassName=”oracle.jdbc.OracleDriver” password=”password”

username=”hibuser” maxWait=”10000″ maxIdle=”5″ maxActive=”20″

factory=”org.apache.commons.dbcp.BasicDataSourceFactory”

type=”javax.sql.DataSource” auth=”Container” name=”jdbc/Hibernate”/>

                       

<Resource validationQuery=”select 1 from dual”

url=” jdbc:oracle:thin:@localhost:1521/sysdba

driverClassName=”oracle.jdbc.OracleDriver” password=”password”

username=”quartz” maxWait=”10000″ maxIdle=”5″ maxActive=”20″

factory=”org.apache.commons.dbcp.BasicDataSourceFactory”

type=”javax.sql.DataSource” auth=”Container” name=”jdbc/Quartz”/>

6. repository.xml

Location of the file: pentaho-solutions\system\jackrabbit\repository.xml.

Comment this code means (<! – – every thing here – -> )

Active means: Remove comment

i)                    FileSystem part

Comment this code

<FileSystem class=”org.apache.jackrabbit.core.fs.local.LocalFileSystem”>

     <param name=”path” value=”${rep.home}/repository”/>

</FileSystem>

Active this code:-

<FileSystem class=”org.apache.jackrabbit.core.fs.db.OracleFileSystem”>

   <param name=”url” value=”jdbc:oracle:thin:@localhost:1521“/>

   <param name=”user” value=”jcr_user”/>

   <param name=”password” value=”password”/>

   <param name=”schemaObjectPrefix” value=”fs_repos_”/>

   <param name=”tablespace” value=”pentaho_tablespace”/>

</FileSystem>

ii)                  DataStore part

Comment this code

<DataStore class=”org.apache.jackrabbit.core.data.FileDataStore”/>

Active this code:-

<DataStore class=”org.apache.jackrabbit.core.data.db.DbDataStore”>

<param name=”url” value=”jdbc:oracle:thin:@localhost:1521/sysdba”/>

<param name=”driver” value=”oracle.jdbc.OracleDriver”/>

<param name=”user” value=”jcr_user”/>

<param name=”password” value=”password”/>

<param name=”databaseType” value=”oracle”/>

<param name=”minRecordLength” value=”1024″/>

<param name=”maxConnections” value=”3″/>

<param name=”copyWhenReading” value=”true”/>

<param name=”tablePrefix” value=””/>

<param name=”schemaObjectPrefix” value=”ds_repos_”/>

</DataStore>

iii)                Security part in the FileSystem Workspace part

Comment this code:-

<FileSystem class=”org.apache.jackrabbit.core.fs.local.LocalFileSystem”>

<param name=”path” value=”${wsp.home}”/>

</FileSystem>

Active this code:-

<FileSystem class=”org.apache.jackrabbit.core.fs.db.OracleFileSystem”>

<param name=”url” value=”jdbc:oracle:[email protected]:1521/sysdba”/>

<param name=”user” value=”jcr_user”/>

<param name=”password” value=”password”/>

<param name=”schemaObjectPrefix” value=”fs_ws_”/>

<param name=”tablespace” value=”pentaho_tablespace”/>

</FileSystem>

iv)       PersistenceManager part

Comment this code:-

<PersistenceManager class=”org.apache.jackrabbit.core.persistence.pool.H2PersistenceManager”>

<param name=”url” value=”jdbc:h2:${wsp.home}/db”/>

<param name=”schemaObjectPrefix” value=”${wsp.name}_”/>

</PersistenceManager>

Active This Code:-

<PersistenceManager class=”org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager”>

<param name=”url” value=”jdbc:oracle:thin:@localhost:1521/sysdba”/>

<param name=”driver” value=”oracle.jdbc.OracleDriver”/>

<param name=”user” value=”jcr_user”/>

<param name=”password” value=”password”/>

<param name=”schema” value=”oracle”/>

<param name=”schemaObjectPrefix” value=”${wsp.name}_pm_ws_”/>

<param name=”tablespace” value=”pentaho_tablespace”/>

</PersistenceManager>

v)       FileSystem Versioning part

Comment This Code:-

<FileSystem class=”org.apache.jackrabbit.core.fs.local.LocalFileSystem”>

<param name=”path” value=”${rep.home}/version” />

</FileSystem>

Active This Code:-

<PersistenceManager class=”org.apache.jackrabbit.core.persistence.bundle.OraclePersistenceManager”>

<param name=”url” value=”jdbc:oracle:thin:@localhost:1521/sysdba”/>

<param name=”driver” value=”oracle.jdbc.OracleDriver”/>

<param name=”user” value=”jcr_user”/>

<param name=”password” value=”password”/>

<param name=”schema” value=”oracle”/>

<param name=”schemaObjectPrefix” value=”pm_ver_”/>

<param name=”tablespace” value=”pentaho_tablespace”/>

</PersistenceManager>

 

 

Stopping HSQL db start up

In web.xml file

Comment or delete this code (Commenting is preferable)

<!– [BEGIN HSQLDB DATABASES] –>

<context-param>

<param-name>hsqldb-databases</param-name>

<param-value>sampledata@../../data/hsqldb/sampledata,hibernate@../../data/hsqldb/hibernate,quartz@../../data/hsqldb/quartz</param-value>

</context-param>

<!– [END HSQLDB DATABASES] –>

 

Also comment this code

<!– [BEGIN HSQLDB STARTER] –>

<listener>

<listener-class>org.pentaho.platform.web.http.context.HsqldbStartupListener</listener-class>

</listener>

<!– [END HSQLDB STARTER] –>

You have done with integrating pentaho 5.0.1 CE with Oracle

Now login to the Pentaho server .

URL:  http://localhost:8080/pentaho

Username/Password : Admin/password

SCD Type 1 Implementation on Pentaho Data Integrator

This blog will talk about SCD Type 1 Implementation on Pentaho Data Integrator

Slowly Changing Dimension Type 1 does not preserve any historical versions of the data .
This methodology overwrites old data with new data, and therefore stores only the most current information. In this article let’s discuss the step by step implementation of SCD Type 1 using Pentaho.
The number of records we store in SCD Type 1 does not increase exponentially as this methodology overwrites old data with new data
Create table in Database for source and target and Create connections for database.

Table Input Step:

SCD Penthao

Drag and Drop the table input in spoon workspace and give the connection, then click on get select statement.

Then Click on ok.

Database Lookup:-

Drag And Drop The Database Lookup. Double Click on Database Lookup And Set The Following properties  and then click on ok.:-

SCD Penthao 2

Filter Rows:-

Set The following properties  of filter rows and click on ok.

SCD Penthao 3

Select Values:-

Before you work on select values, connect the table output and specify all the database field on to the output step.

SCD Penthao 4

Table Output :-

SCD Penthao 5

Update:-

Set The Following On Update  and click on ok.

SCD Penthao 6

Program should Like the Following Figure :-

SCD Penthao 7

 

For any other query related to Pentaho Data Integrator, get in touch with us at Helical IT Solutions

Bidyut Kumar

Helical IT Solutions