JNDI

Java Naming and Directory Interface (JNDI)

Using JNDI, applications running in distributed environment, helps to look up services independently.

How JNDI is useful?

  1.  During deployment sequence, the same JNDI name can be used in different servers.
  2.  It helps to provide security of credential leakage to the user groups like developer, testers, stagers etc, because the credentials are configured in the server not in the application.
  3.  Connection Pooling can be achieved

How to use JNDI?
In order to use, JNDI classes and service providers should be there
The service provider may be :

  1. Lightweight Directory Access Protocol (LDAP)
  2. Common Object Request Broker Architecture (CORBA)
  3. Common Object Services (COS) name service
  4. Java Remote Method Invocation (RMI) RegistryThe objects needs to be created and registered so that , they can be further used for lookup.

How to configure JNDI in tomcat?
In tomcat environment we can configure JNDI in 3 different places
1. Inside the application using context.xml in META-INF directory.
2. Inside the tomcat server context.xml : Inside the apache-tomcat/conf directory we can configure. This can be helpful when multiple application needs the same database.
3. Both in server.xml and context.xml

NB: The database driver jar should be present in the tomcat lib directory

<Resource name="jdbc/lookupName" 
global="jdbc/lookupName" 
auth="Container" 
type="javax.sql.DataSource" 
driverClassName="com.mysql.jdbc.Driver" 
url="jdbc:mysql://localhost:3306/hdi" 
username="hiuser" 
password="hiuser" 

maxActive="100" 
maxIdle="20" 
minIdle="5" 
maxWait="10000"/>

In order to use it in java program we have to use the followng
DataSource ds = (DataSource) ctx.lookup(“java:/comp/env/jdbc/lookupName”);

Let’s Have Fun with Triggers in SQL

Let’s Have Fun with Triggers in SQL

What are Triggers?

Triggers are a Block of Structure that executes when a DML Statement is performed or Executed. Triggers are fun to use but from a business point of view not advisable but that a different story entirely ie the Pro’s and Con’s. but for gaining an idea about Triggers here is what the Video Covers

  1. Basic Creation of a Trigger
  2. and a simple real-time scenario on how i used triggers

The Video here illustrates or covers the above points.

SPRING JDBC

While working with database using plain old JDBC, it becomes cumbersome to write unnecessary code to handle exceptions, opening and closing database connections etc. But Spring JDBC Framework takes care of all the low-level details starting from opening the connection, prepare and execute the SQL statement, process exceptions, handle transactions and finally close the connection.

 

Spring JDBC provides several approaches and correspondingly different classes to interface with the database

The Spring Framework’s JDBC abstraction framework consists of four different packages, namely

core

datasource

object

support

1.Core

org.springframework.jdbc.core.

SubPackage:

1.1 org.springframework.jdbc.core.simple

1.2 org.springframework.jdbc.core.namedparam

  1. Datasource

org.springframework.jdbc.datasource

SubPackage:

2.1 org.springfamework.jdbc.datasource.embedded

3.object

org.springframework.jdbc.object

4.support

org.springframework.jdbc.support

 

org.springframework.jdbc.core.

This package contains the JdbcTemplate class and its various callback interfaces, plus a variety of related classes.

It’s subpackage named org.springframework.jdbc.core.simple contains the SimpleJdbcInsert and SimpleJdbcCall classes. Another subpackage named org.springframework.jdbc.core.namedparam contains the NamedParameterJdbcTemplate class and the related support classes.

 

The org.springframework.jdbc.datasource package contains a utility class for easy DataSource access, and various simple DataSource implementations that can be used for testing and running unmodified JDBC code outside of a Java EE container. A subpackage named org.springfamework.jdbc.datasource.embedded provides support for creating in-memory database instances using Java database engines such as HSQL and H2.

 

Here, we are using the JdbcTemplate object in the EmployeeDao class, so we are passing it by the setter method but you can use constructor also.

<?xml version=”1.0″ encoding=”UTF-8″?>

<beans

xmlns=”http://www.springframework.org/schema/beans”

xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance”

xmlns:p=”http://www.springframework.org/schema/p”

xsi:schemaLocation=”http://www.springframework.org/schema/beans

http://www.springframework.org/schema/beans/spring-beans-3.0.xsd”>

<bean id=”ds” class=”org.springframework.jdbc.datasource.DriverManagerDataSource”>

<property name=”driverClassName” value=”oracle.jdbc.driver.OracleDriver” />

<property name=”url” value=”jdbc:oracle:thin:@localhost:1521:xe” />

<property name=”username” value=”system” />

<property name=”password” value=”oracle” />

</bean>

<bean id=”jdbcTemplate” class=”org.springframework.jdbc.core.JdbcTemplate”>

<property name=”dataSource” ref=”ds”></property>

</bean>

<bean id=”edao” class=”com.helical.hdidao”>

<property name=”jdbcTemplate” ref=”jdbcTemplate”></property>

</bean>

</beans>

DB Partitioning

dbpartition

Partitioning enhances the performance, manageability, and availability of a wide variety of applications and helps reduce the total cost of ownership for storing large amounts of data. Partitioning allows tables, indexes, and index-organized tables to be subdivided into smaller pieces, enabling these database objects to be managed and accessed at a finer level of granularity.

Partitioning is a rather general concept and can be applied in many contexts. When it considers the partitioning of relational data,

it usually refers to decomposing your tables either row-wise (horizontally) or column-wise (vertically).

–       Horizontal partition(Row-wise partitioning) :- It involves putting different rows into different tables. Perhaps customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, while a viewwith a union might be created over both of them to provide a complete view of all customers.

–       Vertical partition(Column-wise partitioning) :- It involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized. Different physical storage might be used to realize vertical partitioning as well; storing infrequently used or very wide columns on a different device,

for example, is a method of vertical partitioning. Done explicitly or implicitly, this type of partitioning is called “row splitting” (the row is split by its columns). A common form of vertical partitioning is to split dynamic data (slow to find) from static data (fast to find) in a table where the dynamic data is not used as often as the static. Creating a view across the two newly created tables restores the original table with a performance penalty, however performance will increase when accessing the static data e.g. for statistical analysis.

Other Example:–

assume you have a table like this:
  create table data (    id integer primary key,     status char(1) not null,     data1 varchar2(10) not null,     data2 varchar2(10) not null);

One way to partition data vertically: Split it as follows:
create table data_main (    id integer primary key,    status char(1) not null,    data1 varchar2(10) not null ); create table data_rarely_used (    id integer primary key,    data2 varchar2(10) not null,    foreign key (id) references data_main (id) );

This kind of partitioning can be applied, for example, when you rarely need column data2 in your queries. Partition data_main will take less space, hence full table scans will be faster and it is more likely that it fits into the DBMS’ page cache. The downside: When you have to query all columns of data, you obivously have to join the tables, which will be more expensive that querying the original table.

Notice you are splitting the columns in the same way as you would when you normalize tables. However, in this case data could already be normalized to 3NF (and even BCNF and 4NF), but you decide to further split it for the reason of physical optimization.

One way to partition data horizontally, using Oracle syntax:
create table data (    id integer primary key,     status char(1),     data1 varchar2(10),     data2 varchar2(10) )    partition by list (status) (        partition active_data values ( ‘A’ ),       partition other_data values(default)     );

This would tell the DBMS to internally store the table data in two segments (like two tables), depending on the value of the column status. This way of partitioning data can be applied, for example, when you usually query only rows of one partition, e.g., the status ‘A’ rows (let’s call them active rows). Like before, full scans will be faster (particularly if there are only few active rows), the active rows (and the other rows resp.) are stored contiguously (they won’t be scattered around pages that they share with rows of a different status value, and it is more likely that the active rows will be in the page cache.

Different types of models used by databases

There are a lot of databases out there in the market, lets understand the models types and the corresponding usage.

Different types of models used by databases
• Relational DBMS                          • Wide column stores                                   • Content stores
• Document stores                          • Event stores                                                • Graph DBMS
• Key-value stores                          • Multivalue DBMS                                        • Native XML DBMS
• Navigational DBMS                      • Object Oriented DBMS                             • RDF Stores
• Search Engines

 

Relational databases: It is a collection of data items organized in set often called as tables. All the tables are related to each other. SQL statements are used to fetch, update, and delete the information from these tables.
Popular relational databases are
Oracle                                                                         • MySQL
Microsoft SQL Server                                               • PostgreSQL

 

Wide column stores: This type of databases stores data in a form of records with an ability to hold any number of dynamic columns. This type of databases doesn’t follow a specific schema. You can think of storing of the data in two-dimensional key-value.
Popular wide column stores databases are
Cassandra                                               • HBase                                                       • Accumulo

 

Content stores: They are also called as content repositories, specialized in management of digital content, such as text, pictures or videos, including their metadata. Some of the features included are full text search, versioning, hierarchical structured content, and access control.
Popular content stores databases are
Jackrabbit                                                                • Modeshape

 

Document stores: They are also called as document oriented databases systems, and characterized by their schema-free organization of data. They store records and each record may have different columns. Document stores often use internal notations, which can be processed directly in applications, mostly JSON.
Popular document stores databases are
MongoDB                                 • CouchDB                                                       • Couchbase

 

Event stores: They persists all state changing events for an object together with a timestamp, thereby creating time series for individual objects. The current state of an object can be inferred by replaying all events for that object from time 0 till the current time.
Popular event stores are
InfluxDB                                          • Event Store

 

Graph DBMS: Graph DBMS, also called graph-oriented DBMS or graph database, represent data in graph structures as nodes and edges, which are relationships between nodes. They allow easy processing of data in that form, and simple calculation of specific properties of the graph, such as the number of steps needed to get from one node to another node.
Popular graph event database are
Neo4j                                                • Titan

 

Key-value stores: They can only store pairs of keys and values, as well as retrieve values when a key is known. One of the examples of the key-value stores is mentioned below
Key Value
Student1_maths 100
Student1_english 80
Student2_maths 100

Popular Key-value stores are
DynamoDB                                           • Redis                                                       • Memcached

 

Multivalue DBMS: It is also called as multidimensional database. It is very similar to Relational databases, however it differ from relational databases in that they have features that supports to use of attributes of values, rather than all attributes being single-valued.
Popular Multivalue databases are
Adabas                       • D3                          • UniData,Universe                                 • jBASE

 

Native XML DBMS: This type of databases internal data model corresponds to XML document. Native XML DBMS do not necessarily store data as XML documents, they can use other formats for better efficiency.
Generally defines a logical model for an XML document – as opposed to the data in that document – and stores and retrieves documents according to that model. Has an XML document as its fundamental unit of storage.
Popular Native XML DBMS
MarkLogic                    • Sedna

 

Navigational DBMS: This type of databases allows access to data sets only via linked records. They were the first established systems able to manage large amounts of data.
Popular Navigational DBMS
IMS                              • IDMS

 

Object Oriented DBMS: This type of databases stores the information in the form of objects as used in the object oriented programming. Object oriented databases are different from relational databases which are table oriented.
An object oriented DBMS follows an object oriented data model with classes (the schema of objects), properties and methods. An object is always managed as a whole. This means for example, that the insertion of an object, which in a relational system would probably be stored in multiple tables, will be performed automatically as one atomic transaction – without any action by the application program. Reading an object can also be done as a single operation and without complex joins.
There are tools and architectures that are now provided for the storage of objects into relational databases (such as Hibernate or JPA).
Popular object oriented databases are
Cache                                         • ObjectDB

 

RDF Stores: The Resource Description Framework (RDF) is a methodology for the description of information, originally developed for describing metadata of IT resources. Today it is used much more generally, often in connection with the sematic web, but also in other applications.
The RDF model represents information as triples in the form of subject-predicate-object.
Database management systems, which are able to store and process such triples, are called RDF stores or triple stores.
Popular RDF stores are
AllegroGraph                               • Jena

 

Search Engines: Search engines are NoSQL database management systems dedicated to the search for data content. In addition to general optimization for this type of application, the specialization consists in typically offering the following features:
• Support for complex search expressions
• Full text search
• Stemming (reducing inflected words to their stem)
• Ranking and grouping of search results
• Geospatial search
• Distributed search for high scalability
Popular search engines are:
Elasticsearch                               • Solr                                           • Sphinx

Change Liferay Database

This blog will talk about how to change liferay database.

Liferay comes with a default database called “HSQL” or “Hypersonic”. This is not meant for production use however! You might need to switch to a real database to use Liferay. This page documents how to change the default database system.

Database configuration in 6.0, 6.1 is the same as 5.2.

If you need to change the database of liferay, you can choose any database e.g MySQL, PostgreSQL etc. You can change database of liferay in two ways

1. If you are installing liferay change database at that time.

2. If you have already installed liferay, you will have to make some changes.

Change Database:

If you have installed LFR, take backup of your LFR data (called Data Migration).  If you want to take backup of stored LFR data follow these steps:

1. Create a database lportal (or any name) in your database.

2. Login as administrator in LFR.

3. Select Go To –> Control Panel.

4. Select Server –> Server Administration.

Liferay change database

5. Select Data Migration Tab

Liferay change database 2

6. Fill all the details (JDBC Driver Class Name, JDBC URL, JDBC User Name, JDBC Password) in text boxes.

Liferay change database 3

(For Postgres SQL DB)

 

7. Select Execute

See the LFR server console, You can see the process of data migration. When the process is completed, restart the LFR Server.

As you can see in lportal database, there is lots of tables which contains LFR data.

 

Connect LFR to database:

Now, we have to connect LFR to database. The configuration of the database is set in a single file, portal-ext.properties. To store the configuration of database, follow these steps:

1. Open new text file.

2. Write configuration of database, e.g

For PostgreSQL:

jdbc.default.driverClassName=org.postgresql.Driver

jdbc.default.url=jdbc:postgresql://localhost:6666/lportal?useUnicode=true&amp;characterEncoding=UTF-8

jdbc.default.username=@@@@

jdbc.default.password=$$$$$$

For MySQL:

jdbc.default.driverClassName=com.mysql.jdbc.Driver

jdbc.default.url=jdbc:mysql://localhost:3306/lportal?useUnicode=true&characterEncoding=UTF-8&useFastDateParsing=false

jdbc.default.username= @@@@

jdbc.default.password= $$$$$

 

3. Save as portal-ext.properties in {liferay-home}\{tomcat}\webapps\ROOT\WEB-INF\classes

4. Restart Liferay Server.

Now, LFR database has been changed.

 

NOTE:   If you are going through Data Migration Process, then JDBC URL ( jdbc:mysql://localhost:3306/lportal  in this example) should be same at both place, when you are doing Data Migration & when you are writing Configuration in portal-ext.properties.

 

Sharad Sinha

[email protected]

Helical IT Solutions

Creating a Federated Tables inside MySQL

What is Federated Table?

A federated database system is a type of meta-database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database.

A Federated Table is a table which points to a table in another MySQL database instance (mostly on another server). It can be seen as a view to this remote database table. Other RDBMS have similar concepts for example database links.

federated table

Federated Tables Concept In MYSQL(linux) :

  1. Federated Storage Engine : Using Federated Storage Engine,we will be able to access data from tables of Remote databases rather than local databases. This Federated Engine concept is available from MYSQL 5.3.0 Onwards

Example : Suppose you want to access some data from the DataBase that is on one server1,

Also another set of data from different server2, but data on these two servers are not accessible through out the world , Rather these are available only on one server3 , so what you can do is :

Simply create “Federated” tables for all those tables from which you want to access data on server3 from server1 & server2. When using a FEDERATED table, queries on the server3 are automatically executed on the remote (federated) tables. No data is stored on the local tables.

NOTE :  By default Federated Storage Engine will not enabled on the mysql

Federated Table 2But if you want to verify whether federated engine is on or off, “SHOW ENGINES” sqlquery can help to identify the status.

federated table 2If value of Support column is “YES” then Federated Engine is Turned On, Else If it is “No” it means it is Turned Off.

To enable federated engine ,

  1. You add the line ‘federated’ to the /etc/my.cnf file after  [mysqld] section
  2. restart the mysql server

Federated Table 1

  1. Federated Table Creation : It has two elements :
    1. Remote server : Data is physically stored in this server
    2. Local Server :      No data is stored here,simply you can fire the query and you will be able to see your required output.

Example :

  1. Suppose you have a table on the Remote Server (say Master Table : “student_test”)

Data Base Name is :  STUDENT_DB that you want to   access by using a Federated table

 

CREATE TABLE student_test(

sid     INT(20) NOT NULL AUTO_INCREMENT,

sname   VARCHAR(32) NOT NULL DEFAULT ,

PRIMARY KEY  (id),

)ENGINE=MyISAM

DEFAULT CHARSET=latin1;

 

2. Next,You have to create a table on the local database server(say Federated Table :”Student_federated)” DataBase Name is STUDENT_FEDERATED to access the data on the Master Table

 

CREATE TABLE Student_federated(

sid     INT(20) NOT NULL AUTO_INCREMENT,

sname   VARCHAR(32) NOT NULL DEFAULT ,

PRIMARY KEY  (id),

)ENGINE=FEDERATED

DEFAULT CHARSET=latin1;

CONNECTION=’scheme://username:[email protected]:port/DBname/TableName;

For the above scenario , The connection String should be :

CONNECTION=’mysql://root:[email protected]:3306/STUDENT_DB /Student_test;

 

Sequence of execution:

– When a client issues an SQL statement that refers to a FEDERATED table, the flow of information between the local server (where the SQL statement is executed) and the remote server (where the data is physically stored) is as follows:

– The storage engine looks through each column that the FEDERATED table has and constructs an appropriate SQL statement that refers to the remote table.

– The statement is sent to the remote server using the MySQL client API.

– The remote server processes the statement and the local server retrieves any result that the statement produces (an affected-rows count or a result set).

If the statement produces a result set, each column is converted to internal storage engine format that the FEDERATED engine expects and can use to display the result to the client that issued the original statement.

 

Now You are able to get your required output by firing sql query

 

Thankyou

Helical IT Solutions

Change Data Capture (CDC) – Methods, Approaches, Alternatives

Change data capture :

It’s the process of tracking the data that has changed so that corresponding action can be taken. Its nothing but capturing the changes which are made on the data source and applying it at all the other places, and hence it reduces the efforts required at the ETL step.

 

Different strategies :

  • Push: the source process creates a snapshot of changes within its own process and delivers rows downstream. The downstream process uses the snapshot, creates its own subset and delivers them to the next process.
  • Pull: the target that is immediately downstream from the source, prepares a request for data from the source. The downstream target delivers the snapshot to the next target, as in the push model.

 

Usage of Change Data Capture:

Change data capture is often used in data warehousing since it involves extraction and transformation of data from one or more databases to data warehouse for analysis.

Prior to CDC, the change data was captured using other methodologies i.e. table difference or table comparison. Its like taking a copy of the source and comparing with target, spotting the difference. For limited number of records this approach might work, but for more number of records this methodology will be highly inefficient and result in immense overheads.

 

Table differencing: It involved copying entire table from source to staging. But since, the entire table was shifted, there were additional overheads and this method was not efficient. Also, in case if there are multiple changes in the data, it cant be captured using this methodology.

 

Change value selection: This involves capturing source database changed values based on the value of a specific colum. However for implementing this we need to fire queries to find out the changed data which is an additional overhead. Again, in this method also we will not be able to find or capture the intermediate values. Also the changes happening between firing of two queries will not be captured.

 

A slightly better approach than the ones mentioned above is by using database triggers. This triggers can be used after any Insert/Delete/Modify statement whenever there is any change, the changed information will be passed on to a table. The only disadvantage of this solution is it is database specific and migration of such a solution becomes complex since its db specific.

 

Another approach would be the usage of log readers. Many Enterprise DBMS have capability of transaction logs that can be used (can be switched on and off), the amount & details of information in log to be stored can be decided as well. By reading this we can come to know of the changes happening in the system. The problem with this is purchasing of new component for log reading, migration might be an issue, also some DBs might not have the logging features etc. Also sometimes some changes which are not committed, their logs are also created which could be as well an issue.

 

Capturing data with CDC:

CDC can capture data using two modes :: Synchronous and Asynchronous. In synchronous mode, any changed data in source system is informed immediately to the target using DML (data manipulation language) operations, whereas that’s not the case with Asynchronous mode. Synchronous uses triggers and it results in no latency or delay, though it has more overheads as compared to asynchronous.  
Benefits of CDC

–          CDC can improve performance and reduce redundancy by copying only the changed data.

–          Log based CDC also reduces load on operational systems since it only reads the logs rather than db

–          No latency, hence end business users can have immediate realization of the same in their BI

–          CDC can easily handle high volume transactions

 

We can implement CDC using Talend Integration Suite. More information about the same will be covered in another blog.

 

For any query, please reach out to me on [email protected]