Helical Insight for ETL Tester (QA)

Helical Insight for ETL Tester (QA)

Let’s take an example that your ETL developer has just wrote an ETL script, which loaded the data into the Data Warehouse, few millions of records got inserted into the Data Warehouse.

For QA, it is super difficult to test all (millions) of the data and also a time consuming process. Moreover, there are many other problems like

  • some might not be well versed with writing complex SQL queries
  • There might be multiple data sources involved in generating the Data Warehouse.
  • Writing multiple queries on multiple data sources and then combining it in one single result set would be a sweat breaking task.

Now think of Helical Insight as your friend which helps you to do most of your tasks, allowing you to focus on more critical thinking rather than just doing stare and test whole day. Wondering how Helical Insight can help you achieve the same?

Here is your answer,

  • Helical Insight is capable of communicating with any of your data source(s) like CSV, Excel, MySql, Oracle, SQL Server, Postgres etc.
  • Helical Insight has a drag and drop feature to make your test cases ready, without even having a prior knowledge of SQL queries. Helical Insight generates the SQL queries including the joins between multiple entities. Helical Insight understands your data structure, hence it generates the query even if you are working with different set of data source(s), DW or normalized database, Helical Insight understands all.
  • Using this (drag and drop) facility you can generate all your test cases in a form of report, which you can call it as your test cases. For example, your report does a count of records for a dimension. You can save this report and check the same with source system by creating a separate report. If its a migration project the count should definitely match otherwise we know something is wrong with the ETL Script.
  • Once you have saved these reports (test cases) you can re-executes them again an again to verify your data. In the above situation you have just identified that Helical Insight is really a very useful tool with some really cool features like generating the SQL query for any database or data source(s), ability to execute the queries on any databases or data source(s), ability to save the report / test cases / SQL queries for future usage.

Staging DB in Datawarehousing – The WHYs and WHY NOTs

Staging DB in Datawarehousing – The WHYs and WHY NOTs

Whenever we talk about designing datawarehouses, staging database is one consideration we have to make. And it depends on many factors whether you choose to have one or not. I feel when we start small, like implementing small datamarts, we may not see the need for a staging database but as the DW implementations grow in size and complexity, staging DBs have a lot of benefits to offer. Staging DBs are often the place to dump source data, the workplace to create temperory intermediate tables for processing the transformation steps in ETLs, to keep what is good, to reject what is bad and so on.

Lets quickly look through the PROs and CONs of having Staging DBs in the DW implementations:

The WHYs

Source data accumalation: If data is being pulled from multiple systems Source systems and not all data is available at the same time, we may need to accumalate all raw data pulled at different times and then integrate/merge and load the same.

Data transformation, cleansing, data alignment: Staging DB may be the workplace for the ETL to do the required cleansing, apply transformation rules and perform alignment of data from various systems.

Decoupling of Source from Datawarehouse: Staging DB helps decouple the Source systems from the DW. Source systems are accessed only by ETL jobs which pull data into staging, or they may even choose to push data there. This greatly reduces the amount of time that source systems are accessed by DW loading jobs, reducing contention too.

ETL Performance: Loading staging tables is simple as they are mostly the exact replicas of the source tables, minus the indexes / foreign key relationships. As most of the joins on the data , cleansing, calculations are done on staging tables and may be stored in intermediate temporary tables, loading of DW from staging tables is also simpler. This greatly increases the ETL performance as well as quality of the DW data.

Handle Failure, debugging, restart , recovery: As staging DBs have snapshots of source data, can have extensive logging, intermediate tables used for data transformations etc, they make it easier to debug errors/failures. In case of restarts / failures, the step of pulling the source data may not have to be redone.

Data reconciliation / pre / post load validations : This is very much similar to the point above. Staging DBs are excellent to capture auditing information which can be used for reconciling data, validating data to reject bad data, make sure the data we loaded is correct , consistent with the source, nothing is missed out.

Change Data Capture: Staging DB is also used as one of the mechanisms for CDC. For this the copy of source data tables from previous load are preserved till the next execution. The current and previous source data are compared to identify changes to the data. This is mostly used in scenarios where the source systems do not have means to support change detections.

Simpler, manintenable ETLs: With staging, we divide the process of Extract Transform Load into different processes. Combining the extraction, cleansing, transformation logic, precalculations, summarizations into one single ETL makes it very complex and also a single point for failure. With staging step, ETLs are smaller, simpler and easy to maintain as well.

Prevent Orphan records / race conditions: When we fetch data from source directly and load into DW, there are chances that we sometimes may pull some partial updates happening in the transactional system or the new data might be getting inserted as we pull causing race conditions. With Staging, the data from source is pulled in one single go preventing such problems.

Backup of source: The staging DBs if archived / retained for some specific period of time may act like a backup of the source data which is loaded into the DW.



Increases processing time , as data is first loaded into staging tables, transformed and then loaded to the DW. Due to the additional step, there is slight increase in processing time.
Increases latency, source changes take more time to reflect in DW. In realtime/near realtime DW, staging is not usually included.
Environment requirements : Additional Database servers / storage space required.

The overhead that staging may impose is quite less as compared to the benefits we derive. But then again, it completely depends on what suits a given requirement. The above points may however help us in making a choice.

Shraddha Tambe | Helical IT Solutions


1) http://data-warehouses.net/architecture/staging.html
2) http://dwbi.org/etl/etl/52-why-do-we-need-staging-area-during-etl-load
3) http://danlinstedt.com/allposts/datavaultcat/data-vault-and-staging-area/
4) http://www.dwh-club.com/dwh-bi-articles/change-data-capture-methods.html

Why build a Data warehouse?

Why build a Data warehouse?

Often when we talk about implementing BI, the first thing we need to look at is how and where is the data? Is it scattered? Is it easy to report on? With this we often need to know if we need to build a reporting database or a data mart or for that matter a data warehouse. We all know WHAT a data warehouse essentially is; here we are going to discuss the WHY.

Consolidate/Integrate Data:

Usually the data in an enterprise is scattered across the various applications and data sources that form the enterprise’s ecosystem. To aid better insights and business decisions, the business users need data from all the various subject areas to be accessible. DWH helps us to consolidate and integrate data from various sources and store in a single data model.


Cleanse data & Enhance data Quality:

DWH includes converting data from different sources into common formats, using common keys, structure, decimal/time formats. We can restructure the data and rename tables and fields so if makes more sense to the users. We can also use master data management to consolidate common data from various sources and build relationships.


Store Historical Data:

DWH is usually used to store huge amounts of historical data that the transactional systems do not /cannot retain. This helps analytics across different time periods, trend analysis.


Speed up Data Retrieval:

In a DWH, the data is structured in such a way that aids faster retrieval. The structure is designed for faster data access, drill down, filtering, and summarization. The transactional systems usually optimized for smaller transactions that inserts/update data in specific tables and access small amount of data. However, BI requires analyzing, aggregating a large amount of data quickly.DWH stores data to help such retrievals.


Unburden transactional Systems:

Executing resource intensive BI queries that access huge amount of data on transactional systems may affect these systems. It may even bring them down affecting business. Building a DWH separate from the transactional systems solves this problem.


Prepare / Pre-aggregate data:

Depending on the business needs, data can be pre-aggregated at various levels and stored in the DWH. This can save time and improve performance of the BI reports. The KPIs that help the business users can also be pre-calculated and stored. Users can then analyse these indicators across dimensions to gain knowledge and insights.


Support Self Service BI:

The DWH either uses dimensional model or easily feeds dimensional models, which can be used to build analytical cubes. These cubes can be used by business users to do Adhoc reporting, slice-n-dice the data and build their own reports.


Increased findability for Business users:

In a DWH, the data is stored in ways that is more meaningful to the business users. Also, the data relations can be easily established. The table names, field names are business user friendly. These make it easier for users to find the data they were looking for.


Helps Data mining / trend analysis / Predictive Analytics:

Once a DWH is created, it makes easier to analyse huge amounts of data and associations to uncover hidden patterns using data mining methodologies. The historical data in DWH can we analysed across time periods to discover trends / perform business predictions.
These are some major reasons / benefits of building a DWH. Though building DWH is a time consuming and complex process, it solves many problems and delivers effective BI.

Shraddha Tambe | Helical IT Solutions

About Analytical Databases

Analytics-iconAbout Analytical Databases 

Companies and Corporates that build Business Intelligence solutions for their business, often want to consolidate their data that is spread across various applications, for reporting and analysis purposes. For that, they build datawarehouses or data marts as the foundation of their enterprise-wide BI solutions. These DWs have been implemented using traditional relational databases and successfully so. But as organizations add more and more data and users to these systems, the scabiltity and performance become crucial. One of the useful contributors to solving the problems of scability / performance could be the use of Analytical Databases or Datawarehousing DBs.


What are these Analytical Databases ?

Analytics databases are optimized to be used for retrieval, aggregation and analytics. They deliver benefits in a BI environment because they are designed for high performance in this environment.

These Analytical databases, or analytical platforms, span a range of technology from appliances and columnar databases to shared nothing, massively parallel processing databases. The common thread among them is that most are read-only environments that deliver exceptional price/performance compared with general-purpose relational databases originally designed to run transaction processing applications. Analytical databases are now a central component in any BI ecosystem, providing the necessary performance and scalability to support large numbers of analytical users and growing volumes of both structured and unstructured data.

Below are a few differences between transactional and analytical DBs:

  • On-line transactional procesing (OLTP) databases are designed to host operational systems that run the business. Analytics databases step in when operational use of the data has ended, and the data are to be used for information and analysis.
  • OLTP-oriented databases, are designed for high performance in an on-line transactional environment, where additions, updates, and deletions are being continuously made to individual or small numbers of records in a random manner over time, to data scattered accross tables. In an analytics database, there are mostly bulk uploads of data, deletions and update are rare / infrequent.
  • Data integrity is critical to transactional data, OLTP databases are optimal for these.
  • OLTP databases support environments where many concurrent users are retrieving, adding, updating, and deleting data on realtime basis. Analytics databases, on the other hand, support environments where very many concurrent users are retrieving data but adds, updates, and deletes are done periodically, on scheduled basis.
  • In OLTP environments, retrievals are oriented towards single or few records where many or most columns from these records are required. In analystics databases, retrievals are oriented towards many or most records (statistical operations); few columns from these records are required.


What makes Analytical DBs faster and more scalable ?

Let’s look at some of the technologies I mentioned above to see how these make analytical DBs different. These are some but not all of the technologies that are used.

Columnar Databases :

A columnar database, also known as a column-oriented database, is a database management system (DBMS) that stores data in columns rather than in rows as relational DBMSs. The main difference between a columnar database and a traditional row-oriented database are centered around performance, storage necessities and schema modifying techniques. The goal of a columnar database is to efficiently write and read data to and from hard disk storage and speed up the time it takes to return a query. One of the main benefits of a columnar database is that data can be highly compressed allowing columnar operations — like MIN, MAX, SUM, COUNT and AVG— to be performed very rapidly. Another benefit is that because a column-based DBMS is self-indexing, it uses less disk space than a relational database management system (RDBMS) containing the same data.

Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.

Massively Parallel Processing:

MPP (massively parallel processing) is the coordinated processing of a program by multiple processors that work on different parts of the program, with each processor using its own operating system and memory. An MPP system is also known as a “loosely coupled” or “shared nothing” system. MPP allows the warehouse to break up large data analytics jobs into smaller, more manageable chunks, which are then distributed to multiple processors. Analytics is run simultaneously – or in parallel – on each processor, then the results returned and synthesized.

In Memory analytics:

I don’t think I really need to elaborate on this one. They are databaes which load the source data into system memory in a compressed, non-relational format in an attempt to streamline the work involved in processing queries.

Online analytical processing (OLAP):

These are the databases, which store multidimensional “cubes” of aggregated data for analyzing information based on multiple data attributes.


This is a little bit about analytical databases and how they might add value to a DW/BI Solution.

Shraddha TambeHelical IT Solutions

Slide share – determine right analytic db

Unit Test and JUnit Introduction

Testing is the process of checking the functionality of the application whether it is working as per requirements and to ensure that at developer level, unit testing comes into picture. Unit testing is the testing of single entity (class or method). Unit testing is very essential to every software company to give a quality product to their customers.

A unit test is a piece of code written by a developer that executes a specific functionality in the code to be tested. The percentage of code which is tested by unit tests is typically called test coverage.


A unit test examines the behavior of a distinct unit of work. Within a Java application, the “distinct unit of work” is often (but not always) a single method. By contrast, integration tests and acceptance tests examine how various components interact.


A unit of work is a task that isn’t directly dependent on the completion of any other task.


JUnit is a framework that was started by Kent Beck and Erich Gamma.

Erich Gamma is one of the Gang of Four who gave us the now-classic Design Patterns book

A framework is a semi-complete application. A framework provides a reusable, common structure to share among applications. Developers incorporate the framework into their own application and extend it to meet their specific needs.


JUnit (http://www.junit.org) is open source software, released under IBM’s Common Public License Version 1.0 and hosted on SourceForge. The Common Public License is business friendly: people can distribute JUnit with commercial products without a lot of red tape or restrictions.

Unit testing framework should follow several best practices like

– Each unit should run independently of all other unit tests.

– The framework should detect and report errors test by test.

– It should be easy to define which unit tests will run.

public class Calculator {
public double add(double number1, double number2) {
return number1 + number2;

JUnit has many features that make it easy to write and run tests
– Separate test class instances and class loaders for each unit test to avoid side effects
– JUnit annotations to provide resource initialization and reclamation methods; @Before, @BeforeClass, @After, @AfterClass
– A variety of assert methods to make it easy to check the results of your tests
– Integration with other popular tools like ANT, MAVEN, and popular IDEs like Eclipse, NetBeans, IntelliJ and JBuilder

import static org.junit.Assert.*;
import org.junit.Test;
public class CalculatorTest { (1)
public void testAdd() { (2)
Calculator calculator = new Calculator(); (3)
double result = calculator.add(10, 50); (4)
assertEquals(60, result, 0); (5)

At 1, we start by defining a test class. The only restriction is that the class must be public; we can name it whatever we like.

At 2, we mark the method as a unit test method by adding the @Test annotation.
A best practice is to name test methods following the testXXX pattern. JUnit doesn’t
have method name restrictions. You can name your methods as you like; as long as
they have the @Test annotation, JUnit will execute them.

At 3, we start the test by creating an instance of the Calculator class (the “object
under test”), and at 4, as before, we execute the test by calling the method to test,
passing it two known values.
At 5, the JUnit framework begins to shine! To check the result of the test, we call
an assertEquals method, which we imported with a static import on the first line of
the class.


Assert Method

– assertTrue(String message, Boolean test)

– assertFalse(String message, Boolean test)

– assertNull(String message, Object object)

– assertNotNull(String message, Object object)

– assertEquals(String message, Object expected, Object actual) (uses equals method)

– assertSame(String message, Object expected, Object actual) (uses == operator)

– assertNotSame(String message, Object expected, Object actual)



Jasper BI And Pentaho Reports Performance Optimization

In any BI solution, we create a lot of reports and dashboard. Often the developers come across the problem that the solution is not performing well, the reports are

taking a lot of time to load.
In this blog we will try to discuss the different approaches which could be used to optimize the performance.

a. Having good hardware. This ensures that the processing is not taking much time

b. Having good internet speed : This reduces latency

c. Reports with many pages: Let’s say if we are generating a report which is having 100 pages. Good part in viewing the same via BI (via web browser) (Jaspersoft and Pentaho) is that we don’t need to load all the pages before we can show it to the end user. As soon as the pages are loaded end customer can start viewing it. Hence he
can start seeing first couple of pages immediately and meanwhile the remaining pages will keep on loading and rendering to the user. But the case is not the same with other formats, if a client wants to download the report in excel/pdf/csv format, he can only download once the entire report has been loaded.

d. Caching at BI level : Jaspersoft & Pentaho, both are having caching facility. We can increase the size of the caching memory, thus this leads to improved performance.

e. Caching at database level : If we feel that some of the reports / db generally takes  a lot of time, what can be done is we can create temp tables inside db and we can put the values required for those reports in this tables. This way we can have multiple level of caching (application level and database level).

f. Optimization at query level: Using the best practices of writing sqlqueries (like avoid writing nested queries, including too many Joins etc).

g. Database indexing and controlling of users on database will also increase performance of report execution.

h. Remove unused variables / parameters / table columns with in the report.

i. Maximum Usage of built in functions which are available with in jasper and pentaho reports for calculating total / percentages / avg and for type conversions like

string to integer etc will also increase the performance of report.

j. Whenever a BI solution is developed directly hitting the transactional database, the performance is compromised. Hence its always suggested that there is separate

reporting database (or data warehouse).

k. If there are multiple input parameters (like the one present in the below snapshot), then on the submit button only the report should change.
BI Report Performance
Otherwise it will result in firing many queries to the database. Also for the report/dashboard, ideally there should not be multiple input parameters.

Navin Dasari
Helical IT Solutions

Dimensional Modeling Process

Dimensional Modeling

Dimensional modeling is a technique, used in data warehouse design, for conceptualizing and visualizing data models as a set of measures that are described by common aspects of the business. It is especially useful for summarizing and rearranging the data and presenting views of the data to support data analysis.

Dimensional Modeling Vocabulary


A fact table is the primary table in a dimensional model where the numerical performance measurements of the business are stored. The term fact represents a business measure that can be used in analyzing the business or business processes. The most useful facts are numeric and additive.

For e.g. Sales, Quantity ordered


Dimension tables are integral companions to a fact table. The dimension tables contain the textual descriptors of the business. Each dimension is defined by its single primary key, which serves as the basis for referential integrity with any given fact table to which it is joined.

Dimensions are the parameters over which we want to perform Online Analytical Processing (OLAP). For example, in a database for analyzing all sales of products, common dimensions could be:

  • · Time
  • · Location/region
  • · Customers
  • · Salesperson


Dimensional Modeling Process

Identify business process

In dimensional modeling, the best unit of analysis is the business process in which the organization has the most interest. Select the business process for which the dimensional model will be designed. Based on the selection, the requirements for the business process are gathered.

At this phase we focus on business processes, rather than on business departments, so that we can deliver consistent information more economically throughout the organization. If we establish departmentally bound dimensional models, we’ll inevitably duplicate data with different labels and terminology.

For example, we’d build a single dimensional model to handle orders data rather than building separate models for the sales and marketing departments, which both want to access orders data.

Define grain

The granularity of a fact is the level of detail at which it is recorded. If data is to be analyzed effectively, it must all be at the same level of granularity. As a general rule, data should be kept at the highest (most detailed) level of granularity.

For example, grain definitions can include the following items:

  • A line item on a grocery receipt
  • A monthly snapshot of a bank account statement

Identify dimensions & facts

Our next step in creating a model is to identify the measures and dimensions within our requirements.

A user typically needs to evaluate, or analyze, some aspect of the organizations business. The requirements that have been collected must represent the two key elements of this analysis: what is being analyzed, and the evaluation criteria for what is being analyzed. We refer to the evaluation criteria as measures and what is being analyzed as dimensions.


If we are clear about the grain, then the dimensions typically can be identified quite easily. With the choice of each dimension, we will list all the discrete, text like attributes that will flesh out each dimension table. Examples of common dimensions include date, product, customer, transaction type, and status.


Facts are determined by answering the question, “What are we measuring?” Business users are keenly interested in analyzing these business process performance measures.


Creating a dimension table

Now that we have identified dimensions, next we need to identify members , hierarchies & properties or attributes of each dimension that we need to store in our table.

Dimension Members:

A dimension contains many dimension members. A dimension member is a distinct name or identifier used to determine a data items position. For example, all months, quarters, and years make up a time dimension, and all cities, regions, and countries make up a geography dimension.

Dimension Hierarchies:

We can arrange the members of a dimension into one or more hierarchies. Each hierarchy can also have multiple hierarchy levels. Every member of a dimension does not locate on one hierarchy structure.

Creating a fact table

Together, one set of dimensions and its associated measures make up what we call a fact. Organizing the dimensions and measures into facts is the next step. This is the process of grouping dimensions and measures together in a manner that can address the specified requirements. All candidate facts in a design must be true to the grain defined in step 2. Facts that clearly belong to a different grain must be in a separate fact table.


Archana Verma

Helical IT Solutions

How to Use LUA as Web Script

Lua is commonly described as a “multi-paradigm” language, providing a small set of general features that can be extended to fit different problem types, rather than providing a more complex and rigid specification to match a single paradigm. Lua, for instance, does not contain explicit support for inheritance, but allows it to be implemented with metatables. Similarly, Lua allows programmers to implement namespaces, classes, and other related features using its single table implementation; first-class functions allow the employment of many techniques from functional programming; and full lexical scoping allows fine-grained information hiding to enforce the principle of least privilege.

lua base language is light – the full reference interpreter is only about 180 kB compiled– and easily adaptable to a broad range of applications.

Lua is a dynamically typed language intended for use as an extension or scripting language, and is compact enough to fit on a variety of host platforms.

It supports only a small number of atomic data structures such as

– Boolean values,

– Numbers (double-precision floating point by default), and

– Strings.

Typical data structures such as

– Arrays,

– Sets,

– Lists, and

– Records

can be represented using Lua’s single native data structure,

  •  the table, which is essentially a heterogeneous associative array.
Ex. a_table = {} -- Creates a new, empty table

NOTES--Lua programs are not interpreted directly from the textual Lua file, but are compiled into byte-code which is then run on the Lua virtual machine. The compilation process is typically transparent to the user and is performed during run-time, but it can be done offline in order to increase loading performance or reduce the memory footprint of the host environment by leaving out the compiler.

How  to Add Buttons in LUA and Handling their  Events…

Button1 = exSection:option(Button, “_button_click_event “,translate(“Button Section”))      Note:- exSection  is defined in a map, and button is defined in that exSection-section

Button1.title = translate(“CLICK  HERE TO PERFORM BUTTON-ACTION”)

Button1.inputtitle=translate(” START “)


Function Button1.write(self, section, value)        –Defining  button-click event function

luci.sys.call(‘/etc/init.d/test_daemon  start’)        –Firing a system-command   to start a script


How  to call Templates from LUA and calling LUA-function from  template

testTemplate = exSection:option(Value, “_custom”, translate(“Template Testing… “))

testTemplate.template = “Template-Testing”



<%-  …… LUA code which you want to execute in your lua page…..  %>

<input type=”text ” name=”cbi.read.testintvl” id=” cbi.read.testintvl ” value=”<%-=h_testlval-%>”></input>                           ——Your HTML Code  here h_testlval   value comes from lua page.


Note:- In LUA page we called any input-type(EX.Button,TextBox..)From their name.

Function Definitions in lua page

function testTemplate.lastmodifiedtime(self, section)

local mdtime=st.mtime(“/usr/lib/test.txt”)

return os.date(‘%c’, mdtime)


How  to create MAP and SECTION in lua page

local m, s

m=Map (“testing”, translate (“TEST-APPLICATION CONFIGURATION “))

s = m: section (TypedSection, “testagent”, translate (“TESTAGENT”))

s.anonymous = true

Predefined Function of lua when using map.

I.            m.on_init=function(self)

———–Things which you want to execute before your lua page load———


II.            m.on_before_commit = function(self)

——Before commit———-


III.            m.on_after_commit=function(self)

—–After Commit———-


LUA Uses by different Organization in their Application.

  1. Adobe Photoshop Lightroom uses LUA for its user interface.
  2. Apache HTTP Server can use LUA anywhere in the request process (since version 2.3, via the core mod_lua module).
  3. Cisco uses LUA to implement Dynamic Access Policies within the Adaptive Security Appliance.
  4. iClone, a 3D real-time animation studio to create animation movies uses Lua in the controls of its new physic simulation.
  5. MySQL Proxy uses LUA for its extensions
  6. VLC media player uses LUA to provide scripting support.

Wikipedia since March 2013 LUA is used as a new templating language, and its template library is growing.


Pushpraj Kumar,

Helical IT Solutions

How to solve pg_tblspc problem in postgres

This blog will talk about How to solve pg_tblspc problem in postgres

pg_tblspc problem postgres

Sometime when we are doing some operation on Postgres database, we faced this error.

I faced  it,  when I  was  trying  to  get  values  from  database.

This issue throws error that could not open directory “pg_tblspc/<1234…x>”: No such file or directory

To  solve this  issue  you  first  have  to  create  dump  of  your  database. Steps are ahead …


  • psql -h localhost -p 6666 -U postgres -d <yourDBname>-f “E:/xyz/abc.p”

(It will create dump file of your database)

  • Then delete your old database after dumping it.

(dump database <yourDBname>)

  • Then again restore it from your dump file…


Or, Its better to use your backup file while restoring..

If  problem  is  still  there then we need  to first restore that <abc.p/abc.sql> file  on  our  local  machine

  • then  again  create  (.p)  file  as  dump  file
  • And restore your postgres by using this new dump file…

You will not come across this problem again……..


Another way:-

1) Backup data with pg_dump


pg_dump -i -h localhost -p 5432 -U postgres -F c -b -v -f

“/usr/local/backup/” old_db


To list all of the available options of pg_dump, please issue following command.


  • Ø pg_dump -?


-p, –port=PORT database server port number

-i, –ignore-version proceed even when server version mismatches

-h, –host=HOSTNAME database server host or socket directory

-U, –username=NAME connect as specified database user

-W, –password force password prompt (should happen automatically)

-d, –dbname=NAME connect to database name

-v, –verbose verbose mode

-F, –format=c|t|p output file format (custom, tar, plain text)

-c, –clean clean (drop) schema prior to create

-b, –blobs include large objects in dump

-v, –verbose verbose mode

-f, –file=FILENAME output file name


2) Restore data with pg_restore


pg_restore -i -h localhost -p 5432 -U postgres -d old_db -v



To list all of the available options of pg_restore, please issue following command.


  • Ø pg_restore -?


-p, –port=PORT database server port number

-i, –ignore-version proceed even when server version mismatches

-h, –host=HOSTNAME database server host or socket directory

-U, –username=NAME connect as specified database user

-W, –password force password prompt (should happen automatically)

-d, –dbname=NAME connect to database name

-v, –verbose verbose mode


Pushpraj Kumar,

Helical IT Solutions

Creating a Federated Tables inside MySQL

What is Federated Table?

A federated database system is a type of meta-database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database.

A Federated Table is a table which points to a table in another MySQL database instance (mostly on another server). It can be seen as a view to this remote database table. Other RDBMS have similar concepts for example database links.

federated table

Federated Tables Concept In MYSQL(linux) :

  1. Federated Storage Engine : Using Federated Storage Engine,we will be able to access data from tables of Remote databases rather than local databases. This Federated Engine concept is available from MYSQL 5.3.0 Onwards

Example : Suppose you want to access some data from the DataBase that is on one server1,

Also another set of data from different server2, but data on these two servers are not accessible through out the world , Rather these are available only on one server3 , so what you can do is :

Simply create “Federated” tables for all those tables from which you want to access data on server3 from server1 & server2. When using a FEDERATED table, queries on the server3 are automatically executed on the remote (federated) tables. No data is stored on the local tables.

NOTE :  By default Federated Storage Engine will not enabled on the mysql

Federated Table 2But if you want to verify whether federated engine is on or off, “SHOW ENGINES” sqlquery can help to identify the status.

federated table 2If value of Support column is “YES” then Federated Engine is Turned On, Else If it is “No” it means it is Turned Off.

To enable federated engine ,

  1. You add the line ‘federated’ to the /etc/my.cnf file after  [mysqld] section
  2. restart the mysql server

Federated Table 1

  1. Federated Table Creation : It has two elements :
    1. Remote server : Data is physically stored in this server
    2. Local Server :      No data is stored here,simply you can fire the query and you will be able to see your required output.

Example :

  1. Suppose you have a table on the Remote Server (say Master Table : “student_test”)

Data Base Name is :  STUDENT_DB that you want to   access by using a Federated table


CREATE TABLE student_test(







2. Next,You have to create a table on the local database server(say Federated Table :”Student_federated)” DataBase Name is STUDENT_FEDERATED to access the data on the Master Table


CREATE TABLE Student_federated(






CONNECTION=’scheme://username:[email protected]:port/DBname/TableName;

For the above scenario , The connection String should be :

CONNECTION=’mysql://root:[email protected]:3306/STUDENT_DB /Student_test;


Sequence of execution:

– When a client issues an SQL statement that refers to a FEDERATED table, the flow of information between the local server (where the SQL statement is executed) and the remote server (where the data is physically stored) is as follows:

– The storage engine looks through each column that the FEDERATED table has and constructs an appropriate SQL statement that refers to the remote table.

– The statement is sent to the remote server using the MySQL client API.

– The remote server processes the statement and the local server retrieves any result that the statement produces (an affected-rows count or a result set).

If the statement produces a result set, each column is converted to internal storage engine format that the FEDERATED engine expects and can use to display the result to the client that issued the original statement.


Now You are able to get your required output by firing sql query



Helical IT Solutions