About Analytical Databases
Companies and Corporates that build Business Intelligence solutions for their business, often want to consolidate their data that is spread across various applications, for reporting and analysis purposes. For that, they build datawarehouses or data marts as the foundation of their enterprise-wide BI solutions. These DWs have been implemented using traditional relational databases and successfully so. But as organizations add more and more data and users to these systems, the scabiltity and performance become crucial. One of the useful contributors to solving the problems of scability / performance could be the use of Analytical Databases or Datawarehousing DBs.
What are these Analytical Databases ?
Analytics databases are optimized to be used for retrieval, aggregation and analytics. They deliver benefits in a BI environment because they are designed for high performance in this environment.
These Analytical databases, or analytical platforms, span a range of technology from appliances and columnar databases to shared nothing, massively parallel processing databases. The common thread among them is that most are read-only environments that deliver exceptional price/performance compared with general-purpose relational databases originally designed to run transaction processing applications. Analytical databases are now a central component in any BI ecosystem, providing the necessary performance and scalability to support large numbers of analytical users and growing volumes of both structured and unstructured data.
Below are a few differences between transactional and analytical DBs:
- On-line transactional procesing (OLTP) databases are designed to host operational systems that run the business. Analytics databases step in when operational use of the data has ended, and the data are to be used for information and analysis.
- OLTP-oriented databases, are designed for high performance in an on-line transactional environment, where additions, updates, and deletions are being continuously made to individual or small numbers of records in a random manner over time, to data scattered accross tables. In an analytics database, there are mostly bulk uploads of data, deletions and update are rare / infrequent.
- Data integrity is critical to transactional data, OLTP databases are optimal for these.
- OLTP databases support environments where many concurrent users are retrieving, adding, updating, and deleting data on realtime basis. Analytics databases, on the other hand, support environments where very many concurrent users are retrieving data but adds, updates, and deletes are done periodically, on scheduled basis.
- In OLTP environments, retrievals are oriented towards single or few records where many or most columns from these records are required. In analystics databases, retrievals are oriented towards many or most records (statistical operations); few columns from these records are required.
What makes Analytical DBs faster and more scalable ?
Let’s look at some of the technologies I mentioned above to see how these make analytical DBs different. These are some but not all of the technologies that are used.
Columnar Databases :
A columnar database, also known as a column-oriented database, is a database management system (DBMS) that stores data in columns rather than in rows as relational DBMSs. The main difference between a columnar database and a traditional row-oriented database are centered around performance, storage necessities and schema modifying techniques. The goal of a columnar database is to efficiently write and read data to and from hard disk storage and speed up the time it takes to return a query. One of the main benefits of a columnar database is that data can be highly compressed allowing columnar operations — like MIN, MAX, SUM, COUNT and AVG— to be performed very rapidly. Another benefit is that because a column-based DBMS is self-indexing, it uses less disk space than a relational database management system (RDBMS) containing the same data.
Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data.
Massively Parallel Processing:
MPP (massively parallel processing) is the coordinated processing of a program by multiple processors that work on different parts of the program, with each processor using its own operating system and memory. An MPP system is also known as a “loosely coupled” or “shared nothing” system. MPP allows the warehouse to break up large data analytics jobs into smaller, more manageable chunks, which are then distributed to multiple processors. Analytics is run simultaneously – or in parallel – on each processor, then the results returned and synthesized.
In Memory analytics:
I don’t think I really need to elaborate on this one. They are databaes which load the source data into system memory in a compressed, non-relational format in an attempt to streamline the work involved in processing queries.
Online analytical processing (OLAP):
These are the databases, which store multidimensional “cubes” of aggregated data for analyzing information based on multiple data attributes.
This is a little bit about analytical databases and how they might add value to a DW/BI Solution.
Shraddha Tambe | Helical IT Solutions
Slide share – determine right analytic db