Automated data testing for disparate data sources
Disparate source data testing is a major challenge in IT industry. In our last article 10 Key Performance Indicators to check for quality data after ETL, we have seen how the data leak may happen and different checks during ETL development. In this article we will see how to perform automated data testing for disparate data sources such as xml to Excel, MySQL to SQL, Json to Postgres etc.
Helical has come up with new and unique product based on Helical Scrunch for comparing data from desperate data sources and can produce result at individual cell level. The terminology used in below explanation is Expected and Actual. Expected data is records to be compared with which may be in one of the source or may be prepared by developer. Actual represents Actual source data may be from xml, csv, mysql or json or any source data. After comparing the data, following differences are available for analysis.
- Additional columns in Expected
- Additional columns in Actual
- Additional records in Expected
- Additional records in Actual
- Mismatch Records from Expected
- Number of Records in Expected
- Number of Records in Actual
- Rejected/Mismatched/Additional records in flat file / database column for analysis
- Row number of rejected/mismatched record
- Column name of rejected/mismatched record
- Actual rejected/mismatched record from Expected
Using this tool it is effortless task to compare diverse data sources in no time with required details for data comparison.
- MySQL – SQL Server
- Excel – MySQL
- XML – MySQL
- CSV – SQL Server
- MySQL – MySQL
Limitation which is found in this case is that if your source data is not moved as it is to target, then the developer needs to create Expected data to be compared with target and this is time consuming. But, if you have data available in two different sources, then task of data comparison becomes effortless.
- Data Migration
- Data Testing
- Data warehousing
- Data synchronization
- Data consolidation
Usually when developing larger projects having 100s of database tables or data from different sources it is difficult to perform data testing on each record and verify whether the records are properly inserted into the target or not. This tool ensures that even after performing changes to ETL, if you execute this comparison test, it will provide you with the comparison results for all the tables. This is one time setup and you can execute it multiple times for data verification.
- Can perform or simulate data testing for every ETL execution
- Saves time and efforts of developers for regression testing
- Ensures correctness of data
- Improves quality of project
- With slighter modification, this can be utilized for incremental and initial load as well.
So, if you have any daunting task of comparing records and identifying defaulters in financial, relational or any diverse data sources, please contact us and we would give you the solution which will match up to your expectations.
Vaibhav Waghmare (DW Architect), Helical IT Solutions