ETL Project Development Using the Agile-Waterfall-ETL Framework Method
Developing an ETL project that covers all of the tasks below takes a long time. Using the Agile-Waterfall-ETL Framework method can improve the speed, quality and delivery of ETL projects. The tasks required for a quality ETL project are as follows.
- Create a custom logging strategy and develop future jobs accordingly
- Create the database structure and tables for logging and analysis
- Create error logging, statistics and flow meter logging
- Create context groups and variables, and manage the global context
- Create and load a property file for variables
- Create job(s) for data migration
- Implement and test the complete job with a variable- and repository-driven approach
- Combine custom logging with the job
- Create a job for notification and alerting
- Create a job for checking source, staging and target connections
- Identify a test strategy for data migration
- Repeatedly test all the jobs with all components and their logic in place, including logging
- Implement connections and a strategy for pre- and post-processing of the data migration
- Create documentation for the complete ETL structure
- Follow naming conventions for jobs, logging, table names, flow paths, component names, comments, descriptions, etc.
- Control the flow across the execution stages of the various sub jobs
- Apply a "change once, apply to all" strategy for commonly used tasks
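The logging tasks in the list above can be sketched outside any particular ETL tool. The following Python example is a minimal sketch under stated assumptions, not the framework's actual implementation: it uses an in-memory SQLite database as a stand-in for a SQL Server logging database, and the table names (`etl_job_stats`, `etl_job_errors`), their columns and the `run_logged` and `open_log_db` helpers are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical logging schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS etl_job_stats (
    job_name   TEXT NOT NULL,
    started_at TEXT NOT NULL,
    ended_at   TEXT NOT NULL,
    status     TEXT NOT NULL,
    rows_out   INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS etl_job_errors (
    job_name  TEXT NOT NULL,
    logged_at TEXT NOT NULL,
    message   TEXT NOT NULL
);
"""


def _now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def open_log_db(path=":memory:"):
    """Open the logging database (SQLite stand-in) and create the log tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def run_logged(conn, job_name, job):
    """Run job() and record one statistics row, plus an error row on failure.

    job is any callable that returns the number of rows it processed.
    Returns True on success and False on failure.
    """
    started = _now()
    try:
        rows = job()
        status = "OK"
    except Exception as exc:
        rows = 0
        status = "FAILED"
        conn.execute(
            "INSERT INTO etl_job_errors VALUES (?, ?, ?)",
            (job_name, _now(), str(exc)),
        )
    conn.execute(
        "INSERT INTO etl_job_stats VALUES (?, ?, ?, ?, ?)",
        (job_name, started, _now(), status, rows),
    )
    conn.commit()
    return status == "OK"
```

Wrapping every job in one shared helper such as `run_logged` is one way to get the "change once, apply to all" behaviour for logging: a change to the log format is made in a single place and every job picks it up.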
With this approach you can implement various data migration and data warehousing projects. A sample setup is given below.
- Data migration/DWH
- Staging – SQL Server
- WH – SQL Server
- ETL tool – Talend Open Studio
- Data quality tool – Talend Data Quality
An Agile approach can be used to break the complete project down into user stories and to decide the iteration duration based on each iteration's complexity. Low-to-medium complexity iterations can be given one 5-day sprint, and medium-to-high complexity iterations two sprints of 5 days each. Where teams work in different time zones, an iteration can run from Wednesday to Tuesday; this helps communication between the teams and leaves the weekend as a buffer to catch up if required.
Within each sprint, a waterfall model of development can be used, consisting of the following steps:
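As an illustration of the Wednesday-to-Tuesday iteration window described above, the following Python sketch computes sprint start and end dates. A Wednesday-to-Tuesday window spans seven calendar days, which contain five working days with a weekend in the middle. The function names and the complexity labels (`low_medium`, `medium_high`) are hypothetical and chosen only for this example.

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday() counts Monday as 0

# Hypothetical labels for the two complexity bands in the text.
SPRINTS_BY_COMPLEXITY = {"low_medium": 1, "medium_high": 2}


def sprint_windows(not_before, sprints):
    """Return (start, end) date pairs for consecutive Wednesday-to-Tuesday sprints.

    The first sprint starts on the first Wednesday on or after not_before.
    """
    offset = (WEDNESDAY - not_before.weekday()) % 7
    start = not_before + timedelta(days=offset)
    windows = []
    for _ in range(sprints):
        end = start + timedelta(days=6)  # the following Tuesday
        windows.append((start, end))
        start = end + timedelta(days=1)  # the next Wednesday
    return windows


def plan_iteration(not_before, complexity):
    """Plan the sprint windows for one iteration of the given complexity."""
    return sprint_windows(not_before, SPRINTS_BY_COMPLEXITY[complexity])
```

For an iteration that can start no earlier than Monday 1 January 2024, `plan_iteration(date(2024, 1, 1), "medium_high")` gives two windows: 3 to 9 January and 10 to 16 January.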
- Requirement understanding
- Architecture planning
- Design and Development
- Unit, Deployment and UAT testing
The initial iteration, in which the new framework is implemented, takes one extra sprint. But once the architecture for the new environment is mature enough to accommodate the changes the client needs, subsequent development can be very rapid and may finish sooner than expected.
Time killers are the things that prevent a sprint from completing in its planned duration. They have to be addressed before starting the project implementation with the help of the (offshore) team.
- When the client, domain expert, development and test teams are in different time zones
- When the technical requirements for an iteration are not well documented. Iterations are often planned by business users, and the story descriptions lack technical detail
- When sufficient data is not available for unit and load testing
- When the target DWH is not stable. Changing and retrofitting earlier jobs takes more time
- When unit, deployment and UAT testing take more time because the teams are in different time zones
How to improve?
- Understand the project implementation completely and plan the number of iterations before starting development
- Have all test scenarios and acceptance tests ready for each iteration
- Plan sufficient time and effort for testing, separate from the development effort
- Divide the complete project into multiple stories in sequence
- Track new story development separately from bug fixes and changes to old iterations, in terms of time and effort
- Design a flexible, modular and expandable architecture
- Use proper and effective release planning
- Document each iteration effectively
Following the improvements above can directly reduce or avoid unexpected delays during iterations, save project time and effort, and improve your ROI.
Please get in touch with us to develop a new architecture for your ETL projects and to improve development quality and the speed of project completion.