Article
7 Ways to reduce ETL project cost and risk
By Karthik Muthuraman, Practice Manager - Business Objects Data Services
ETL as a tool has evolved over time, from being used only for data warehouse to being used anywhere there is significant movement of data and data transformation required. And it continues to evolve. However, ask any veteran project manager where the biggest risk and cost escalation lies, the overwhelming answer you will receive is that it is in development phase. So does adopting any particular software development methodology reduce the risks and cost? There is no fit-all methodology. Instead, emphasizing on a few key objectives and best practices will definitively help.
- Keep it simple:
The ETL designs should be kept simple and unnecessary complexities avoided. As techies we have a tendency to over optimize and in the process make things complicated. Simple designs are easier to develop and maintain than complicated ones. Simplicity means freedom from hardship, effort or confusion. This leads to fewer errors, faster development, there by leading to reduced testing time, and smaller production support teams. - Template based development:
Develop a generic design for all common design scenarios that will be encountered and use this as a template. This will bring uniformity and reusability of design. A few good examples would be developing a template for the ETL mappings to be used for type 1, 2, and 3 dimensions in a data warehouse. CDC (change data capture) and simple pass through mappings are also prime candidates. - Reusable Objects:
Custom coding must be minimized, while the use of reusable objects must be maximized. This leads to less coding error which in turn leads to lesser development time. - ETL tool:
An often overlooked factor is the ease of use of the tool. Let us assume there is a tool "A" that is highly efficient with many bells and whistles but is overly complex. The probability of getting the expected end result and/or low cost is low. An easy to use tool means less dependence on consultants, so that existing staff could be trained to do the development work. There are some myths to be busted about the open source tools. The initial cost may be low, but it should also be checked to see if the long term cost will be low. Questions to consider that may escalate the cost include: how well is it supported, is there a professional service and option for SLA, and how well established it is. A mailing list isn't too useful when your mission critical server is down through no fault of your own. Often you will find answers to error messages and common mistakes, but all too often it turns out to be a bug that you must somehow work around. Your product review should include questions like: how mature the product is and if it is featured in reviews like Gartner survey, how is it ranked? - Documentation:
Every step of the project needs to be documented from the preliminary analysis to the final code sign off. To ensure this happens have a checklist at every stage. This checklist will have all the documents that need to be created and signed off before moving to next phase. And a template for all the documents should also be created. A standard practice is to have a common folder in a shared network drive called project templates, and have the different phases of the project as a subfolder with the related template in it. - Avoid scope creep:
Define your deliverables, then understand your priorities, and justify your scheduling decisions based on priorities. If it is not in the deliverables then it is not a priority. Expect that there will be scope creep. Implement Change Order forms early and educate the project drivers on your processes. A Change Order form will allow you to perform a cost-benefit analysis on changes requested by the project drivers. - Strong Change Control & Impact Analysis:
All development must be done before going into production. However, there may be situations where a new set of jobs are released periodically along with changes to old ones. If not managed properly, this can lead to the nightmare scenario. So a strong change management coupled with a good impact analysis, before any new release or change, is necessary. Impact analysis includes asking questions like: Will the new job intervene with other jobs due to conflicting batch time, will this lead to resource conflicts or will this impact any jobs downstream or is it impacted by any job upstream, Will this impact the end users data? For instance, let us say there is a master table that is accessed & used by 10 reports. This leads to interesting questions such as: will any change in data lead to key data getting out of sync or data getting stale in some reports and so on?





