Disclaimer: The author does not have any commercial connection with the vendor of this tool.
I recently worked with an organization where we looked at a tool that could analyze all the source code for the ETL jobs of their data warehouse and provide several benefits to the organization as a result. The tool is called CAST, and comes from a company called Cast Software (http://www.castsoftware.com/). During the presentation of the tool and its features by the CAST team, I was very impressed by what the tool could do in terms of analyzing source code and, even more, by what it could do with the information gathered from that analysis. I was, however, hesitant to implement such a tool in this organization. I didn’t feel that the data warehouse group operated at a sufficient level of maturity to fully exploit the features of this software, and feared that, as a result, we would be implementing a tool that would never get used.
Nevertheless, the organization wanted to implement the tool, and they were, in fact, able to mature in their software development methodologies more quickly than I had expected. In the end, the data warehouse group was able to take advantage of the tool, although not as completely as I would have liked. In this article, then, I will examine in more detail why we decided to implement the tool, what it could do out of the box, what it couldn’t do out of the box (but could do later, with a little customization), what we did with it, and what we didn’t do with it but probably should have.
The Challenges
Because the organization’s data warehouse had been built in bits and pieces over a number of years, there was a lack of cohesive technical documentation. It existed, of course, but in documents spread over hundreds of project directories, and in the heads of the internal staff and the external staff, if they were still there. This was always a problem in that a) they were not sure what they had or how it was built, b) doing an impact analysis across all the ETL tools involved (Unix shell scripts, DataStage, custom PL/SQL, and some Java) was very labour intensive, and c) with all the ongoing projects for the data warehouse, it was impossible to properly evaluate the quality of the deliverables from the vendors (most importantly, the source code).
Getting More Done Better With Less
They originally looked at CAST to solve the above problems. The tool could analyse all the source code across all the ETL tools we had and, in doing so, could provide a) complete technical documentation of the production data warehouse platform, b) end-to-end impact analysis, and c) a quantitative analysis of the source code delivered by the vendors, with industry-standard measures of the quality of the code. Having all this would, in turn, allow the organisation to a) significantly reduce the learning curve for new internal staff and external vendors, b) significantly reduce the time and increase the quality of the impact analysis studies they carried out, and c) enable automated enforcement of the development standards and guidelines, reducing the number of errors in the deliverables whilst improving the overall code quality.
Code Analysis
CAST can parse just about all the source code you can throw at it, regardless of the language or tool it was written in. If they don’t already have a parser written for a specific tool, their universal parser can be tweaked very quickly. In our organization, we had some source code in Unix shell scripts, some ETL written using DataStage, and some just in PL/SQL. The reporting tool used was Business Objects. CAST had no trouble with anything we gave it. The one issue we did have was that the PL/SQL parser was not equipped to deal with dynamic SQL and PL/SQL that was data driven. Much to my surprise, it took the CAST Switzerland team only a handful of days to update their PL/SQL parser to handle this problem! When they were done, we had a tool that could analyze all the source code in our platform, along with the databases (mainly Oracle), so that we ended up with complete technical documentation (albeit using the included Enlighten tool, with a bit less on the generated web pages). To put it succinctly, they managed to do in 12 days what the organisation had not been able to do in almost 10 years!
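To illustrate why dynamic, data-driven SQL is awkward for any tool that reads source code as text, here is a minimal Python sketch. It is not a description of how the CAST parser works, which I never saw the inside of. The regular expression, the table names, and the `control_rows` list standing in for statements stored in a control table are all assumptions made up for the example.

```python
import re

# Naive assumption: a table name is whatever follows FROM, JOIN, INTO or UPDATE.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][\w.$#]*)", re.IGNORECASE
)


def table_refs(sql_text):
    """Return the set of table names a purely textual scan can see."""
    return {name.upper() for name in TABLE_REF.findall(sql_text)}


# Static PL/SQL: the table names are right there in the source.
static_source = "INSERT INTO dw_sales SELECT * FROM stg_sales"

# Dynamic, data-driven PL/SQL: the source only shows a variable being executed.
dynamic_source = "EXECUTE IMMEDIATE v_sql"

# Hypothetical rows from a control table holding the statements run at run time.
control_rows = [
    "INSERT INTO dw_customer SELECT * FROM stg_customer",
    "UPDATE dw_product SET active = 'N'",
]


def resolved_refs(rows):
    """Scan the statements stored as data, which the source scan never sees."""
    refs = set()
    for statement in rows:
        refs |= table_refs(statement)
    return refs


if __name__ == "__main__":
    print(sorted(table_refs(static_source)))   # tables visible in the source
    print(sorted(table_refs(dynamic_source)))  # nothing visible in the source
    print(sorted(resolved_refs(control_rows))) # tables hidden in the data
```

The scan of the dynamic source comes back empty because the real statement lives in the data, not in the code, so a parser has to be taught where to look for it before its dependencies can show up in the documentation.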
Impact Analysis
With all the source code analyzed, and all the database objects analyzed, the CAST Enlighten tool could generate an automated impact analysis in the time it would take to get a coffee before starting to do the analysis manually. In the past at this organization, doing a thorough impact analysis required extensive reviews of documents when you could find them, reading all the source-to-target documents if they were up-to-date, talking to the developer if they were still on-site, and in the end, reading all the source code line by line if you could get access to it. As you can imagine, this was not only time-consuming, but fraught with errors and omissions. It could take weeks on a larger change, but with CAST, it could be done in an afternoon, with just a little digging inside the tool itself.
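Conceptually, once every script, job, and table has been analyzed, impact analysis comes down to walking a dependency graph. The Python sketch below shows the idea under stated assumptions: the object names and the `FEEDS` mapping are hypothetical, and this is a generic breadth-first traversal, not the algorithm Enlighten actually uses.

```python
from collections import deque

# Hypothetical dependency edges: each key feeds the objects listed against it.
FEEDS = {
    "src.orders": ["load_orders.sh"],
    "load_orders.sh": ["stg.orders"],
    "stg.orders": ["ds_job_orders"],
    "ds_job_orders": ["dw.fact_orders"],
    "dw.fact_orders": ["pkg_sales_agg", "bo_orders_report"],
    "pkg_sales_agg": ["dw.agg_sales"],
    "dw.agg_sales": ["bo_sales_report"],
}


def impacted_by(changed, feeds=FEEDS):
    """Breadth-first walk returning everything downstream of `changed`."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for target in feeds.get(node, []):
            if target not in seen:  # guard against cycles and repeat visits
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


if __name__ == "__main__":
    # Which objects need a look if the staging table changes?
    for name in impacted_by("stg.orders"):
        print(name)
```

The traversal itself is trivial. The hard part, and the part that used to take weeks by hand, is building an accurate graph across shell scripts, ETL jobs, stored procedures, and reports in the first place.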
Code Quality
Another big issue the organization suffered from was that there were never the resources within the organization to do a proper inspection and review of the source code deliverables from the vendors. In some cases, the projects were just too big to manage with a small internal staff; in other cases, just too small to worry about. This led to vicious, lengthy cycles of testing and re-developing, as new bugs were not found until user acceptance testing or, even worse, during the pre-production system testing, and in a few cases during use after production rollout. Not to mention the inevitable commercial disagreements between the vendors and the organization, where one would blame the other for the delays and the errors. Since the organization didn’t have the internal resources to do this due diligence on the vendor deliverables, a lot of problems like the ones mentioned above crept into every project, pushing back delivery schedules and blowing the budgets of the projects.
Because the CAST tool comes with a huge number (>600 when we used it) of quality measures for source code, and because you can modify them to your taste or add new ones, all the delivered source code could be checked against these quality measures even before it was manually tested. I didn’t pretend to understand all the types of measures it would calculate against the source code (things like Cyclomatic Complexity are the things of rocket scientists, I’m sure), but what I did understand was that they could tell you in an instant how well written the code was, how well it complied with your internal standards as well as industry standards, and where in the code you might have a hotspot later on (if the complexity measure was too high). All these things could be used for automated enforcement of the development standards and guidelines of the organization, and acted as an independent third party to help in the decision to accept or reject the source code being delivered by a vendor.
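For the curious, a measure like cyclomatic complexity is less mysterious than it sounds: it is roughly one plus the number of decision points in a piece of code. The sketch below approximates it for Python source using the standard `ast` module, purely because that is easy to run. It is a simplified count, not CAST's rule set, and the sample function and the threshold of 10 are assumptions chosen for illustration.

```python
import ast

# Node types treated as decision points in this simplified count.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source):
    """Approximate McCabe complexity: one plus the number of decision points."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds one extra path per additional operand.
            decisions += len(node.values) - 1
    return decisions + 1


# Hypothetical piece of delivered code to be measured.
SAMPLE = '''
def classify(row):
    if row is None:
        return "missing"
    for value in row:
        if value < 0 or value > 100:
            return "out of range"
    return "ok"
'''

MAX_COMPLEXITY = 10  # hypothetical acceptance threshold, not a CAST default


def passes_gate(source, limit=MAX_COMPLEXITY):
    """Return True when the code is simple enough to accept without review."""
    return cyclomatic_complexity(source) <= limit


if __name__ == "__main__":
    print(cyclomatic_complexity(SAMPLE), passes_gate(SAMPLE))
```

A check like this, run against every delivery, is the kind of objective gate that lets you have the accept-or-reject conversation with a vendor on the basis of numbers rather than opinions.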
The Solutions
To address the challenge of maintaining up-to-date, complete and accurate technical documentation of the data warehouse platform, CAST was used to analyze the production platform after every deliverable. The Enlighten tool was made available to all the internal and external staff working on data warehouse projects, and the website created by CAST was made available to the entire organization.
And because the organization also had several ongoing projects, they would analyze the UAT environments from time to time as well. This allowed the impact analysis to capture changes to a future version of the data warehouse platform as well as the current one.
The code quality analysis was reviewed on a periodic basis to keep track of the “hotspots”. These problem areas would then be included in the scope of any project that touched on them, so that the number of potential problem areas could be reduced over time.
The Benefits
By using CAST, the organization was able to reduce the costs of projects affecting the data warehouse while increasing the quality of the deliverables. These cost savings were realized by having all the technical information of the platform in one place, always up-to-date, and easily available and navigable. As well, by automating many of the otherwise manual tasks that never seem to get done, the cost of testing and doing impact analysis was significantly reduced. Also, the ability to set the expectations of quality, and to measure against those expectations objectively, led to better, more bug-free code being delivered on time and under budget.
Extending the Benefits
There are a few things that could have been done with CAST and its features that were not taken advantage of by this organization.
Recommendation
I highly recommend this product to anyone who has a large system or systems that they need to get under control or not lose control of. CAST helps to implement a layer of automated governance that can be otherwise far more expensive to implement, and, as mentioned above, can certainly reduce project costs through increased productivity and increased quality of deliverables. I haven’t worked on a project since where I didn’t wish CAST had been implemented there as well.
If you have any questions, I urge you to contact them. You can also leave questions in the comments below and I will answer them to the best of my ability.



4 Responses to “CAST – Software analysis to the extreme! – Part 1”

  1. Edwin says:

    Hi Chris,
    the solution you describe here looks as powerful as the initial challenge seems commonplace. Before choosing CAST, did you evaluate alternatives?

    Thanks
    Edwin

    • Hi Edwin,

      I can’t say that we did an exhaustive search for similar tools, but we didn’t find anything that came close. I know that some ETL tools and some reporting tools have their own built in features for doing impact analysis and checks on the code they generate, but we didn’t find anything that worked across so many tools and languages.

      If there is a competitor, please let me know and we will try to set up a review of that tool here at BI Review!

      Thanks,

      Chris

  2. nick horne says:

    Hi Edwin,
    Very interesting article. We have Cast installed at NHSBT in the UK and have used it in a limited sense for a while. However, as all of our code is currently PL/SQL with lots of dynamic and application reference data (e.g. select statements) stored in tables, we found CAST lacking. Reading your report, I am very pleased to note that CAST have upgraded their PL/SQL parser to include dynamic SQL and table-based SQL. This is something that we raised with CAST a while ago but that they had no intention of resolving at the time, so as I said, I am actually over the moon that this has now been resolved.

    Interestingly, our codebase consists (now) of lots of Oracle PL/SQL code, 1 million lines, and 10k’s worth of Java/JSP/JEE code, which CAST is very good at analysing. I am now going to investigate upgrading our version of CAST from 6.2.0 to the latest and making good use of it. I will post another comment when this is achieved.
    Regards Nick

    • Hi Nick,

      The Cast guys in Switzerland were excellent to work with, and I believe it took some custom work by them to get it all done. I recommend calling them (ask for Nik Hirt), as I am sure they can make the code available to you if they haven’t already added it to the Cast codebase.

      Chris

© 2009 Business Intelligence Review