2013-07-24
Static Code Analysis

The typical development environment for an average software developer sucks. Whether it’s Eclipse with a bunch of plugins or a purist Vim setup, the situation is the same. When you make changes line by line, it’s impossible to see how the whole program and its health are evolving as you go. Static code analysis to the rescue!

Why static analysis?

Static analysis gives an instant boost to your productivity. It gives you a helicopter that you can use to lift yourself above mere file and folder views. It also hands you a brain extension and an extra pair of eyes for finding those pesky security flaws in your program.

You may claim that code size and complexity metrics are pure nonsense, but at the end of the day they are meaningful, for example in project budgeting. Interesting data starts to pour out when the numbers are compared against projects with similar characteristics. As an example, look at the statistics of Firefox and Chrome.

Firefox vs Chrome

The majority of their code is written in the same set of languages, nearing 10 million lines. Firefox is heavier on JavaScript, while Chrome leans more on C. With these naive statistics I can already relate them to my earlier projects, and understand the resourcing and expertise involved in the task (yes, both are major engineering efforts!). While I cannot guarantee the correctness of the Ohloh statistics, I think I have a better grasp of the effort than I would have without seeing this data.

What is static analysis?

When you analyze software without executing it, you are doing static analysis. In that sense compilation or plain code reading could be thought of as primitive static analysis, but let’s not focus on those. Wikipedia lists a comprehensive set of available static analysis programs. I selected just the multi-language ones to get at least some sort of comparison done, and then picked the most interesting products for a deeper look. This list was spiced with a few additions from a quick Google search. Since the tools offer a mixed bag of functionality, I ended up creating the following categorization:

  1. Line-by-line: Detect security flaws, standards violations (e.g. against MISRA) and concurrency-related problems. Errors and warnings are typically shown next to the lines causing them. In this category lies your extra pair of eyes.

  2. Metrics: Calculate values for SLOC, cyclomatic complexity, Halstead complexity and a bunch of other metrics. These values usually require some interpretation or a reference point: what is good and what is bad?

  3. Technical Debt: This goes one step further than the previous category, expressing technical debt in numbers (e.g. Euros) that people who are not software-savvy can understand. Yes, you too want to hear this number!

  4. Duplicate Code: Are you again in a situation where you have to fix that same bug in five different places? Duplicate detection will help you by pointing out the refactoring candidates.

  5. License Compliance: Shipping a proprietary software product? Are you sure that you are not violating open source licenses by shipping some compiled copy-paste fragment in your binary? License compliance checkers scan for potential use of open source code, and report the associated licenses to you.

  6. Exploration: How would you feel about lifting relevant abstractions from your source code, illustrating them visually, and making them searchable and browsable? Good tools support various visualizations, like package structure and dependency reports (imports, inheritance, call graphs, references and so on).

  7. Documentation: Everyone hates writing documentation, so why not extract the relevant parts directly from the source code? These tools find, for example, class and method declarations with their associated source code comments, and output nice HTML pages and PDFs for project newbies to consume.

  8. Architecture Compliance: This one is my favorite. Architecture compliance tools help you define constraints on the code, and help you follow the agreed guidelines. Sometimes we need to cut corners, but let’s at least be aware of those cases, and have a plan to correct them in the future!
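
To make the “Metrics” category a bit more concrete, here is a minimal Python sketch that approximates cyclomatic complexity per function using the standard `ast` module. It is only an illustration under simplifying assumptions, not how any of the tools discussed here work: the set of node types counted as branches is my own choice, nested functions are also counted into their enclosing function, and the `classify` sample is made up. The point is that the code under analysis is parsed but never run.

```python
import ast

# Node types counted as decision points in this simplified approximation
# (an assumption for illustration; real tools use more complete rules).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> dict:
    """Return an approximate cyclomatic complexity for each function in `source`."""
    tree = ast.parse(source)  # parsing only: the analyzed code is never executed
    result = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            complexity = 1  # a function without branches has exactly one path
            for child in ast.walk(node):
                if isinstance(child, BRANCH_NODES):
                    complexity += 1
                elif isinstance(child, ast.BoolOp):
                    # every extra operand of and/or adds a short-circuit path
                    complexity += len(child.values) - 1
            result[node.name] = complexity
    return result


# Hypothetical sample input, kept as a string so that it is analyzed, not run.
SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

if __name__ == "__main__":
    print(cyclomatic_complexity(SAMPLE))
```

A number like this means little on its own, which is exactly why the metrics need a reference point from comparable code.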

Several tools spice up different categories by collecting data over time (i.e. trends). This helps you to understand whether your codebase and project are heading in the intended direction.

Tools

In the “Line-by-line” category, there are plenty of tools which brag about decades of expertise in the field with related credentials. The most impressive ones are GrammaTech CodeSonar, Klocwork Insight and the Coverity Advisor family. If you are looking for an instant action list on how to improve your code, you can probably find it here.

The next tool, SonarQube, covers the categories of “Metrics”, “Technical Debt”, and “Duplicate Code” quite well. It also has plenty to offer in the “Line-by-line” category. SonarQube is typically run as part of a continuous integration system and monitored through a web browser. Being open source and easily extensible, it’s an attractive choice for teams with a strong engineering spirit.
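
As a rough idea of what duplicate detection involves, the following Python sketch reports runs of identical lines that show up in more than one place. It is a naive illustration and not SonarQube’s algorithm: the `find_duplicates` function, the three-line window and the `FILES` sample are all assumptions made for this example, and real detectors typically compare tokens rather than raw lines.

```python
from collections import defaultdict


def find_duplicates(files: dict, window: int = 3) -> list:
    """Return groups of (file, line) locations where the same `window`
    consecutive non-blank lines occur, ignoring leading/trailing whitespace."""
    seen = defaultdict(list)
    for name, text in files.items():
        # keep (line number, stripped content) for non-blank lines only
        lines = [
            (number, line.strip())
            for number, line in enumerate(text.splitlines(), 1)
            if line.strip()
        ]
        for start in range(len(lines) - window + 1):
            chunk = tuple(content for _, content in lines[start:start + window])
            seen[chunk].append((name, lines[start][0]))
    # a chunk seen at more than one location is a refactoring candidate
    return [locations for locations in seen.values() if len(locations) > 1]


# Hypothetical input: file name -> file content.
FILES = {
    "a.py": "x = load()\nx = clean(x)\nsave(x)\nprint('a')\n",
    "b.py": "print('b')\n\nx = load()\nx = clean(x)\nsave(x)\n",
}

if __name__ == "__main__":
    for group in find_duplicates(FILES):
        print(group)
```

Each reported group lists where the repeated block starts, which is enough to point a developer at code worth extracting into one shared function.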

The king of the hill in the “License Compliance” category seems to be the Black Duck Suite. Black Duck is also the company hosting the Ohloh site, thus bringing value to open source communities. Hats off to them!

The “Exploration” and “Architecture Compliance” categories are best represented by Structure101 and NDepend. NDepend has an innovative code query language, which is used to monitor metrics and design rules. Structure101’s focus on dependency management is refreshing, and its examples are illustrative. Both tools support dependency structure matrices (DSMs) and dependency graphs to visualize the architecture. See the example screenshots below:

Structure101 screenshots
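
The kind of design rule that architecture compliance tools enforce can be sketched in a few lines of Python. This is not how Structure101 or NDepend work internally, and it does not use NDepend’s query language; it is a minimal sketch assuming a made-up layering rule (`FORBIDDEN`) and a made-up set of modules (`MODULES`), with imports extracted via the standard `ast` module.

```python
import ast

# Hypothetical rule set: package -> packages it must not import.
FORBIDDEN = {
    "ui": {"db"},
    "domain": {"ui", "db"},
}


def imported_packages(source: str) -> set:
    """Collect top-level package names imported by `source`, without running it."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # absolute "from package import name" statements only
            found.add(node.module.split(".")[0])
    return found


def violations(modules: dict, rules: dict = FORBIDDEN) -> list:
    """Return sorted (module, forbidden_package) pairs that break the rules.

    `modules` maps a dotted module name such as "ui.views" to its source text.
    """
    broken = []
    for name, source in modules.items():
        package = name.split(".")[0]
        for target in imported_packages(source) & rules.get(package, set()):
            broken.append((name, target))
    return sorted(broken)


# Hypothetical codebase: dotted module name -> source text.
MODULES = {
    "ui.views": "import db.models\nfrom domain import orders\n",
    "domain.orders": "from db import session\nimport math\n",
    "db.models": "import sqlite3\n",
}

if __name__ == "__main__":
    for module, target in violations(MODULES):
        print(f"{module} must not import {target}")
```

Running a check like this in continuous integration makes every cut corner visible, which is the awareness that the “Architecture Compliance” category is about.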

A couple of notable documentation generators are Doxygen and Doc-O-Matic. Doxygen is an open source alternative with plenty of projects using it, while Doc-O-Matic is a commercial alternative with a more polished UI. Both seem to get the job done.
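
The core idea behind such generators, pulling declarations and their comments out of the source, can be shown with a small Python sketch. It is not how Doxygen or Doc-O-Matic are implemented; `extract_docs` and the `SAMPLE` input are hypothetical, and the sketch only handles Python docstrings via the standard `ast` module.

```python
import ast


def extract_docs(source: str) -> list:
    """Return (kind, name, first docstring line) for each documented
    class or function found in `source`."""
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:  # undocumented declarations are skipped
                kind = "class" if isinstance(node, ast.ClassDef) else "function"
                entries.append((kind, node.name, doc.splitlines()[0]))
    return entries


# Hypothetical sample input, analyzed as text and never executed.
SAMPLE = '''
class Parser:
    """Turn raw text into tokens."""

    def parse(self, text):
        """Split text on whitespace."""
        return text.split()


def helper():
    return None
'''

if __name__ == "__main__":
    for kind, name, summary in extract_docs(SAMPLE):
        print(f"{kind} {name}: {summary}")
```

A real generator would go on to render these entries as HTML or PDF; the extraction step is the part that counts as static analysis.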

Conclusion

So should you use static analyzers? If you take care of a large codebase, you definitely should. For a project that is not doing any static analysis at the moment, I would install SonarQube with the right set of plugins, and hook it into the development process. Furthermore, if you care about the design of your software system, adopting one of the architecture compliance tools should be a no-brainer.

Even without thinking too hard, I can see plenty of opportunities for tool vendors to make their products more attractive. For example, why not engage users with more advanced visualizations, a touch experience and improved social aspects? Dropping a few pieces of gamification into the UI could also help adoption.