The Paradox of Writing Perfect Code
Static code analysis versus Santa Claus and the Easter Bunny

Don't you love looking at a good piece of code? I'm talking about the kind of code where the design is so sound that the code practically wrote itself, where there were no nasty surprises at implementation, where it was 100% feature complete and bug-free, and you didn't have to patch it up a bunch of times. Maybe I'm squarely in the land of Santa Claus and the Easter Bunny, but I believe, deep down, all developers want to write that perfect piece of code. Unfortunately, real life has other ideas. Deadlines, unclear or conflicting requirements, ridiculous scope, being human - all these things keep us from the promised land of perfect code.

But here's the rub: though it may be satisfying to dream about, it's likely that you'll never produce truly perfect code for real-world applications. You'll sit down to write a piece of code, you'll do the best you can, taking into account everything you know about how the system works, how your piece of code fits into that system, and so forth. But we all know there will be mistakes - probably lots of them. And you'll do some testing, and the QA guys will do some testing, and the beta customers will do some testing, and then poof, the business-minded people in your organization will decide it's good enough to be released. At that point, the code isn't perfect, and every time you have to change that released code, you introduce risk into the system. Thousands or hundreds of thousands or millions of people are using it as is, and if you decide to make changes it might work differently for those people. This is risky, and the tools you use to help write your code must be cognizant of that fact.

Tools for Writing Imperfect Code
This article is about a certain kind of tool - static code analysis - that can be used to help you in writing good code. Not perfect code, but good code. As introductory computer science classes increasingly move to Java (even the high school AP computer science curriculum is Java-based now), the tools available to C/C++ developers should move over to Java as well. Over the last decade, as Java exploded in popularity, there have been tremendous breakthroughs in the area of practical static code analysis for defect detection. Today many commercial tools are available to do static code analysis of your C, C++, and Java code. I work for one such tool provider and I'll discuss our experience expanding from C/C++ into Java here. We'll explore how some of the concepts we used to analyze C/C++ code translated into the Java realm and the lessons we've learned in making this type of technology practical to help you write good code. First, I'll dig into some discussion of architecture and then I'll give you my philosophy on finding bugs automatically and the true purpose of these tools.

C++ and Java: What's the Same?
From a code analysis perspective, C++ and Java have a lot in common. For dataflow analysis, both require the tool to build an internal representation of the code. This means breaking each function or method into basic blocks, computing a control flow graph, and having an analysis engine that can push checks down each possible execution path in the methods while keeping track of the relevant variables and their values. With this, each check can then pull out relevant constructs while analyzing the code. For example, if I'm looking for NULL dereference problems, my NULL checker simply looks for places where objects are compared against NULL or assigned a NULL value, and then lets the analysis push down a path until it sees a dereference while that value is NULL.
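To make the path-walking idea concrete, here is a minimal sketch of a NULL checker that pushes one tracked variable down every path of a tiny control flow graph. It is an illustration only, not how any particular commercial tool is built: the class and member names (NullCheckerSketch, Block, Edge, Effect, State) are hypothetical, it tracks a single variable, and it assumes the graph has no loops.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy path-walking null checker. Every name here is hypothetical. */
class NullCheckerSketch {
    /** What a basic block does to the one variable being tracked. */
    enum Effect { NONE, ASSIGN_NULL, ASSIGN_NON_NULL, DEREF }

    /** What the checker currently believes about the variable. */
    enum State { UNKNOWN, NULL, NON_NULL }

    /** A basic block, identified by a source line for reporting. */
    static final class Block {
        final int line;
        final Effect effect;
        final List<Edge> out = new ArrayList<>();

        Block(int line, Effect effect) {
            this.line = line;
            this.effect = effect;
        }

        /** Adds a CFG edge; impliesNull is TRUE or FALSE on the arms of a null test, otherwise null. */
        void addEdge(Block target, Boolean impliesNull) {
            out.add(new Edge(target, impliesNull));
        }
    }

    /** A control flow edge plus what taking it tells us about the variable. */
    static final class Edge {
        final Block target;
        final Boolean impliesNull;

        Edge(Block target, Boolean impliesNull) {
            this.target = target;
            this.impliesNull = impliesNull;
        }
    }

    /** Returns the lines where the variable may be dereferenced while null (acyclic CFG assumed). */
    static List<Integer> check(Block entry) {
        List<Integer> defects = new ArrayList<>();
        walk(entry, State.UNKNOWN, defects);
        return defects;
    }

    private static void walk(Block block, State state, List<Integer> defects) {
        switch (block.effect) {
            case ASSIGN_NULL:     state = State.NULL; break;
            case ASSIGN_NON_NULL: state = State.NON_NULL; break;
            case DEREF:
                if (state == State.NULL && !defects.contains(block.line)) {
                    defects.add(block.line);
                }
                break;
            default: break;
        }
        for (Edge edge : block.out) {
            State next = state;
            if (edge.impliesNull != null) {
                State implied = edge.impliesNull ? State.NULL : State.NON_NULL;
                if (state != State.UNKNOWN && state != implied) {
                    continue; // the branch contradicts what we already know, so skip this path
                }
                next = implied;
            }
            walk(edge.target, next, defects);
        }
    }

    public static void main(String[] args) {
        // Shape: a null test, an assignment on the non-null arm, then a dereference.
        Block test = new Block(171, Effect.NONE);
        Block assign = new Block(172, Effect.ASSIGN_NON_NULL);
        Block use = new Block(175, Effect.DEREF);
        test.addEdge(assign, Boolean.FALSE);
        test.addEdge(use, Boolean.TRUE);
        assign.addEdge(use, null);
        System.out.println(check(test));
    }
}
```

A real engine has to cope with loops, many variables, and method calls, which is where most of the engineering effort goes; the sketch only shows the core of walking paths while carrying state.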

Listing 1 shows an example from the Struts framework. Notice that on line 171, the developer compares body against null. Unfortunately, the developer probably meant to write that comparison with == instead of !=. In the case where the reference is null, the code will skip over the assignment on line 172 and dereference the body variable on line 175. Oops. Listing 2 shows you what that code looks like in the interface of a static code analysis tool. The analysis engine pushes the checker down all the paths in this function. The checker notices the comparison against null, keeps track of the body value as being null when the condition on line 171 is false, and then reports a problem when it's dereferenced as null. Simple enough, right?
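For readers without the listings at hand, the following is a hypothetical reconstruction of the pattern just described. It is not the Struts source: the class name, method names, and the use of trim() are assumptions made only to show how an inverted null test leads to a dereference of null.

```java
/** Hypothetical illustration of an inverted null test; not the actual Struts code. */
class InvertedNullCheckDemo {
    /** Buggy version: the test is inverted, so a null body is never replaced. */
    static int buggyLength(String body) {
        if (body != null) {        // the author most likely meant body == null
            body = "";
        }
        return body.trim().length(); // throws NullPointerException when body is null
    }

    /** Fixed version: a null body is replaced before it is used. */
    static int fixedLength(String body) {
        if (body == null) {
            body = "";
        }
        return body.trim().length();
    }
}
```

Note that the buggy version has a second symptom: any non-null body is silently overwritten with the empty string.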

False Positives and Java
Well, almost right. The biggest problem that the designer of any static code analysis tool faces is false positives. What is a false positive? Basically, any time the analysis reports a defect where there is none, that's a false positive. Some people call this noise, but I like to stay away from that term. Noise is a problem, but it's a different problem. To better understand a case that might trip up a static code analysis tool, take a look at Listing 3. The Struts code from the previous example has been slightly modified to introduce a data dependence between the value of body and the value of body_tracker. Notice that after the test of body against NULL, the value of body_tracker will be 5 if body is not NULL and 12 if body is NULL. As such, there's no longer a NULL dereference on line 177 because it's guarded by the check of body_tracker. This example is simple enough, but it may fool a simple analysis engine into reporting a defect where there really is no problem at all, because no possible execution path leads to body being dereferenced when NULL.
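A small stand-in for that situation might look like the following. Again, this is a hypothetical sketch and not the modified Struts listing: only the values 5 and 12 and the idea of a tracker variable come from the description above, while the class and method names are invented.

```java
/** Hypothetical illustration of a dereference guarded by a correlated variable. */
class CorrelatedGuardDemo {
    static int safeLength(String body) {
        int bodyTracker;
        if (body != null) {
            bodyTracker = 5;   // body is known to be non-null on this arm
        } else {
            bodyTracker = 12;  // body is known to be null on this arm
        }
        if (bodyTracker == 5) {
            // Only reachable when body is non-null, so this dereference is safe.
            return body.length();
        }
        return 0;
    }
}
```

A checker that walks paths without relating bodyTracker to body would combine the null arm of the first test with the true arm of the second and report a defect that can never happen.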

False positives cause developers to lose trust in a tool. Why? Because the tool is wrong, and if it's wrong more often than it's right, eventually the user won't trust the tool at all. Fortunately, the techniques available for reducing false positives in C/C++ analysis translate rather nicely into the Java space. We simply provide additional checkers to search for "false paths" through the code - paths that can never be executed when the program is running. These additional checkers keep track of data flow in different ways, and any time they find a path that can't be executed, it's pruned from the analysis. This "false path pruning" is a key way to significantly reduce the false positive rate.
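As a rough sketch of the pruning idea, the code below treats a path as a sequence of assignments and assumed branch outcomes over integer variables, and rejects any path whose assumptions contradict a value already known. This is a deliberately simplified model under stated assumptions: the names (FalsePathPruningSketch, Step, isFeasible, prune) are hypothetical, only constant values are tracked, and a production tool would reason about far richer conditions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy false-path pruning over constant-valued integer variables. Names are hypothetical. */
class FalsePathPruningSketch {
    /** One step on a path: an assignment, or a branch outcome the path assumes. */
    static final class Step {
        enum Kind { ASSIGN, ASSUME_EQ, ASSUME_NE }

        final Kind kind;
        final String var;
        final int value;

        Step(Kind kind, String var, int value) {
            this.kind = kind;
            this.var = var;
            this.value = value;
        }
    }

    /** Returns false when a branch assumption contradicts a value already known on the path. */
    static boolean isFeasible(List<Step> path) {
        Map<String, Integer> known = new HashMap<>();
        for (Step step : path) {
            Integer current = known.get(step.var);
            switch (step.kind) {
                case ASSIGN:
                    known.put(step.var, step.value);
                    break;
                case ASSUME_EQ:
                    if (current != null && current != step.value) {
                        return false; // branch requires a value the variable cannot have
                    }
                    known.put(step.var, step.value);
                    break;
                case ASSUME_NE:
                    if (current != null && current == step.value) {
                        return false; // branch excludes the only value the variable can have
                    }
                    break;
            }
        }
        return true;
    }

    /** Keeps only the paths that could actually execute. */
    static List<List<Step>> prune(List<List<Step>> paths) {
        List<List<Step>> feasible = new ArrayList<>();
        for (List<Step> path : paths) {
            if (isFeasible(path)) {
                feasible.add(path);
            }
        }
        return feasible;
    }
}
```

Encoding the body_tracker case with a variable bodyIsNull, the path "assume bodyIsNull == 1, assign tracker = 12, assume tracker == 5" is rejected, so the NULL checker never gets to report the dereference guarded by that last test.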

C++ and Java: What's Different?
There are a few key differences in analyzing C/C++ code versus Java code. Unlike C/C++, Java affords us more luxury in choosing which code to analyze. We chose to analyze bytecode instead of source code. There are tremendous advantages to looking for defects at the bytecode level. The biggest, of course, is the fact that the code has already been compiled - you don't have to deal with compiling the code and juggling the many different flavors of build systems out there. The disadvantage (if you can call it that) of analyzing bytecode instead of source code is that you need some way to tie the errors you find back to the source code. This means that the bytecode needs to have debugging symbols in it or the errors you produce won't be of much help in actually fixing the code.

The types of defects that you look for are also different. Defects in Java code have different runtime implications than their C/C++ counterparts. A NULL dereference throws a NullPointerException in Java, whereas in C/C++ it is undefined behavior that typically crashes the process. A resource leak in C/C++ happens any time heap-allocated memory isn't freed, but in Java, resource leaks occur under different circumstances - when an object holds something the garbage collector can't clean up for you, such as an open file, socket, or database connection, and the code never explicitly releases it.
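The Java flavor of a resource leak can be sketched with a stand-in resource that counts how many handles are still open. Everything in this example is hypothetical (the Handle class, the counter, and the simulated failure); it assumes Java 7 or later for the try-with-resources form, and on older releases a try/finally block plays the same role.

```java
/** Hypothetical illustration of a Java resource leak and one way to avoid it. */
class ResourceLeakDemo {
    /** Number of handles opened but not yet closed. */
    static int openHandles = 0;

    /** Stand-in for a file, socket, or connection that must be closed explicitly. */
    static final class Handle implements AutoCloseable {
        Handle() {
            openHandles++;
        }

        int read() {
            throw new IllegalStateException("simulated I/O failure");
        }

        @Override
        public void close() {
            openHandles--;
        }
    }

    /** Leaks: when read() throws, close() is never reached. */
    static void leaky() {
        Handle handle = new Handle();
        handle.read();
        handle.close();
    }

    /** Does not leak: the handle is closed on every exit path. */
    static void safe() {
        try (Handle handle = new Handle()) {
            handle.read();
        }
    }
}
```

The garbage collector will eventually reclaim the Handle object's memory in both cases, but it has no way of knowing that close() should have been called, which is the kind of path a resource leak checker looks for.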

Interprocedural Analysis
One key feature of the most powerful static code analysis solutions is their ability to understand what happens when one method calls another. This not only helps in finding more complex defects in the code, it also reduces the false positive rate because analysis mistakes are less likely. However, the analysis of Java introduces a challenge in this regard because virtually every method call is, er - virtual. This means that it's not so clear which method body a virtual call will jump to when the code is being analyzed. It depends on the runtime type of the object the method is invoked on. While this is a problem in C++ as well, it tends to be less systemic due to the fact that most people developing C++ code (a) don't always use objects in their code and (b) don't make all their methods virtual. To tackle this problem with a practical code analysis tool, we've developed techniques to infer the possible runtime types of objects to determine which virtual methods could be invoked at any given call site. Of course, our technology must make the appropriate trade-offs to retain as much precision as possible while still scaling to analyze large real Java systems. There's some great research out there to discuss the academic techniques from which we draw our ideas for implementing this in the real world. If you're interested in learning more, check Google for "Rapid Type Analysis" or "Class Hierarchy Analysis."
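To give a feel for the simpler of those two techniques, here is a minimal sketch in the spirit of Class Hierarchy Analysis: given the declared type at a call site, it collects the method implementation that every known subtype would run. It is a toy model and not our product's implementation; the class name, the string-based hierarchy, and the example types are all assumptions, and interfaces and abstract classes are ignored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy Class Hierarchy Analysis over a string-based class table. Names are hypothetical. */
class ClassHierarchyAnalysisSketch {
    private final Map<String, String> superOf = new HashMap<>();
    private final Map<String, Set<String>> declared = new HashMap<>();

    /** Registers a class, its superclass (null for a root), and the methods it declares. */
    void addClass(String name, String superName, String... methods) {
        superOf.put(name, superName);
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    /** Walks up from cls to find the implementation that would run for the method. */
    String resolve(String cls, String method) {
        for (String c = cls; c != null; c = superOf.get(c)) {
            Set<String> methods = declared.get(c);
            if (methods != null && methods.contains(method)) {
                return c + "." + method;
            }
        }
        return null;
    }

    /** True when cls is the ancestor itself or inherits from it. */
    boolean isSubtype(String cls, String ancestor) {
        for (String c = cls; c != null; c = superOf.get(c)) {
            if (c.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    /** All implementations a call through the declared type could reach. */
    Set<String> possibleTargets(String declaredType, String method) {
        Set<String> targets = new TreeSet<>();
        for (String cls : superOf.keySet()) {
            if (isSubtype(cls, declaredType)) {
                String target = resolve(cls, method);
                if (target != null) {
                    targets.add(target);
                }
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        ClassHierarchyAnalysisSketch cha = new ClassHierarchyAnalysisSketch();
        cha.addClass("Shape", null, "area");
        cha.addClass("Circle", "Shape", "area");
        cha.addClass("Square", "Shape", "area");
        cha.addClass("UnitSquare", "Square"); // inherits area from Square
        System.out.println(cha.possibleTargets("Shape", "area"));
        System.out.println(cha.possibleTargets("Square", "area"));
    }
}
```

Rapid Type Analysis narrows the answer further by considering only classes that the program actually instantiates, which is one example of the precision-versus-scalability trade-off mentioned above.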

Noise
As I mentioned earlier, false positives are the number one challenge for static analysis. The number two challenge, and unfortunately a harder problem to deal with, is noise. How is noise different from a false positive? Noise is any issue reported by the analysis that, while technically correct from an analysis perspective, is something you just don't care about. It's obvious why this is so hard - it's completely subjective! Yet it's very important to address this to produce useful results. Take a look at Listing 4. Notice that on line 173 there's an extra space before the statement. Your static code analysis tool could report that extra space as a defect, but I'm willing to bet that most of us would consider that noise. The analysis isn't wrong per se - the statements don't line up - but I just don't care. Sure, this example is extreme, but there are less extreme cases that can be equally frustrating - even within checkers for things like NULL pointer exceptions. I've heard developers say, "Sure, but if that happens, we're totally hosed anyway, so it doesn't matter that it throws an exception there!" So the analysis can be spot on, producing an actual "defect" that could occur, but it's still reporting noise.

What To Look For
There's no silver bullet for eliminating noise, and there will always be a trade-off between the aggressiveness of an analysis and its false positive rate. But this brings me back to my initial point about the risk of changing your code. The purpose of a static code analysis tool, whether for C/C++ or for Java, is to help you find defects that would hurt the most, and to find them earlier in the software process. The purpose of these tools is not to find everything that's bad in your code, and that's a subtle distinction. There's too much risk associated with changing your code to address every little nitpick a static analysis tool can report. So when you're looking to add this type of technology to your arsenal of tools to help you ward off the bugs, take a close look at what it's going after. More "bugs" aren't necessarily better. Your time is valuable, and you don't want to waste it poring over noisy reports riddled with false positives. Fortunately, there are tools out there that are on your side.
