Sunday, June 22, 2014

Code Review Reality

It's been a while since I posted. However, I've been more active in development than ever, and naturally I've developed more opinions on certain aspects of coding. Call it a byproduct of experience and observation, coupled with a reporter's instincts (if you didn't know, I was a newspaper reporter in a past life).

One thing that I have latched onto is code review.

I'll be blunt about the reason for this research: I instinctively never bought the idea that ad-hoc code review is a meaningful way of identifying and reducing defects. My research and observation up to this point indicate that code review is generally adopted based on anecdote, is usually implemented in a short-cut fashion ("desk inspections"), and, as such, has little to no real, consistently measurable effect on reducing defects.

When you've finished derisively snorting and if you haven't already dismissed this blog post as ignorant, please continue reading.

As some background, over my 2.5 decades as a professional developer, here's how I've seen code review introduced at most organizations:


  • A team of coders creates a product, very quickly, to get a prototype off the ground.
  • That product begins to gain traction.
  • "In the wild" bugs, become embarrassing, and sometimes costly.
  • Management pushes engineering on the matter.
  • Senior coders state that most problems are due to the prototype quality of the code.
  • Senior coders request time to reengineer the tiers of the system.
  • Management rejects this recommendation in favor of delivering selling-point features.
  • Management insists on a process to measure bugs.
  • Bug counting at phases of the development-release cycle is introduced.
  • Management demands that the bugs at each phase of the cycle be drastically reduced or eliminated.
  • Senior coders again state that the prototype nature of the software makes this very difficult.
  • Management again rejects requests to reengineer.
  • Hiring begins in earnest: "more coders means more productivity" (a well-known fallacy).
  • Hiring rates for senior level coders are disappointing.
  • Management looks to things like "process improvement".
  • Code Review, supposedly cheap, easy to implement, and a quality-improvement "no-brainer", is introduced.
  • It doesn't get the anticipated results; bug patterns appear more or less the same over time.
  • The data is pushed around, argued, determined to be insufficient to draw real conclusions.
  • More processes are introduced (common reluctance to abandon "investment" in a failed system).
  • Nothing really changes.
  • The process is often perceived as ponderous and/or punitive by coders.


Regarding the latter, the litany of articles on how to avoid code review being perceived as punitive and ponderous should make it clear that the sentiment is common. I won't bother citing references for such an easily predictable fact.

I'm sure many of you reading this have seen a similar pattern. Does it mean code review is pointless and ineffective?

Not at all. I want to be clear on this point: I am NOT saying that code review has no purpose.

What I am saying is that short-cut "desk inspection" code review, without the prerequisite processes to improve the code that is submitted for review in the first place, and without building in preparation time to conduct reviews properly, can reduce productivity and morale and increase cost, without any real result in terms of reducing defects.

So if you are going to implement code review, be realistic about what can be achieved with it. If you are doing desk inspection of fragments, the only results you should expect are improved formatting and better adherence to internal standards, and then only if the organization actually has such standards to refer to. If it does not (and most don't), then ad-hoc desk-inspection code review is pointless.

To keep this blog post short of a long white paper, I'll present a handful of findings that correlate with my own overall observations.

Consider this statement from "Agile Testing and Quality Strategies: Discipline Over Rhetoric" by Scott Ambler and Associates. Note that he makes an important distinction regarding Agile vs. Serial development:


"First and foremost, I consider the holding of reviews and inspections to be a process smell which indicate that you have made a mistake earlier in the life cycle which could be very likely rectified by another, more effective strategy (non-solo development, for example) which has a much shorter feedback cycle. This goes against traditional wisdom which says that reviews are an effective quality technique, an opinion which is in fact backed up by research -- in the book Practical Software Metrics for Project Management and Process Improvement Robert Grady reports that project teams taking a serial (non-agile) approach that 50 to 75 percent of all design errors can be found through technical reviews. I have no doubt that this is true, but my point is that although reviews are effective quality techniques in non-agile situations that in agile situations there are often much better options available to you than reviews, so therefore you should strive to adopt those quality techniques instead."

There's another important point I infer from this: if you are finding lots of bugs in your code reviews, then your coders need to up their game. An obvious bug caught in review should be quite rare.


Let's talk about the cost of catching a bug in the coding cycle. I found this graph, taken from the previous source, to be pretty much dead on.


Note that defects found via review or inspection are more costly to fix than defects caught by practices that should be in place before code review, such as Pair Programming, Continuous Integration, and Test Driven Development. If bugs are getting through to review...you have a problem that is costing you money.

Another interesting read was the IEEE white paper, "The Impact of Design and Code Reviews on Software Quality: An Empirical Study Based on PSP Data" by Chris F. Kemerer, Member, IEEE Computer Society, and Mark C. Paulk, Senior Member, IEEE.

This paper deals with much more formal and highly academic software development setups than your typical "agile" (I use the term loosely) startup or small development shop has in place (again, stressing the distinction between serial and agile development). But even with such an environment as the control, it confirms that preparation prior to a review, such as familiarizing oneself with the overall system, requirements, and design, is crucial to achieving any reduction in defects.

"Despite consistent finding that inspections are generally effective, Glass has summarized the contradictory empirical results surrounding the factors that lead to effective inspections. Weller found that the preparation rate for an inspection, along with familiarity with the software product, were the two most important factors affecting inspection effectiveness."

So, if you are not intimately familiar with a given area of the software product, and do not have time to prepare for the review to get a complete understanding of the design and requirements, you should not be expected to conduct reviews that measurably decrease defects. Anything short of this discipline throws the value of code review and inspection, as far as reducing defects, into debate:

"In summary, although the benefits of inspection are widely acknowledged, based on these competing views and conflicting arguments the discipline has yet to fully understand the fundamental drivers of inspection costs and benefits."

Let's focus on "preparation" for a code review.

The faux-agile notion of code review is a quick "desk inspection": you get a fragment of code, look it over quickly, give it a pass, or open an issue for comment. There's a little back and forth, something might get tweaked (e.g. "we put returns before our brackets"), the coder eventually gets a thumbs up, and commits.

I reject the notion that enough meaningful and costly-to-fix bugs can and will be caught in such a review process to make it worthwhile. I found this corroborated in numerous sources that were far more critical of the subject, but I decided to lead with the IEEE paper because it speaks of the power and value of code review...if it is done correctly. That means proper preparation, which means more time, which means more cost.

So if you are implementing code review and expecting it to be a cheap way of reducing defects...well, I hope your job doesn't depend on the results.

Here is one of the more outspoken sources.
From "Code Review - The Meaningless Ritual" by Daniel Pietraru:

"Done properly code reviews are extremely time consuming. Today’s programmers are more efficient, due to tools and better languages, then 30 years ago. They can create more code faster. But the amount of code inspected per hour is about the same because they are not smarter than people 30 years ago. Imagine 1000 lines of code that one programmer can write in 1-2 days.

With a traditional code inspection 4 people are required to inspect this code at a rate of 200 LOC/hour after they prepared the code review by reading the code and understanding the design. If we are optimistic the whole thing will require 6-8 hours per developer. This means a total of 24-32 hours spent on inspecting code produced in 8-16 hours. Probably this kind of review can find a lot of bugs in the code but is it worth the price? How many teams get away with estimates including 2-3 hours of inspection for every hour of development?"

People might say those production numbers are unrealistic, so I'll give real data. In one month, I pushed 84 commits, with 2000+ deletions and 5700+ changes (numbers from GitHub). Now push those numbers out to a team of, say, 15 coders. In fact, cut the number in half if you think I'm unusually productive. Then do the review-cost math submitted by Pietraru; a rough sketch follows below.
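To make that math concrete, here is a back-of-the-envelope sketch in Python. The 200 LOC/hour inspection rate and the four-reviewer team come from Pietraru's figures above; treating my 5700+ changed lines as reviewable LOC and assuming a 15-coder team are my own simplifications, so read the output as an order-of-magnitude estimate, not a precise cost.

    # Back-of-the-envelope review-cost estimate using Pietraru's figures
    # (4 reviewers, ~200 LOC inspected per hour) against my own monthly
    # GitHub numbers. Treating "changes" as reviewable LOC, and the
    # 15-coder team size, are assumptions, not measurements.

    LOC_PER_REVIEWER_HOUR = 200    # Pietraru's traditional inspection rate
    REVIEWERS_PER_INSPECTION = 4   # Pietraru's traditional inspection team
    MY_MONTHLY_CHANGED_LOC = 5700  # one month of my commits, per GitHub
    TEAM_SIZE = 15                 # hypothetical team from the paragraph above

    def inspection_hours(changed_loc):
        """Total reviewer-hours to formally inspect `changed_loc` lines."""
        hours_per_reviewer = changed_loc / LOC_PER_REVIEWER_HOUR
        return hours_per_reviewer * REVIEWERS_PER_INSPECTION

    one_coder = inspection_hours(MY_MONTHLY_CHANGED_LOC)   # ~114 hours
    whole_team = one_coder * TEAM_SIZE                      # ~1710 hours
    print(f"One coder, one month:    {one_coder:.0f} reviewer-hours")
    print(f"Team of {TEAM_SIZE}, one month:  {whole_team:.0f} reviewer-hours")
    print(f"Even cut in half:        {whole_team / 2:.0f} reviewer-hours")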

EXPENSIVE. And that's just the desk inspection version. With proper inspection, those reviews could have taken a week...just for one coder, me.

And to be frank, if I had to get through reviews of all my code, that productivity would have dropped significantly. There are stats for you: if I reduce my productivity to zero, I won't introduce any bugs, thereby proving the value of reviews.

Pietraru also goes on to raise a very interesting question: where did code reviews come from in the first place? Again, numerous sources corroborate the following:

"Code inspections or reviews as we know them come from the hardware world. Initially Michael Fagan [from IBM, 1976] used them for hardware design reviews. This fact is very significant because uncaught defects in hardware design are pretty expensive.

Also defects in low level software were very expensive at the time. Imagine you write a piece of code that has to be burned on a ROM chip (non erasable). You cannot afford to wait and test/debug your code until after the ROM was destroyed. But this is not the case anymore. Unit tests and running the code in debuggers for success and failure paths are pretty cheap with modern IDEs and testing frameworks."

This reinforces the notion stated in the "Agile Testing and Quality Strategies" source cited earlier; in a shop that purports to be "agile":

"...you should strive to adopt those [other] quality techniques instead".

In closing:

There is simply too much information available showing that ad-hoc "desk-inspection" code reviews, in and of themselves, do not produce a measurable reduction in defects. In its useful form, code review is a disciplined practice involving a significant investment of time, which translates to money. It met with success in hardware programming because the investment was offset by the irretrievable cost of producing bad hardware. But this has nothing to do with most software development today; modern IDEs and productivity tools allow us to write FAR more code than hardware programmers did in the 70s, and we can push out hot patches, update UIs without swapping out server bits, and so forth. We're not worried about the cost of 100,000 bad ROMs.

Ultimately, such short-cut reviews can raise overall costs and have other negative effects. There are other more effective cultural changes and processes available that can really make a difference in the quality of your product.