Metrics That Count (and it ain't "points")

I've lived through my fair share of software productivity and management frameworks. Kanban, agile, scrum, XP, waterfall...there are probably more that I'm trying to subconsciously suppress as I write this. Each of these is concerned in some respect with metrics. How do we evaluate how much work is getting done and who is doing it? How do we use metrics to improve "cycle times"? How do we improve "burn down"? How can we use these metrics at performance review time? Well, IMHO, none of these "metrics" really matter. What matters is shipped software, delighted customers, rising stock prices, and stress-free employees who get to go home at the end of the day and spend quality time with their spouses, significant others, and/or kids. Nothing else matters.

Management thinks it needs hard numbers and metrics to determine if the program is meeting its software development goals. It also needs numbers to determine which developers are meeting expectations and which are unsatisfactory. One problem is that software development is not assembly line work. In a Toyota factory, management has various mechanisms to determine the efficiency of individuals: number of defects, speed of the line, number of work stations mastered, the ability to train new employees, and so on.

Art vs Science

Software development is *not* assembly line work, no matter what new language or Big Data system the cool kids are all using. Software development is more "art" than "science". And management of "art", with its inherent lack of metrics, is so much harder to do than managing something with defined metrics and "numbers"...something like science or math.

Think I'm exaggerating? Do you think Pope Julius II evaluated Michelangelo based on the number of cherubs he painted on the ceiling of the Sistine Chapel every day? It's true that they argued over the scope of the work and budget, but the Pope never tried to evaluate The Master based on some concocted metric.

There is so much in software development that simply cannot be measured up-front. We generally call these things the "non-functional requirements." Some shops call them "-ilities". Performance is generally considered a non-functional requirement. We all try very hard to evaluate the performance of our software before it ships, often using tools such as LoadRunner. But more often than not we find that we have not met the necessary performance metrics once the software is in the customer's hands. So, how do you measure a performance metric early? You really can't. So, do we ding the team or individual for failing this metric?

The only metric that matters

... in software development is working, released features that a customer wants. If the feature has not shipped then you get zero credit. There is no A for Effort. Even if you are 80% feature-complete, you get no credit. If you shipped it but the customer doesn't like it, you get no credit either. I hear developers complain that it isn't their fault that the product failed...the requirements from the analysts were wrong and the developers merely implemented the requirements as given. I appreciate that argument, and I feel your pain, but the only metric that counts is a happy customer. When your company goes bankrupt because your product failed because of bad requirements, I'm not sure your mortgage company is going to care.

Other Metrics

There are lots of metrics management uses to evaluate us. Here are a few and my rebuttal as to why they don't work for project evaluation:

Tickets closed: I've worked at shops where each branch of code needed its own ticket for check-in purposes. And we always had 4 supported versions/branches open at any time. So a given bug may need 4 tickets. That's called "juking the stats."

Lines of code written: so now we are incentivizing people to write more code instead of elegant, short-and-sweet, supportable code. More lines of code = more bugs.

"There are two ways to design a system: one way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies." (paraphrasing C.A.R. Hoare)

Story Points: A quick Google search for "what is a story point?" yielded an article that pretty much concludes that you shouldn't use story points for metrics. Oops.

Velocity: this supposedly shows management the "rate of progress" in our development. In a perfect world our velocity will improve when we develop new tools that help us work smarter and faster, such as automation. But many times velocity is going up merely because developers are incentivized to make it go up, and they do this the simplest way possible...by cutting corners.

Code Test Coverage: there are lots of tools that will analyze how many lines of your code have no unit tests. I covered this in my blog post [[Paradox of Unit Testing?]]. This leads to people juking the stats again...writing a bunch of tests just to make the code coverage analysis tool happy.

Unit Tests Written: see [[Paradox of Unit Testing?]] again. I have worked with people who have refused to add new features to their code because there were too many unit tests that would need to be rewritten.

The last two are the WORST offenders. Most developers realize that lines of code, points, and tickets closed are ridiculous metrics, but many otherwise thoughtful developers fall for static code analysis and unit tests. I've seen whole teams spend entire sprints writing unit tests for code that was 5 years old with no reported bugs because there was no test coverage. It sucks the life out of the product and the team.

I once tried to remedy this situation, and I hate to admit this, by merely adding empty test bodies to get the metrics acceptable. And I've seen lots of people merely comment out broken tests to avoid getting weenied for that.
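To make that concrete, here is a minimal Python sketch of why hollow tests fool a test-count or coverage number. Everything in it is hypothetical and invented for illustration: the `apply_discount` function, its deliberate bug, and the tiny `run` helper that stands in for a real test runner. It is not taken from any project I worked on.

```python
def apply_discount(price, percent):
    """Hypothetical function under test, with a deliberate bug:
    it adds the discount instead of subtracting it."""
    return price + price * percent / 100


def hollow_empty_body():
    # Counts as one more passing test but exercises nothing.
    pass


def hollow_call_without_assert():
    # Executes every line of apply_discount, so a line-coverage tool
    # would report it as covered, yet no result is ever checked.
    apply_discount(100, 10)


def real_ten_percent_off():
    # The only test here that can fail: it states the expected result.
    assert apply_discount(100, 10) == 90


def run(tests):
    """Run each test function; return (passed, failed) counts."""
    passed = failed = 0
    for test in tests:
        try:
            test()
            passed += 1
        except AssertionError:
            failed += 1
    return passed, failed


hollow_result = run([hollow_empty_body, hollow_call_without_assert])
real_result = run([real_ten_percent_off])
print("hollow tests (passed, failed):", hollow_result)
print("real test (passed, failed):", real_result)
```

Both hollow tests pass against broken code, so the dashboard gets greener while the bug stays put. Only the test that actually asserts something notices.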

Why do we rely on metrics?

Numbers suggest control. Management likes control. But this is totally illusory. I've worked on teams where every sprint had some task called something like "refactor the Widget interface settings" that was assigned 15 points. If the team had a bad sprint they merely claimed these points to make the numbers look good. No work was ever really getting done and management had no idea. That same team, after a 12-month release cycle, had ZERO features to contribute to the product's release. Management was not happy. But every sprint showed progress and burndown.

Heisenberg and Perverse Incentives

When something is measured too much, the measurement itself will skew the system under measurement. Loosely, this is known as the Heisenberg Uncertainty Principle (strictly speaking it is the observer effect, but the two are popularly lumped together). I've worked on teams where there was an over-reliance on points as the metric to determine productivity. People knew they were being measured and they geared their activities to those things that could generate the most points. This usually meant pulling simple, low-risk cards with small point values. The more important, higher-point but longer-duration "architecture" cards were never worked on. They were too risky: you either got all of the points or none of them.
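The arithmetic behind that behavior is easy to sketch. The numbers below are pure assumptions picked for illustration (2-point cards with a 95% chance of finishing, five of which fit in a sprint, versus one 13-point card with a coin-flip chance of landing inside the sprint), and `expected_points` is a hypothetical helper, not a formula from any framework.

```python
def expected_points(points_per_card, p_finish, cards_per_sprint):
    """Expected points banked in one sprint when each card is
    all-or-nothing: full points if finished, zero otherwise."""
    return points_per_card * p_finish * cards_per_sprint


# Assumed values, for illustration only.
small_cards = expected_points(points_per_card=2, p_finish=0.95, cards_per_sprint=5)
architecture_card = expected_points(points_per_card=13, p_finish=0.5, cards_per_sprint=1)

print(f"five small cards:      {small_cards:.1f} expected points")
print(f"one architecture card: {architecture_card:.1f} expected points")
```

Under those assumptions the small cards bank roughly 9.5 expected points against 6.5 for the architecture card, and the architecture card has a real chance of scoring zero. A developer who is graded on points is behaving rationally by never touching it.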

Summary

I'm sorry about this long rant on software development metrics. Every team is unique and will determine how best to structure itself for optimal efficiency. So many of these metrics are an effort to shoehorn the team into a structure that management is comfortable with, even if that means forsaking the goals of the program. Let the team figure out what metrics it needs. Management's only metric of concern should be shipped features. Nothing else matters. When management evaluates its development teams on that metric, I believe you will see teams that are super-productive because they are no longer juking the stats and wasting time on anything that is not leading them to more shipped features.

But nothing is going to change. It is too risky for management to not have a solid metric by which to evaluate us and our products. Very depressing.

You have just read "Metrics That Count" on davewentzel.com. If you found this useful please feel free to subscribe to the RSS feed.

Dave Wentzel 2014-04-16 CONTENT