If you've ever wondered how to measure developer productivity, you're not alone. As engineering leaders we constantly get asked to produce numbers to show how our organization is improving.
As software developers, we produce things - so it's logical we want to measure how much we're producing. The problem is we can't measure 'how much' we're producing because software development isn't like a factory assembly line. You can't count up the number of widgets produced and quickly determine how much they cost.
This ongoing search for software development metrics is a tale as old as time. In this article we'll talk about some common metrics used to measure engineering 'productivity' today - specifically their pros, their cons, and why they ultimately fail.
The five flawed metrics all share one thing in common: they all try to measure engineering output and pretend that output translates into productivity. It doesn't.
Lines of code can give you a high-level view of the size of a software system, how the codebase is changing, and potentially its complexity.
With that said, lines of code is easy to identify as a bad metric. Not only do different languages have different formatting conventions (3 lines of code in one language might be 20 in another), but lines of code are extremely difficult to compare even within a single codebase.
Any good software developer knows that good code and good refactors usually produce less code, not more. So why measure and track a metric that incentivizes the opposite of the practices and behaviors we want?
It's a terrible KPI for software development.
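To make the refactoring point concrete, here's a minimal, hypothetical example (the function and its purpose are invented for illustration): the same logic before and after a good refactor, where the improvement shows up as *fewer* lines of code.

```python
# A verbose implementation: roughly 6 lines of logic.
def squares_of_evens_verbose(numbers):
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n * n)
    return result

# The same logic after a good refactor: 1 line of logic.
# By a lines-of-code metric, this improvement registers as negative output.
def squares_of_evens(numbers):
    return [n * n for n in numbers if n % 2 == 0]
```

Both functions behave identically, yet a lines-of-code dashboard would score the worse version higher.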
Commit frequency is typically used to reward teams with a high frequency and to push teams with a lower one to improve.
At face value, commit frequency may seem like an okay metric. It's not perfect, but it might be useful, and it's certainly easy to count. You can encourage small, frequent commits, which - if used right - could support greater transparency, collaboration and continuous delivery. You can also identify a team with a lower number of commits - which could signal a problem - and encourage smaller commits, which makes sense too.
The problem with number of commits is that it doesn't tell you the actual value delivered. Just like lines of code, it's very easy to game: just create more commits. Even when it isn't gamed, a rise in commits doesn't indicate more productivity, output or value delivered.
So what are we really measuring?
Pull request count can give you a sense of release cadence and continuous delivery.
Unfortunately, it's a vanity metric. Pull request count alone doesn't take into account the size or difficulty of pull requests - it just tells you how many PRs were created. It's less offensive and less obviously wrong than lines of code, but both suffer from the same core problems.
Pull request count - just like lines of code - encourages negative behaviors. It pushes developers to create an excessive number of small pull requests just to inflate the metric. This bloats the code review process and creates unnecessary overhead across the team.
This is typically what happens when you incentivize, reward or create KPIs for developers based on an output metric.
Velocity points are a great tool when used for delivery forecasting and estimations.
Unfortunately, they are often misused as performance metrics. When you turn velocity from an estimation tool into a measure of productivity or output, you end up rewarding teams based on points. This immediately jeopardizes the accuracy of estimates, because developers are incentivized to inflate them.
You turn a potentially useful tool into a meaningless metric, undermining the usefulness of the estimation process itself.
This is a new metric used by many engineering 'intelligence' platforms but it's far from intelligent.
'Impact scores' essentially boil down to lines of code with extra steps. They factor in the number of lines touched, new vs. existing code, and so on - all combined into an "impact score". A lot of companies attempt to use this metric and - in almost all cases - developers HATE it.
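To see why such a score is hard to act on, here's a hypothetical sketch of how one might be computed. The formula and weights are entirely invented for illustration - real vendors keep theirs opaque, which is part of the problem.

```python
# A hypothetical "impact score": a weighted blend of code-churn signals.
# The weights below are made up; real formulas are proprietary and opaque.
def impact_score(lines_added, lines_deleted, new_file_lines, files_touched):
    churn = lines_added + lines_deleted
    return (
        0.5 * churn                  # raw lines touched
        + 0.3 * new_file_lines       # "new work" weighted above edits
        + 0.2 * files_touched * 10   # breadth of the change
    )

# A large mechanical rename and a small critical bug fix can score
# wildly differently - or similarly - for reasons no developer can see.
big_rename = impact_score(lines_added=100, lines_deleted=50,
                          new_file_lines=20, files_touched=3)
bug_fix = impact_score(lines_added=10, lines_deleted=0,
                       new_file_lines=0, files_touched=1)
```

Because so many signals are blended behind fixed weights, a developer who wants to "improve their impact" has no idea which lever to pull - except to touch more lines.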
Not only does it suffer from the same flaws as lines of code, but it's even more difficult to understand. So many things go into its calculation that it's impossible to act on.
The biggest flaw in this metric is its name. "Impact" suggests to executives and managers how the metric should be used: as an end-all-be-all way of evaluating and ranking developers - aka the exact thing we're afraid of.
The frustration of being a manager who can't measure is really powerful. Leaders in charge of thousands of engineers have absolutely no idea what's going on or how their teams are doing. Open source maintainers of the largest projects in the world have no idea whether their communities are healthy or growing, or what the impact of their projects even is.
These are areas where metrics would genuinely be useful - but the metrics used today aren't great ways of providing them.
A burning desire and desperation for insights can lead us to measure the wrong things, or to use metrics in the wrong ways - even when we see the flaws.
Before implementing any metric, you should always think about why you want to measure in the first place. Typically we want to encourage collaboration, teamwork and improvement while having a tangible way to express that to other stakeholders.
If this is your goal then you're not alone in this quest.
If you want to learn about healthy patterns and metrics to use on your team, we've put together a collection of metrics used by successful engineering leaders across the globe. Learn more about healthy patterns to look out for using data that's already available to you.
Download a collection of healthy metrics to use here.
Haystack is a GitHub tool that helps engineering leaders identify blockers and trends. Instead of guessing whether you're improving, or constantly bothering your team for progress updates, simply use Haystack to get alerts in your inbox every morning, plus a dashboard to track improvements over time.