"Software development isn't a factory assembly line"
If you've ever wondered how to measure developer productivity, you're not alone. As engineering leaders we constantly get asked to produce numbers to show how our software development team is improving.
As software developers, we produce things - so it's logical we want to measure how much we're producing. The problem is we can't measure 'how much' we're producing because software development isn't like a factory assembly line. You can't count up the number of widgets produced and quickly determine how much they cost.
This ongoing search for software development metrics is a tale as old as time.
In this article we'll talk about some common metrics used to measure engineering 'productivity' today - specifically their pros, cons, and why they ultimately fail.
The Flawed 5 Metrics
In this blog post, we summarise the ‘flawed five’ metrics as described in Abi Noda's original LeadDev post: "The ‘flawed five’ engineering productivity metrics".
The flawed 5 metrics all share one thing in common: they each try to measure engineering output and pretend that translates into a team's productivity. That doesn't work.
- Lines of Code
- Commit Frequency
- Pull Request Count
- Velocity Points
- Impact Scores
Flawed Metric #1: Lines of Code
Lines of code (LOC) can give you a high-level view of the size of a software system, how the codebase is changing, and potentially the complexity of that codebase.
With that said, lines of code is easy to identify as a bad metric. Different languages have different formatting conventions (3 lines of code in one language may be 20 in another), which makes comparisons across languages, teams, and projects nearly meaningless.
Any good software developer knows that good code - and good refactors - aim to reduce complexity, and often the amount of code, not increase it.
So why measure and track a metric that incentivizes the opposite of the good practices and behaviors we want?
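To make the comparability problem concrete, here's a small illustrative sketch (the function names are made up for this example): two implementations with identical behavior whose line counts differ several-fold. A lines-of-code metric would score the verbose author as far "more productive".

```python
# Two correct implementations of the same behavior.
# By a lines-of-code metric, the first is ~4x "more productive".

def total_even_verbose(numbers):
    """Sum the even numbers in a list, one explicit step at a time."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total = total + n
    return total

def total_even_concise(numbers):
    """Same result in a single expression."""
    return sum(n for n in numbers if n % 2 == 0)

# Identical behavior, wildly different line counts.
assert total_even_verbose([1, 2, 3, 4]) == total_even_concise([1, 2, 3, 4]) == 6
```

If anything, most reviewers would prefer the shorter version - the one LOC penalizes.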
Flawed Metric #2: Commit Frequency
Commit frequency is typically used to reward teams with a high frequency and coach teams with a lower one.
At face value, commit frequency may seem like an okay metric. Of course, it's not perfect, but it might be useful and it's definitely easy to count. You can encourage small, frequent commits which - if used right - could support greater transparency, collaboration and continuous delivery. You can also identify a team with a lower number of commits - which could be a problem - and encourage smaller commits, which makes sense too.
The problem with number of commits is that it doesn't tell you the actual value delivered. Just like lines of code, it's very easy to game: just create more commits.
Even if it's not gamed, a rise in commits doesn't indicate more productivity, output or value delivered.
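A hypothetical sketch of how gaming works in practice (the team names and numbers are invented): one team lands a change as a single commit, another splits the same change into five tiny commits, and a commit-count metric ranks the second team 5x higher for identical value delivered.

```python
from collections import Counter

# Hypothetical commit log as (author, files_changed) tuples.
# team_a ships one change in one commit; team_b splits the
# equivalent change across five one-file commits.
commits = [("team_a", 5)] + [("team_b", 1)] * 5

# Commit frequency sees team_b as 5x more "productive"...
commit_counts = Counter(author for author, _ in commits)

# ...even though the total work touched is identical.
files_touched = Counter()
for author, files in commits:
    files_touched[author] += files

print(commit_counts)   # team_b: 5, team_a: 1
print(files_touched)   # team_a: 5, team_b: 5
```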
Flawed Metric #3: Pull Request Count
Pull request count can give you a sense of release cadence and continuous delivery.
Unfortunately it's a vanity metric. Pull request count alone doesn't take into account size or difficulty of pull requests - it just tells you how many PRs were created. It's less offensive and less obviously wrong than lines of code but both suffer from the same core problems.
It encourages developers to create an excessive number of small pull requests just to inflate the metric. This bloats the code review process and creates unnecessary overhead across the team.
Pull request count (just like lines of code) encourages negative behaviors.
Flawed Metric #4: Velocity or Story Points
Velocity points are a common agile tool, along with sprint burndown charts, and can be great when used to forecast delivery and inform estimation.
Unfortunately, team velocity and story points are often misused as performance metrics. When you turn velocity from an estimation tool into a measure of software productivity or output, you end up rewarding teams based on points. This immediately jeopardizes the accuracy of estimations, as developers are incentivized to inflate points.
You turn a potentially useful tool into a meaningless metric - as it undermines the usefulness of the estimation process.
Flawed Metric #5: "Impact"
This is a new metric used by many engineering 'intelligence' platforms but it's far from intelligent.
'Impact scores' essentially boil down to lines of code with extra steps. They factor in the number of lines touched, new vs. existing code, etc. - all combined into an "impact score". A lot of companies attempt to use this metric and - in almost all cases - developers HATE it.
Not only does it suffer from the same flaws as lines of code, but it's even more difficult to understand. So much work goes into its calculation that it becomes inactionable.
Note: The biggest flaw in this metric is its name. "Impact" suggests to executives and managers how this metric should be used: as an end-all-be-all way of evaluating and ranking developers - aka the exact thing we're afraid of.
Why are these metrics so commonly misused?
Being a manager and not being able to measure - that frustration - is really powerful. Leaders in charge of thousands of engineers have absolutely no idea what's going on or whether their software development process is a healthy one.
Open source maintainers of the largest projects in the world have no insight into whether their communities are healthy or growing, or what the impact of their projects even is.
These are areas where software metrics would be useful - but the metrics used today aren't great ways of doing so.
The desire for key performance indicators can lead us to measuring the wrong things or using metrics in the wrong ways - even when we see the flaws.
So are there better software development metrics to use?
The short answer is yes.
Good Examples of Software Development Productivity Metrics
Before implementing any software engineering metrics, you should always think about why you want to measure in the first place. Typically we want to encourage collaboration, create efficient workflows, and optimize development processes while having a tangible way to express that to other stakeholders.
Any metric you choose should fit those criteria. If this is your goal, then you're not alone in this quest.
Here are a few great examples:
- Cycle Time (lead time) - Speed
- Mean Time to Recovery (MTTR) - Responsiveness
- Change Failure Rate - Quality
- Deployment Frequency - Throughput
These metrics give you a high-level view of how your team is operating and insight into key bottlenecks the team may run into.
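As a rough sketch of how two of these could be computed - the timestamps below are made up, and in practice you'd pull them from your Git host's API rather than hardcode them:

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical merged-PR records: (opened, merged) timestamps.
pulls = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 2, 15)),   # 30h in flight
    (datetime(2024, 1, 3, 10), datetime(2024, 1, 3, 16)),  # 6h
    (datetime(2024, 1, 4, 8), datetime(2024, 1, 8, 8)),    # 96h
]

# Cycle time: how long work stays in flight before it ships.
# The median is used because it's robust to outlier PRs.
cycle_times = [(merged - opened) / timedelta(hours=1) for opened, merged in pulls]
print(f"median cycle time: {median(cycle_times):.0f}h")

# Deployment frequency: deploys per week over the observation window.
deploys = [datetime(2024, 1, 2), datetime(2024, 1, 3), datetime(2024, 1, 8)]
weeks = (max(deploys) - min(deploys)).days / 7
print(f"deploys per week: {len(deploys) / weeks:.1f}")
```

Note these describe the team's workflow, not any individual's output - which is exactly why they avoid the gaming problems above.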
Of course, these are not the only metrics out there, so if you want to learn about healthy patterns and types of metrics to use, we've put together this collection of metrics used by successful engineering leaders across the globe.
Download our full collection of healthy metrics.
Note: The full collection of metrics can be pulled from your GitHub repositories / source code
Haystack is a GitHub tool that helps engineering leaders identify blockers and trends. Instead of guessing if you're improving, or constantly bothering your team for progress updates, simply use Haystack to get alerts in your inbox every morning - plus a dashboard to track improvements over time.