Top Articles on Measuring Developer Productivity
We've done some research on the top content in our space and decided to publish a few of the key articles we've found.
Measuring Developer Productivity - Content Categories
- Engineering a black box
- The output of software varies
- Common KPIs used in software engineering
- Methods of measurement and their shortcomings
- Is measurement even possible?
- Stack ranking is inherently flawed
- Successful outcomes are situational
- Simplified metrics that hurt engineering culture
- Core drivers of productivity
Software engineering is complex and invisible.
Software development is an "invisible" task
Outsiders find it difficult to appreciate the consequences of what they ask software developers to do. This is largely due to a lack of data and understanding about how software teams operate.
Software ≠ Coding
There are common misunderstandings about what software engineers do, and more specifically, about what makes a good software engineer. Some of the most impactful work software engineers do has nothing to do with the lines of code they write. To truly begin to measure engineer productivity, you have to understand their work and what a 'successful outcome' means for software engineers.
The output of software development varies
Many engineering leaders, many opinions
From the number of bugs to on-time delivery, this article shows how the approaches of different teams can differ drastically. It's clear that software is misunderstood and that the market is attempting to quantify the complexity of software engineering with basic metrics. The real takeaway is that any software analytics company that focuses too narrowly on particular metrics is missing the big picture. Each team is different. Each engineer is different. Attempting to compare them using the same metrics inherently misses the mark.
KPIs used to measure engineer performance
Outcomes over outputs
'Outcomes over outputs' is the typical refrain when evaluating which metrics to measure. The widely adopted book Accelerate specifies four KPIs that emphasize outcomes over outputs: Lead Time, Deployment Frequency, Change Fail Percentage, and Mean Time to Restore. These work well at a high level and are worth tracking, but they give almost no visibility into how your team is actually working, how productive it is, or where it can improve.
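As an illustration of how these four KPIs reduce to simple arithmetic over deployment records, here is a minimal sketch in Python. The record fields (`committed`, `deployed`, `failed`, `restored`) are assumptions made for this example, not a standard schema:

```python
from datetime import datetime, timedelta

# Hypothetical deployment records; the field names are assumptions for this sketch.
deployments = [
    {"committed": datetime(2023, 1, 1, 9), "deployed": datetime(2023, 1, 2, 9), "failed": False},
    {"committed": datetime(2023, 1, 3, 9), "deployed": datetime(2023, 1, 5, 9), "failed": True,
     "restored": datetime(2023, 1, 5, 12)},
    {"committed": datetime(2023, 1, 6, 9), "deployed": datetime(2023, 1, 7, 9), "failed": False},
]

def dora_metrics(deploys, period_days):
    """Compute the four Accelerate KPIs from a list of deployment records."""
    lead_times = [d["deployed"] - d["committed"] for d in deploys]
    avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)
    deploy_frequency = len(deploys) / period_days  # deployments per day
    failures = [d for d in deploys if d["failed"]]
    change_fail_pct = 100 * len(failures) / len(deploys)
    restore_times = [d["restored"] - d["deployed"] for d in failures]
    mttr = sum(restore_times, timedelta()) / len(restore_times) if restore_times else timedelta()
    return avg_lead_time, deploy_frequency, change_fail_pct, mttr
```

Even with these numbers in hand, nothing here explains why lead time is what it is or what to change, which is exactly the visibility gap these KPIs leave.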
9 Common Software Metrics
Additional high-level metrics to track team progress. Again, these are good practice for understanding improvements, but they give no visibility into how productive the team is or where it can improve.
Methods of measurement and their shortcomings
Measurements and their shortcomings
From lines of code and revenue/cost savings to velocity, this article maps some of the attempts made at measuring engineering productivity and why they fall short. Can individual performance be measured? Is it better to measure at the team or organization level? Or is it better to evaluate opportunities to improve? "Measure things that matter," and what matters can differ across organizations. Use data to learn and improve, not to compare teams or individuals.
History of measurements
This is an opinion piece that does a decent job of recounting the history of measurements and why they fail to truly capture engineering productivity. While we agree with the author when he states "there still doesn’t exist a reliable, objective metric of developer productivity", we strongly disagree with his conclusion that measurement is a lost cause. Funnily enough, he himself suggests focusing on "measuring anything that impedes progress, or the progress of delivering value to the customer" rather than attempting to measure and compare individual productivity with simplistic metrics. Enter Haystack.
Is measurement even possible?
Measuring individual performance is not possible
Many failed attempts have been made to measure this inherently complex function. We tend to agree, but the premise of the article is measuring productivity for individual engineers, which implies that individuals can be measured and weighed against each other quantitatively. The problem is that each team, and each individual on it, is different, with different preferences, skills, and experiences. Comparing them with the same metrics is flawed.
It's interesting to note that the author's suggestions for ways to measure IC productivity "subjectively" can in fact be deduced using data. While we agree that engineer productivity is difficult to measure, we believe it's being looked at from the wrong perspective. Patterns that affect developers' productivity certainly can be measured, especially when you evaluate trends. This isn't to say that Jimmy should crunch out a feature every two days, but rather that "Jimmy's slowest features tend to be in areas where there is limited team-wide knowledge."
Attempting to stack rank is fundamentally flawed
Each developer and team is a 'snowflake'
We can easily measure all kinds of things about software components, individuals, teams, and projects, but the metrics we choose have no basis for comparison outside of their own scope; otherwise you're comparing apples to oranges. The only valid comparison in all of these cases is relative to individual history, as an indicator of progress toward a goal, ideally a business goal.
Successful outcomes are situational
'Successful outcomes' for software engineering are largely contextual.
'Success' for an engineer, team, or project can change over time and is largely situational. For example, imagine a team that has a huge spike in the number of features completed this sprint. Is that good? They did more than the other team, so at face value you could call it 'good'. But suppose no changes were made to process, automation, or the team itself; that suggests the team stretched beyond its means in an unsustainable way. The only way to know whether this is in fact detrimental to long-term performance is to analyze that team's history and determine that it is an anomaly; you'll likely see a decrease in performance in the coming weeks. A single metric, in this case the number of features completed, used as an indicator of success will slowly drive teams toward the unhealthy pattern of biting off more work than they can chew. That leads to an increase in defects, a decrease in knowledge sharing (as engineers focus solely on crunching out features), and eventually burnout. A 'successful' outcome one day becomes a nightmare the next.
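The "analyze that team's history" step can be sketched as a simple z-score check against the team's own baseline. This is a minimal illustration; the data and the two-standard-deviation threshold are invented for the example:

```python
from statistics import mean, stdev

def is_anomaly(history, latest, threshold=2.0):
    """Flag a sprint value that sits more than `threshold` standard
    deviations away from the team's own historical baseline."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

# A team that usually completes 7-9 features suddenly completes 19.
features_per_sprint = [8, 7, 9, 8, 8, 7, 9]
```

The important design choice is the baseline: the spike is judged against this team's own history, never against another team's numbers.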
Situational metrics over traditional KPIs
The nature of knowledge work, specifically the output of knowledge work, cannot be quantified at all times. In this case, traditional KPIs fail to accurately measure productivity and incentivize the wrong outputs (i.e. lines of code). 'There is convincing evidence that it is beneficial to setup situationally relevant metrics ... using a more sophisticated framework ... to quantify productivity from different dimensions will conduct a more balanced and fair result.'
Traditional measures of productivity don't work for knowledge workers
The traditional measure of productivity is calculated as "output per hour" of work. Knowledge workers are those who “think for a living,” making their productivity challenging to measure, since outputs and how to calculate them vary widely.
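A toy sketch (numbers invented) makes the failure mode concrete: "output per hour" is well defined when units of output are uniform, and misleading the moment they aren't:

```python
def output_per_hour(output_units, hours):
    """Traditional productivity: output divided by hours worked."""
    return output_units / hours

# For uniform outputs (e.g., widgets) the number is meaningful:
widgets_per_hour = output_per_hour(40, 8)  # 5.0 widgets per hour

# For knowledge work it misleads. Counting lines of code as "output",
# an engineer who writes 500 verbose lines outranks one who spends the
# same time deleting 300 lines to remove a design flaw.
verbose_score = output_per_hour(500, 8)
refactor_score = output_per_hour(-300, 8)
```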
Simplified metrics hurt engineering culture
Metrics are useless or evil
"Many programmers think that metrics are useless or evil, especially if they are used by management to evaluate and compare programmer productivity." Metrics like lines of code, number of coding hours, number of commits, etc. are overly simplistic measures of engineering productivity.
6 Factors with highest impact on productivity
From a review of more than 800 individual research papers and 35 meta-analyses, the six factors that had the highest statistical association with the performance of teams involved in knowledge work are:
- Social cohesion
- Perceived supervisory support
- Information sharing
- Vision and goal clarity
- External communication
- Trust
Having global metrics to measure engineers, teams, and projects violates the core cultural factors with the highest impact on software engineering productivity. Engineers know that no one metric should define their success. Using any metric to stack rank and measure their 'productivity' actively undermines many of the highest-impact factors, such as trust, perceived support from managers, and clarity of top-level goals. You can see how this approach has a negative impact on culture and thus on productivity.
Core productivity drivers and blockers
With baseline metrics for each engineer and team, we can use pattern recognition to identify core drivers of and impediments to productivity. By focusing not on team-wide metrics and KPIs but on changes in productivity and opportunities to improve, we can preserve the cultural factors with the highest impact on productivity. This allows engineering leaders to introduce data without sacrificing culture: for example, using data to determine the impact of distracting meetings, technical debt, and overwork enables data-driven decision making.