The Accelerate Book
In the book Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations, Dr. Nicole Forsgren, Jez Humble and Gene Kim studied what separated strong-performing technology organisations from their less effective counterparts.
The book summarises rigorous research from years of State of DevOps Reports, built upon 23,000 data points from companies all around the world. The organisations studied included start-ups and enterprises, for-profit and not-for-profit organisations, and companies which were born digital alongside those that had to undergo digital transformation.
The Four Key Metrics
The research identified that just Four Key Metrics distinguish the performance of various organisations. These "North Star" metrics serve as indicators of overall software engineering health.
These metrics aren't "The Goal" of a business, but organisations which did well against these metrics had higher rates of profitability, market share and customer satisfaction. In other words, they allow organisations to experiment faster, ship reliably and prevent burnout.
Likewise, these metrics aren't "leading indicators" or "local metrics" which tell you whether you need to increase, say, unit test coverage or cut build times; they measure the entire engineering health of a team. By analysing these metrics and drilling down into inefficient areas, you can ensure you're constantly optimising things that will improve the end-to-end engineering health of your team, rather than optimising in a local area that will make no overall improvement.
The Four Key Metrics were as follows:
- Cycle Time (Change Lead Time) - Time to implement, test, and deliver code for a feature (measured from first commit to deployment)
- Deployment Frequency - Number of deployments in a given period of time
- Change Failure Rate (CFR) - Percentage of deployments which caused a failure in production
- Mean Time to Recovery (MTTR) - Mean time it takes to restore service after production failure
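To make the definitions above concrete, here is a minimal sketch in Python of how the four metrics could be computed from per-deployment records. The field names and data are hypothetical, and real tooling (such as Haystack) derives these from source control and deployment systems rather than hand-built dictionaries:

```python
from datetime import datetime, timedelta

# Hypothetical deployment records: first commit time, deploy time,
# whether the deploy caused a production failure, and when service was restored.
deployments = [
    {"first_commit": datetime(2021, 5, 3, 9), "deployed": datetime(2021, 5, 5, 16),
     "failed": False, "restored": None},
    {"first_commit": datetime(2021, 5, 6, 10), "deployed": datetime(2021, 5, 7, 11),
     "failed": True, "restored": datetime(2021, 5, 7, 12, 30)},
    {"first_commit": datetime(2021, 5, 7, 14), "deployed": datetime(2021, 5, 10, 9),
     "failed": False, "restored": None},
]

# Cycle Time (Change Lead Time): mean time from first commit to deployment.
cycle_times = [d["deployed"] - d["first_commit"] for d in deployments]
cycle_time = sum(cycle_times, timedelta()) / len(cycle_times)

# Deployment Frequency: deployments per day over the observed window.
window = max(d["deployed"] for d in deployments) - min(d["deployed"] for d in deployments)
deploy_frequency = len(deployments) / max(window.days, 1)

# Change Failure Rate: percentage of deployments which caused a production failure.
failures = [d for d in deployments if d["failed"]]
cfr = 100 * len(failures) / len(deployments)

# Mean Time to Recovery: mean time from failed deploy to restored service.
mttr = sum((d["restored"] - d["deployed"] for d in failures), timedelta()) / len(failures)
```

With the sample data above, the team's Cycle Time averages around two days, one deployment in three fails, and recovery takes about an hour and a half; against the benchmarks discussed later, that would mark a middling performer.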
We have covered the definitions of these metrics, their averages and further information in both the video above and in our Knowledge Base in the article Measure Engineering Efficiency & Developer Experience with Haystack Metrics.
Why These Metrics Matter
But where is the evidence that these metrics actually help organisational performance? How do you convince your boss that these are things that matter?
Martin Fowler on Accelerate's Metrics
Well, firstly, the mechanism by which these metrics work is well understood. Martin Fowler described them as follows in his talk "What Does Tech Excellence Look Like?":
"These are really quite interesting because they boil down to a very important measure that many people like me have been talking about for a while which is that of Cycle Time. Cycle Time is, what is the period of time between somebody in the business world, or in the domain that the software supporting, having an idea of something they would like to see in software - and how long does it take from that idea to form and then go into software running in production."
"The faster you can cycle ideas into production, that gives you an advantage in experimenting with things and being able to try things out. And what the study indicated is that Cycle Time had a direct correlation with how good overall an organization's performance was."
"The other factor that they brought up this Mean Time to Recovery and that's actually quite interesting in itself. There's a phrase that goes out there, and comes from John Allspaw, that Mean Time to Recovery is more important than Mean Time Between Failures for most failures. The point being that, by putting stuff into production it's more important that you can realize that something's gone wrong and fix it than it is to prevent the failures in the first place."
"That's not something that many organizations really understand, they put a lot of effort into trying to prevent things going wrong but one of the things that we're learning is it it's more important to detect and fix failures than it is to prevent that. At least the most of the time, at least the non catastrophic cases. "So that's an important part of the picture is being able to recover fast, but let's go back onto this Cycle Time. Cycle Time as I've said is a very key measure and the reason I said that is important is it because it allows us to get early information about whether our ideas are worth pursuing or not."
Nicole Forsgren On How DevOps Metrics Correlate With Organisational Performance
Secondly, we have an abundance of evidence showing these metrics are correlated with the performance of organisations. Dr Forsgren summarised this in her talk, "The Key to High Performance: What the Data Says":
"High performers, year over year, are twice as likely to exceed their organisational goals. Profitability, productivity and market share. [...] We added additional measures like non-commercial goals, and the crazy thing is we find this same 2x multiplier. High performing organisations are twice as likely to exceed their non-commercial goals; effectiveness, efficiency, customer satisfaction. By the way, the measures for these are drawn from the academic literature - highly, highly rigorous measures. For bonus points, in the 2014 study, we had a large enough dataset that I could actually pull from stock market data - and we find that high performers had a 50% higher market cap growth over the previous 3 years."
In other words, rigorous appraisal of these Four Key Metrics (in research led by Dr Nicole Forsgren) has shown that high performers are 2x more likely to meet their commercial goals (productivity, profitability, market share, number of customers) and their non-commercial goals (quantity of products or services, operating efficiency, customer satisfaction, quality of products or services and achieving organisational or mission goals). Indeed, companies which do well under these DevOps metrics have a 50% higher market cap growth over 3 years.
These metrics are even correlated with engineering well-being; indeed, the 2018 State of DevOps Report found that companies which were elite performers against the Four Key Metrics "are 1.8 times more likely to recommend their team as a great place to work".
Measuring DevOps: The Four Key Metrics
Hello, I'm Junade from Haystack Analytics, and I wanted to take a few minutes today to give an introduction to measuring DevOps using the four key metrics. So to kick things off, I wanted to start by talking a little bit about the Accelerate book. The Accelerate book is all about the science of DevOps. It's written by three excellent authors, Nicole Forsgren, Jez Humble, and Gene Kim. The authors studied the data behind this book for many, many years. They did a lot of research for the State of DevOps Reports, which are published by Puppet. I think in total they analyzed 23,000 data points, everything from startups to enterprises, for-profit businesses as well as not-for-profit organizations, organizations which were born digital as well as those which are legacy and had to undergo digital transformation, so really a wide breadth of data points that they were able to study. And one of the things which came out of this was the four key metrics. In effect, companies with engineering teams who did well against just four key metrics had higher rates of profitability, market share, and customer satisfaction.
Now, I should be clear, these four key metrics aren't telling you how to run a business, or which market you should be in, or how your product should look. Those are things which are really about the direction of your business; this is more about the engine and how your engineering health is going. To use an analogy, one I've actually heard which describes this quite well is that the four key metrics measure how well your car is doing, how well it will get you from A to B, but they won't necessarily say where B is. So these measure overall engineering health; they're not leading indicators. Leading indicators are usually in a local area: they could be something like how is your code review performance going? How is your development stack? How's your unit test coverage? These are leading indicators; they impact your key North Star metrics, but they are more localized. They give you an indication as to how things are in a local area. The North Star metrics are about the global performance of your engineering organization.
Additionally, they aren't risk factors. Risk factors are usually things like if an individual pull request is merged without review, or a pull request is stuck in back and forth discussion, those things can ultimately impact the indicators and then your North Star metrics, but these North Star metrics, they aren't risk factors. That said, at Haystack Analytics, our product is capable of measuring all three of these. So we'll be able to measure your North Star metrics, we'll be able to measure leading indicators, and risk factors. So although these aren't things which I talk about in this video here, these are things which our product is capable of doing. And remember, it isn't the case of favoring one metric over another. If you start by overly optimizing a local metric, you could actually damage your North Star metrics, so you need to view these from the top down, you need to go and measure your North Star metrics, and then identify where the bottlenecks are to really understand things.
Similarly, as I mentioned, this isn't the goal of your business. This isn't going to tell you what your product should look like or how your marketing strategy should be. This will help you get there better, it will help make sure you have the engineering health to be able to deliver on your business vision, but the goal of a business is usually to generate revenue, to deliver return for your investors. The four key metrics will measure your engineering health, which will help you get there. The reason I put the goal in with a capitalized T and G is because I'm referring to an excellent book called The Goal by Eliyahu Goldratt, which is a really good book if you're interested in finding out how you can measure and optimize certain things in a business. In effect, these are North Star indicators, which measure the engineering health of your business. Really, really critical metrics, but it's also really critical to understand what they do and what they don't do.
So the first of these metrics, which is perhaps the most important, is Cycle Time. Cycle Time is the time to implement, test, and deliver code for a feature, measured from first commit to deployment. We find that for Haystack teams, customers of ours, the average team has a Cycle Time of less than three days. The elite teams will usually get this down considerably more than that, but three days is a typical average Cycle Time. Now, this is formed of a variety of different leading indicators. The Cycle Time itself measures that entire time for implementation, test, and delivery of code, but within it there is development time, which is time spent developing something, and there is also review time. Review time covers what happens once someone's opened a pull request: the first response time, then the rework time, and the idle completion time. So you have these leading indicators, and things break down even further.
So for instance, if you look at your Cycle Time and it's quite long, and then you drill in and find that review time is taking a lot of time, and that it's your first response time which is excessively long, you may be able to drill down into that and say: oh wait, it's actually build times. Build times are holding everything up and causing the delay. So you're able to drill down from this North Star metric to a leading indicator, and then ultimately to understanding where there is an issue in the pipeline. This is a very, very powerful metric. To use a Marc Andreessen quote, which we often like at Haystack, "Cycle time compression may be the most underestimated force in determining winners and losers in tech." So why is this the case? Cycle Time basically determines how well your company is performing from a developer experience perspective, from being able to get features worked on and to be able to get them out of the door.
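The drill-down described above can be sketched in a few lines of Python. The timestamps and stage names here are hypothetical, but they show how the North Star metric decomposes exactly into its leading indicators for a single pull request:

```python
from datetime import datetime, timedelta

# Hypothetical timestamps for one pull request's journey (all illustrative).
first_commit   = datetime(2021, 5, 3, 9, 0)
pr_opened      = datetime(2021, 5, 4, 11, 0)   # development ends, review begins
first_response = datetime(2021, 5, 5, 15, 0)   # first reviewer comment arrives
pr_merged      = datetime(2021, 5, 6, 10, 0)
deployed       = datetime(2021, 5, 6, 17, 0)

# Leading indicators: the stages that make up the North Star metric.
development_time    = pr_opened - first_commit        # time spent coding
first_response_time = first_response - pr_opened      # waiting for first review
rework_time         = pr_merged - first_response      # review back-and-forth
idle_deploy_time    = deployed - pr_merged            # merged but not yet shipped

# Cycle Time is the end-to-end span, and the stages sum to it exactly,
# so a long Cycle Time can always be traced to a specific stage.
cycle_time = deployed - first_commit
assert cycle_time == (development_time + first_response_time
                      + rework_time + idle_deploy_time)
```

In this example the largest single component is the first response time, which is the cue to look further down, for instance at build times holding up reviewers.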
In other words, by being able to compress Cycle Time, you're able to deliver value to your customers quicker and you're able to learn from them faster. You're able to take those learnings and adjust the course of your business. And that's really the critical thing in being successful in any industry: being able to listen to your customers and being able to adapt as you need to. Second is Deployment Frequency. Deployment Frequency is another velocity metric; it's the number of deployments in a given period of time. Most teams we find deploy at least once a day. The best performing teams will deploy every single time something has merged into the master branch. This identifies how quickly you're shipping business value to customers. How quickly you're able to get your business value in front of your customers is really critical, and it's critical for a variety of reasons.
Smaller deployments have less risk of things going wrong. You're less likely to have to deal with two different changes which could react badly with each other, keeping things simple and keeping things small. There's less risk of things going wrong when you do many small deployments instead of one big bang deployment. Secondly, they provide an opportunity to deliver value to your customers in shorter iterations. Suppose you have a customer who has reported a bug. If one of your support engineers is able to find where that is in the codebase and fix it in about 15 minutes, you want the customer to be able to see that quite quickly; you don't want them to wait an entire week, because they'll be really happy to see: wow, they fixed this issue for me in 15 minutes. And likewise, if we're talking about functionality, getting functionality in front of users quicker will ultimately lead to more success in the marketplace. This brings us on to the third point, which is really that it allows you to learn quicker by getting things into customer hands quicker.
So if you're able to get a piece of functionality into a user's hands a bit quicker, and to get them to test it in that production environment, you'll learn things quicker, and that's something which is also really critical to bear in mind. So Deployment Frequency really helps us measure how things are from the perspective of actually getting things into the hands of customers and keeping our deployments lean. Thirdly, we have the Change Failure Rate. Change Failure Rate, or CFR, is basically the percentage of deployments which caused a failure in production. Average teams will typically have a rate of less than 15%, but the elite teams will usually be able to keep this below 4%. Really, this metric measures broken builds that get shipped into production; if there's something wrong in the deployment pipeline, or an issue with something not being tested which causes a critical problem, then this will help you measure that, but it is not a silver bullet.
What I mean by this is it doesn't necessarily cover all production incidents. Incidents can sometimes just happen in a production environment without there being a change. And similarly, you can have things which aren't necessarily a complete failure; you can have user bugs which are important nevertheless, and they can still cause an issue. This really brings us on to the next metric, which is Mean Time to Recovery. MTTR helps fill in some of the gaps around Change Failure Rate. It's the time to restore service after a production failure. In the best teams, recovery will be heavily automated, so their Mean Time to Recovery will be quite low, measured in seconds, and they won't need human intervention very often. It's formed really of both a detection component and a response component.
Detection means: how long does it take to detect that something's wrong? This could be captured in health checks, critical path end-to-end testing in production, or, in the event you're not able to automate this for whatever reason, customer reporting and support, and that human process as well. So, detection and response. If you had very, very high unit test coverage which only runs when there is a change, but you weren't doing any end-to-end testing in production, your Mean Time to Recovery would be higher because of that gap in the Change Failure Rate metric. Similarly, you also have a response component. The response could be anything: it could be a rollback, it could be a hotfix. Really, this metric is formed of both those parts. Mean Time to Recovery is that mean time to detect and respond to a critical outage. Again, I'm talking about production failures.
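The split of MTTR into detection and response can be sketched as follows. The incident records and their timestamps are hypothetical, but they illustrate how a slow, manual detection path (such as waiting for a customer report) inflates the overall metric:

```python
from datetime import datetime, timedelta

# Hypothetical incidents: when the failure started, when it was detected
# (health check, alert, or customer report), and when service was restored.
incidents = [
    {"started": datetime(2021, 6, 1, 10, 0),
     "detected": datetime(2021, 6, 1, 10, 5),    # automated health check fired
     "restored": datetime(2021, 6, 1, 10, 20)},  # automated rollback
    {"started": datetime(2021, 6, 8, 14, 0),
     "detected": datetime(2021, 6, 8, 15, 0),    # customer report via support
     "restored": datetime(2021, 6, 8, 16, 30)},  # manual hotfix
]

n = len(incidents)
# MTTR is formed of a detection component and a response component.
mean_detection = sum((i["detected"] - i["started"] for i in incidents), timedelta()) / n
mean_response  = sum((i["restored"] - i["detected"] for i in incidents), timedelta()) / n
mttr           = sum((i["restored"] - i["started"] for i in incidents), timedelta()) / n

# The components sum to the overall metric, so either one can be the bottleneck.
assert mttr == mean_detection + mean_response
```

In this sample, the manually detected incident dominates both components, which is the argument made above for automated detection such as production health checks.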
These metrics don't necessarily cover things which are more to do with user bugs, but obviously having a shorter Cycle Time allows you to respond to user feedback quite quickly, and there's a balancing act there which is very important. You don't want to pursue vexatious levels of reliability at the cost of not being able to ship business value quickly enough; you have to balance between risk and reward for the area you're in. So if you just have a simple single-page website, the business risk probably allows you to be a bit more agile and have a shorter Cycle Time, but obviously if you're in something which is more critical, you need to pay greater attention to that risk factor as well, so it's not as if there is a fixed target for this. Engineering professional judgment on the balance of risk and reward is really critical when you're interpreting these metrics, particularly around reliability.
So I will link in the description of this video to some resources on our knowledge base around the North Star metrics and leading indicators with support in Haystack. This article I've got on screen now, Measure Engineering Efficiency & Developer Experience with Haystack Metrics, really covers those North Star metrics and those leading indicators. I'll also link some material from our knowledge base which describes the risk factors we're able to alert on, and how we use our real-time alerting to pick those up in the product, to give an overview of how these metrics fit together, including the ones which aren't necessarily in the Accelerate book, which we cover as well. So thank you so much for taking the time today. If you have any questions, feel free to jump on Haystack's Slack community if you haven't already signed up. There, you can interact with my colleagues and other Haystack customers. Thank you so much for taking the time, and I will hopefully see you again later. Bye-bye.
Why Cycle Time and The Four Key Metrics Matter for Software Engineering Teams
Hello, it's Junade here from Haystack Analytics, and today I wanted to share some of the evidence on why Cycle Time and the four key metrics really matter for software engineering teams. Now, in my last video I covered what these four key metrics are, but just to quickly recap them, I've taken a screenshot of a slide here from the State of DevOps Report for 2019, which lists out what these four key metrics are, these four key North Star indicators for DevOps best practice for how your team is doing at software delivery. You've got deployment frequency there. You've got lead time as well, sometimes referred to as Cycle Time. You've got the change failure rate there as well, and mean time to restore.
Another slide I captured here has, on the left-hand side, a brief summary of each of the descriptions, but it also really splits out the different groupings of teams by how they do at software delivery performance. You can see the elite teams are able to deploy on demand. Their Cycle Time is less than one day. Their MTTR, mean time to restore, is less than one hour. Change failure rate is in that zero to 15% range; typically at Haystack, we see that around the 4% mark with the elite teams. Then you can see that broken out as well for the high performance, medium performance and low performance groups. You can really see how the Cycle Time in particular, that lead time for changes, varies across those four columns. It's less than one day for the elite teams. For the high performance teams, it's between one day and one week; for medium teams, between one week and one month; and for the low performing teams it's really between that one month and six months mark.
The first bit of evidence as to why these metrics are important that I wanted to share is a quick excerpt from a talk given by Martin Fowler as to where he's really discussing these metrics and he talks about the mechanics of how they work within your software engineering team. Martin is a very credible leader within the software architecture community. He is a signatory to the original Agile Manifesto and acts as a chief scientist at ThoughtWorks. I think in this clip, he really summarizes the value of these metrics.
Now, those first two, the deployment frequency and the deployment lead time, these are really quite interesting because they boil down to a very important measure that many people like me have been talking about for a while, which is that of Cycle Time.
Cycle Time is: what is the period of time between somebody in the business world, or in the domain that the software is supporting, having an idea of something they would like to see in software, and how long does it take from that idea forming to software running in production. The faster you can cycle ideas into production, that gives you an advantage in experimenting with things and being able to try things out. What their study indicated is that Cycle Time had a direct correlation with how good overall an organization's performance was.
The other factor that they brought up is mean time to recovery, and that's actually quite interesting in itself. There's a phrase that goes out there, and comes from John Allspaw, that mean time to recovery is more important than mean time between failures for most failures. The point being that, by putting stuff into production, it's more important that you can realize that something's gone wrong and fix it than it is to prevent the failures in the first place.
That's not something that many organizations really understand. They put a lot of effort into trying to prevent things going wrong, but one of the things that we're learning is that it's more important to detect and fix failures than it is to prevent them, at least most of the time, at least in the non-catastrophic cases. That's an important part of the picture, being able to recover fast, but let's go back onto this Cycle Time.
Cycle Time, as I've said, is a very key measure. The reason I assert it's important is because it allows us to get early information about whether our ideas are worth pursuing or not.
In that clip there, Martin Fowler, who's a very credible individual, was able to talk through some of the mechanics of how the four key metrics really help drive business performance. That said, it is still some distance away from empirical data showing that these metrics really have an impact, that they're correlated with organizational performance. Fortunately, the excellent team of researchers who worked on the State of DevOps Reports and the Accelerate book, led by Nicole Forsgren, have been able to study how DevOps performance correlates with organizational performance. Nicole summarized some of this work in her talk "The Key to High Performance: What the Data Says".
DevOps is also good for organizations. Here's what we find: high performers, year over year, are twice as likely to achieve or exceed their commercial goals; again, profitability, productivity, and market share. I like money. Anyone here work for an organization that likes money? Profit-driven, right? This speaks to us, but it's interesting. After a few years, some people came back to us and they said, "I work for a not-for-profit. I work for government. I work for educational institutions," or, "I work for a for-profit institution that has broader goals." We also care about other things; other things matter to us. What about non-commercial goals, even if I'm in the for-profit sector?
In the 2017 study, we added additional measures like non-commercial goals and the crazy thing is we find the same 2X multiplier. High performing organizations are twice as likely to achieve or exceed non-commercial goals: effectiveness, efficiency, customer satisfaction. By the way, the measures for these are drawn from the academic literature, highly, highly rigorous measures.
Okay. For bonus points, in the 2014 study, we had a large enough dataset that I could actually pull from stock market data. We find that high performers had a 50% higher market cap growth over the previous three years. This is great. Okay. The ability to develop and deliver software technology is driving organizational performance.
There we have it. I find the correlation between the Accelerate metrics and the stock market data particularly powerful, because of course the ultimate goal of any business is making money. The fact that there's such a clear relationship between how an organization's technology teams perform against these four key metrics and its stock market performance shows just how goal-oriented these metrics are.
For further information on Cycle Time in particular, which is perhaps the most powerful of these four key metrics, I've linked to a knowledge base article on the Haystack support site which digs in a bit more detail into how Cycle Time is put together and how it's formed. That should hopefully give you some insights and help you as you're looking through the data in the Haystack dashboard.
Thank you for taking the time. Any questions, feel free to ping them to me on the Haystack Slack community. If you haven't already signed up for that I'll add a link in the description of this video, but thank you so much for taking the time and I will speak to you later. Bye-bye.