
Discussing Quality in Engineering Teams with Kan Yilmaz - Engineering Insights Podcast Ep. 5

Junade Ali
May 12, 2021

Summary

In this episode of the Engineering Insights podcast, Junade Ali discusses how software engineering leaders can measure code quality. The discussion covers a number of interesting areas, including metrics, tracking bugs, technical debt, the code review process, and development environments.

Listen

Listen on anchor.fm.

Resources

Video Clips

Developer Satisfaction and Retaining Software Engineers

Tracking Software Quality with Bug Reports and Change Failure Rate Metrics

Continuous Improvement of Software Engineering Process in LEAN/Agile Teams

Transcript

Junade Ali:
Hello, and welcome to another episode of the Engineering Insights podcast presented by Haystack Analytics. I'm your host, Junade Ali. In today's episode of the podcast, I sat down and spent some time talking with Kan Yilmaz, who is a Co-Founder and the CTO of Haystack Analytics. We covered a number of really interesting areas which are relevant to software engineering teams, particularly in high growth companies, but the central theme which ran through our discussion was measuring quality in software engineering teams. So whilst there are a lot of different areas of discussion here, and I'll likely use some of the clips for other pieces of content, I thought it'd be really useful to have the entire conversation here, as our discussion proved really fruitful and there are lots of really interesting areas we covered. Enjoy.

Junade Ali:
Okay, so I was wondering, just to kick things off, if you could tell me a bit about you, and how you found yourself as a co-founder of Haystack?

Kan Yilmaz:
Certainly. So I used to work at Cloudflare. And at Cloudflare, we were a small team, a team of four people among 500 engineers, and we were able to deliver 10% of the company's revenue with only one person on the engineering team. So how did we do this? To be able to do this, we moved really fast, the iteration cycle was extremely fast, and we made sure that the customers were really happy. We built the correct thing at the correct time. That experience allowed me to see how we can actually make engineering processes better, because most teams are not effective. I believe all teams can be effective, and that is what made me more interested in this whole developer experience and tooling sector.

Kan Yilmaz:
In that sense, Haystack is a tool which helps developers, and the whole engineering organization, to be more effective, delivering value to customers much, much faster and with much higher quality. And I want to make all teams able to be super or elite performers.

Junade Ali:
Awesome. So I guess there are kind of two different dimensions to what you've mentioned there, or to what we do at Haystack. There's one angle, which is really around driving up speed. And then there's another angle, which is about quality. How does Haystack really go about doing this? How do you drive these improvements to engineering efficiency and developer experience for our customers? What is it that we offer to help them do that?

Kan Yilmaz:
So developer efficiency is effectively about understanding whether someone is getting better, or whether a team is getting better or worse. Does the product iterate faster or slower? How can you know this? There's gut feeling, but unless you have concrete measurements, you will never have an iteration cycle, you will never be as fast as someone who actually tracks the correct metrics, improves those metrics, and makes sure that the value delivered is much higher than it would be without those metrics.

Junade Ali:
Excellent. I guess we can talk a little bit about some of these different metrics. So in terms of the North Star metrics we use to really understand how an engineering team is performing, they're often called the Four Key Metrics, a term coined by the Accelerate book. We've got Cycle Time, or change lead time. We've also got things like throughput, which we measure against as well, which is kind of outside the scope of those four key metrics, but it's about volume. And then I think on the speed side as well, we've also got the number of deployments.

Junade Ali:
So those are, I guess, what we'd consider speed metrics. But complementing those, we've also got quality metrics. We've got things like Change Failure Rate, and, as part of the four key metrics, Mean Time To Recovery. So those four key metrics, a term coined in the Accelerate book by Nicole Forsgren, Jez Humble, and Gene Kim, are Cycle Time, number of deployments, Change Failure Rate, and Mean Time To Recovery. What do those metrics really symbolize and help us measure? And how do they integrate with one another, really?

Kan Yilmaz:
The best way I can describe this is that the best performing teams are both really fast and have high quality outputs. The worst teams are slow and have bad quality. Normally, what we assume is, okay, if I'm going to make something high quality, I need to compromise on speed. That is not the case; the best teams can achieve both. And these metrics are lagging metrics on revenue. So if you actually have these four metrics in an optimal place, then you can assure yourself that your revenue will be higher, and if you improve on these metrics, it will go even higher. That's it, there's a correlation there, and that's how we know how to use these metrics. I believe around 31,000 professionals have been surveyed to find these metrics. So this is not random, something someone made up. These are all backed by correlations with the revenue of those companies. And the better these metrics get, the higher your revenue will get.

Kan Yilmaz:
That's how we can see whether engineering is a successful team and whether you're getting better or worse. If we have to go a bit deeper, I actually want to go more into quality rather than speed, because with quality there are quite a few misunderstandings about what's going on. I see a lot of people try to track number of bugs, or try to track something like test coverage as a high level metric. I'm going to give an example: if test coverage is a high level metric and it was, let's say, 80%, and you increase it to 100%, can you say that your revenue will be higher? Actually, I would disagree. Most likely your revenue will be lower; you're spending way too much time on writing tests and not delivering value to customers. The high level metric is really, really important there. And there is a big misunderstanding in the current ecosystem of engineers.

Kan Yilmaz:
So I want to break this down a bit more and define what quality actually means. The definition is the characteristics of a product or service that bear on its ability to satisfy given needs. So if there is an expectation and it is not met, that means there is no quality. Coming back to test coverage, how does that affect any customer? Okay, there's some correlation there, but what's the optimal number? Is it 60? Is it 40? Is it 100%? We don't know, and it depends on the team. What should the top level metrics be? Change Failure Rate is one of the North Star metrics mentioned in the State of DevOps report and the Accelerate book, and it is a really good metric. If we divide quality into two separate parts, I would put one as product quality, and the second as engineering process quality. Change Failure Rate represents both of them. So how does it do this?

Kan Yilmaz:
Imagine we did a release and this release caused an incident. We need to roll back now. Okay, the customers immediately felt that; there is an incident, there's downtime. That's obvious. That's product quality, and Change Failure Rate captures it really well. How does it capture engineering process quality? Take the same issue: you deployed a pull request, you did a release, the release failed. You had downtime, you reverted back. But you didn't revert the pull request, because you don't know which pull request actually caused it. There might be 500 pull requests inside that release, there might be five; you don't know. The trunk branch is currently unusable. So the whole engineering pipeline is blocked, nobody can deliver value now. And Change Failure Rate actually captures this.

Kan Yilmaz:
You can also have slight refinements on Change Failure Rate, such as Mean Time To Recovery on the trunk branch itself. That also gives you insight into engineering process quality. But coming back to the high level again, we said product quality and engineering process quality. Okay, this is great. Just to give a bit more information on Change Failure Rate: according to the State of DevOps report, zero to 15% is the range a team should achieve. And this percentage is the number of hot fixes divided by the number of deployments. It can actually be higher than 100%, which is quite interesting; I have seen that on quite a few teams. Our goal is to make sure that it's between zero and 15%. That's a good metric. There are stability metrics, such as Mean Time To Recovery, but I'm not going to go over those; I'm going to stay on Change Failure Rate.
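
For readers who want to see the arithmetic, here is a minimal sketch of the Change Failure Rate calculation Kan describes (hot fixes divided by deployments, aiming for the 0-15% range he cites); the counts are invented purely for illustration.

```python
def change_failure_rate(hotfixes: int, deployments: int) -> float:
    """Return Change Failure Rate as a percentage. It can exceed 100%
    if a single deployment needs several hot fixes, as Kan notes."""
    if deployments == 0:
        raise ValueError("need at least one deployment to compute CFR")
    return 100.0 * hotfixes / deployments

# Hypothetical quarter: 120 deployments, 18 of which needed a hot fix or rollback.
quarter = {"deployments": 120, "hotfixes": 18}
cfr = change_failure_rate(quarter["hotfixes"], quarter["deployments"])
print(f"Change Failure Rate: {cfr:.1f}%")                 # 15.0%
print("within 0-15% target" if cfr <= 15 else "above 15% target")
```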

Kan Yilmaz:
Say your current Change Failure Rate is 22%, and you improve it over the next three months to 14%. That's great, you know you're getting better, you know that you are giving value to your customers regarding quality. But where do I focus? How can I make this number better? How can I dive deeper? I don't get that information from Change Failure Rate. The way you can get it is that a few leading metrics can represent this. Leading metrics are metrics that are not high level; they are correlated with improving one other metric, but they don't directly correlate with revenue. So in the example I gave previously, which was test coverage, if your test coverage was 40% and you increased it to 60%, and then you saw that your Change Failure Rate decreased from 20% to 15%, you know that, okay, test coverage was actually good. But then you can experiment further, let's make it 80%, 100%. Change Failure Rate might decrease, but this time you can actually see a compromise on speed.

Kan Yilmaz:
Your Cycle Time might actually triple. That's a really bad thing. So you need to make sure you're measuring quality and speed together and finding the optimal numbers that work for you. But test coverage is not a high level metric. It is a leading metric on top of Cycle Time and Change Failure Rate. Coming back to focusing on how I can understand what is causing my product quality or my engineering process quality to suffer: actually, number of bugs is a really good metric. It's not included in Accelerate, and there is a really good reason for that. Almost nobody tracks it correctly; it's one of the hardest metrics to track. Imagine you're tracking number of bugs, let's say in GitHub. Okay, an engineer randomly finds a typo in some place that nobody reads and no customer cares about. They tag it as a bug and file it. That's basically almost no value, but it increases the number, and you don't actually know if that is signal or noise.

Kan Yilmaz:
The same kind of noise shows up in JIRA too; product managers do this. They test things themselves and create bugs which never affect customers, because the customers don't even use that part of the product. Not because of the quality; it's just that the product doesn't give value there. But that bug you just labeled doesn't give a signal regarding how good your application is. So we need to go one step further. How can we track number of bugs? First, let's go back to the original definition of quality: a product or service that bears on its ability to satisfy given needs. So if a customer is trying to do some action and they cannot, and they file a bug report, then you know that this customer did not get what they wanted. That is a really high signal bug, and that's what we need to track. Bugs need to be reported by customers, not by internal product managers or by engineers themselves.

Kan Yilmaz:
Of course, there will be some cases where you haven't done a release yet and you know there is a bug. Don't count that as a bug; that's not something you should track. That belongs inside the feature request you're actually building, or the original issue you're trying to solve. It's not an externally reported bug, and it's not something we want to measure. We want to measure only customer-submitted bugs. That gives us a good signal. Okay, now we know which bugs are signal and which bugs are not. But I still don't know how I can actually take action, because a typo that a customer reported is a bug, but a feature that doesn't work at all is also a bug. They're not equal. How can we understand this? So there are actually two dimensions we need to capture when we track bugs. One of them is priority: how important is this bug? One bug might be an incident or system downtime; another is a typo.

Kan Yilmaz:
Typos are like, okay, who cares? But the incidents are really important. So labeling each bug as P1, P2, P3, and so on gives you the ability to see what kinds of bugs are being produced. The second thing is, even with P1, P2, P3, how do I make sure this doesn't happen in the future? You need to look into specific teams. Each team has a different process, on both the engineering side and the management side, and they build different products as well. So the second dimension is: which team is responsible for that bug? If there's a bug on the front end, why would the back end get any kind of consequence for that? It's not their KPI; it's the KPI of that specific team. The front end team needs to make sure that their number of bugs is low, not the back end team. The back end team needs to take care of their own bugs, but not the front end team's bugs. They're completely separate.

Kan Yilmaz:
Coming back to the high level: make sure that the bugs are submitted by customers, and track them with two other dimensions, the priority of the bug and the team responsible for it. Then suddenly you have metrics like this: we had 50 P1 bugs, and that increased to 60 in the past quarter. What's happening? Now we dive deeper, we go into the teams. Okay, the front end team's P1 count increased from 10 to 20, but the back end team has only two P1 bugs, for example. You know that the back end team doesn't need any action, but the front end team needs more resources. This might be technical debt, this might be test coverage, this might be some other problem. There are lots of different things you can do to fix it, but you know where it is. And once you talk to the engineers, you can immediately take action and see what's going on.
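
As an illustration of the two-dimensional tracking Kan describes, here is a small sketch that groups hypothetical customer-reported bugs by team and priority and compares two quarters; the team names and counts are invented and are not real data.

```python
from collections import Counter

# Each record is a customer-reported bug tagged with the two dimensions
# Kan describes: the responsible team and a priority label. All data is invented.
last_quarter = [("frontend", "P1")] * 10 + [("backend", "P1")] * 2 + [("frontend", "P3")] * 7
this_quarter = [("frontend", "P1")] * 20 + [("backend", "P1")] * 2 + [("frontend", "P3")] * 5

prev = Counter(last_quarter)   # counts per (team, priority) pair
curr = Counter(this_quarter)

# Show where P1 bugs are growing, which is where extra resources should go.
for team in ("frontend", "backend"):
    before, after = prev[(team, "P1")], curr[(team, "P1")]
    flag = "needs attention" if after > before else "no action needed"
    print(f"{team}: P1 bugs {before} -> {after} ({flag})")
```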

Kan Yilmaz:
For high level resource allocation, this number of bugs is an amazing metric to use. If I have to place Change Failure Rate and number of bugs on a spectrum of how much quality they actually represent, I would say Change Failure Rate represents only the last bit. It represents product quality and engineering process quality really well, but it doesn't have the actionability or the fine-grained control that number of bugs has. Number of bugs encapsulates more of that spectrum; you can capture more places in the whole process, both engineering and product, but it's really hard to track. You need a system which works across support, product management and engineering, and you need to make sure that no bug goes untracked. So you need an end-to-end bug tracking system, where each bug is tagged to at least one JIRA ticket.

Kan Yilmaz:
That gives you a good understanding of what's going on, and now you can track your engineering team's or engineering organization's quality.

Junade Ali:
Awesome. Yeah, that sounds excellent. So I think there are a few different things to unpack there. I want to drill more into the number of bugs measurement. It's particularly interesting to me, because when we were both at Cloudflare, one thing I ended up doing was running the support operations team, so that area of metrics was something that was very, very relevant in my day-to-day. But I guess, to summarize, when we look at these North Star metrics, these indicators, we have the evidence that these four key metrics are linked to both... I mean, we know that the businesses which do well against these metrics are twice as likely to succeed in both their commercial goals and their non-commercial goals, things like customer satisfaction and market share. Against these types of metrics, they're likely to do well.

Junade Ali:
I mean, one of the really remarkable figures I heard from a presentation that Nicole Forsgren gave was that when you correlate performance against these four key metrics with stock market data, you find the companies which do well against these four key metrics tend to have a 50% higher market cap, which is just absolutely remarkable. So I guess coming back to measuring quality, one of the metrics you spoke about was Change Failure Rate. Change Failure Rate, as we're talking about it, is the number of deployments that needed some form of intervention, whether a rollback or a hot fix, over the total number of deployments. So let's say one in every 100 deployments needed to be rolled back; we would say the Change Failure Rate was 1%.

Junade Ali:
Number of bugs, I guess, is the metric we're talking about where, when things go through a customer support ticketing system and a user files something, we record it and flag it that way. And then finally, we've also got things like Mean Time To Recovery, which is how long it actually takes to resolve a bug. Is that a fair summary of the key quality metrics?
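
For completeness, here is a minimal sketch of the Mean Time To Recovery calculation mentioned above, averaging the time between an incident being detected and service being restored; the incident timestamps are invented for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incidents: (detected_at, restored_at). MTTR is the average
# time from a failure being detected to service being restored.
incidents = [
    (datetime(2021, 4, 2, 14, 0), datetime(2021, 4, 2, 14, 45)),
    (datetime(2021, 4, 18, 9, 30), datetime(2021, 4, 18, 11, 0)),
    (datetime(2021, 5, 1, 22, 15), datetime(2021, 5, 1, 22, 40)),
]

recovery_minutes = [(restored - detected).total_seconds() / 60
                    for detected, restored in incidents]
print(f"MTTR: {mean(recovery_minutes):.0f} minutes")  # roughly 53 minutes
```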

Kan Yilmaz:
I would say it is a fair summary. One thing I want to add is that tracking metrics is hard. You need to track them accurately, and that's one of the hardest things. If you're a product manager tracking your daily active users, even that is hard. If you're a marketer, you want to track the funnel; that's hard. The accuracy of the metrics is really hard, and it takes time to build this infrastructure. One thing I want to say is that Change Failure Rate is really simple to track; it's not that hard, you just need to tag a few pull requests, issues, or JIRA issues, either automatically or manually. But once we go into number of bugs, you actually need to build quite a bit of infrastructure; you need to change the process of the support team, and change the process of product and engineering so that they can make this actionable.

Kan Yilmaz:
It's a hard metric to track. That's why it's not widely used across the ecosystem. It's not that nobody uses it, but a lot of organizations have trouble tracking number of bugs, and the accuracy is quite low. That's why, if you're just starting right now, I would recommend the first metric you track regarding quality is Change Failure Rate. Once you're more mature, once you want more fine-grained control and you have a dedicated developer experience team, then you can implement new processes, make sure that your organization captures more of the quality spectrum, and actually track number of bugs.

Junade Ali:
Excellent. And I think, from my own experience, there tend to be a few different challenges that people come up against when they want to track number of bugs. So suppose the customer support team is there, and you have people writing in, and those people are able to report bugs. The first thing is the many-to-one relationship that often exists between customers writing in about a specific issue and linking those support tickets up to that issue; a lot of people tend to struggle with building that infrastructure. And then there's also the attribution component: actually attributing which team is responsible for which type of bug and making sure there are appropriate systems to handle that.

Junade Ali:
So I guess there are those two key issues when you're tracking number of bugs. The other thing which really comes to mind for me is the taxonomy: how you internally make sure those things are logged without overwhelming the support team, without putting them under lots of stress to record fine-grained metrics, where the more fine-grained they are, the more inaccurate they become. So I was just wondering if you had any thoughts on how companies can introduce this type of system in a fairly minimalist way to start with, before scaling it up once that initial success has been demonstrated. What would you say those first few baby steps really are?

Kan Yilmaz:
Like I said, the first step is actually not tracking number of bugs; the first step is tracking Change Failure Rate. Change Failure Rate is highly correlated with the revenue of the whole organization. But if we go back to number of bugs, it will take time. The first step is the support team: you need to ensure that every single support ticket is tagged with two dimensions, priority and team. Once you make sure that the support team has a process, then you can move forward and take action on top of it. That said, I wouldn't immediately go and change the whole support process. There are a few things which are hard; changing processes takes time, and you need to educate quite a few users. So testing on small sections is probably a better step, just to see how it works.

Kan Yilmaz:
A small section might be, for example... your support team probably already has a system for tagging which feature a bug relates to. It might be, for example, your signup flow. The bug is related to the signup flow, and there are always some tags for it; I assume there are already some tags you can build on. You can track that signup flow and add something on top of it. Then you can link that to a JIRA ticket. Okay, we tagged the signup flow bug to a JIRA ticket, and every support ticket is linked. Then we can slowly add more fields, such as priority, because the support team might not know whether something is a P1 or a P4. You need to discuss the taxonomy; there needs to be an internal conversation about how you make sure which bug fits into which priority, and again, which bug fits into which team. You need some infrastructure to build this; it doesn't come immediately.

Kan Yilmaz:
To be able to do that, I would suggest first linking every single support ticket with a JIRA ticket and making sure it's at least tagged to a feature. Once you have those two pieces of information, you can take a bit more time. You will see that it's inaccurate, because there will be lots of low priority bugs that you don't care about. Maybe there are 80 bugs which are P4 by your definition, but there's one P1. Does that mean your quality is really bad? No, that means you're efficient; you're ignoring the low impact tasks and focusing on the high impact one. That's actually really good. But if you don't track the priority, you just have 81 bugs. What do you do? That's the problem.
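
A rough sketch of the linking step Kan suggests might look like the following, where each customer support ticket is tied to a JIRA issue with a feature tag and, later, a priority, so that a raw count of 81 bugs can be split into 80 P4s and one P1. All identifiers and data below are hypothetical.

```python
from dataclasses import dataclass
from collections import Counter

@dataclass
class SupportTicket:
    ticket_id: str
    jira_key: str    # every support ticket links to at least one JIRA issue
    feature: str     # reuse tags the support team already has, e.g. "signup-flow"
    priority: str    # added later, once the taxonomy has been agreed internally

# Invented data: 80 low-impact P4 reports and a single P1.
tickets = [SupportTicket(f"SUP-{i}", f"PROJ-{100 + i}", "signup-flow", "P4")
           for i in range(80)]
tickets.append(SupportTicket("SUP-80", "PROJ-200", "signup-flow", "P1"))

by_priority = Counter(t.priority for t in tickets)
print(f"total bugs: {len(tickets)}")   # 81 -- misleading on its own
print(dict(by_priority))               # {'P4': 80, 'P1': 1} -- actionable
```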

Kan Yilmaz:
Number of bugs is quite hard to track. You need to go step by step, then see if it's working: can we actually improve this process? Do people get it? And slowly grow the process so that it encapsulates all of it. After it encapsulates customer reports, priority and team, then you can actually use it as a North Star metric. But before that, I would not recommend using it as a North Star metric; it would mislead you, it won't represent the real quality of your business. Change Failure Rate is much better in that sense.

Junade Ali:
Awesome. And I guess that priority dimension ultimately consists of a few different variables, right? Priority can mean different things to different people. For a customer success team, it could mean being able to retain big enterprise customers. For a support team, it could mean decreasing costs so the business can spend more money marketing itself rather than spending all that money on a support team doing manual actions. It can mean a lot of different things to a lot of different people. So finding out how you measure that priority dimension is, in itself, almost a can of worms.

Kan Yilmaz:
Exactly. Actually, to add to that: what are the pitfalls that you will see? Like you said, one of the pitfalls is, "How do I categorize priority?" That's such a hard task. The second is, how do I make sure that the bug went to the correct team? Of course, you can fix it later once the team tells the product manager, "This is not my team, this is someone else's team." The support team might not have the correct information to tag it correctly. There'll be a lot of conversations, and [inaudible 00:30:25] is one of the biggest ones. There is also another one that will be a big bottleneck, which is... In The Goal, Eliyahu M. Goldratt mentions that a team should own a metric end to end.

Kan Yilmaz:
Number of bugs works slightly differently in that sense. The tracking is not done by you; the tracking is done by a different team, the support team. But the measurement actually impacts the engineering team. And suddenly there's this, "Wait a minute, this shouldn't be tagged as a bug, it should be tagged as..." There's going to be some conversation there. There's going to be some, "Okay, how do we use this metric in an effective way?" You need to teach all the engineers, and there will be conversations which take time for the organization to understand what works for you. These are a few of the bottlenecks of tracking number of bugs. That's why it's not used commonly; only a few teams can actually do this successfully in a correct manner. And passing these bottlenecks, or passing these pitfalls, that's a better way to put it, takes time, and you need to invest quite a bit of energy into it; you need to be ready to invest in that.

Kan Yilmaz:
But once you do get it, you have a really actionable quality metric. Now you can allocate your resources much better. You will know whether you need to fix technical debt or not. And for all these small questions, you will have an answer to most of them just by measuring number of bugs accurately.

Junade Ali:
Awesome. So I really want to drill into some of the leading indicators. But before we get into that area, one of the things which I find fascinating, and something I probably haven't actually spoken about before, is that when I was in the support operations world, this concept of a cycle was also really relevant there. One of the things I looked at was the strongest correlations, and basically I saw that customer satisfaction was inversely correlated with median full resolution time. So the shorter the time it took to resolve a customer issue, the higher the satisfaction would be. There are limiting factors to this: some issues are resolved very, very quickly, and other factors play a part. But especially when you're looking at cases where the customer ends up spending a significant time waiting, the indicator tends to be that median full resolution time.

Junade Ali:
And I guess that ties in with a lot of concepts we see in the engineering world as well: we see the importance of Cycle Time, of being able to iterate quickly. We also see things like MTTR, Mean Time To Recovery. And of course, Change Failure Rate itself is very much about the number of rollbacks or hot fixes over the total number of deployments. I think one of the things which becomes very important in a lot of companies which are doing agile software development or trying to adopt lean practices, and you mentioned The Goal, seems to be this emphasis on really getting the batch size down, adopting processes whereby they're able to iterate very, very quickly. And a key part of that seems to be decreasing the cycle time of each individual piece of work they're doing.

Kan Yilmaz:
Yeah, I definitely agree with that. I would just like to mention that these North Star metrics are not random. They actually encapsulate a lot of different parameters inside a single metric, just like revenue. Revenue captures value to customers in a lot of different ways, from marketing to growth to sales to engineering; everything is included inside the single metric of revenue. The same thing is happening with Cycle Time. Just like you said, mean time to resolution for support is such a great metric for customer satisfaction. Cycle Time encapsulates the same thing; it includes mean time to resolution. If you can deliver value faster, you can get bug fixes and feature requests to customers really fast, and their satisfaction will be better.

Kan Yilmaz:
And all of these metrics that I mentioned, which we measure together, Cycle Time, deployment frequency, Change Failure Rate, Mean Time To Recovery, and number of bugs, which again ties in with Change Failure Rate, all encapsulate so many different parameters. That's why they're good metrics, and that's why they correlate so well with the organization's revenue.

Junade Ali:
Definitely, for sure. I think the next area I really wanted to dive into is that there are often metrics engineering teams look at, what we'd call leading indicators, things like test coverage, build failures, build times, et cetera, which can be a double-edged sword, right? They can be very useful when you're drilling into why North Star metrics aren't looking right, but if you obsessively focus on local optimization, they can be dangerous as well. So what's your usual approach to looking at leading indicators in engineering teams? How do you recommend engineering leaders take a look at those metrics?

Kan Yilmaz:
Metrics are a signal for you to be able to take action. So what you do is, okay, my Cycle Time is, let's say, five days, and my Change Failure Rate is, let's say, 40%. What you do is, okay, let's make sure that the Change Failure Rate decreases all the way to 15%, and I have a goal of decreasing my Cycle Time to three days. If you have this goal, what do I do now? You start to brainstorm, you look into the whole development process, and you realize, "Oh, we have a QA team, but they're taking so much time." So if I remove the QA team, I will ensure that Cycle Time goes to three days. But what will happen to Change Failure Rate? Will it actually be worse or better? Okay, let's see: we did that, we removed QA, then we need to take another action to make sure the quality actually holds, so we add more review processes. And we experiment with test coverage, we experiment with other leading metrics which might impact my goal, which is that we will have fewer rollbacks or hot fixes during this period of time.

Kan Yilmaz:
You identify the problem first, you have a goal for it, you dive deep and figure out which parts can actually affect it. You will probably figure out more than 10 leading metrics which affect that high level metric, that North Star metric, and then you go and try to fix each one. You don't even have to do them one by one. One by one is the more scientifically [inaudible 00:38:07] methodology, but we're startups, we're businesses, we're competing; we can do multiple at the same time. Some of them might fail, some of them might be positive, but you will know from the end results if you're doing well or badly, because your North Star metric is your benchmark and you're measuring yourself against yourself. So the important part is to be fast, to iterate as fast as possible. And the leading metrics will change every time you do this. The first time, you might need to focus on build times to improve Cycle Time. The second time, you might need to get rid of a complete team. The third time, you might need to do something else.

Kan Yilmaz:
So you slowly go down: you figure out which North Star metric you want to improve. Actually, I would say speed and quality should go together; you select a goal with both of them included in your OKRs or quarterly goals and so on. Not just one, you need to include both of them. Once you have a speed and a quality goal, you dive into where the problem can be. Is the problem the QA team? Is the problem the review process? Is the problem not enough tooling? Is the problem CI? You just figure it out. You go and talk to engineers about which ones they seem to have issues with, and you track what you can track. Some of them cannot be tracked, some of them can. You go ahead and try to figure out where the problem is. If it can be tracked, that's great. You can immediately say the build time is six hours; let's decrease this to 20 minutes, 10 minutes, five minutes. That's a really great goal.

Kan Yilmaz:
Okay, there will be a huge team working on it, but the Cycle Time will improve drastically. It will maybe take a long time, but it will improve. So you identify the problem and iterate one by one. If there is a bottleneck among these different leading metrics, I would recommend focusing on the bottleneck. So it's better to actually measure all of them consistently, so that all leading metrics have some benchmark. Some things cannot be measured, for example development environments: how long does it take for a single engineer to onboard to your system? Does it take three weeks because the environment is so complex, or does it actually take less than a day? That's something that's not possible to measure with the current tooling we have. But you can still do measurements based on surveys or NPS. All big companies run NPS surveys with their engineers to see the qualitative part of the engineering process and have a benchmark. Then they focus on the biggest ones.

Kan Yilmaz:
Solve the biggest one during an iteration, then solve the next biggest one during the next iteration. If you just continue this flow multiple times, I can assure you that you will have a better North Star metric after you iterate a few times based on your goals.

Junade Ali:
Awesome. So I guess in that process of continuous improvement, there are really a few different steps, right? In the first instance, you have to be able to measure the North Stars, then you're able to drill into these leading indicators. And as you mentioned, there are both qualitative sources of data as well as quantitative. The quantitative things are nice: we can measure them, we can get really good insight. But then on the qualitative side, you actually have to go and talk to people, you have to be able to discuss with them where things are, and you can also use things like NPS surveys.

Junade Ali:
I was wondering if we could talk about some of the trends you tend to see with the leading indicators. We know, for example, with test coverage and potentially with build failures, that low test coverage tends to be bad, and we generally want to get our build failures down. We want to cut build times, we want good dev environments, good linting systems, good code quality, but obviously there's always a balancing act between these various factors, and some of these things tend to be issues that companies see more often than others. So I was wondering if you could give us a feel for what the problems generally tend to be.

Kan Yilmaz:
Regarding leading metrics, usually people have an assumption and they think that's the truth. Mostly they think this because they read in a blog post, or heard on some podcast they listen to, that this number needs to be X. That's not necessarily true. Maybe you don't even need to measure that. It completely differs based on what your team is. For example, if you're an iOS team, on the App Store the fastest you can release is once a week. Your release cycle can never be faster than that; your deployment frequency can never be more than once a week, and you need to work within that. What does this bring? It means your quality needs to be much, much higher, so that you never have to do a rollback, because a rollback is so expensive. The App Store has all these processes which block you from being fast.

Kan Yilmaz:
But if you're a web developer, you can deploy every second; that's totally possible. So it depends on what team you're working on, what technology you're working with, and who you're serving. A kernel engineering team is not the same as a front end engineering team. If you're building space rockets, that's not the same as building apps. So it all depends on what your team needs. Some teams will need really high quality; some teams will need really high speed and can actually sacrifice some quality for other reasons. So coming back to leading metrics: once we have this baseline, say my team has a goal, which is that we need to make sure quality is really high because we build iOS apps, how do I figure out where to focus? What are the bottlenecks in measuring it?

Kan Yilmaz:
Failed builds are a really good example. A lot of teams try to measure the total number of build failures and how long builds have been failing, but they track it for all branches. If I'm an engineer, I'm writing code, I have a branch, I do a git push. The build fails because a test didn't pass. Oh, I see that it didn't pass. I fix the test, push again, and okay, it passes now, and I can merge. There was a build failure there, but why should I track that? It doesn't affect any value to customers. But once you get to the trunk branch, if the trunk branch is not building, good luck. The pipeline is completely blocked. Now no engineer can do a release. And if an engineer tries to start a new task, they pull the trunk, it doesn't work, they're confused, and they're spending so much time on it. You need to track the correct leading metrics. Some of them are common sense.
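
To illustrate the distinction Kan draws, here is a small sketch that ignores build failures on feature branches and only counts failures, and how long the build stayed red, on the trunk branch; the build records and branch names are invented.

```python
from datetime import datetime

# Hypothetical CI build records: (branch, passed, finished_at), oldest first.
builds = [
    ("feature/login-fix", False, datetime(2021, 5, 3, 10, 0)),  # ignored: not trunk
    ("main", True,  datetime(2021, 5, 3, 11, 0)),
    ("main", False, datetime(2021, 5, 3, 13, 0)),   # trunk goes red here
    ("main", False, datetime(2021, 5, 3, 14, 0)),
    ("main", True,  datetime(2021, 5, 3, 15, 30)),  # trunk green again
]

trunk = [(passed, at) for branch, passed, at in builds if branch == "main"]
failures = sum(1 for passed, _ in trunk if not passed)

# How long was the trunk unusable? From the first red build to the next green one.
red_since = None
red_minutes = 0.0
for passed, at in trunk:
    if not passed and red_since is None:
        red_since = at
    elif passed and red_since is not None:
        red_minutes += (at - red_since).total_seconds() / 60
        red_since = None

print(f"trunk build failures: {failures}")          # 2
print(f"trunk red for: {red_minutes:.0f} minutes")  # 150
```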

Kan Yilmaz:
Some of them are straightforward; it's just obvious, and it depends on what you track. I would need to go through all of them one by one, and it's quite a big list; it's hundreds, I'll put it that way. There are so many things we can improve upon. But eventually, what it boils down to is: you have assumptions, and you start tracking, say, three leading metrics. You focus on the bottleneck, you fix that, then you look at the other two. Maybe my gut feeling says these aren't the real problem, so let me track more leading metrics. Then you talk to engineers, you talk to other people in the organization, and figure out where they actually have issues. Okay, they have issues with five other things that came up in those conversations. You start tracking them, you have a benchmark. Now the bottleneck is among these eight leading metrics, it's a different one, you start fixing that one, then iterate again. Then you need more leading metrics, and you continue the flow.

Kan Yilmaz:
It's a process like this: the further along the company goes, the more leading metrics you'll have. Some you will fix and they will never become a problem again. Some you fix, and two years later you see they're a bottleneck again and you'll need to fix them again. It all depends, but start slowly; your gut feeling will basically allow you to track the correct metrics at first, then you can figure out which ones are the bottleneck, then you slowly grow this list. And eventually you won't need gut feeling, because everything is tracked, you have benchmarks on everything, and now you know where to focus with quite high confidence. And that's where we want to be: we want to avoid gut feeling and move to data-based confidence. This is the methodology I would recommend.

Junade Ali:
Awesome. And I think one of the really key concepts you mentioned there towards the start is this idea of balancing risk and reward in many ways. The idea that if you're able to deploy very quickly, if you're able to restore things, and there's very limited risk to things going wrong... I guess the example is, if you've just watched a Netflix movie and the rating system at the end goes down for two minutes, that's far less of a problem than if you're deploying an app which is broken and can take many cycles to fix, or something which is far, far more mission critical.

Junade Ali:
I think one of the things we tend to see is that some organizations get trapped into this area of vexatious quality, where they focus too heavily on one particular element. For instance, a company might focus on limiting the Change Failure Rate as much as they can to get it to 0%, but they do that at the cost of not being able to deploy quickly enough and not being able to restore things quickly, because everything has an end to end test and everything takes a lot of time to build and deploy, which ultimately harms them by affecting those other metrics. So it seems like there's quite a heavy balancing act at play, and you really need to understand where your priorities are in order to drive that ultimate goal of driving up your revenue, while of course, in certain environments, complying with your regulatory or ethical constraints and so on, depending on how high your risk is.

Junade Ali:
So it seems really, really fundamental to the engineering process, that balance between risk and reward.

Kan Yilmaz:
Absolutely. This is actually getting into more of a subjective area, because there is no objective truth. There is no process which works for everyone. Even if you compare 10 different organizations with 200 teams in each of them, all in the finance sector, you still cannot have a playbook which works for all of them. You need to figure out what your company's needs are, what your priorities are, what your mission is, what values you care about, and you decide based on those. They might change as well. In the beginning of a company, when you're a startup, you don't care too much about quality; you want to figure out what your product market fit is, you want to optimize Cycle Time as much as possible, you want to optimize deployment frequency. Whereas once you actually hit product market fit, now you have something to lose. Now you have customers, now they're paying, and if you don't satisfy them, things won't go well.

Kan Yilmaz:
So depending on which sector you're in, which type of company you are, how long you've been around, whether you have product market fit or not, all of these are different parameters which will affect you, and you will need to make that decision yourself. There is no playbook which works for everybody. The only playbook I can see is that the North Star metrics allow your engineering organization to be much more effective. You will need to figure out what the numbers should be for your organization. You iterate on them, improve upon them, and your values might be updated, such as going from a company without product market fit to one with it; you will change these metric goals. But that depends on each company. There is no playbook; you need to figure it out, you need to have discussions with your team and understand what value you want to serve to your customers. Coming back to the measurement of quality, the definition of quality is a product or service that bears on its ability to satisfy the customer's needs.

Kan Yilmaz:
So it boils down to that: you need to make sure that your company gives value to your customers. And what works for you, you need to figure out yourself.

Junade Ali:
Awesome. And I guess one of the last things I really want to ask about is this whole question of engineers' wellbeing itself. If an engineer constantly has to worry about things going down at 2:00 AM when there wasn't even a deployment, if they have to worry about issues occurring; and this also ties in with things like their development environment, if it takes a week for them to get set up with a development environment, if they can't just spin it down and spin it back up. That is what stresses them out, that's what keeps them up, that keeps them in the office late at night or, in this whole work-from-home environment, distracts them outside working hours. It stops them being able to go and relax in the afternoons.

Junade Ali:
These things are also challenges to individual wellbeing. They'll also ultimately affect the business; they ultimately roll up into those North Star metrics and into things like staff retention. So I'm just curious what you feel the role of quality is in ensuring engineer wellbeing, ensuring developer satisfaction, these types of things?

Kan Yilmaz:
I believe there was, I'm trying to remember the name of it, but there was a report on why engineers leave their company and what gives engineers satisfaction. There are quite a few lists; I've forgotten the full list of the top level ones as well. But it basically boils down to this: if you are a fast moving company, if you are delivering value to your customers, that gives quite a bit of satisfaction to the engineers themselves. But you need to make sure that you communicate all of this. You need to make sure that they are learning, they're improving, there are more opportunities. I would say a successful company usually has satisfied engineers, not suffering ones. It's correlated. It's really interesting, because people tend to assume it works like building the pyramids: you just work people hard, and then you've built a pyramid and that's success.

Kan Yilmaz:
Making people suffer, making your employees or engineers suffer, and then you can build something grand on top of that. That's not actually the case. The case is: if the engineers are happy, if they're satisfied with their life, if they're learning a lot, if they have challenges in their work, if there isn't a lot of politics, then they can focus on their work. All of these actually affect your company's success. There are different kinds of measurements for these, and most of them are qualitative measurements. They're not quantitative; you can't have a single metric which represents them. The qualitative measurement again boils down to things like one on ones. You can have lots of one on ones and get a lot of quality signals from them; some of them have it, not all of them.

Kan Yilmaz:
You can run surveys with engineers periodically so that you can measure their satisfaction. If you ask an engineer, "Would you recommend this company to your friends?" and they say no, you know that they will eventually churn; they won't even recommend it to their friends. If you're working at an amazing company, you want all your friends to come there; you're having such a great time that you want to share that experience with your friends. That's how I would think about it. I can go down into where the bottlenecks are; I will have assumptions. These are the top level lists. You can find these lists in, I believe it was the State of Engineering 2020 report, I'm not sure if that's correct. But there's quite a big list, I think 10 to 15 different questions. You can ask these questions to your engineers, and then you can ask other questions such as, "Is it easy to do a deployment? Is it easy to spin up a development environment? Are you actually satisfied?"

Kan Yilmaz:
These are all different kinds of questions. Some of them touch the engineering process we talked about, the North Star metrics, the four Accelerate metrics. The others are purely about engineering health, but both of them feed into engineering satisfaction. That's how I would put it: make sure that your engineers are satisfied and they will produce better work. That satisfaction is about learning, being challenged, being listened to inside the organization. They're not ignored, so they feel included within the organization and they want to make it a better place. That's what we want to achieve. Once you hit that, all the metrics actually follow from there. It's not the pyramid; it's a happy environment which will produce a really good and successful product.
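
The "would you recommend this company to your friends?" question Kan mentions is the classic NPS-style survey question; as a rough sketch, and purely as an assumption about how such responses would be scored, here is the standard promoter/detractor calculation applied to invented engineer survey data.

```python
# Standard NPS scoring applied to an internal engineer survey:
# 0-6 = detractor, 7-8 = passive, 9-10 = promoter.
# The response scores below are made up for illustration.
responses = [10, 9, 9, 8, 7, 6, 10, 4, 9, 8]

promoters = sum(1 for r in responses if r >= 9)
detractors = sum(1 for r in responses if r <= 6)
enps = 100 * (promoters - detractors) / len(responses)

print(f"promoters={promoters}, detractors={detractors}, eNPS={enps:.0f}")
# promoters=5, detractors=2, eNPS=30
```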

Junade Ali:
Awesome. And I think that's a good place for us to wrap things up. I know it's getting late over there in Singapore, so I don't want to keep you from being able to relax at the end of your day; I don't think that would be good for employee wellbeing. So to wrap things up, where's a good place for people to find any additional resources, or to maybe interact with you if they're running an engineering team and could really benefit from using Haystack?

Kan Yilmaz:
So I can recommend: if you haven't read The Goal by Eliyahu Goldratt, read it. Secondly, if you haven't read the Accelerate book, go ahead and read that to understand why these metrics are important, how to track them, what the correlations are and so on. All the research is included in the Accelerate book. The State of DevOps report is a sequel to [inaudible 00:58:01], you can think of it like that; every year they run the same metrics over the population to see where the trends are going. You can check that out, it's quite interesting. So that covers the [inaudible 00:58:12] books; that covers the external...

Kan Yilmaz:
There are also the Haystack blog posts, where we constantly write about these topics; feel free to check them out. There are the really good podcasts that Junade has done; I'd definitely recommend Junade's other podcasts. And on top of that, I would also recommend checking out, I think it was the State of Engineering report, I'm not sure about the naming; we can put a link with this podcast, I believe, so people will be able to find it much more easily. And if you have any questions, feel free to email me at kan@usehaystack.io. That is KAN@usehaystack.io. So feel free to email me, I'll be happy to help you out. So yeah, thanks for having me here, Junade. This was a wonderful conversation.

Junade Ali:
Thank you for joining me for another episode of the Engineering Insights podcast. This podcast has been recorded and produced in Edinburgh, Scotland. I've been joined remotely by Kan Yilmaz, who's in Singapore. The soundtrack used in this podcast has been Werq by Kevin MacLeod. That's spelled W-E-R-Q. Find more details of how Haystack can help your engineering team at usehaystack.io.
