Facebook Outage Increased Developer Throughput by 32%

Dr Junade Ali

October 5, 2021

Yesterday (Monday, 4th October 2021), Facebook saw outages which took services including Facebook, Instagram, Messenger and WhatsApp offline. At Haystack, we took a look at our data to see what impact this had on developer throughput (number of Pull Requests merged).

We calculated a baseline from the average of the three Mondays prior to the outage and compared the outage day against it. Across the entire day, developer throughput was 32% higher than the baseline.
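The baseline comparison is simple arithmetic. The sketch below shows one way to compute it, assuming the baseline is the plain mean of the three prior Mondays' merged-PR counts. The function names and the counts are hypothetical, chosen only so the output lands on 32%; they are not Haystack's data or code.

```python
from statistics import mean


def baseline(previous_mondays: list[int]) -> float:
    """Mean merged-PR count across the comparison days (assumed method)."""
    return mean(previous_mondays)


def percent_change(observed: float, expected: float) -> float:
    """Relative difference of observed vs expected, as a percentage."""
    return (observed - expected) / expected * 100


# Hypothetical illustrative counts, NOT Haystack's data.
prior_mondays = [950, 1000, 1050]  # merged PRs on the three previous Mondays
outage_monday = 1320               # merged PRs on Monday 4 October 2021

expected = baseline(prior_mondays)
change = percent_change(outage_monday, expected)
print(f"baseline={expected:.0f}, change={change:+.0f}%")
```

A per-hour baseline (averaging each hour of the prior Mondays separately) would work the same way and also allows the intraday comparison described below.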

Timeline

According to DownDetector, the Facebook outage started at 15:24 UTC. At 22:46 UTC, Facebook’s CTO tweeted that services were coming back online but "may take some time to get to 100%". The incident was largely resolved around midnight.

For most of the day, developer throughput followed the baseline. This changed significantly after 21:00 UTC.

Whilst it’s typical for us to see an increase in throughput at this time on Mondays, the growth was far more substantial than usual. Between 21:00 UTC and midnight, we saw roughly a 2.6x increase in the number of Pull Requests being merged. For context, midnight UTC aligns with 17:00 Pacific time (where many Haystack customers are located).
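As a rough sketch of the end-of-day comparison, the code below sums hourly merge counts over the 21:00 to 24:00 UTC window, divides by the baseline for the same hours, and then converts the window bounds to Pacific time. It assumes "2.6x" means the ratio of observed to baseline merges in that window; the `window_ratio` helper and the hourly counts are hypothetical illustrations. Daylight Saving Time was in effect on 4 October 2021, so the Pacific offset that day was UTC-7 (PDT).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def window_ratio(observed: dict[int, int], baseline: dict[int, float],
                 start_hour: int, end_hour: int) -> float:
    """Ratio of observed to baseline merges over UTC hours [start_hour, end_hour)."""
    hours = range(start_hour, end_hour)
    observed_total = sum(observed.get(h, 0) for h in hours)
    baseline_total = sum(baseline.get(h, 0.0) for h in hours)
    return observed_total / baseline_total


# Hypothetical hourly merge counts keyed by UTC hour (illustrative only).
baseline_by_hour = {21: 50.0, 22: 40.0, 23: 30.0}
outage_by_hour = {21: 130, 22: 104, 23: 78}

ratio = window_ratio(outage_by_hour, baseline_by_hour, 21, 24)
print(f"21:00-24:00 UTC ratio: {ratio:.1f}x")

# Convert the window bounds to Pacific time (PDT on this date).
start_utc = datetime(2021, 10, 4, 21, 0, tzinfo=timezone.utc)
end_utc = datetime(2021, 10, 5, 0, 0, tzinfo=timezone.utc)
for moment in (start_utc, end_utc):
    print(moment.astimezone(PACIFIC).strftime("%Y-%m-%d %H:%M %Z"))
```

`zoneinfo` (Python 3.9+) relies on the system time zone database, or the `tzdata` package, to resolve `America/Los_Angeles`.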

Causes

Whilst developer throughput went up, we saw that the lead time (time from first commit to PR merged) increased dramatically for these Pull Requests.
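Lead time, as defined here, is the gap between a Pull Request's first commit and its merge. A minimal sketch of that calculation follows, comparing the median lead time of two groups of PRs. The `PullRequest` record, the choice of the median, and all timestamps are hypothetical; they only illustrate how merging older PRs pushes lead time up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class PullRequest:
    """Hypothetical PR record; all timestamps are naive datetimes in UTC."""
    number: int
    first_commit_at: datetime
    merged_at: datetime

    @property
    def lead_time(self) -> timedelta:
        # Lead time: time from first commit to the PR being merged.
        return self.merged_at - self.first_commit_at


def median_lead_time(prs: list[PullRequest]) -> timedelta:
    """Median lead time across a group of merged PRs."""
    return median(pr.lead_time for pr in prs)


# Hypothetical PRs merged on a typical Monday evening: mostly fresh work.
typical_monday = [
    PullRequest(1, datetime(2021, 9, 27, 9, 0), datetime(2021, 9, 27, 21, 30)),
    PullRequest(2, datetime(2021, 9, 26, 14, 0), datetime(2021, 9, 27, 22, 0)),
    PullRequest(3, datetime(2021, 9, 27, 18, 0), datetime(2021, 9, 27, 23, 0)),
]

# Hypothetical PRs merged during the outage evening: older, long-running work.
outage_evening = [
    PullRequest(4, datetime(2021, 9, 20, 10, 0), datetime(2021, 10, 4, 21, 15)),
    PullRequest(5, datetime(2021, 9, 28, 16, 0), datetime(2021, 10, 4, 22, 40)),
    PullRequest(6, datetime(2021, 8, 30, 11, 0), datetime(2021, 10, 4, 23, 5)),
]

print("typical:", median_lead_time(typical_monday))
print("outage: ", median_lead_time(outage_evening))
```

The median is used here because a handful of very old PRs would dominate a mean; either statistic would show the shift.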

This suggests that the increase came less from new work and more from developers using the extra time at the end of their day for housekeeping: closing off old, long-running Pull Requests.

Indeed, Haystack alerts development teams to long-running Pull Requests (such as those already reviewed and waiting to be merged). Rather than a dramatic increase in programming productivity, we saw developers catching up on housekeeping.

As Kan, our CTO, explained to me this morning: “Facebook going down made developers clean their backyard”.

Not a Reason for Micromanagement

As a developer analytics tool, Haystack is careful not to enable micromanagement. Unlike our competitors, we do not compare engineers.

Research on software engineering teams has consistently shown that micromanagement harms team effectiveness, and that psychological safety is essential to improving productivity, improving software reliability and preventing burnout.

Indeed, the fact that we didn’t see a substantial drop in throughput while the incident was being discussed on other social media platforms suggests that developers are less prone to distraction from productive work than we might think.

The key to encouraging sustained developer productivity rests in building a flow-focussed developer experience, where manual processes are replaced with automation and tooling. When developers are unblocked from inefficient processes, bureaucracy and technical debt, work can flow quickly without compromising reliability or causing burnout. This is why more technology organisations are building EngProd teams to focus on removing these blockers.

Conclusion

Haystack data shows an increase in developer throughput at the end of the working day while the Facebook outage was ongoing. Between 21:00 UTC and midnight UTC (14:00 to 17:00 PDT), we saw roughly a 2.6x increase in the number of Pull Requests being merged. This resulted in developer throughput for the entire day being 32% higher than the baseline.

The bottom line is that the Facebook outage gave developers some extra time to do housekeeping and close off long-running Pull Requests. This wasn’t a boost to programming productivity as such, but rather developers using the end of the day for clean-up.

I am hugely grateful to Kan Yilmaz, CTO of Haystack Analytics, for his support early this morning to collect the data necessary for this blog post.