Crowdstrike… quis custodiet ipsos custodes?

The Roman satirist Juvenal included the line ‘quis custodiet ipsos custodes?’ in a poem (Satire VI, lines 347–348), which translates as ‘who guards the guardians?’ While this has been used in recent times to criticise dictatorial governments, it is equally a question we should ask about anti-virus software companies.
Last week Crowdstrike posted a regular update for its enterprise customers’ IT staff to install on around 8.5 million* (according to Microsoft) Windows PCs and client devices. Unfortunately, a bug in the software, described by Crowdstrike as a ‘defect found in a single content update for Windows hosts’, caused all of these machines to crash, leaving their users with a ‘blue screen of death’. Repairing a machine requires starting it in safe mode, removing the update and installing a fixed version of the software. Although the problem is relatively simple to fix on a single machine, it is a major challenge for IT staff, who are often responsible for thousands of machines in multiple locations: a combination of stand-alone PCs and virtual PCs running under Microsoft’s Hyper-V hypervisor, on-premises or on cloud-based servers.
This unexpected event continues to create significant disruption around the world, preventing airlines, rail and bus companies from processing passengers, hospitals from managing patient appointments and many other sectors from working as normal. Recovery will take from hours to days, and even months for some organizations; the outage is having major financial implications, which might even put some of them out of business. It reminds me of the feared effects of the millennium bug, when year codes recorded in databases and spreadsheets as 2 digits were predicted to cause system failures with the onset of the year 2000.
In our industry we are well accustomed to the aftermath and consequences of major cyber attacks, which increase in severity as our organizations become more highly digitized. However, it is seldom the guardian that creates the problems. McAfee was the last anti-virus supplier to fail in an equivalent way. In April 2010 it released buggy software which, despite being available for download for only 4 hours, affected millions of enterprise and home PCs. McAfee claimed at the time that only 0.5% of its customers suffered the failure. I wonder what role, if any, this event had in the company’s acquisition by Intel for $7.68 billion just 4 months later.
Obviously there are some major questions over Crowdstrike’s software testing processes, but it has behaved impeccably since the mistake, quickly informing customers and quickly posting an update that works. Users have also been helped by the legal requirement, in an increasing number of countries, to disclose events of this sort, which is another reason why we found out about the problem quickly. While Microsoft has launched advice and software for affected customers of its Azure cloud, the headache is affecting users of almost all cloud service providers, market shares for which I show in my Figure above.
In the next few months some customers will take legal action against Crowdstrike for the financial consequences of this event; many more will re-introduce internal software testing, having experienced the consequences of relying on their supplier to deliver working code. It is essential for Crowdstrike to re-establish trust not only with its enterprise end-users, but also among its cloud services partners. I wish them well; after all, anyone can make a mistake. It’s just unfortunate that this one is affecting so many organizations.
*Note: 8.5 million affected workstations is equivalent to around 0.9% of the total installed base of PCs, or 1.3% of those installed in enterprises, as opposed to homes.
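As a rough check on the percentages in the note, the short Python sketch below backs out the installed-base sizes they imply. The only inputs are the three figures quoted in the note; the function and variable names are my own illustrative choices, and the results are back-of-envelope estimates rather than published market data.

```python
# Back-of-envelope check of the note's percentages.
# Inputs come from the note; everything else is derived arithmetic.
AFFECTED = 8.5e6  # affected Windows devices, according to Microsoft


def implied_base(affected: float, share_percent: float) -> float:
    """Return the installed base implied if `affected` is `share_percent` % of it."""
    return affected / (share_percent / 100.0)


# 0.9% of all PCs, 1.3% of enterprise PCs (both figures from the note)
total_pcs = implied_base(AFFECTED, 0.9)
enterprise_pcs = implied_base(AFFECTED, 1.3)

print(f"Implied total installed base:      {total_pcs / 1e6:.0f} million PCs")
print(f"Implied enterprise installed base: {enterprise_pcs / 1e6:.0f} million PCs")
```

On these figures the implied totals come out at roughly 940 million PCs overall and roughly 650 million in enterprises, which shows how small a share of machines was needed to cause disruption on this scale.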