The CrowdStrike butterfly effect

Posted by: Evans Asare

The CrowdStrike butterfly effect: cyber pros weigh in on the far-reaching disaster. A few hours without flights, medical procedures, or financial transactions. A day or two without less critical services. And a week or two of work restoring every affected system. Yet the ripples caused by CrowdStrike’s calamitous software update will be felt far beyond these short-term disruptions, cybersecurity experts believe.

Cybernews received dozens of insights from leading cybersecurity professionals worldwide. Here are 20 takeaways we think are worth sharing, which together paint a sobering picture of our digital dependencies.

On the incident scale

1. “This outage has hit almost everyone. Taxpayer-funded services like airports, hospitals, and schools have been heavily impacted around the world. A huge amount of business has been lost due to downtimes. The amount of money this outage will cost is mind-boggling – we do not know the full extent of the impacts just yet, but this is probably the most costly IT outage in history,” said Alexander Linton, director at Session, the end-to-end encrypted messenger.

2. “That was kind of an apocalypse. That is why there are no ready-made recovery procedures or a fail-safe mechanism,” said Victor Zyamzin, chief business officer at Qrator Labs.

3. “One way to view this is like a large-scale ransomware attack. I’ve talked to several CISOs and CSOs who are considering triggering restore-from-backup protocols instead of manually booting each computer into safe mode, finding the offending CrowdStrike file, deleting it, and rebooting into normal Windows. Companies that haven’t invested in rapid backup solutions are stuck in a catch-22,” said Eric O’Neill, cybersecurity expert, former FBI Counterterrorism & Counterintelligence operative, attorney, and founder of The Georgetown Group and Nexasure AI.
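For context, the manual fix O’Neill mentions (delete the offending CrowdStrike channel file, then reboot) can be scripted once a machine is reachable in safe mode. The sketch below is illustrative only; the drivers directory and the widely reported C-00000291*.sys filename pattern are assumptions to verify against your own environment before deleting anything:

```python
import glob
import os

# Directory and filename pattern widely reported for the faulty
# CrowdStrike channel file; treat both as assumptions to confirm
# against vendor guidance for your own deployment.
DEFAULT_DIR = r"C:\Windows\System32\drivers\CrowdStrike"
PATTERN = "C-00000291*.sys"

def remove_offending_files(directory: str = DEFAULT_DIR,
                           pattern: str = PATTERN) -> list[str]:
    """Delete files matching the faulty channel-file pattern.

    Returns the list of removed paths so the action can be audited.
    """
    removed = []
    for path in glob.glob(os.path.join(directory, pattern)):
        os.remove(path)
        removed.append(path)
    return removed
```

In practice this had to be run machine by machine, often after entering BitLocker recovery keys by hand, which is part of why restore-from-backup looked attractive to the CISOs O’Neill spoke with.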

On the root causes

4. “Single points of failure deeply permeate our current internet infrastructure, and hospitals, companies, and the traditional financial system sit on top of a house of cards that can easily collapse,” said Yannik Schrade, CEO and Co-founder of Arcium. “The biggest lesson for us from this global outage is to not trust but verify. Trust should not be part of the equation with systems this important for healthcare, finance, and infrastructure.”

5. “Unfortunately, it won’t get better. More and more systems depend on just a few vendors because of the immense consolidation in the cybersecurity market,” said David Brumley, CEO of ForAllSecure and cybersecurity professor at Carnegie Mellon University. “Redundancy is getting harder and harder. Two systems running the same software will both crash together. Instead, we need diversity. That goes against the industry, where there is massive consolidation among vendors. Google buying Wiz, Cisco buying Duo, and all the other unicorn acquisitions mean that our software reliability stacks are in the hands of just a few mega-companies.”

6. “Having a single point of failure in your system means that, eventually, there will probably be a failure,” Alexander Linton said. “Having so much critical infrastructure relying on a centralized service is a huge mistake and something we should try to remedy as an industry going forward.”

Update cycles taken out of the hands of sysadmins

7. “Many security teams don’t realize that their endpoint protection platforms’ signature updates often themselves contain code, further exacerbating the issue. We should expect to see changes in this operating model. For better or worse, CrowdStrike has just shown why this operating model of pushing updates without IT intervention is unsustainable,” said Jake Williams, former NSA hacker, faculty at IANS Research, and VP of R&D at Hunter Strategy.

Who’s to blame?

8. “Ultimately, it’s the vendor (CrowdStrike) that had pushed the changes which broke things… not even the end-user organizations or businesses themselves. Critical infrastructure might have an EDR or XDR solution slapped onto it just to ‘check the box,’ but the scenario where the provider accidentally breaks the infrastructure isn’t one you ever really think of,” said John Hammond, principal security researcher at Huntress. “With all that root-level power and capability, it is especially fringe. A small mistake in code, any accidental misconfiguration, or just simply unexpected behavior can cause the whole computer to crash.”

9. “When you buy a car, it’s been thoroughly tested for safety. CrowdStrike didn’t do enough testing on their software, which resulted in a broken update. That was paired with their worldwide distribution, immediately crashing computers across the internet,” said David Brumley. “CrowdStrike needs to make a radical investment in improving their software testing and get better about incrementally rolling out updates so not everything breaks at once. Organizations need to put pressure on their vendors.”


10. “Accountability likely rests with CrowdStrike for the faulty deployment. However, organizations also need to take responsibility for having adequate backup and recovery procedures. Both parties play a role in ensuring system reliability and resilience,” Matthew Carr, Co-founder & CTO at Atumcell Group, said.

11. “The world was on notice after the SolarWinds attack, where Russian cyberspies infiltrated the patch update process to send a Trojan update to SolarWinds customers. Following that attack, a Russian cybercrime syndicate deployed a similar attack against Kaseya’s customers. Every company should have learned the lesson about controlling updates, especially CrowdStrike, which was called in to solve both the SolarWinds and Kaseya cyberattacks,” said Eric O’Neill.

12. “I hope this doesn’t undermine confidence in cloud-based security solutions. As cybercrime and espionage become more sophisticated and leverage top-tier AI for attacks, rapid deployment of intelligence from the cloud is the only effective response. Consumers have two options: rely on cloud-based technologies or air-gap their systems and dust off their old typewriters,” O’Neill added.

The lackluster response

13. “Businesses would have disaster recovery plans which, unfortunately, have remained more of a paper-based exercise than plans tried and tested at scale across key simulation scenarios,” said Alina Timofeeva, strategic advisor in technology. “It is vital for companies to invest in operational resilience, which is broader than just technology: it covers technology, data, third parties, processes, and people.”

14. “Companies that keep their infrastructure in the cloud coped with the problem quicker than others, thanks to virtualization and API-based scripts. For AWS-hosted and Microsoft Azure-hosted virtual machines, the instructions were published in a matter of hours. Moreover, it does not take much time to implement those instructions compared to doing so for a full fleet of bare-metal servers,” said Victor Zyamzin. “Companies that back up regularly were probably also less impacted.”

On potential changes

15. “Computer scientists have long known what to do here: you build in thorough, automated stress testing for every software change. We call this fuzzing. You pair that with incremental updates so that if something slips through the cracks, you can detect it early without the entire internet going down. Sadly, improving software security testing is the first thing companies skimp on. New features, or worse, cost savings, mean we’ve globally underinvested in software reliability,” said David Brumley.
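To make the fuzzing idea concrete, here is a toy harness in Python: randomly mutate a known-good input and record every input that makes the code under test crash, before the change ever ships. The parse_record function is a hypothetical stand-in for the component being changed; real fuzzers (AFL, libFuzzer, or Mayhem, the tool from Brumley’s own company) are coverage-guided and far more capable:

```python
import random

def parse_record(data: bytes) -> dict:
    """Hypothetical code under test: expects input like b'key=value'."""
    text = data.decode("utf-8", errors="replace")
    key, sep, value = text.partition("=")
    if sep != "=":
        raise ValueError("missing '=' separator")
    return {key: value}

def fuzz(parser, seed: bytes, iterations: int = 1000) -> list[bytes]:
    """Mutate the seed at random and collect every input that crashes.

    Catches all exceptions on purpose: in a fuzzing harness, any
    unhandled exception is a finding to triage before release.
    """
    crashes = []
    rng = random.Random(0)  # fixed seed so runs are reproducible
    for _ in range(iterations):
        data = bytearray(seed)
        for _ in range(rng.randint(1, 4)):
            data[rng.randrange(len(data))] = rng.randrange(256)
        try:
            parser(bytes(data))
        except Exception:
            crashes.append(bytes(data))
    return crashes
```

Pairing a harness like this with the incremental rollout Brumley describes gives two independent chances to catch a bad update: one before release, one before full deployment.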

16. “I would particularly encourage companies to look much deeper into their current reliance on cloud providers and mitigate the concentration risk, both internally within the company and across the world, to minimize the impact on material services should a major IT disaster occur,” said Alina Timofeeva.

17. “The only one that comes to mind is deploying future updates from CrowdStrike on an incremental basis, meaning the next update would be rolled out to the first 1% of devices in the company, then to 5% a few days later, then we wait a week, and so on,” shared Victor Zyamzin. “And we recommend rolling out updates this way not only from CrowdStrike but from every vendor that has an impact on your infrastructure.”
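The staged rollout Zyamzin describes can be expressed as a simple wave schedule: each wave ships to a larger slice of the fleet only after the previous wave has soaked without incident. A minimal sketch, with the percentages and soak times as illustrative values echoing the 1%-then-5% example in the quote:

```python
from dataclasses import dataclass

@dataclass
class Wave:
    percent: float    # cumulative share of the fleet on the new version
    soak_hours: int   # time to watch for failures before the next wave

# Illustrative schedule echoing the quote: 1%, then 5%, then the rest.
DEFAULT_WAVES = [Wave(1, 24), Wave(5, 72), Wave(25, 72), Wave(100, 0)]

def rollout_plan(fleet_size: int, waves=DEFAULT_WAVES):
    """Yield (devices_to_update, soak_hours) per wave.

    Percentages are cumulative targets, so each wave only updates
    the delta over what earlier waves already covered.
    """
    done = 0
    for wave in waves:
        target = round(fleet_size * wave.percent / 100)
        yield max(target - done, 0), wave.soak_hours
        done = max(done, target)
```

For a 1,000-device fleet this plan updates 10 machines first, then 40, then 200, then the remaining 750, so a faulty update caught during a soak period strands a small fraction of devices instead of the whole fleet.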


18. “The concept of EDR, which relies on frequent policy updates based on continuously discovered new attack patterns, is fundamentally flawed: It requires frequent software updates, which may contain bugs that risk the system’s business continuity,” David Barzilai, co-founder of Karamba Security, shared. “Mission-critical applications that run as closed systems (such as airport and hospital servers, vehicle systems, medical devices, printers, etc.) should be hardened to ensure they only run authorized programs and deterministically prevent any foreign code from executing. This approach eliminates the need for frequent policy updates and proactively blocks any attempts to run foreign code, such as malware.”

19. “Replacing an EDR or XDR vendor could take quarters, if not years. Plus, you may change vendors, but who can guarantee that the new partner will not fail too?” said Victor Zyamzin. “This situation will push more companies to switch from the endpoint security approach to a zero trust security approach. They might, in fact, transition to actual cloud-based security solutions, meaning they will put less trust in the endpoints and more in cloud security. If 20% of companies did that, it would be a fantastic win for our industry. But I believe only 5-15% will actually go for that.”

On regulation

20. “I think the more reasonable question would be: what could regulators not do? The thing is that every company develops a risk model, which helps it choose what kind of protection to install and invest in. Companies are satisfied with their protection. But then the regulator comes and says: ‘We did some research and discovered that only 60 percent of companies use antivirus. Therefore, we have decided that installation of antivirus will be mandatory from now on.’

They do not check whether all businesses actually need that antivirus. As a result, we see a situation where companies are forced to buy CrowdStrike or other solutions, chosen on price alone, just to satisfy regulators. I believe that somewhere from 50% to 90% of the companies affected today wouldn’t have been affected at all if they had not installed CrowdStrike or other EDR and XDR software products in the first place purely for compliance reasons,” said Victor Zyamzin.
