- Individuals and businesses using Windows systems around the world have been hit by a global outage.
- Many people have faced system crashes, the Blue Screen of Death (BSOD) and boot loops.
- The global outage is being blamed on a faulty CrowdStrike update.
A faulty CrowdStrike update has caused a massive global outage on Microsoft Windows systems. Thousands of people and businesses have experienced system crashes and the dreaded Blue Screen of Death (BSOD). Many Windows computers have also been stuck in a boot loop, in which the machine restarts repeatedly without ever finishing startup. The issue is affecting Windows servers and workstations and has even taken entire businesses offline. Microsoft Azure and Microsoft 365 services are also reportedly experiencing disruptions.
The cause of the failure
CrowdStrike is a leading cybersecurity company, and many businesses around the world rely on the company’s software to protect their Windows servers and PCs from cyber threats.
The outage on Windows systems is linked to a software update from the cybersecurity provider. Microsoft, for its part, said the preliminary root cause of its service disruption was a “configuration change” in some Azure backend workloads that interrupted connectivity between compute and storage resources. The resulting connectivity failures affected downstream Microsoft 365 services that depend on those connections.
CrowdStrike CEO George Kurtz said the outage was caused by a “flaw” in a content update for Windows hosts. Kurtz also ruled out cyberattacks and said Linux and Mac hosts were not affected.
A post on CrowdStrike’s support forums acknowledged the issue, saying the company had received reports of crashes related to a content update for Falcon Sensor, its cloud-based security service.
In a post on X on July 19, 2024, Kurtz said the company was rolling out a fix: “CrowdStrike is actively working with customers affected by a flaw found in a single content update for Windows hosts. Mac and Linux hosts are not affected.”
“This is not a security incident or cyberattack. The issue has been identified, isolated and a fix has been deployed,” Kurtz added.
Who was affected?
Thousands of businesses across all sectors and industries were affected by the outage, including:
- Emergency services in Canada and many major US cities, including 911 emergency services in New York, Arizona and Alaska.
- The health hotline in Catalonia, Spain.
- Hospitals in the Netherlands, USA and Spain
- The UK National Health Service (NHS) clinical informatics system.
- Dutch television channel NOS reported that the problem had disrupted Schiphol Airport.
- Airports and airlines in Australia, Germany, Scotland, Spain, India, Netherlands and the United Kingdom.
- Several media and television channels, including ABC and Sky News, suffered disruptions.
- The London Stock Exchange in the UK reported disruptions. Several banks in Australia and New Zealand also reported being affected by the outage.
CrowdStrike offers a workaround
While the company rolls out its fix, it has published a workaround for affected hosts. The steps are to be carried out in order:
- Boot Windows into Safe Mode or the Windows Recovery Environment.
- Navigate to the C:\Windows\System32\drivers\CrowdStrike directory.
- Locate the file matching “C-00000291*.sys” and delete it.
- Start the host normally.
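For administrators who prefer to script the file-removal step, the following is a minimal sketch of it in Python, not an official CrowdStrike tool. It assumes a Python interpreter is available on the host, which is often not the case in Safe Mode or the Windows Recovery Environment, where the steps are normally done by hand. The function name `remove_matching_sys_files` and the `dry_run` flag are hypothetical names introduced for illustration. Only the directory and the file pattern come from the workaround above.

```python
from pathlib import Path

# File pattern quoted in the workaround above.
FAULTY_FILE_PATTERN = "C-00000291*.sys"


def remove_matching_sys_files(driver_dir: Path, dry_run: bool = True) -> list[Path]:
    """List files in driver_dir that match the pattern; delete them unless dry_run.

    Hypothetical helper for illustration only. Returns the matched paths so the
    caller can log exactly what was (or would have been) removed.
    """
    matches = sorted(p for p in driver_dir.glob(FAULTY_FILE_PATTERN) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Directory named in the workaround; adjust if Windows lives on another drive.
    target = Path(r"C:\Windows\System32\drivers\CrowdStrike")
    # Default to a dry run so nothing is deleted until the output is reviewed.
    for found in remove_matching_sys_files(target, dry_run=True):
        print(f"would delete: {found}")
```

Defaulting to a dry run is a deliberate choice here: the matched paths can be reviewed first, and the deletion only happens once the call is repeated with `dry_run=False`.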
Industry leaders and experts share their perspectives
As soon as the global outage hit the headlines, several industry experts shared their perspectives. Here are a few.
1. Jake Moore, Global Security Advisor at ESET
“These outages are increasing in volume due to the increase in users and online traffic. After witnessing the Blue Screen of Death (BSOD), many people are quick to suspect a cyberattack or see similarities to Netflix’s Leave The World Behind, but this can often add to the confusion. This highlights the importance of these services and the millions of people they serve.
Companies should test their infrastructure and implement multiple security measures, regardless of their size. This is commonly referred to as a cyber resilience plan. However, as is often the case, it is simply impossible to simulate the size and scope of the problem in a secure environment without testing the actual network.
The inconvenience caused by the loss of access to services for thousands of people reminds us of our dependence on technology giants, like Microsoft, to manage our daily lives and businesses. Upgrades and maintenance of systems and networks can unintentionally include small errors, which can have serious consequences, as CrowdStrike customers experienced today.
Another aspect of this incident concerns “diversity” in the use of large-scale IT infrastructures. This applies to critical systems such as operating systems (OS), cybersecurity products, and other globally deployed applications. When diversity is low, a single technical incident, let alone a security issue, can lead to global outages with subsequent repercussions.”
2. Rob Reeves, Principal Cybersecurity Engineer at Immersive Labs
“It is still too early to determine how such an error occurred and whether it was due to a code error in the driver or an unforeseen and undocumented change in the Windows operating system that CrowdStrike could not foresee. However, it is clear that the heavy reliance on Falcon has become a double-edged sword and is causing incalculable disruption to business operations around the world.
The severity of this incident is a stark wake-up call, highlighting the critical need for rigorous and reliable testing of EDR and ELAM drivers in cybersecurity systems. Now more than ever, it is crucial to re-evaluate and revise current testing procedures, quickly identifying and addressing any issues that arise.
This raises the question of whether security product updates should be applied automatically across the board for up-to-date protection, or whether customers should retain control of the update process, ensuring thorough testing before implementation.”
3. Aleksandr Yampolskiy, CEO of SecurityScorecard
“When I worked at Goldman Sachs, the policy was to get tools from multiple vendors. That way, if a firewall goes down at one vendor, you have another vendor that may be more resilient.
Today’s global outage reminds us of the fragility and systemic risk of “n-tier” concentration of the technologies that govern daily life: airlines, banks, telecommunications, stock exchanges, etc. SecurityScorecard, in collaboration with McKinsey, produced research showing that 62% of the global external attack surface is concentrated in the products and services of just 15 companies.
An outage is just another form of security incident. In these situations, antifragility is about not putting all your eggs in one basket. You need to have diverse systems, know where your single points of failure are, and proactively stress test them through tabletop exercises and outage simulations. Think of the concept of a “chaos monkey,” where you deliberately break your systems, such as shutting down your database or making your firewall malfunction, to see how your computers react.
Whether caused by a malicious DDoS attack or a faulty patch update, an outage has the same end result: users are denied access to critical systems.
This disruption creates fertile ground for exploitation as attackers prey on the vulnerability of users looking for solutions. The timing of this event and its public nature are exactly what attackers are looking for to design targeted attacks. Threat actors can use social engineering tactics to disguise malware as legitimate recovery tools to gain unauthorized access to systems. Vigilance is paramount as organizations must not only cope with the outage, but also strengthen their defenses against opportunistic attacks that exploit the chaos.”
4. Carlos Aguilar Melchor, Chief Scientist, Cybersecurity at SandboxAQ
“Having visibility into your software supply chain is critical, especially into critical practices like cybersecurity, cryptography management, and of course, testing and patching. With this historic outage, along with other recent catastrophic events in the software supply chain like SolarWinds and Log4j, we cannot blindly accept software updates or blindly trust cybersecurity or cryptography practices. Every enterprise must immediately implement observability into their software systems to monitor these high-impact platforms and prevent these disasters.”
5. Graham Steel, Cybersecurity Product Manager at SandboxAQ
“This major outage was caused by a bug that CrowdStrike failed to detect before deploying an update to thousands of organizations worldwide. This latest outage should prompt all organizations to implement systems that scan each update before it is allowed into their organization. Recent consolidation in the cybersecurity market has increased the risk of this issue occurring again, with organizations relying on just a few vendors.”
For the latest service updates from providers, visit the Microsoft Azure status page and CrowdStrike’s “Falcon Content Update Statement for Windows Hosts.”
This article will be updated as more information becomes available.