CrowdStrike and Microsoft: What we know about global IT outage

Swetha Sundaram
Jul 29, 2024

Overview of the Incident

The CrowdStrike outage on Microsoft Windows was a major event that began early on Friday, July 19, 2024. The trouble started with a software update from CrowdStrike for its Falcon sensor security software running on Microsoft Windows. The update caused widespread “blue screens of death,” the infamous Windows crash screens.

Details of the Affected Updates

CrowdStrike’s update was meant to improve the Falcon sensor’s ability to detect new cyber threats. Instead, the routine sensor configuration update triggered a logic error in the sensor. It rolled out just after midnight Eastern Time on Friday (04:09 UTC) and crashed the Windows systems that received it.

Immediate Impacts Detected

The effects were severe and widespread, hitting various sectors globally. Critical services like air travel faced massive disruptions, with thousands of flights canceled and delays piling up. The healthcare sector was also hit hard, with some surgeries postponed and emergency services experiencing outages. This incident highlighted how essential cybersecurity software is to our modern digital infrastructure.

The Microsoft CrowdStrike outage had a far-reaching impact, affecting multiple sectors and regions. Here’s a closer look:

Affected Sectors (airlines, healthcare, financial services)

The airline industry was hit particularly hard, with over 4,295 flights canceled globally, causing chaos at airports. Healthcare systems like Mass General Brigham and Emory Healthcare had to postpone services and revert to manual systems. Financial services also suffered, with disruptions in payment systems and customer access at banks worldwide.

Geographical Spread of the Outages

This wasn’t just a local issue: it affected services across the U.S., Canada, the UK, Europe, and Asia. Major U.S. cities saw disruptions in healthcare and public transportation, while the UK’s National Health Service faced setbacks in managing patient records and appointments.

Operational Consequences on Businesses

Businesses worldwide faced operational hurdles. Amazon warehouse employees struggled with schedule management, and Starbucks temporarily closed stores due to mobile ordering issues. Big corporations like FedEx and UPS reported substantial disruptions affecting logistics and deliveries. This outage underscored how crucial stable and secure IT infrastructures are for modern businesses.

Responses from CrowdStrike and Microsoft

Statements from CrowdStrike and Microsoft Executives

CrowdStrike’s CEO apologized for the disruption and assured customers that the company had identified and fixed the issue and was focused on restoring customer systems. Microsoft deployed experts to work with affected customers and collaborated with other cloud providers to mitigate the impact.

Technical Steps Taken to Resolve the Issue

CrowdStrike pinpointed the problematic update and reverted changes to stabilize systems. Microsoft provided manual remediation documentation and scripts and updated the Azure Status Dashboard to keep customers informed. Both companies mobilized full resources to address the issue quickly.

Customer Communication and Support Efforts

CrowdStrike used its support portal, blog, and other official channels to keep customers updated and to publish specific remediation steps and guidelines. Microsoft shared updates and solutions through its own official platforms to ensure widespread awareness and swift resolution.

Challenges and Recovery Efforts

Technical Challenges in the Recovery Process

Recovery was slow because many devices had to be remediated manually, one machine at a time. A critical gap was the lack of a phased rollout, which would normally limit a faulty update to a small group of machines before it reaches the whole fleet. Companies deployed hundreds of engineers to work directly with affected systems and used specific recovery tools to restore PCs.
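To make the idea of a phased rollout concrete, here is a minimal sketch in Python. It is not how CrowdStrike or any other vendor ships updates. The function `phased_rollout`, the `apply_update` callback, the ring names, and the 5% failure threshold are all assumptions made up for illustration. The point is only that an update goes to a small "canary" ring first, and the rollout halts if too many hosts in a ring come back unhealthy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class RolloutResult:
    completed: bool                 # True if every ring received the update
    updated: List[str]              # hosts the update was pushed to
    halted_at_ring: Optional[int]   # index of the ring that tripped the threshold


def phased_rollout(
    rings: Sequence[Sequence[str]],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.05,
) -> RolloutResult:
    """Push an update ring by ring and halt if a ring fails too often.

    `apply_update` is a hypothetical callback: it returns True if the host
    is healthy after the update and False if it is not.
    """
    updated: List[str] = []
    for index, ring in enumerate(rings):
        failures = 0
        for host in ring:
            healthy = apply_update(host)
            updated.append(host)
            if not healthy:
                failures += 1
        if ring and failures / len(ring) > max_failure_rate:
            # Stop here so the later, larger rings never get the bad update.
            return RolloutResult(False, updated, index)
    return RolloutResult(True, updated, None)


# Hypothetical fleet: two canaries, three early adopters, then everyone else.
rings = [
    ["canary-1", "canary-2"],
    ["early-1", "early-2", "early-3"],
    [f"fleet-{i}" for i in range(100)],
]

# Simulate a bad update that breaks every host it touches.
result = phased_rollout(rings, apply_update=lambda host: False)
print(result.halted_at_ring, len(result.updated))  # prints: 0 2
```

In this simulation the bad update reaches only the two canary hosts instead of all 105 machines, which is the difference between a small incident and a fleet-wide manual recovery.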

Cloud vs. On-Premises Remediation

Fixing affected machines in cloud environments like AWS, Azure, and GCP brought different challenges than traditional on-premises systems. Administrators usually can’t sit at a cloud VM’s console and boot it into “safe mode,” so they had to use more involved procedures, such as detaching the affected disk, attaching it to a healthy repair VM, removing the faulty file there, and reattaching the disk.
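As a rough illustration of the repair-VM approach, the Python sketch below looks for the faulty channel file on a disk that has already been attached to a healthy machine. The directory (`Windows\System32\drivers\CrowdStrike`) and the file pattern (`C-00000291*.sys`) come from CrowdStrike’s public remediation guidance, not from this article. The function name, the `mount_root` parameter, and the demo file names are assumptions for illustration. It defaults to a dry run and is a sketch, not an official tool; take a snapshot of the disk before deleting anything.

```python
import tempfile
from pathlib import Path
from typing import List

# Directory and pattern as given in CrowdStrike's public remediation guidance.
CROWDSTRIKE_SUBDIR = Path("Windows") / "System32" / "drivers" / "CrowdStrike"
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def remove_faulty_channel_files(mount_root: Path, dry_run: bool = True) -> List[Path]:
    """Find (and optionally delete) the faulty channel files on a mounted disk.

    `mount_root` is where the affected OS disk is mounted on the repair
    machine (a hypothetical example: Path("F:/")). With dry_run=True the
    function only reports what it would delete.
    """
    driver_dir = mount_root / CROWDSTRIKE_SUBDIR
    if not driver_dir.is_dir():
        return []
    matches = sorted(driver_dir.glob(CHANNEL_FILE_PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


# Demo against a throwaway directory that mimics a mounted Windows disk.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    driver_dir = root / CROWDSTRIKE_SUBDIR
    driver_dir.mkdir(parents=True)
    (driver_dir / "C-00000291-demo.sys").write_bytes(b"bad")    # made-up name
    (driver_dir / "C-00000290-demo.sys").write_bytes(b"fine")   # must survive

    found = [p.name for p in remove_faulty_channel_files(root)]  # dry run
    still_there_after_dry_run = (driver_dir / "C-00000291-demo.sys").exists()

    remove_faulty_channel_files(root, dry_run=False)              # real delete
    remaining = sorted(p.name for p in driver_dir.iterdir())

print(found, remaining)
```

The dry-run default is deliberate: on a production disk you want to see exactly which files match before anything is removed.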

The Role of BitLocker in Recovery

BitLocker, Microsoft’s disk encryption technology, played a dual role. While it provided essential security, it also complicated recovery: on an encrypted machine, administrators needed the BitLocker Recovery Key to unlock the disk before they could reach and fix the affected files.

Learning from the CrowdStrike Outage: Enhancing Disaster Recovery Plans

The recent CrowdStrike outage teaches an important lesson for all organizations: the need for a solid disaster recovery (DR) strategy. This incident reminded us that in today’s digital world, no system is immune to disruptions. Whether it’s due to cyberattacks, technical issues, or natural disasters, having an effective DR plan is crucial for maintaining business continuity and minimizing downtime.

Here are a few key takeaways for bolstering your disaster recovery plans:

  • Practice Regular DR Drills and Review Plans Continuously: Run simulations of possible outage scenarios to test your response strategies and find weaknesses, and regularly review your DR plans to adjust to new threats.
  • Back Up Essential Data: Regularly back up all crucial data and store it in multiple locations.
  • Have a Failover and Failback Plan: Decide how you will fail over to a standby environment during an outage, and how you will fail back to your production environment once it is restored.
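The backup takeaway can be sketched in a few lines of Python. This is a simplified illustration, not a full backup tool: the function names and the demo directories are assumptions, and a real setup would typically use dedicated backup software and off-site or cloud storage. The sketch copies a file to several locations and verifies each copy with a SHA-256 checksum, since a backup you have not verified may not restore when you need it.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_to_locations(source: Path, destinations: Iterable[Path]) -> List[Path]:
    """Copy `source` into every destination directory and verify each copy."""
    expected = sha256_of(source)
    copies: List[Path] = []
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / source.name
        shutil.copy2(source, target)
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies


# Demo with throwaway directories standing in for separate backup locations.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    source_file = base / "customers.db"            # hypothetical data file
    source_file.write_bytes(b"important records")

    copies = backup_to_locations(
        source_file,
        [base / "onsite-backup", base / "offsite-backup"],
    )
    copy_count = len(copies)
    all_match = all(sha256_of(c) == sha256_of(source_file) for c in copies)

print(copy_count, all_match)  # prints: 2 True
```

Running a check like this as part of a regular DR drill covers two of the takeaways at once: the data exists in more than one place, and you have confirmed the copies are intact.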

Stay Vigilant: Scammers Exploit Chaos During Outages

The outage also shone a light on another big problem: opportunistic scammers. While CrowdStrike and its customers were handling the chaos, scammers swooped in to take advantage of the situation, making things even more complicated for businesses. This really drives home the point that we need not only a solid DR plan but also strong cybersecurity measures to protect against these kinds of threats when we’re most vulnerable.

Swetha Sundaram

A Security Analyst specializing in advanced threat analysis, protecting clients from digital threats. I have a passion for technology, fitness, and cooking. :)