CrowdStrike meets Murphy's Law: Anything that can go wrong will

And boy, did last Friday's Windows fiasco ever prove that yet again


Opinion CrowdStrike's recent Windows debacle will surely earn a prominent place in the annals of epic tech failures. On July 19, the cybersecurity giant accomplished what legions of hackers could only dream of – bringing millions of Windows systems worldwide to their knees with a single botched update.

As a veteran tech journalist, I've seen my fair share of software snafus. Heck, I went hand-to-hand with the grandpa of all network blow-ups – the Morris Worm – in 1988 when I was a sysadmin. Even so, I can't help but marvel at the sheer scale and impact of this blunder. CrowdStrike, a company valued at over $70 billion and trusted by countless organizations to protect their digital assets, inadvertently became the source of one of the largest IT outages in history.

The fallout from this debacle was staggering – thousands of flights canceled, healthcare services disrupted, and 911 systems knocked offline. It's a stark reminder of how deeply intertwined our digital infrastructure has become and how vulnerable it can be to a single point of failure.

Let's break down the cascade of errors that led to this fiasco.

In the beginning, Microsoft let CrowdStrike's Falcon security software run in kernel mode (ring 0), the most privileged level of Windows. Any problem at this low level will likely cause a Blue Screen of Death (BSOD). Meanwhile, Microsoft reportedly wants to blame the European Commission – no, really – for requiring it to grant third-party software vendors this level of access.

You know, I think that with all of Microsoft's developers and lawyers, the company could come up with a better, legal way to avoid this kind of foul-up while still letting software companies compete on equal terms. It's not rocket science.

Microsoft doesn't want any of the blame, but it deserves some of it. For far too long, we've placed too many vital IT eggs in the Windows basket. When that basket falls, so does much of the economy.

Returning to CrowdStrike, the company claims a "logic error" in a routine sensor configuration update caused the meltdown. But for a company of CrowdStrike's caliber, such a fundamental mistake is inexcusable. This wasn't some obscure edge case – it was a critical failure in its core functionality.

It wasn't even a code problem. This wasn't a software update per se. The villain of this piece was a Falcon configuration file called a channel file. One simple file, which should have contained nothing more than data to update a security setting, ended up triggering a cascade of one BSOD after another.

How did such a catastrophic bug pass quality assurance? CrowdStrike admitted: "Due to a bug in the Content Validator, one of the two Template Instances passed validation despite containing problematic content data [and] were deployed into production." When your software has deep hooks into millions of Windows systems, your testing should be bulletproof. Clearly, CrowdStrike's testing protocols need a massive overhaul.

We also now know, as security expert Kevin Beaumont pointed out on Mastodon: "The key takeaway – channel updates are currently deployed globally, instantly." I always send major patches to all my customers simultaneously and wait to see what happens next. Doesn't everyone? Who are these people, and why does anyone let them do security work?

There's a simple concept called canary testing. You may have heard of it. Just as miners once carried a canary into a coal mine to check whether the air was safe before everyone went in, you first try a new program on a small group of users and then, if all's well, roll it out to everyone else.
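For the curious, here's roughly what that looks like in practice. This is a minimal Python sketch, not CrowdStrike's (or anyone's) actual pipeline: the function name `staged_rollout`, the `deploy` and `is_healthy` callbacks, and the 1%/10%/50%/100% stage sizes are all hypothetical. It pushes an update to progressively larger slices of a fleet and stops the moment any host in the current slice reports trouble.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stages: Sequence[float] = (0.01, 0.10, 0.50, 1.0),
) -> tuple[list[str], bool]:
    """Hypothetical canary rollout: update hosts in growing batches.

    Returns (hosts that received the update, whether the rollout completed).
    """
    updated: list[str] = []
    for fraction in stages:
        # Cumulative number of hosts that should have the update after this stage.
        target = max(1, int(len(hosts) * fraction))
        batch = list(hosts[len(updated):target])
        for host in batch:
            deploy(host)
        updated.extend(batch)
        # The canary check: if any freshly updated host is unhealthy, stop here
        # so the rest of the fleet never receives the bad update.
        if not all(is_healthy(host) for host in batch):
            return updated, False
    return updated, True


if __name__ == "__main__":
    fleet = [f"host{i}" for i in range(1000)]
    # Simulate a bad update: every host that receives it "crashes".
    done, ok = staged_rollout(fleet, deploy=lambda h: None, is_healthy=lambda h: False)
    print(len(done), ok)  # 10 False
```

With 1,000 hosts and an update that knocks every machine over, this rollout halts after the first 10 hosts instead of taking down all 1,000. A real system would also wait between stages, since some failures, such as a crash that only shows up on reboot, don't appear instantly.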

Let's not forget that CrowdStrike's initial response was slow and inadequate. Users were left scrambling for answers while critical infrastructure faltered. Even today, almost a week later, I still have friends struggling with their Delta flights.

This serves as a sobering wake-up call for the rest of us in the tech industry. As we rush to secure our systems against external threats, we must not overlook the potential for self-inflicted wounds. Rigorous testing, fail-safe mechanisms, and a healthy dose of humility are essential when dealing with critical systems.

In the end, CrowdStrike's Windows fiasco is a textbook example of Murphy's Law in action – anything that can go wrong will go wrong. It's a painful lesson but one that we would all do well to learn from. After all, in cybersecurity, your next big threat might just be an update away. ®