How regex was responsible for the CrowdStrike outage
Do you know one of the biggest nightmares for anyone who relies on computers? The computer not working correctly. The cause could be anything, but in July 2024, many Windows users saw the Blue Screen of Death on their machines, and chaos followed. Flights were delayed, bank systems went down, and some hospitals could not treat patients.
This blog is written by Akshat Virmani at KushoAI. We're building the fastest way to test your APIs. It's completely free and you can sign up here.
The cause was an update pushed by CrowdStrike, a leading cybersecurity firm. The crash has been attributed to failures in regex handling, and this isn't the first time regex has been behind an outage of this kind. Let's take a look at a few of them:
CrowdStrike
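The incident has been attributed to a poor implementation of regular expression matching. According to CrowdStrike's root cause analysis, the template type behind the update defined 21 input fields, but the sensor code supplied only 20 input values to the Content Interpreter. Earlier channel files used wildcard matching for the 21st field, so the missing value was never read. When the new channel file introduced a non-wildcard matching criterion for the 21st field, the Content Interpreter tried to read a 21st value that did not exist and went beyond the end of the input array. The resulting out-of-bounds memory access crashed around 8.5 million Windows devices.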
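To make the mismatch concrete, here is a minimal Python sketch. It is not CrowdStrike's code: the names `WILDCARD`, `unchecked_match`, and `checked_match` are invented for illustration, and Python raises an `IndexError` where the real sensor performed an out-of-bounds memory read. It shows how a wildcard in the last field can hide a missing input until a non-wildcard criterion arrives, and how an up-front length check would reject the mismatch instead.

```python
WILDCARD = "*"  # hypothetical marker meaning "this field matches anything"


def unchecked_match(criteria, inputs):
    """Compare each non-wildcard criterion with the input at the same index."""
    for i, criterion in enumerate(criteria):
        if criterion == WILDCARD:
            continue  # wildcard fields never read inputs[i]
        if inputs[i] != criterion:  # IndexError when i >= len(inputs)
            return False
    return True


def checked_match(criteria, inputs):
    """Same comparison, but refuse to run when the field counts differ."""
    if len(criteria) != len(inputs):
        raise ValueError(
            f"template defines {len(criteria)} fields, "
            f"but {len(inputs)} inputs were supplied"
        )
    return unchecked_match(criteria, inputs)


# Illustrative data: 20 supplied inputs against 21-field templates.
inputs = [f"value{i}" for i in range(20)]
old_template = [WILDCARD] * 21                 # 21st field is a wildcard
new_template = [WILDCARD] * 20 + ["value20"]   # 21st field must match a value
```

With this data, `unchecked_match(old_template, inputs)` succeeds because the 21st field is never read, while `unchecked_match(new_template, inputs)` fails on the missing 21st input. `checked_match` rejects both templates before any comparison happens, which surfaces the mismatch early.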
The unit tests covered only the happy path, and no regression tests checked what happened when the number of supplied inputs did not match the number of fields defined. In particular, no test exercised a non-wildcard matching criterion in the 21st field. On top of that, the update was distributed to all customers at once, without a staggered rollout.
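A staggered rollout can be as simple as assigning each machine to a stable bucket and raising the enabled percentage in stages. The sketch below shows one common way to do that, assuming a hypothetical string `host_id` per machine. It does not describe CrowdStrike's deployment system, and the function names are made up for this example.

```python
import hashlib


def rollout_bucket(host_id: str) -> int:
    """Map a host to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def should_receive(host_id: str, percent_enabled: int) -> bool:
    """Return True if this host is inside the currently enabled percentage."""
    if not 0 <= percent_enabled <= 100:
        raise ValueError("percent_enabled must be between 0 and 100")
    return rollout_bucket(host_id) < percent_enabled
```

Because the bucket depends only on the host ID, a machine that receives the update at 10% still receives it at 50%, so the enabled percentage can be raised step by step while crash reports from the earlier stage are watched.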
Stack Overflow
Stack Overflow also experienced a 34-minute outage, in July 2016, caused by a regular expression that processed user input. The regex got stuck in catastrophic backtracking on a post containing a very long run of whitespace, which consumed a large amount of CPU. This created a bottleneck: the regex evaluation tied up more and more of the system’s resources, leaving too little to serve regular requests.
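Here is a small Python sketch of this kind of backtracking. The pattern `\s+$` is a simplified stand-in for a "trim trailing whitespace" regex, not necessarily the exact pattern that ran in production, and the helper names are invented for this example. When a long run of whitespace is followed by a non-whitespace character, a backtracking engine retries the match from every position in the run, so the work grows roughly quadratically with the length of the run.

```python
import re
import time

# Simplified stand-in pattern: one or more whitespace characters at the end.
TRAILING_WS = re.compile(r"\s+$")


def strip_trailing_regex(line: str) -> str:
    """Remove trailing whitespace with the backtracking-prone regex."""
    return TRAILING_WS.sub("", line)


def strip_trailing_linear(line: str) -> str:
    """Remove trailing whitespace with a single linear scan."""
    return line.rstrip()


def time_call(func, arg) -> float:
    """Return the wall-clock seconds taken by one call to func(arg)."""
    start = time.perf_counter()
    func(arg)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000):
        # The whitespace run is NOT at the end, so every attempt fails late.
        hostile = "x" + " " * n + "y"
        slow = time_call(strip_trailing_regex, hostile)
        fast = time_call(strip_trailing_linear, hostile)
        print(f"n={n}: regex {slow:.4f}s, rstrip {fast:.6f}s")
```

Running the script with growing values of `n` should show the regex timing climbing much faster than the `rstrip` timing, although the exact numbers depend on the machine and Python version.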
Cloudflare
Another regex failure came in July 2019, when a poorly optimised regular expression in a Cloudflare firewall rule proved prone to severe backtracking. It drove CPU usage to 100% on Cloudflare’s machines worldwide. The sudden surge overwhelmed Cloudflare’s systems and led to a global outage that lasted around 30 minutes. After identifying the issue, Cloudflare disabled the faulty rule, restored service, and put safeguards in place to prevent similar problems.
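The sketch below illustrates the general shape of the problem in Python. The fragment `.*.*=.*` is a simplified example of adjacent unbounded wildcards, of the kind discussed in public write-ups of the incident, and the rewrite shown is only one possible alternative, not Cloudflare's actual fix. Both patterns are used with `match`, so they are anchored at the start of the text, and the names `SLOW`, `FAST`, and the helpers are invented for this example.

```python
import re
import time

# Two adjacent unbounded wildcards: on input with no "=", the engine tries
# every way of splitting the text between them before giving up.
SLOW = re.compile(r".*.*=.*")

# One possible rewrite: consume everything that is not "=" (or a newline,
# which "." would not match either), then require a literal "=".
FAST = re.compile(r"[^=\n]*=")


def has_assignment_slow(text: str) -> bool:
    return SLOW.match(text) is not None


def has_assignment_fast(text: str) -> bool:
    return FAST.match(text) is not None


def time_call(func, arg) -> float:
    """Return the wall-clock seconds taken by one call to func(arg)."""
    start = time.perf_counter()
    func(arg)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (500, 1_000, 2_000):
        no_equals = "x" * n  # worst case: the "=" never appears
        slow = time_call(has_assignment_slow, no_equals)
        fast = time_call(has_assignment_fast, no_equals)
        print(f"n={n}: slow {slow:.4f}s, fast {fast:.6f}s")
```

The two helpers agree on whether the first line of the text contains an `=`, but the rewritten pattern leaves the engine only one way to consume the text, so a failed match costs roughly one pass over the input instead of a search through every split.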
To Conclude
Regex is a powerful tool, but it can also do serious harm, as the incidents above show. A poorly optimised regex can trigger catastrophic backtracking, which results in severe system slowdowns or outages. Despite its downsides, regex remains invaluable when used thoughtfully and carefully, allowing for complex data operations with minimal code.