
What the Cloudflare Outage Exposed

On November 18, 2025, the internet took a big hit. Cloudflare – the infrastructure company behind performance, security, and routing for roughly 20 percent of all websites – suffered a global outage that disrupted thousands of sites and applications. Services ranging from ChatGPT and X to major gaming platforms, e-commerce sites, and SaaS dashboards experienced errors, failed page loads, and interrupted transactions.

An Internal Configuration Error

According to Cloudflare’s own post-mortem, the events began at 11:20 UTC:

  1. A permissions change in one of its internal database systems (ClickHouse) caused an automated query to produce duplicate entries in a feature file used by its Bot Management system.
  2. That file then doubled in size, surpassing the limit built into the software that reads it across Cloudflare’s network.
  3. Once the file propagated to all machines in the network, traffic‑handling processes in the core proxy began failing (a simplified sketch of this failure mode appears after this list).
  4. Error rates (5xx HTTP status codes) spiked.
  5. Impact began to be felt at 11:28 UTC; core traffic was largely back to normal by 14:30 UTC, and all systems were fully restored by 17:06 UTC.
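
To make the failure mode concrete, here is a minimal, purely illustrative sketch in Rust – not Cloudflare's actual code – assuming a hypothetical loader that enforces a hard cap on feature entries. A duplicated file pushes the count past that cap, and the consuming process crashes rather than falling back gracefully. The names, the limit value, and the file contents are all assumptions for illustration only.

```rust
// Illustrative only: a config consumer with a hard cap on feature entries.
// When an upstream bug doubles the file, the consumer crashes instead of
// degrading gracefully – the general shape of a cascading outage.

const MAX_FEATURES: usize = 200; // hypothetical preallocated limit

struct FeatureFile {
    features: Vec<String>,
}

fn load_feature_file(lines: &[String]) -> Result<FeatureFile, String> {
    if lines.len() > MAX_FEATURES {
        // The oversized file is rejected by the consumer.
        return Err(format!(
            "feature count {} exceeds hard limit {}",
            lines.len(),
            MAX_FEATURES
        ));
    }
    Ok(FeatureFile { features: lines.to_vec() })
}

fn main() {
    // Normally the file holds well under the limit.
    let base: Vec<String> = (0..150).map(|i| format!("feature_{i}")).collect();

    // A bad upstream query duplicates every entry, doubling the file size.
    let duplicated: Vec<String> = base.iter().chain(base.iter()).cloned().collect();

    // Unwrapping the error turns a bad config file into a crash of the
    // traffic-handling process rather than a graceful fallback.
    let file = load_feature_file(&duplicated).unwrap(); // panics here
    println!("loaded {} features", file.features.len());
}
```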

Cloudflare initially suspected a hyperscale DDoS attack (given the symptoms and traffic behavior), and early coverage echoed that possibility.

The Dependency Risk Exposed

Because Cloudflare supports a vast number of websites, from enterprises to small businesses, the impact was wide-ranging. Major platforms including ChatGPT, X, Spotify, Canva, League of Legends and NJ Transit reported service degradation or downtime. The disruption reached across multiple geographies and industries, illustrating how a single provider’s failure can ripple through the ecosystem.

The incident also highlighted something less obvious: even services not under attack can be brought down when the infrastructure they rely on fails. It exposed a dependency risk that many organizations may have ignored until now.

Key Lessons

Some of the lessons learned from this particular event include:

  • Infrastructure-failure risk is real. Not every outage is caused by bad actors – but the impact can be identical.
  • Dependencies matter. If your upstream provider serves much of the internet and it falters, you may be collateral damage.
  • Testing matters. Had the flaw in Cloudflare’s system been caught under realistic load or configuration‑failure scenarios, the cascading failure might have been prevented.

Why This Matters for DDoS Readiness

Although the root cause was internal, the failure mimicked what a DDoS attack could do: overwhelm routing, saturate infrastructure, trigger error cascades and take services offline. For organizations preparing for real attacks, that means one thing: defenses must work not only in ideal conditions but also when unexpected load or misconfigurations occur.

According to MazeBolt’s landmark survey of 300 CISOs and senior security leaders, 86 percent of organizations test their DDoS defenses once a year or less. That long interval leaves large windows of exposure where DDoS misconfigurations and vulnerabilities can go unnoticed until they’re exploited or triggered by accident.

Detecting Vulnerabilities Before They Become Outages

That’s where RADAR™ by MazeBolt comes in. RADAR enables continuous, nondisruptive DDoS testing – running thousands of DDoS attack simulations in the live production environment without impacting services. By doing so, organizations can detect weaknesses and fix them long before they become outages.

MazeBolt’s survey found that while many enterprises had invested in DDoS protection, few had validated their actual resilience or tested it under realistic stress.

In the case of Cloudflare’s outage, a seemingly small configuration change produced cascading failure. That kind of scenario is exactly what RADAR is designed to uncover: the latent DDoS vulnerabilities that traditional, annual testing misses.

RADAR continuously validates defense readiness and provides audit‑ready reporting, so security and compliance leaders can prove ongoing risk reduction. With its continuous, nondisruptive DDoS testing and AI-enabled prioritization recommendations, RADAR enables enterprises to move from once-a-year validation to ongoing assurance – and dramatically reduce the risk of being the next big headline.

Want to learn more about preventing damaging DDoS downtime? Speak with an expert.

 

Skim Summary:

  • What happened: On Nov 18, 2025, a Cloudflare change triggered a global outage – causing widespread errors across customer sites and apps.
  • Root cause (Cloudflare’s account): A ClickHouse permissions change led to duplicate entries in a Bot Management feature file. The file doubled in size, exceeding a hard limit in the software that routes traffic across Cloudflare’s network, which then failed.
  • Timeline (high level): Issues began at 11:20 UTC, with impact felt from roughly 11:28; recovery started around 14:30 UTC, and full resolution came by 17:06 UTC.
  • Not a DDoS attack: Cloudflare initially suspected a hyperscale DDoS attack (given the symptoms and traffic behavior), and early coverage echoed that possibility. But the cause was internal.
  • Why impact was wide: Cloudflare underpins a large share of the web; when an upstream provider fails, dependent services degrade even if they aren’t under attack.
  • Key lessons: Infra failures can look like attacks; dependencies magnify blast radius; and continuous testing would likely surface fragility before it cascades.
  • MazeBolt angle: RADAR™ runs continuous, nondisruptive DDoS simulations on live environments to reveal vulnerabilities and misconfigurations before they cause outages. According to a MazeBolt survey, 86% of organizations test annually or less, leaving long windows of exposure.

FAQ:

1) Was this a DDoS attack?

No. Symptoms looked DDoS‑like, but Cloudflare attributed the outage to an internal configuration error.

2) What exactly failed inside Cloudflare?

A permissions change in ClickHouse produced duplicate entries in a Bot Management feature file. The oversized file triggered failures in traffic‑handling software, leading to widespread 5xx errors.

3) Why did it affect so many services?

Cloudflare sits in the path of a large share of the internet. When a core component faltered, dependent services across sectors saw degradation or downtime.

4) What is the key lesson for DDoS readiness?

Internal faults can mimic attack conditions. Defenses must be validated under stress and misconfiguration, not only in scheduled, ideal tests.

5) How does RADAR™ help prevent similar outcomes?

RADAR continuously and nondisruptively tests the live production environment, validating every DDoS defense layer, delivering audit‑ready reporting, and proving risk is being reduced over time.

6) How often should organizations test, and is it safe to test in production?

Continuously. Most organizations still test once a year or less, which leaves long exposure windows. RADAR runs thousands of tests on live traffic without disruption, so DDoS validation can be ongoing.
