
Cloud Chaos: How the October 2025 AWS Outage Shook the Internet

By William Do | October 20, 2025 | Posted in Cloud Technologies



On October 20, 2025, Amazon Web Services (AWS) experienced a significant global outage, disrupting a broad spectrum of online services and applications across the world. This incident brought to light the heavy reliance on cloud infrastructure providers and underscored the critical importance of resilience and disaster recovery strategies for cloud-dependent businesses and services. This article explores the details of the event, the affected services, root cause analyses, economic impacts, and the lessons engineers and organisations can learn from this outage.

Incident Overview

AWS, a backbone for many internet services, suffered a major service disruption centred on its US-East-1 region. This region hosts many of Amazon’s data centres and is a central hub for the cloud infrastructure behind a large number of websites and applications. The outage was first detected early on October 20 and quickly escalated into a widespread disruption lasting several hours, with intermittent recoveries followed by relapses before full stability returned.

Numerous high-profile platforms experienced outages or degraded performance during this period. These include popular social media apps such as Snapchat and Reddit, gaming services like Fortnite, Roblox, and PlayStation Network, streaming platforms including Disney+, and online payment services such as Venmo. Additionally, essential services from major US airlines like Delta and United, financial institutions, cryptocurrency exchanges such as Coinbase, and AI platforms like Perplexity were affected, reflecting the incident’s broad impact across multiple industries [1][2][3].

Root Cause and Official AWS Statement

The fundamental issue behind the outage was traced to a DNS (Domain Name System) failure within the AWS US-East-1 region. DNS acts as the internet’s address book, translating human-readable domain names into IP addresses that computers use to communicate. The outage specifically disrupted DNS resolution for critical AWS services, including the DynamoDB API endpoints.
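
To make the address-book analogy concrete, the sketch below performs the same kind of name-to-address lookup using Python's standard library. The helper name `resolve` and its injectable `lookup` parameter are illustrative assumptions, not part of any AWS tooling.

```python
import socket


def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses for hostname."""
    # getaddrinfo yields (family, type, proto, canonname, sockaddr) tuples;
    # sockaddr[0] is the IP address for both IPv4 and IPv6 results.
    results = lookup(hostname, 443, proto=socket.IPPROTO_TCP)
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

On a healthy network, `resolve("dynamodb.us-east-1.amazonaws.com")` returns a list of IP addresses for the regional DynamoDB endpoint. When resolution fails, `socket.getaddrinfo` raises `socket.gaierror`, and the caller never obtains an address to connect to.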

When AWS’s DNS servers failed to resolve these addresses correctly (possibly due to configuration or propagation errors), many connected services could not find the resources they needed, causing widespread outages even though the underlying servers were operational. In effect, clients lost the ability to locate the endpoints they needed to reach, leading to cascading service interruptions.
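
The difference between "the server is down" and "the server cannot be found" can be shown with a small diagnostic. This is a minimal sketch using only the standard library; the function name `classify_failure`, its return labels, and the injectable `lookup` and `connect` parameters are assumptions made for illustration.

```python
import socket


def classify_failure(host, port, lookup=socket.getaddrinfo,
                     connect=socket.create_connection, timeout=3.0):
    """Return 'ok', 'dns_failure', or 'connect_failure' for host:port."""
    try:
        results = lookup(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name could not be resolved: the server may be perfectly
        # healthy, but the client has no address to send traffic to.
        return "dns_failure"

    ip_address = results[0][4][0]
    try:
        conn = connect((ip_address, port), timeout=timeout)
    except OSError:
        # The name resolved, but the address itself did not accept a connection.
        return "connect_failure"

    conn.close()
    return "ok"
```

A result of `dns_failure` for an endpoint whose servers are known to be running corresponds to the situation described above: healthy infrastructure that clients cannot locate.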

AWS confirmed rapid mitigation of the DNS problem but acknowledged lingering effects due to request backlogs and propagation delays. The incident shows that DNS is a critical point of failure that can silently cause massive disruptions, underscoring the need for stronger DNS redundancy and monitoring in cloud platforms [1][4][6][7][8].
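
As one illustration of what client-side DNS monitoring might look like, the sketch below counts consecutive lookup failures for a hostname and signals once a threshold is reached. The class name `DnsProbe`, the default threshold of three, and the injectable `lookup` parameter are all hypothetical choices. A production monitor would also need scheduling, alert routing, and probes from several networks.

```python
import socket


class DnsProbe:
    """Tracks consecutive DNS lookup failures for one hostname."""

    def __init__(self, hostname, threshold=3, lookup=socket.getaddrinfo):
        self.hostname = hostname
        self.threshold = threshold
        self.lookup = lookup
        self.consecutive_failures = 0

    def check(self):
        """Run one lookup; return True once the alert threshold is reached."""
        try:
            self.lookup(self.hostname, 443)
            # Any successful resolution resets the failure streak.
            self.consecutive_failures = 0
        except socket.gaierror:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold
```

Requiring several consecutive failures before alerting is a deliberate trade-off: it avoids paging on a single transient lookup error at the cost of a slightly later signal.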

Economic and Service Impact

Estimating the exact economic damage from such cloud outages is challenging, but the breadth and duration of this event suggest significant disruptions and associated costs. Thousands of companies experienced service interruptions, which translated to loss of revenue, productivity, and customer trust.

The disruptions affected critical digital workflows: online gaming sessions were interrupted, streaming services faced buffering and downtime, financial transactions were delayed or failed, and e-commerce activities stalled. In sectors such as finance and airlines, operational downtime can lead to cascading effects including regulatory scrutiny and logistical complications. Additionally, the reputational impact on companies relying on AWS reverberated widely, prompting intensified scrutiny of cloud dependency risks [2][3][5].

Lessons Learned and Recommendations

The October 2025 outage revealed crucial insights about building resilience on AWS and similar cloud platforms. While many affected companies employed redundancy across multiple availability zones and even multiple AWS regions, the incident exposed a major vulnerability: critical AWS global services and control-plane functions, such as DynamoDB Global Tables, Identity and Access Management (IAM), and global account management, are centrally managed in the US-East-1 region. Consequently, even workloads running in other regions still rely heavily on US-East-1 for key infrastructure components.

When US-East-1 suffered a DNS failure, these centralized dependencies caused cascading failures across regions and customers worldwide. This means that multi-region workload distribution alone does not guarantee full isolation from outages rooted in single-region control services.

True cloud resilience requires architectural awareness of these hidden dependencies. Organisations must map out all their direct and indirect service dependencies, especially shared global infrastructure, and not assume that regional redundancy fully protects them from outages. Moreover, multi-cloud strategies and the decoupling of critical control services from data planes can mitigate this risk further.
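
Mapping direct and indirect dependencies is essentially a graph traversal. The sketch below walks a dependency map and reports every region a service ultimately relies on. The function name `regions_reached` and the inventory data (`DEPENDS_ON`, `REGION_OF`, and all service names in them) are hypothetical and exist only to illustrate the idea; a real inventory would come from an organisation's own architecture records.

```python
from collections import deque


def regions_reached(service, depends_on, region_of):
    """Return every region that service relies on, directly or transitively."""
    seen, queue = {service}, deque([service])
    while queue:
        current = queue.popleft()
        for dependency in depends_on.get(current, ()):
            if dependency not in seen:  # also guards against cycles
                seen.add(dependency)
                queue.append(dependency)
    return {region_of[name] for name in seen if name in region_of}


# Hypothetical inventory: names and regions are illustrative only.
DEPENDS_ON = {
    "checkout-api": ["orders-table", "auth"],
    "auth": ["identity-control-plane"],
}
REGION_OF = {
    "checkout-api": "eu-west-1",
    "orders-table": "eu-west-1",
    "auth": "eu-west-1",
    "identity-control-plane": "us-east-1",
}
```

In this made-up inventory, `regions_reached("checkout-api", DEPENDS_ON, REGION_OF)` includes `us-east-1` even though every component the team deploys itself runs in `eu-west-1`. That is the kind of hidden dependency that regional redundancy alone does not remove.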

Additional lessons include:

This incident serves as a stark reminder that regional redundancy is necessary but insufficient by itself. To truly future-proof cloud deployments, organisations must understand and architect around underlying control-plane centralization, which helps prevent outages from escalating globally due to single points of failure [6][7][8].

References


  1. Amazon's AWS nears recovery after major outage disrupts websites.
  2. AWS global outage hits Amazon, Snapchat, Roblox and others.
  3. Amazon AWS experiences major global outage impacting top services.
  4. Amazon says outage issue is 'fully mitigated' as sites return.
  5. Today's Massive AWS Outage Took Down Your Favourite Websites.
  6. AWS Outage exposes the resilience challenge of AWS's control plane.
  7. Expert reaction to Amazon internet services outage.
  8. The AWS Outage: What Happened?

