
The Definitive AWS Outage Report 2025: Reliability Analytics and Cascade Impact

· 23 min read
Hrishikesh Barua
Founder, IncidentHub
IncidentHub

Introduction

Amazon Web Services remains one of the most popular cloud providers, with 200+ services in 39 regions across the world. Like all providers, they have their share of outages.

In 2025, IncidentHub detected 38 AWS outages, of which the one on October 20th had the most widespread impact, affecting hundreds of SaaS providers simultaneously. Payments were disrupted, students lost access to classrooms, developer tooling degraded, and some IT teams experienced alerting gaps.

In this post we look at the reliability of AWS in 2025 based on their own publicly available status page data aggregated by IncidentHub, with a deeper analysis of the cascading impact of the October 20th outage.

Amazon Web Services Reliability in 2025

Methodology

We collected and analyzed the uptime of all AWS services and regions for a period of 1 year between 1st January 2025 and 31st December 2025. In this period, IncidentHub detected 38 outages across AWS products and regions. To detect cascading outages, we filtered out incident reports from SaaS vendors who acknowledged the cause as AWS on their status pages.

For this report, an "outage" is defined as an incident listed on AWS's status page that impacts or disrupts at least one AWS service. Each AWS status page incident was counted as a single outage, regardless of the number of services or regions listed.

We analyzed only 2025 incident data here.

A brief note on how IncidentHub collects outage data

IncidentHub - a status page aggregator - periodically monitors the public status pages of hundreds of SaaS and cloud vendors. It automatically detects outages, maintenance events, and changes in services and regions. The end result is an aggregated dashboard of vendors - a single status page for all your third-party service status pages.


We wanted to keep the analysis relevant to practitioners - actual users who rely on AWS. We focused on these aspects:

  • Frequency of outages
  • Duration of outages
  • Outages by Service and Region
  • Is Reliability Improving, Staying the Same, or Getting Worse?
  • Outage Dependency Mapping - Which other SaaS providers were affected?

Frequency of Outages

AWS had at least one outage in every month of 2025 except the last two. For a platform as large as AWS, this is not surprising. However, the real measure of reliability is the duration and impact of those outages.

AWS Outages by Month 2025

Duration of Outages

The shortest outage lasted around 14 minutes, whereas the longest was around 15 hours.

AWS Outages Duration by Month 2025

Average monthly MTTR was higher in Q3 than in Q1 and Q2. In Q4, the outlier October 20th outage pushed the monthly MTTR to almost 12x January's.

Month | MTTR (minutes)
Jan | 70.64
Feb | 116.26
Mar | 36.50
Apr | 76.00
May | 66.26
Jun | 65.27
Jul | 133.18
Aug | 63.96
Sep | 122.66
Oct | 882.50
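As a rough sketch of how figures like those above can be derived, here is how monthly MTTR might be computed from incident start/end timestamps. The incident records below are invented for illustration - they are not IncidentHub's actual data or pipeline:

```python
from datetime import datetime
from collections import defaultdict

# Hypothetical incident records: (start, end) timestamps from a status page feed.
incidents = [
    (datetime(2025, 1, 13, 9, 0), datetime(2025, 1, 13, 10, 10)),    # 70 min
    (datetime(2025, 1, 28, 22, 5), datetime(2025, 1, 28, 23, 16)),   # 71 min
    (datetime(2025, 10, 20, 6, 48), datetime(2025, 10, 20, 21, 53)), # ~15 h
]

def monthly_mttr(incidents):
    """Mean time to resolve, in minutes, grouped by the month the incident started."""
    durations = defaultdict(list)
    for start, end in incidents:
        durations[start.strftime("%b")].append((end - start).total_seconds() / 60)
    return {month: round(sum(d) / len(d), 2) for month, d in durations.items()}

print(monthly_mttr(incidents))  # {'Jan': 70.5, 'Oct': 905.0}
```

Counting each status page incident once, as the methodology above does, keeps this calculation independent of how many services or regions an incident lists.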

Outages by Service

In most outages there was no correlation between the number of affected services and the duration of the outage; the Feb 13th and Oct 20th incidents were the exceptions.

Services Affected per Outage

AWS Outages by Service 2025

Services Which had the Highest Number of Outages

AWS Services with Highest Number of Outages 2025

Regions with the Highest Number of Outages

The AWS us-east-1 region recorded the highest number of outages in 2025. This has been variously attributed to it being the oldest and busiest region, and to the fact that many control plane services are hosted there. For example, the control planes for IAM, CloudFront, and Route 53 all live in us-east-1.

AWS Regions with Highest Number of Outages 2025

However, for the 20th October 2025 outage, the number of affected services in us-east-1 was high because a core service on which other AWS services depend - DynamoDB - was affected in us-east-1 by a race condition in its DNS management system, as Amazon explains in their detailed summary. DynamoDB runs in other AWS regions too, so this outage could theoretically have happened elsewhere. The AWS team rolled out the fix for the race condition to the first region by October 24th and to all regions worldwide by October 28th.
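A drastically simplified sketch of the failure mode Amazon describes: two DNS plan appliers racing, where the slower one can overwrite a newer plan with a stale one unless it checks freshness first. The data structures and names here are illustrative only, not AWS's actual implementation:

```python
# Live DNS state: which plan is applied and which endpoints are published.
dns_record = {"plan_id": 0, "endpoints": ["ip-old"]}

def apply_plan(record, plan, check_freshness):
    """One 'enactor' applying a DNS plan. A guarded enactor refuses plans
    older than what is already live; an unguarded one applies blindly."""
    if check_freshness and plan["plan_id"] < record["plan_id"]:
        return record  # reject the stale write
    updated = dict(record)
    updated.update(plan)
    return updated

new_plan = {"plan_id": 2, "endpoints": ["ip-new"]}
stale_plan = {"plan_id": 1, "endpoints": []}  # an outdated, now-empty plan

# Unguarded race: the delayed enactor lands last and wins - the record goes empty.
r = apply_plan(dns_record, new_plan, check_freshness=False)
r = apply_plan(r, stale_plan, check_freshness=False)
print(r["endpoints"])  # []

# Guarded: the stale write is rejected and the newer plan survives.
r = apply_plan(dns_record, new_plan, check_freshness=True)
r = apply_plan(r, stale_plan, check_freshness=True)
print(r["endpoints"])  # ['ip-new']
```

The empty record in the unguarded case mirrors why DynamoDB's endpoint became unresolvable until operators intervened.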

Is Reliability Improving, Staying the Same, or Getting Worse?

A quick look at service-wise outages for some of the top-10 services affected shows variable trends. However, overall for AWS as a whole, the average monthly outage duration increased in Q3 compared to Q1 and Q2, but the number of services affected decreased (except for the Oct 20th outlier).

EC2 Amazon Elastic Compute Cloud Outages 2025
ECS Amazon Elastic Container Service Outages 2025
EKS Amazon Elastic Kubernetes Service Outages 2025
ELB Amazon Elastic Load Balancing Outages 2025
SageMaker Amazon Sagemaker Outages 2025

Outage Dependency Mapping

The AWS outage of Oct 20th was one of their biggest in 2025. In distributed systems, failures in one part of the system can cascade into failures in other parts, and the same principle applies to SaaS providers and their dependencies. Many key AWS services were affected, and as a result, so were many SaaS providers. Since SaaS providers use AWS both directly and through other SaaS providers that themselves run on AWS, the overall impact multiplied rapidly.

IncidentHub detected 400+ other SaaS outages in the same timespan as the AWS outage of the 20th - out of which 197 SaaS providers acknowledged the cause as AWS. The subsequent analysis takes only those 197 SaaS providers into account, although it's highly likely the blast radius was much larger.

While most SaaS providers either talked about how the AWS outage affected them, or did not mention AWS at all, Cloudflare explicitly stated that they were not affected by the AWS outage in any way. This is a good example of upfront user communication. Thousands of services depend on Cloudflare, and declarations like this make it easier to debug issues.

Source: Cloudflare status page.


Note: The numbers are smaller than the expected blast radius because:

  1. There are services that IncidentHub does not monitor yet.
  2. Not all affected services monitored by IncidentHub acknowledged the cause as AWS.

Cascading Outages Timeline

The graph below shows, over time, the number of outages among SaaS providers who acknowledged the cause as AWS. After AWS resolved the issue, some services took additional time to recover - expected, given the need to validate recovery measures and the reliance on direct or indirect dependencies that were themselves still recovering.

AWS Outage Cascading Outages Timeline 2025
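A timeline like this can be built by bucketing downstream incidents according to when they opened relative to the AWS incident. A minimal sketch with invented data (real input would come from aggregated status page feeds):

```python
from collections import Counter

# Hypothetical downstream incidents: (provider, start hour relative to the
# opening of the AWS incident). Provider names are from the post; the hours
# are invented for illustration.
cascades = [("StatusHub", 0), ("Railway", 1), ("Netlify", 0),
            ("Clever", 2), ("Render", 0), ("PagerDuty", 1)]

def timeline(cascades):
    """Count how many dependent-SaaS incidents opened in each hour bucket."""
    return dict(sorted(Counter(hour for _, hour in cascades).items()))

print(timeline(cascades))  # {0: 3, 1: 2, 2: 1}
```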

Dependency Outages by SaaS Type

The top SaaS categories affected included Cybersecurity tools, Developer tooling, Communication and Collaboration tools, Education Technology Platforms, and CRM, Marketing, and Customer Support.

Notably, Observability and Incident Management tools also appear in the distribution - the very tools that teams rely on to detect outages.

AWS Outage Dependency Mapping by Type 2025

Let's look at some of these categories in more detail.

Observability and Incident Management Providers

Observability data ingestion was affected at Datadog, Dynatrace, and New Relic. As a result, monitoring systems in SaaS products that use such data for alerting and incident management were affected.

Incident management software was also affected - some directly, some indirectly. In some cases teams attempted to fail over to another AWS region, but it took hours. On-call notifications were delayed or not sent at all.

SaaS | Impact | Duration
StatusHub | SMS delivery issues | 22 hours and 49 minutes
PagerDuty | Delayed Notifications in US Region | 6 hours and 24 minutes
Opsgenie | Atlassian Cloud Services impacted | 15 hours and 58 minutes
Incident.io | Escalation delays | 7 hours and 0 minutes
Better Stack | Delayed email notifications due to AWS outage | 1 hour and 58 minutes
BugSnag | AWS outage impacting Smartbear ID logins and email notifications | 48 minutes
Grafana Cloud | Grafana K6: Some Test Runs May Not Start Due to AWS Outage | 4 hours and 51 minutes
Honeycomb | Delays in SLO, Service Maps processing | 25 hours and 24 minutes
DataDog Integrations | Several Web Integrations affected due to Vendors' outage in us-east-1 | 33 hours and 51 minutes
Dynatrace | Accessibility and login issues with Dynatrace UI | 8 hours and 29 minutes
New Relic | Cloud and Synthetics Data Ingest | 12 hours and 53 minutes
Axiom | System issues | 3 hours and 36 minutes
Sumo Logic | Problem with Tracing Collection, Authentication, Billing and Account Management, CSE Processing Pipeline and CSE APIs | 14 hours and 30 minutes

Essentially, during the outage, visibility into your systems was impaired if you were dependent on SRE/Ops tools that use AWS in some way.

Developer tools

Developer tooling took a significant hit, with several platforms reporting outages well in excess of 20 hours - disrupting CI/CD pipelines, code review, and feature flag management simultaneously.

Outages in artifact repositories and container registries affected downstream services like managed Kubernetes platforms that depend on them.

SaaS | Impact | Duration
GitHub | Copilot | 2 hours 30 minutes
GitLab | Package Registry | ~1 hour
Quay Container Registry | Writes disabled | 14 hours and 55 minutes
Docker Hub | Multiple services affected | ~4 hours
GitBook | Public content loading | 22 hours 15 minutes
Postman | Increased error rates | ~15 hours
LaunchDarkly | Elevated Latencies and Delays | 26 hours and 35 minutes
Bitbucket (and other Atlassian services) | Delays and missing notifications | 23 hours and 42 minutes
CircleCI | Job and pipeline failures, UI and API errors | 15 hours and 5 minutes
Codefresh | Build retries | 9 hours and 37 minutes
SonarQube Cloud | Endpoint request failures | 3 hours and 11 minutes
Cursor IDE | Service degradation | 12 hours and 51 minutes

Infrastructure and Hosting Providers

Core infrastructure services, including some DNS providers, were affected, leading to DNS propagation delays. Other cloud providers that depend on SaaS built on AWS saw impact on some of their services.

Hosting platforms Netlify and Render were down for 13+ hours, affecting the websites running on them. DigitalOcean reported that their managed Kubernetes platform was affected for 19+ hours due to Docker Hub issues, which in turn were caused by AWS.

SaaS | Impact | Duration
Hostinger | Payment Processing Service Disruption | 11 hours and 30 minutes
WPEngine | Chat & Phone Support | 4 hours and 30 minutes
Railway | Deployments using Dockerhub are currently failing | 2 hours and 34 minutes
EngineYard | Slowness, timeouts, or trouble accessing some parts of platform and services | 11 hours and 32 minutes
Render | New database creation, backups, support tools | 13 hours and 18 minutes
Netlify | UI actions, outgoing emails from Netlify, builds, functions | 15 hours and 58 minutes
DigitalOcean | Multiple Services Disruption | 19 hours and 20 minutes
Fly.io | Deployment failures | 1 hour and 11 minutes
Shockbyte | [Shockbyte Panel] Email Provider Outage | 2 hours and 43 minutes

IT Operations and MSP Tools

IT Ops and MSP tools were significantly affected, with several tools managing remote devices and endpoints remaining down for over 13 hours.

SaaS | Impact | Duration
NinjaOne | Multiple third party providers continue to be impacted by cloud service outage - including SMS messaging | 8 hours and 4 minutes
Commvault Cloud (Metallic) | Service Interruption | 13 hours and 23 minutes
Jamf | Jamf: US-East-1 Disruption | 11 hours and 55 minutes
Spiceworks | Cloud Help Desk outage | 7 minutes
Kaseya | Datto RMM - Concord, Vidal - Service Disruption (error messages, agent disconnections) | 13 hours and 1 minute
Auvik Networks | Partial service degradation | 1 hour and 6 minutes
Halo | Email processing and scheduled actions | 3 hours and 28 minutes

Education Technology Platforms

More than 13 education technology platforms reported outages, some lasting well over a day.

SaaS | Impact | Duration
PowerSchool | Multiple PowerSchool Products - Users are unable to access application | 27 hours and 33 minutes
Blackboard by Anthology | Learn SaaS - US-EAST-1 Region - Multiple Sites Inaccessible | 17 hours and 33 minutes
Turnitin | Turnitin Service Incident - 20th October 2025 | 21 hours and 46 minutes
Imagine Learning | Multiple Products - Some Assessments and Activities Not Scoring Correctly | 7 hours and 25 minutes
Renaissance | Renaissance programs | 22 hours and 28 minutes
HMH | Intermittent Service Degradation | 23 hours and 37 minutes
Remind | Issues with accessing or using Remind web & mobile apps | 6 hours and 12 minutes
Instructure | Some users may encounter errors when accessing Canvas | 17 hours and 57 minutes
Clever | Users unable to login to Clever | 35 hours and 52 minutes
Great Minds | Content/Assessment interactives, SSO failures, Platform latency | 19 hours and 43 minutes
Savvas Learning Company | Savvas Realize Performance Issues | 11 hours and 55 minutes
Pearson | Pearson Online Classroom: Degraded Performance | 7 hours and 10 minutes
Ellucian Cloud | Services degraded | 14 hours and 38 minutes

Payments

Outages in payments providers mean lost revenue, and there were plenty of them.

SaaS | Impact | Duration
Kraken Digital Asset Exchange | US Dollar (USD) Deposits via Plaid Unavailable | 3 hours and 44 minutes
Bluefin | Phone System Outage | 8 hours and 15 minutes
Coinbase Commerce | Site Performance - Login, Trading, Transactions | 17 hours and 25 minutes
Coinbase Prime | Site Performance - Login, Trading, Transactions | 17 hours and 25 minutes
Paddle | Issue affecting Checkouts and Order Processing | 4 hours and 53 minutes
Tebex | Payment Decline Errors | 4 hours and 28 minutes

Although there were services which acknowledged that they managed to failover to another region, the lag in recovery across many SaaS vendors suggests that region-level failover is not straightforward in practice.

Regional Outage with Global Impact

Was this a regional AWS failure that also took down global services? Yes, in some cases, by extension. The AWS failure itself remained regional.

This happened due to second-, third-, and higher-order effects:

System Issues

The scope of impact is increasing as more IAM credentials expire and are unable to refresh, leading to additional service disruptions. Additionally authenticating to our EU console isn't working, as our SAML partner is experiencing issues.

From the Axiom status page.


Dashboard access and support request submission issues in multiple regions

We are currently investigating an issue with dashboards loading and customer support in multiple regions.

From the Qualtrics status page.


We are affected by AWS outage in us-east-1

We are currently investigating intermittent failures when creating new services in us-east-1 region. If you already have a running service, connections should continue to work as expected.

https://console.cloud.timescale.com/ is operational, although we’ve received some reports of certain assets not loading properly.

This issue appears to be related to an ongoing AWS service outage, which is also being reported on the AWS Service Health Dashboard.

At this time, we are not aware of any further impact. We’ll provide an update as soon as more information becomes available.

2 affected services:

  • Regions / US East (N. Virginia) / us-east-1
  • Global Services / Console & APIs

From the TigerCloud status page.


Outage Chains

It's not straightforward to map a complete dependency graph of all the services that were affected by this outage, but we did manage to uncover some interesting aspects.

Here are two chains we uncovered, each showing a vendor whose outage was triggered not by AWS directly but by another vendor that depends on AWS - illustrated by incidents at StatusHub and Railway.

AWS -> Twilio -> StatusHub

Twilio: We are currently investigating elevated latency and timeout errors for Twilio Rest API, impacting the Multiple Twilio services. Our engineering team is actively working on the issue, and we will provide another update in 60 minutes or as soon as more information becomes available.

StatusHub: Due to major outage affecting our SMS partner, SMS delivery is currently affected and messages may not be delivered.

AWS -> Docker Hub -> Railway

Docker Hub: Docker is continuing to experience service disruption as a result of issues with an upstream service provider. We are actively working to remediate where possible.

Railway: Builds and deploys are taking longer than usual as Dockerhub recovers after their recent outage.
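Chains like these can be recovered mechanically once you have a map of which upstream each vendor blamed on its status page. A minimal sketch, with the blame map hand-curated from the incidents above:

```python
# Hand-curated map: vendor -> the upstream it blamed on its status page.
blamed_upstream = {
    "StatusHub": "Twilio",
    "Twilio": "AWS",
    "Railway": "Docker Hub",
    "Docker Hub": "AWS",
}

def chain(vendor, blame_map):
    """Follow the blame links from a vendor up to the root cause."""
    path = [vendor]
    while path[-1] in blame_map:
        path.append(blame_map[path[-1]])
    return " -> ".join(reversed(path))

print(chain("StatusHub", blamed_upstream))  # AWS -> Twilio -> StatusHub
print(chain("Railway", blamed_upstream))    # AWS -> Docker Hub -> Railway
```

In practice the hard part is building the blame map at all, since many vendors never name their upstream explicitly.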


Upstream dependencies cannot be avoided. Even with a multi-cloud approach - which itself is not straightforward and may not be feasible architecturally or financially for everyone - key dependencies can remain tied to single providers.

Surviving AWS Outages

Reconsider if us-east-1 is necessary for your workloads

The us-east-1 region is the oldest and busiest in AWS, and it saw the highest number of outages in 2025. Even though the biggest outage of 2025 could theoretically have occurred in any region, us-east-1 still leads in total outages. You have no control over the control plane and global services that run there, so consider avoiding us-east-1 if your workloads can run in other regions.

Audit your own dependency chain

First-level AWS dependencies are easy to find; second- and further-level ones are not. However, listing all your dependencies is a great first step.
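Once you have even a partial dependency list, expanding it into the transitive set is mechanical. A sketch using a hypothetical, hand-maintained dependency map (real maps would come from your own audit):

```python
from collections import deque

# Hypothetical dependency map: service -> the providers it depends on directly.
deps = {
    "my-app": {"AWS", "Twilio", "Docker Hub"},
    "Twilio": {"AWS"},
    "Docker Hub": {"AWS"},
}

def transitive_deps(service, dep_map):
    """Breadth-first walk over the dependency map, collecting every
    provider reachable from the given service."""
    seen, queue = set(), deque([service])
    while queue:
        for dep in dep_map.get(queue.popleft(), ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

print(sorted(transitive_deps("my-app", deps)))  # ['AWS', 'Docker Hub', 'Twilio']
```

Even this toy map shows the point of the exercise: AWS appears once in the first-level list but three times in the chains behind it.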

Monitor third-party dependencies, not just AWS directly

Tools like status page aggregators can monitor third-party dependencies seamlessly, without the need to manually check each status page or account for differences in their structure and notifications. Monitoring the specific components you use (e.g. EC2 in AWS) keeps your alerts relevant.

Accept that multi-cloud is not a silver bullet

Multi-cloud sounds like an attractive proposition until you realize that:

  • It may not be feasible for your organization either architecturally or financially.
  • Your other SaaS dependencies may not be multi-cloud and thus won't be as resilient.

The first step is knowing when your dependencies are down.

Conclusion

2025 showed that an infrastructure provider outage can cascade across hundreds of dependent services, and AWS was no exception. IncidentHub recorded 38 outages across AWS's 200+ products and 39 regions. Most AWS outages in 2025 remained confined to a single region, but the October 20th outage led to outages at many downstream SaaS providers too.

Layers of downstream dependencies amplify a single provider's outage. Monitoring your SaaS dependencies is more crucial than ever to stay ahead of the impact such outages can have on your business.

FAQ

How many AWS outages were there in 2025?

In 2025, IncidentHub detected 38 outages across AWS services and regions.

What caused the AWS outage on October 20th 2025?

The AWS outage on October 20th 2025 was attributed by AWS to a failure in DynamoDB's automatic DNS management system.

What is the highest MTTR for AWS outages in 2025?

The highest MTTR for AWS outages in 2025 was 882.50 minutes.

Which AWS services were affected by the AWS outage on October 20th 2025?

The AWS outage on October 20th 2025 affected more than 140 AWS services.

Is us-east-1 really less reliable than other AWS regions?

us-east-1 recorded the most outages in 2025. This may be influenced by its age and service density, even though the biggest outage of 2025 could have occurred in any region. You have no control over the control plane and global services that run there, so consider avoiding us-east-1 if your workloads can run in other regions.

How long did the AWS October 2025 outage last?

The AWS October 2025 outage lasted around 15 hours.

Does multi-cloud actually protect against AWS outages?

Not necessarily. Your other SaaS dependencies may not be multi-cloud and thus won't be as resilient.

How can I monitor AWS outages and their impact on my SaaS tools?

A status page aggregator like IncidentHub can help you monitor AWS outages and their impact on your SaaS tools seamlessly, without the need to manually check each status page. You only need to monitor the specific components you use (in AWS and other SaaS), and IncidentHub can take care of that too.

Sign up for an IncidentHub account and stay on top of all your SaaS dependencies


Photo by Scott Rodgerson on Unsplash

IncidentHub is not affiliated with any of the services and vendors mentioned in this article. All logos and company names are trademarks or registered trademarks of their respective holders. Amazon Web Services and AWS are trademarks of Amazon.com, Inc. This report is independent and not affiliated with or endorsed by Amazon.

This article was first published on the IncidentHub blog.