The Definitive AWS Outage Report 2025: Reliability Analytics and Cascade Impact
Introduction
Amazon Web Services remains one of the most popular cloud providers, with 200+ services in 39 regions across the world. Like all providers, they have their share of outages.
In 2025, IncidentHub detected 38 AWS outages, of which the one on October 20th had the most widespread impact, affecting hundreds of SaaS providers simultaneously. Payments were disrupted, students lost access to classrooms, developer tooling degraded, and some IT teams experienced alerting gaps.
In this post we look at the reliability of AWS in 2025 based on their own publicly available status page data aggregated by IncidentHub, with a deeper analysis of the cascading impact of the October 20th outage.
- Introduction
- Methodology
- Frequency of Outages
- Duration of Outages
- Outages by Service
- Is Reliability Improving, Staying the Same, or Getting Worse?
- Outage Dependency Mapping
- Regional Outage with Global Impact
- Surviving AWS Outages
- Conclusion
- FAQ
Methodology
We collected and analyzed the uptime of all AWS services and regions for a period of one year, between 1st January 2025 and 31st December 2025. In this period, IncidentHub detected 38 outages across AWS products and regions. To detect cascading outages, we filtered for incident reports from SaaS vendors who acknowledged AWS as the cause on their status pages.
For this report, an "outage" is defined as an incident listed on AWS's status page that impacts or disrupts at least one AWS service. Each AWS status page incident was counted as a single outage, regardless of the number of services or regions listed.
We analyzed only 2025 incident data here.
A brief note on how IncidentHub collects outage data
IncidentHub - a status page aggregator - periodically monitors the public status pages of hundreds of SaaS and cloud vendors. It automatically detects outages, maintenance events, and changes in services and regions. The end result is an aggregated dashboard of vendors - a single status page for all third-party service status pages.
We wanted to keep the analysis relevant to practitioners - actual users who rely on AWS. We focused on these aspects:
- Frequency of outages
- Duration of outages
- Outages by Service and Region
- Is Reliability Improving, Staying the Same, or Getting Worse?
- Outage Dependency Mapping - Which other SaaS providers were affected?
Frequency of Outages
AWS had at least one outage every month of 2025 except the last two. For a platform as large as AWS, this is not surprising. However, the real measure of reliability is the duration and impact of the outages.
Duration of Outages
The shortest outage lasted around 14 minutes, whereas the longest was around 15 hours.
Average monthly MTTR was higher in Q3 than in Q1 and Q2. The October 20th outage, an outlier, drove October's MTTR to more than 12x January's.
| Month | MTTR (minutes) |
|---|---|
| Jan | 70.64 |
| Feb | 116.26 |
| Mar | 36.50 |
| Apr | 76.00 |
| May | 66.26 |
| Jun | 65.27 |
| Jul | 133.18 |
| Aug | 63.96 |
| Sep | 122.66 |
| Oct | 882.50 |
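As a rough illustration of how a monthly MTTR like the one above can be derived from status page incidents, here is a minimal Python sketch; the incident timestamps are hypothetical and IncidentHub's actual pipeline is more involved:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical incidents: (start, end) timestamps taken from a status page feed.
incidents = [
    ("2025-01-14 02:10", "2025-01-14 03:05"),
    ("2025-01-29 11:00", "2025-01-29 12:26"),
    ("2025-10-20 06:49", "2025-10-20 21:32"),
]

durations_by_month = defaultdict(list)
for start, end in incidents:
    start_dt = datetime.strptime(start, "%Y-%m-%d %H:%M")
    end_dt = datetime.strptime(end, "%Y-%m-%d %H:%M")
    minutes = (end_dt - start_dt).total_seconds() / 60
    durations_by_month[start_dt.strftime("%b")].append(minutes)

# Monthly MTTR = mean time to resolution for incidents that started in that month.
for month, durations in durations_by_month.items():
    print(f"{month}: {sum(durations) / len(durations):.2f} minutes")
```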
Outages by Service
In most outages there was no correlation between the number of affected services and the duration of the outage; the outages on Feb 13th and Oct 20th were the exceptions.
Services Affected per Outage
Services Which had the Highest Number of Outages
Regions with the Highest Number of Outages
The AWS us-east-1 region recorded the highest number of outages in 2025. This has been variously attributed to it being the oldest and busiest region, as well as to many control plane services being hosted there. For example, the control planes for IAM, CloudFront, and Route 53 are in us-east-1.
However, for the 20th October 2025 outage, the number of affected services in us-east-1 was high because DynamoDB, a core service on which many other AWS services depend, was impaired in us-east-1 by a race condition in its DNS management system, as Amazon explains in their detailed summary. DynamoDB runs in other AWS regions too, so this outage could theoretically have happened elsewhere. The AWS team rolled out the fix for the race condition to the first region by October 24th and to all regions worldwide by October 28th.
Is Reliability Improving, Staying the Same, or Getting Worse?
A quick look at service-wise outages for some of the top 10 affected services shows mixed trends. For AWS as a whole, the average monthly outage duration increased in Q3 compared to Q1 and Q2, while the number of services affected per outage decreased (excluding the Oct 20th outlier).
EC2
ECS
EKS
ELB
SageMaker
Outage Dependency Mapping
The AWS outage of Oct 20th was one of their biggest outages in 2025. In distributed systems, failures in one part of the system can cascade into failures in other parts. The same principle applies to SaaS providers and their dependencies. Many key AWS services were affected, and as a result, so were many SaaS providers. Since many SaaS providers use AWS directly and also rely on other SaaS providers which in turn use AWS, the overall impact multiplied rapidly.
IncidentHub detected 400+ other SaaS outages in the same timespan as the AWS outage of the 20th - out of which 197 SaaS providers acknowledged the cause as AWS. The subsequent analysis takes only those 197 SaaS providers into account, although it's highly likely the blast radius was much larger.
While most SaaS providers either described how the AWS outage affected them or did not mention AWS at all, Cloudflare explicitly stated that they were not affected by the AWS outage in any way. This is a good example of being upfront in user communication. Thousands of services depend on Cloudflare, and this kind of declaration makes it easier to debug issues.
Source: Cloudflare status page.
Note: These numbers are smaller than the expected blast radius because:
- There are services that IncidentHub does not monitor yet.
- Not all affected services monitored by IncidentHub acknowledged the cause as AWS.
Cascading Outages Timeline
The graph below shows the number of outages over time among SaaS providers who acknowledged AWS as the cause. After AWS resolved the issue, it took time for some services to recover. This is expected: they had to validate their own recovery measures, and they rely on direct or indirect dependencies which were themselves still recovering.
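A minimal sketch of how such a timeline can be built from aggregated incident data, assuming each downstream incident carries a start timestamp and the text of its status page update (the vendor names, field names, and records below are hypothetical):

```python
from collections import Counter
from datetime import datetime

# Hypothetical downstream incidents aggregated from SaaS status pages.
incidents = [
    {"vendor": "ExampleCI", "started": "2025-10-20 08:05", "summary": "Elevated errors due to AWS us-east-1 outage"},
    {"vendor": "ExamplePager", "started": "2025-10-20 09:40", "summary": "Delayed notifications (upstream AWS incident)"},
    {"vendor": "ExampleCRM", "started": "2025-10-20 10:15", "summary": "Login issues, cause under investigation"},
]

# Keep only incidents whose updates acknowledge AWS as the cause.
acknowledged = [i for i in incidents if "aws" in i["summary"].lower()]

# Bucket the acknowledged incidents by the hour they started, to plot a timeline.
per_hour = Counter(
    datetime.strptime(i["started"], "%Y-%m-%d %H:%M").strftime("%Y-%m-%d %H:00")
    for i in acknowledged
)
for hour, count in sorted(per_hour.items()):
    print(hour, count)
```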
Dependency Outages by SaaS Type
The top SaaS categories affected included Cybersecurity tools, Developer tooling, Communication and Collaboration tools, Education Technology Platforms, and CRM, Marketing, and Customer Support.
Notably, Observability and Incident Management tools also appear in the distribution - the very tools that teams rely on to detect outages.
Let's look at some of these categories in more detail.
Observability and Incident Management Providers
Observability data ingestion was affected in DataDog, Dynatrace, and New Relic. As a result, monitoring systems in SaaS products that use such data for alerting and incident management were also affected.
Incident management software was also affected - some directly, some indirectly. In some cases, efforts were made to move to another AWS region, but this took hours. On-call notifications were delayed or not sent at all.
| SaaS | Impact | Duration |
|---|---|---|
| StatusHub | SMS delivery issues | 22 hours and 49 minutes |
| PagerDuty | Delayed Notifications in US Region | 6 hours and 24 minutes |
| Opsgenie | Atlassian Cloud Services impacted | 15 hours and 58 minutes |
| Incident.io | Escalation delays | 7 hours and 0 minutes |
| Better Stack | Delayed email notifications due to AWS outage | 1 hour and 58 minutes |
| BugSnag | AWS outage impacting Smartbear ID logins and email notifications | 48 minutes |
| Grafana Cloud | Grafana K6: Some Test Runs May Not Start Due to AWS Outage | 4 hours and 51 minutes |
| Honeycomb | Delays in SLO, Service Maps processing | 25 hours and 24 minutes |
| DataDog Integrations | Several Web Integrations affected due to Vendors' outage in US1-east | 33 hours and 51 minutes |
| Dynatrace | Accessibility and login issues with Dynatrace UI | 8 hours and 29 minutes |
| New Relic | Cloud and Synthetics Data Ingest | 12 hours and 53 minutes |
| Axiom | System issues | 3 hours and 36 minutes |
| Sumo Logic | Problem with Tracing Collection, Authentication, Billing and Account Management, CSE Processing Pipeline and CSE APIs | 14 hours and 30 minutes |
Essentially, during the outage, visibility into your systems was impaired if you were dependent on SRE/Ops tools that use AWS in some way.
Developer tools
Developer tooling took a significant hit, with several platforms reporting outages well in excess of 20 hours - disrupting CI/CD pipelines, code review, and feature flag management simultaneously.
Outages in artifact repositories and container registries affected downstream services like managed Kubernetes platforms that depend on them.
| SaaS | Impact | Duration |
|---|---|---|
| GitHub | Copilot | 2 hours and 30 minutes |
| GitLab | Package Registry | ~1 hour |
| Quay Container Registry | Writes disabled | 14 hours and 55 minutes |
| Docker Hub | Multiple services affected | ~4 hours |
| GitBook | Public content loading | 22 hours and 15 minutes |
| Postman | Increased error rates | ~15 hours |
| LaunchDarkly | Elevated Latencies and Delays | 26 hours and 35 minutes |
| Bitbucket (and other Atlassian services) | Delays and missing notifications | 23 hours and 42 minutes |
| CircleCI | Job and pipeline failures, UI and API errors | 15 hours and 5 minutes |
| Codefresh | Build retries | 9 hours and 37 minutes |
| SonarQube Cloud | Endpoint request failures | 3 hours and 11 minutes |
| Cursor IDE | Service degradation | 12 hours and 51 minutes |
Infrastructure and Hosting Providers
Core infrastructure services such as some DNS providers were affected, leading to DNS propagation delays. Other cloud providers that depend on SaaS built on AWS saw impact on some of their services.
Hosting platforms Netlify and Render were down for 13+ hours, affecting websites running on them. DigitalOcean reported that their managed Kubernetes platform was affected for 19+ hours due to Docker Hub issues, which were themselves caused by AWS.
| SaaS | Impact | Duration |
|---|---|---|
| Hostinger | Payment Processing Service Disruption | 11 hours and 30 minutes |
| WPEngine | Chat & Phone Support | 4 hours and 30 minutes |
| Railway | Deployments using Dockerhub are currently failing | 2 hours and 34 minutes |
| EngineYard | Slowness, timeouts, or trouble accessing some parts of platform and services | 11 hours and 32 minutes |
| Render | New database creation, backups, support tools | 13 hours and 18 minutes |
| Netlify | UI actions, outgoing emails from Netlify, builds, functions | 15 hours and 58 minutes |
| DigitalOcean | Multiple Services Disruption | 19 hours and 20 minutes |
| Fly.io | Deployment failures | 1 hour and 11 minutes |
| Shockbyte | [Shockbyte Panel] Email Provider Outage | 2 hours and 43 minutes |
IT Operations and MSP Tools
IT Ops and MSP tools were significantly affected, with several tools managing remote devices and endpoints remaining down for over 13 hours.
| SaaS | Impact | Duration |
|---|---|---|
| NinjaOne | Multiple third party providers continue to be impacted by cloud service outage - including SMS messaging | 8 hours and 4 minutes |
| Commvault Cloud (Metallic) | Service Interruption | 13 hours and 23 minutes |
| Jamf | Jamf: US-East-1 Disruption | 11 hours and 55 minutes |
| Spiceworks | Cloud Help Desk outage | 7 minutes |
| Kaseya | Datto RMM - Concord, Vidal - Service Disruption (error messages, agent disconnections) | 13 hours and 1 minute |
| Auvik Networks | Partial service degradation | 1 hour and 6 minutes |
| Halo | Email processing and scheduled actions | 3 hours and 28 minutes |
Education Technology Platforms
More than 13 education technology platforms reported outages, some lasting well over a day.
| SaaS | Impact | Duration |
|---|---|---|
| PowerSchool | Multiple PowerSchool Products - Users are unable to access application | 27 hours and 33 minutes |
| Blackboard by Anthology | Learn SaaS - US-EAST-1 Region - Multiple Sites Inaccessible | 17 hours and 33 minutes |
| Turnitin | Turnitin Service Incident - 20th October 2025 | 21 hours and 46 minutes |
| Imagine Learning | Multiple Products - Some Assessments and Activities Not Scoring Correctly | 7 hours and 25 minutes |
| Renaissance | Renaissance programs | 22 hours and 28 minutes |
| HMH | Intermittent Service Degradation | 23 hours and 37 minutes |
| Remind | Issues with accessing or using Remind web & mobile apps | 6 hours and 12 minutes |
| Instructure | Some users may encounter errors when accessing Canvas | 17 hours and 57 minutes |
| Clever | Users unable to login to Clever | 35 hours and 52 minutes |
| Great Minds | Content/Assessment interactives, SSO failures, Platform latency | 19 hours and 43 minutes |
| Savvas Learning Company | Savvas Realize Performance Issues | 11 hours and 55 minutes |
| Pearson | Pearson Online Classroom: Degraded Performance | 7 hours and 10 minutes |
| Ellucian Cloud | Services degraded | 14 hours and 38 minutes |
Payments
Outages in payment providers mean lost revenue, and there were plenty of them.
| SaaS | Impact | Duration |
|---|---|---|
| Kraken Digital Asset Exchange | US Dollar (USD) Deposits via Plaid Unavailable | 3 hours and 44 minutes |
| Bluefin | Phone System Outage | 8 hours and 15 minutes |
| Coinbase Commerce | Site Performance - Login, Trading, Transactions | 17 hours and 25 minutes |
| Coinbase Prime | Site Performance - Login, Trading, Transactions | 17 hours and 25 minutes |
| Paddle | Issue affecting Checkouts and Order Processing | 4 hours and 53 minutes |
| Tebex | Payment Decline Errors | 4 hours and 28 minutes |
Although some services acknowledged that they managed to fail over to another region, the lag in recovery across many SaaS vendors suggests that region-level failover is not straightforward in practice.
Regional Outage with Global Impact
Was this a regional AWS failure that also took down global services? Yes, in some cases, by extension. The AWS failure itself remained regional.
This happened due to second-, third-, and higher-order effects:
System Issues
The scope of impact is increasing as more IAM credentials expire and are unable to refresh, leading to additional service disruptions. Additionally authenticating to our EU console isn't working, as our SAML partner is experiencing issues.
From the Axiom status page.
Dashboard access and support request submission issues in multiple regions
We are currently investigating an issue with dashboards loading and customer support in multiple regions.
From the Qualtrics status page.
We are affected by AWS outage in us-east-1
We are currently investigating intermittent failures when creating new services in us-east-1 region. If you already have a running service, connections should continue to work as expected.
https://console.cloud.timescale.com/ is operational, although we’ve received some reports of certain assets not loading properly.
This issue appears to be related to an ongoing AWS service outage, which is also being reported on the AWS Service Health Dashboard.
At this time, we are not aware of any further impact. We’ll provide an update as soon as more information becomes available.
2 Affected Services:
- Regions / US East (N. Virginia) / us-east-1
- Global Services / Console & APIs
From the TigerCloud status page.
Outage Chains
It's not straightforward to map a complete dependency graph of all the services that were affected by this outage, but we did manage to uncover some interesting aspects.
Here are two chains we uncovered, each showing how a vendor's outage was triggered not by AWS directly, but by the outage of another vendor that depended on AWS - illustrated by incidents at StatusHub and Railway.
AWS -> Twilio -> StatusHub
Twilio: We are currently investigating elevated latency and timeout errors for Twilio Rest API, impacting multiple Twilio services. Our engineering team is actively working on the issue, and we will provide another update in 60 minutes or as soon as more information becomes available.
StatusHub: Due to major outage affecting our SMS partner, SMS delivery is currently affected and messages may not be delivered.
AWS -> Docker Hub -> Railway
Docker Hub: Docker is continuing to experience service disruption as a result of issues with an upstream service provider. We are actively working to remediate where possible.
Railway: Builds and deploys are taking longer than usual as Dockerhub recovers after their recent outage.
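These chains can be modelled as paths in a dependency graph. The sketch below uses hypothetical "depends on" edges reconstructed from the two chains above and walks each affected vendor back to AWS:

```python
# Hypothetical "depends on" edges reconstructed from status page acknowledgements.
depends_on = {
    "StatusHub": ["Twilio"],
    "Twilio": ["AWS"],
    "Railway": ["Docker Hub"],
    "Docker Hub": ["AWS"],
}

def chain_to_root(vendor, root="AWS"):
    """Follow acknowledged upstream dependencies until the root provider is reached."""
    chain = [vendor]
    current = vendor
    while current != root:
        upstream = depends_on.get(current)
        if not upstream:
            break  # no further acknowledged upstream dependency
        current = upstream[0]
        chain.append(current)
    return " -> ".join(reversed(chain))

for affected in ("StatusHub", "Railway"):
    print(chain_to_root(affected))
# AWS -> Twilio -> StatusHub
# AWS -> Docker Hub -> Railway
```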
Upstream dependencies cannot be avoided. Even with a multi-cloud approach - which itself is not straightforward and may not be feasible architecturally or financially for everyone - key dependencies can remain tied to single providers.
Surviving AWS Outages
Reconsider if us-east-1 is necessary for your workloads
The us-east-1 region is the oldest and busiest in AWS, and it saw the highest number of outages in 2025. Even though the biggest outage of 2025 could have occurred in any region, us-east-1 still leads in the total number of outages. You have no control over the control plane and global services that run there, but you can avoid running your own workloads in us-east-1 if they can run in other regions.
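If you are not sure how much of your footprint actually sits in us-east-1, a quick per-region inventory is a reasonable starting point. A minimal sketch using boto3 to count EC2 instances per region (assumes AWS credentials are configured; pagination and non-EC2 services are omitted for brevity):

```python
import boto3

# List the regions enabled for this account, then count EC2 instances in each.
ec2 = boto3.client("ec2", region_name="us-east-1")
regions = [r["RegionName"] for r in ec2.describe_regions()["Regions"]]

for region in regions:
    regional = boto3.client("ec2", region_name=region)
    reservations = regional.describe_instances()["Reservations"]
    count = sum(len(r["Instances"]) for r in reservations)
    if count:
        print(f"{region}: {count} instances")
```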
Audit your own dependency chain
First-level dependencies on AWS are easy to find. Second- and further-level ones are not so easy. Still, listing all your dependencies is a great first step.
Monitor third-party dependencies, not just AWS directly
Tools like status page aggregators can monitor third-party dependencies seamlessly, without the need to manually check each and every status page or account for any differences in their structure or notifications. Monitoring the specific components you use (e.g. EC2 in AWS) is necessary to keep your alerts relevant.
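Many SaaS vendors host their status pages on Atlassian Statuspage, which exposes a public JSON endpoint you can poll yourself; a status page aggregator does this at scale and also handles vendors with other status page formats. A minimal sketch, using GitHub's status page as the example endpoint:

```python
import json
import urllib.request

# Statuspage-hosted pages expose a standard /api/v2/status.json endpoint.
STATUS_URL = "https://www.githubstatus.com/api/v2/status.json"  # example vendor

with urllib.request.urlopen(STATUS_URL, timeout=10) as resp:
    payload = json.load(resp)

# "indicator" is typically one of: none, minor, major, critical.
indicator = payload["status"]["indicator"]
description = payload["status"]["description"]
print(f"{description} (indicator: {indicator})")

if indicator != "none":
    # Hook in your own alerting here (Slack webhook, email, etc.).
    print("Dependency is reporting an issue")
```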
Accept that multi-cloud is not a silver bullet
Multi-cloud sounds like an attractive proposition until you realize that:
- It may not be feasible for your organization either architecturally or financially.
- Your other SaaS dependencies may not be multi-cloud and thus won't be as resilient.
The first step is knowing when your dependencies are down.
Conclusion
2025 showed us that an infrastructure provider outage can cascade across hundreds of dependent services, and that was true for AWS too. 38 outages were recorded across 200+ services and 39 regions. Most AWS outages in 2025 remained confined to a single region, but the October 20th outage led to many downstream SaaS providers experiencing outages too.
Layers of downstream dependencies amplify a single provider's outage. Monitoring your SaaS dependencies is more crucial than ever to stay ahead of the impact such outages can have on your business.
FAQ
How many AWS outages were there in 2025?
In 2025, IncidentHub detected 38 outages across AWS services and regions.
What caused the AWS outage on October 20th 2025?
The AWS outage on October 20th 2025 was attributed by AWS to a failure in DynamoDB's automatic DNS management system.
What is the highest MTTR for AWS outages in 2025?
The highest monthly MTTR for AWS outages in 2025 was 882.50 minutes, recorded in October.
Which AWS services were affected by the AWS outage on October 20th 2025?
The AWS outage on October 20th 2025 affected more than 140 AWS services.
Is us-east-1 really less reliable than other AWS regions?
us-east-1 recorded the most outages in 2025. This may be influenced by its age and service density, even though the biggest outage of 2025 could have occurred in any region. You have no control over the control plane and global services that run there, but you can avoid us-east-1 if your workloads can run in other regions.
How long did the AWS October 2025 outage last?
The AWS October 2025 outage lasted around 15 hours.
Does multi-cloud actually protect against AWS outages?
Not necessarily. Your other SaaS dependencies may not be multi-cloud and thus won't be as resilient.
How can I monitor AWS outages and their impact on my SaaS tools?
A status page aggregator like IncidentHub can help you monitor AWS outages and their impact on your SaaS tools seamlessly, without the need to manually check each and every status page. You only need to monitor the specific components you use (in AWS and other SaaS), and IncidentHub can take care of that too.
Sign up for an IncidentHub account and stay on top of all your SaaS dependencies
Photo by Scott Rodgerson on Unsplash
IncidentHub is not affiliated with any of the services and vendors mentioned in this article. All logos and company names are trademarks or registered trademarks of their respective holders. Amazon Web Services and AWS are trademarks of Amazon.com, Inc. This report is independent and not affiliated with or endorsed by Amazon.
This article was first published on the IncidentHub blog.

