<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>The IncidentHub Blog</title>
        <link>https://blog.incidenthub.cloud</link>
        <description>The IncidentHub Blog</description>
        <lastBuildDate>Wed, 18 Mar 2026 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <item>
            <title><![CDATA[Product Update - March 2026]]></title>
            <link>https://blog.incidenthub.cloud/product-update-march-2026</link>
            <guid>https://blog.incidenthub.cloud/product-update-march-2026</guid>
            <pubDate>Wed, 18 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn about the new features and updates in IncidentHub for March 2026.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/product-update-march-2026#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p>IncidentHub's latest product updates focus on improving the public status page, adding integrations with ticketing systems, private status page ingestion, and making the notifications more useful to the end user. Some of these improvements are driven by user feedback.</p>
<p>Feedback is what makes the product better, and I am personally grateful to all our customers who have shared their feedback with us.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/incidenthub-product-update-march-2026.webp" alt="IncidentHub Product Update - March 2026">
<hr>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-march-2026#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-march-2026#maintenance-widget-on-the-status-page" class="">Maintenance Widget on the Status Page</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-march-2026#private-status-page-ingestion" class="">Private Status Page Ingestion</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-march-2026#ticketing-systems-integration---bolddesk-and-freshdesk" class="">Ticketing Systems Integration - BoldDesk and Freshdesk</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-march-2026#notification-improvements" class="">Notification Improvements</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-march-2026#making-it-easier-to-debug-webhook-errors" class="">Making it Easier to Debug Webhook Errors</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-march-2026#more-services" class="">More Services</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="maintenance-widget-on-the-status-page">Maintenance Widget on the Status Page<a href="https://blog.incidenthub.cloud/product-update-march-2026#maintenance-widget-on-the-status-page" class="hash-link" aria-label="Direct link to Maintenance Widget on the Status Page" title="Direct link to Maintenance Widget on the Status Page" translate="no">​</a></h2>
<p>Your public status page now has a maintenance widget that shows your upcoming maintenance events. It appears at the top of the page, so you can see at a glance how many such events are scheduled. Like all other user-facing dashboards and notifications in IncidentHub, the events are always filtered based on your component filters.</p>
<p>By default it's collapsed. You can click on it to expand it and see the details of the maintenance events.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/public-status-page-mwidget.webp" alt="IncidentHub Maintenance Widget">
<p>Expanded view:</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/public-status-page-mwidget-expanded.webp" alt="IncidentHub Maintenance Widget">
<br>
<br>
<p>Note that the top summary bar shows the count of outages and ongoing maintenances, whereas the new collapsible widget shows your upcoming maintenances. <a class="" href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance">Planning for upcoming maintenance</a> in your cloud providers is crucial for your business continuity planning.</p>
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:15px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><b style="text-align:justify"><p>IncidentHub's Component Filtering Philosophy</p></b><span style="text-align:justify"><p>You configure component filters once - and that reflects across all your dashboards, public status pages, Slack/MSTeams/Email notifications, ticketing systems, and historical trends graphs.</p><p>This prevents unnecessary alerts and outage indicators in your dashboards and notification channels.</p></span></div>
<br>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="private-status-page-ingestion">Private Status Page Ingestion<a href="https://blog.incidenthub.cloud/product-update-march-2026#private-status-page-ingestion" class="hash-link" aria-label="Direct link to Private Status Page Ingestion" title="Direct link to Private Status Page Ingestion" translate="no">​</a></h2>
<p>You can now ingest status data from private status pages. Your cloud provider might give you an SSO-protected URL to access your private status page since the services are specific to your account.
This is a powerful feature that allows you to track the status of your internal services and applications.</p>
<img style="max-width:100%;display:block;margin:0 auto;box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/private-status-page-ingestion-setup.webp" alt="IncidentHub Private Status Page Ingestion">
<br>
<p>As of this writing, IncidentHub supports private data ingestion from Infor CloudSuite. We will be adding support for more services in the future.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="ticketing-systems-integration---bolddesk-and-freshdesk">Ticketing Systems Integration - BoldDesk and Freshdesk<a href="https://blog.incidenthub.cloud/product-update-march-2026#ticketing-systems-integration---bolddesk-and-freshdesk" class="hash-link" aria-label="Direct link to Ticketing Systems Integration - BoldDesk and Freshdesk" title="Direct link to Ticketing Systems Integration - BoldDesk and Freshdesk" translate="no">​</a></h2>
<p>Adding to our basket of ticketing system integrations, which already includes <a href="https://docs.incidenthub.cloud/incidenthub-documentation/channels/integration-for-zendesk" target="_blank" rel="noopener noreferrer" class="">Zendesk</a>, we now support <a href="https://docs.incidenthub.cloud/incidenthub-documentation/channels/integration-for-bolddesk" target="_blank" rel="noopener noreferrer" class="">BoldDesk</a> and <a href="https://docs.incidenthub.cloud/incidenthub-documentation/channels/integration-for-freshdesk" target="_blank" rel="noopener noreferrer" class="">Freshdesk</a> too.</p>
<p>Ticketing system notifications for third-party outages help your customer-facing teams stay on top of those outages and provide timely updates to your customers without surrendering to noise.</p>
<p>The key difference between ticketing systems and other notification channels is that IncidentHub sends only the trigger (start) event for any outage to the ticketing system, irrespective of your notification settings. It does not send notifications for ongoing or future maintenances, or for intermediate outage updates, including resolution. This is to reduce noise for your support teams.</p>
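<p>The rule above can be sketched roughly as follows. This is only an illustration: the event kinds, channel names, and the <code>should_notify</code> function are hypothetical and are not IncidentHub's actual data model or API.</p>

```python
from dataclasses import dataclass

# Hypothetical event kinds, named for this sketch only.
OUTAGE_START = "outage_start"
OUTAGE_UPDATE = "outage_update"
OUTAGE_RESOLVED = "outage_resolved"
MAINTENANCE = "maintenance"

# Hypothetical channel identifiers for the ticketing integrations.
TICKETING_CHANNELS = {"zendesk", "bolddesk", "freshdesk"}


@dataclass(frozen=True)
class Event:
    kind: str
    service: str


def should_notify(channel: str, event: Event, user_enabled_kinds: set[str]) -> bool:
    """Decide whether an event is delivered to a channel."""
    if channel in TICKETING_CHANNELS:
        # Ticketing systems receive only the start of an outage,
        # whatever the user's notification settings say.
        return event.kind == OUTAGE_START
    # Other channels follow the user's notification settings.
    return event.kind in user_enabled_kinds
```

<p>With this shape, a Freshdesk channel would get one ticket when an outage starts and nothing for updates, resolution, or maintenance events.</p>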
<img style="max-width:100%;display:block;margin:0 auto;box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/incidenthub-freshdesk-setup.webp" alt="IncidentHub Freshdesk Integration">
<br>
<img style="max-width:100%;display:block;margin:0 auto;box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/incidenthub-bolddesk-setup.webp" alt="IncidentHub BoldDesk Integration">
<br>
<p>These are part of our upcoming Premium tier, which is planned to be publicly available soon, but they are already available to some of our customers on an invite-only basis.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="notification-improvements">Notification Improvements<a href="https://blog.incidenthub.cloud/product-update-march-2026#notification-improvements" class="hash-link" aria-label="Direct link to Notification Improvements" title="Direct link to Notification Improvements" translate="no">​</a></h2>
<p>Choosing specific components in your monitored services is the way to avoid irrelevant alerts. Outages and maintenances are matched against your component filters and only the matching notifications are sent.
If an outage or maintenance also affects components other than your selected ones, those show up in the notification too. This can sometimes be confusing for the end user.</p>
<p>We have tweaked the notifications experience slightly so that your chosen components bubble up to the top of the alert.</p>
<p>E.g. if you choose Workers and Pages in Cloudflare, and an outage occurs which affects Workers, Pages, and Workers Builds, you will see Workers and Pages at the top of the alert.</p>
<img style="max-width:100%;display:block;margin:0 auto;box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/notifications-bubble-up.webp" alt="IncidentHub Notification Improvements">
<br>
<p>This is automatically done for all notification channels.</p>
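<p>One way to picture the reordering is a stable sort that puts your selected components first. This is a minimal sketch with a hypothetical <code>order_components</code> function, not IncidentHub's actual implementation.</p>

```python
def order_components(affected: list[str], selected: set[str]) -> list[str]:
    """Return affected components with the user's selected ones first."""
    # The key is False for selected components and True for the rest, and
    # False sorts before True. sorted() is stable, so components keep their
    # original relative order within each group.
    return sorted(affected, key=lambda name: name not in selected)


ordered = order_components(
    ["Workers Builds", "Workers", "Pages"],
    {"Workers", "Pages"},
)
```

<p>For the Cloudflare example above, <code>ordered</code> lists Workers and Pages ahead of Workers Builds.</p>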
<p>We are always eager to improve this further. Please let us know if you have any feedback on this using our support email address.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="making-it-easier-to-debug-webhook-errors">Making it Easier to Debug Webhook Errors<a href="https://blog.incidenthub.cloud/product-update-march-2026#making-it-easier-to-debug-webhook-errors" class="hash-link" aria-label="Direct link to Making it Easier to Debug Webhook Errors" title="Direct link to Making it Easier to Debug Webhook Errors" translate="no">​</a></h2>
<p>Currently, if you have a <a class="" href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook">webhook integration with IncidentHub</a>, and there is an error in the webhook, IncidentHub sends out a notification email to the account owner.
While this is helpful as a notification, it does not help when you are debugging the integration. You can use the Test button to send a test alert - but if it fails, until now, there was no clear message on what went wrong.</p>
<p>Based on user feedback, we have added the ability to see the actual error. This happens in two cases:</p>
<ul>
<li class="">When you click "Send a test message" it will show you the error message in the popup.</li>
</ul>
<img style="max-width:100%;display:block;margin:0 auto;box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/webhook-test-error.webp" alt="IncidentHub Webhook Test Error Details">
<br>
<ul>
<li class="">When the webhook has failed previously in the background while trying to send notifications, there will be a red indicator with a "View Details" option next to the webhook.</li>
</ul>
<img style="max-width:100%;display:block;margin:0 auto;box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/webhook-error.webp" alt="IncidentHub Webhook Error Indicator">
<br>
<p>It will show the last seen failure message and the log time.</p>
<img style="max-width:100%;display:block;margin:0 auto;box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/webhook-error-popup.webp" alt="IncidentHub Webhook Error Details">
<br>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="more-services">More Services<a href="https://blog.incidenthub.cloud/product-update-march-2026#more-services" class="hash-link" aria-label="Direct link to More Services" title="Direct link to More Services" translate="no">​</a></h2>
<p>We have added support for more services including telecom providers, electricity/utility companies, crypto exchanges, and MSP tools. This adds to our constantly growing list of supported services.</p>
<hr>
<p>IncidentHub is not affiliated with any of the services and vendors mentioned in this article. All the logos and company names are trademarks or registered trademarks of their respective holders.</p>
<p>This article was first published on the <a href="https://blog.incidenthub.cloud/product-update-march-2026" target="_blank" rel="noopener noreferrer" class="">IncidentHub blog</a>.</p>]]></content:encoded>
            <category>Product Updates</category>
        </item>
        <item>
            <title><![CDATA[The Definitive AWS Outage Report 2025: Reliability Analytics and Cascade Impact]]></title>
            <link>https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability</link>
            <guid>https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability</guid>
            <pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Analysis of AWS outages in 2025, including the October 20th incident that cascaded across hundreds of SaaS providers simultaneously. Frequency, duration, service-wise analysis and cascade impact.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p>Amazon Web Services remains one of the most popular cloud providers, with 200+ services in 39 regions across the world. Like all providers, they have their share of outages.</p>
<p>In 2025, IncidentHub detected 38 AWS outages, of which the one on October 20th had the most widespread impact, affecting hundreds of SaaS providers simultaneously. Payments were disrupted, students lost access to classrooms, developer tooling degraded, and some IT teams experienced alerting gaps.</p>
<p>In this post we look at the reliability of AWS in 2025 based on their own publicly available status page data aggregated by IncidentHub, with a deeper analysis of the cascading impact of the October 20th outage.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/amazon-web-services-reliability-2025.webp" alt="Amazon Web Services Reliability in 2025">
<hr>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#methodology" class="">Methodology</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#frequency-of-outages" class="">Frequency of Outages</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#duration-of-outages" class="">Duration of Outages</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#outages-by-service" class="">Outages by Service</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#services-affected-per-outage" class="">Services Affected per Outage</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#services-which-had-the-highest-number-of-outages" class="">Services Which had the Highest Number of Outages</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#regions-with-the-highest-number-of-outages" class="">Regions with the Highest Number of Outages</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#is-reliability-improving-staying-the-same-or-getting-worse" class="">Is Reliability Improving, Staying the Same, or Getting Worse?</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#outage-dependency-mapping" class="">Outage Dependency Mapping</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#cascading-outages-timeline" class="">Cascading Outages Timeline</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#dependency-outages-by-saas-type" class="">Dependency Outages by SaaS Type</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#observability-and-incident-management-providers" class="">Observability and Incident Management Providers</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#developer-tools" class="">Developer tools</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#infrastructure-and-hosting-providers" class="">Infrastructure and Hosting Providers</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#it-operations-and-msp-tools" class="">IT Operations and MSP Tools</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#education-technology-platforms" class="">Education Technology Platforms</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#payments" class="">Payments</a></li>
</ul>
</li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#regional-outage-with-global-impact" class="">Regional Outage with Global Impact</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#outage-chains" class="">Outage Chains</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#surviving-aws-outages" class="">Surviving AWS Outages</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#reconsider-if-us-east-1-is-necessary-for-your-workloads" class="">Reconsider if us-east-1 is necessary for your workloads</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#audit-your-own-dependency-chain" class="">Audit your own dependency chain</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#monitor-third-party-dependencies-not-just-aws-directly" class="">Monitor third-party dependencies, not just AWS directly</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#accept-that-multi-cloud-is-not-a-silver-bullet" class="">Accept that multi-cloud is not a silver bullet</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#conclusion" class="">Conclusion</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#faq" class="">FAQ</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="methodology">Methodology<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#methodology" class="hash-link" aria-label="Direct link to Methodology" title="Direct link to Methodology" translate="no">​</a></h2>
<p>We collected and analyzed the uptime of all AWS services and regions for a period of 1 year between 1st January 2025 and 31st December 2025. In this period, IncidentHub detected 38 outages across AWS products and regions. To detect cascading outages, we selected incident reports from SaaS vendors who acknowledged the cause as AWS on their status pages.</p>
<p>For this report, an "outage" is defined as an incident listed on AWS's status page that impacts or disrupts at least one AWS service. Each AWS status page incident was counted as a single outage, regardless of the number of services or regions listed.</p>
<p>We analyzed only 2025 incident data here.</p>
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:8px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><b style="text-align:justify"><p>A brief note on how IncidentHub collects outage data</p></b><span style="text-align:justify"><p>IncidentHub - a status page aggregator - periodically monitors public status pages across hundreds of SaaS and Cloud vendors. It detects outages, maintenance events, and changes in services and regions automatically. The end result is an aggregated dashboard of vendors - a single status page for all third-party service status pages.</p></span></div>
<br>
<p>We wanted to keep the analysis relevant to practitioners - actual users who rely on AWS. We focused on these aspects:</p>
<ul>
<li class="">Frequency of outages</li>
<li class="">Duration of outages</li>
<li class="">Outages by Service and Region</li>
<li class="">Is Reliability Improving, Staying the Same, or Getting Worse?</li>
<li class="">Outage Dependency Mapping - Which other SaaS providers were affected?</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="frequency-of-outages">Frequency of Outages<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#frequency-of-outages" class="hash-link" aria-label="Direct link to Frequency of Outages" title="Direct link to Frequency of Outages" translate="no">​</a></h2>
<p>AWS had at least one outage every month except for the last two in 2025. For a platform as large as AWS, this is not surprising. However, the real measure of reliability is the duration and the impact of the outages.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/aws-outages---2025-line.webp" alt="AWS Outages by Month 2025">
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="duration-of-outages">Duration of Outages<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#duration-of-outages" class="hash-link" aria-label="Direct link to Duration of Outages" title="Direct link to Duration of Outages" translate="no">​</a></h2>
<p>The shortest outage lasted around 14 minutes, whereas the longest was around 15 hours.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/aws-outages-by-duration---2025-lollipop.webp" alt="AWS Outages Duration by Month 2025">
<br>
<br>
<p>Average monthly MTTR was higher in Q3 than in Q1 and Q2. In Q4, the October 20th outage was an outlier that drove October's MTTR to roughly 12.5x that of January.</p>
<table><thead><tr><th>Month</th><th>MTTR (minutes)</th></tr></thead><tbody><tr><td>Jan</td><td>70.64</td></tr><tr><td>Feb</td><td>116.26</td></tr><tr><td>Mar</td><td>36.50</td></tr><tr><td>Apr</td><td>76.00</td></tr><tr><td>May</td><td>66.26</td></tr><tr><td>Jun</td><td>65.27</td></tr><tr><td>Jul</td><td>133.18</td></tr><tr><td>Aug</td><td>63.96</td></tr><tr><td>Sep</td><td>122.66</td></tr><tr><td>Oct</td><td>882.50</td></tr></tbody></table>
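<p>The quarterly comparison can be reproduced from the table with a few lines of Python. This sketch assumes a simple unweighted mean of the monthly MTTR values per quarter; weighting by the number of outages in each month would give different figures.</p>

```python
from statistics import mean

# Monthly MTTR in minutes, copied from the table above.
mttr_minutes = {
    "Jan": 70.64, "Feb": 116.26, "Mar": 36.50,
    "Apr": 76.00, "May": 66.26, "Jun": 65.27,
    "Jul": 133.18, "Aug": 63.96, "Sep": 122.66,
    "Oct": 882.50,
}

quarters = {
    "Q1": ["Jan", "Feb", "Mar"],
    "Q2": ["Apr", "May", "Jun"],
    "Q3": ["Jul", "Aug", "Sep"],
}

# Unweighted mean of the monthly values in each quarter.
quarterly_mean = {
    quarter: round(mean(mttr_minutes[month] for month in months), 2)
    for quarter, months in quarters.items()
}

# How many times larger October's MTTR is than January's.
october_vs_january = round(mttr_minutes["Oct"] / mttr_minutes["Jan"], 1)
```

<p>Under that assumption the quarterly means come out to about 74.47 minutes for Q1, 69.18 for Q2, and 106.60 for Q3, and October's MTTR is about 12.5 times January's.</p>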
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="outages-by-service">Outages by Service<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#outages-by-service" class="hash-link" aria-label="Direct link to Outages by Service" title="Direct link to Outages by Service" translate="no">​</a></h2>
<p>For most outages, there is no apparent correlation between the number of affected services and the duration of the outage. The exceptions are the outages on Feb 13th and Oct 20th.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="services-affected-per-outage">Services Affected per Outage<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#services-affected-per-outage" class="hash-link" aria-label="Direct link to Services Affected per Outage" title="Direct link to Services Affected per Outage" translate="no">​</a></h3>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/aws-services-affected-per-outage---2025-lollipop.webp" alt="AWS Outages by Service 2025">
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="services-which-had-the-highest-number-of-outages">Services Which had the Highest Number of Outages<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#services-which-had-the-highest-number-of-outages" class="hash-link" aria-label="Direct link to Services Which had the Highest Number of Outages" title="Direct link to Services Which had the Highest Number of Outages" translate="no">​</a></h3>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/aws-top-10-services-with-most-outages---2025-bar.webp" alt="AWS Services with Highest Number of Outages 2025">
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="regions-with-the-highest-number-of-outages">Regions with the Highest Number of Outages<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#regions-with-the-highest-number-of-outages" class="hash-link" aria-label="Direct link to Regions with the Highest Number of Outages" title="Direct link to Regions with the Highest Number of Outages" translate="no">​</a></h3>
<p>The AWS us-east-1 region recorded the highest number of outages in 2025. This has been variously attributed to its being the oldest and busiest region, and to many control plane services being hosted there. E.g. the <a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#id_credentials_access-keys_region" target="_blank">IAM service</a>, <a href="https://aws.amazon.com/blogs/networking-and-content-delivery/creating-disaster-recovery-mechanisms-using-amazon-route-53/" target="_blank">CloudFront</a>, and <a href="https://aws.amazon.com/blogs/networking-and-content-delivery/creating-disaster-recovery-mechanisms-using-amazon-route-53/" target="_blank">Route 53</a>'s control planes are in us-east-1.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/aws-outages-by-region---2025-pie.webp" alt="AWS Regions with Highest Number of Outages 2025">
<br>
<br>
<p>However, for the 20th October 2025 outage, the number of affected services in us-east-1 was high because a core service - DynamoDB - on which other AWS services depend, was affected in us-east-1 due to a race condition in its DNS management system, as Amazon explains in their <a href="https://aws.amazon.com/message/101925/" target="_blank">detailed summary</a>. DynamoDB runs in other AWS regions too - so this outage could have theoretically happened in other regions as well. The AWS team rolled out the fix for the race condition to the first region by October 24th, and by October 28th they had rolled it out across all <a href="https://www.youtube.com/watch?v=YZUNNzLDWb8" target="_blank">regions worldwide</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="is-reliability-improving-staying-the-same-or-getting-worse">Is Reliability Improving, Staying the Same, or Getting Worse?<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#is-reliability-improving-staying-the-same-or-getting-worse" class="hash-link" aria-label="Direct link to Is Reliability Improving, Staying the Same, or Getting Worse?" title="Direct link to Is Reliability Improving, Staying the Same, or Getting Worse?" translate="no">​</a></h2>
<p>A quick look at service-wise outages for some of the top-10 services affected shows variable trends. For AWS as a whole, however, the average monthly outage duration increased in Q3 compared to Q1 and Q2, but the number of services affected decreased (except for the Oct 20th outlier).</p>
<b style="text-align:center;display:block">EC2</b>
<img style="width:70%;border-radius:10px;border:1px;display:block;margin:0 auto" src="https://cdn.incidenthub.cloud/blog/amazon-elastic-compute-cloud-outages---2025-bar.webp" alt="Amazon Elastic Compute Cloud Outages 2025">
<br>
<b style="text-align:center;display:block">ECS</b>
<img style="width:70%;border-radius:10px;border:1px;display:block;margin:0 auto" src="https://cdn.incidenthub.cloud/blog/amazon-elastic-container-service-outages---2025-bar.webp" alt="Amazon Elastic Container Service Outages 2025">
<br>
<b style="text-align:center;display:block">EKS</b>
<img style="width:70%;border-radius:10px;border:1px;display:block;margin:0 auto" src="https://cdn.incidenthub.cloud/blog/amazon-elastic-kubernetes-service-outages---2025-bar.webp" alt="Amazon Elastic Kubernetes Service Outages 2025">
<br>
<b style="text-align:center;display:block">ELB</b>
<img style="width:70%;border-radius:10px;border:1px;display:block;margin:0 auto" src="https://cdn.incidenthub.cloud/blog/amazon-elastic-load-balancing-outages---2025-bar.webp" alt="Amazon Elastic Load Balancing Outages 2025">
<br>
<b style="text-align:center;display:block">SageMaker</b>
<img style="width:70%;border-radius:10px;border:1px;display:block;margin:0 auto" src="https://cdn.incidenthub.cloud/blog/amazon-sagemaker-outages---2025-bar.webp" alt="Amazon SageMaker Outages 2025">
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="outage-dependency-mapping">Outage Dependency Mapping<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#outage-dependency-mapping" class="hash-link" aria-label="Direct link to Outage Dependency Mapping" title="Direct link to Outage Dependency Mapping" translate="no">​</a></h2>
<p>The AWS outage of Oct 20th was one of their biggest outages in 2025. In distributed systems, failures in one part of the system can result in cascading failures in other parts of the system. The same principle applies to SaaS providers and their dependencies. A lot of key AWS services were affected, and as a result, so were many SaaS providers. Since many SaaS providers use AWS directly and also rely on other SaaS providers which in turn use AWS, the overall impact multiplied rapidly.</p>
<p>IncidentHub detected 400+ other SaaS outages in the same timespan as the AWS outage of the 20th - out of which 197 SaaS providers acknowledged the cause as AWS. The subsequent analysis takes only those 197 SaaS providers into account, although it's highly likely the blast radius was much larger.</p>
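<p>A rough sketch of this kind of selection is shown below. It assumes that "acknowledged the cause as AWS" can be approximated by a keyword match on the incident text and that "same timespan" means the incident started inside the AWS outage window. Both are assumptions made for illustration, as are the <code>Incident</code> fields and function names; the actual attribution method used for the report is not described here.</p>

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Hypothetical pattern. Word boundaries avoid matching words like "draws".
AWS_PATTERN = re.compile(r"\b(aws|amazon web services)\b", re.IGNORECASE)


@dataclass(frozen=True)
class Incident:
    provider: str
    started_at: datetime
    description: str


def acknowledges_aws(incident: Incident) -> bool:
    """True if the incident text mentions AWS (a naive keyword check)."""
    return AWS_PATTERN.search(incident.description) is not None


def providers_citing_aws(
    incidents: list[Incident],
    window_start: datetime,
    window_end: datetime,
) -> set[str]:
    """Providers with an incident that started in the window and mentions AWS."""
    return {
        incident.provider
        for incident in incidents
        if window_start <= incident.started_at <= window_end
        and acknowledges_aws(incident)
    }
```

<p>A keyword match like this would also pick up statements that deny any AWS impact, so results from such a filter would still need manual review.</p>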
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:4px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><b style="text-align:justify"></b><span style="text-align:justify"><p>While most SaaS providers either talked about how the AWS outage affected them, or did not mention AWS at all, Cloudflare explicitly mentioned that they were not affected by the AWS outage in any way. This is a good example of being upfront in user communication. Thousands of services depend on Cloudflare and this kind of declaration makes it easier to debug issues.</p><p>Source: <a href="https://www.cloudflarestatus.com/incidents/whxhl75k4hzd" target="_blank">Cloudflare status page</a>.</p></span></div>
<br>
<p>Note: These numbers are smaller than the likely blast radius because:</p>
<ol>
<li class="">There are services that IncidentHub does not monitor yet.</li>
<li class="">Not all affected services monitored by IncidentHub acknowledged the cause as AWS.</li>
</ol>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="cascading-outages-timeline">Cascading Outages Timeline<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#cascading-outages-timeline" class="hash-link" aria-label="Direct link to Cascading Outages Timeline" title="Direct link to Cascading Outages Timeline" translate="no">​</a></h3>
<p>The graph below shows, over time, the number of outages at SaaS providers that acknowledged AWS as the cause. After AWS resolved the issue, some services took time to recover. This is expected: providers need to validate their recovery measures, and many rely on direct or indirect dependencies that were themselves still recovering.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/aws-20th-oct-2025---dependency-outages-line.webp" alt="AWS Outage Cascading Outages Timeline 2025">
<br>
<br>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="dependency-outages-by-saas-type">Dependency Outages by SaaS Type<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#dependency-outages-by-saas-type" class="hash-link" aria-label="Direct link to Dependency Outages by SaaS Type" title="Direct link to Dependency Outages by SaaS Type" translate="no">​</a></h3>
<p>The top SaaS categories affected included Cybersecurity tools, Developer tooling, Communication and Collaboration tools, Education Technology Platforms, and CRM, Marketing, and Customer Support.</p>
<p>Notably, Observability and Incident Management tools also appear in the distribution - the very tools that teams rely on to detect outages.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/aws-20th-oct-2025---dependency-outages-by-type-pie.webp" alt="AWS Outage Dependency Mapping by Type 2025">
<br>
<br>
<p>Let's look at some of these categories in more detail.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="observability-and-incident-management-providers">Observability and Incident Management Providers<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#observability-and-incident-management-providers" class="hash-link" aria-label="Direct link to Observability and Incident Management Providers" title="Direct link to Observability and Incident Management Providers" translate="no">​</a></h4>
<p>Observability data ingestion was affected in <a href="https://status.datadoghq.com/incidents/vjgmxjdf0ps5" target="_blank">DataDog</a>, <a href="https://status.io/pages/incident/546d8cb6af8407b6730000cb/68f5ec661980c748ef84fb76" target="_blank">Dynatrace</a> and <a href="https://status.newrelic.com/incidents/pxp00j5cv6sv" target="_blank">New Relic</a>. As a result, monitoring systems at SaaS providers that use such data for alerting and incident management were affected.</p>
<p>Incident management tools were also affected - some directly, some <a href="https://status.incident.io/incidents/01K80FCADM21VZ92TMQ7J2E4CG" target="_blank">indirectly</a>. In some cases, efforts were made to <a href="https://status.incident.io/incidents/01K80B0Y8QAYPVC2KNZ0FPA2PQ" target="_blank">move</a> to another AWS region, but this took hours. On-call notifications were <a href="https://status.datadoghq.com/incidents/ww8rhx2j8mfz" target="_blank">delayed</a> or not sent at all.</p>
<table><thead><tr><th>SaaS</th><th>Impact</th><th>Duration</th></tr></thead><tbody><tr><td>StatusHub</td><td>SMS delivery issues</td><td>22 hours and 49 minutes</td></tr><tr><td>PagerDuty</td><td>Delayed Notifications in US Region</td><td>6 hours and 24 minutes</td></tr><tr><td>Opsgenie</td><td>Atlassian Cloud Services impacted</td><td>15 hours and 58 minutes</td></tr><tr><td>Incident.io</td><td>Escalation delays</td><td>7 hours and 0 minutes</td></tr><tr><td>Better Stack</td><td>Delayed email notifications due to AWS outage</td><td>1 hour and 58 minutes</td></tr><tr><td>BugSnag</td><td>AWS outage impacting Smartbear ID logins and email notifications</td><td>48 minutes</td></tr><tr><td>Grafana Cloud</td><td>Grafana K6: Some Test Runs May Not Start Due to AWS Outage</td><td>4 hours and 51 minutes</td></tr><tr><td>Honeycomb</td><td>Delays in SLO, Service Maps processing</td><td>25 hours and 24 minutes</td></tr><tr><td>DataDog Integrations</td><td>Several Web Integrations affected due to Vendors' outage in US1-east</td><td>33 hours and 51 minutes</td></tr><tr><td>Dynatrace</td><td>Accessibility and login issues with  Dynatrace UI</td><td>8 hours and 29 minutes</td></tr><tr><td>New Relic</td><td>Cloud and Synthetics Data Ingest</td><td>12 hours and 53 minutes</td></tr><tr><td>Axiom</td><td>System issues</td><td>3 hours and 36 minutes</td></tr><tr><td>Sumo Logic</td><td>Problem with Tracing Collection, Authentication, Billing and Account Management, CSE Processing Pipeline and CSE APIs</td><td>14 hours and 30 minutes</td></tr></tbody></table>
<p>Essentially, during the outage, visibility into your systems was impaired if you were dependent on SRE/Ops tools that use AWS in some way.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="developer-tools">Developer tools<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#developer-tools" class="hash-link" aria-label="Direct link to Developer tools" title="Direct link to Developer tools" translate="no">​</a></h4>
<p>Developer tooling took a significant hit, with several platforms reporting outages well in excess of 20 hours - disrupting CI/CD pipelines, code review, and feature flag management simultaneously.</p>
<p>Outages in artifact repositories and container registries affected downstream services like managed Kubernetes platforms that depend on them.</p>
<table><thead><tr><th>SaaS</th><th>Impact</th><th>Duration</th></tr></thead><tbody><tr><td>GitHub</td><td>Copilot</td><td>2 hours 30 minutes</td></tr><tr><td>GitLab</td><td>Package Registry</td><td>~1 hour</td></tr><tr><td>Quay Container Registry</td><td>Writes disabled</td><td>14 hours and 55 minutes</td></tr><tr><td>Docker Hub</td><td>Multiple services affected</td><td>~4 hours</td></tr><tr><td>GitBook</td><td>Public content loading</td><td>22 hours 15 minutes</td></tr><tr><td>Postman</td><td>Increased error rates</td><td>~15 hours</td></tr><tr><td>LaunchDarkly</td><td>Elevated Latencies and Delays</td><td>26 hours and 35 minutes</td></tr><tr><td>Bitbucket (and other Atlassian services)</td><td>Delays and missing notifications</td><td>23 hours and 42 minutes</td></tr><tr><td>CircleCI</td><td>Job and pipeline failures, UI and API errors</td><td>15 hours and 5 minutes</td></tr><tr><td>Codefresh</td><td>Build retries</td><td>9 hours and 37 minutes</td></tr><tr><td>SonarQube Cloud</td><td>Endpoint request failures</td><td>3 hours and 11 minutes</td></tr><tr><td>Cursor IDE</td><td>Service degradation</td><td>12 hours and 51 minutes</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="infrastructure-and-hosting-providers">Infrastructure and Hosting Providers<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#infrastructure-and-hosting-providers" class="hash-link" aria-label="Direct link to Infrastructure and Hosting Providers" title="Direct link to Infrastructure and Hosting Providers" translate="no">​</a></h4>
<p>Core infra services like some DNS providers were <a href="https://status.io/pages/incident/5f80d63ea1c48e04c1dfa100/68f5ed57245df0465a50afa9" target="_blank">affected</a> - leading to DNS propagation delays.
Other cloud providers who depend on SaaS that use AWS saw impact on some of <a href="https://status.digitalocean.com/incidents/sxlrj088l11b" target="_blank">their services</a>.</p>
<p>Hosting platforms Netlify and Render were affected for 13+ hours, which impacted websites running on them. DigitalOcean reported that its managed Kubernetes platform was affected for 19+ hours because of Docker Hub issues, which were in turn caused by AWS.</p>
<table><thead><tr><th>SaaS</th><th>Impact</th><th>Duration</th></tr></thead><tbody><tr><td>Hostinger</td><td>Payment Processing Service Disruption</td><td>11 hours and 30 minutes</td></tr><tr><td>WPEngine</td><td>Chat &amp; Phone Support</td><td>4 hours and 30 minutes</td></tr><tr><td>Railway</td><td>Deployments using Dockerhub are currently failing</td><td>2 hours and 34 minutes</td></tr><tr><td>EngineYard</td><td>Slowness, timeouts, or trouble accessing some parts of platform and services</td><td>11 hours and 32 minutes</td></tr><tr><td>Render</td><td>New database creation, backups, support tools</td><td>13 hours and 18 minutes</td></tr><tr><td>Netlify</td><td>UI actions, outgoing emails from Netlify, builds, functions</td><td>15 hours and 58 minutes</td></tr><tr><td>DigitalOcean</td><td>Multiple Services Disruption</td><td>19 hours and 20 minutes</td></tr><tr><td>Fly.io</td><td>Deployment failures</td><td>1 hour and 11 minutes</td></tr><tr><td>Shockbyte</td><td>[Shockbyte Panel] Email Provider Outage</td><td>2 hours and 43 minutes</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="it-operations-and-msp-tools">IT Operations and MSP Tools<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#it-operations-and-msp-tools" class="hash-link" aria-label="Direct link to IT Operations and MSP Tools" title="Direct link to IT Operations and MSP Tools" translate="no">​</a></h4>
<p>IT Ops and MSP tools were significantly affected, with several tools managing remote devices and endpoints remaining down for over 13 hours.</p>
<table><thead><tr><th>SaaS</th><th>Impact</th><th>Duration</th></tr></thead><tbody><tr><td>NinjaOne</td><td>Multiple third party providers continue to be impacted by cloud service outage - including SMS messaging</td><td>8 hours and 4 minutes</td></tr><tr><td>Commvault Cloud (Metallic)</td><td>Service Interruption</td><td>13 hours and 23 minutes</td></tr><tr><td>Jamf</td><td>Jamf: US-East-1 Disruption</td><td>11 hours and 55 minutes</td></tr><tr><td>Spiceworks</td><td>Cloud Help Desk outage</td><td>7 minutes</td></tr><tr><td>Kaseya</td><td>Datto RMM - Concord, Vidal - Service Disruption (error messages, agent disconnections)</td><td>13 hours and 1 minute</td></tr><tr><td>Auvik Networks</td><td>Partial service degradation</td><td>1 hour and 6 minutes</td></tr><tr><td>Halo</td><td>Email processing and scheduled actions</td><td>3 hours and 28 minutes</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="education-technology-platforms">Education Technology Platforms<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#education-technology-platforms" class="hash-link" aria-label="Direct link to Education Technology Platforms" title="Direct link to Education Technology Platforms" translate="no">​</a></h4>
<p>More than 13 education technology platforms reported outages, some lasting well over a day.</p>
<table><thead><tr><th>SaaS</th><th>Impact</th><th>Duration</th></tr></thead><tbody><tr><td>PowerSchool</td><td>Multiple PowerSchool Products - Users are unable to access application</td><td>27 hours and 33 minutes</td></tr><tr><td>Blackboard by Anthology</td><td>Learn SaaS - US-EAST-1 Region - Multiple Sites Inaccessible</td><td>17 hours and 33 minutes</td></tr><tr><td>Turnitin</td><td>Turnitin Service Incident - 20th October 2025</td><td>21 hours and 46 minutes</td></tr><tr><td>Imagine Learning</td><td>Multiple Products - Some Assessments and Activities Not Scoring Correctly</td><td>7 hours and 25 minutes</td></tr><tr><td>Renaissance</td><td>Renaissance programs</td><td>22 hours and 28 minutes</td></tr><tr><td>HMH</td><td>Intermittent Service Degradation</td><td>23 hours and 37 minutes</td></tr><tr><td>Remind</td><td>Issues with accessing or using Remind web &amp; mobile apps</td><td>6 hours and 12 minutes</td></tr><tr><td>Instructure</td><td>Some users may encounter errors when accessing Canvas</td><td>17 hours and 57 minutes</td></tr><tr><td>Clever</td><td>Users unable to login to Clever</td><td>35 hours and 52 minutes</td></tr><tr><td>Great Minds</td><td>Content/Assessment interactives, SSO failures, Platform latency</td><td>19 hours and 43 minutes</td></tr><tr><td>Savvas Learning Company</td><td>Savvas Realize Performance Issues</td><td>11 hours and 55 minutes</td></tr><tr><td>Pearson</td><td>Pearson Online Classroom: Degraded Performance</td><td>7 hours and 10 minutes</td></tr><tr><td>Ellucian Cloud</td><td>Services degraded</td><td>14 hours and 38 minutes</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="payments">Payments<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#payments" class="hash-link" aria-label="Direct link to Payments" title="Direct link to Payments" translate="no">​</a></h4>
<p>Outages in payments providers mean lost revenue, and there were plenty of them.</p>
<table><thead><tr><th>SaaS</th><th>Impact</th><th>Duration</th></tr></thead><tbody><tr><td>Kraken Digital Asset Exchange</td><td>US Dollar (USD) Deposits via Plaid Unavailable</td><td>3 hours and 44 minutes</td></tr><tr><td>Bluefin</td><td>Phone System Outage</td><td>8 hours and 15 minutes</td></tr><tr><td>Coinbase Commerce</td><td>Site Performance - Login, Trading, Transactions</td><td>17 hours and 25 minutes</td></tr><tr><td>Coinbase Prime</td><td>Site Performance - Login, Trading, Transactions</td><td>17 hours and 25 minutes</td></tr><tr><td>Paddle</td><td>Issue affecting Checkouts and Order Processing</td><td>4 hours and 53 minutes</td></tr><tr><td>Tebex</td><td>Payment Decline Errors</td><td>4 hours and 28 minutes</td></tr></tbody></table>
<p>Although there were services which acknowledged that they managed to <a href="https://status.io/pages/incident/5ea188b3144baf049b282591/68f65e2ba63ca248d9b88dfb" target="_blank">failover</a> to another region, the lag in recovery across many SaaS vendors suggests that region-level failover is not straightforward in practice.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="regional-outage-with-global-impact">Regional Outage with Global Impact<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#regional-outage-with-global-impact" class="hash-link" aria-label="Direct link to Regional Outage with Global Impact" title="Direct link to Regional Outage with Global Impact" translate="no">​</a></h2>
<p>Was this a regional AWS failure that also took down global services? Yes, in some cases, by extension. The AWS failure itself remained regional.</p>
<p>This happened due to second-, third-, and higher-order effects:</p>
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:8px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><b style="text-align:justify"><p>System Issues</p></b><span style="text-align:justify"><p>The scope of impact is increasing as more IAM credentials expire and are unable to refresh, leading to additional service disruptions.
Additionally authenticating to our EU console isn't working, as our SAML partner is experiencing issues.</p><p>From the <a href="https://status.axiom.co/incidents/01K809TNJ50FXN5EXPB3X30EDS" target="_blank">Axiom status page</a>.</p></span></div>
<br>
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:8px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><b style="text-align:justify"><p>Dashboard access and support request submission issues in multiple regions</p></b><span style="text-align:justify"><p>We are currently investigating an issue with dashboards loading and customer support in multiple regions.</p><p>From the <a href="https://status.qualtrics.com/incidents/rwnkdm9clkb4" target="_blank">Qualtrics status page</a>.</p></span></div>
<br>
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:8px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><b style="text-align:justify"><p>We are affected by AWS outage in us-east-1</p></b><span style="text-align:justify"><p>We are currently investigating intermittent failures when creating new services in us-east-1 region.
If you already have a running service, connections should continue to work as expected.</p><p><code>https://console.cloud.timescale.com/</code> is operational, although we’ve received some reports of certain assets not loading properly.</p><p>This issue appears to be related to an ongoing AWS service outage, which is also being reported on the AWS Service Health Dashboard.</p><p>At this time, we are not aware of any further impact.
We’ll provide an update as soon as more information becomes available.
2 Affected Services:</p><ul>
<li class="">Regions / US East (N. Virginia) / us-east-1</li>
<li class="">Global Services / Console &amp; APIs</li>
</ul><p>From the <a href="https://status.timescale.com/issues/68f5f407a877241fc9596746" target="_blank">TigerCloud status page</a>.</p></span></div>
<br>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="outage-chains">Outage Chains<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#outage-chains" class="hash-link" aria-label="Direct link to Outage Chains" title="Direct link to Outage Chains" translate="no">​</a></h3>
<p>It's not straightforward to map a complete dependency graph of all the services that were affected by this outage, but we did manage to uncover some interesting aspects.</p>
<p>Here are two chains we uncovered, each showing how a vendor's outage was triggered not by AWS directly, but by an outage at another vendor that depends on AWS - illustrated by incidents at <a href="https://status.statushub.io/incidents/13652" target="_blank">StatusHub</a> and <a href="https://status.railway.com/cmgz0phe101toc4aunvhguk9t" target="_blank">Railway</a>.</p>
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:8px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><b style="text-align:justify"><p>AWS -&gt; Twilio -&gt; StatusHub</p></b><span style="text-align:justify"><p><strong>Twilio</strong>
<em>We are currently investigating elevated latency and timeout errors for Twilio Rest API, impacting the Multiple Twilio services. Our engineering team is actively working on the issue, and we will provide another update in 60 minutes or as soon as more information becomes available.</em></p><p><strong>StatusHub</strong>
<em>Due to major outage affecting our SMS partner, SMS delivery is currently affected and messages may not be delivered.</em></p></span></div>
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:8px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><b style="text-align:justify"><p>AWS -&gt; Docker Hub -&gt; Railway</p></b><span style="text-align:justify"><p><strong>Docker Hub</strong>
<em>Docker is continuing to experience service disruption as a result of issues with an upstream service provider. We are actively working to remediate where possible.</em></p><p><strong>Railway</strong>
<em>Builds and deploys are taking longer than usual as Dockerhub recovers after their recent outage.</em></p></span></div>
<br>
<p>Upstream dependencies cannot be avoided. Even with a multi-cloud approach - which itself is not straightforward and may not be feasible architecturally or financially for everyone - key dependencies can remain tied to <a href="https://blog.pragmaticengineer.com/downdetector-and-the-real-cost-of-no-upstream-dependencies/" target="_blank">single providers</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="surviving-aws-outages">Surviving AWS Outages<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#surviving-aws-outages" class="hash-link" aria-label="Direct link to Surviving AWS Outages" title="Direct link to Surviving AWS Outages" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="reconsider-if-us-east-1-is-necessary-for-your-workloads">Reconsider if us-east-1 is necessary for your workloads<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#reconsider-if-us-east-1-is-necessary-for-your-workloads" class="hash-link" aria-label="Direct link to Reconsider if us-east-1 is necessary for your workloads" title="Direct link to Reconsider if us-east-1 is necessary for your workloads" translate="no">​</a></h3>
<p>The us-east-1 region is the oldest and busiest in AWS, and it saw the highest number of outages in 2025. Even though the biggest outage of 2025 could have occurred in any region, us-east-1 still leads in the total number of outages. You have no control over the control plane and global services that run there, so consider avoiding us-east-1 if your workloads can run in other regions.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="audit-your-own-dependency-chain">Audit your own dependency chain<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#audit-your-own-dependency-chain" class="hash-link" aria-label="Direct link to Audit your own dependency chain" title="Direct link to Audit your own dependency chain" translate="no">​</a></h3>
<p>First-level dependencies on AWS are easy to find. Second-level and deeper ones are harder to trace. Even so, listing all your dependencies is a great first step.</p>
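<p>One way to start such an audit is to write the dependencies down as a graph and walk it. The sketch below is only an illustration: the <code>DEPENDS_ON</code> map is hypothetical, <code>my-app</code> is a made-up service, and the vendor edges simply encode the two chains described in this report rather than any vendor's full architecture.</p>

```python
from collections import deque

# Hypothetical dependency map: each service lists the vendors it calls directly.
# It encodes the two chains from this report (AWS -> Twilio -> StatusHub and
# AWS -> Docker Hub -> Railway); "my-app" is a made-up service for illustration.
DEPENDS_ON = {
    "my-app": ["Railway", "StatusHub"],
    "Railway": ["Docker Hub"],
    "StatusHub": ["Twilio"],
    "Docker Hub": ["AWS"],
    "Twilio": ["AWS"],
}


def dependency_paths(service, target, graph):
    """Return every dependency path from `service` to `target`, breadth-first."""
    paths = []
    queue = deque([[service]])
    while queue:
        path = queue.popleft()
        for vendor in graph.get(path[-1], []):
            if vendor in path:  # skip cycles
                continue
            new_path = path + [vendor]
            if vendor == target:
                paths.append(new_path)
            else:
                queue.append(new_path)
    return paths


# Print each chain through which "my-app" is exposed to AWS.
for path in dependency_paths("my-app", "AWS", DEPENDS_ON):
    print(" -> ".join(path))
```

<p>In this made-up example, <code>my-app</code> never calls AWS directly, yet both of its vendors lead back to it. A real audit would need the map filled in from your own vendor list and each vendor's published sub-processors or status page history.</p>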
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="monitor-third-party-dependencies-not-just-aws-directly">Monitor third-party dependencies, not just AWS directly<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#monitor-third-party-dependencies-not-just-aws-directly" class="hash-link" aria-label="Direct link to Monitor third-party dependencies, not just AWS directly" title="Direct link to Monitor third-party dependencies, not just AWS directly" translate="no">​</a></h3>
<p>Tools like status page aggregators can monitor third-party dependencies seamlessly, without the need to manually check each and every status page or account for any differences in their structure or notifications. Monitoring the specific components you use (e.g. EC2 in AWS) is necessary to keep your alerts relevant.</p>
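<p>Component-level filtering can be sketched in a few lines. This is an assumption-laden illustration, not IncidentHub's implementation: the <code>Incident</code> shape, the <code>WATCHLIST</code> contents, and the sample incidents are all hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    vendor: str
    component: str
    title: str


# Hypothetical watchlist: only the components your workloads actually use.
WATCHLIST = {
    "AWS": {"EC2", "DynamoDB"},
    "Docker Hub": {"Registry"},
}


def relevant(incidents, watchlist):
    """Keep only incidents whose vendor and component are on the watchlist."""
    return [
        incident
        for incident in incidents
        if incident.component in watchlist.get(incident.vendor, set())
    ]


# Made-up sample incidents for illustration, not real status page data.
sample = [
    Incident("AWS", "EC2", "Increased error rates"),
    Incident("AWS", "SageMaker", "Elevated latency"),
    Incident("Docker Hub", "Registry", "Service disruption"),
    Incident("Twilio", "SMS", "Delivery delays"),
]

for incident in relevant(sample, WATCHLIST):
    print(f"{incident.vendor} / {incident.component}: {incident.title}")
```

<p>Only the EC2 and Registry incidents pass the filter here, so an alert channel fed by it would stay quiet about components you do not use.</p>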
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="accept-that-multi-cloud-is-not-a-silver-bullet">Accept that multi-cloud is not a silver bullet<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#accept-that-multi-cloud-is-not-a-silver-bullet" class="hash-link" aria-label="Direct link to Accept that multi-cloud is not a silver bullet" title="Direct link to Accept that multi-cloud is not a silver bullet" translate="no">​</a></h3>
<p>Multi-cloud sounds like an attractive proposition until you realize that:</p>
<ul>
<li class="">It may not be feasible for your organization either architecturally or financially.</li>
<li class="">Your other SaaS dependencies may not be multi-cloud and thus won't be as resilient.</li>
</ul>
<p>The first step is knowing when your dependencies are down.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>2025 showed us that an infrastructure provider outage can cascade across hundreds of dependent services, and AWS was no exception.
38 outages were recorded across 200+ products and 39 regions. Most AWS outages in 2025 remained confined to a single region.
The October 20th outage led to many downstream SaaS providers experiencing outages too.</p>
<p>Layers of downstream dependencies amplify a single provider's outage. Monitoring your SaaS dependencies is more crucial than ever to stay ahead of the impact such outages can have on your business.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="faq">FAQ<a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability#faq" class="hash-link" aria-label="Direct link to FAQ" title="Direct link to FAQ" translate="no">​</a></h2>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How many AWS outages were there in 2025?</summary><div><div class="collapsibleContent_i85q"><p></p><p>In 2025, IncidentHub detected 38 outages across AWS services and regions.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>What caused the AWS outage on October 20th 2025?</summary><div><div class="collapsibleContent_i85q"><p></p><p>The AWS outage on October 20th 2025 was attributed by AWS to a failure in DynamoDB's automatic DNS management system.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>What is the highest MTTR for AWS outages in 2025?</summary><div><div class="collapsibleContent_i85q"><p></p><p>The highest MTTR for AWS outages in 2025 was 882.50 minutes.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Which AWS services were affected by the AWS outage on October 20th 2025?</summary><div><div class="collapsibleContent_i85q"><p></p><p>The AWS outage on October 20th 2025 affected more than 140 AWS services.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Is us-east-1 really less reliable than other AWS regions?</summary><div><div class="collapsibleContent_i85q"><p></p><p>us-east-1 recorded the most outages in 2025. This may be influenced by its age and service density, even though the biggest one in 2025 could have occurred in any region. You have no control over the control plane and global services that run there, so consider avoiding us-east-1 if your workloads can run in other regions.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How long did the AWS October 2025 outage last?</summary><div><div class="collapsibleContent_i85q"><p></p><p>The AWS October 2025 outage lasted around 15 hours.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Does multi-cloud actually protect against AWS outages?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Not necessarily. Your other SaaS dependencies may not be multi-cloud and thus won't be as resilient.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How can I monitor AWS outages and their impact on my SaaS tools?</summary><div><div class="collapsibleContent_i85q"><p></p><p>A status page aggregator like IncidentHub can help you monitor AWS outages and their impact on your SaaS tools seamlessly, without the need to manually check each and every status page. You only need to monitor the specific components you use (in AWS and other SaaS), and IncidentHub can take care of that too.</p><p></p></div></div></details>
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:2px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><span style="text-align:center"><p>Sign up for an IncidentHub account and stay on top of all your SaaS dependencies</p></span></div>
<br>
<p>Photo by <a href="https://unsplash.com/@scottrodgerson?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Scott Rodgerson</a> on <a href="https://unsplash.com/photos/a-bunch-of-blue-wires-connected-to-each-other-PSpf_XgOM5w?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></p>
<p><em>IncidentHub is not affiliated with any of the services and vendors mentioned in this article. All logos and company names are trademarks or registered trademarks of their respective holders. Amazon Web Services and AWS are trademarks of Amazon.com, Inc. This report is independent and not affiliated with or endorsed by Amazon</em>.</p>
<p>This article was first published on the <a href="https://blog.incidenthub.cloud/definitive-aws-outage-report-2025-reliability" target="_blank" rel="noopener noreferrer" class="">IncidentHub blog</a>.</p>]]></content:encoded>
            <category>Reliability Index</category>
            <category>Uptime</category>
            <category>Outages</category>
            <category>Cloud</category>
        </item>
        <item>
            <title><![CDATA[How to Monitor SaaS Status in 2026 : A Complete Guide]]></title>
            <link>https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide</link>
            <guid>https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide</guid>
            <pubDate>Wed, 14 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A practical guide to monitoring SaaS and Cloud vendor status in 2026 with a focus on status pages, status page aggregators, alert filtering, and implementation workflows.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p><em>This is an updated and expanded version of the <a class="" href="https://blog.incidenthub.cloud/How-To-Monitor-Public-Status-Pages-of-Cloud-Providers-a-Step-by-Step-Approach">older guide</a>.</em></p>
<p>According to the <a rel="noopener noreferrer nofollow" href="https://www.bettercloud.com/resources/state-of-saas/" target="_blank">2025 State of SaaS report</a>, organizations use an average of 106 SaaS apps.</p>
<p>Staying on top of your SaaS vendors' status is as important as monitoring your own services. The Cloudflare, AWS, Azure, and Google Cloud outages in 2025 were strong reminders of this fact.
As a result of these <a class="" href="https://blog.incidenthub.cloud/major-cloud-outages-2025">outages</a>, many SaaS services that depend on them experienced outages as well, leading to a cascading effect which took down hundreds of vendors and affected thousands of users. Cloud and SaaS outages are not isolated any longer but usually happen in groups due to the dependency chain.</p>
<p>This article aims to be a comprehensive guide on how to monitor the uptime status of your SaaS and Cloud vendors.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/monitoring-saas-status-2026.webp" alt="Monitoring SaaS Status in 2026 - A Complete Guide">
<hr>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#monitoring-saas-and-cloud-vendors-status" class="">Monitoring SaaS and Cloud Vendors' Status</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#how-to-monitor-multiple-saas-status-pages---quick-overview" class="">How to Monitor Multiple SaaS Status Pages - Quick Overview</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#why-status-pages-remain-the-gold-standard" class="">Why Status Pages Remain the Gold Standard</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#manual-monitoring-of-public-status-pages" class="">Manual Monitoring of Public Status Pages</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#locate-the-public-status-pages" class="">Locate the Public Status Pages</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#understand-the-status-page-structure" class="">Understand the Status Page Structure</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#configure-notifications-if-available" class="">Configure Notifications if Available</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#notification-challenges" class="">Notification Challenges</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#subscribing-to-status-page-rss-feeds" class="">Subscribing to Status Page RSS Feeds</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#running-your-own-status-page-monitor" class="">Running Your Own Status Page Monitor</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#using-a-status-page-aggregator" class="">Using a Status Page Aggregator</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#key-benefits-of-managed-status-page-aggregators" class="">Key Benefits of Managed Status Page Aggregators</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#status-page-aggregator-best-practices" class="">Status Page Aggregator Best Practices</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#set-up-component-filtering" class="">Set up Component Filtering</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#fine-tune-alerts-by-type" class="">Fine-tune Alerts by Type</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#fine-tune-alerts-by-lifecycle" class="">Fine-tune Alerts by Lifecycle</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#integrate-with-your-teams-workflow" class="">Integrate with Your Team's Workflow</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#include-third-party-status-in-your-incident-response-plan" class="">Include Third-party Status in Your Incident Response Plan</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#periodically-review-your-organizations-service-usage" class="">Periodically Review Your Organization's Service Usage</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#make-the-aggregated-status-page-easily-accessible" class="">Make the Aggregated Status Page Easily Accessible</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#enable-upcoming-maintenance-notifications" class="">Enable Upcoming Maintenance Notifications</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#advanced-techniques-with-a-status-page-aggregator" class="">Advanced Techniques with a Status Page Aggregator</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#automate-ticket-creation" class="">Automate Ticket Creation</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#setup-multiple-teams" class="">Setup Multiple Teams</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#custom-integration-using-apis-and-webhooks" class="">Custom Integration Using APIs and Webhooks</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#historical-trend-analysis" class="">Historical Trend Analysis</a></li>
</ul>
</li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#choosing-your-monitoring-approach" class="">Choosing Your Monitoring Approach</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#decision-table" class="">Decision Table</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#a-status-page-aggregator-implementation-workflow" class="">A Status Page Aggregator Implementation Workflow</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#implementation-workflow-overview" class="">Implementation Workflow Overview</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#discovery" class="">Discovery</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#team-setup" class="">Team Setup</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#monitoring-configuration" class="">Monitoring Configuration</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#alerting-configuration" class="">Alerting Configuration</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#a-single-status-page-for-many-status-pages" class="">A Single Status Page for Many Status Pages</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#integration-into-incident-response-plan" class="">Integration Into Incident Response Plan</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#common-status-page-aggregator-implementation-pitfalls" class="">Common Status Page Aggregator Implementation Pitfalls</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#starting-with-too-broad-settings" class="">Starting With Too Broad Settings</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#accessibility-and-visibility" class="">Accessibility and Visibility</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#stale-list-of-monitored-services" class="">Stale List of Monitored Services</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#non-status-page-methods" class="">Non-Status Page Methods</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#using-third-party-telemetry-data-sites" class="">Using Third Party Telemetry Data Sites</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#using-crowdsourced-information-sites" class="">Using Crowdsourced Information Sites</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#using-social-media" class="">Using Social Media</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#conclusion" class="">Conclusion</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#faq" class="">FAQ</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="monitoring-saas-and-cloud-vendors-status">Monitoring SaaS and Cloud Vendors' Status<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#monitoring-saas-and-cloud-vendors-status" class="hash-link" aria-label="Direct link to Monitoring SaaS and Cloud Vendors' Status" title="Direct link to Monitoring SaaS and Cloud Vendors' Status" translate="no">​</a></h2>
<p>There are several ways to monitor SaaS and Cloud vendors' status. The most reliable one is to monitor their public status pages. Status pages remain an important source of truth as they are directly managed by vendors. There are other, supplementary sources of information apart from status pages that you can refer to, but they are neither comprehensive nor always reliable.</p>
<p>We will look at each of them in turn.</p>
<p>If you want a quick way to decide which approach fits your situation, jump to the <a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#decision-table" class="">decision table later in this guide</a>.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="how-to-monitor-multiple-saas-status-pages---quick-overview">How to Monitor Multiple SaaS Status Pages - Quick Overview<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#how-to-monitor-multiple-saas-status-pages---quick-overview" class="hash-link" aria-label="Direct link to How to Monitor Multiple SaaS Status Pages - Quick Overview" title="Direct link to How to Monitor Multiple SaaS Status Pages - Quick Overview" translate="no">​</a></h3>
<p>To monitor multiple SaaS and cloud service status pages in one place, you have these options:</p>
<ol>
<li class="">Manually check each vendor's public status page (not scalable beyond a few vendors).</li>
<li class="">Subscribe to individual email/RSS notifications (no component filtering, fragmented alerts).</li>
<li class="">Build and maintain your own status page monitor (high effort, high maintenance).</li>
<li class="">Use a status page aggregator that combines multiple status pages into a single view (recommended).</li>
</ol>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-status-pages-remain-the-gold-standard">Why Status Pages Remain the Gold Standard<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#why-status-pages-remain-the-gold-standard" class="hash-link" aria-label="Direct link to Why Status Pages Remain the Gold Standard" title="Direct link to Why Status Pages Remain the Gold Standard" translate="no">​</a></h3>
<p>Status pages are directly managed by vendors, and are hence authoritative sources of information. They also have structured, parseable data that can be consumed by automation tools. They remain the contractually significant source for tracking SLAs.</p>
<p>In some cases, vendors may take longer to update their status pages to announce an incident that they are still investigating. This reflects their internal incident response process. While crowdsourced platforms and telemetry data sites may provide early warnings, such sources lack context (which components and regions are affected), accuracy, and official confirmation.</p>
<p>Status pages remain essential because they provide structured incident data, regional and component-level details, official timelines for SLAs, and upcoming and ongoing maintenance schedules.</p>
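<p>To make "structured, parseable data" concrete, here is a minimal Python sketch that reads a status summary in JSON form. The payload is modeled on the kind of document that hosted status pages such as Atlassian Statuspage commonly expose; the field names, the sample values, and the <code>SEVERITY_BY_INDICATOR</code> mapping are assumptions for illustration, so verify them against your vendor's page before relying on them.</p>

```python
import json

# Sample payload (placeholder data). The field names are an assumption
# modeled on Statuspage-style summaries; check your vendor's actual output.
SAMPLE_PAYLOAD = """
{
  "page": {"name": "Example Vendor", "url": "https://status.example.com"},
  "status": {"indicator": "minor", "description": "Partially Degraded Service"}
}
"""

# Hypothetical mapping from the vendor's indicator to our own severity labels.
SEVERITY_BY_INDICATOR = {
    "none": "ok",
    "minor": "degraded",
    "major": "outage",
    "critical": "outage",
}


def summarize_status(raw_json: str) -> dict:
    """Extract the vendor name, a normalized severity, and the description."""
    data = json.loads(raw_json)
    status = data.get("status", {})
    indicator = status.get("indicator", "")
    return {
        "vendor": data.get("page", {}).get("name", "unknown"),
        # Unknown indicators are surfaced as "unknown" instead of guessed.
        "severity": SEVERITY_BY_INDICATOR.get(indicator, "unknown"),
        "description": status.get("description", ""),
    }


print(summarize_status(SAMPLE_PAYLOAD))
```

<p>Mapping unknown indicators to "unknown" rather than to "ok" is a deliberate choice here: a monitor that silently treats unfamiliar values as healthy can hide an outage.</p>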
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="manual-monitoring-of-public-status-pages">Manual Monitoring of Public Status Pages<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#manual-monitoring-of-public-status-pages" class="hash-link" aria-label="Direct link to Manual Monitoring of Public Status Pages" title="Direct link to Manual Monitoring of Public Status Pages" translate="no">​</a></h2>
<p>Manual monitoring works only at a very small scale and is included here to establish a baseline for comparison.</p>
<p>The first step is to identify the SaaS and Cloud vendors that you use. Once you have listed them, drill down into the specific services and regions that you use. Note that not all methods we describe in this article support monitoring specific services and regions.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="locate-the-public-status-pages">Locate the Public Status Pages<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#locate-the-public-status-pages" class="hash-link" aria-label="Direct link to Locate the Public Status Pages" title="Direct link to Locate the Public Status Pages" translate="no">​</a></h3>
<p>Most third-party services have a publicly available status page. You can usually find the link on their company website or on their support portal. The status page is either directly managed by your SaaS vendor or outsourced to another service like Atlassian Statuspage, Instatus, Hund, StatusCast, StatusHub, etc. Many monitoring and incident management services like Incident.io and BetterStack also have public status pages as part of their observability and incident management products.</p>
<p>Large cloud platforms like Google Cloud, Amazon Web Services, Microsoft Azure, Hetzner etc. have their own status page software.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="understand-the-status-page-structure">Understand the Status Page Structure<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#understand-the-status-page-structure" class="hash-link" aria-label="Direct link to Understand the Status Page Structure" title="Direct link to Understand the Status Page Structure" translate="no">​</a></h3>
<p>There is no official standard for status page formats but most of them use a similar layout to present the most important information - the current status - first. Visual appeal, ease of use, reliability, timely and useful information, and accessibility are <a class="" href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page">important considerations for public status pages</a>.</p>
<p>Some status pages segregate ongoing incidents into maintenance and outages. <a class="" href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance">Upcoming maintenances</a> are listed so that users can prepare for them.</p>
<img style="max-width:70%;display:block;margin:0 auto;box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/zscaler-status-page-landing.webp" alt="Zscaler status page">
<p style="margin-top:10px;text-align:center;font-size:12px;color:#666">Screenshot from the <a rel="noopener noreferrer nofollow" href="https://trust.zscaler.com/" target="_blank">Zscaler status page</a>.</p>
<p>Common terms used to describe incident states are:</p>
<ul>
<li class="">Investigating</li>
<li class="">Identified</li>
<li class="">Monitoring</li>
<li class="">Resolved</li>
</ul>
<p>These terms can differ from page to page.</p>
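<p>Because the wording differs between vendors, any tooling that consumes several status pages needs to map each label onto a common set of states. The sketch below shows one way to do that in Python. The four canonical states come from the list above; the synonyms in <code>STATE_SYNONYMS</code> are hypothetical examples, not labels taken from any particular vendor.</p>

```python
from typing import Optional

# Canonical lifecycle states, taken from the list above.
CANONICAL_STATES = ("investigating", "identified", "monitoring", "resolved")

# Hypothetical vendor-specific synonyms; extend this as you meet new wording.
STATE_SYNONYMS = {
    "in progress": "identified",
    "mitigated": "monitoring",
    "fixed": "resolved",
    "closed": "resolved",
    "completed": "resolved",
}


def normalize_state(raw: str) -> Optional[str]:
    """Map a vendor's incident state label to a canonical state, or None."""
    label = raw.strip().lower()
    if label in CANONICAL_STATES:
        return label
    # Returns None for labels we have not seen, so callers can log them.
    return STATE_SYNONYMS.get(label)


print(normalize_state("  Investigating "))
print(normalize_state("Fixed"))
```

<p>Returning <code>None</code> for unrecognized labels lets you log them and extend the mapping, instead of guessing a state.</p>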
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="configure-notifications-if-available">Configure Notifications if Available<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#configure-notifications-if-available" class="hash-link" aria-label="Direct link to Configure Notifications if Available" title="Direct link to Configure Notifications if Available" translate="no">​</a></h3>
<p>Periodically visiting status pages is not a great practice and becomes impractical once you monitor more than a few. Instead, you can sign up to receive notifications when an incident is created, updated, or resolved. Depending on your provider, status pages offer different modes of notification.</p>
<p>Some status pages have only one or two options, or none at all. For example, the Salesforce status page offers only email notifications:</p>
<img style="max-width:70%;display:block;margin:0 auto;box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/salesforce-status-email-notifications.webp" alt="Salesforce status page email notifications">
<p style="margin-top:10px;text-align:center;font-size:12px;color:#666">Screenshot from the <a rel="noopener noreferrer nofollow" href="https://status.salesforce.com/" target="_blank">Salesforce status page</a>.</p>
<p>If the status page is hosted by a third-party status page service rather than by your cloud provider, the provider can choose to enable or disable some of the available notification options. For example, both OneWelcome and Ongoing use Atlassian Statuspage but the notification options (as of this writing) are different.</p>
<img style="max-width:70%;display:block;margin:0 auto;box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/onewelcome-status-page.webp" alt="OneWelcome status page notifications">
<p style="margin-top:10px;text-align:center;font-size:12px;color:#666">Screenshot from the <a rel="noopener noreferrer nofollow" href="https://status.onewelcome.com/" target="_blank">OneWelcome status page</a>.</p>
<img style="max-width:70%;display:block;margin:0 auto;box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/ongoing-status-page.webp" alt="Ongoing status page notifications">
<p style="margin-top:10px;text-align:center;font-size:12px;color:#666">Screenshot from the <a rel="noopener noreferrer nofollow" href="https://status.ongoingwarehouse.com/uptime#" target="_blank">Ongoing status page</a>.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="notification-challenges">Notification Challenges<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#notification-challenges" class="hash-link" aria-label="Direct link to Notification Challenges" title="Direct link to Notification Challenges" translate="no">​</a></h4>
<p>Your notifications should be delivered in a way that ensures the right team receives the alerts. To be able to make it part of your team's workflow, the status page should support the notification channels that your team uses. Your vendor status pages won't have homogeneous notification options, which becomes an obvious hurdle here. They might not offer the option you need. See the section on <a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#using-a-status-page-aggregator" class="">Using a Status Page Aggregator</a> on how to mitigate this.</p>
<p>Also, there is no way to test these notifications until the next outage happens.</p>
<p>If you are manually monitoring status pages, you can check if the status page supports filtering by components and regions. If it does, use it so that your team is not flooded with unnecessary notifications.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="subscribing-to-status-page-rss-feeds">Subscribing to Status Page RSS Feeds<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#subscribing-to-status-page-rss-feeds" class="hash-link" aria-label="Direct link to Subscribing to Status Page RSS Feeds" title="Direct link to Subscribing to Status Page RSS Feeds" translate="no">​</a></h3>
<p>Some vendor status pages have RSS feeds that you can subscribe to. You can pipe the RSS feed into your Slack, Discord, or MS Teams channel. While this gives you a good way of getting outage information directly from the vendor, it has several drawbacks:</p>
<ul>
<li class="">Not all status pages have RSS feeds.</li>
<li class="">This approach lacks any filtering for components and regions. You will end up receiving every single outage and maintenance notification.</li>
<li class="">There is no way to search through historical data easily or look at ongoing incidents and maintenance.</li>
<li class="">There is no single view of all your services in a single place. The alerts will be in your Slack or Discord or MS Teams channel.</li>
<li class="">Some RSS feeds won't notify you when an incident is resolved.</li>
<li class="">This approach can break easily when your vendor changes status page providers as the RSS feed URL might change or get removed completely. Your team would stop receiving notifications and would not know about it until it's too late.</li>
</ul>
<p>In short, subscribing to RSS feeds does not hold up as a solution for staying on top of outages both because of the sheer variety of status pages that are out there and also because of the limitations of the RSS feeds themselves.</p>
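<p>To illustrate the filtering limitation, the following Python sketch parses a feed with the standard library and keeps only the items whose titles mention a keyword. The feed is built in code as a stand-in for a real vendor feed, and the titles and the keyword are placeholders. Keyword matching like this is a workaround, not real component filtering: it depends entirely on how each vendor words its incident titles.</p>

```python
import xml.etree.ElementTree as ET


def build_sample_feed() -> str:
    """Build a tiny stand-in for a vendor status feed (placeholder data)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for title in (
        "Investigating - Elevated API error rates",
        "Scheduled - Dashboard maintenance",
    ):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
    return ET.tostring(rss, encoding="unicode")


def matching_titles(feed_xml: str, keywords: list[str]) -> list[str]:
    """Return item titles that mention any keyword (case-insensitive)."""
    root = ET.fromstring(feed_xml)
    wanted = [k.lower() for k in keywords]
    titles = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        if any(k in title.lower() for k in wanted):
            titles.append(title)
    return titles


print(matching_titles(build_sample_feed(), ["api"]))
```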
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="running-your-own-status-page-monitor">Running Your Own Status Page Monitor<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#running-your-own-status-page-monitor" class="hash-link" aria-label="Direct link to Running Your Own Status Page Monitor" title="Direct link to Running Your Own Status Page Monitor" translate="no">​</a></h2>
<p>There are some open source tools that attempt to monitor third-party status pages. However, they are not full-fledged solutions.</p>
<ul>
<li class=""><a rel="noopener noreferrer nofollow" href="https://github.com/metoro-io/statusphere" target="_blank">statusphere</a> : This is not being actively developed. As of this writing the last commit was in 2024.</li>
<li class=""><a rel="noopener noreferrer nofollow" href="https://github.com/DrDroidLab/status-page-aggregator" target="_blank">status-page-aggregator</a> : According to the README, it is "A production-ready status monitoring dashboard that SRE teams can fork and customize for their specific vendor dependencies." However, it lacks out-of-the-box notification support except for email. There is also no way to choose specific vendors to monitor.</li>
<li class=""><a rel="noopener noreferrer nofollow" href="https://github.com/yash492/statusy" target="_blank">statusy</a> : This repository is also archived and not being developed any longer.</li>
</ul>
<p>Such tools may not support all the services or features you need. There is also no guarantee of them being bug-free, or of getting support when you need it. Hosting your own monitor from an open source project takes almost as much effort as developing and maintaining your own in-house tool. That comes with its own challenges of engineering cost, maintenance, new service support, reliability, and uptime. In addition, there are technical challenges in maintaining such a tool:</p>
<ul>
<li class="">Some status pages without any RSS feed or APIs need to be scraped. HTML scraping can easily break when the status page layout changes.</li>
<li class="">Many status pages have CAPTCHA/bot protection to prevent abusive behavior. If your status page monitor tool does not have the correct logic to prevent such behavior, it can easily get blocked from further data collection.</li>
<li class="">All status page providers enforce rate limiting for API requests. If your tool does not respect these limits, it will be blocked from further API calls.</li>
<li class="">Vendors moving to new status page providers can break your status page parsing as the API endpoints will change.</li>
</ul>
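<p>As an example of the rate limiting challenge above, a self-hosted monitor needs some logic to slow down when a provider pushes back. The Python sketch below decides how long to wait before the next poll, based on the HTTP status code and the standard <code>Retry-After</code> header. The function name, the default intervals, and the choice of exponential backoff are our assumptions for illustration, not the behavior of any specific provider.</p>

```python
from typing import Mapping


def next_delay_seconds(
    status_code: int,
    headers: Mapping[str, str],
    attempt: int,
    base_delay: float = 60.0,
    max_delay: float = 3600.0,
) -> float:
    """Decide how long to wait before the next poll of a status endpoint.

    attempt counts consecutive failed polls, starting at 0.
    """
    if status_code == 429:
        # Prefer the server's own hint when it is a plain number of seconds.
        # (Retry-After may also be an HTTP date, which this sketch ignores.
        # The lookup is case-sensitive, so normalize header names first.)
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            return min(float(retry_after), max_delay)
    if status_code == 429 or status_code >= 500:
        # Exponential backoff: double the wait on each consecutive failure.
        return min(base_delay * (2 ** attempt), max_delay)
    # Healthy response: fall back to the regular polling interval.
    return base_delay


print(next_delay_seconds(429, {"Retry-After": "120"}, attempt=0))
print(next_delay_seconds(503, {}, attempt=3))
```

<p>Capping the delay with <code>max_delay</code> keeps a long vendor-side outage from pushing the next poll out indefinitely.</p>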
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="using-a-status-page-aggregator">Using a Status Page Aggregator<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#using-a-status-page-aggregator" class="hash-link" aria-label="Direct link to Using a Status Page Aggregator" title="Direct link to Using a Status Page Aggregator" translate="no">​</a></h2>
<p>A status page aggregator periodically tracks multiple status pages and centralizes third-party dependency status in one page. It presents a single status page for multiple other status pages by combining and normalizing data from them. The data is fetched using different methods including RSS feeds, APIs, scraping, and webhooks.</p>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/incidenthub-public-status-page.webp" alt="IncidentHub public status page">
<p style="margin-top:10px;text-align:center;font-size:12px;color:#666">Screenshot from an <a rel="noopener noreferrer nofollow" href="https://incidenthub.cloud/" target="_blank">IncidentHub public status page</a>.</p>
<p>IncidentHub is a status page aggregator built specifically for monitoring third-party SaaS and Cloud status pages.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="key-benefits-of-managed-status-page-aggregators">Key Benefits of Managed Status Page Aggregators<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#key-benefits-of-managed-status-page-aggregators" class="hash-link" aria-label="Direct link to Key Benefits of Managed Status Page Aggregators" title="Direct link to Key Benefits of Managed Status Page Aggregators" translate="no">​</a></h3>
<p>A managed status page aggregator:</p>
<ul>
<li class="">Normalizes external data sources (status pages, APIs, RSS feeds, webhooks) that differ in format and terminology.</li>
<li class="">Gives you one status page for multiple other status pages.</li>
<li class="">Adapts to changing status page formats, URLs, and providers.</li>
<li class="">Supports advanced alert filtering and notification options not available on the status page.</li>
<li class="">Lets you analyze historical data and availability trends.</li>
<li class="">Supports multiple users and teams which is useful for large organizations.</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="status-page-aggregator-best-practices">Status Page Aggregator Best Practices<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#status-page-aggregator-best-practices" class="hash-link" aria-label="Direct link to Status Page Aggregator Best Practices" title="Direct link to Status Page Aggregator Best Practices" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="set-up-component-filtering">Set up Component Filtering<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#set-up-component-filtering" class="hash-link" aria-label="Direct link to Set up Component Filtering" title="Direct link to Set up Component Filtering" translate="no">​</a></h4>
<p>An outage alert is relevant only if it directly affects your business in some way. Instead of receiving alerts for each and every incident from a vendor, you can select the specific components that your organization actually uses. This prevents alert fatigue and keeps your notifications relevant.</p>
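<p>Conceptually, component filtering is a set intersection between the components a vendor reports as affected and the components you have subscribed to. The Python sketch below illustrates the idea. The <code>Incident</code> class, the <code>SUBSCRIBED</code> mapping, and the component names are hypothetical and do not reflect how any particular aggregator stores its configuration.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    vendor: str
    components: frozenset  # component names the vendor lists as affected
    title: str


# Hypothetical subscriptions: the components your organization actually uses.
SUBSCRIBED = {
    "Google Cloud": {"Cloud Storage (us-east1)", "Cloud Run (us-east1)"},
    "Cloudflare": {"DNS"},
}


def is_relevant(incident: Incident) -> bool:
    """True when the incident touches at least one subscribed component."""
    wanted = SUBSCRIBED.get(incident.vendor, set())
    return bool(wanted & incident.components)


dns_issue = Incident("Cloudflare", frozenset({"DNS", "Workers"}), "DNS resolution delays")
print(is_relevant(dns_issue))
```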
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:15px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><b style="text-align:justify"><p>ℹ️ Real World Alerting Insights</p></b><span style="text-align:justify"><ul>
<li class="">Cloudflare, Salesforce, Amazon Web Services, and Google Cloud Platform are among the services where almost all users set up component filters.</li>
<li class="">Google Cloud Platform has more than 8000 region-service combinations.</li>
<li class="">Slack is the most popular channel (56%), followed by Email (24.7%), for vendor alerts.</li>
</ul><p><em>Source: IncidentHub monitoring data</em></p></span></div>
<br>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="fine-tune-alerts-by-type">Fine-tune Alerts by Type<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#fine-tune-alerts-by-type" class="hash-link" aria-label="Direct link to Fine-tune Alerts by Type" title="Direct link to Fine-tune Alerts by Type" translate="no">​</a></h4>
<p>You can further fine-tune alerts by type. You can set up your status page aggregator to send notifications only for outages, only for maintenances, or for both.
For example, some services like Twilio publish a large number of maintenance events, which might not be relevant to your needs even after they are filtered by component.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="fine-tune-alerts-by-lifecycle">Fine-tune Alerts by Lifecycle<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#fine-tune-alerts-by-lifecycle" class="hash-link" aria-label="Direct link to Fine-tune Alerts by Lifecycle" title="Direct link to Fine-tune Alerts by Lifecycle" translate="no">​</a></h4>
<p>Some third-party services are more critical than others for your business. You would want to closely follow every outage update for those until it is resolved. For less critical services, it is okay to receive alerts only when the outage starts and ends. Some status page aggregators let you toggle this.</p>
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:15px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><b style="text-align:justify"><p>💡 Which Settings Should I Choose for Alerts?</p></b><span style="text-align:justify"><p>Too many settings can be overwhelming.</p><p>Start with the basics - which components do you need to monitor? Set up component filters for those. This is the bare minimum.</p><p>Once you have that, decide which types of alerts you need - maintenance updates, outage updates, or both? If neither and you just need an aggregated status page, turn off both. The single status page reflects your component filters also.</p><p>After this, you can choose to fine-tune further at the lifecycle level - when do you want to be notified - when the outage starts, when it ends, or for all updates? This makes sense if you have categorized your services by criticality. If not, skip this.</p><p>At each stage, you can decide if you wish to go to the next stage of fine-tuning.</p></span></div>
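<p>The type and lifecycle stages described above can be thought of as two checks applied in order. Here is a minimal Python sketch of that decision. The <code>AlertSettings</code> fields and the event and phase names are hypothetical and only illustrate the logic; an actual aggregator exposes these as settings in its UI.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertSettings:
    outages: bool = True        # notify about outages
    maintenance: bool = False   # notify about maintenance events
    all_updates: bool = False   # False: only the first and the final update


def should_notify(settings: AlertSettings, event_type: str, phase: str) -> bool:
    """Apply the type check first, then the lifecycle check.

    event_type: 'outage' or 'maintenance'.
    phase: 'start', 'update', or 'end'.
    """
    if event_type == "outage":
        enabled = settings.outages
    elif event_type == "maintenance":
        enabled = settings.maintenance
    else:
        raise ValueError(f"unknown event type: {event_type}")
    if not enabled:
        return False
    return settings.all_updates or phase in ("start", "end")


critical_vendor = AlertSettings(all_updates=True)
other_vendor = AlertSettings()
print(should_notify(critical_vendor, "outage", "update"))
print(should_notify(other_vendor, "outage", "update"))
```

<p>Here a critical vendor receives every outage update, while the default settings only notify when an outage starts and ends and stay silent about maintenance.</p>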
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="integrate-with-your-teams-workflow">Integrate with Your Team's Workflow<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#integrate-with-your-teams-workflow" class="hash-link" aria-label="Direct link to Integrate with Your Team's Workflow" title="Direct link to Integrate with Your Team's Workflow" translate="no">​</a></h4>
<p>The easiest way to onboard a new tool is to integrate it with your team's existing processes. A status page aggregator can send notifications to whatever your team already uses (Slack, Discord, email, Microsoft Teams, etc.), so that outage alerts arrive where the team is already working.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="include-third-party-status-in-your-incident-response-plan">Include Third-party Status in Your Incident Response Plan<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#include-third-party-status-in-your-incident-response-plan" class="hash-link" aria-label="Direct link to Include Third-party Status in Your Incident Response Plan" title="Direct link to Include Third-party Status in Your Incident Response Plan" translate="no">​</a></h4>
<p>Including third-party service status in your incident response plan helps correlate alerts from your own systems with alerts from third-party services. This can help your team triage incidents faster. For example, when you receive an alert for your payment processing microservice that talks to Stripe, you can quickly check Stripe's status and rule out external issues.</p>
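<p>One lightweight way to support this kind of correlation is to keep a map from your internal services to the vendors they depend on. The Python sketch below assumes such a map exists; the service names in <code>DEPENDENCIES</code> and the set of vendors with open incidents are made up for illustration.</p>

```python
# Hypothetical dependency map: internal service -> third-party vendors it calls.
DEPENDENCIES = {
    "payment-service": ["Stripe"],
    "notification-service": ["Twilio", "Slack"],
}


def vendors_to_check(alerting_service: str, vendors_with_incidents: set) -> list:
    """Return the service's vendors that currently report an open incident."""
    deps = DEPENDENCIES.get(alerting_service, [])
    return [vendor for vendor in deps if vendor in vendors_with_incidents]


# Placeholder data: vendors that currently have an open incident.
open_incidents = {"Stripe", "GitHub"}
print(vendors_to_check("payment-service", open_incidents))
```

<p>An empty result suggests looking at your own systems first; a non-empty one points the on-call engineer to the vendor's status page.</p>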
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="periodically-review-your-organizations-service-usage">Periodically Review Your Organization's Service Usage<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#periodically-review-your-organizations-service-usage" class="hash-link" aria-label="Direct link to Periodically Review Your Organization's Service Usage" title="Direct link to Periodically Review Your Organization's Service Usage" translate="no">​</a></h4>
<p>You can do this as part of your incident response plan review. Your organization may start using new services, stop using existing ones, or start using new services on the same vendor's platform.
This should be reviewed periodically to ensure that your status page aggregator's monitored services are in sync with your organization's usage.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="make-the-aggregated-status-page-easily-accessible">Make the Aggregated Status Page Easily Accessible<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#make-the-aggregated-status-page-easily-accessible" class="hash-link" aria-label="Direct link to Make the Aggregated Status Page Easily Accessible" title="Direct link to Make the Aggregated Status Page Easily Accessible" translate="no">​</a></h4>
<p>A centralized status page with all your services and vendors' status gives you at-a-glance visibility into the status of your third-party dependencies. You can display it on a large screen TV or monitor in your office. Many NOC teams do this. This can help your team stay on top of outages easily even if you have alerts enabled.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="enable-upcoming-maintenance-notifications">Enable Upcoming Maintenance Notifications<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#enable-upcoming-maintenance-notifications" class="hash-link" aria-label="Direct link to Enable Upcoming Maintenance Notifications" title="Direct link to Enable Upcoming Maintenance Notifications" translate="no">​</a></h4>
<p>Some status page aggregators let you enable upcoming maintenance notifications so that your team can prepare for them. This can avoid last-minute surprises when the maintenance is about to start.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="advanced-techniques-with-a-status-page-aggregator">Advanced Techniques with a Status Page Aggregator<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#advanced-techniques-with-a-status-page-aggregator" class="hash-link" aria-label="Direct link to Advanced Techniques with a Status Page Aggregator" title="Direct link to Advanced Techniques with a Status Page Aggregator" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="automate-ticket-creation">Automate Ticket Creation<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#automate-ticket-creation" class="hash-link" aria-label="Direct link to Automate Ticket Creation" title="Direct link to Automate Ticket Creation" translate="no">​</a></h4>
<p>Your customer support team can respond proactively to customer inquiries about outages in your products if they are aware of third-party outages that might be the root cause. If your status page aggregator integrates with platforms like Zendesk, it can automatically create tickets there and keep your support team informed.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="setup-multiple-teams">Setup Multiple Teams<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#setup-multiple-teams" class="hash-link" aria-label="Direct link to Setup Multiple Teams" title="Direct link to Setup Multiple Teams" translate="no">​</a></h4>
<p>In larger organizations, different teams may have different third-party dependencies. They may use different services, or different components from the same service.
The alerting needs may also be different - with each team being in, say, a different Slack channel. Such cases are best handled by creating multiple teams and segregating the aggregated status pages and alerts by team. Each team can also be managed by different admins and include different team members. This way, services, alerts, and status pages remain separated by team, while the overall data and settings remain under a single organization account.</p>
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:15px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><b style="text-align:justify"><p>📘 Real World Example - Multiple Teams at a FinTech Company</p></b><span style="text-align:justify"><p>A FinTech company wanted to monitor their third-party dependencies which included cloud providers, financial services providers, hosted databases, payment gateways, and other SaaS services.</p><p>The company had multiple teams - Data Platform, Infrastructure, Security. Each team wanted third-party vendor outage alerts in a separate Slack channel. The key cloud providers were AWS and Cloudflare.</p><p>The Data Platform team wanted to monitor Elasticsearch and Amazon RDS, whereas the Infrastructure team wanted to monitor Amazon S3, EC2, and other infrastructure related services.</p><p>This need was best met by creating teams - where each team in the status page aggregator mapped to an organization team. Each team ended up with:</p><ul>
<li class="">An independent list of monitored services with their own component filters and alert filters.</li>
<li class="">Their own aggregated status page, customized with their team's title.</li>
<li class="">Integration with their own Slack channel.</li>
</ul><p>The overall account remained under the control of an organization-level admin.</p><p><em>Source: IncidentHub customer case study</em></p></span></div>
<br>
<p>Advanced status page aggregators support this by letting you set up multiple teams, each with its own status page and alert channels.</p>
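<p>To make the per-team separation concrete, here is a minimal sketch of how alerts could be routed by team. It reuses the teams and services from the FinTech example above; the channel names and the <code>channels_for</code> helper are hypothetical and do not represent any aggregator's actual configuration format.</p>

```python
# Hypothetical per-team configuration: each team has its own monitored
# services and its own Slack channel (channel names are made up).
TEAMS = {
    "Data Platform": {
        "services": {"Elasticsearch", "Amazon RDS"},
        "channel": "#data-platform-vendor-alerts",
    },
    "Infrastructure": {
        "services": {"Amazon S3", "Amazon EC2"},
        "channel": "#infra-vendor-alerts",
    },
}


def channels_for(service: str) -> list:
    """Return the channels of every team that monitors the given service."""
    return sorted(
        team["channel"] for team in TEAMS.values() if service in team["services"]
    )


print(channels_for("Amazon RDS"))
```

<p>A service that no team monitors returns an empty list, so no channel receives an alert for it.</p>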
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="custom-integration-using-apis-and-webhooks">Custom Integration Using APIs and Webhooks<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#custom-integration-using-apis-and-webhooks" class="hash-link" aria-label="Direct link to Custom Integration Using APIs and Webhooks" title="Direct link to Custom Integration Using APIs and Webhooks" translate="no">​</a></h4>
<p>If you have custom-developed or legacy tools for alert notifications, or have a custom dashboard for incident management, the standard integrations in a status page aggregator may not be enough.
You can use the aggregator's APIs and webhooks instead to push the outage and maintenance updates to your tool/dashboard.</p>
<p>This preserves a seamless experience for your team where they continue to use their existing tools and can see their third-party dependencies' status in the same place.</p>
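<p>As a rough illustration, a webhook receiver for a custom dashboard could look like the sketch below. It uses only the Python standard library. The payload fields (<code>service</code>, <code>type</code>, <code>status</code>, <code>title</code>) are assumptions made for this example, so check your aggregator's webhook documentation for the real schema.</p>

```python
import json
from http.server import BaseHTTPRequestHandler


def to_dashboard_row(payload: dict) -> dict:
    """Map a webhook payload to the row shape a custom dashboard might store.

    The field names are hypothetical, not a real aggregator's schema.
    """
    return {
        "service": payload.get("service", "unknown"),
        "kind": payload.get("type", "outage"),  # "outage" or "maintenance"
        "status": payload.get("status", "unknown"),
        "title": payload.get("title", ""),
    }


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            # Reject bodies that are not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        row = to_dashboard_row(payload)
        print(row)  # replace with a write to your dashboard's datastore
        self.send_response(204)
        self.end_headers()


# To try it locally:
#   from http.server import HTTPServer
#   HTTPServer(("", 8080), WebhookHandler).serve_forever()
```

<p>Keeping the mapping in a separate function makes it easy to unit test without starting a server.</p>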
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="historical-trend-analysis">Historical Trend Analysis<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#historical-trend-analysis" class="hash-link" aria-label="Direct link to Historical Trend Analysis" title="Direct link to Historical Trend Analysis" translate="no">​</a></h4>
<p>You can track historical data for your dependencies to see how often they go down and identify patterns. This can also help you to:</p>
<ul>
<li class="">Validate SLA commitments for contract renewals.</li>
<li class="">Make data-driven decisions when you select vendors.</li>
<li class="">Identify services that are unreliable but critical and justify backup vendors for redundancy.</li>
</ul>
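<p>For example, validating an SLA commitment comes down to comparing measured availability against the promised figure. The sketch below assumes you can export outage start and end times from your aggregator's history; the <code>availability_pct</code> helper and the sample data are illustrative only.</p>

```python
from datetime import datetime


def availability_pct(outages, period_start, period_end):
    """Percentage of the period not covered by the (start, end) outage windows.

    Assumes the outage windows do not overlap each other.
    """
    total = (period_end - period_start).total_seconds()
    down = 0.0
    for start, end in outages:
        # Clip each outage to the reporting period before counting it.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        down += max((clipped_end - clipped_start).total_seconds(), 0.0)
    return 100.0 * (1 - down / total)


# Illustrative data: one outage of 43 minutes 12 seconds in a 30-day month.
april = (datetime(2025, 4, 1), datetime(2025, 5, 1))
outages = [(datetime(2025, 4, 10, 9, 0, 0), datetime(2025, 4, 10, 9, 43, 12))]
measured = availability_pct(outages, *april)
print(f"Measured availability: {measured:.3f}%")
```

<p>You can then compare the measured figure with the vendor's committed SLA (for instance 99.9%) when a contract comes up for renewal.</p>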
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:15px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><b style="text-align:justify"><p>ℹ️ 2025 Cloud and SaaS Outage Statistics</p></b><span style="text-align:justify"><ul>
<li class="">SaaS and Cloud outages were the highest in September, October, and November 2025. A lot of this was due to the cascading effect of the Cloudflare, Azure, and AWS outages.</li>
<li class="">IncidentHub detected more than 48k outages in 2025 across hundreds of SaaS and Cloud services.</li>
<li class="">Cloud and hosting provider outages accounted for around 22% of the total detected outages. This percentage seems low compared to the others because many SaaS services depend on these cloud and hosting providers for their infrastructure, so a provider outage also shows up as outages in the SaaS services built on it.</li>
</ul><p><em>Source: IncidentHub monitoring data</em></p></span></div>
<br>
<img style="max-width:100%;display:block;margin:0 auto;box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/cloud-outages-by-provider-type-2025.webp" alt="2025 outage statistics by vendor type">
<p style="margin-top:10px;text-align:center;font-size:12px;color:#666">Data source: IncidentHub monitoring</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="choosing-your-monitoring-approach">Choosing Your Monitoring Approach<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#choosing-your-monitoring-approach" class="hash-link" aria-label="Direct link to Choosing Your Monitoring Approach" title="Direct link to Choosing Your Monitoring Approach" translate="no">​</a></h2>
<p>Your approach depends largely on your scale and needs. However, given that most teams today - irrespective of size - depend on third-party services, a <a class="" href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator">status page aggregator</a> is usually a hassle-free option. It can scale with your needs.</p>
<p>If your needs are really simple, then you may not need an aggregator - e.g., if:</p>
<ul>
<li class="">You have one or two third-party dependencies.</li>
<li class="">All your status pages allow you to filter by components.</li>
<li class="">All your status pages support your alerting channels.</li>
<li class="">You are not interested in seeing an overall status page for all your status pages, and alerts in email/Slack/Discord are enough.</li>
<li class="">You don't need to fine-tune alerts by type and lifecycle.</li>
<li class="">You don't need historical trend analysis.</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="decision-table">Decision Table<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#decision-table" class="hash-link" aria-label="Direct link to Decision Table" title="Direct link to Decision Table" translate="no">​</a></h3>
<table><thead><tr><th>Your Situation</th><th>Recommended Approach</th></tr></thead><tbody><tr><td>1-2 vendors, all have filtering and support your notification channel</td><td>Manual monitoring</td></tr><tr><td>3-5 vendors, mixed notification support</td><td>Consider aggregator</td></tr><tr><td>6+ vendors</td><td>Status page aggregator</td></tr><tr><td>Need historical analysis for any vendor count</td><td>Status page aggregator</td></tr><tr><td>Multiple teams with different dependencies</td><td>Status page aggregator</td></tr><tr><td>Need single status page view</td><td>Status page aggregator</td></tr><tr><td>Vendor status pages keep changing</td><td>Status page aggregator</td></tr><tr><td>Need fine-tuning of alerts by type and lifecycle</td><td>Status page aggregator</td></tr></tbody></table>
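<p>The decision table can also be read as a small function. This is only a sketch: the parameter names are invented for the example, and the fallback for one or two vendors that do not fully meet your needs is an assumption, since the table does not cover that case.</p>

```python
def recommend_approach(
    vendor_count: int,
    all_vendors_meet_needs: bool = False,
    needs_history: bool = False,
    multiple_teams: bool = False,
    needs_single_view: bool = False,
    vendor_pages_keep_changing: bool = False,
    needs_alert_fine_tuning: bool = False,
) -> str:
    """Return the recommended approach from the decision table."""
    # Any of these requirements points to an aggregator, whatever the vendor count.
    if any(
        (
            needs_history,
            multiple_teams,
            needs_single_view,
            vendor_pages_keep_changing,
            needs_alert_fine_tuning,
        )
    ):
        return "Status page aggregator"
    if vendor_count >= 6:
        return "Status page aggregator"
    # all_vendors_meet_needs: every vendor has component filtering and
    # supports your notification channel.
    if vendor_count in (1, 2) and all_vendors_meet_needs:
        return "Manual monitoring"
    # 3-5 vendors, or 1-2 vendors with gaps (the latter is an assumption).
    return "Consider aggregator"


print(recommend_approach(2, all_vendors_meet_needs=True))
print(recommend_approach(4))
print(recommend_approach(2, needs_history=True))
```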
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="a-status-page-aggregator-implementation-workflow">A Status Page Aggregator Implementation Workflow<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#a-status-page-aggregator-implementation-workflow" class="hash-link" aria-label="Direct link to A Status Page Aggregator Implementation Workflow" title="Direct link to A Status Page Aggregator Implementation Workflow" translate="no">​</a></h2>
<p>A typical workflow for integrating a status page aggregator into your organization would have the high-level steps outlined below.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="implementation-workflow-overview">Implementation Workflow Overview<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#implementation-workflow-overview" class="hash-link" aria-label="Direct link to Implementation Workflow Overview" title="Direct link to Implementation Workflow Overview" translate="no">​</a></h3>
<img style="max-width:100%;display:block;margin:0 auto;box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/status-page-aggregator-implementation-workflow.webp" alt="Status page aggregator implementation workflow">
<p style="margin-top:10px;text-align:center;font-size:12px;color:#666">Status Page Aggregator Implementation Workflow diagram</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="discovery">Discovery<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#discovery" class="hash-link" aria-label="Direct link to Discovery" title="Direct link to Discovery" translate="no">​</a></h3>
<p>List the third-party services you use, along with their components and regions. If you have multiple teams, you need to do this separately for each team.
If you have different levels of criticality for your services, you need to categorize them accordingly. These settings will be used to configure the alerts later - see the sections <a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#set-up-component-filtering" class="">Component filtering</a>, <a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#fine-tune-alerts-by-type" class="">Fine-tune Alerts by Type</a>, and <a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#fine-tune-alerts-by-lifecycle" class="">Fine-tune Alerts by Lifecycle</a>.</p>
<p>If you have services with a lot of components (e.g. Cloudflare or Google Cloud Platform), you can directly configure the aggregator instead of listing them first. It is helpful, however, to have the list of vendors in one place. You may be able to get the vendor list from your compliance or security team. In smaller organizations, you might have to compile this yourself. Other sources can be:</p>
<ul>
<li class="">Billing and procurement systems.</li>
<li class="">IT asset inventory.</li>
<li class="">Team surveys.</li>
</ul>
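<p>One way to capture the output of the discovery step is a simple structured inventory, grouped by team. The sketch below is only a suggestion: the <code>Dependency</code> fields, criticality labels, and regions are illustrative assumptions, not a format that any aggregator requires.</p>

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Dependency:
    vendor: str
    team: str
    criticality: str  # illustrative labels: "critical", "important", "low"
    components: list = field(default_factory=list)
    regions: list = field(default_factory=list)


def by_team(dependencies):
    """Group the inventory by team, ready for per-team configuration."""
    grouped = defaultdict(list)
    for dep in dependencies:
        grouped[dep.team].append(dep)
    return dict(grouped)


# Sample entries; the regions and criticality values are made up.
inventory = [
    Dependency("AWS", "Infrastructure", "critical", ["EC2", "S3"], ["us-east-1"]),
    Dependency("AWS", "Data Platform", "critical", ["RDS"], ["us-east-1"]),
    Dependency("Cloudflare", "Infrastructure", "important", ["CDN"]),
]

for team, deps in by_team(inventory).items():
    print(team, [d.vendor for d in deps])
```

<p>Note that the same vendor can appear once per team with different components, which matches how you would later configure component filters for each team.</p>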
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="team-setup">Team Setup<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#team-setup" class="hash-link" aria-label="Direct link to Team Setup" title="Direct link to Team Setup" translate="no">​</a></h3>
<p>Create individual teams in the aggregator if you have more than one team and each team has different needs for dependencies and alerting. Each team can have one or more admins and team members if the aggregator supports it.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="monitoring-configuration">Monitoring Configuration<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#monitoring-configuration" class="hash-link" aria-label="Direct link to Monitoring Configuration" title="Direct link to Monitoring Configuration" translate="no">​</a></h3>
<p>If you have multiple teams, add the monitored services and components/regions for each team in the status page aggregator. You can further fine-tune the alert types if required. For a single team, the setup is much simpler.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="alerting-configuration">Alerting Configuration<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#alerting-configuration" class="hash-link" aria-label="Direct link to Alerting Configuration" title="Direct link to Alerting Configuration" translate="no">​</a></h3>
<p>Integrate the alerts with your team's existing tools - Slack, Discord, Email, Microsoft Teams, etc. Some status page aggregators like IncidentHub let you send a test message to ensure that your integration is working fine.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="a-single-status-page-for-many-status-pages">A Single Status Page for Many Status Pages<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#a-single-status-page-for-many-status-pages" class="hash-link" aria-label="Direct link to A Single Status Page for Many Status Pages" title="Direct link to A Single Status Page for Many Status Pages" translate="no">​</a></h3>
<p>Set up the single status page, or one for each team (if you have multiple teams). This will be the single source of truth for your team to see the status of all their services.
Depending on the status page aggregator, you can whitelabel the status page with your own logo, brand colors, custom domain, and other branding elements. You can also add a password to the status page to prevent unauthorized access.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="integration-into-incident-response-plan">Integration Into Incident Response Plan<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#integration-into-incident-response-plan" class="hash-link" aria-label="Direct link to Integration Into Incident Response Plan" title="Direct link to Integration Into Incident Response Plan" translate="no">​</a></h3>
<p>Include third-party status monitoring in your incident response plan. Your on-call engineers should be able to access and understand the aggregated status page and the alerts. Make this a part of your on-call training and awareness programs.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="common-status-page-aggregator-implementation-pitfalls">Common Status Page Aggregator Implementation Pitfalls<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#common-status-page-aggregator-implementation-pitfalls" class="hash-link" aria-label="Direct link to Common Status Page Aggregator Implementation Pitfalls" title="Direct link to Common Status Page Aggregator Implementation Pitfalls" translate="no">​</a></h2>
<p>Here are some common pitfalls to avoid.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="starting-with-too-broad-settings">Starting With Too Broad Settings<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#starting-with-too-broad-settings" class="hash-link" aria-label="Direct link to Starting With Too Broad Settings" title="Direct link to Starting With Too Broad Settings" translate="no">​</a></h3>
<p>Enabling all alerts can overwhelm your team and easily lead to a state where they start to ignore alerts. Choose your filters and other settings carefully so that you are not flooded with unnecessary alerts. It might take a few iterations to get this right.</p>
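<p>Conceptually, narrowing the settings means an alert is delivered only when it matches the components, regions, and alert types you care about. Here is a minimal sketch of that idea. The alert and rule shapes are hypothetical, since in practice you would configure these filters in the aggregator itself.</p>

```python
def should_notify(alert: dict, rule: dict) -> bool:
    """Return True only when the alert matches every filter in the rule."""
    return (
        alert["component"] in rule["components"]
        and alert["region"] in rule["regions"]
        and alert["type"] in rule["types"]
    )


# Hypothetical rule: only EC2 and S3 outages in one region.
rule = {
    "components": {"EC2", "S3"},
    "regions": {"us-east-1"},
    "types": {"outage"},
}

alerts = [
    {"component": "EC2", "region": "us-east-1", "type": "outage"},
    {"component": "EC2", "region": "eu-west-1", "type": "outage"},
    {"component": "Lambda", "region": "us-east-1", "type": "maintenance"},
]

delivered = [a for a in alerts if should_notify(a, rule)]
print(len(delivered), "of", len(alerts), "alerts delivered")
```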
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="accessibility-and-visibility">Accessibility and Visibility<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#accessibility-and-visibility" class="hash-link" aria-label="Direct link to Accessibility and Visibility" title="Direct link to Accessibility and Visibility" translate="no">​</a></h3>
<p>The on-call teams, developers, Ops and IT teams should be able to use this information to triage ongoing outages. The dashboard and alert channels should be easily accessible and visible.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="stale-list-of-monitored-services">Stale List of Monitored Services<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#stale-list-of-monitored-services" class="hash-link" aria-label="Direct link to Stale List of Monitored Services" title="Direct link to Stale List of Monitored Services" translate="no">​</a></h3>
<p>Services, regions, and vendors are added, removed, or changed frequently. Update the list of monitored services regularly, as part of your incident response plan reviews, to avoid stale information.</p>
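<p>A periodic review can be as simple as comparing what you monitor with what you actually use. The sketch below assumes you can export both lists as plain service names. The <code>inventory_drift</code> helper and the sample names are illustrative.</p>

```python
def inventory_drift(monitored: set, in_use: set) -> dict:
    """Compare monitored services with the services actually in use."""
    return {
        # In use but not monitored: a gap in coverage.
        "missing": sorted(in_use - monitored),
        # Monitored but no longer in use: a source of noisy alerts.
        "stale": sorted(monitored - in_use),
    }


monitored = {"Amazon EC2", "Amazon S3", "Elasticsearch"}
in_use = {"Amazon EC2", "Amazon S3", "Amazon RDS"}
print(inventory_drift(monitored, in_use))
```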
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="non-status-page-methods">Non-Status Page Methods<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#non-status-page-methods" class="hash-link" aria-label="Direct link to Non-Status Page Methods" title="Direct link to Non-Status Page Methods" translate="no">​</a></h2>
<p>These methods are included here for completeness. They are not as comprehensive as the status page monitoring approaches and are not recommended for serious use cases.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="using-third-party-telemetry-data-sites">Using Third Party Telemetry Data Sites<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#using-third-party-telemetry-data-sites" class="hash-link" aria-label="Direct link to Using Third Party Telemetry Data Sites" title="Direct link to Using Third Party Telemetry Data Sites" translate="no">​</a></h3>
<p>Datadog, the observability and monitoring platform, recently announced a site called <a rel="noopener noreferrer nofollow" href="https://updog.ai/" target="_blank">updog.ai</a>. It curates data from Datadog's own customers who monitor their infrastructure using Datadog. While this can be a limited early warning system, it does not suffice as a comprehensive monitoring solution:</p>
<ul>
<li class="">It is limited to the vendors that Datadog's customers monitor. E.g. cloud vendors like AWS are present, but IT management software like Kaseya is not. This is understandable as the data mostly comes from users who use cloud and SaaS services to run their applications. As a user you have no control over the telemetry, which is necessarily constrained by the services, regions, and components of a given cloud service that Datadog's customers monitor.</li>
<li class="">There is no way to monitor specific services and regions and view them on a single dashboard. E.g., you can filter AWS services but you have to visit each service's page on the Updog website to see its status.</li>
<li class="">There is no way to receive notifications from the Updog website when a service is down.</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="using-crowdsourced-information-sites">Using Crowdsourced Information Sites<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#using-crowdsourced-information-sites" class="hash-link" aria-label="Direct link to Using Crowdsourced Information Sites" title="Direct link to Using Crowdsourced Information Sites" translate="no">​</a></h3>
<p>There are numerous sites like Downdetector which rely on user reports to aggregate information about service outages. By definition, outage reports on such sites are not comprehensive, nor do they provide a holistic view of the outage:</p>
<ul>
<li class="">User reports can have false positives. E.g. ISP issues interpreted as service outages.</li>
<li class="">If users don't report an outage for a specific region, or a service, that's a gap in the coverage.</li>
</ul>
<p>Such sites also do not provide a way to receive notifications when a service is down, or let you create your own single status page.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="using-social-media">Using Social Media<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#using-social-media" class="hash-link" aria-label="Direct link to Using Social Media" title="Direct link to Using Social Media" translate="no">​</a></h3>
<p>Social media sites like X, Bluesky, Mastodon, Reddit, and Hacker News are often the first place people check for outage reports or post asking for information. Such reports can be helpful to know that something might be off with a service, but they cannot be relied upon, for the same reasons as crowdsourced information sites.</p>
<p>Some services have an official social media account, but most don't. For example, Microsoft 365 has an official X account where they announce service outages.</p>
<img style="max-width:70%;display:block;margin:0 auto;box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/microsoft-365-status-x.webp" alt="Microsoft 365 status page">
<p style="margin-top:10px;text-align:center;font-size:12px;color:#666">Screenshot from the <a rel="noopener noreferrer nofollow" href="https://x.com/MSFT365Status/" target="_blank">Microsoft 365 status account on X</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>2025 saw a number of major cloud outages, especially in the infrastructure layer, demonstrating the impact that a few providers can have on other services and users. It is more critical than ever to monitor your third-party dependencies' status. A managed status page aggregator is the most comprehensive way to do this.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="faq">FAQ<a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide#faq" class="hash-link" aria-label="Direct link to FAQ" title="Direct link to FAQ" translate="no">​</a></h2>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How do I monitor multiple status pages at once?</summary><div><div class="collapsibleContent_i85q"><p></p><p>You can monitor multiple status pages at once by using a status page aggregator.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Should I monitor internal tools like Slack, Jira, or Asana?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Yes. Outages in these tools can disrupt team communication and collaboration, and ultimately affect your business.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How do I prevent alert fatigue from status pages?</summary><div><div class="collapsibleContent_i85q"><p></p><p>You can prevent alert fatigue from status pages by configuring your status page aggregator to send alerts only for the components and regions that you need to monitor.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How do I know which components to monitor for each vendor?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Start with what you know that you use. For cloud providers, check your infrastructure configuration - which AWS services appear in your console and which regions host your resources. For SaaS platforms, review which features your team actively uses. When in doubt, monitor broader categories initially, then narrow down as you understand your usage better.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Can I monitor the same vendor for different teams with different needs using a status page aggregator?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Yes. For example, your infrastructure team would want to monitor AWS EC2, while your Data Platform team would want to monitor AWS RDS.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>What if a vendor doesn't have a public status page?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Some smaller vendors don't maintain public status pages. For critical vendors without public status pages, this should be a consideration during vendor evaluation. You can request they implement one, or you'll need to rely on direct support channels during incidents.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>What happens when my vendor changes status page providers?</summary><div><div class="collapsibleContent_i85q"><p></p><p>With manual monitoring, you will need to track such changes and reconfigure everything. A managed aggregator handles these transitions automatically and you will not lose any monitoring coverage.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Should I monitor maintenance windows?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Yes, especially for critical services. Knowing about scheduled maintenance helps you plan around potential disruptions to your customers and users. For non-critical services where maintenance doesn't impact your customers and users, you can disable maintenance notifications.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Can I monitor vendors that don't use standard status page platforms?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Yes. Modern status page aggregators can monitor various formats - RSS feeds, custom status pages, API endpoints, and also standard platforms like Atlassian Statuspage. The aggregator handles the complexity of different formats and presents everything in a consistent view.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>My vendor had an incident but I didn't receive an alert. What happened?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Check if the incident affected components or regions you're monitoring. Many incidents are isolated to specific features or geographic regions. If it affected something you use, verify your component filters and notification settings are configured correctly.</p><p></p></div></div></details>
<hr>
<p><em>IncidentHub is not affiliated with any of the services and vendors mentioned in this article. All logos and company names are trademarks or registered trademarks of their respective holders.</em></p>
<p>This article was first published on the <a href="https://blog.incidenthub.cloud/monitoring-saas-status-2026-complete-guide" target="_blank" rel="noopener noreferrer" class="">IncidentHub blog</a>.</p>]]></content:encoded>
            <category>Monitoring</category>
            <category>SaaS</category>
            <category>Status Pages</category>
            <category>Status Page Aggregators</category>
        </item>
        <item>
            <title><![CDATA[Major Cloud Outages of 2025]]></title>
            <link>https://blog.incidenthub.cloud/major-cloud-outages-2025</link>
            <guid>https://blog.incidenthub.cloud/major-cloud-outages-2025</guid>
            <pubDate>Fri, 12 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[A list of the major cloud outages in 2025.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="cloud-outages-and-their-impact">Cloud Outages and Their Impact<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#cloud-outages-and-their-impact" class="hash-link" aria-label="Direct link to Cloud Outages and Their Impact" title="Direct link to Cloud Outages and Their Impact" translate="no">​</a></h2>
<p>Cloud outages in 2025 ranged from minor ones affecting some sections of users, to major ones affecting hundreds or thousands of users. Services like Cloudflare and AWS, on which many other services
depend, experienced outages whose cascading effect reached many downstream services and their users.</p>
<p>Let's look at some of the major cloud outages in 2025.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/major-cloud-outages-in-2025.webp" alt="Major Cloud Outages in 2025">
<hr>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#cloud-outages-and-their-impact" class="">Cloud Outages and Their Impact</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#major-cloud-outages-in-2025" class="">Major Cloud Outages in 2025</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#microsoft-azure---8-january" class="">Microsoft Azure - 8 January</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#slack---26-february" class="">Slack - 26 February</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#pagerduty---26-february" class="">PagerDuty - 26 February</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#openai---8-april" class="">OpenAI - 8 April</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#zoom---april-16" class="">Zoom - April 16</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#google-cloud-platform---12-june" class="">Google Cloud Platform - 12 June</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#microsoft-azure---6-september" class="">Microsoft Azure - 6 September</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#google-workspace---18-september" class="">Google Workspace - 18 September</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#amazon-web-services---october-20" class="">Amazon Web Services - October 20</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#microsoft-azure---29-october" class="">Microsoft Azure - 29 October</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#cloudflare---november-18" class="">Cloudflare - November 18</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#openai-26-november" class="">OpenAI - 26 November</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#cloudflare---december-5" class="">Cloudflare - December 5</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#the-impact-of-infrastructure-provider-outages" class="">The Impact of Infrastructure Provider Outages</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#using-a-state-page-aggregator-to-monitor-cloud-outages" class="">Using A Status Page Aggregator to Monitor Cloud Outages</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#conclusion" class="">Conclusion</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="major-cloud-outages-in-2025">Major Cloud Outages in 2025<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#major-cloud-outages-in-2025" class="hash-link" aria-label="Direct link to Major Cloud Outages in 2025" title="Direct link to Major Cloud Outages in 2025" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="microsoft-azure---8-january">Microsoft Azure - 8 January<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#microsoft-azure---8-january" class="hash-link" aria-label="Direct link to Microsoft Azure - 8 January" title="Direct link to Microsoft Azure - 8 January" translate="no">​</a></h3>
<p>A networking configuration change in East US2 caused "connectivity issues, prolonged timeouts, connection drops, and resource allocation failures" across multiple
Azure services. Loss of indexing data in the Azure PubSub service - which is used by the networking control plane to communicate between control entities and agents on individual hosts - caused
networking configuration to not be delivered to the agents. The outage lasted around 50 hours.</p>
<p>It's noteworthy that the RCA report mentions that "Services that were configured to be zonally redundant and leveraging VNet integration may have experienced impact across multiple zones." The RCA also mentions a step to prevent this
in future outages.</p>
<p>Link to incident: <a rel="noopener noreferrer nofollow" href="https://azure.status.microsoft/status/history/?trackingId=PLP3-1W8" target="_blank"></a><a href="https://azure.status.microsoft/status/history/?trackingId=PLP3-1W8" target="_blank" rel="noopener noreferrer" class="">https://azure.status.microsoft/status/history/?trackingId=PLP3-1W8</a></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="slack---26-february">Slack - 26 February<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#slack---26-february" class="hash-link" aria-label="Direct link to Slack - 26 February" title="Direct link to Slack - 26 February" translate="no">​</a></h3>
<p>An outage in the Slack Events API caused custom applications, integrations, and bots to not work as expected. This outage was traced to the mitigation steps taken to resolve an earlier incident. This lasted around 25 hours.</p>
<p>Link to incident: <a rel="noopener noreferrer nofollow" href="https://slack-status.com/2025-02/d41e4bfd1ccae26a" target="_blank"></a><a href="https://slack-status.com/2025-02/d41e4bfd1ccae26a" target="_blank" rel="noopener noreferrer" class="">https://slack-status.com/2025-02/d41e4bfd1ccae26a</a></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="pagerduty---26-february">PagerDuty - 26 February<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#pagerduty---26-february" class="hash-link" aria-label="Direct link to PagerDuty - 26 February" title="Direct link to PagerDuty - 26 February" translate="no">​</a></h3>
<p>Slack integration for PagerDuty was affected due to an outage in Slack (see above). Users who depend on PagerDuty applications for Slack were unable to receive notifications. With <a class="" href="https://blog.incidenthub.cloud/The-Rising-Role-of-Slack-in-Incident-Management">Slack</a> becoming more and more popular as a central tool in incident management, this outage disrupted the workflow for many teams, including those who did not use the PagerDuty application.</p>
<p>This is a classic example of an incident notification tool itself being affected by an incident - in this case, on a dependent provider.
PagerDuty notes in its future improvement points that it will work on "Improving our monitoring to ensure that we can respond faster to issues with our integration partners" and also to figure out other ways of improving the user experience in such cases.</p>
<p>Link to incident: <a rel="noopener noreferrer nofollow" href="https://status.pagerduty.com/posts/details/PB5WBMB" target="_blank"></a><a href="https://status.pagerduty.com/posts/details/PB5WBMB" target="_blank" rel="noopener noreferrer" class="">https://status.pagerduty.com/posts/details/PB5WBMB</a></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="openai---8-april">OpenAI - 8 April<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#openai---8-april" class="hash-link" aria-label="Direct link to OpenAI - 8 April" title="Direct link to OpenAI - 8 April" translate="no">​</a></h3>
<p>OpenAI ran into capacity issues while handling requests for Sora, its sophisticated video generation model. To mitigate the problem, OpenAI rolled out Sora capabilities in the same order that users signed up for it. The issue was officially marked
as resolved on 30th April - around 22 days after it was reported.</p>
<p>Link to incident: <a rel="noopener noreferrer nofollow" href="https://status.openai.com/incidents/01JRB888M6TJVDDKGCA1YZ3ZHT" target="_blank"></a><a href="https://status.openai.com/incidents/01JRB888M6TJVDDKGCA1YZ3ZHT" target="_blank" rel="noopener noreferrer" class="">https://status.openai.com/incidents/01JRB888M6TJVDDKGCA1YZ3ZHT</a></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="zoom---april-16">Zoom - April 16<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#zoom---april-16" class="hash-link" aria-label="Direct link to Zoom - April 16" title="Direct link to Zoom - April 16" translate="no">​</a></h3>
<p>This started as a domain resolution failure when the zoom.us domain was blocked by its registrar. The TLD nameservers stopped resolving the domain as well as all its subdomains.
This outage officially lasted around 1 hour 47 minutes, and received a lot of publicity due to Zoom's widespread usage.</p>
<p>Link to incident: <a rel="noopener noreferrer nofollow" href="https://www.zoomstatus.com/incidents/pw9r9vnq5rvk" target="_blank"></a><a href="https://www.zoomstatus.com/incidents/pw9r9vnq5rvk" target="_blank" rel="noopener noreferrer" class="">https://www.zoomstatus.com/incidents/pw9r9vnq5rvk</a></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="google-cloud-platform---12-june">Google Cloud Platform - 12 June<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#google-cloud-platform---12-june" class="hash-link" aria-label="Direct link to Google Cloud Platform - 12 June" title="Direct link to Google Cloud Platform - 12 June" translate="no">​</a></h3>
<p>Affecting mainly Google Cloud services and a handful of Google Workspace services, this outage was caused by a bad automated update to Google Cloud's quota check system. The update propagated globally, affecting external API requests and impacting many
products.</p>
<p>The incident lasted around 3 hours. 76 different Google Cloud services were affected.</p>
<p>Link to incident: <a rel="noopener noreferrer nofollow" href="https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1SsW" target="_blank"></a><a href="https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1SsW" target="_blank" rel="noopener noreferrer" class="">https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1SsW</a></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="microsoft-azure---6-september">Microsoft Azure - 6 September<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#microsoft-azure---6-september" class="hash-link" aria-label="Direct link to Microsoft Azure - 6 September" title="Direct link to Microsoft Azure - 6 September" translate="no">​</a></h3>
<p>Multiple undersea cable cuts in the Red Sea caused widespread disruption to global communications passing through Azure's network.</p>
<p>As a workaround, Microsoft Azure rerouted traffic through alternate paths.</p>
<p>Link to Hacker News discussion: <a rel="noopener noreferrer nofollow" href="https://news.ycombinator.com/item?id=45152773" target="_blank"></a><a href="https://news.ycombinator.com/item?id=45152773" target="_blank" rel="noopener noreferrer" class="">https://news.ycombinator.com/item?id=45152773</a></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="google-workspace---18-september">Google Workspace - 18 September<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#google-workspace---18-september" class="hash-link" aria-label="Direct link to Google Workspace - 18 September" title="Direct link to Google Workspace - 18 September" translate="no">​</a></h3>
<p>A resource contention issue in Google's authentication system caused login failures across many Google services. It was mitigated by increasing the available capacity.</p>
<p>The outage lasted 1 hour 13 minutes. Due to the key nature of the affected service (authentication), users were unable to access other services, increasing the blast radius of the outage.</p>
<p>Link to incident: <a rel="noopener noreferrer nofollow" href="https://www.google.com/appsstatus/dashboard/incidents/5V5yK8N8heBKnmdqS1eW" target="_blank"></a><a href="https://www.google.com/appsstatus/dashboard/incidents/5V5yK8N8heBKnmdqS1eW" target="_blank" rel="noopener noreferrer" class="">https://www.google.com/appsstatus/dashboard/incidents/5V5yK8N8heBKnmdqS1eW</a></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="amazon-web-services---october-20">Amazon Web Services - October 20<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#amazon-web-services---october-20" class="hash-link" aria-label="Direct link to Amazon Web Services - October 20" title="Direct link to Amazon Web Services - October 20" translate="no">​</a></h3>
<p>A race condition in the DNS management for Amazon DynamoDB caused prolonged impact on other AWS services in us-east-1, because many AWS services depend on DynamoDB internally. In all, 141 AWS services were affected by the outage.</p>
<p>Link to post-incident write up: <a rel="noopener noreferrer nofollow" href="https://aws.amazon.com/message/101925/" target="_blank"></a><a href="https://aws.amazon.com/message/101925/" target="_blank" rel="noopener noreferrer" class="">https://aws.amazon.com/message/101925/</a><br>
<!-- -->An interesting video on this outage: <a rel="noopener noreferrer nofollow" href="https://www.youtube.com/watch?v=YZUNNzLDWb8" target="_blank"></a><a href="https://www.youtube.com/watch?v=YZUNNzLDWb8" target="_blank" rel="noopener noreferrer" class="">https://www.youtube.com/watch?v=YZUNNzLDWb8</a></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="microsoft-azure---29-october">Microsoft Azure - 29 October<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#microsoft-azure---29-october" class="hash-link" aria-label="Direct link to Microsoft Azure - 29 October" title="Direct link to Microsoft Azure - 29 October" translate="no">​</a></h3>
<p>A series of customer configuration changes resulted in incompatible metadata being generated in Azure's Content Delivery Network. "This configuration (with the incompatible metadata) completed propagation to a majority of edge sites by 15:39 UTC," according to the report. Although the internal config protection system caught the impact when it became visible and stopped all new and in-flight configuration change requests, the bad configuration had already been processed by the edge servers.</p>
<p>Link to incident: <a rel="noopener noreferrer nofollow" href="https://azure.status.microsoft/status/history/?trackingId=YKYN-BWZ" target="_blank"></a><a href="https://azure.status.microsoft/status/history/?trackingId=YKYN-BWZ" target="_blank" rel="noopener noreferrer" class="">https://azure.status.microsoft/status/history/?trackingId=YKYN-BWZ</a><br>
<!-- -->A video retrospective: <a rel="noopener noreferrer nofollow" href="https://www.youtube.com/watch?v=PHvIYrWkAJU" target="_blank"></a><a href="https://www.youtube.com/watch?v=PHvIYrWkAJU" target="_blank" rel="noopener noreferrer" class="">https://www.youtube.com/watch?v=PHvIYrWkAJU</a></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="cloudflare---november-18">Cloudflare - November 18<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#cloudflare---november-18" class="hash-link" aria-label="Direct link to Cloudflare - November 18" title="Direct link to Cloudflare - November 18" translate="no">​</a></h3>
<p>Cloudflare's network experienced an outage starting at around 11:20 UTC. Since this affected Cloudflare's Sites and Services, it resulted in thousands of websites being rendered inaccessible, thus multiplying the impact.
A database permission change caused a larger than expected file to be fed into Cloudflare's Bot Management System, causing it to crash. This change was propagated all over the world, repeating the pattern we have seen in other outages where one bad
change is replicated across the global network.</p>
<p>Link to incident: <a rel="noopener noreferrer nofollow" href="https://www.cloudflarestatus.com/incidents/8gmgl950y3h7" target="_blank"></a><a href="https://www.cloudflarestatus.com/incidents/8gmgl950y3h7" target="_blank" rel="noopener noreferrer" class="">https://www.cloudflarestatus.com/incidents/8gmgl950y3h7</a><br>
<!-- -->Analysis: <a rel="noopener noreferrer nofollow" href="https://blog.cloudflare.com/18-november-2025-outage/" target="_blank"></a><a href="https://blog.cloudflare.com/18-november-2025-outage/" target="_blank" rel="noopener noreferrer" class="">https://blog.cloudflare.com/18-november-2025-outage/</a></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="openai-26-november">OpenAI - 26 November<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#openai-26-november" class="hash-link" aria-label="Direct link to OpenAI - 26 November" title="Direct link to OpenAI - 26 November" translate="no">​</a></h3>
<p>Both OpenAI APIs and ChatGPT were affected by control plane failures in some of OpenAI's GPU clusters. The failure was caused by a global change to Kubernetes namespace labels.</p>
<p>Link to incident: <a rel="noopener noreferrer nofollow" href="https://status.openai.com/incidents/3pn6gclf1cjx" target="_blank"></a><a href="https://status.openai.com/incidents/3pn6gclf1cjx" target="_blank" rel="noopener noreferrer" class="">https://status.openai.com/incidents/3pn6gclf1cjx</a></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="cloudflare---december-5">Cloudflare - December 5<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#cloudflare---december-5" class="hash-link" aria-label="Direct link to Cloudflare - December 5" title="Direct link to Cloudflare - December 5" translate="no">​</a></h3>
<p>Cloudflare's dashboard and related APIs were affected by a change that was deployed to mitigate a React Server Components vulnerability.</p>
<p>Link to incident: <a rel="noopener noreferrer nofollow" href="https://www.cloudflarestatus.com/incidents/lfrm31y6sw9q" target="_blank"></a><a href="https://www.cloudflarestatus.com/incidents/lfrm31y6sw9q" target="_blank" rel="noopener noreferrer" class="">https://www.cloudflarestatus.com/incidents/lfrm31y6sw9q</a><br>
<!-- -->A detailed write up: <a rel="noopener noreferrer nofollow" href="https://blog.cloudflare.com/5-december-2025-outage/" target="_blank"></a><a href="https://blog.cloudflare.com/5-december-2025-outage/" target="_blank" rel="noopener noreferrer" class="">https://blog.cloudflare.com/5-december-2025-outage/</a></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-impact-of-infrastructure-provider-outages">The Impact of Infrastructure Provider Outages<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#the-impact-of-infrastructure-provider-outages" class="hash-link" aria-label="Direct link to The Impact of Infrastructure Provider Outages" title="Direct link to The Impact of Infrastructure Provider Outages" translate="no">​</a></h2>
<p>In the second half of 2025, outages at three infrastructure providers caused major disruption, demonstrating once again that the lower the service in the stack, the larger the blast radius of the outage. The outages at Cloudflare (twice), Azure, and Amazon Web Services affected
thousands of downstream cloud and SaaS services and customers.</p>
<p>An infrastructure outage can affect other services like dev tools and authentication providers. An authentication provider outage can affect other services like a remote monitoring tool in turn.</p>
<p>This is also reflected in the rise in outages in September, October, and November, when we saw Azure, AWS, and Cloudflare go down.</p>
<img style="border-radius:10px;border:1px solid var(--ifm-color-emphasis-600);background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)" src="https://cdn.incidenthub.cloud/blog/cloud-outage-numbers-2025-by-month.webp" alt="Cloud Outage Numbers in 2025 by Month">
<p><em>Data source: IncidentHub's monitoring of public status pages.</em></p>
<p>Overall, cloud provider outages remain the second highest category of outages in 2025, after cybersecurity providers.</p>
<img style="border-radius:10px;border:1px solid var(--ifm-color-emphasis-600);background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)" src="https://cdn.incidenthub.cloud/blog/outages-in-2025-by-service-type.webp" alt="Cloud Outage Numbers in 2025 by Service Type">
<p><em>Data source: IncidentHub's monitoring of public status pages.</em></p>
<p>There is also another pattern in infrastructure provider outages. Such services use their own internal distributed infrastructure to propagate changes globally to control plane software like agents in other regions. A bad change will propagate with the same speed as a good one:</p>
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:4px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><b style="text-align:left"><p>From the GCP June 12th report</p></b><span style="text-align:justify"><p>"Given the global nature of quota management, this metadata was replicated globally within seconds."</p></span></div>
<br>
<p>Thus, the impact of such bad changes is not local and spreads quickly.</p>
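<p>The cascading effect described in this section can be illustrated with a small dependency graph. The sketch below is only an illustration: the service names and the dependency map are hypothetical and do not come from any real incident data. It walks the graph breadth-first to list every service that is indirectly affected when one provider fails.</p>

```python
from collections import deque

# Hypothetical dependency map: each key is a provider, each value lists the
# services that depend on it directly. The names are illustrative only.
DEPENDENTS = {
    "cloud-infrastructure": ["auth-provider", "dev-tools"],
    "auth-provider": ["remote-monitoring"],
    "dev-tools": [],
    "remote-monitoring": [],
}


def blast_radius(failed, dependents):
    """Return every service reachable from the failed one, in BFS order."""
    affected = []
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in seen:
                seen.add(service)
                affected.append(service)
                queue.append(service)
    return affected


print(blast_radius("cloud-infrastructure", DEPENDENTS))
print(blast_radius("auth-provider", DEPENDENTS))
```

<p>In this toy graph, a failure at the lowest layer reaches all three downstream services, while a failure at the authentication provider reaches only the monitoring tool.</p>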
<p>Each outage remains an opportunity to improve processes. Most large cloud providers are proactive in sharing detailed technical root causes of outages.</p>
<p>Last but not least, an outage is an extremely stressful experience for the folks on-call, especially when there are thousands of other services and users affected. Mitigating an outage in a complex, globally distributed system is a Herculean task.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="using-a-state-page-aggregator-to-monitor-cloud-outages">Using A Status Page Aggregator to Monitor Cloud Outages<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#using-a-state-page-aggregator-to-monitor-cloud-outages" class="hash-link" aria-label="Direct link to Using A Status Page Aggregator to Monitor Cloud Outages" title="Direct link to Using A Status Page Aggregator to Monitor Cloud Outages" translate="no">​</a></h2>
<p>IncidentHub is a cloud-based status page aggregator service that monitors the availability of your third-party cloud services. Outage detection is automatic and happens in real time. In 2025, IncidentHub tracked more
than 48,000 outages across hundreds of SaaS and Cloud services.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/major-cloud-outages-2025#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>2025 saw a number of major cloud outages, especially in the infrastructure layer, demonstrating the impact that a few providers can have on other services and users.</p>
<hr>
<p><em>IncidentHub is not affiliated with any of the services and vendors mentioned in this article. All logos and company names are trademarks or registered trademarks of their respective holders.</em></p>
<p>This article was first published on the <a href="https://blog.incidenthub.cloud/major-cloud-outages-2025" target="_blank" rel="noopener noreferrer" class="">IncidentHub blog</a>.</p>]]></content:encoded>
            <category>Outages</category>
            <category>Cloud</category>
        </item>
        <item>
            <title><![CDATA[How to Receive Cloud Outage Alerts in Microsoft Teams]]></title>
            <link>https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams</link>
            <guid>https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams</guid>
            <pubDate>Wed, 26 Nov 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how you can use your Microsoft Teams channel to receive timely outage alerts for your third-party services.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-impact-of-cloud-outages-on-your-business">The Impact of Cloud Outages on Your Business<a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#the-impact-of-cloud-outages-on-your-business" class="hash-link" aria-label="Direct link to The Impact of Cloud Outages on Your Business" title="Direct link to The Impact of Cloud Outages on Your Business" translate="no">​</a></h2>
<p>Cloud outages like the recent ones at Cloudflare, Microsoft Azure, and AWS can have a significant impact on your business with downtime, lost revenue, and unhappy customers. They can also disrupt your team's ability to work effectively. To stay on top of such outages, your team needs to know about them in an easy and timely way.</p>
<p>In this article, we will see how to integrate IncidentHub cloud outage alerts with Microsoft Teams.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/how-to-receive-cloud-outage-alerts-in-microsoft-teams.webp" alt="How to Receive Cloud Outage Alerts in Microsoft Teams">
<hr>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#the-impact-of-cloud-outages-on-your-business" class="">The Impact of Cloud Outages on Your Business</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#how-does-incidenthub-work" class="">How Does IncidentHub Work?</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#configuring-incidenthub-to-send-alerts-to-microsoft-teams" class="">Configuring IncidentHub to Send Alerts to Microsoft Teams</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#adding-an-integration-for-microsoft-teams" class="">Adding an integration for Microsoft Teams</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#creating-a-workflow-in-microsoft-teams" class="">Creating a Workflow in Microsoft Teams</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#connecting-incidenthub-with-the-workflow" class="">Connecting IncidentHub With the Workflow</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#fine-tuning-your-alerts" class="">Fine-tuning Your Alerts</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#understanding-the-alerts" class="">Understanding the Alerts</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#conclusion" class="">Conclusion</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="how-does-incidenthub-work">How Does IncidentHub Work?<a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#how-does-incidenthub-work" class="hash-link" aria-label="Direct link to How Does IncidentHub Work?" title="Direct link to How Does IncidentHub Work?" translate="no">​</a></h2>
<p>IncidentHub is a cloud-based status page aggregator service that monitors the availability of your third-party cloud services by checking public status pages. It does this by using a combination of API calls, webhooks, RSS feeds, and other methods.</p>
<p>Outage detection is automatic and happens in real time. Users can choose to receive notifications for specific services in a tool of their choice, such as email, Slack, Discord, Microsoft Teams, or PagerDuty. IncidentHub integrates with Microsoft Teams to send alerts to your team's channel.</p>
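<p>To make the "API calls" part concrete, here is a minimal sketch of checking one public status page. It is not IncidentHub's implementation. It assumes the page is hosted on Atlassian Statuspage and exposes the public /api/v2/status.json summary, whose status.indicator field is "none" when everything is operational. The function names are hypothetical.</p>

```python
import json
import urllib.request

# In Statuspage-style summaries, "none" means all systems are operational.
# Anything else (including a missing field) is flagged in this sketch.
HEALTHY_INDICATOR = "none"


def parse_status(payload):
    """Extract (indicator, description) from a status.json-style payload."""
    status = payload.get("status", {})
    return status.get("indicator", "unknown"), status.get("description", "")


def has_incident(payload):
    """Return True when the summary reports anything other than healthy."""
    indicator, _ = parse_status(payload)
    return indicator != HEALTHY_INDICATOR


def fetch_status(base_url, timeout=10):
    """Download the status summary for a status page (assumed endpoint)."""
    url = base_url.rstrip("/") + "/api/v2/status.json"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Offline example with a hand-written payload in the assumed shape.
sample = {"status": {"indicator": "major", "description": "Partial System Outage"}}
print(parse_status(sample))
print(has_incident(sample))
```

<p>A real aggregator would poll many such pages on a schedule and handle providers that publish status in other formats, which is why a mix of methods is needed.</p>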
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="configuring-incidenthub-to-send-alerts-to-microsoft-teams">Configuring IncidentHub to Send Alerts to Microsoft Teams<a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#configuring-incidenthub-to-send-alerts-to-microsoft-teams" class="hash-link" aria-label="Direct link to Configuring IncidentHub to Send Alerts to Microsoft Teams" title="Direct link to Configuring IncidentHub to Send Alerts to Microsoft Teams" translate="no">​</a></h2>
<p>Once you have created an IncidentHub account you can choose the cloud services you want to monitor. This is covered in detail in the <a href="https://docs.incidenthub.cloud/incidenthub-documentation/services/monitoring-a-service" target="_blank" rel="noopener noreferrer" class="">IncidentHub documentation</a>. Next, we need to integrate IncidentHub with your Microsoft Teams channel so that it can send the notifications there.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="adding-an-integration-for-microsoft-teams">Adding an integration for Microsoft Teams<a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#adding-an-integration-for-microsoft-teams" class="hash-link" aria-label="Direct link to Adding an integration for Microsoft Teams" title="Direct link to Adding an integration for Microsoft Teams" translate="no">​</a></h3>
<p>IncidentHub integrates with Microsoft Teams using Workflows. Let's do this step by step.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="creating-a-workflow-in-microsoft-teams">Creating a Workflow in Microsoft Teams<a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#creating-a-workflow-in-microsoft-teams" class="hash-link" aria-label="Direct link to Creating a Workflow in Microsoft Teams" title="Direct link to Creating a Workflow in Microsoft Teams" translate="no">​</a></h4>
<p>→ Log in to your Microsoft account and click on "Teams and channels" on the left sidebar in the Microsoft Teams app.</p>
<img style="border-radius:10px;border:1px;max-width:70%" src="https://cdn.incidenthub.cloud/blog/teams-workflow-menu.webp" alt="Microsoft Teams workflow menu">
<p><br>
<!-- -->→ Click on the 3 dots (...) next to the channel you wish to receive notifications in.</p>
<p>→ Click on "Workflows" in the menu.</p>
<p>→ In the popup window, search for "webhook" and choose the template "Send webhook alerts to a channel".</p>
<img style="border-radius:10px;border:1px;max-width:70%" src="https://cdn.incidenthub.cloud/blog/teams-workflow-webhook.webp" alt="Microsoft Teams workflow webhook template">
<p><br>
<!-- -->→ Enter a descriptive name for the workflow.</p>
<img style="border-radius:10px;border:1px;max-width:70%" src="https://cdn.incidenthub.cloud/blog/teams-workflow-name.webp" alt="Microsoft Teams workflow name">
<p><br>
<!-- -->→ Confirm the Team and the Channel name.</p>
<img style="border-radius:10px;border:1px;max-width:70%" src="https://cdn.incidenthub.cloud/blog/teams-webhook-channel.webp" alt="Microsoft Teams workflow channel name">
<p><br>
<!-- -->→ Click on Add workflow.</p>
<p>→ Copy the HTTPS URL on the next screen. You will use it in the next steps.</p>
<img style="border-radius:10px;border:1px;max-width:70%" src="https://cdn.incidenthub.cloud/blog/teams-workflow-url.webp" alt="Microsoft Teams workflow URL">
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="connecting-incidenthub-with-the-workflow">Connecting IncidentHub With the Workflow<a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#connecting-incidenthub-with-the-workflow" class="hash-link" aria-label="Direct link to Connecting IncidentHub With the Workflow" title="Direct link to Connecting IncidentHub With the Workflow" translate="no">​</a></h4>
<p>→ Log in to your IncidentHub account and click on Channels -&gt; Add -&gt; Microsoft Teams.</p>
<p>→ Add a Name and a Description.</p>
<p>→ Under "Microsoft Teams Workflow URL", paste the URL that you copied earlier.</p>
<img style="border-radius:10px;border:1px;max-width:70%" src="https://cdn.incidenthub.cloud/blog/add-to-microsoft-teams.webp" alt="Microsoft Teams workflow URL input">
<p>→ To ensure the URL is valid and IncidentHub is able to connect to it, you can click on "Send a test message". This will send a test notification to your Microsoft Teams channel.</p>
<p>→ Click on "Save".</p>
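<p>If you want to verify the workflow URL independently of IncidentHub, you can post a message to it yourself. The sketch below is an assumption-laden example, not the payload IncidentHub sends: it uses the message envelope with an Adaptive Card attachment that Microsoft documents for webhook-triggered workflows, and the function names are hypothetical. Replace the URL argument with the HTTPS URL you copied from Teams.</p>

```python
import json
import urllib.request


def build_test_card(text):
    """Wrap a short text in a message envelope carrying one Adaptive Card."""
    return {
        "type": "message",
        "attachments": [
            {
                "contentType": "application/vnd.microsoft.card.adaptive",
                "content": {
                    "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
                    "type": "AdaptiveCard",
                    "version": "1.2",
                    "body": [{"type": "TextBlock", "text": text, "wrap": True}],
                },
            }
        ],
    }


def send_test_message(workflow_url, text, timeout=10):
    """POST the card as JSON to the workflow URL and return the HTTP status."""
    data = json.dumps(build_test_card(text)).encode("utf-8")
    request = urllib.request.Request(
        workflow_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Print the payload without sending anything over the network.
print(json.dumps(build_test_card("Test notification"), indent=2))
```

<p>Calling send_test_message with your workflow URL should make the text appear in the channel you selected. Treat the workflow URL like a secret, since anyone who has it can post to the channel.</p>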
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="fine-tuning-your-alerts">Fine-tuning Your Alerts<a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#fine-tuning-your-alerts" class="hash-link" aria-label="Direct link to Fine-tuning Your Alerts" title="Direct link to Fine-tuning Your Alerts" translate="no">​</a></h3>
<p>To reduce alert noise, fine-tune your notifications so that you receive only the alerts that are relevant to you. You can do this by:</p>
<ol>
<li class="">Choosing specific components to monitor.</li>
<li class="">Choosing specific types of alerts to receive - outages, maintenance events, or both. Services like Twilio, Cloudflare, and Salesforce have a ton of maintenance events and they can easily overwhelm your alert notifications.</li>
<li class="">Choosing the lifecycle events that you are interested in - beginning, end, or all updates.</li>
</ol>
<p>These topics are explained in detail in the <a href="https://docs.incidenthub.cloud/incidenthub-documentation/services/monitoring-a-service#adding-and-removing-components" target="_blank" rel="noopener noreferrer" class="">IncidentHub documentation</a>.</p>
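<p>As a rough illustration of how the three filters above combine, here is a minimal Python sketch. It is not IncidentHub's implementation: the AlertFilter class, its field names, and the convention that an empty component set means "all components" are assumptions made for this example.</p>

```python
from dataclasses import dataclass, field


@dataclass
class AlertFilter:
    """Hypothetical per-service filter: components, alert types, lifecycle events."""

    # Assumption for this sketch: an empty set means "all components".
    components: set[str] = field(default_factory=set)
    kinds: set[str] = field(default_factory=lambda: {"outage", "maintenance"})
    lifecycle: set[str] = field(default_factory=lambda: {"start", "update", "end"})

    def allows(self, component: str, kind: str, stage: str) -> bool:
        """Return True only if the alert passes all three filters."""
        component_ok = not self.components or component in self.components
        return component_ok and kind in self.kinds and stage in self.lifecycle


# Only outages on the Webhooks component, and only when they begin or end.
quiet = AlertFilter(components={"Webhooks"}, kinds={"outage"}, lifecycle={"start", "end"})
print(quiet.allows("Webhooks", "outage", "start"))       # True
print(quiet.allows("Webhooks", "maintenance", "start"))  # False
```

<p>An alert is delivered only when all three checks pass, which is why narrowing any one of them reduces noise.</p>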
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="understanding-the-alerts">Understanding the Alerts<a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#understanding-the-alerts" class="hash-link" aria-label="Direct link to Understanding the Alerts" title="Direct link to Understanding the Alerts" translate="no">​</a></h3>
<p>A Microsoft Teams notification sent by IncidentHub looks like this:</p>
<img style="border-radius:10px;border:1px;max-width:70%" src="https://cdn.incidenthub.cloud/blog/teams-alert.webp" alt="Microsoft Teams notification for IncidentHub">
<p>The alert is sent as an Adaptive Card and summarizes all the important information about the outage or maintenance event. The colored bar on the left indicates the type of event.</p>
<div style="margin-top:16px;margin-bottom:16px;padding:16px 20px 2px;border:1px solid #e5e7eb;border-radius:8px;background-color:#f9fafb;box-shadow:0 1px 3px 0 rgba(0, 0, 0, 0.1)"><p><span style="display:inline-block;width:12px;height:12px;background-color:#EF4444;margin-right:8px;vertical-align:middle"></span> Outage triggered or updated<br></p><p><span style="display:inline-block;width:12px;height:12px;background-color:#22c55e;margin-right:8px;vertical-align:middle"></span> Outage resolved or maintenance completed<br></p><p><span style="display:inline-block;width:12px;height:12px;background-color:#3b82f6;margin-right:8px;vertical-align:middle"></span> Maintenance ongoing<br></p><p><span style="display:inline-block;width:12px;height:12px;background-color:#EAB308;margin-right:8px;vertical-align:middle"></span> Upcoming maintenance reminder</p></div>
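<p>If you post-process these alerts yourself, the legend can be written down as a small lookup table. The sketch below is only an illustration: the hex colors are copied from the legend, but the event labels and the bar_color function are hypothetical names, not part of any IncidentHub API.</p>

```python
# Hex colors copied from the legend; the event labels are hypothetical.
EVENT_COLORS = {
    "outage_triggered": "#EF4444",
    "outage_updated": "#EF4444",
    "outage_resolved": "#22c55e",
    "maintenance_completed": "#22c55e",
    "maintenance_ongoing": "#3b82f6",
    "maintenance_reminder": "#EAB308",
}


def bar_color(event: str) -> str:
    """Return the legend color for an event label, rejecting unknown labels."""
    try:
        return EVENT_COLORS[event]
    except KeyError:
        raise ValueError(f"unknown event label: {event}") from None


print(bar_color("maintenance_ongoing"))  # prints #3b82f6
```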
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>IncidentHub's integration with Microsoft Teams is a powerful way to stay informed about cloud outages and maintenance events. It's easy to set up and can be customized to send only the relevant alerts to your team's channel.
<a href="https://incidenthub.cloud/" target="_blank" rel="noopener noreferrer" class="">Try it out</a>.</p>
<hr>
<p>Photo by <a href="https://unsplash.com/@hazelz?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Hazel Z</a> on <a href="https://unsplash.com/photos/a-computer-screen-with-a-cloud-shaped-object-on-top-of-it-FocSgUZ10JM?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></p>
<p>IncidentHub is not affiliated with any of the services and vendors mentioned in this article. All logos and company names are trademarks or registered trademarks of their respective holders.</p>
<p>This article was first published on the <a href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams" target="_blank" rel="noopener noreferrer" class="">IncidentHub blog</a>.</p>]]></content:encoded>
            <category>Microsoft Teams</category>
            <category>Alerting</category>
        </item>
        <item>
            <title><![CDATA[Product Update - Turn Off Alerts, Use Microsoft Teams, and Custom Domains]]></title>
            <link>https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains</link>
            <guid>https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains</guid>
            <pubDate>Wed, 29 Oct 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[We have added several new features to IncidentHub to make it easier to fine-tune your alerts, plus support for Microsoft Teams and custom domains.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="whats-new-in-incidenthub">What's New In IncidentHub?<a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#whats-new-in-incidenthub" class="hash-link" aria-label="Direct link to What's New In IncidentHub?" title="Direct link to What's New In IncidentHub?" translate="no">​</a></h2>
<p>Over the last few months IncidentHub has added several new features to make it easier to fine-tune your alerts. IncidentHub now also integrates with Microsoft Teams and supports custom domains for your public status pages. Let's take a comprehensive look at what's new.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/incidenthub-banner.webp" alt="IncidentHub Banner">
<hr>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#whats-new-in-incidenthub" class="">What's New In IncidentHub?</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#fine-tuning-alerts" class="">Fine-Tuning Alerts</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#faster-component-filtering" class="">Faster Component Filtering</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#microsoft-teams-integration-beta" class="">Microsoft Teams Integration (Beta)</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#detailed-historical-data-upto-90-days" class="">Detailed Historical Data Up to 90 Days</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#custom-domains-for-public-status-pages-beta" class="">Custom Domains for Public Status Pages (Beta)</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#test-buttons-for-notification-channels" class="">Test Buttons for Notification Channels</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#wrapping-up" class="">Wrapping Up</a></li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="fine-tuning-alerts">Fine-Tuning Alerts<a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#fine-tuning-alerts" class="hash-link" aria-label="Direct link to Fine-Tuning Alerts" title="Direct link to Fine-Tuning Alerts" translate="no">​</a></h3>
<p>You could already choose specific components when monitoring a service. On top of that, you can now turn off specific types of event alerts for a service.
For most services, IncidentHub can detect outages as well as maintenance events. Some services have a ton of maintenance events and they can easily overwhelm your alert notifications.
You can turn off either maintenance alerts, or outage alerts, or both, or none. This setting is per-service.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/maintenance-outage-alerts.webp" alt="Fine-tuning alerts">
<p><br>
<!-- -->The existing settings for lifecycle (start/end/updates) remain the same.</p>
<p>One of my worries while adding this feature was that it might make the user experience more complex.</p>
<p><em>Too many settings, which one should I choose?</em></p>
<p>However, this feature was more important than the lifecycle settings, so I had to add it. All settings in the Notifications tab have sane defaults. You can always change them later, depending on your needs, or you can leave them as-is.</p>
<p>Whatever you see in the Notifications tab was requested by customers with real use-cases. At the same time, I think it's also important to have a simple UI. I hope the current UI is a good balance between these two goals.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="faster-component-filtering">Faster Component Filtering<a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#faster-component-filtering" class="hash-link" aria-label="Direct link to Faster Component Filtering" title="Direct link to Faster Component Filtering" translate="no">​</a></h3>
<p>Salesforce and Google Cloud Platform, to take two examples, each have thousands of components.
A list of that size was a challenge to load in the UI, and just as much of a challenge for the user to scroll through and select from.</p>
<p>The new UI has a search box where you can type and choose components. It's similar in behavior to the search box where you can search for services when you are adding a new monitored service.
Behind the scenes, the difference is that it uses <a href="https://tanstack.com/virtual/latest" target="_blank" rel="noopener noreferrer" class="">TanStack's react-virtual library</a>.</p>
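<p>To give a feel for why virtualization helps with thousands of components, here is a minimal Python sketch of the general windowing idea: filter the list by the search text, then render only the rows that fall inside the viewport. This is not TanStack's API or IncidentHub's code. The function names, the fixed row height, and the overscan parameter are assumptions made for this example.</p>

```python
def search(components: list[str], query: str) -> list[str]:
    """Case-insensitive substring match, as a stand-in for the search box."""
    needle = query.strip().lower()
    return [name for name in components if needle in name.lower()]


def visible_range(scroll_top: int, viewport_height: int, row_height: int,
                  total_rows: int, overscan: int = 2) -> range:
    """Return the row indexes worth rendering for the current scroll position.

    Assumes every row has the same height. A few extra rows (overscan) are
    included on each side so that fast scrolling does not show blank gaps.
    """
    first = max(0, scroll_top // row_height - overscan)
    # Ceiling division so that a partially visible last row is included.
    last_visible = -(-(scroll_top + viewport_height) // row_height)
    last = min(total_rows, last_visible + overscan)
    return range(first, last)


# 5000 components, 30px rows, 300px viewport: only 12 rows are rendered at the top.
print(len(visible_range(0, 300, 30, 5000)))  # prints 12
```

<p>The cost of rendering then depends on the viewport size rather than on the total number of components.</p>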
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/faster-component-selection.webp" alt="Easier and faster component filtering">
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="microsoft-teams-integration-beta">Microsoft Teams Integration (Beta)<a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#microsoft-teams-integration-beta" class="hash-link" aria-label="Direct link to Microsoft Teams Integration (Beta)" title="Direct link to Microsoft Teams Integration (Beta)" translate="no">​</a></h3>
<p>IncidentHub now supports sending alerts to Microsoft Teams. This is a beta feature, available on request for all paid plans and rolled out in phases.
Under the hood it uses Adaptive Cards, which support JSON-based templating, to push rich notifications to Teams channels. The benefit of the templating model is that it separates data from layout, so the layout can change independently of the notification content.</p>
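<p>The separation of data from layout can be sketched in a few lines of Python. This is a simplified stand-in, not the Adaptive Cards templating SDK and not IncidentHub's actual card: the expand function only handles plain ${key} placeholders, and the card fields and sample data are made up for this example.</p>

```python
import json

# Layout: an Adaptive Card whose text fields contain ${...} placeholders.
TEMPLATE = {
    "type": "AdaptiveCard",
    "version": "1.4",
    "body": [
        {"type": "TextBlock", "text": "${service}: ${title}", "weight": "Bolder"},
        {"type": "TextBlock", "text": "${status}", "wrap": True},
    ],
}


def expand(node, data):
    """Recursively replace ${key} placeholders in every string of the layout."""
    if isinstance(node, dict):
        return {key: expand(value, data) for key, value in node.items()}
    if isinstance(node, list):
        return [expand(item, data) for item in node]
    if isinstance(node, str):
        for key, value in data.items():
            node = node.replace("${" + key + "}", str(value))
    return node


# Data: hypothetical values for one notification.
card = expand(TEMPLATE, {"service": "GitHub", "title": "Webhook delays", "status": "Investigating"})
print(json.dumps(card, indent=2))
```

<p>Because the template is never modified, the same layout can be reused for every notification, and a layout change does not touch the code that produces the data.</p>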
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/microsoft-teams-incidenthub.webp" alt="Microsoft Teams integration">
<p><br>
<!-- -->This is a screenshot from our production IncidentHub MS Teams channel which monitors IncidentHub's own dependencies:</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/microsoft-teams-incidenthub-alerts.webp" alt="Microsoft Teams integration screenshot">
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="detailed-historical-data-upto-90-days">Detailed Historical Data Up to 90 Days<a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#detailed-historical-data-upto-90-days" class="hash-link" aria-label="Direct link to Detailed Historical Data Up to 90 Days" title="Direct link to Detailed Historical Data Up to 90 Days" translate="no">​</a></h3>
<p>The previous version of the availability dashboard had just vertical bars indicating outage/non-outage days. The new page includes a list of the incidents that occurred in the last 90 days (30 days in free accounts).</p>
<p>The incidents and maintenance events that you will see in the list are automatically filtered by the components that you choose when adding the service. For example, if you are monitoring only GitHub webhooks and GitHub had outages in other components, you will not see any red days on the availability page. This behaviour mimics that of the main dashboard (where you can see a summary of incidents) and the public status page (where you can see a summary view as well as the list of components affected, plus the historical view as bars).</p>
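<p>The filtering rule can be illustrated with a short Python sketch. It is an assumption-laden example rather than IncidentHub's implementation: the outage_days function, the shape of the incident data, and the rule that an empty selection means "all components" are all hypothetical.</p>

```python
from datetime import date


def outage_days(incidents, monitored):
    """Return the sorted days on which a monitored component had an incident.

    incidents: iterable of (day, affected_components) pairs (hypothetical shape).
    monitored: set of component names chosen for the service; in this sketch
               an empty set is treated as "all components".
    """
    days = set()
    for day, affected in incidents:
        if not monitored or monitored.intersection(affected):
            days.add(day)
    return sorted(days)


incidents = [
    (date(2025, 10, 1), {"Actions"}),
    (date(2025, 10, 3), {"Webhooks", "API Requests"}),
]

# Monitoring only Webhooks: the Actions incident on 1 October is not counted.
print(outage_days(incidents, {"Webhooks"}))
```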
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/historical-data.webp" alt="Detailed historical data">
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="custom-domains-for-public-status-pages-beta">Custom Domains for Public Status Pages (Beta)<a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#custom-domains-for-public-status-pages-beta" class="hash-link" aria-label="Direct link to Custom Domains for Public Status Pages (Beta)" title="Direct link to Custom Domains for Public Status Pages (Beta)" translate="no">​</a></h3>
<p>Custom domains are finally here for <a class="" href="https://blog.incidenthub.cloud/product-update-public-status-pages">public status pages</a>.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/public-status-page-with-url.webp" alt="Custom domain for public status page">
<p><br>
<!-- -->This feature is in beta so there is no UI yet to configure it. You can request it by contacting <a href="mailto:support@incidenthub.cloud" target="_blank" rel="noopener noreferrer" class="">support@incidenthub.cloud</a>.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="test-buttons-for-notification-channels">Test Buttons for Notification Channels<a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#test-buttons-for-notification-channels" class="hash-link" aria-label="Direct link to Test Buttons for Notification Channels" title="Direct link to Test Buttons for Notification Channels" translate="no">​</a></h3>
<p>Slack and Microsoft Teams both have a "Test" button that you can click to check if the channel is working.
This is available before you add the channel and also after you have done so.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/channel-test-buttons.webp" alt="Test button for Microsoft Teams">
<p><br>
<!-- -->The buttons will send a test message to the channel to verify that it is working.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="wrapping-up">Wrapping Up<a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains#wrapping-up" class="hash-link" aria-label="Direct link to Wrapping Up" title="Direct link to Wrapping Up" translate="no">​</a></h2>
<p>What do you think? Let me know on <a href="https://x.com/Incident_Hub" target="_blank" rel="noopener noreferrer" class="">X</a> or <a href="https://bsky.app/profile/incidenthub.bsky.social" target="_blank" rel="noopener noreferrer" class="">Bluesky</a>.</p>
<p>This article was first published on the <a href="https://blog.incidenthub.cloud/product-update-filter-alerts-ms-teams-custom-domains" target="_blank" rel="noopener noreferrer" class="">IncidentHub blog</a>.</p>]]></content:encoded>
            <category>Product</category>
            <category>Microsoft Teams</category>
            <category>Custom Domains</category>
            <category>Alerting</category>
        </item>
        <item>
            <title><![CDATA[The 2025 Guide to Open Source Status Page Software]]></title>
            <link>https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software</link>
            <guid>https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software</guid>
            <pubDate>Wed, 15 Oct 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[This article lists the different open source status page software to consider for managing your status page in 2025.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p><em>This is an updated version of the <a class="" href="https://blog.incidenthub.cloud/The-2024-Guide-to-Open-Source-Status-Page-Providers">2024 article</a></em>.</p>
<p>Maintaining transparent communication about service availability is crucial for businesses of all sizes. Status pages are an important part of your communication
strategy during times of outages and maintenance events.</p>
<p>You can <a class="" href="https://blog.incidenthub.cloud/Best-Practices-Choosing-Status-Page-Provider">choose</a> to go with a fully managed status page provider or host an open-source one yourself.</p>
<p>Open source status page software offers a cost-effective and customizable solution where you have complete control over the code, data, and presentation.
This guide explores the best available open source status page software in 2025 to help you choose the right tool for your needs.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/public-status-page.webp" alt="Public Status Page Example">
<hr>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#list-of-open-source-status-page-software" class="">List of Open Source Status Page Software</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#1-cachet" class="">1. Cachet</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#2-statping-ng" class="">2. Statping-ng</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#3-cstate" class="">3. Cstate</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#4-upptime" class="">4. Upptime</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#5-vigil" class="">5. Vigil</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#6-gatus" class="">6. Gatus</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#7-statuspal" class="">7. StatusPal</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#8-uptime-kuma" class="">8. Uptime Kuma</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#9-oneuptime" class="">9. OneUptime</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#10-kener" class="">10. Kener</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#11-openstatus" class="">11. OpenStatus</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#12-uptimeflare" class="">12. UptimeFlare</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#choosing-the-right-open-source-status-page-software" class="">Choosing the Right Open Source Status Page Software</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#summary" class="">Summary</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#conclusion" class="">Conclusion</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="list-of-open-source-status-page-software">List of Open Source Status Page Software<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#list-of-open-source-status-page-software" class="hash-link" aria-label="Direct link to List of Open Source Status Page Software" title="Direct link to List of Open Source Status Page Software" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-cachet">1. Cachet<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#1-cachet" class="hash-link" aria-label="Direct link to 1. Cachet" title="Direct link to 1. Cachet" translate="no">​</a></h3>
<p>Cachet is a popular open source status page system built with PHP and Laravel. It offers a clean, minimalist design and a robust feature set.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li class="">Easy installation and setup.</li>
<li class="">Metric graphs for visualizing performance.</li>
<li class="">Maintenance scheduling.</li>
<li class="">Multilingual support.</li>
<li class="">Metrics.</li>
<li class="">Service components.</li>
<li class="">Two-factor authentication.</li>
<li class="">JSON API for automation.</li>
<li class="">Subscriber notifications via email.</li>
</ul>
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;border-radius:10px;padding-bottom:1px;padding-top:20px;padding-left:20px;border-left:3px solid #60a5fa"></p><p><strong>Home Page</strong>: <a href="https://cachethq.io/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://cachethq.io/" target="_blank" rel="noopener noreferrer" class="">https://cachethq.io/</a>
<br>
<strong>Source Code</strong>: <a href="https://github.com/cachethq/cachet" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://github.com/cachethq/cachet" target="_blank" rel="noopener noreferrer" class="">https://github.com/cachethq/cachet</a>
<br>
<strong>Demo Site</strong>: <a href="https://v3.cachethq.io/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://v3.cachethq.io/" target="_blank" rel="noopener noreferrer" class="">https://v3.cachethq.io/</a>
<br>
<strong>Managed Version Available</strong>: No</p><p></p>
<p><strong>Live page examples</strong>:</p>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.gnome.org/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">GNOME Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/cachet-gnome-status-page.webp" alt="Gnome Status Page"></div>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.eea.europa.eu/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">EEA Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/cachet-eea-status-page.webp" alt="EEA Status Page"></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-statping-ng">2. Statping-ng<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#2-statping-ng" class="hash-link" aria-label="Direct link to 2. Statping-ng" title="Direct link to 2. Statping-ng" translate="no">​</a></h3>
<p>Statping-ng is a Go-based status page that emphasizes simplicity and ease of use. It supports both SQLite and MySQL databases.
Statping-ng is an updated replacement for Statping, created after development stopped on the original project.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li class="">Lightweight and fast.</li>
<li class="">Built-in monitoring capabilities for HTTP, TCP, UDP, gRPC, and ICMP.</li>
<li class="">Customizable response checks.</li>
<li class="">Customizable HTTP checks with headers and body.</li>
<li class="">Mobile-friendly design.</li>
<li class="">OAuth authentication from GitHub, Google, and Slack.</li>
<li class="">Native mobile apps for Android and iOS.</li>
<li class="">Customizable themes.</li>
<li class="">Prometheus exporter for advanced monitoring.</li>
<li class="">Notifications on Slack, Email, and Twilio.</li>
<li class="">Plugin framework.</li>
</ul>
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;border-radius:10px;padding-bottom:1px;padding-top:20px;padding-left:20px;border-left:3px solid #60a5fa"></p><p><strong>Home Page</strong>: <a href="https://statping-ng.github.io/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://statping-ng.github.io/" target="_blank" rel="noopener noreferrer" class="">https://statping-ng.github.io/</a>
<br>
<strong>Source Code</strong>: <a href="https://github.com/statping-ng/statping-ng" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://github.com/statping-ng/statping-ng" target="_blank" rel="noopener noreferrer" class="">https://github.com/statping-ng/statping-ng</a>
<br>
<strong>Demo Site</strong>: NA
<br>
<strong>Managed Version Available</strong>: No</p><p></p>
<p><strong>Live page examples</strong>:</p>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.jellyfin.org/service/repository" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Jellyfin Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/statping-ng-jellyfin-status-page.webp" alt="Jellyfin Status Page"></div>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.bonifacelabs.net/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Boniface Labs Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/statping-ng-bonifacelabs-status-page.webp" alt="Boniface Labs Status Page"></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-cstate">3. Cstate<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#3-cstate" class="hash-link" aria-label="Direct link to 3. Cstate" title="Direct link to 3. Cstate" translate="no">​</a></h3>
<p>Cstate is a Hugo-based static status page generator that emphasizes simplicity and performance.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li class="">Fast page load times.</li>
<li class="">Fully customizable through Hugo templates.</li>
<li class="">Supports multiple content formats (YAML, JSON, TOML).</li>
<li class="">Multilingual support.</li>
<li class="">Automatic light/dark mode.</li>
<li class="">Easy deployment to Netlify or GitHub Pages.</li>
</ul>
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;border-radius:10px;padding-bottom:1px;padding-top:20px;padding-left:20px;border-left:3px solid #60a5fa"></p><p><strong>Home Page</strong>: <a href="https://cstate.netlify.app/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://cstate.netlify.app/" target="_blank" rel="noopener noreferrer" class="">https://cstate.netlify.app</a>
<br>
<strong>Source Code</strong>: <a href="https://github.com/cstate/cstate" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://github.com/cstate/cstate" target="_blank" rel="noopener noreferrer" class="">https://github.com/cstate/cstate</a>
<br>
<strong>Demo Site</strong>: <a href="https://cstate.mnts.lt/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://cstate.mnts.lt/" target="_blank" rel="noopener noreferrer" class="">https://cstate.mnts.lt</a>
<br>
<strong>Managed Version Available</strong>: No</p><p></p>
<p><strong>Live page examples</strong>:</p>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.chocolatey.org/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Chocolatey Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/cstate-chocolatey-status-page.webp" alt="Chocolatey Status Page"></div>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.testing-farm.io/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Testing Farm Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/cstate-testing-farm-status-page.webp" alt="Testing Farm Status Page"></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-upptime">4. Upptime<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#4-upptime" class="hash-link" aria-label="Direct link to 4. Upptime" title="Direct link to 4. Upptime" translate="no">​</a></h3>
<p>Upptime is a GitHub-powered open source uptime monitor and status page generator.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li class="">No server required - runs entirely on GitHub Actions.</li>
<li class="">Scheduled maintenance support.</li>
<li class="">Workflows integrate with GitHub - issue creation, summary, response graph generation, response time calculation, etc.</li>
<li class="">Real-time notifications via GitHub Issues.</li>
<li class="">Custom domain and SSL support through GitHub Pages.</li>
<li class="">Graphs and badges for displaying uptime.</li>
<li class="">Integrates with various monitoring services.</li>
<li class="">Notifications via Slack, Telegram, Discord, Zulip, Microsoft Teams, Gotify, Email, SMS, and custom webhooks.</li>
</ul>
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;border-radius:10px;padding-bottom:1px;padding-top:20px;padding-left:20px;border-left:3px solid #60a5fa"></p><p><strong>Home Page</strong>: <a href="https://upptime.js.org/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://upptime.js.org/" target="_blank" rel="noopener noreferrer" class="">https://upptime.js.org</a>
<br>
<strong>Source Code</strong>: <a href="https://github.com/upptime/upptime" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://github.com/upptime/upptime" target="_blank" rel="noopener noreferrer" class="">https://github.com/upptime/upptime</a>
<br>
<strong>Demo Site</strong>: <a href="https://demo.upptime.js.org/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://demo.upptime.js.org/" target="_blank" rel="noopener noreferrer" class="">https://demo.upptime.js.org</a>
<br>
<strong>Managed Version Available</strong>: No, but powered by GitHub Actions.</p><p></p>
<p><strong>Live page examples</strong>:</p>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.opensourcepos.org/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Open Source POS Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/upptime-opensourcerepos-status-page.webp" alt="Open Source POS Status Page"></div>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.frai.se/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Frai.se Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/upptime-fraise-status-page.webp" alt="Frai.se Status Page"></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-vigil">5. Vigil<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#5-vigil" class="hash-link" aria-label="Direct link to 5. Vigil" title="Direct link to 5. Vigil" translate="no">​</a></h3>
<p>Vigil is a lightweight status page written in Rust, designed for high performance and low resource usage.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li class="">Simple configuration using TOML files.</li>
<li class="">Built-in monitoring capabilities for HTTP/TCP/ICMP, application services, and local services (e.g. services on a separate network that the main server cannot reach directly).</li>
<li class="">Support for push notifications.</li>
<li class="">Planned maintenance notices.</li>
<li class="">Integrates with Email, Twilio (SMS), Slack, Zulip, Telegram, Pushover, Gotify, XMPP, Matrix, Cisco Webex, and Webhooks.</li>
</ul>
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;border-radius:10px;padding-bottom:1px;padding-top:20px;padding-left:20px;border-left:3px solid #60a5fa"></p><p><strong>Home Page</strong>: <a href="https://crates.io/crates/vigil-server" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://crates.io/crates/vigil-server" target="_blank" rel="noopener noreferrer" class="">https://crates.io/crates/vigil-server</a>
<br>
<strong>Source Code</strong>: <a href="https://github.com/valeriansaliou/vigil" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://github.com/valeriansaliou/vigil" target="_blank" rel="noopener noreferrer" class="">https://github.com/valeriansaliou/vigil</a>
<br>
<strong>Demo Site</strong>: NA
<br>
<strong>Managed Version Available</strong>: Crisp offers a managed version of Vigil ported to their own infrastructure.</p><p></p>
<p><strong>Live page examples</strong>:</p>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.crisp.chat/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Crisp Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/vigil-crisp-status-page.webp" alt="Crisp Status Page"></div>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.autoinspector.ai/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Autoinspector Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/vigil-autoinspector-status-page.webp" alt="Autoinspector Status Page"></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="6-gatus">6. Gatus<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#6-gatus" class="hash-link" aria-label="Direct link to 6. Gatus" title="Direct link to 6. Gatus" translate="no">​</a></h3>
<p>Gatus is a health dashboard and status page that monitors services and endpoints using HTTP, TCP, and other network protocols.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li class="">Built-in health checks for HTTP, ICMP, TCP, and DNS.</li>
<li class="">Condition-based rule evaluation of health query results.</li>
<li class="">Support for tunneling to monitor services on a different network.</li>
<li class="">Customizable alerting with support for many providers, including AWS SES, Discord, Email, Gitea, GitHub, GitLab, Google Chat, Gotify, HomeAssistant, IFTTT, Ilert, Incident.io, Line, Matrix, Mattermost, Messagebird, n8n, New Relic, Ntfy, Opsgenie, PagerDuty, Plivo, Pushover, Rocket.Chat, SendGrid, Signal, SIGNL4, Slack, Splunk, Squadcast, Teams, Teams Workflow, Telegram, Twilio, Vonage, Webex, Zapier, and Zulip.</li>
<li class="">Customizable alert thresholds.</li>
<li class="">Support for announcing planned maintenance.</li>
<li class="">Basic auth and OIDC support.</li>
<li class="">Metrics visualization.</li>
<li class="">Easy to configure using YAML.</li>
<li class="">Automatic badge generation for monitored endpoints.</li>
<li class="">Docker support for easy deployment.</li>
</ul>
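<p>To make the condition-based rule evaluation listed above more concrete, here is a minimal Python sketch of the idea. It is not Gatus's implementation: the condition strings only imitate the placeholder style used in Gatus configuration (for example <code>[STATUS] == 200</code>), and the <code>evaluate</code> and <code>is_healthy</code> helpers, the supported operators, and the result dictionary are all assumptions made for illustration.</p>

```python
import operator
import re

# Comparison operators supported by this sketch (an assumption, not the full Gatus grammar).
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}

# Matches conditions made of a bracketed placeholder, an operator, and an integer.
CONDITION_RE = re.compile(r"^\[(\w+)\]\s*(==|!=|<=|>=|<|>)\s*(\d+)$")


def evaluate(condition: str, result: dict[str, int]) -> bool:
    """Return True if the health-check result satisfies one condition."""
    match = CONDITION_RE.match(condition.strip())
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    placeholder, op, expected = match.groups()
    return OPERATORS[op](result[placeholder], int(expected))


def is_healthy(conditions: list[str], result: dict[str, int]) -> bool:
    """In this sketch, an endpoint is healthy only if every condition holds."""
    return all(evaluate(condition, result) for condition in conditions)
```

<p>With a hypothetical check result such as <code>{"STATUS": 200, "RESPONSE_TIME": 120}</code>, calling <code>is_healthy</code> with the conditions <code>[STATUS] == 200</code> and <code>[RESPONSE_TIME] < 300</code> would report the endpoint as healthy, while a stricter response-time threshold would not.</p>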
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;border-radius:10px;padding-bottom:1px;padding-top:20px;padding-left:20px;border-left:3px solid #60a5fa"></p><p><strong>Home Page</strong>: <a href="https://gatus.io/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://gatus.io/" target="_blank" rel="noopener noreferrer" class="">https://gatus.io/</a>
<br>
<strong>Source Code</strong>: <a href="https://github.com/TwiN/gatus" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://github.com/TwiN/gatus" target="_blank" rel="noopener noreferrer" class="">https://github.com/TwiN/gatus</a>
<br>
<strong>Demo Site</strong>: <a href="https://status.twin.sh/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://status.twin.sh/" target="_blank" rel="noopener noreferrer" class="">https://status.twin.sh/</a>
<br>
<strong>Managed Version Available</strong>: Yes</p><p></p>
<p><strong>Live page examples:</strong></p>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.skobk.in/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Skobk Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/gatus-triluxds-status-page.webp" alt="TriluxDS Status Page"></div>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://defenseunicorns.gatus.io/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Defense Unicorns Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/gatus-defenseunicorns-status-page.webp" alt="Defense Unicorns Status Page"></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="7-statuspal">7. StatusPal<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#7-statuspal" class="hash-link" aria-label="Direct link to 7. StatusPal" title="Direct link to 7. StatusPal" translate="no">​</a></h3>
<p>StatusPal is an Elixir-based status page system. As of this writing, the repository has not been updated in a while, but the project also offers a managed version.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li class="">Email and Tweet notifications.</li>
<li class="">Private and public status pages.</li>
<li class="">Component support.</li>
<li class="">API for integration with monitoring tools.</li>
<li class="">Multi-language support.</li>
<li class="">Scheduled maintenance announcements.</li>
</ul>
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;border-radius:10px;padding-bottom:1px;padding-top:20px;padding-left:20px;border-left:3px solid #60a5fa"></p><p><strong>Home Page</strong>: <a href="https://statuspal.io/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://statuspal.io/" target="_blank" rel="noopener noreferrer" class="">https://statuspal.io/</a>
<br>
<strong>Source Code</strong>: <a href="https://github.com/statuspal/statuspal" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://github.com/statuspal/statuspal" target="_blank" rel="noopener noreferrer" class="">https://github.com/statuspal/statuspal</a>
<br>
<strong>Demo Site</strong>: NA
<br>
<strong>Managed Version Available</strong>: Yes</p><p></p>
<p><strong>Live page examples</strong>:</p>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.unity.com/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Unity Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/statuspal-unity-status-page.webp" alt="Unity Status Page"></div>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://ucheck.statuspal.io/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">UCheck Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/statuspal-ucheck-status-page.webp" alt="UCheck Status Page"></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="8-uptime-kuma">8. Uptime Kuma<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#8-uptime-kuma" class="hash-link" aria-label="Direct link to 8. Uptime Kuma" title="Direct link to 8. Uptime Kuma" translate="no">​</a></h3>
<p>Uptime Kuma is a monitoring tool with a built-in status page feature.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li class="">Uptime monitoring for HTTP, TCP, DNS, Keywords, Ping, Steam Game Server, and Docker Containers.</li>
<li class="">Multi-language support.</li>
<li class="">2-factor authentication (2FA).</li>
<li class="">Docker support.</li>
<li class="">Notifications via Telegram, Discord, Gotify, Slack, Pushover, Email (SMTP), etc.</li>
<li class="">20-second check intervals.</li>
<li class="">Multiple status pages.</li>
</ul>
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;border-radius:10px;padding-bottom:1px;padding-top:20px;padding-left:20px;border-left:3px solid #60a5fa"></p><p><strong>Home Page</strong>: <a href="https://uptime.kuma.pet/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://uptime.kuma.pet/" target="_blank" rel="noopener noreferrer" class="">https://uptime.kuma.pet/</a>
<br>
<strong>Source Code</strong>: <a href="https://github.com/louislam/uptime-kuma" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://github.com/louislam/uptime-kuma" target="_blank" rel="noopener noreferrer" class="">https://github.com/louislam/uptime-kuma</a>
<br>
<strong>Demo Site</strong>: <a href="https://uptime.kuma.pet/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://uptime.kuma.pet/" target="_blank" rel="noopener noreferrer" class="">https://uptime.kuma.pet/</a>
<br>
<strong>Managed Version Available</strong>: No</p><p></p>
<p><strong>Live page examples:</strong></p>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.sidingsmedia.com/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Sidings Media Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/uptimekuma-sidingsmedia-status-page.webp" alt="Sidings Media Status Page"></div>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.bludood.com/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">BluDood Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/uptimekuma-bludood-status-page.webp" alt="BluDood Status Page"></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="9-oneuptime">9. OneUptime<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#9-oneuptime" class="hash-link" aria-label="Direct link to 9. OneUptime" title="Direct link to 9. OneUptime" translate="no">​</a></h3>
<p>An observability platform that also has status pages.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li class="">Uptime monitoring.</li>
<li class="">Customizable public status pages.</li>
<li class="">Notifications via Email, SMS, Slack, etc.</li>
<li class="">Custom branding on your status page.</li>
<li class="">On-call policies, schedules and alerts.</li>
<li class="">Log management.</li>
<li class="">Incident management workflow.</li>
<li class="">Application Performance Monitoring.</li>
</ul>
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;border-radius:10px;padding-bottom:1px;padding-top:20px;padding-left:20px;border-left:3px solid #60a5fa"></p><p><strong>Home Page</strong>: <a href="https://oneuptime.com/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://oneuptime.com/" target="_blank" rel="noopener noreferrer" class="">https://oneuptime.com/</a>
<br>
<strong>Source Code</strong>: <a href="https://github.com/OneUptime/oneuptime" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://github.com/OneUptime/oneuptime" target="_blank" rel="noopener noreferrer" class="">https://github.com/OneUptime/oneuptime</a>
<br>
<strong>Demo Site</strong>: NA
<br>
<strong>Managed Version Available</strong>: Yes</p><p></p>
<p><strong>Live page examples</strong>:</p>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.lookandplay.io/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Look &amp; Play Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/oneuptime-lookandplay-status-page.webp" alt="Look &amp; Play Status Page"></div>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://uptime.cloudiabot.com/status-page/37eee8f3-95bf-4744-9fb7-000624473260" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Cloudiabot Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/oneuptime-cloudiabot-status-page.webp" alt="Cloudiabot Status Page"></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="10-kener">10. Kener<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#10-kener" class="hash-link" aria-label="Direct link to 10. Kener" title="Direct link to 10. Kener" translate="no">​</a></h3>
<p>Monitoring and status page tool written in Node.js.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li class="">Monitoring support for HTTP, DNS, TCP, Ping, SQL, SSL, Heartbeats, Gamedig, etc.</li>
<li class="">Quick setup with Docker.</li>
<li class="">Dark/light mode support.</li>
<li class="">Notification support on Discord, Slack, Email, and Webhooks.</li>
<li class="">Built-in support for SEO tools.</li>
<li class="">Timezone auto-adjustment.</li>
<li class="">Role-based access control.</li>
<li class="">Badge and embed support.</li>
<li class="">API-based incident creation.</li>
<li class="">Customizable branding.</li>
<li class="">i18n support.</li>
</ul>
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;border-radius:10px;padding-bottom:1px;padding-top:20px;padding-left:20px;border-left:3px solid #60a5fa"></p><p><strong>Home Page</strong>: <a href="https://kener.ing/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://kener.ing/" target="_blank" rel="noopener noreferrer" class="">https://kener.ing/</a>
<br>
<strong>Source Code</strong>: <a href="https://github.com/rajnandan1/kener" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://github.com/rajnandan1/kener" target="_blank" rel="noopener noreferrer" class="">https://github.com/rajnandan1/kener</a>
<br>
<strong>Demo Site</strong>: <a href="https://kener.ing/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://kener.ing/" target="_blank" rel="noopener noreferrer" class="">https://kener.ing/</a>
<br>
<strong>Managed Version Available</strong>: No</p><p></p>
<p><strong>Live page examples</strong>:</p>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.nocodb.com/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Nocodb Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/kener-nocodb-status-page.webp" alt="Nocodb Status Page"></div>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.rephealth.com/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Rephealth Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/kener-rephealth-status-page.webp" alt="Rephealth Status Page"></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="11-openstatus">11. OpenStatus<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#11-openstatus" class="hash-link" aria-label="Direct link to 11. OpenStatus" title="Direct link to 11. OpenStatus" translate="no">​</a></h3>
<p>A performance monitoring platform written in TypeScript and Go with public status pages.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li class="">Monitoring for API, DNS, Domain, SMTP, Ping, etc., with latency, uptime, and availability metrics.</li>
<li class="">OpenTelemetry support for metrics.</li>
<li class="">Password protection and custom domain support for status pages.</li>
<li class="">Scheduled maintenance support.</li>
<li class="">User notification support via Email, SMS, Discord, OpsGenie, PagerDuty, Slack, Ntfy, and Webhooks.</li>
</ul>
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;border-radius:10px;padding-bottom:1px;padding-top:20px;padding-left:20px;border-left:3px solid #60a5fa"></p><p><strong>Home Page</strong>: <a href="https://openstatus.dev/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://openstatus.dev/" target="_blank" rel="noopener noreferrer" class="">https://openstatus.dev/</a>
<br>
<strong>Source Code</strong>: <a href="https://github.com/openstatusHQ/openstatus" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://github.com/openstatusHQ/openstatus" target="_blank" rel="noopener noreferrer" class="">https://github.com/openstatusHQ/openstatus</a>
<br>
<strong>Demo Site</strong>: NA
<br>
<strong>Managed Version Available</strong>: Yes</p><p></p>
<p><strong>Live page examples</strong>:</p>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://bds-status.birdeye.so/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Birdeye Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/openstatus-birdeye-status-page.webp" alt="Birdeye Status Page"></div>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.documenso.com/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Documenso Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/openstatus-documenso-status-page.webp" alt="Documenso Status Page"></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="12-uptimeflare">12. UptimeFlare<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#12-uptimeflare" class="hash-link" aria-label="Direct link to 12. UptimeFlare" title="Direct link to 12. UptimeFlare" translate="no">​</a></h3>
<p>A serverless uptime monitoring tool with a built-in status page.</p>
<p><strong>Key Features:</strong></p>
<ul>
<li class="">Serverless uptime monitoring.</li>
<li class="">Geo-specific checks.</li>
<li class="">HTTP/HTTPS/TCP port monitoring.</li>
<li class="">Customizable request methods, body, and headers, and custom status codes and response checks for HTTP.</li>
<li class="">Scheduled maintenance support.</li>
<li class="">Built-in status page with customizable branding.</li>
<li class="">Notifications via Email, SMS, Discord, OpsGenie, PagerDuty, Slack, Webhooks, etc., using the Apprise library.</li>
<li class="">Password protection for status pages.</li>
<li class="">JSON API for status data.</li>
</ul>
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;border-radius:10px;padding-bottom:1px;padding-top:20px;padding-left:20px;border-left:3px solid #60a5fa"></p><p><strong>Home Page</strong>: <a href="https://uptimeflare.pages.dev/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://uptimeflare.pages.dev/" target="_blank" rel="noopener noreferrer" class="">https://uptimeflare.pages.dev/</a>
<br>
<strong>Source Code</strong>: <a href="https://github.com/lyc8503/UptimeFlare" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://github.com/lyc8503/UptimeFlare" target="_blank" rel="noopener noreferrer" class="">https://github.com/lyc8503/UptimeFlare</a>
<br>
<strong>Demo Site</strong>: <a href="https://uptimeflare.pages.dev/" target="_blank" rel="noopener noreferrer nofollow"></a><a href="https://uptimeflare.pages.dev/" target="_blank" rel="noopener noreferrer" class="">https://uptimeflare.pages.dev/</a>
<br>
<strong>Managed Version Available</strong>: No</p><p></p>
<p><strong>Live page examples</strong>:</p>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.seiright.com/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Sei AI Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/uptimeflare-seiright-status-page.webp" alt="Sei AI Status Page"></div>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:8px;padding:16px;margin:16px 0;background-color:var(--ifm-color-emphasis-100);color:var(--ifm-font-color-base)"><a href="https://status.dockrelix.org/" target="_blank" rel="noopener noreferrer nofollow" style="color:var(--ifm-link-color)">Dockrelix Status Page</a><br><br><img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid var(--ifm-color-emphasis-300);border-radius:10px" src="https://cdn.incidenthub.cloud/blog/uptimeflare-dockrelix-status-page.webp" alt="Dockrelix Status Page"></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="choosing-the-right-open-source-status-page-software">Choosing the Right Open Source Status Page Software<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#choosing-the-right-open-source-status-page-software" class="hash-link" aria-label="Direct link to Choosing the Right Open Source Status Page Software" title="Direct link to Choosing the Right Open Source Status Page Software" translate="no">​</a></h2>
<p>When selecting open source status page software, consider the following factors:</p>
<ol>
<li class="">Ease of installation and maintenance.</li>
<li class="">Compatibility with your existing tech stack.</li>
<li class="">Customization options and flexibility.</li>
<li class="">Community activity and long-term support.</li>
<li class="">Integration capabilities with your monitoring tools.</li>
<li class="">Performance and scalability.</li>
<li class="">Notification and alerting options.</li>
<li class="">Historical data retention and display.</li>
<li class="">User management and access control.</li>
</ol>
<p>To elaborate on point 5: some of these status pages are part of a larger monitoring toolkit, and their status data comes from that toolkit's own monitors.
If such a tool does not accept data from other sources, you might not be able to use its status page feature on its own.
Others are standalone status pages to which you can push incident events from anywhere. Keep this distinction in mind when you evaluate your options.</p>
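<p>As a rough illustration of the standalone model, the Python sketch below builds the kind of HTTP request a monitoring script or CI job might send to push an incident event. Everything specific in it is an assumption: the <code>/api/incidents</code> path, the JSON fields, the bearer token header, and the <code>build_incident_request</code> helper are hypothetical and do not correspond to the API of any tool listed in this guide. Check your chosen tool's API documentation for the real endpoints.</p>

```python
import json
import urllib.request


def build_incident_request(
    base_url: str, token: str, title: str, status: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request announcing an incident."""
    # Hypothetical payload schema; real status page APIs define their own fields.
    payload = {"title": title, "status": status}
    return urllib.request.Request(
        url=f"{base_url}/api/incidents",  # hypothetical endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


# Sending is left to the caller, for example:
# request = build_incident_request(
#     "https://status.example.com", "TOKEN", "API latency", "investigating"
# )
# urllib.request.urlopen(request)
```

<p>Because the request is only built and not sent, the same helper could be called from an alerting webhook, a deployment pipeline, or an on-call script, which is the flexibility a standalone status page is meant to provide.</p>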
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="summary">Summary<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<div style="border:1px solid var(--ifm-color-emphasis-300);border-radius:12px;overflow:hidden;margin:16px 0;background-color:var(--ifm-background-color)"><style>
.summary-table table {
  border-radius: 0 !important;
  margin: 0 !important;
}
.summary-table table tr:last-child td {
  border-bottom: none !important;
}
.summary-table table tr:first-child th {
  border-top: none !important;
}
</style><div class="summary-table"><table><thead><tr><th>Name</th><th>Homepage</th><th>Demo Page</th><th>Source Code</th><th>Examples</th></tr></thead><tbody><tr><td><strong>Cachet</strong></td><td><a href="https://cachethq.io/" target="_blank" rel="noopener noreferrer" class="">cachethq.io</a></td><td><a href="https://v3.cachethq.io/" target="_blank" rel="noopener noreferrer" class="">v3.cachethq.io</a></td><td><a href="https://github.com/cachethq/cachet" target="_blank" rel="noopener noreferrer" class="">GitHub</a></td><td><a href="https://status.gnome.org/" target="_blank" rel="noopener noreferrer" class="">GNOME</a>, <a href="https://status.eea.europa.eu/" target="_blank" rel="noopener noreferrer" class="">EEA</a></td></tr><tr><td><strong>Statping-ng</strong></td><td><a href="https://statping-ng.github.io/" target="_blank" rel="noopener noreferrer" class="">statping-ng.github.io</a></td><td>N/A</td><td><a href="https://github.com/statping-ng/statping-ng" target="_blank" rel="noopener noreferrer" class="">GitHub</a></td><td><a href="https://status.jellyfin.org/service/repository" target="_blank" rel="noopener noreferrer" class="">Jellyfin</a>, <a href="https://status.bonifacelabs.net/" target="_blank" rel="noopener noreferrer" class="">Boniface Labs</a></td></tr><tr><td><strong>CState</strong></td><td><a href="https://cstate.netlify.app/" target="_blank" rel="noopener noreferrer" class="">cstate.netlify.app</a></td><td><a href="https://cstate.mnts.lt/" target="_blank" rel="noopener noreferrer" class="">cstate.mnts.lt</a></td><td><a href="https://github.com/cstate/cstate" target="_blank" rel="noopener noreferrer" class="">GitHub</a></td><td><a href="https://status.chocolatey.org/" target="_blank" rel="noopener noreferrer" class="">Chocolatey</a>, <a href="https://status.testing-farm.io/" target="_blank" rel="noopener noreferrer" class="">Testing Farm</a></td></tr><tr><td><strong>Upptime</strong></td><td><a href="https://upptime.js.org/" target="_blank" 
rel="noopener noreferrer" class="">upptime.js.org</a></td><td><a href="https://demo.upptime.js.org/" target="_blank" rel="noopener noreferrer" class="">demo.upptime.js.org</a></td><td><a href="https://github.com/upptime/upptime" target="_blank" rel="noopener noreferrer" class="">GitHub</a></td><td><a href="https://status.opensourcepos.org/" target="_blank" rel="noopener noreferrer" class="">Open Source Repos</a>, <a href="https://status.frai.se/" target="_blank" rel="noopener noreferrer" class="">Frai.se</a></td></tr><tr><td><strong>Vigil</strong></td><td><a href="https://crates.io/crates/vigil-server" target="_blank" rel="noopener noreferrer" class="">crates.io/crates/vigil-server</a></td><td>N/A</td><td><a href="https://github.com/valeriansaliou/vigil" target="_blank" rel="noopener noreferrer" class="">GitHub</a></td><td><a href="https://status.crisp.chat/" target="_blank" rel="noopener noreferrer" class="">Crisp</a>, <a href="https://status.autoinspector.ai/" target="_blank" rel="noopener noreferrer" class="">Autoinspector</a></td></tr><tr><td><strong>Gatus</strong></td><td><a href="https://gatus.io/" target="_blank" rel="noopener noreferrer" class="">gatus.io</a></td><td><a href="https://status.twin.sh/" target="_blank" rel="noopener noreferrer" class="">status.twin.sh</a></td><td><a href="https://github.com/TwiN/gatus" target="_blank" rel="noopener noreferrer" class="">GitHub</a></td><td><a href="https://defenseunicorns.gatus.io/" target="_blank" rel="noopener noreferrer" class="">Defense Unicorns</a>, <a href="https://status.skobk.in/" target="_blank" rel="noopener noreferrer" class="">Skobk</a></td></tr><tr><td><strong>StatusPal</strong></td><td><a href="https://statuspal.io/" target="_blank" rel="noopener noreferrer" class="">statuspal.io</a></td><td>N/A</td><td><a href="https://github.com/statuspal/statuspal" target="_blank" rel="noopener noreferrer" class="">GitHub</a></td><td><a href="https://status.unity.com/" target="_blank" rel="noopener noreferrer" 
class="">Unity</a>, <a href="https://ucheck.statuspal.io/" target="_blank" rel="noopener noreferrer" class="">UCheck</a></td></tr><tr><td><strong>Uptime Kuma</strong></td><td><a href="https://uptime.kuma.pet/" target="_blank" rel="noopener noreferrer" class="">uptime.kuma.pet</a></td><td><a href="https://uptime.kuma.pet/" target="_blank" rel="noopener noreferrer" class="">uptime.kuma.pet</a></td><td><a href="https://github.com/louislam/uptime-kuma" target="_blank" rel="noopener noreferrer" class="">GitHub</a></td><td><a href="https://status.sidingsmedia.com/" target="_blank" rel="noopener noreferrer" class="">Sidings Media</a>, <a href="https://status.bludood.com/" target="_blank" rel="noopener noreferrer" class="">BluDood</a></td></tr><tr><td><strong>OneUptime</strong></td><td><a href="https://oneuptime.com/" target="_blank" rel="noopener noreferrer" class="">oneuptime.com</a></td><td>N/A</td><td><a href="https://github.com/OneUptime/oneuptime" target="_blank" rel="noopener noreferrer" class="">GitHub</a></td><td><a href="https://status.lookandplay.io/" target="_blank" rel="noopener noreferrer" class="">Look &amp; Play</a>, <a href="https://uptime.cloudiabot.com/status-page/37eee8f3-95bf-4744-9fb7-000624473260" target="_blank" rel="noopener noreferrer" class="">Cloudiabot</a></td></tr><tr><td><strong>Kener</strong></td><td><a href="https://kener.ing/" target="_blank" rel="noopener noreferrer" class="">kener.ing</a></td><td><a href="https://kener.ing/" target="_blank" rel="noopener noreferrer" class="">kener.ing</a></td><td><a href="https://github.com/rajnandan1/kener" target="_blank" rel="noopener noreferrer" class="">GitHub</a></td><td><a href="https://status.nocodb.com/" target="_blank" rel="noopener noreferrer" class="">Nocodb</a>, <a href="https://status.rephealth.com/" target="_blank" rel="noopener noreferrer" class="">Rephealth</a></td></tr><tr><td><strong>OpenStatus</strong></td><td><a href="https://openstatus.dev/" target="_blank" rel="noopener noreferrer" 
class="">openstatus.dev</a></td><td>N/A</td><td><a href="https://github.com/openstatusHQ/openstatus" target="_blank" rel="noopener noreferrer" class="">GitHub</a></td><td><a href="https://bds-status.birdeye.so/" target="_blank" rel="noopener noreferrer" class="">Birdeye</a>, <a href="https://status.documenso.com/" target="_blank" rel="noopener noreferrer" class="">Documenso</a></td></tr><tr><td><strong>UptimeFlare</strong></td><td><a href="https://uptimeflare.pages.dev/" target="_blank" rel="noopener noreferrer" class="">uptimeflare.pages.dev</a></td><td><a href="https://uptimeflare.pages.dev/" target="_blank" rel="noopener noreferrer" class="">uptimeflare.pages.dev</a></td><td><a href="https://github.com/lyc8503/UptimeFlare" target="_blank" rel="noopener noreferrer" class="">GitHub</a></td><td><a href="https://status.seiright.com/" target="_blank" rel="noopener noreferrer" class="">Sei AI</a>, <a href="https://status.dockrelix.org/" target="_blank" rel="noopener noreferrer" class="">Dockrelix</a></td></tr></tbody></table></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>Open source status page software offers a cost-effective way to keep your users informed about your service's health and performance. By choosing the right tool from this guide, you can enhance transparency, build trust with your users, and streamline your incident communication process.</p>
<p>This article was first published on the <a href="https://blog.incidenthub.cloud/The-2025-Guide-to-Open-Source-Status-Page-Software" target="_blank" rel="noopener noreferrer" class="">IncidentHub blog</a>.</p>]]></content:encoded>
            <category>Status Pages</category>
            <category>Monitoring</category>
        </item>
        <item>
            <title><![CDATA[Improving the Developer Experience by Monitoring Third-Party Outages]]></title>
            <link>https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages</link>
            <guid>https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages</guid>
            <pubDate>Tue, 19 Aug 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Third-party services form a key part of your software development stack. Read this article to find out how to integrate third-party monitoring.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p>The role of third-party SaaS and cloud services in the modern software development stack needs no explanation.
Primarily because they are easy to set up and connect, they make the software development lifecycle (SDLC) much easier than it was 10 years ago. There is no more overhead of installing, configuring, maintaining, backing up, and scaling source code repos, virtual machines, and CI/CD systems. Some services, e.g. payment gateways, have no in-house alternative, so you have to use them.</p>
<p>This dependency on third-party services also brings risks. The more such services in the chain, the more likely it is that a failure in one of them
will impact or even cripple your smoothly running development and deployment pipeline. These failures by extension will also impact your business and customers.</p>
<p>You have vetted and chosen reliable services. However, outages happen. The best you can do is to prepare for them and know when they occur. This article is about the knowing part.</p>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/frustrated-developer.webp" alt="Frustrated developer">
<br>
<br>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#the-role-of-third-party-services" class="">The Role of Third-Party Services</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#the-impact-of-third-party-outages" class="">The Impact of Third-Party Outages</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#who-is-impacted" class="">Who is Impacted?</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#monitoring-third-party-service-outages" class="">Monitoring Third-Party Service Outages</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#how-can-i-monitor-third-party-dependencies-during-an-incident" class="">How Can I Monitor Third-Party Dependencies During an Incident?</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#tracking-status-pages-manually" class="">Tracking Status Pages Manually</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#tracking-status-pages-with-a-status-page-aggregator" class="">Tracking Status Pages with a Status Page Aggregator</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#best-practices" class="">Best Practices</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#conclusion" class="">Conclusion</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#summary" class="">Summary</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-role-of-third-party-services">The Role of Third-Party Services<a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#the-role-of-third-party-services" class="hash-link" aria-label="Direct link to The Role of Third-Party Services" title="Direct link to The Role of Third-Party Services" translate="no">​</a></h2>
<p>A typical software development organization has many third-party services:</p>
<ul>
<li class="">Infrastructure<!-- -->
<ul>
<li class="">DNS</li>
<li class="">Cloud Provider</li>
<li class="">PaaS</li>
<li class="">Content Delivery Network</li>
</ul>
</li>
<li class="">Monitoring<!-- -->
<ul>
<li class="">Observability</li>
<li class="">On-call management</li>
</ul>
</li>
<li class="">Communication and Collaboration<!-- -->
<ul>
<li class="">Email</li>
<li class="">Chat</li>
<li class="">Office suites</li>
</ul>
</li>
<li class="">Development and Operations<!-- -->
<ul>
<li class="">Source Code Repositories</li>
<li class="">CI/CD</li>
<li class="">Artifact repositories</li>
<li class="">LLM APIs</li>
<li class="">Auth APIs</li>
<li class="">Project management</li>
</ul>
</li>
<li class="">Product-function related SaaS<!-- -->
<ul>
<li class="">Payment gateways</li>
<li class="">SMTP Providers</li>
</ul>
</li>
<li class="">Other product functions<!-- -->
<ul>
<li class="">Marketing</li>
<li class="">Customer support and ticketing</li>
<li class="">Analytics</li>
</ul>
</li>
</ul>
<p>It's not uncommon to have more than 100 third-party services. According to the <a href="https://www.bettercloud.com/resources/state-of-saas/" target="_blank" rel="noopener noreferrer" class="">2025 State of SaaS report</a>, companies use
an average of 106 SaaS apps. Knowing when a service is unavailable is important to many folks, including your development and operations teams. Your product reliability depends directly on external services.</p>
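<p>To get a feel for why the number of dependencies matters, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that every service sits in your critical path, that failures are independent, and that each service is available 99.9% of the time. Only the count of 106 comes from the report cited above; the availability figure is an assumption.</p>

```python
def combined_availability(per_service: float, count: int) -> float:
    """Availability of a chain where every service must be up at once.

    Assumes independent failures, which real outages often violate.
    """
    return per_service ** count


# Hypothetical availability: 106 services, each up 99.9% of the time.
chain = combined_availability(0.999, 106)
print(f"Chance that all 106 services are up: {chain:.1%}")
print(f"Chance that at least one is down: {1 - chain:.1%}")
```

<p>Under these assumed numbers, everything is healthy at the same moment only about 90% of the time, even though each individual service looks very reliable.</p>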
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-impact-of-third-party-outages">The Impact of Third-Party Outages<a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#the-impact-of-third-party-outages" class="hash-link" aria-label="Direct link to The Impact of Third-Party Outages" title="Direct link to The Impact of Third-Party Outages" translate="no">​</a></h2>
<p>Recently, somebody asked me:</p>
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;padding-top:15px;padding-bottom:8px;padding-left:20px;border-radius:10px;border-left:3px solid #60a5fa"><i>"So what if you know that a service is down? What is the point of knowing if you cannot do anything about it?"</i></p>
<p>Sounds logical.</p>
<p>However, the question assumes that it is easy to determine which service is experiencing an outage. So let me break that down:</p>
<ul>
<li class="">It's hard to know which of your hundreds of dependencies has an outage. The problem of checking different status pages, support forums, social media, and so on explodes when you have to do it for 10+ services. If you have just 1 or 2, it's easy to check.</li>
<li class="">There <em>is</em> something you can do about it. Once you know that a particular service has an outage, you can save your team hours of debugging and troubleshooting. IncidentHub was born out of such personal experiences in my past roles where I doubled as a backend dev and Ops engineer. I could see the impact that a GitHub outage had on my dev team (PRs stuck, builds failing), and also the impact that a Slack outage had on my sales team (lost/delayed messages, anyone?). And nobody knew why. The status pages had the answers all along.</li>
</ul>
<p>The real cost here is the waste of developer time and customer escalations.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="who-is-impacted">Who is Impacted?<a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#who-is-impacted" class="hash-link" aria-label="Direct link to Who is Impacted?" title="Direct link to Who is Impacted?" translate="no">​</a></h3>
<p>Third-party services can directly impact you and your team.</p>
<ul>
<li class="">If you are a SaaS vendor, your product's reliability is dependent on the reliability of the third-party services it uses.</li>
<li class="">If you provide mobile/web app development services, your ability to deliver on time depends on the availability of your third-party vendors.</li>
</ul>
<p>Knowing which services are down - and when they are back up - is better than being in the dark about the fact that an external service has caused a problem.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="monitoring-third-party-service-outages">Monitoring Third-Party Service Outages<a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#monitoring-third-party-service-outages" class="hash-link" aria-label="Direct link to Monitoring Third-Party Service Outages" title="Direct link to Monitoring Third-Party Service Outages" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="how-can-i-monitor-third-party-dependencies-during-an-incident">How Can I Monitor Third-Party Dependencies During an Incident?<a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#how-can-i-monitor-third-party-dependencies-during-an-incident" class="hash-link" aria-label="Direct link to How Can I Monitor Third-Party Dependencies During an Incident?" title="Direct link to How Can I Monitor Third-Party Dependencies During an Incident?" translate="no">​</a></h3>
<p>If you have third-party incident tracking as part of your incident management process, you have already taken the first step.
The most common way to track third-party outages is to <a class="" href="https://blog.incidenthub.cloud/How-To-Monitor-Public-Status-Pages-of-Cloud-Providers-a-Step-by-Step-Approach">check their status pages</a>. But how do you track hundreds of status pages?</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="tracking-status-pages-manually">Tracking Status Pages Manually<a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#tracking-status-pages-manually" class="hash-link" aria-label="Direct link to Tracking Status Pages Manually" title="Direct link to Tracking Status Pages Manually" translate="no">​</a></h3>
<p>Manual monitoring of status pages is fraught with challenges:</p>
<ul>
<li class="">Status page providers can change and break existing subscriptions and notifications without warning.</li>
<li class="">Many status pages offer only RSS feeds, with no component or region filtering, so you drown in alert noise.</li>
<li class="">Manual monitoring of 100+ status pages is not practical.</li>
<li class="">Some status pages lack feeds or subscription options.</li>
<li class="">Status page URLs can change, leaving you unaware of outages.</li>
<li class="">DIY solutions like pushing RSS feeds into Slack lack filtering capabilities, can break when status pages change, and are an ongoing maintenance burden on your teams.</li>
</ul>
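<p>To illustrate what the DIY route involves, here is a minimal sketch of filtering an RSS feed by region keywords using only the Python standard library. The feed content, the incident titles, and the helper names are all hypothetical; a real status page feed would be fetched over HTTP and may not put the region in the title at all, which is part of the maintenance burden described above.</p>

```python
import xml.etree.ElementTree as ET


def build_sample_feed(titles: list[str]) -> str:
    """Build a tiny RSS document so the example runs without network access."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for title in titles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
    return ET.tostring(rss, encoding="unicode")


def relevant_titles(feed_xml: str, keywords: list[str]) -> list[str]:
    """Return item titles that mention any of the keywords (case-insensitive)."""
    root = ET.fromstring(feed_xml)
    wanted = [k.lower() for k in keywords]
    titles = [item.findtext("title", default="") for item in root.iter("item")]
    return [t for t in titles if any(k in t.lower() for k in wanted)]


# Hypothetical incidents; only the regions this team deploys to are kept.
feed = build_sample_feed([
    "Elevated API errors in us-east-1",
    "Scheduled maintenance in ap-south-1",
    "Dashboard login delays in eu-west-1",
])
print(relevant_titles(feed, ["us-east-1", "eu-west-1"]))
```

<p>Keyword matching like this is fragile: it depends on each provider's wording, and it has to be repeated and maintained for every feed you follow.</p>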
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="tracking-status-pages-with-a-status-page-aggregator">Tracking Status Pages with a Status Page Aggregator<a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#tracking-status-pages-with-a-status-page-aggregator" class="hash-link" aria-label="Direct link to Tracking Status Pages with a Status Page Aggregator" title="Direct link to Tracking Status Pages with a Status Page Aggregator" translate="no">​</a></h3>
<p>A <a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator" target="_blank" rel="noopener noreferrer" class="">status page monitor/aggregator</a> can simplify the process of tracking status pages.</p>
<p>A status page aggregator like IncidentHub:</p>
<ul>
<li class="">Offers a single normalized view across cloud providers' status pages.</li>
<li class="">Hides the complexity of different status page formats.</li>
<li class="">Detects and adjusts to changing status page formats over time.</li>
<li class="">Lets you choose the notification mode you want for alerts.</li>
<li class="">Offers notification modes not available on the status page.</li>
<li class="">Lets you analyze historical data and availability trends.</li>
</ul>
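<p>As a rough idea of what a "single normalized view" means, the sketch below maps differently worded status labels onto one common scale and then reports the worst status across all services. The labels, the three-level scale, and the function names are illustrative assumptions, not a description of how IncidentHub or any particular status page works.</p>

```python
from enum import IntEnum


class Status(IntEnum):
    """Normalized severity; a higher value means a worse state."""
    OPERATIONAL = 0
    DEGRADED = 1
    OUTAGE = 2


# Hypothetical vocabularies; real status pages each use their own wording.
VENDOR_TERMS = {
    "operational": Status.OPERATIONAL,
    "all systems go": Status.OPERATIONAL,
    "degraded performance": Status.DEGRADED,
    "partial outage": Status.DEGRADED,
    "major outage": Status.OUTAGE,
    "service disruption": Status.OUTAGE,
}


def normalize(raw: str) -> Status:
    """Map a vendor-specific label to a Status, treating unknowns as degraded."""
    return VENDOR_TERMS.get(raw.strip().lower(), Status.DEGRADED)


def overall(raw_statuses: dict[str, str]) -> Status:
    """Worst normalized status across all services (operational if empty)."""
    return max((normalize(s) for s in raw_statuses.values()), default=Status.OPERATIONAL)


print(overall({"dns": "Operational", "ci": "Partial Outage", "chat": "All Systems Go"}).name)
```

<p>Treating unknown labels as degraded is a deliberate choice in this sketch: an unrecognized status gets surfaced for a human to look at instead of being silently counted as healthy.</p>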
<p>When you use a status page aggregator, you can choose to receive only those alerts that are relevant. Depending on your team's needs, you can push the alerts to any of the following:</p>
<ul>
<li class="">A <a class="" href="https://blog.incidenthub.cloud/Integrate-Incident-Alerts-Into-Your-Slack-Workspace">Slack</a> channel.</li>
<li class="">A <a class="" href="https://blog.incidenthub.cloud/Integrate-Incident-Alerts-With-Discord-Using-Webhooks">Discord</a> channel.</li>
<li class="">A custom <a class="" href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook">webhook</a> that integrates with your internal dashboard.</li>
<li class="">A team <a href="https://docs.incidenthub.cloud/incidenthub-documentation/channels/email-integration" target="_blank" rel="noopener noreferrer" class="">email</a> address.</li>
<li class="">A <a class="" href="https://blog.incidenthub.cloud/how-to-receive-cloud-outage-alerts-in-microsoft-teams">Microsoft Teams</a> channel.</li>
<li class="">A Zendesk workspace.</li>
<li class="">A custom integration using APIs.</li>
</ul>
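<p>For the custom webhook option, the receiving side usually parses the alert and decides what to show. The sketch below is a minimal example of that step. The payload fields (<code>service</code>, <code>component</code>, <code>status</code>, <code>title</code>), the service names, and the function name are hypothetical, so check the webhook documentation for the actual format before relying on any of them.</p>

```python
import json

# Hypothetical payload shape; the real webhook fields may differ.
EXAMPLE_BODY = json.dumps({
    "service": "ExampleCI",
    "component": "Build runners",
    "status": "investigating",
    "title": "Builds are queued longer than usual",
})

# Service and component pairs this team cares about (illustrative values).
WATCHED = {("ExampleCI", "Build runners"), ("ExampleDNS", "Resolution")}


def to_dashboard_row(body: str):
    """Parse a webhook body and return a row dict, or None if it is not watched."""
    event = json.loads(body)
    key = (event.get("service"), event.get("component"))
    if key not in WATCHED:
        return None
    return {
        "label": f"{key[0]} / {key[1]}",
        "state": event.get("status", "unknown"),
        "summary": event.get("title", ""),
    }


print(to_dashboard_row(EXAMPLE_BODY))
```

<p>In a real setup this function would sit behind an HTTP endpoint, and the returned row would be written to whatever store backs your internal dashboard.</p>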
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/slack-alerts-view.webp" alt="Slack alerts view">
<p><br>
<!-- -->Another option if you choose not to receive alerts is to put up a unified status page on a large screen TV or monitor. It shows you the overall status of all your third-party dependencies at a glance.</p>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/public-status-page-developers.webp" alt="Public status page example">
<p><br>
<!-- -->Overall, a better developer experience.</p>
<p>Check out this short video tutorial on how to set up a unified status page with all your third-party services with a status page aggregator.</p>
<iframe style="border:1px solid #e0e0e0;border-radius:10px;width:100%;height:315px;margin:0 auto" src="https://www.youtube.com/embed/2auQnYW215M" frameborder="0" allow="autoplay; encrypted-media" title="IncidentHub Public Status Page Setup Tutorial"></iframe>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="best-practices">Best Practices<a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#best-practices" class="hash-link" aria-label="Direct link to Best Practices" title="Direct link to Best Practices" translate="no">​</a></h2>
<p>If you are monitoring third-party services using a status page aggregator like IncidentHub, there are a few best practices you should follow:</p>
<ul>
<li class="">Fine-tune your alerts so that your team is not overwhelmed by alert fatigue.</li>
<li class="">Check in with your team periodically to see if they are seeing value from the alerts. If not, a unified status page displayed prominently might be better suited for your team.</li>
<li class="">Periodically ensure that the list of monitored services and components is up to date.</li>
<li class="">Include third-party service outages in your incident response plan.</li>
</ul>
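<p>Keeping the monitored list up to date can be as simple as a periodic comparison between the services you use and the services you monitor. Here is a minimal sketch of that check. The two inventories are made-up examples, and where you would export them from (a dependency list, a monitoring configuration) is an assumption.</p>

```python
def coverage_gaps(in_use: set[str], monitored: set[str]) -> dict[str, list[str]]:
    """Compare the services a team uses against the ones it monitors."""
    return {
        "unmonitored": sorted(in_use - monitored),
        "stale": sorted(monitored - in_use),
    }


# Hypothetical inventories, for illustration only.
in_use = {"github", "slack", "stripe", "cloudflare"}
monitored = {"github", "slack", "bitbucket"}

print(coverage_gaps(in_use, monitored))
```

<p>Services under <code>unmonitored</code> are blind spots, while those under <code>stale</code> only add alert noise and can be removed.</p>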
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>Third-party services form a key part of your software development stack. Knowing when a service is down is important to many folks, including your development and operations teams.
A status page aggregator like IncidentHub can help you track third-party outages and lets you focus on product development.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="summary">Summary<a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<table><thead><tr><th>Aspect</th><th>Description</th></tr></thead><tbody><tr><td><strong>Problem</strong></td><td>Modern software development often relies on 100+ third-party services, making it difficult to track outages across all dependencies.</td></tr><tr><td><strong>Impact</strong></td><td>Third-party outages waste developer time and cause customer escalations when teams don't know which service is down.</td></tr><tr><td><strong>Manual Tracking Challenges</strong></td><td>Manually monitoring status pages is impractical due to changing formats, lack of filtering, too many pages, and maintenance burden.</td></tr><tr><td><strong>Solution</strong></td><td>Status page aggregators like IncidentHub provide unified tracking across all third-party services.</td></tr><tr><td><strong>Benefits</strong></td><td>Single normalized view, automated alerts, historical data analysis, and customizable notification channels.</td></tr><tr><td><strong>Implementation</strong></td><td>Can be integrated via Slack, Discord, webhooks, or displayed on unified status dashboards.</td></tr></tbody></table>
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;border-radius:10px;border-left:3px solid #60a5fa"></p><p>You might also like:</p><ul><li><a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator">Top 6 Reasons Why You Need a Status Page Aggregator</a></li><li><a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator">Product Update - Public Status Pages</a></li><li><a href="https://blog.incidenthub.cloud/A-Step-by-Step-Guide-to-Checking-if-a-SaaS-is-Down">A Step by Step Guide to Checking if a SaaS is Down</a></li></ul><p></p>
<hr>
<p>This article was originally published on the <a href="https://blog.incidenthub.cloud/improving-developer-experience-by-monitoring-third-party-outages" target="_blank" rel="noopener noreferrer" class="">IncidentHub blog</a>.</p>
<p>All product names, company names, logos and trademarks are the property of their respective owners.</p>
<p>Photo credits: <a href="https://unsplash.com/@seogalaxy?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">SEO Galaxy</a> on <a href="https://unsplash.com/photos/a-woman-covering-her-face-while-looking-at-a-laptop-yusHnkBhF3Q?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Unsplash</a></p>]]></content:encoded>
            <category>Monitoring</category>
            <category>Software Development</category>
        </item>
        <item>
            <title><![CDATA[The Ultimate Guide to Incident Management Tools in 2025]]></title>
            <link>https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025</link>
            <guid>https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025</guid>
            <pubDate>Sat, 09 Aug 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Incident Management tools form a key part of your business operations. Read this article to find out which incident management tools you should look at in 2025.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p><em>Last updated on September 2, 2025.</em></p>
<p>Incident management tools play a key role in helping organizations to effectively handle service outages. With so many incident management tools around with different feature sets, it's often difficult to find the one that is right for your needs. In this article, we attempt to make a list of incident management software available in 2025 with their features to help you arrive at the right one.</p>
<p>We have focused on tools that have incident management capabilities. We have left out many good tools which are focused only on incident response, or on monitoring and alert triggering, or on ticket management to avoid cluttering this article.</p>
<p>There are a few additions and removals compared to the <a class="" href="https://blog.incidenthub.cloud/The-Ultimate-List-of-Incident-Management-Tools-in-2024">2024 list</a>.</p>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/ultimate-list-of-incident-management-tools-2025.webp" alt="Incident Management Tools">
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#benefits-of-using-an-incident-management-tool" class="">Benefits of Using an Incident Management Tool</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#how-to-choose-the-right-incident-management-tool" class="">How To Choose the Right Incident Management Tool</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#features" class="">Features</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#cost" class="">Cost</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#support" class="">Support</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#reliability" class="">Reliability</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#integration-with-your-workflow" class="">Integration With Your Workflow</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#ability-to-scale-with-your-growth" class="">Ability to Scale With Your Growth</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#documentation" class="">Documentation</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#ease-of-use" class="">Ease of Use</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#data-security" class="">Data Security</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#trends-in-incident-management-tools-in-2025" class="">Trends in Incident Management Tools in 2025</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#ai-ops" class="">AI Ops</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#focused-workflows" class="">Focused Workflows</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#list-of-incident-management-tools-in-2025" class="">List of Incident Management Tools in 2025</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#pagerduty" class="">PagerDuty</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#servicenow" class="">ServiceNow</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#splunk-on-call" class="">Splunk On-Call</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#grafana-cloud-irm" class="">Grafana Cloud IRM</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#ilert" class="">iLert</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#incidentio" class="">incident.io</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#firehydrant" class="">FireHydrant</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#squadcast" class="">Squadcast</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#better-stack" class="">Better Stack</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#rootly" class="">Rootly</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#xmatters" class="">xMatters</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#alertops" class="">AlertOps</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#conclusion" class="">Conclusion</a></li>
</ul>
<p style="background-color:var(--ifm-color-emphasis-100);padding-top:18px;padding-bottom:1px;margin-top:20px;text-align:center;vertical-align:middle;border-radius:10px"></p><p>Download a summary of this article as a <a href="https://cdn.incidenthub.cloud/ebooks/The-Ultimate-Guide-to-Incident-Management-Tools-in-2025.pdf" target="_blank" rel="noopener noreferrer" class="">PDF</a></p><p></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="benefits-of-using-an-incident-management-tool">Benefits of Using an Incident Management Tool<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#benefits-of-using-an-incident-management-tool" class="hash-link" aria-label="Direct link to Benefits of Using an Incident Management Tool" title="Direct link to Benefits of Using an Incident Management Tool" translate="no">​</a></h2>
<ul>
<li class="">An incident management tool streamlines the incident management process by defining and automating workflows for your on-call teams. It can assist you in creating runbooks, alerting policies, escalation policies, and defining and managing on-call schedules.</li>
<li class="">Incident management software can integrate with your in-house observability stack. Your observability stack is a key source of incidents.</li>
<li class="">Incident management tools can integrate with third-party observability and uptime monitoring tools, which are another source of incidents.</li>
<li class="">They can also integrate with your existing <a class="" href="https://blog.incidenthub.cloud/The-Rising-Role-of-Slack-in-Incident-Management">communication</a> and collaboration tools to provide real-time updates.</li>
<li class="">Some incident management tools add context to your ongoing incidents by pulling in data from your infrastructure, applications, and observability systems. This can add significant information to narrow down the root cause.</li>
<li class="">Incident management tools can provide analytics which can be used to gain insights into patterns and performance to create a culture of continuous improvement.</li>
<li class="">An incident management tool can also generate audit trails and standardized documentation for compliance requirements.</li>
<li class="">Incident Management software can also provide <a class="" href="https://blog.incidenthub.cloud/Best-Practices-Choosing-Status-Page-Provider">status pages</a> - both internal and external - for your stakeholders.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="how-to-choose-the-right-incident-management-tool">How To Choose the Right Incident Management Tool<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#how-to-choose-the-right-incident-management-tool" class="hash-link" aria-label="Direct link to How To Choose the Right Incident Management Tool" title="Direct link to How To Choose the Right Incident Management Tool" translate="no">​</a></h2>
<p>The important thing to note here is that you need to choose the right tool for you, not necessarily the best one in the market or the one most teams use. To do that, you need to define the right criteria. These guidelines will help you
arrive at the parameters for evaluating tools.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="features">Features<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#features" class="hash-link" aria-label="Direct link to Features" title="Direct link to Features" translate="no">​</a></h3>
<p>Work with your team to identify the key features you absolutely need. Shiny features that sound nice to have, or that help other teams, need not be part of your checklist.
Does the tool offer the features you need in a single pricing tier that fits your budget?</p>
<p>A non-exhaustive list that you can use to arrive at a decision:</p>
<ul>
<li class="">Incident lifecycle management</li>
<li class="">On-call scheduling and management</li>
<li class="">Schedule overrides</li>
<li class="">Alerting policies</li>
<li class="">Third-party integrations</li>
<li class="">Analytics and reporting</li>
<li class="">Status pages</li>
<li class="">Role-based access control</li>
<li class="">API access</li>
<li class="">Runbook integration</li>
<li class="">Mobile app</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="cost">Cost<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#cost" class="hash-link" aria-label="Direct link to Cost" title="Direct link to Cost" translate="no">​</a></h3>
<p>Cost is obviously an important factor. Does the tool have transparent pricing? Are there any hidden costs?</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="support">Support<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#support" class="hash-link" aria-label="Direct link to Support" title="Direct link to Support" translate="no">​</a></h3>
<p>Does the tool have 24/7 support? How responsive is their support team? Look at the various channels by which you can reach support - phone, email, chat, social media channels.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="reliability">Reliability<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#reliability" class="hash-link" aria-label="Direct link to Reliability" title="Direct link to Reliability" translate="no">​</a></h3>
<p>Does the tool have a good track record of being available? How transparent are they about their uptime? Look at their public status page and search for reviews on social sites.
What are other users saying about their experience?</p>
<p>Incident management tools need to be always available and reliable - or you run the risk of not knowing when something goes wrong.</p>
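<p>To put an advertised uptime figure in perspective, it can help to convert it into a downtime budget. The following is a minimal Python sketch; the function name and the 30-day month are assumptions made for illustration and do not come from any vendor's SLA.</p>

```python
# Convert an advertised availability percentage into a downtime budget.
# The 30-day month is an assumption chosen for illustration.

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(availability_percent: float,
                             period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Return the downtime, in minutes, implied by an availability figure."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare a few commonly quoted availability targets.
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per 30 days")
```

<p>Each extra "nine" cuts the downtime budget by a factor of ten, which is a useful lens when comparing the uptime history shown on a vendor's public status page.</p>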
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="integration-with-your-workflow">Integration With Your Workflow<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#integration-with-your-workflow" class="hash-link" aria-label="Direct link to Integration With Your Workflow" title="Direct link to Integration With Your Workflow" translate="no">​</a></h3>
<p>The tool should integrate well with your existing workflow. Your team probably already has a well-defined one.
Look at the third-party integrations available. Some might integrate directly with your communication and collaboration tools, others might need custom API integration.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="ability-to-scale-with-your-growth">Ability to Scale With Your Growth<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#ability-to-scale-with-your-growth" class="hash-link" aria-label="Direct link to Ability to Scale With Your Growth" title="Direct link to Ability to Scale With Your Growth" translate="no">​</a></h3>
<p>Forecast how much your team and infrastructure will grow in the next few years. Would that growth increase your monthly bill for the tool through more users, alerts, or other usage-based charges?</p>
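<p>A rough growth forecast can be sketched in a few lines of Python. This example assumes a simple per-seat pricing model; the seat count, growth rate, and price are hypothetical numbers, not any vendor's actual pricing.</p>

```python
# Project the monthly bill under per-seat pricing as a team grows.
# All inputs are hypothetical and are not any vendor's actual pricing.
import math


def projected_monthly_bill(seats_now: int,
                           yearly_growth_rate: float,
                           price_per_seat: float,
                           years: int) -> list[float]:
    """Return the estimated monthly bill for year 0 through `years`."""
    bills = []
    for year in range(years + 1):
        # Compound the headcount, rounding up because seats are whole units.
        seats = math.ceil(seats_now * (1 + yearly_growth_rate) ** year)
        bills.append(seats * price_per_seat)
    return bills


if __name__ == "__main__":
    # 20 seats today, 25% yearly growth, 15.00 per seat per month.
    for year, bill in enumerate(projected_monthly_bill(20, 0.25, 15.0, 3)):
        print(f"Year {year}: {bill:.2f} per month")
```

<p>Tools that also charge per alert, per integration, or per status page subscriber would need extra terms in the calculation, so treat this as a starting point only.</p>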
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="documentation">Documentation<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#documentation" class="hash-link" aria-label="Direct link to Documentation" title="Direct link to Documentation" translate="no">​</a></h3>
<p>Does the tool have comprehensive documentation? Can your team find easy-to-follow instructions on how to use it? What about advanced use cases?</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="ease-of-use">Ease of Use<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#ease-of-use" class="hash-link" aria-label="Direct link to Ease of Use" title="Direct link to Ease of Use" translate="no">​</a></h3>
<p>A good UI is crucial, especially in times of crisis. The UI should be intuitive, and have an easy way to both summarize the ongoing incidents and drill down into the details.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="data-security">Data Security<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#data-security" class="hash-link" aria-label="Direct link to Data Security" title="Direct link to Data Security" translate="no">​</a></h3>
<p>Read the data security and privacy policy for the tool. Depending on your business's regulatory requirements, you might need to have a tool that is compliant with your required data security standards.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="trends-in-incident-management-tools-in-2025">Trends in Incident Management Tools in 2025<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#trends-in-incident-management-tools-in-2025" class="hash-link" aria-label="Direct link to Trends in Incident Management Tools in 2025" title="Direct link to Trends in Incident Management Tools in 2025" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="ai-ops">AI Ops<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#ai-ops" class="hash-link" aria-label="Direct link to AI Ops" title="Direct link to AI Ops" translate="no">​</a></h3>
<p>AI-based incident management, alert grouping, automated root cause analysis, and natural language interaction for incidents are growing. Although many tools have incorporated AI-based features, automated remediation is still in its infancy. Security risks are also a concern.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="focused-workflows">Focused Workflows<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#focused-workflows" class="hash-link" aria-label="Direct link to Focused Workflows" title="Direct link to Focused Workflows" translate="no">​</a></h3>
<p>More and more tools are focusing on the ability to handle the entire incident lifecycle in one place. For example, <a class="" href="https://blog.incidenthub.cloud/The-Rising-Role-of-Slack-in-Incident-Management">Slack-based incident management tools</a> - for teams that use Slack - enable users to manage incidents directly from Slack.
This makes it easier for users to collaborate during high-stress situations as they are working in a familiar environment with ready access to colleagues and other information.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="list-of-incident-management-tools-in-2025">List of Incident Management Tools in 2025<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#list-of-incident-management-tools-in-2025" class="hash-link" aria-label="Direct link to List of Incident Management Tools in 2025" title="Direct link to List of Incident Management Tools in 2025" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="pagerduty"><a href="https://www.pagerduty.com/" target="_blank" rel="noopener noreferrer" class="">PagerDuty</a><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#pagerduty" class="hash-link" aria-label="Direct link to pagerduty" title="Direct link to pagerduty" translate="no">​</a></h3>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/pagerduty-dashboard.webp" alt="PagerDuty">
<p><br>
<strong>Key Features</strong></p>
<ul>
<li class="">Alerting over multiple channels including phone, app, email.</li>
<li class="">On-call management - scheduling, roster management, overrides.</li>
<li class="">Rule definitions for alert routing.</li>
<li class="">Integrations with most common tools.</li>
<li class="">APIs for incident lifecycle management.</li>
<li class="">Status pages.</li>
<li class="">Support for teams with role-based permissions.</li>
<li class="">Integration with ITSM tools.</li>
<li class="">Analytics.</li>
<li class="">Single sign-on.</li>
<li class="">Maintenance mode.</li>
</ul>
<p>PagerDuty is best for large enterprises requiring comprehensive incident management, although it can be used by smaller teams too.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="servicenow"><a href="https://www.servicenow.com/products/incident-management.html" target="_blank" rel="noopener noreferrer" class="">ServiceNow</a><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#servicenow" class="hash-link" aria-label="Direct link to servicenow" title="Direct link to servicenow" translate="no">​</a></h3>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/servicenow-incidents.webp" alt="ServiceNow Incidents">
<p><br>
<strong>Key Features</strong></p>
<ul>
<li class="">On-call scheduling with overrides.</li>
<li class="">Supports multiple notification channels.</li>
<li class="">Automated ticket routing.</li>
<li class="">SLA tracking.</li>
<li class="">Compliance and governance features.</li>
<li class="">Integrations with many third-party tools.</li>
<li class="">Analytics.</li>
</ul>
<p>It's best suited for organizations using ServiceNow products like ITSM.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="splunk-on-call"><a href="https://www.splunk.com/en_us/products/on-call.html" target="_blank" rel="noopener noreferrer" class="">Splunk On-Call</a><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#splunk-on-call" class="hash-link" aria-label="Direct link to splunk-on-call" title="Direct link to splunk-on-call" translate="no">​</a></h3>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/splunkoncall-dashboard.webp" alt="Splunk On-Call Dashboard">
<p><br>
<strong>Key Features</strong></p>
<ul>
<li class="">On-call schedules and overrides.</li>
<li class="">Role-based permissions.</li>
<li class="">Rules engine for triggering custom actions.</li>
<li class="">Incident waiting rooms to reduce alert fatigue.</li>
<li class="">Maintenance mode.</li>
<li class="">Notifications via email, phone, SMS, app push.</li>
<li class="">Third-party integrations with many common tools.</li>
</ul>
<p>Splunk On-Call, formerly VictorOps, is best suited for teams already using Splunk for monitoring.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="grafana-cloud-irm"><a href="https://grafana.com/products/cloud/oncall/" target="_blank" rel="noopener noreferrer" class="">Grafana Cloud IRM</a><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#grafana-cloud-irm" class="hash-link" aria-label="Direct link to grafana-cloud-irm" title="Direct link to grafana-cloud-irm" translate="no">​</a></h3>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/grafanairm-alerting.webp" alt="Grafana Cloud IRM Dashboard">
<p><br>
<strong>Key Features</strong></p>
<ul>
<li class="">Open source and also has a managed solution.</li>
<li class="">Alert grouping.</li>
<li class="">Escalation policies.</li>
<li class="">Alert routing.</li>
<li class="">Calendar-based on-call schedule and roster.</li>
<li class="">Maintenance mode.</li>
<li class="">Integrations with common third-party tools - Slack, SMS, Telegram, Phone.</li>
<li class="">Role-based access control.</li>
<li class="">Analytics.</li>
</ul>
<p>Grafana Cloud IRM was formerly known as Grafana OnCall; the OnCall and Incident apps were <a href="https://grafana.com/blog/2025/03/11/oncall-management-incident-response-grafana-cloud-irm/" target="_blank" rel="noopener noreferrer" class="">merged</a> into a single product.
Grafana Cloud IRM works seamlessly with other Grafana Cloud products, so it is best suited for teams already using Grafana for monitoring.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="ilert"><a href="https://www.ilert.com/product/on-call-management-escalations" target="_blank" rel="noopener noreferrer" class="">iLert</a><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#ilert" class="hash-link" aria-label="Direct link to ilert" title="Direct link to ilert" translate="no">​</a></h3>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/ilert-dashboard.webp" alt="iLert Dashboard">
<p><br>
<strong>Key Features</strong></p>
<ul>
<li class="">On-call schedules and escalation policies.</li>
<li class="">Notifications using SMS, push, voice call.</li>
<li class="">Maintenance support.</li>
<li class="">Critical phone call routing using customizable multi-language IVR.</li>
<li class="">Public status pages.</li>
<li class="">Integrations with MS Teams and Slack for chatops-based incident management.</li>
<li class="">Integrates with most common tools.</li>
</ul>
<p>iLert is best suited for mid-sized Ops teams.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="incidentio"><a href="https://incident.io/" target="_blank" rel="noopener noreferrer" class="">incident.io</a><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#incidentio" class="hash-link" aria-label="Direct link to incidentio" title="Direct link to incidentio" translate="no">​</a></h3>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/incidentio-alert.webp" alt="incident.io Dashboard">
<p><br>
<strong>Key Features</strong></p>
<ul>
<li class="">On-call scheduling and escalations, with overrides.</li>
<li class="">Notifications with app push, phone, email, Slack, MS Teams.</li>
<li class="">Incident lifecycle management from within Slack.</li>
<li class="">Private incidents support.</li>
<li class="">API for integration and data access.</li>
<li class="">Status pages.</li>
<li class="">Analytics.</li>
<li class="">Third-party integrations.</li>
<li class="">Integrates with CRM systems.</li>
</ul>
<p>incident.io focuses on being an incident management platform with a Slack-first approach.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="firehydrant"><a href="https://firehydrant.com/" target="_blank" rel="noopener noreferrer" class="">FireHydrant</a><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#firehydrant" class="hash-link" aria-label="Direct link to firehydrant" title="Direct link to firehydrant" translate="no">​</a></h3>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/firehydrant-dashboard.webp" alt="FireHydrant Dashboard">
<p><br>
<strong>Key Features</strong></p>
<ul>
<li class="">On-call management.</li>
<li class="">Notifications on app push, Slack, WhatsApp.</li>
<li class="">Runbooks.</li>
<li class="">Service catalog.</li>
<li class="">Incident retrospectives.</li>
<li class="">Analytics.</li>
<li class="">Integrates with most common tools.</li>
<li class="">Status pages.</li>
</ul>
<p>FireHydrant, with its strong incident workflows and retrospectives, is best suited for SRE teams.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="squadcast"><a href="https://www.squadcast.com/" target="_blank" rel="noopener noreferrer" class="">Squadcast</a><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#squadcast" class="hash-link" aria-label="Direct link to squadcast" title="Direct link to squadcast" translate="no">​</a></h3>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/squadcast-dashboard.webp" alt="Squadcast Dashboard">
<p><br>
<strong>Key Features</strong></p>
<ul>
<li class="">On-call scheduling, escalation policies, and overrides.</li>
<li class="">Integrations with common tools.</li>
<li class="">Live call routing to connect to on-call folks directly.</li>
<li class="">Alert classification and routing rules.</li>
<li class="">Auto-pause flapping alerts.</li>
<li class="">Analytics.</li>
<li class="">Manage incidents directly from Slack.</li>
<li class="">Runbooks.</li>
<li class="">Status pages.</li>
</ul>
<p>Squadcast, acquired by SolarWinds, is meant for modern SRE and Ops teams with its alert routing, post-mortem support, and chatops features.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="better-stack"><a href="https://betterstack.com/" target="_blank" rel="noopener noreferrer" class="">Better Stack</a><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#better-stack" class="hash-link" aria-label="Direct link to better-stack" title="Direct link to better-stack" translate="no">​</a></h3>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/betterstack-dashboard.webp" alt="Better Stack Dashboard">
<p><br>
<strong>Key Features</strong></p>
<ul>
<li class="">On-call scheduling and escalation policies.</li>
<li class="">Incident grouping.</li>
<li class="">Status pages.</li>
<li class="">Integrations with common tools.</li>
<li class="">Single sign-on.</li>
<li class="">Teams support.</li>
</ul>
<p>Better Stack is a suite of products that also includes monitoring and logging, but we felt it should be included in this list because of its integrated on-call features.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="rootly"><a href="https://rootly.com/on-call" target="_blank" rel="noopener noreferrer" class="">Rootly</a><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#rootly" class="hash-link" aria-label="Direct link to rootly" title="Direct link to rootly" translate="no">​</a></h3>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/rootly-dashboard.webp" alt="Rootly Dashboard">
<p><strong>Key Features</strong></p>
<ul>
<li class="">On-call scheduling, escalation policies, and overrides.</li>
<li class="">Alert grouping based on time-window and on content.</li>
<li class="">Integrates with many third-party tools.</li>
<li class="">Playbooks.</li>
<li class="">Support for managing the incident lifecycle directly from Slack.</li>
<li class="">Retrospectives with automatic data capture and sync with Jira.</li>
<li class="">Analytics.</li>
</ul>
<p>Rootly specializes in automating incident workflows with strong integration capabilities and customizable playbooks.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="xmatters"><a href="https://www.xmatters.com/" target="_blank" rel="noopener noreferrer" class="">xMatters</a><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#xmatters" class="hash-link" aria-label="Direct link to xmatters" title="Direct link to xmatters" translate="no">​</a></h3>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/xmatters-dashboard.webp" alt="xMatters Dashboard">
<p><br>
<strong>Key Features</strong></p>
<ul>
<li class="">Alert threshold configuration.</li>
<li class="">Automatic correlation between alerts that might relate to the same underlying issue.</li>
<li class="">Ability to subscribe to notifications as an observer.</li>
<li class="">Integrates with many common tools like Slack, ServiceNow, Google Chat, MS Teams, etc. for alerting.</li>
<li class="">Supports cloud provider integrations for monitoring with AWS, Azure, GCP, etc.</li>
<li class="">Detailed analytics and reporting.</li>
<li class="">Incident timeline view.</li>
<li class="">On-call scheduling, rotation, and management.</li>
<li class="">SMS, Phone, Push notifications.</li>
<li class="">Status pages.</li>
<li class="">Mobile apps for iOS and Android.</li>
</ul>
<p>xMatters is suited for Ops/SRE and IT teams, and also for teams in sectors with special regulatory requirements, such as healthcare and financial services.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="alertops"><a href="https://alertops.com/" target="_blank" rel="noopener noreferrer" class="">AlertOps</a><a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#alertops" class="hash-link" aria-label="Direct link to alertops" title="Direct link to alertops" translate="no">​</a></h3>
<p><strong>Key Features</strong></p>
<ul>
<li class="">On-call scheduling, rotation, and management.</li>
<li class="">Dynamic alert routing and grouping.</li>
<li class="">Custom data enrichment to add more context to alerts.</li>
<li class="">API-based integrations.</li>
<li class="">Integration with Slack, MS Teams, etc.</li>
<li class="">Integrates with Ops/SRE tools like Jenkins, GitLab, GitHub, Prometheus.</li>
<li class="">Predefined message templates.</li>
<li class="">Status pages.</li>
<li class="">Post-mortem reports.</li>
<li class="">AI-based grouping and natural language interaction for incidents.</li>
</ul>
<p>AlertOps is used by IT operations, Ops, and incident response teams to manage alerts and coordinate response.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>Choosing an incident management tool involves looking at the problem holistically:</p>
<table><thead><tr><th>Criterion</th><th>What to evaluate</th></tr></thead><tbody><tr><td>Features</td><td>Instead of counting features, list the ones your team actually needs and evaluate tools against that list.</td></tr><tr><td>Cost</td><td>Incident management is a key part of your business operations, so you also need to forecast future costs if your team or infrastructure is growing. Look out for hidden costs.</td></tr><tr><td>Reliability</td><td>High uptime and availability are a must for a system that is crucial to your business operations.</td></tr><tr><td>Customer support</td><td>Your incident management system's reliability needs to be top-notch. However, incidents happen, even in incident management software, so make sure the vendor has great customer support.</td></tr><tr><td>Integration capabilities</td><td>Any incident management tool should be able to integrate well with your existing monitoring stack as well as with your communication and collaboration tools.</td></tr><tr><td>Reports</td><td>Metrics and analytics are invaluable for figuring out trends in your outages and where to focus for improvement.</td></tr><tr><td>Flexibility in scheduling</td><td>Easy roster setup and overrides are a must.</td></tr><tr><td>Growth capabilities</td><td>If your team is growing, you need a tool that can scale with you.</td></tr><tr><td>Documentation</td><td>The tool should have comprehensive documentation and a knowledge base.</td></tr><tr><td>Ease of use</td><td>The tool should be easy to use and have an intuitive UI.</td></tr><tr><td>Data security</td><td>Apart from basic data security to protect your data, look at alignment with your regulatory requirements if any.</td></tr></tbody></table>
<p>Choose the tool that is right for you and your team.</p>
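<p>One way to turn the criteria above into a comparison is a simple weighted scoring matrix. The sketch below is only an illustration: the weights, the 1 to 5 ratings, and the tool names "Tool A" and "Tool B" are hypothetical placeholders that you would replace with your own team's judgment.</p>

```python
# Weighted scoring of candidate tools against evaluation criteria.
# Weights, ratings, and tool names are hypothetical placeholders.


def weighted_score(weights: dict[str, float], ratings: dict[str, float]) -> float:
    """Return the weighted average rating across all weighted criteria."""
    missing = set(weights) - set(ratings)
    if missing:
        raise KeyError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[c] * ratings[c] for c in weights) / total_weight


def rank_tools(weights: dict[str, float],
               candidates: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (tool, score) pairs sorted from highest to lowest score."""
    scored = [(name, weighted_score(weights, r)) for name, r in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Higher weight means the criterion matters more to this (imaginary) team.
    weights = {"features": 3, "cost": 2, "reliability": 5}
    candidates = {
        "Tool A": {"features": 4, "cost": 2, "reliability": 5},
        "Tool B": {"features": 5, "cost": 5, "reliability": 3},
    }
    for name, score in rank_tools(weights, candidates):
        print(f"{name}: {score:.2f}")
```

<p>The numbers matter less than the exercise of agreeing on weights as a team, which makes the trade-offs between criteria explicit.</p>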
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;border-radius:10px;border-left:3px solid #60a5fa"></p><p>You might also like:</p><ul><li><a href="https://blog.incidenthub.cloud/The-No-Nonsense-Guide-to-Runbook-Best-Practices">The No-Nonsense Guide to Runbook Best Practices</a></li><li><a href="https://blog.incidenthub.cloud/The-Rising-Role-of-Slack-in-Incident-Management">The Rising Role of Slack in Incident Management</a></li><li><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance">Best Practices for Planning for Upcoming Cloud Maintenance</a></li><li><a href="https://blog.incidenthub.cloud/A-Step-by-Step-Guide-to-Checking-if-a-SaaS-is-Down">A Step by Step Guide to Checking if a SaaS is Down</a></li></ul><p></p>
<hr>
<p>This article was originally published on the <a href="https://blog.incidenthub.cloud/the-ultimate-guide-to-incident-management-tools-in-2025" target="_blank" rel="noopener noreferrer" class="">IncidentHub blog</a>.</p>
<p>All product names, company names, logos and trademarks are the property of their respective owners.</p>
<p>Photo credits: <a href="https://unsplash.com/@jonathangallegos?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Jonathan Gallegos</a> on <a href="https://unsplash.com/photos/angle-view-of-white-painted-building-interior-nXw0-3l9G6Q?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Unsplash</a></p>]]></content:encoded>
            <category>Incident Management</category>
            <category>Incident Response</category>
        </item>
        <item>
            <title><![CDATA[Mistakes To Avoid With Your Public Status Page]]></title>
            <link>https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page</link>
            <guid>https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page</guid>
            <pubDate>Wed, 23 Jul 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn about the key mistakes to avoid while hosting a status page to maximize its usefulness and reliability.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p><em>Last updated on August 8, 2025.</em></p>
<p>A public status page forms the public face of your organization's service availability. It is the first point of contact for your customers to check the status of your services during times of crisis. Hence, ensuring the credibility and uptime of your public status page is crucial to your organization's reputation.</p>
<p>In this article we will look at the key mistakes to avoid while hosting and managing a public status page.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/public-status-page.webp" alt="Public Status Page Example">
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#public-status-page---expectations" class="">Public Status Page - Expectations</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#uptime-mistakes-to-avoid" class="">Uptime Mistakes to Avoid</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#hosting-the-status-page-on-the-same-infrastructure" class="">Hosting the Status Page on the Same Infrastructure</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#sharing-a-dns-provider-with-your-primary-domain" class="">Sharing a DNS Provider With Your Primary Domain</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#provider-mistakes-to-avoid" class="">Provider Mistakes to Avoid</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#choosing-a-managed-provider-without-an-sla-and-support" class="">Choosing a Managed Provider Without an SLA and Support</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#choosing-a-provider-that-does-not-have-an-api" class="">Choosing a Provider That Does Not Have an API</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#credibility-mistakes-to-avoid" class="">Credibility Mistakes to Avoid</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-acknowledging-outages" class="">Not Acknowledging Outages</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-owning-up-to-mistakes" class="">Not Owning Up to Mistakes</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#omitting-status-page-updates-from-your-incident-management-strategy" class="">Omitting Status Page Updates from Your Incident Management Strategy</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#transparency-mistakes-to-avoid" class="">Transparency Mistakes to Avoid</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-publishing-meaningful-post-mortems" class="">Not Publishing Meaningful Post-Mortems</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#communication-mistakes-to-avoid" class="">Communication Mistakes to Avoid</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#being-vague-about-the-ongoing-status" class="">Being Vague About the Ongoing Status</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#stating-vague-timelines" class="">Stating Vague Timelines</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-having-an-easy-way-to-reach-your-support-team" class="">Not Having an Easy Way To Reach Your Support Team</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#actionability-mistakes-to-avoid" class="">Actionability Mistakes to Avoid</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-including-workarounds" class="">Not Including Workarounds</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#discoverability-mistakes-to-avoid" class="">Discoverability Mistakes to Avoid</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#leaving-out-the-link-to-your-status-page-from-your-website" class="">Leaving Out the Link to Your Status Page From Your Website</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-linking-the-status-page-from-your-social-media-channels" class="">Not Linking the Status Page From Your Social Media Channels</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#usability-mistakes-to-avoid" class="">Usability Mistakes to Avoid</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-paying-attention-to-the-design" class="">Not Paying Attention to the Design</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-paying-attention-to-accessibility" class="">Not Paying Attention to Accessibility</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#conclusion" class="">Conclusion</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#summary-table" class="">Summary Table</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="public-status-page---expectations">Public Status Page - Expectations<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#public-status-page---expectations" class="hash-link" aria-label="Direct link to Public Status Page - Expectations" title="Direct link to Public Status Page - Expectations" translate="no">​</a></h2>
<p>What do your customers and users expect from a public status page?</p>
<p>The important things are:</p>
<ol>
<li class=""><strong>Uptime</strong>. The status page should be up and running at all times.</li>
<li class=""><strong>Reliability</strong>. The status page should be hosted by a reliable provider if you are using a managed provider.</li>
<li class=""><strong>Credibility</strong>. The status page updates should be managed by your team and reflect the actual status of your services. The page should also have a history of being open about incidents and maintenance.</li>
<li class=""><strong>Transparency</strong>. The status page should be honest about the actual status of any ongoing incident or maintenance.</li>
<li class=""><strong>Communication</strong>. The status page should be updated with clear and concise information in a timely manner.</li>
<li class=""><strong>Actionability</strong>. The status page should provide guides and workarounds, if any, for customers.</li>
<li class=""><strong>Discoverability</strong>. Users should be able to find the status page easily, whether from your website home page, your support portal, or your social media channels.</li>
<li class=""><strong>Usability</strong>. The status page should be easy to use and navigate.</li>
</ol>
<p>To meet these expectations, you have to ensure that the status page's importance is appreciated across your organization.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="uptime-mistakes-to-avoid">Uptime Mistakes to Avoid<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#uptime-mistakes-to-avoid" class="hash-link" aria-label="Direct link to Uptime Mistakes to Avoid" title="Direct link to Uptime Mistakes to Avoid" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hosting-the-status-page-on-the-same-infrastructure">Hosting the Status Page on the Same Infrastructure<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#hosting-the-status-page-on-the-same-infrastructure" class="hash-link" aria-label="Direct link to Hosting the Status Page on the Same Infrastructure" title="Direct link to Hosting the Status Page on the Same Infrastructure" translate="no">​</a></h3>
<p>Think about it: if the same Google Cloud zone hosts both your primary application and your status page, and that zone goes down, your status page goes down along with your application. You would have no way
to update the status page. Cloud providers isolate zones and regions from one another to limit the impact of outages. If you host your status page yourself, take advantage of this isolation and deploy it in a different zone or region.</p>
<p>However, if you are using a managed provider, this is usually out of your control. At best, you can ask the provider whether their infrastructure runs on the same cloud provider, zone, or region as yours.
Managed providers will usually ensure high availability for their customers' status pages.</p>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/heroku-june-10-outage-summary.webp" alt="Heroku June 10 outage summary">
<p style="text-align:center;font-size:12px;color:#666">Screenshot from <a rel="noopener noreferrer nofollow" target="_blank" href="https://www.heroku.com/blog/summary-of-june-10-outage/">Heroku's June 10 outage summary</a>.</p>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/google-cloud-post-mortem.webp" alt="Google Cloud post-mortem">
<p style="text-align:center;font-size:12px;color:#666">Screenshot from <a rel="noopener noreferrer nofollow" target="_blank" href="https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1SsW">Google Cloud post-mortem</a>.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="sharing-a-dns-provider-with-your-primary-domain">Sharing a DNS Provider With Your Primary Domain<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#sharing-a-dns-provider-with-your-primary-domain" class="hash-link" aria-label="Direct link to Sharing a DNS Provider With Your Primary Domain" title="Direct link to Sharing a DNS Provider With Your Primary Domain" translate="no">​</a></h3>
<p>Your status page would probably be hosted on something like <code>status.yourdomain.com</code> where <code>yourdomain.com</code> is your primary domain.
A subdomain typically shares the registrar and DNS provider of its parent domain, so an outage at either one can take down your status page along with your primary site.</p>
<p>Strategies to follow here in order of increasing risk:</p>
<ol>
<li class="">Use a different registrar for your status page domain. This naturally implies a different domain. Choose a different DNS provider as well, and set that provider's name servers in your registrar settings. Why a different registrar? See the Zoom outage example below.</li>
<li class="">Use a different domain with the same registrar but a different DNS provider.</li>
<li class="">(Not recommended) Host your status page on a subdomain of your primary domain.</li>
</ol>
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:15px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><b style="text-align:center"><p>Zoom Outage - April 16, 2025</p></b><span style="text-align:justify"><p>Zoom's status page is hosted at <code>https://status.zoom.us/</code>. On April 16th 2025, due to a communication error between Zoom's domain registrar and GoDaddy, the zoom.us domain was shut down.
The NS records were removed at the TLD level, leaving DNS clients such as browsers and native apps unable to resolve the zoom.us domain or any of its subdomains. As a result, their status page was also unavailable.
The outage lasted around 2 hours.</p></span></div>
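<p>One way to catch a shared DNS dependency is to compare the name servers of the two domains. The sketch below is a minimal illustration in Python, not a complete audit: the function names and the sample NS records are hypothetical, and grouping name servers by the last two labels of their hostname is a naive heuristic that will misjudge some providers.</p>

```python
def provider_of(nameserver):
    """Naive provider key: the last two labels of the name server hostname."""
    labels = nameserver.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def shared_dns_providers(primary_ns, status_ns):
    """Return the provider keys that serve both domains, sorted."""
    primary = {provider_of(ns) for ns in primary_ns}
    status = {provider_of(ns) for ns in status_ns}
    return sorted(primary & status)


# Hypothetical NS records, as you might copy them from a DNS lookup tool.
primary_ns = ["ns1.dns-provider-a.example.", "ns2.dns-provider-a.example."]
status_ns = ["ns1.dns-provider-b.example.", "ns2.dns-provider-b.example."]

overlap = shared_dns_providers(primary_ns, status_ns)
if overlap:
    print("Shared DNS providers:", ", ".join(overlap))
else:
    print("No shared DNS providers found")
```

<p>An empty result only says that the name server hostnames differ. It does not tell you anything about the registrar, which you still have to check separately.</p>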
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="provider-mistakes-to-avoid">Provider Mistakes to Avoid<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#provider-mistakes-to-avoid" class="hash-link" aria-label="Direct link to Provider Mistakes to Avoid" title="Direct link to Provider Mistakes to Avoid" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="choosing-a-managed-provider-without-an-sla-and-support">Choosing a Managed Provider Without an SLA and Support<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#choosing-a-managed-provider-without-an-sla-and-support" class="hash-link" aria-label="Direct link to Choosing a Managed Provider Without an SLA and Support" title="Direct link to Choosing a Managed Provider Without an SLA and Support" translate="no">​</a></h3>
<p>A status page provider is like any other service provider, such as your cloud vendor or payment gateway. Before you choose a provider, make sure they have an acceptable SLA. Look at online reviews and past outage reports
to get an idea of their reliability.</p>
<p>The ability to reach their support team easily is also important. If the provider lists multiple support channels, verify that each one works before you need it. Add their support contacts to your incident response playbook.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="choosing-a-provider-that-does-not-have-an-api">Choosing a Provider That Does Not Have an API<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#choosing-a-provider-that-does-not-have-an-api" class="hash-link" aria-label="Direct link to Choosing a Provider That Does Not Have an API" title="Direct link to Choosing a Provider That Does Not Have an API" translate="no">​</a></h3>
<p>Your uptime monitoring systems should integrate with your status page provider so that availability trends are published automatically. If the provider does not have an API, you will have to update the page manually,
which quickly becomes a bottleneck and is not a good use of your time.</p>
<p>However, during incidents, you will want to post manual updates, as they are based on the results of human investigation. A provider that has APIs as well as a way to manually post incidents is a good choice.</p>
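<p>As a rough illustration of the automated half, the Python sketch below turns the results of recent uptime checks into a JSON payload that a monitoring job could send to a provider's API. Everything here is an assumption made for the example: the status labels, the payload fields, and the function names do not correspond to any specific provider.</p>

```python
import json


def component_status(check_results):
    """Map a list of boolean check results to a coarse status label."""
    if not check_results:
        return "unknown"
    failures = check_results.count(False)
    if failures == 0:
        return "operational"
    if failures == len(check_results):
        return "major_outage"
    return "degraded"


def build_update(component, check_results):
    """Build a hypothetical update payload for one component."""
    return {"component": component, "status": component_status(check_results)}


# Three recent checks for a hypothetical "api" component, one of them failed.
payload = json.dumps(build_update("api", [True, False, True]))
print(payload)
```

<p>The payload would then be posted to whatever endpoint your provider documents. Incident narratives stay manual, as described above.</p>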
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="credibility-mistakes-to-avoid">Credibility Mistakes to Avoid<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#credibility-mistakes-to-avoid" class="hash-link" aria-label="Direct link to Credibility Mistakes to Avoid" title="Direct link to Credibility Mistakes to Avoid" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="not-acknowledging-outages">Not Acknowledging Outages<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-acknowledging-outages" class="hash-link" aria-label="Direct link to Not Acknowledging Outages" title="Direct link to Not Acknowledging Outages" translate="no">​</a></h3>
<p>When you become aware of an outage, the first step is to acknowledge it. The outage could be detected by your own monitoring systems or, in the worst case, by your customers.
Update the status page to reflect the situation. At the outset, your team might not have all the information, which is fine. Say so, and post periodic progress updates as you learn more.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="not-owning-up-to-mistakes">Not Owning Up to Mistakes<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-owning-up-to-mistakes" class="hash-link" aria-label="Direct link to Not Owning Up to Mistakes" title="Direct link to Not Owning Up to Mistakes" translate="no">​</a></h3>
<p>Once you have mitigated the outage and systems are up and running, your team will prepare a post-mortem report. The cause could be a failure at your cloud vendor or another dependency, a bug in your code, or human error.
Accepting mistakes and addressing them publicly is a good way to build credibility and trust.</p>
<div style="background-color:var(--ifm-color-emphasis-200);padding-top:15px;padding-bottom:15px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px;border-left:3px solid #60a5fa"><b style="text-align:center"><p>The Service Recovery Paradox</p></b><span style="text-align:justify"><p>The Service Recovery Paradox suggests that a company can build greater trust by publicly acknowledging its failures and addressing them effectively.
The term was coined in 1992 by Michael A. McCollough and Sundar G. Bharadwaj. They described a situation where, after a failure was addressed satisfactorily, customers expressed greater trust and loyalty than before the failure.</p></span></div>
<p><br>
For an example of a detailed post-mortem, see <a rel="noopener noreferrer nofollow" target="_blank" href="https://status.cloud.google.com/incident/cloud-networking/19009">Google Cloud Networking Incident #19009</a>.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="omitting-status-page-updates-from-your-incident-management-strategy">Omitting Status Page Updates from Your Incident Management Strategy<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#omitting-status-page-updates-from-your-incident-management-strategy" class="hash-link" aria-label="Direct link to Omitting Status Page Updates from Your Incident Management Strategy" title="Direct link to Omitting Status Page Updates from Your Incident Management Strategy" translate="no">​</a></h3>
<p>Your incident management strategy should include updating your status page. During an incident, your team will be busy investigating the problem and handling internal communications. It is easy to forget that your customers,
and not just the ones reaching out to your support directly, need to be informed. Making status page updates an explicit step in your process ensures that they happen.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="transparency-mistakes-to-avoid">Transparency Mistakes to Avoid<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#transparency-mistakes-to-avoid" class="hash-link" aria-label="Direct link to Transparency Mistakes to Avoid" title="Direct link to Transparency Mistakes to Avoid" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="not-publishing-meaningful-post-mortems">Not Publishing Meaningful Post-Mortems<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-publishing-meaningful-post-mortems" class="hash-link" aria-label="Direct link to Not Publishing Meaningful Post-Mortems" title="Direct link to Not Publishing Meaningful Post-Mortems" translate="no">​</a></h3>
<p>A post-mortem is a detailed analysis of an incident. It lays out the root causes, lessons learned, and recommendations for improvement.
Your internal post-mortem report will have a lot of internal information that you might not be able to share. However, it is absolutely necessary to publish a public post-mortem which includes
as much detail as possible without revealing sensitive information.</p>
<p>You can post the post-mortem on your status page as a follow-up to the incident. You can also publish it on your blog or social media channels.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="communication-mistakes-to-avoid">Communication Mistakes to Avoid<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#communication-mistakes-to-avoid" class="hash-link" aria-label="Direct link to Communication Mistakes to Avoid" title="Direct link to Communication Mistakes to Avoid" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="being-vague-about-the-ongoing-status">Being Vague About the Ongoing Status<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#being-vague-about-the-ongoing-status" class="hash-link" aria-label="Direct link to Being Vague About the Ongoing Status" title="Direct link to Being Vague About the Ongoing Status" translate="no">​</a></h3>
<p>Vague updates look like this:</p>
<ul>
<li class="">"There is an incident and we are looking into it." (We know that already. When will you update us next?)</li>
<li class="">"Some of our customers are experiencing issues." (Which systems are affected so that I can check if I am affected too?)</li>
</ul>
<p>These do not convey much useful information to your customers.</p>
<p>Initially, your team might not be aware of which systems are affected and the impact of the outage. This is totally fine. As your investigation progresses, you can update the status page with more information.
The first update you post will not have much concrete information, but you can still say that you will post the next update within a specific period of time.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="stating-vague-timelines">Stating Vague Timelines<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#stating-vague-timelines" class="hash-link" aria-label="Direct link to Stating Vague Timelines" title="Direct link to Stating Vague Timelines" translate="no">​</a></h3>
<p>Not-so-good updates:</p>
<ul>
<li class="">"We are working on it."</li>
<li class="">"We will update you as soon as possible."</li>
</ul>
<p>A better update:</p>
<ul>
<li class="">"We are working on it and we will post the next update in 30 minutes."</li>
</ul>
<p>After 30 minutes, even if you have not made much progress, post an update. Your customers will appreciate the fact that you are not just working to fix the issue but are also communicating with them.</p>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/provide-next-update-when.webp" alt="Provide next update when">
<p style="text-align:center;font-size:12px;color:#666">Screenshot from an incident update.</p>
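<p>If you script your updates, committing to the next update time can be built into the message itself. The Python sketch below is one possible approach: the <code>format_update</code> helper, the 30-minute default, and the sample message are illustrative assumptions, and the function expects a timezone-aware timestamp.</p>

```python
from datetime import datetime, timedelta, timezone


def format_update(message, posted_at, next_update_minutes=30):
    """Append a concrete next-update time (in UTC) to an incident message.

    posted_at must be a timezone-aware datetime.
    """
    next_update = posted_at + timedelta(minutes=next_update_minutes)
    next_update_utc = next_update.astimezone(timezone.utc)
    return f"{message} Next update by {next_update_utc.strftime('%H:%M')} UTC."


# Hypothetical incident update posted at 14:05 UTC.
posted = datetime(2025, 7, 1, 14, 5, tzinfo=timezone.utc)
print(format_update("We are investigating elevated API error rates.", posted))
```

<p>Stating an absolute time rather than "in 30 minutes" helps readers who open the page some time after the update was posted.</p>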
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="not-having-an-easy-way-to-reach-your-support-team">Not Having an Easy Way To Reach Your Support Team<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-having-an-easy-way-to-reach-your-support-team" class="hash-link" aria-label="Direct link to Not Having an Easy Way To Reach Your Support Team" title="Direct link to Not Having an Easy Way To Reach Your Support Team" translate="no">​</a></h3>
<p>Your support team might be on one or more of these:</p>
<ul>
<li class="">Helpdesk software</li>
<li class="">Email</li>
<li class="">Chat</li>
<li class="">Social Media</li>
<li class="">Phone</li>
</ul>
<p>Link to your support channels from your status page, and make sure your support portal is separately hosted or managed by an external provider so that it stays reachable when your primary infrastructure is down.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="actionability-mistakes-to-avoid">Actionability Mistakes to Avoid<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#actionability-mistakes-to-avoid" class="hash-link" aria-label="Direct link to Actionability Mistakes to Avoid" title="Direct link to Actionability Mistakes to Avoid" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="not-including-workarounds">Not Including Workarounds<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-including-workarounds" class="hash-link" aria-label="Direct link to Not Including Workarounds" title="Direct link to Not Including Workarounds" translate="no">​</a></h3>
<p>If your customers are affected by the outage, share any known workarounds or alternatives.</p>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/hetzner-outage-workaround.webp" alt="Hetzner outage workaround">
<p style="text-align:center;font-size:12px;color:#666">Screenshot of a workaround posted on <a rel="noopener noreferrer nofollow" target="_blank" href="https://status.hetzner.com/incident/579034f0-194d-4b44-bc0a-cdac41abd753">Hetzner's status page</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="discoverability-mistakes-to-avoid">Discoverability Mistakes to Avoid<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#discoverability-mistakes-to-avoid" class="hash-link" aria-label="Direct link to Discoverability Mistakes to Avoid" title="Direct link to Discoverability Mistakes to Avoid" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="leaving-out-the-link-to-your-status-page-from-your-website">Leaving Out the Link to Your Status Page From Your Website<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#leaving-out-the-link-to-your-status-page-from-your-website" class="hash-link" aria-label="Direct link to Leaving Out the Link to Your Status Page From Your Website" title="Direct link to Leaving Out the Link to Your Status Page From Your Website" translate="no">​</a></h3>
<p>Link to your status page from your website and your support portal. Most services link to their status page in their website footer.</p>
<img style="box-shadow:2px 2px 5px rgba(0, 0, 0, 0.1);padding:10px;border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/status-page-in-footer.webp" alt="Example of Status Page Link in Footer">
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="not-linking-the-status-page-from-your-social-media-channels">Not Linking the Status Page From Your Social Media Channels<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-linking-the-status-page-from-your-social-media-channels" class="hash-link" aria-label="Direct link to Not Linking the Status Page From Your Social Media Channels" title="Direct link to Not Linking the Status Page From Your Social Media Channels" translate="no">​</a></h3>
<p>Link to your status page from your social media channels. When there is an outage, people might visit your social media outlets like X looking for updates about it.
Making it easy to find your status page will also decrease the load on your support team.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="usability-mistakes-to-avoid">Usability Mistakes to Avoid<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#usability-mistakes-to-avoid" class="hash-link" aria-label="Direct link to Usability Mistakes to Avoid" title="Direct link to Usability Mistakes to Avoid" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="not-paying-attention-to-the-design">Not Paying Attention to the Design<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-paying-attention-to-the-design" class="hash-link" aria-label="Direct link to Not Paying Attention to the Design" title="Direct link to Not Paying Attention to the Design" translate="no">​</a></h3>
<p>Your status page need not be a work of art fit to be hung in a museum, but it should be easy to use, navigate, and read. Users are going to visit your status page repeatedly, so making it easy to use is important.
Invest in a good, pleasant design, with the most important information at the top. A status page needs to communicate the overall status first, and then let users dig deeper into the details.</p>
<p>Many teams display status pages on large screens in their offices and NOC (Network Operations Center) rooms. Give users the option to choose between a light and a dark theme so the page stays readable in those settings.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="not-paying-attention-to-accessibility">Not Paying Attention to Accessibility<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#not-paying-attention-to-accessibility" class="hash-link" aria-label="Direct link to Not Paying Attention to Accessibility" title="Direct link to Not Paying Attention to Accessibility" translate="no">​</a></h3>
<p>Accessibility matters for all users, and especially for users with disabilities. Pay attention to color contrast, font size, and other accessibility features. Use an online accessibility checker to verify that your status page meets standards such as the Web Content Accessibility Guidelines (WCAG).</p>
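<p>Color contrast is one accessibility property you can check yourself. The Python sketch below computes the contrast ratio between two colors using the relative luminance formula from WCAG 2.x. The function names and sample colors are illustrative, and the parser only handles six-digit hex values. WCAG level AA asks for a ratio of at least 4.5:1 for normal-size text.</p>

```python
def channel_to_linear(value):
    """Convert an 8-bit sRGB channel to its linear-light value (WCAG 2.x)."""
    c = value / 255
    if c > 0.03928:
        return ((c + 0.055) / 1.055) ** 2.4
    return c / 12.92


def relative_luminance(hex_color):
    """Relative luminance of a six-digit hex color such as '#1a2b3c'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * channel_to_linear(r)
            + 0.7152 * channel_to_linear(g)
            + 0.0722 * channel_to_linear(b))


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors; the order does not matter."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Example: gray text on a white background.
ratio = contrast_ratio("#666666", "#ffffff")
print(f"Contrast ratio: {ratio:.2f}:1, meets AA for normal text: {ratio >= 4.5}")
```

<p>A check like this covers only one criterion. A full accessibility review also looks at keyboard navigation, text alternatives, and screen reader behavior.</p>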
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>A public status page is a crucial part of your organization's public presence. Ensuring it is up and running and that it is managed well is important to your organization's reputation. An updated and reliable status page is one of the key ways
to build trust with your users and customers.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="summary-table">Summary Table<a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page#summary-table" class="hash-link" aria-label="Direct link to Summary Table" title="Direct link to Summary Table" translate="no">​</a></h2>
<p>Here's a comprehensive table of positive recommendations organized by category to help you create an effective public status page:</p>
<table><thead><tr><th><strong>Positive Recommendation</strong></th><th><strong>Category</strong></th></tr></thead><tbody><tr><td>Host status page on separate infrastructure from your main application</td><td>Uptime</td></tr><tr><td>Use different DNS provider and registrar for status page domain</td><td>Uptime</td></tr><tr><td>Choose a managed provider with strong SLA and support channels</td><td>Provider</td></tr><tr><td>Select a provider with comprehensive API for automation</td><td>Provider</td></tr><tr><td>Acknowledge outages immediately when detected</td><td>Credibility</td></tr><tr><td>Accept and publicly address mistakes in post-mortems</td><td>Credibility</td></tr><tr><td>Include status page updates in your incident management strategy</td><td>Credibility</td></tr><tr><td>Publish detailed, meaningful post-mortems</td><td>Transparency</td></tr><tr><td>Provide specific, actionable status updates</td><td>Communication</td></tr><tr><td>Set clear timelines for next updates</td><td>Communication</td></tr><tr><td>Include easy access to support team contact information</td><td>Communication</td></tr><tr><td>Share known workarounds and alternatives during outages</td><td>Actionability</td></tr><tr><td>Link status page from your main website footer</td><td>Discoverability</td></tr><tr><td>Include status page links on social media channels</td><td>Discoverability</td></tr><tr><td>Invest in clean, user-friendly design with light/dark themes</td><td>Usability</td></tr><tr><td>Ensure accessibility compliance (color contrast, font size, etc.)</td><td>Usability</td></tr></tbody></table>
<div style="background-color:var(--ifm-color-emphasis-100);padding-top:15px;padding-bottom:5px;padding-left:20px;padding-right:20px;margin-top:10px;border-radius:10px"><b style="text-align:center"><p>Trouble keeping up with dozens of status pages?</p></b><span style="text-align:center"><p>Track all your third-party service statuses on a single status page with <a href="https://incidenthub.cloud/">IncidentHub</a>.</p></span></div>
<hr>
<p>IncidentHub is not affiliated with any of the services and vendors mentioned in this article.</p>
<p>This article first appeared on the <a href="https://blog.incidenthub.cloud/mistakes-to-avoid-with-your-public-status-page" target="_blank" rel="noopener noreferrer" class="">IncidentHub blog</a>.</p>]]></content:encoded>
            <category>Status Pages</category>
        </item>
        <item>
            <title><![CDATA[Best Practices for Planning for Upcoming Cloud Maintenance]]></title>
            <link>https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance</link>
            <guid>https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance</guid>
            <pubDate>Sat, 05 Jul 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how you can plan for upcoming cloud maintenance to avoid potential downtime and keep your team informed.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p>Cloud maintenance is a common practice in the tech industry. Whether you manage your own infrastructure or use a cloud provider, you will need to plan for maintenance and include it as part of your operational readiness.
This ensures that your team is prepared for potential downtime and can deal with any incidents in a timely manner. This article will cover some best practices for planning for upcoming cloud maintenance.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/upcoming-maintenance.webp" alt="IncidentHub Public Status Pages">
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#types-of-maintenance" class="">Types of Maintenance</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#scheduled-maintenance" class="">Scheduled Maintenance</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#emergency-maintenance" class="">Emergency Maintenance</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#maintenance-planning" class="">Maintenance Planning</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#impact-assessment" class="">Impact Assessment</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#operational-readiness" class="">Operational Readiness</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#during-the-maintenance-window" class="">During the Maintenance Window</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#communication" class="">Communication</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#dealing-with-unexpected-issues" class="">Dealing With Unexpected Issues</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#tracking-upcoming-maintenance" class="">Tracking Upcoming Maintenance</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#maintenance-notifications" class="">Maintenance Notifications</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#email-alerts" class="">Email Alerts</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#dashboard-push-notifications" class="">Dashboard Push Notifications</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#use-a-status-page-monitoraggregator-like-incidenthub" class="">Use a Status Page Monitor/Aggregator like IncidentHub</a></li>
</ul>
</li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#conclusion" class="">Conclusion</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="types-of-maintenance">Types of Maintenance<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#types-of-maintenance" class="hash-link" aria-label="Direct link to Types of Maintenance" title="Direct link to Types of Maintenance" translate="no">​</a></h2>
<p>Based on how much advance notice you get, there are two types of maintenance: scheduled and emergency. Maintenance does not always cause downtime. However,
being prepared for downtime is one of the key aspects of maintenance planning.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="scheduled-maintenance">Scheduled Maintenance<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#scheduled-maintenance" class="hash-link" aria-label="Direct link to Scheduled Maintenance" title="Direct link to Scheduled Maintenance" translate="no">​</a></h3>
<p>A scheduled maintenance is a planned maintenance that is announced in advance, sometimes days or weeks or even months ahead.
Scheduled maintenance announcements give sufficient opportunity to plan for any downtime.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/scheduled-maintenance.webp" alt="Scheduled Maintenance">
<p>Scheduled maintenances can be modified or cancelled. Some cloud providers give you a way to reschedule or control the maintenance window
if it affects only your resources (and not other customers'). You can leverage this to your advantage to minimize the impact of the maintenance.</p>
<p>Some examples:</p>
<ol>
<li class="">
<p><a href="https://incidenthub.cloud/status/amazonwebservices" target="_blank" rel="noopener noreferrer" class="">Amazon Web Services EC2</a> - AWS EC2 maintenance events involve starting and stopping the instances. You can schedule the maintenance during off-peak hours to minimize the impact. You can also trigger the start/stop yourself at a chosen time before the scheduled window.</p>
</li>
<li class="">
<p><a href="https://incidenthub.cloud/status/render" target="_blank" rel="noopener noreferrer" class="">Render</a> - Render's scheduled maintenance can be rescheduled to a different time, and you can also choose to trigger it at a time of your choosing.</p>
</li>
</ol>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="emergency-maintenance">Emergency Maintenance<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#emergency-maintenance" class="hash-link" aria-label="Direct link to Emergency Maintenance" title="Direct link to Emergency Maintenance" translate="no">​</a></h3>
<p>An emergency maintenance is not planned and is triggered as a way to mitigate a critical issue.
Users will still be notified but they may not have enough time to plan for it.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/emergency-maintenance.webp" alt="Emergency Maintenance">
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="maintenance-planning">Maintenance Planning<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#maintenance-planning" class="hash-link" aria-label="Direct link to Maintenance Planning" title="Direct link to Maintenance Planning" translate="no">​</a></h2>
<p>Planning for maintenance events in advance is an important part of your incident management strategy. Unlike incidents, maintenance events give your team the chance to plan
beforehand, and to assess and mitigate any possible impact.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="impact-assessment">Impact Assessment<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#impact-assessment" class="hash-link" aria-label="Direct link to Impact Assessment" title="Direct link to Impact Assessment" translate="no">​</a></h3>
<p>A maintenance announcement will have at least these details:</p>
<ul>
<li class="">The expected start and end time of the maintenance - the "maintenance window".</li>
<li class="">The cloud services affected.</li>
</ul>
<p>Based on the services affected, your team can determine if there will be any impact on your own applications or users.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="operational-readiness">Operational Readiness<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#operational-readiness" class="hash-link" aria-label="Direct link to Operational Readiness" title="Direct link to Operational Readiness" translate="no">​</a></h3>
<p>If there will be an impact on your own applications or users, you can plan for it by:</p>
<ul>
<li class="">Identifying the applications and services that will be affected. If it's a cloud service that your applications or services depend on, inform the affected teams so that they have workarounds in place. These can be measures like keeping standby servers in a different region as a fallback, or actively routing client traffic to a different region in your load balancer.</li>
<li class="">Identifying the impact on your users. If it's a SaaS service like a communication suite, or an office productivity suite, informing your users in an org-wide channel lets them plan their work. They can ensure that no critical work is scheduled during the maintenance window.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="during-the-maintenance-window">During the Maintenance Window<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#during-the-maintenance-window" class="hash-link" aria-label="Direct link to During the Maintenance Window" title="Direct link to During the Maintenance Window" translate="no">​</a></h2>
<p>During an ongoing maintenance, these are the important things to keep in mind:</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="communication">Communication<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#communication" class="hash-link" aria-label="Direct link to Communication" title="Direct link to Communication" translate="no">​</a></h3>
<p>Keep your team informed about the status of the maintenance. This can be done using your existing communication tools like Slack or MS Teams. Another option is to use a status page that is accessible to everyone in your organization and provides real-time updates.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="dealing-with-unexpected-issues">Dealing With Unexpected Issues<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#dealing-with-unexpected-issues" class="hash-link" aria-label="Direct link to Dealing With Unexpected Issues" title="Direct link to Dealing With Unexpected Issues" translate="no">​</a></h3>
<p>You can run into unexpected pitfalls due to various reasons:</p>
<ul>
<li class=""><strong>The maintenance window was longer than expected.</strong> In such cases, the cloud resources may be affected for longer than planned. For your own applications, the teams have to continue with their original mitigation plan and adjust as needed. For SaaS applications, inform your users as soon as possible so that they can plan too.</li>
<li class=""><strong>The maintenance activity affected resources other than the ones that were announced.</strong> This is rare but possible. In such cases, track the cloud provider's status page and get in touch with their support. If you have a support contract with them, get in touch with your support representative.</li>
<li class=""><strong>Your team missed planning for one or more of the applications or services that were affected.</strong> You have to treat this like any other incident and put your incident response plan into action. It's also an opportunity to improve your incident management process.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tracking-upcoming-maintenance">Tracking Upcoming Maintenance<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#tracking-upcoming-maintenance" class="hash-link" aria-label="Direct link to Tracking Upcoming Maintenance" title="Direct link to Tracking Upcoming Maintenance" translate="no">​</a></h2>
<p>Most cloud and SaaS services announce scheduled maintenance beforehand. You can track these announcements easily by using a status page monitor/aggregator like IncidentHub.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="maintenance-notifications">Maintenance Notifications<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#maintenance-notifications" class="hash-link" aria-label="Direct link to Maintenance Notifications" title="Direct link to Maintenance Notifications" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="email-alerts">Email Alerts<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#email-alerts" class="hash-link" aria-label="Direct link to Email Alerts" title="Direct link to Email Alerts" translate="no">​</a></h4>
<p>You can receive email notifications by signing up on the cloud or SaaS provider's status page. However, this is cumbersome if there are too many status pages. Not all status pages offer this feature. It's also difficult to inform users in real time with this approach.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="dashboard-push-notifications">Dashboard Push Notifications<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#dashboard-push-notifications" class="hash-link" aria-label="Direct link to Dashboard Push Notifications" title="Direct link to Dashboard Push Notifications" translate="no">​</a></h4>
<p>Some cloud providers show you notifications on their dashboard. However, you need to be logged in and keep the browser tab open to see them. It's also not easy to track all your services this way, or to communicate such updates to your team.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="use-a-status-page-monitoraggregator-like-incidenthub">Use a Status Page Monitor/Aggregator like IncidentHub<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#use-a-status-page-monitoraggregator-like-incidenthub" class="hash-link" aria-label="Direct link to Use a Status Page Monitor/Aggregator like IncidentHub" title="Direct link to Use a Status Page Monitor/Aggregator like IncidentHub" translate="no">​</a></h4>
<p>IncidentHub tracks and shows you a maintenance feed of all upcoming and scheduled maintenances across your services.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px;width:70%" src="https://cdn.incidenthub.cloud/blog/maintenance-feed.webp" alt="IncidentHub Maintenance Feed">
<p>You can also set advance reminders and customize when you wish to receive them.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px;width:50%" src="https://cdn.incidenthub.cloud/blog/maintenance-reminders.webp" alt="IncidentHub Maintenance Reminder">
<p style="background-color:var(--ifm-color-emphasis-100);padding-top:10px;padding-bottom:1px;margin-top:20px;text-align:center;border-radius:10px"></p><p>Sign up for an <a href="https://incidenthub.cloud/" target="_blank" rel="noopener noreferrer" class="">IncidentHub</a> account to track your cloud and SaaS service maintenances in one place</p><p></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/planning-for-upcoming-cloud-maintenance#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>Maintenance tracking is an important part of your incident management process. It helps you stay informed about the status of your services and plan for any potential downtime, and lets your users plan their work in a better way.</p>
<p>Photo Credits: <a href="https://unsplash.com/@_ivann?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Ivan N</a> on <a href="https://unsplash.com/photos/a-bunch-of-wires-and-wires-in-a-room-AfStyhXC5kM?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Unsplash</a></p>
<p><em>IncidentHub is not affiliated with any of the services and vendors mentioned in this article.</em></p>]]></content:encoded>
            <category>Incident Management</category>
        </item>
        <item>
            <title><![CDATA[Product Update - Public Status Pages]]></title>
            <link>https://blog.incidenthub.cloud/product-update-public-status-pages</link>
            <guid>https://blog.incidenthub.cloud/product-update-public-status-pages</guid>
            <pubDate>Thu, 01 May 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how you can set up your public status page with IncidentHub to share the status of your third-party services with your users.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/product-update-public-status-pages#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p>A public status page is a page that you can share with your team to show a summary view of your third-party dependencies. We rolled out this feature recently to all IncidentHub users. This
article is a quick tour of the feature and how to set it up.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/public-status-page-hero.png" alt="IncidentHub Public Status Pages">
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-public-status-pages#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-public-status-pages#the-value-of-a-public-status-page" class="">The Value of a Public Status Page</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-public-status-pages#setting-up-a-public-status-page" class="">Setting Up a Public Status Page</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-public-status-pages#whats-next" class="">What's Next?</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-public-status-pages#conclusion" class="">Conclusion</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-value-of-a-public-status-page">The Value of a Public Status Page<a href="https://blog.incidenthub.cloud/product-update-public-status-pages#the-value-of-a-public-status-page" class="hash-link" aria-label="Direct link to The Value of a Public Status Page" title="Direct link to The Value of a Public Status Page" translate="no">​</a></h2>
<p>A public status page can be used to share your third-party service status with your team members or with your organization as a whole. During outages this page should be the first stop to check the status of your external services.
For deeper analysis there is always the IncidentHub dashboard.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="setting-up-a-public-status-page">Setting Up a Public Status Page<a href="https://blog.incidenthub.cloud/product-update-public-status-pages#setting-up-a-public-status-page" class="hash-link" aria-label="Direct link to Setting Up a Public Status Page" title="Direct link to Setting Up a Public Status Page" translate="no">​</a></h2>
<p>When you create an IncidentHub account, you get a public status page for free. You can customize the page with your logo and other branding options.
Log in to your account and click on the "Status Page" button on the navigation bar on the left.</p>
<img style="height:50%;width:50%;border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/dashboard-navbar.png" alt="IncidentHub Public Status Pages">
<p><br>
<!-- -->On the configuration page, you can fill in the details of your status page like title, name of your company, URL, support email, and logo.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/public-status-page-config.png" alt="IncidentHub Public Status Pages">
<p><br>
<!-- -->IncidentHub automatically creates a dedicated subdomain for your status page. You can copy its URL from the bottom of the <a href="https://incidenthub.cloud/statuspage" target="_blank" rel="noopener noreferrer" class="">configuration page</a>.</p>
<p>A customized status page will look like this:</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/public-status-page-example.png" alt="IncidentHub Public Status Pages">
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="whats-next">What's Next?<a href="https://blog.incidenthub.cloud/product-update-public-status-pages#whats-next" class="hash-link" aria-label="Direct link to What's Next?" title="Direct link to What's Next?" translate="no">​</a></h2>
<p>We are working on making the public status page more useful with options for whitelabeling. Here's a sneak peek of what's coming:</p>
<ul>
<li class="">Theme customization</li>
<li class="">Custom domains</li>
<li class="">Password protection</li>
<li class="">Subscription options</li>
<li class="">Affected components view</li>
</ul>
<p>Have a feature request? <a href="mailto:support@incidenthub.cloud" target="_blank" rel="noopener noreferrer" class="">Let us know</a>.</p>
<p style="background-color:var(--ifm-color-emphasis-100);padding-top:10px;padding-bottom:2px;margin-top:20px;text-align:center;border-radius:10px"></p><p>Watch our video tutorial on how to set up a public status page</p><iframe style="border:1px solid #e0e0e0;border-radius:10px;width:100%;height:315px;margin:0 auto" src="https://www.youtube.com/embed/2auQnYW215M" frameborder="0" allow="autoplay; encrypted-media" title="IncidentHub Public Status Page Setup Tutorial"></iframe><p></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/product-update-public-status-pages#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>IncidentHub's public status page is a great way to keep your team informed about the status of your external services. Use it to improve your incident response.</p>
<p>You can sign up for a free (forever) account at <a href="https://incidenthub.cloud/#pricing" target="_blank" rel="noopener noreferrer" class="">IncidentHub</a>.</p>
<p><em>IncidentHub is not affiliated with any of the services and vendors mentioned in this article.</em></p>]]></content:encoded>
            <category>Status Pages</category>
            <category>Product</category>
        </item>
        <item>
            <title><![CDATA[How to Fine Tune Your IncidentHub Alerts]]></title>
            <link>https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts</link>
            <guid>https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts</guid>
            <pubDate>Tue, 08 Apr 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how you can configure IncidentHub to send only relevant alerts for your third-party services using component filtering and alert lifecycle configuration.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p>IncidentHub can send outage alerts to many external systems. You can choose from Slack, Webhook, Email, Discord, PagerDuty, and more. Alerts are effective only when they are
relevant and actionable. In this article, we will explore how to fine-tune your IncidentHub alerts to receive only the relevant ones for your third-party services.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/fine-tune-alerts-2048.png" alt="Fine-tuning your IncidentHub alerts">
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#what-are-relevant-and-actionable-alerts" class="">What Are Relevant and Actionable Alerts?</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#fine-tuning-your-incidenthub-alerts" class="">Fine-Tuning Your IncidentHub Alerts</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#component-filtering" class="">Component Filtering</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#component-auto-detection" class="">Component Auto-Detection</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#alert-notifications--beginning-end-or-everything" class="">Alert Notifications – Beginning, End, or Everything?</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#best-practices" class="">Best Practices</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#conclusion" class="">Conclusion</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#references" class="">References</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#faq" class="">FAQ</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-are-relevant-and-actionable-alerts">What Are Relevant and Actionable Alerts?<a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#what-are-relevant-and-actionable-alerts" class="hash-link" aria-label="Direct link to What Are Relevant and Actionable Alerts?" title="Direct link to What Are Relevant and Actionable Alerts?" translate="no">​</a></h2>
<p>For third-party services, an alert is relevant if it directly affects your business in some way. The third party could be a cloud service, a SaaS application, or a payment gateway. An actionable alert is one that you can do something about.
To keep your alerts relevant and actionable, you can use this checklist:</p>
<ul>
<li class="">Your applications/business should directly use the service or product that the alert originates from. For example, if you use Google Kubernetes Engine (GKE) in the us-central1 region, an alert from GKE in the us-central1 region is relevant. Alerts from GKE in other regions, or from other Google Cloud products, are not relevant.</li>
<li class="">Some third-party services are more critical than others for your business. For a business-critical service, you would want to closely follow every outage update until it is resolved. For less critical services, it is okay to receive alerts only when the outage starts and ends.</li>
</ul>
<p>Irrelevant alerts just add to your already crowded list of notifications and cause alert fatigue.
Let's see how to use IncidentHub's fine-tuning features to keep your alerts relevant.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="fine-tuning-your-incidenthub-alerts">Fine-Tuning Your IncidentHub Alerts<a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#fine-tuning-your-incidenthub-alerts" class="hash-link" aria-label="Direct link to Fine-Tuning Your IncidentHub Alerts" title="Direct link to Fine-Tuning Your IncidentHub Alerts" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="component-filtering">Component Filtering<a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#component-filtering" class="hash-link" aria-label="Direct link to Component Filtering" title="Direct link to Component Filtering" translate="no">​</a></h3>
<p>Global services like Google Cloud and Microsoft Azure have many constituent services or components, and they are spread across different regions and zones.
Your business most likely uses only a subset of these components. Here is how to filter alerts from the components you are interested in.</p>
<p>When adding a new service, you can select "Monitor Specific Components" and choose the ones you need. Some components might have sub-components.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/monitor-specific-components.png" alt="Monitor Specific Components">
<p><br>
<!-- -->In the example above, if you choose "App Platform" -&gt; Amsterdam, London, and New York, you will receive alerts only when there is an outage in Amsterdam, London, or New York for App Platform.</p>
<p>You can change the monitored components for a service at any time by clicking on "Edit" next to the service.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="component-auto-detection">Component Auto-Detection<a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#component-auto-detection" class="hash-link" aria-label="Direct link to Component Auto-Detection" title="Direct link to Component Auto-Detection" translate="no">​</a></h3>
<p>Some services have a large number of components. Choosing the components manually is a chore, and there is also the risk of missing something that you forgot about. To mitigate this, IncidentHub can auto-detect the components for you from your cloud provider's invoice, uploaded as a CSV file.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/autodetected-components-gcp.png" alt="Auto-Detect Components">
<p><br>
<!-- -->We don't store your billing data. It is only used during the auto-detection process. You can also remove the financial information from the CSV file before uploading it - IncidentHub needs only the service names.</p>
<p>As of this writing, this feature is in beta and available for Google Cloud Platform only. We are rolling it out to other services soon.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="alert-notifications--beginning-end-or-everything">Alert Notifications – Beginning, End, or Everything?<a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#alert-notifications--beginning-end-or-everything" class="hash-link" aria-label="Direct link to Alert Notifications – Beginning, End, or Everything?" title="Direct link to Alert Notifications – Beginning, End, or Everything?" translate="no">​</a></h3>
<p>An outage goes through multiple stages - beginning, in progress, and resolved. The middle stage has one or more, sometimes many, updates - sent out by the service provider's team as they investigate and mitigate the issue.
Depending on how critical the service is for your business, you might want to be notified only when the outage starts, or when it ends, or for everything.</p>
<p>You can fine-tune this behavior for each monitored service.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/alert-notifications-configuration.png" alt="Alert Notifications Configuration">
<p><br>
<!-- -->This setting is per-service, and the default is to notify for everything.</p>
<p>Let me give you an example from IncidentHub itself. We use Paddle for our subscription billing. It's business-critical for us. We want to be notified for every update from Paddle outages and maintenance events. This helps us to be on top of issues and respond to customer queries quickly.</p>
<p>In contrast, we use Grafana Cloud as an external store for our logs. It is not business-critical and we are okay with missing the intermediate notifications as long as we know an outage is ongoing or has been resolved. A Grafana Cloud outage does make it difficult to debug production issues, but we can temporarily fall back to our cloud hosting provider's log viewer.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="best-practices">Best Practices<a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#best-practices" class="hash-link" aria-label="Direct link to Best Practices" title="Direct link to Best Practices" translate="no">​</a></h2>
<p>To get the most out of IncidentHub's alerting, you can follow these recommendations:</p>
<ul>
<li class="">Revisit the monitored components for your services periodically to ensure they are up to date. Add or remove component filters as needed.</li>
<li class="">Check your alert notification configuration once in a while - this might change depending on your business needs.</li>
<li class="">If you or your team find yourselves ignoring certain alerts, it's a sign that they are too noisy and need to be reconfigured.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>IncidentHub's alert fine-tuning features are designed to make alerts more meaningful and prevent alert fatigue. Use them to focus on what matters to your business.</p>
<p>You can sign up for a free (forever) account at <a href="https://incidenthub.cloud/#pricing" target="_blank" rel="noopener noreferrer" class="">IncidentHub</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="references">References<a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#references" class="hash-link" aria-label="Direct link to References" title="Direct link to References" translate="no">​</a></h2>
<ul>
<li class=""><a href="https://docs.incidenthub.cloud/incidenthub-documentation/services/monitoring-a-service#choosing-components-to-monitor" target="_blank" rel="noopener noreferrer" class="">Component Filtering documentation</a></li>
<li class=""><a href="https://docs.incidenthub.cloud/incidenthub-documentation/services/monitoring-a-service#auto-detecting-components" target="_blank" rel="noopener noreferrer" class="">Component Auto-Detection documentation</a></li>
<li class=""><a href="https://docs.incidenthub.cloud/incidenthub-documentation/services/monitoring-a-service#choosing-which-notifications-to-receive" target="_blank" rel="noopener noreferrer" class="">Alert Notifications documentation</a></li>
<li class=""><a class="" href="https://blog.incidenthub.cloud/Monitoring-Specific-Components-and-Regions-in-Your-Third-Party-Services">More about component filtering</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="faq">FAQ<a href="https://blog.incidenthub.cloud/how-to-fine-tune-your-incidenthub-alerts#faq" class="hash-link" aria-label="Direct link to FAQ" title="Direct link to FAQ" translate="no">​</a></h2>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>What makes a third-party service alert relevant?</summary><div><div class="collapsibleContent_i85q"><p></p><p>A third-party service alert is relevant if it directly impacts your business, such as an outage in a specific service or region your applications rely on.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>What makes a third-party service alert actionable?</summary><div><div class="collapsibleContent_i85q"><p></p><p>A third-party service alert is actionable if you can take steps to address it, helping you respond effectively to the outage.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How can I filter IncidentHub alerts by components?</summary><div><div class="collapsibleContent_i85q"><p></p><p>When adding or editing a service in IncidentHub, select "Monitor Specific Components" and choose the components you want alerts for.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>What is Component Auto-Detection?</summary><div><div class="collapsibleContent_i85q"><p></p><p>This beta feature (currently for Google Cloud Platform) auto-detects components from your cloud provider’s invoice (CSV file) to simplify setup. Billing data isn’t stored.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Can I choose which outage updates IncidentHub alerts me about?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Yes. Per-service settings let you choose notifications for the start of an outage, its end, or all updates, depending on how critical the service is to your business.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How can I avoid alert fatigue?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Filter out irrelevant alerts by selecting specific components and adjusting notification settings to focus on what matters.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>What are some best practices for using IncidentHub alerts?</summary><div><div class="collapsibleContent_i85q"><p></p><ol>
<li class="">Periodically review monitored components.</li>
<li class="">Adjust notification settings as business needs evolve.</li>
<li class="">Reconfigure alerts if you're ignoring them due to noise.</li>
</ol><p></p></div></div></details>
<p><em>IncidentHub is not affiliated with any of the services and vendors mentioned in this article.</em></p>]]></content:encoded>
            <category>Alerting</category>
            <category>Product</category>
        </item>
        <item>
            <title><![CDATA[Top 6 Reasons Why You Need a Status Page Aggregator]]></title>
            <link>https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator</link>
            <guid>https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator</guid>
            <pubDate>Mon, 31 Mar 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[A status page aggregator aggregates multiple status pages into a single view. Here are the top 6 reasons why you should use one.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p><em>Last updated on August 8, 2025.</em></p>
<p>Your business depends on the reliability of the third-party services you use. Monitoring multiple status pages, one for each of these services, is the best way of keeping track of their outages and maintenances.
Although some status pages let you subscribe to alerts, there is no standard way of doing this. Service providers can change their status page providers, disable subscriptions, or not support the same notification options.</p>
<p>A status page aggregator is a tool that solves all these problems by summarizing the status pages of multiple services in one place.
If you depend on only 2-3 third-party services, you can probably get away without a status page aggregator. Beyond that, it becomes hard to stay on top of third-party service outages and maintenance, and gaps start to appear in your monitoring.</p>
<p>Let's look at the top 6 reasons why you need a status page aggregator.</p>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#top-6-reasons-why-you-need-a-status-page-aggregator" class="">Top 6 Reasons Why You Need a Status Page Aggregator</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#services-can-change-status-page-providers" class="">Services Can Change Status Page Providers</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#not-all-status-pages-let-you-subscribe-to-specific-components-and-regions" class="">Not All Status Pages Let You Subscribe to Specific Components and Regions</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#there-can-be-too-many-status-pages-to-track" class="">There Can Be Too Many Status Pages To Track</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#status-page-urls-can-change" class="">Status Page URLs Can Change</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#some-status-pages-dont-have-any-way-of-subscribing-to-outages" class="">Some Status Pages Don't Have Any Way of Subscribing to Outages</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#home-grown-status-aggregation-approaches-do-not-work" class="">Home-Grown Status Aggregation Approaches Do Not Work</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#stringing-up-rss-feeds-into-slackdiscord" class="">Stringing Up RSS Feeds Into Slack/Discord</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#building-your-own-tool" class="">Building Your Own Tool</a></li>
</ul>
</li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#conclusion" class="">Conclusion</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#summary---why-you-need-a-status-page-aggregator" class="">Summary - Why You Need a Status Page Aggregator</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#faq" class="">FAQ</a></li>
</ul>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/status-page-aggregator-2048.png" alt="Status Page Aggregator">
<p style="background-color:var(--ifm-color-emphasis-100);padding-top:18px;padding-bottom:1px;margin-top:20px;text-align:center;border-radius:10px"></p><p>Download a summary of this article as a <a href="https://cdn.incidenthub.cloud/ebooks/Top-6-Reasons-Why-You-Need-a-Status-Page-Aggregator.pdf" target="_blank" rel="noopener noreferrer" class="">PDF</a></p><p></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="top-6-reasons-why-you-need-a-status-page-aggregator">Top 6 Reasons Why You Need a Status Page Aggregator<a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#top-6-reasons-why-you-need-a-status-page-aggregator" class="hash-link" aria-label="Direct link to Top 6 Reasons Why You Need a Status Page Aggregator" title="Direct link to Top 6 Reasons Why You Need a Status Page Aggregator" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="services-can-change-status-page-providers">Services Can Change Status Page Providers<a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#services-can-change-status-page-providers" class="hash-link" aria-label="Direct link to Services Can Change Status Page Providers" title="Direct link to Services Can Change Status Page Providers" translate="no">​</a></h3>
<p>Businesses use a <a class="" href="https://blog.incidenthub.cloud/Best-Practices-Choosing-Status-Page-Provider">status page provider</a> to create a managed status page that they can use to communicate with their customers and users. Depending on business needs, provider reliability, integration options, and more, businesses can change their status page provider. The status page URL would remain the same but the format and subscription options would change.</p>
<p>A recent example of such a move is OpenAI's status page. In Jan 2025, OpenAI was using Atlassian Statuspage. You can check it at the <a href="https://web.archive.org/web/20250101055627/https://status.openai.com/" target="_blank" rel="noopener noreferrer" class="">Wayback Machine</a>.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/previous-open-ai-status-page.png" alt="Previous OpenAI Status Page">
<p><br>
<!-- -->The <a href="https://status.openai.com/" target="_blank" rel="noopener noreferrer" class="">current OpenAI status page</a> as of this writing is managed by Incident.io. The URL remains the same.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/current-open-ai-status-page.png" alt="Current OpenAI Status Page">
<p><br>
<!-- -->The subscription options have changed. If you were previously subscribed using webhooks, that option is no longer available.
What's more, you would not even know that this happened. Once you set up the webhook subscription, you would not visit the status page except to check for details of outages and maintenances. If the subscription were removed, you would be blissfully unaware of any future outages. That is, until the outages start affecting your applications, and by extension, your business.
You can end up with angry customers, lost revenue, and stressed SRE/Ops teams.</p>
<p>IncidentHub - a status page aggregator - automatically detects such changes. Using an aggregator shifts the responsibility of outage notifications to the aggregator, which can smooth over any differences in the status page providers.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="not-all-status-pages-let-you-subscribe-to-specific-components-and-regions">Not All Status Pages Let You Subscribe to Specific Components and Regions<a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#not-all-status-pages-let-you-subscribe-to-specific-components-and-regions" class="hash-link" aria-label="Direct link to Not All Status Pages Let You Subscribe to Specific Components and Regions" title="Direct link to Not All Status Pages Let You Subscribe to Specific Components and Regions" translate="no">​</a></h3>
<p>Your third-party cloud and SaaS dependencies are often globally distributed and operate in many regions. Your applications typically use only a subset of these services and regions. Why receive alerts for everything?</p>
<p>Some status pages, like <a href="https://stabilityai.instatus.com/" target="_blank" rel="noopener noreferrer" class="">Stability.ai's</a>, let you subscribe to specific components and regions.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/stability-ai-subscribe.png" alt="Stability.ai Status Page">
<p><br>
<!-- -->Others, like <a href="https://status.litellm.ai/" target="_blank" rel="noopener noreferrer" class="">LiteLLM's status page</a>, have an RSS feed only. If you connect the feed to your Slack channel using the <a href="https://slack.com/intl/en-in/help/articles/218688467-Add-RSS-feeds-to-Slack" target="_blank" rel="noopener noreferrer" class=""><code>/feed</code></a> command, you will get notified of each and every outage in LiteLLM. There is no way to subscribe to a specific LiteLLM service from its status page.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/litellm-status-page.png" alt="LiteLLM Status Page">
<p><br>
<!-- -->A status page aggregator like IncidentHub lets you monitor <a class="" href="https://blog.incidenthub.cloud/Monitoring-Specific-Components-and-Regions-in-Your-Third-Party-Services">specific components and regions</a> as long as the information is on the status page.
This is true even when the originating status page does not offer component-specific subscriptions.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="there-can-be-too-many-status-pages-to-track">There Can Be Too Many Status Pages To Track<a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#there-can-be-too-many-status-pages-to-track" class="hash-link" aria-label="Direct link to There Can Be Too Many Status Pages To Track" title="Direct link to There Can Be Too Many Status Pages To Track" translate="no">​</a></h3>
<p>According to the <a href="https://www.bettercloud.com/resources/state-of-saas/" target="_blank" rel="noopener noreferrer" class="">State of SaaSOps Report 2024</a>, organizations use an average of 112 SaaS tools.
Even for smaller organizations and startups, most operations are outsourced to SaaS and Cloud vendors. 100+ tools means 100+ chances of unnoticed disruptions.</p>
<p>Monitoring all these services manually by tracking their status pages is not only hard but also not scalable.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="status-page-urls-can-change">Status Page URLs Can Change<a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#status-page-urls-can-change" class="hash-link" aria-label="Direct link to Status Page URLs Can Change" title="Direct link to Status Page URLs Can Change" translate="no">​</a></h3>
<p>For various reasons, the third-party vendor's organization can change their status page URLs.</p>
<p>Cloudflare acquired Area 1 Security, which previously had its own <a href="https://web.archive.org/web/20250108114141/https://status.area1security.com/" target="_blank" rel="noopener noreferrer" class="">status page</a>.
A few months ago, they removed the status page and Area 1's products are part of the <a href="https://www.cloudflarestatus.com/" target="_blank" rel="noopener noreferrer" class="">Cloudflare status page</a> now.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://blog.incidenthub.cloud/img/area1-products-in-cloudflare.png" alt="Area 1 Products on the Cloudflare Status Page">
<p><br>
<!-- -->If you were previously monitoring Area 1's status page directly using just RSS feeds or email notifications, you might not have known about this change, leaving you exposed to undetected outages.</p>
<p>Another example is Railway's status page which moved from <a href="https://web.archive.org/web/20240728002854/https://railway.app/" target="_blank" rel="noopener noreferrer" class=""><code>status.railway.app</code></a> to <a href="https://status.railway.com/" target="_blank" rel="noopener noreferrer" class=""><code>status.railway.com</code></a>.</p>
<p>IncidentHub detects such changes and auto-adjusts its monitoring.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="some-status-pages-dont-have-any-way-of-subscribing-to-outages">Some Status Pages Don't Have Any Way of Subscribing to Outages<a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#some-status-pages-dont-have-any-way-of-subscribing-to-outages" class="hash-link" aria-label="Direct link to Some Status Pages Don't Have Any Way of Subscribing to Outages" title="Direct link to Some Status Pages Don't Have Any Way of Subscribing to Outages" translate="no">​</a></h3>
<p>Most status pages have at least an RSS or Atom feed. However, some status pages don't have any visible means of subscribing to outages.
You need to keep refreshing the status page. This is just not feasible if you have a lot of dependencies.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="home-grown-status-aggregation-approaches-do-not-work">Home-Grown Status Aggregation Approaches Do Not Work<a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#home-grown-status-aggregation-approaches-do-not-work" class="hash-link" aria-label="Direct link to Home-Grown Status Aggregation Approaches Do Not Work" title="Direct link to Home-Grown Status Aggregation Approaches Do Not Work" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="stringing-up-rss-feeds-into-slackdiscord">Stringing Up RSS Feeds Into Slack/Discord<a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#stringing-up-rss-feeds-into-slackdiscord" class="hash-link" aria-label="Direct link to Stringing Up RSS Feeds Into Slack/Discord" title="Direct link to Stringing Up RSS Feeds Into Slack/Discord" translate="no">​</a></h4>
<p>Using a glue script to string up RSS feeds into Slack/Discord is a common approach.
While some status pages offer RSS feeds, not all of them do. Some status pages don't have any visible means of subscribing to outages.
This approach also lacks any filtering capabilities for components and regions. You cannot search through historical data easily or look at ongoing incidents and maintenance at a glance.
There is no single view of all your services. Additionally, some RSS feeds won't notify you when an incident is resolved. As noted above, this will break easily when anything changes in the status page provider or URL.</p>
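<p>To make the limitation concrete, here is a minimal sketch of such a glue script in Python. The sample feed, the <code>feed_to_messages</code> function, and the generic <code>text</code> message shape are illustrative assumptions, not taken from any particular status page or chat tool. Note that every item is forwarded as-is: nothing filters by component or region, and nothing notices a feed that has moved or disappeared.</p>

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample feed; real status page feeds vary in structure.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Status</title>
    <item>
      <title>API latency in eu-west</title>
      <link>https://status.example.com/incidents/1</link>
    </item>
    <item>
      <title>Dashboard login errors</title>
      <link>https://status.example.com/incidents/2</link>
    </item>
  </channel>
</rss>"""


def feed_to_messages(feed_xml: str) -> list:
    """Turn every RSS item into a chat message payload, with no filtering."""
    root = ET.fromstring(feed_xml)
    messages = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(no title)")
        link = item.findtext("link", default="")
        messages.append({"text": f"{title} {link}".strip()})
    return messages


for message in feed_to_messages(SAMPLE_FEED):
    # A real script would POST this JSON to a chat webhook URL.
    print(json.dumps(message))
```

<p>Both items are forwarded even if your applications only depend on one of the affected components, which is how this approach contributes to alert noise.</p>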
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="building-your-own-tool">Building Your Own Tool<a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#building-your-own-tool" class="hash-link" aria-label="Direct link to Building Your Own Tool" title="Direct link to Building Your Own Tool" translate="no">​</a></h4>
<p>Some engineering and IT teams choose to <a class="" href="https://blog.incidenthub.cloud/Monitoring-Third-Party-Vendors-As-An-Ops-Engineer-SRE">build their own tooling</a> to get around the above problems. After all, why pay for a status page aggregator when you can build your own?
Any self-respecting Ops Engineer/SRE would probably want to whip up a script and try to write this tool by themselves. However, such a homegrown solution requires a lot of upfront development and ongoing maintenance effort. The technical challenges themselves are significant. In addition, there are other costs:</p>
<ol>
<li class="">Any software you write needs maintenance. E.g. when your organization starts using a new service that cannot be monitored using your existing tooling, you need to add support for it.</li>
<li class="">Somebody has to ensure reliability and uptime of the homegrown solution.</li>
<li class="">It becomes an additional burden on your already overburdened SRE/Ops teams.</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>The challenges in monitoring multiple status pages yourself or using home-grown solutions are real. A status page aggregator like IncidentHub solves these problems by providing a reliable and scalable solution.
IncidentHub adapts to status page quirks, URL changes, and more, continuously, where more basic tools can falter.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="summary---why-you-need-a-status-page-aggregator">Summary - Why You Need a Status Page Aggregator<a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#summary---why-you-need-a-status-page-aggregator" class="hash-link" aria-label="Direct link to Summary - Why You Need a Status Page Aggregator" title="Direct link to Summary - Why You Need a Status Page Aggregator" translate="no">​</a></h2>
<p>Here's a tabular summary of the top 6 reasons why you need a status page aggregator:</p>
<table><thead><tr><th><strong>Reason</strong></th><th><strong>Description</strong></th></tr></thead><tbody><tr><td><strong>Services Can Change Status Page Providers</strong></td><td>Status page providers can change (e.g., OpenAI switched from Atlassian Statuspage to Incident.io), breaking existing subscriptions and notification setups.</td></tr><tr><td><strong>Not All Status Pages Let You Subscribe to Specific Components and Regions</strong></td><td>Many status pages only offer RSS feeds without component/region filtering, forcing you to receive alerts for all services even when you only use specific ones.</td></tr><tr><td><strong>There Can Be Too Many Status Pages To Track</strong></td><td>Organizations use an average of 112 SaaS tools, making manual monitoring of 100+ status pages impractical and unscalable.</td></tr><tr><td><strong>Status Page URLs Can Change</strong></td><td>Status page URLs can change (e.g., Area 1 Security merged into Cloudflare, Railway moved domains), leaving you unaware of outages if you're monitoring the old URL.</td></tr><tr><td><strong>Some Status Pages Don't Have Any Way of Subscribing to Outages</strong></td><td>Some status pages lack RSS feeds or any subscription options, requiring manual page refreshing which is not feasible for multiple dependencies.</td></tr><tr><td><strong>Home-Grown Status Aggregation Approaches Do Not Work</strong></td><td>DIY solutions using RSS feeds or custom tools lack filtering capabilities, break when status pages change, and place an ongoing maintenance burden on SRE/Ops teams.</td></tr></tbody></table>
<p style="background-color:var(--ifm-color-emphasis-100);padding-top:18px;padding-bottom:1px;margin-top:20px;text-align:center;border-radius:10px"></p><p>Create an <a href="https://incidenthub.cloud/#pricing" target="_blank" rel="noopener noreferrer" class="">IncidentHub</a> account to never miss an outage again.</p><p></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="faq">FAQ<a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator#faq" class="hash-link" aria-label="Direct link to FAQ" title="Direct link to FAQ" translate="no">​</a></h2>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>What is a status page aggregator?</summary><div><div class="collapsibleContent_i85q"><p></p><p>A status page aggregator is a tool that collects and monitors multiple third-party status pages in one centralized location, allowing you to track outages and maintenance across all your service dependencies.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How many third-party services justify using a status page aggregator?</summary><div><div class="collapsibleContent_i85q"><p></p><p>If you depend on more than 2-3 third-party services, a status page aggregator becomes valuable as it becomes increasingly difficult to manually track multiple status pages.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Why can't I just subscribe to each status page individually?</summary><div><div class="collapsibleContent_i85q"><p></p><p>There's no standard subscription method across status pages. Providers can change their status page platforms, disable subscriptions, or offer limited notification options, making individual management unreliable.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>What happens if a service changes its status page provider?</summary><div><div class="collapsibleContent_i85q"><p></p><p>When services change status page providers (like OpenAI switching from Atlassian Statuspage to Incident.io), your existing subscription methods may stop working without notice, causing you to miss critical outage alerts.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Can I monitor specific components or regions with a status page aggregator?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Yes, unlike many individual status pages that don't offer granular subscriptions, a status page aggregator lets you subscribe to specific components and regions, reducing alert noise.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>What if a service changes its status page URL?</summary><div><div class="collapsibleContent_i85q"><p></p><p>A status page aggregator handles URL changes (like Railway's move from status.railway.app to status.railway.com) automatically, ensuring you don't lose monitoring coverage.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How do status page aggregators handle services with no subscription options?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Status page aggregators monitor status pages even when they don't offer subscription methods, and alert you using the aggregator's own notification options.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Why not build our own status page monitoring solution?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Home-grown solutions require significant development and ongoing maintenance, create additional reliability concerns, and burden your SRE/Ops teams. A dedicated status page aggregator like IncidentHub eliminates these challenges.</p><p></p></div></div></details>
<p>This article first appeared on the <a href="https://blog.incidenthub.cloud/top-six-reasons-why-you-need-a-status-page-aggregator" target="_blank" rel="noopener noreferrer" class="">IncidentHub blog</a>.</p>
<p><em>IncidentHub is not affiliated with any of the services and vendors mentioned in this article.</em></p>]]></content:encoded>
            <category>Status Page</category>
            <category>Product</category>
            <category>Status Page Aggregators</category>
        </item>
        <item>
            <title><![CDATA[How to Receive IncidentHub Alerts in your Webhook]]></title>
            <link>https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook</link>
            <guid>https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook</guid>
            <pubDate>Wed, 26 Mar 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Integrate IncidentHub with webhooks. Receive IncidentHub alerts - including trigger, update, and resolve events - directly in your webhook.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p>IncidentHub has many integrations to receive alerts. You can choose from Slack, Webhook, Email, Discord, PagerDuty, and more.
In this article, we will explore how to receive IncidentHub alerts in your webhooks.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/webhook-integration.png" alt="IncidentHub Webhook Integration">
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#what-are-webhooks" class="">What are Webhooks?</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#using-webhooks-for-incidenthub-alerts" class="">Using Webhooks For IncidentHub Alerts</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#integrating-your-webhook" class="">Integrating Your Webhook</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#testing-your-webhook" class="">Testing your Webhook</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#the-webhook-payload-format" class="">The Webhook payload format</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#webhook-security" class="">Webhook Security</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#webhook-errors" class="">Webhook Errors</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#best-practices" class="">Best Practices</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#conclusion" class="">Conclusion</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#faq" class="">FAQ</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-are-webhooks">What are Webhooks?<a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#what-are-webhooks" class="hash-link" aria-label="Direct link to What are Webhooks?" title="Direct link to What are Webhooks?" translate="no">​</a></h2>
<p>Webhooks are a mechanism to receive notifications over HTTP whenever specific events occur. A common use case is receiving notifications from a third party when
certain events are triggered. Webhooks are a push mechanism, as opposed to a pull mechanism like polling. The benefit of a webhook over polling
is that you register your webhook URL once, and the sender then delivers each event to that URL as it happens.</p>
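<p>As a rough illustration of the push model, the sketch below is a minimal webhook receiver built on Python's standard <code>http.server</code> module. The names <code>handle_body</code>, <code>WebhookHandler</code>, and <code>make_server</code> are hypothetical, as is the choice to answer with 200 on success and 400 on a malformed body; check what your sender expects before relying on this.</p>

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory store for illustration; a real receiver would queue or persist events.
received_events = []


def handle_body(body: bytes) -> int:
    """Record one pushed event and return the HTTP status code to reply with."""
    try:
        event = json.loads(body)
    except ValueError:  # covers malformed JSON and invalid UTF-8
        return 400
    received_events.append(event)
    return 200


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The sender pushes each event as the body of a POST request.
        length = int(self.headers.get("Content-Length", 0))
        status = handle_body(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()


def make_server(port: int) -> HTTPServer:
    """Bind the receiver to localhost; call serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

<p>Calling <code>make_server(8080).serve_forever()</code> would start the receiver on an arbitrary example port. There is no polling loop anywhere: the code only runs when the sender delivers an event.</p>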
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="using-webhooks-for-incidenthub-alerts">Using Webhooks For IncidentHub Alerts<a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#using-webhooks-for-incidenthub-alerts" class="hash-link" aria-label="Direct link to Using Webhooks For IncidentHub Alerts" title="Direct link to Using Webhooks For IncidentHub Alerts" translate="no">​</a></h2>
<p>IncidentHub integrates with your webhooks so that you receive all the usual incident notifications, including trigger, update, and resolve events.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="integrating-your-webhook">Integrating Your Webhook<a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#integrating-your-webhook" class="hash-link" aria-label="Direct link to Integrating Your Webhook" title="Direct link to Integrating Your Webhook" translate="no">​</a></h3>
<p>Integrating your webhook is simple:</p>
<ul>
<li class="">Log in to your IncidentHub account.</li>
<li class="">Under "My Channels", click on "Add" and then select "Webhook".</li>
<li class="">Enter a name and a description for your webhook.</li>
<li class="">Enter the webhook URL.</li>
<li class="">Click on "Save".</li>
</ul>
<img style="border-radius:10px;border:1px" src="https://blog.incidenthub.cloud/img/webhook-popup.png" alt="Create a Webhook">
<p><br>
<!-- -->Webhooks are available in IncidentHub's Starter Plan and above. You can sign up directly for a Starter Plan from the <a href="https://incidenthub.cloud/#pricing" target="_blank" rel="noopener noreferrer" class="">website</a>. If you
are already a user on the Free plan, you can upgrade from your IncidentHub account dashboard.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="testing-your-webhook">Testing your Webhook<a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#testing-your-webhook" class="hash-link" aria-label="Direct link to Testing your Webhook" title="Direct link to Testing your Webhook" translate="no">​</a></h3>
<p>While creating the webhook, you can test if IncidentHub can reach your URL. Enter your webhook URL in the popup and click on "Send a test message". IncidentHub will send a test payload to your webhook URL. If the test is successful, you will see a message "Success" - otherwise you will see an error message. You should also check your webhook server to verify that it received the payload correctly.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-webhook-payload-format">The Webhook payload format<a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#the-webhook-payload-format" class="hash-link" aria-label="Direct link to The Webhook payload format" title="Direct link to The Webhook payload format" translate="no">​</a></h3>
<p>IncidentHub sends the payload to the webhook as JSON in a POST request. The format looks like this:</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">"type": "updated" | "resolved",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">"timestamp": time as a string with time zone indicator,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">"data": {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "service": Service name,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "statusPage": Link to the official status page,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "eventURL": Link to the incident,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "content": Content of the incident,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "title": Title of the incident,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span></code></pre></div></div>
<p>An example payload looks like this:</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "type": "updated",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "timestamp": "2024-10-01T22:38:01.725-07:00",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "data": {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "service": "Twilio",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "statusPage": "https://status.twilio.com/",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "eventURL": "https://stspg.io/mjg3188vltfh",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "content": "We are experiencing call dialling delays to InUsually, ydosat Ooredoo Indonesia phone numbers. Our engineers are working with our carrier partner to resolve the issue. We will provide another update in 1 hour or as soon as more information becomes available.",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "title": "Call Dialling Delays To Indosat Ooredoo Indonesia  Phone Numbers"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span></code></pre></div></div>
<p>The <code>Content-Type: application/json</code> header will be set on the request.</p>
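<p>On the receiving side, the payload can be parsed into a typed object before you act on it. The following is a minimal Python sketch, assuming only the fields shown in the format above. The <code>IncidentEvent</code> class and the <code>parse_incident_payload</code> function are hypothetical names used for illustration and are not part of IncidentHub.</p>

```python
import json
from dataclasses import dataclass


@dataclass
class IncidentEvent:
    """Hypothetical container for the fields shown in the payload format."""
    type: str
    timestamp: str
    service: str
    status_page: str
    event_url: str
    content: str
    title: str


def parse_incident_payload(raw: bytes) -> IncidentEvent:
    """Parse a webhook request body into an IncidentEvent.

    Raises ValueError for invalid JSON and KeyError for a missing field.
    """
    doc = json.loads(raw)
    data = doc["data"]
    return IncidentEvent(
        type=doc["type"],
        timestamp=doc["timestamp"],
        service=data["service"],
        status_page=data["statusPage"],
        event_url=data["eventURL"],
        content=data["content"],
        title=data["title"],
    )
```

<p>Your request handler would pass the raw request body to <code>parse_incident_payload</code> and reject the request if parsing fails.</p>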
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="webhook-security">Webhook Security<a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#webhook-security" class="hash-link" aria-label="Direct link to Webhook Security" title="Direct link to Webhook Security" translate="no">​</a></h3>
<p>Your webhook must be a secure endpoint as it will be called over the public internet. HTTPS is a must. Usually, you will add a secret token to the webhook URL. For example,</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">https://your-webhook-server.com/webhook/your-secret-token</span><br></span></code></pre></div></div>
<p>IncidentHub will directly invoke your webhook URL with the payload. The webhook URL is stored encrypted in our database and is never exposed anywhere, not even to you in the IncidentHub dashboard. You cannot edit your webhook URL once it is created. To make any changes, delete the existing webhook and create a new one.</p>
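<p>Your webhook server is responsible for checking the secret token on every request. Here is a minimal Python sketch of one way to do that, assuming the token is the last segment of the URL path as in the example above. The function names are hypothetical, and the comparison uses <code>hmac.compare_digest</code> from the standard library so that it runs in constant time.</p>

```python
import hmac


def token_from_path(path: str) -> str:
    """Return the last path segment, ignoring any query string."""
    return path.split("?", 1)[0].rstrip("/").rsplit("/", 1)[-1]


def is_authorized(path: str, expected_token: str) -> bool:
    """Compare the token in the request path with the expected one.

    hmac.compare_digest avoids leaking, through timing differences,
    how many leading characters of the token matched.
    """
    supplied = token_from_path(path)
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

<p>A handler would call <code>is_authorized(request_path, expected_token)</code> first and return an error response without processing the body when it returns <code>False</code>.</p>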
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="webhook-errors">Webhook Errors<a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#webhook-errors" class="hash-link" aria-label="Direct link to Webhook Errors" title="Direct link to Webhook Errors" translate="no">​</a></h3>
<p>If IncidentHub encounters an error while invoking your webhook, it will send an alert email to the email address that your account is registered with.
This is a quick way of notifying you that there was an error. Until you fix the webhook, IncidentHub won't be able to send any notifications to it. Once it is fixed, IncidentHub will automatically resume sending notifications.</p>
<p>A sample alert email looks like this:</p>
<img style="border-radius:10px;border:1px" src="https://blog.incidenthub.cloud/img/webhook-failed-alert.png" alt="Webhook Failed Alert">
<p><br>
<!-- -->You should have your own monitoring in place to periodically check your webhook. The IncidentHub webhook failure alert complements your monitoring and is not a substitute.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="best-practices">Best Practices<a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#best-practices" class="hash-link" aria-label="Direct link to Best Practices" title="Direct link to Best Practices" translate="no">​</a></h2>
<ul>
<li class="">Do not share your webhook URL or store it in a publicly accessible place. This includes internal wikis and shared documents in your organization.</li>
<li class="">Monitor your email for any alerts from IncidentHub with the subject "Your webhook had errors".</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>Webhooks are a powerful way to receive IncidentHub alerts. They offer flexibility in integrating with almost any custom tool or system that you want.
You can integrate these alerts with your internal monitoring systems, your Internal Developer Portal, or your status dashboard.</p>
<p>You can sign up for a free (forever) IncidentHub account and <a href="https://incidenthub.cloud/#pricing" target="_blank" rel="noopener noreferrer" class="">try it out</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="faq">FAQ<a href="https://blog.incidenthub.cloud/how-to-receive-incident-hub-alerts-in-your-webhook#faq" class="hash-link" aria-label="Direct link to FAQ" title="Direct link to FAQ" translate="no">​</a></h2>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Can I add multiple webhooks in my IncidentHub account?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Yes. You can add multiple webhooks to your IncidentHub account. The number of webhooks is only limited by your subscription plan.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>What information is sent by IncidentHub in a webhook?</summary><div><div class="collapsibleContent_i85q"><p></p><p>IncidentHub sends all the key details of the incident - the date and time of the incident, the type of the incident, the title, and the URL to the actual incident on the service provider's status page.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>When should I choose webhooks to receive IncidentHub alerts?</summary><div><div class="collapsibleContent_i85q"><p></p><p>You can choose webhooks to receive alerts if you want to integrate with your custom tools or systems.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>What are the benefits of using webhooks for incident alerts?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Webhooks are a powerful way to receive IncidentHub alerts. They offer flexibility in integrating with almost any tool or system that you want. You can process the webhook payloads the way you want.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How do I ensure security of my webhook?</summary><div><div class="collapsibleContent_i85q"><p></p><p>Do not share your webhook URL or store it in a publicly accessible place. This includes internal wikis and shared repositories in your organization.
Ensure that webhooks are only accessible over HTTPS, and have a secure token embedded in the URL.</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How do I monitor my webhook?</summary><div><div class="collapsibleContent_i85q"><p></p><p>You can monitor your webhook by checking your email for any alerts from IncidentHub with the subject "Your webhook had errors".</p><p></p></div></div></details>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>How do I test my webhook before integrating it with IncidentHub?</summary><div><div class="collapsibleContent_i85q"><p></p><p>You can test your webhook by clicking on the "Send a test message" button on the webhook creation popup in the IncidentHub dashboard.</p><p></p></div></div></details>
<p>You might also be interested in <a class="" href="https://blog.incidenthub.cloud/Integrate-Incident-Alerts-With-Discord-Using-Webhooks">Integrate Incident Alerts With Discord Using Webhooks</a></p>]]></content:encoded>
            <category>Alerting</category>
            <category>Ops</category>
            <category>Webhook</category>
            <category>Product</category>
        </item>
        <item>
            <title><![CDATA[January 2025 Product Update - Easier Onboarding, Better User Experience, and Reliability Improvements]]></title>
            <link>https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements</link>
            <guid>https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements</guid>
            <pubDate>Wed, 29 Jan 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[In the last two months, we've improved the onboarding experience for users, smoothed the dashboard UX, and added several reliability improvements.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p>For the last two months, we have focused on improving the onboarding experience for users so that they can get started with monitoring with minimal effort. We have also added several
improvements in the backend to make the service more robust and reliable. Some of the usability improvements are driven by user feedback. Others incorporate what we would personally
like to see in such a monitoring service. We have also improved the dashboard user experience.</p>
<img style="border:1px solid #e0e0e0;border-radius:10px" src="https://cdn.incidenthub.cloud/blog/incidents-services-on-top.png" alt="IncidentHub Dashboard">
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#easier-onboarding" class="">Easier onboarding</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#service-component-autodetection" class="">Service Component Autodetection</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#commonly-used-services" class="">Commonly Used Services</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#test-channels" class="">Test Channels</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#dashboard-improvements" class="">Dashboard Improvements</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#services-with-incidents" class="">Services With Incidents</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#reliability-improvements" class="">Reliability Improvements</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#alert-emails-for-webhook-failures" class="">Alert Emails for Webhook Failures</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#bug-fixes-and-robustness-improvements" class="">Bug Fixes and Robustness Improvements</a></li>
</ul>
</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="easier-onboarding">Easier onboarding<a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#easier-onboarding" class="hash-link" aria-label="Direct link to Easier onboarding" title="Direct link to Easier onboarding" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="service-component-autodetection">Service Component Autodetection<a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#service-component-autodetection" class="hash-link" aria-label="Direct link to Service Component Autodetection" title="Direct link to Service Component Autodetection" translate="no">​</a></h3>
<p>Setting up monitoring in IncidentHub is straightforward. However, choosing the right components for your services can be daunting, especially for ones that have hundreds of components like Google Cloud Platform and Amazon Web Services.
If you miss a component, you will not receive alerts for it. On the other hand, if you select all components, you will be overwhelmed with irrelevant alerts.</p>
<p>Instead of you having to painstakingly choose individual components, we have added an autodetection feature. Just upload your billing report as a CSV file and we will automatically detect your components for you.</p>
<p>This feature is available for Google Cloud Platform as a preview at the moment. It will be rolled out to other services soon.</p>
<ul>
<li class="">Go to Google Cloud Console -&gt; Billing -&gt; Your Billing Account -&gt; Reports</li>
<li class="">In "Filters", choose "Group by" as "SKU"</li>
<li class="">Click "Download CSV". You can remove the price and usage colums from the CSV if you want before uploading.</li>
</ul>
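<p>If you prefer to strip the price and usage columns with a script instead of a spreadsheet, the following Python sketch shows one way to do it. It drops every column whose header contains one of the given keywords. The keywords and the <code>drop_columns</code> name are assumptions for illustration, so check the actual headers in your downloaded report before relying on them.</p>

```python
import csv
import io


def drop_columns(csv_text: str, keywords: list[str]) -> str:
    """Return csv_text without columns whose header contains any keyword.

    Matching is case-insensitive. The first row is treated as the header.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    header = rows[0]
    # Indexes of the columns to keep.
    keep = [
        i for i, name in enumerate(header)
        if not any(k.lower() in name.lower() for k in keywords)
    ]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # Skip indexes that a short row does not have.
        writer.writerow([row[i] for i in keep if i < len(row)])
    return out.getvalue()
```

<p>For example, <code>drop_columns(text, ["price", "usage"])</code> keeps the SKU column and removes any column with "price" or "usage" in its header.</p>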
<p><img decoding="async" loading="lazy" alt="GCP Autodetected Services" src="https://blog.incidenthub.cloud/assets/images/gcp-autodetected-services-38bf11986e44a9fb8e35c397b1d82d17.png" width="900" height="1000" class="img_ev3q"></p>
<p>We make a best-effort attempt to detect the regions and services. You can edit them before saving.</p>
<p><img decoding="async" loading="lazy" alt="GCP Autodetected Regions" src="https://blog.incidenthub.cloud/assets/images/gcp-autodetected-regions-eb2a9eb877a1a2acd88492316b8b808b.png" width="900" height="800" class="img_ev3q"></p>
<p>Some common Google Cloud Platform products are highlighted in the list of components with a "(Suggested)" label even if they are not detected in your billing report. For example, you almost certainly use Virtual Private Cloud
because it is a building block of a Google Cloud setup, but it might not show up in your billing report.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="commonly-used-services">Commonly Used Services<a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#commonly-used-services" class="hash-link" aria-label="Direct link to Commonly Used Services" title="Direct link to Commonly Used Services" translate="no">​</a></h3>
<p>When you log in to your IncidentHub dashboard for the first time, you will see a list of services that are commonly used by most users. You can click on them directly to add them to your monitoring.
<img decoding="async" loading="lazy" alt="Commonly used services" src="https://blog.incidenthub.cloud/assets/images/common-services-e4594caa74751feb1a148aa1a03dc562.png" width="1500" height="500" class="img_ev3q"></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="test-channels">Test Channels<a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#test-channels" class="hash-link" aria-label="Direct link to Test Channels" title="Direct link to Test Channels" translate="no">​</a></h3>
<p>This was a long-overdue feature that we ourselves wanted. When we add a channel (email, Slack, Discord, etc.), we want to make sure that it works. However, notifications are only sent when there are incidents in any of the monitored services.
The "Test Channel" feature allows you to test if your channel is working before adding it. It's available for all channels and across all subscription plans.</p>
<p><img decoding="async" loading="lazy" alt="Discord Channel Test" src="https://blog.incidenthub.cloud/assets/images/discord-channel-test-bf4d8c196797bb2bb96a048dc7c7fab8.png" width="900" height="600" class="img_ev3q"></p>
<p>It sends a test notification to the channel. A Discord example is shown below:</p>
<p><img decoding="async" loading="lazy" alt="Discord Test Message" src="https://blog.incidenthub.cloud/assets/images/discord-test-message-2b63d337fea39d11a2609179c1d0c899.png" width="700" height="500" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="dashboard-improvements">Dashboard Improvements<a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#dashboard-improvements" class="hash-link" aria-label="Direct link to Dashboard Improvements" title="Direct link to Dashboard Improvements" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="services-with-incidents">Services With Incidents<a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#services-with-incidents" class="hash-link" aria-label="Direct link to Services With Incidents" title="Direct link to Services With Incidents" translate="no">​</a></h3>
<p>Previously, the only way to see services with incidents was to click on the "Details" button, or look at the "Availability" page.
We've now made it easier by displaying the services with incidents on top of the list with an indicator against each. Clicking on the indicator will open the same incident details popup as clicking on the "Details" button.</p>
<p><img decoding="async" loading="lazy" alt="Services with Services with indicator" src="https://blog.incidenthub.cloud/assets/images/incidents-services-on-top-a7e9675330960914e5816d0d3495a09a.png" width="1380" height="811" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="reliability-improvements">Reliability Improvements<a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#reliability-improvements" class="hash-link" aria-label="Direct link to Reliability Improvements" title="Direct link to Reliability Improvements" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="alert-emails-for-webhook-failures">Alert Emails for Webhook Failures<a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#alert-emails-for-webhook-failures" class="hash-link" aria-label="Direct link to Alert Emails for Webhook Failures" title="Direct link to Alert Emails for Webhook Failures" translate="no">​</a></h3>
<p>If you have a webhook channel configured, and IncidentHub is unable to deliver notifications to it due to the webhook endpoint not being available, it will send an email to the account owner. The email will include details
of the error.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="bug-fixes-and-robustness-improvements">Bug Fixes and Robustness Improvements<a href="https://blog.incidenthub.cloud/product-update-easier-onboarding-better-user-experience-and-reliability-improvements#bug-fixes-and-robustness-improvements" class="hash-link" aria-label="Direct link to Bug Fixes and Robustness Improvements" title="Direct link to Bug Fixes and Robustness Improvements" translate="no">​</a></h3>
<p>We have fixed several bugs in the service-specific parsers, which has made incident detection more reliable. We have also added several improvements to make edge case handling more robust.</p>
<p><strong>Last but not least, we have a <a href="https://incidenthub.cloud/#pricing" target="_blank" rel="noopener noreferrer" class="">new, more attractive pricing model</a> for our monthly and annual subscriptions.</strong></p>
<p>As always, you can request features, report bugs, or give us any other feedback on any of the following channels:</p>
<ul>
<li class="">Email us at <a href="mailto:support@incidenthub.cloud" target="_blank" rel="noopener noreferrer" class="">support@incidenthub.cloud</a></li>
<li class="">Use the "Talk to us" button in the dashboard</li>
</ul>
<p><em>All logos and company names are trademarks or registered trademarks of their respective holders</em></p>]]></content:encoded>
            <category>Product Updates</category>
            <category>Monitoring</category>
        </item>
        <item>
            <title><![CDATA[Adding a Grafana Dashboard to Your Prometheus Setup]]></title>
            <link>https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup</link>
            <guid>https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup</guid>
            <pubDate>Wed, 25 Dec 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[This article is a guide on how to add a Grafana instance to your Prometheus setup to view your system metrics.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p><em>This article is part of a <a class="" href="https://blog.incidenthub.cloud/tags/prometheus">series</a> on setting up an end-to-end monitoring and alerting stack using Prometheus.</em></p>
<p>Continuing our series on setting up Prometheus in a Docker container, we will add a Grafana instance to our Prometheus setup.</p>
<p>Please refer to the <a class="" href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager">previous article</a> where we use docker compose to run Prometheus and Alertmanager together
as that forms the basis to run multiple related containers. We will add a container to run Grafana to the same compose file in this article.</p>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/grafana-dashboard.png" alt="Grafana Dashboard">
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#adding-a-grafana-container-to-our-docker-compose" class="">Adding a Grafana Container to our Docker Compose</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#adding-a-dashboard" class="">Adding a Dashboard</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#troubleshooting" class="">Troubleshooting</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#my-datasource-save-and-test-fails" class="">My datasource "Save and Test" fails</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#i-cannot-see-any-data-in-my-dashboard" class="">I cannot see any data in my dashboard</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#only-some-of-the-panels-in-my-dashboard-show-data" class="">Only some of the panels in my dashboard show data</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#if-i-restart-my-containers-the-dashboard-is-gone" class="">If I restart my containers, the dashboard is gone</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#conclusion" class="">Conclusion</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#references" class="">References</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="adding-a-grafana-container-to-our-docker-compose">Adding a Grafana Container to our Docker Compose<a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#adding-a-grafana-container-to-our-docker-compose" class="hash-link" aria-label="Direct link to Adding a Grafana Container to our Docker Compose" title="Direct link to Adding a Grafana Container to our Docker Compose" translate="no">​</a></h2>
<p>First, create a volume for Grafana's data:</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker volume create grafana-data</span><br></span></code></pre></div></div>
<p>Now edit your <code>docker-compose.yml</code> in your prometheus directory and add the volume to the <code>volumes</code> section:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">external</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">external</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">grafana-data</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">external</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><br></span></code></pre></div></div>
<p>We'll use the grafana-enterprise:11.4.0-ubuntu image to run Grafana. Under <code>services</code>, add the following:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">grafana</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> grafana/grafana</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">enterprise</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">11.4.0</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">ubuntu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">ports</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> 3005</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">3000</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> grafana</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">data</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/var/lib/grafana</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">restart</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> always</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">links</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">prometheus</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span></code></pre></div></div>
<p>We've added a link to the <code>prometheus</code> service so that Grafana can reach Prometheus by hostname and use it as a data source.</p>
<p>Your complete <code>docker-compose.yml</code> should look like this:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">external</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">external</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">grafana-data</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">external</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">services</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> prom/prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">v3.0.0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> ./config</span><span class="token punctuation" style="color:#393A34">:</span><span class="token 
plain">/etc/prometheus</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/prometheus</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">ports</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> 9000</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">9090</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">restart</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> always</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">links</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">alertmanager</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span 
class="token key atrule" style="color:#00a4db">alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> prom/alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">v0.27.0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">ports</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> 9001</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">9093</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> ./alertmanager</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">config/</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/etc/alertmanager/</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token 
plain"> alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/alertmanager</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">restart</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> always</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">command</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'--config.file=/etc/alertmanager/alertmanager.yml'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'--storage.path=/alertmanager'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'--log.level=debug'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">grafana</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> grafana/grafana</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">enterprise</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">11.4.0</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">ubuntu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">ports</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> 3005</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">3000</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> grafana</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">data</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/var/lib/grafana</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" 
style="color:#00a4db">restart</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> always</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">links</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">prometheus</span><br></span></code></pre></div></div>
<p>Grafana should be accessible at <code>http://localhost:3005</code>. To log in for the first time, use the default credentials:</p>
<ul>
<li class="">Username: admin</li>
<li class="">Password: admin</li>
</ul>
<p>Once logged in, go to <code>http://localhost:3005/connections/datasources/new</code> and choose "Prometheus". The only parameter we need to worry about here is the Prometheus server URL.
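Set it to <code>http://prometheus:9090</code>. How does this work? We've linked the Prometheus container to the Grafana container, so Grafana can reach it at <code>prometheus:9090</code>. Note that 9090 is the container port, not the host port 9000 published in <code>docker-compose.yml</code>, because Grafana connects over the Docker network rather than through your host. Click on "Save and Test" to test the datasource connection.</p>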
<p>Your Grafana is now ready and linked to your Prometheus instance.</p>
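<p>If you would rather not click through the UI, Grafana can also create the data source from a provisioning file at startup. The following is a minimal sketch, assuming you mount a file such as <code>./grafana-provisioning/datasources/prometheus.yml</code> (a hypothetical path, not part of the setup above) into <code>/etc/grafana/provisioning/datasources/</code> in the <code>grafana</code> container:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"># Assumed file: ./grafana-provisioning/datasources/prometheus.yml (hypothetical path)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    # "proxy" means the Grafana server, not your browser, talks to Prometheus
    access: proxy
    # Same URL as in the UI: linked hostname plus the container port
    url: http://prometheus:9090
    isDefault: true
</code></pre></div></div>
<p>The <code>url</code> value is the same one you would enter in the UI. Whether provisioning suits you depends on how often you rebuild your setup; for a one-off local install, the UI steps above are enough.</p>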
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="adding-a-dashboard">Adding a Dashboard<a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#adding-a-dashboard" class="hash-link" aria-label="Direct link to Adding a Dashboard" title="Direct link to Adding a Dashboard" translate="no">​</a></h2>
<p>There are hundreds of ready-made <a href="https://grafana.com/grafana/dashboards/" target="_blank" rel="noopener noreferrer" class="">dashboards available for Grafana</a>. Since we have only Prometheus's own metrics in our setup, we will use a dashboard to visualize our Prometheus metrics.
If you have added other exporters to your Prometheus you can use their dashboards, or create one from scratch.</p>
<ul>
<li class="">Download the dashboard from <a href="https://grafana.com/grafana/dashboards/3662-prometheus-2-0-overview/" target="_blank" rel="noopener noreferrer" class="">https://grafana.com/grafana/dashboards/3662-prometheus-2-0-overview/</a></li>
<li class="">Go to <code>http://localhost:3005/dashboards</code> and click on "New"-&gt;"Import"</li>
<li class="">Click on "Upload dashboard JSON file" and select the file you downloaded.</li>
<li class="">For "Choose a Prometheus data source", select the datasource you created earlier.</li>
<li class="">Click on "Import" and you should see your dashboard.</li>
</ul>
<p><img decoding="async" loading="lazy" alt="Prometheus Stats Dashboard" src="https://blog.incidenthub.cloud/assets/images/prometheus-stats-dashboard-de70cb8406faf3b5acd8c0b4062fa9a4.png" width="1700" height="900" class="img_ev3q"></p>
<p>Congratulations! You have just set up a dashboard to visualize your Prometheus metrics. From here, you can do much more:</p>
<ul>
<li class="">Setup other exporters for Prometheus and add dashboards for them.</li>
<li class="">Monitor Alertmanager itself with its own dashboard.</li>
<li class="">Grafana has a ton of other features including alerting, annotations, and integration with other monitoring tools.</li>
</ul>
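<p>As an example of the second point, Prometheus first needs to scrape Alertmanager's own metrics before a dashboard can show them. Here is a minimal sketch of the extra scrape job, assuming your Prometheus configuration lives in <code>./config/prometheus.yml</code> (the file name is an assumption; only the <code>./config</code> directory appears in the compose file) and that you keep whatever jobs you already have:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"># Assumed file: ./config/prometheus.yml
scrape_configs:
  # Keep your existing jobs here and add this one alongside them
  - job_name: alertmanager
    static_configs:
      # Linked hostname and container port of the alertmanager service
      - targets: ['alertmanager:9093']
</code></pre></div></div>
<p>After restarting Prometheus, the new target should appear on the targets page of the Prometheus UI, and you can then import an Alertmanager dashboard the same way as above.</p>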
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="troubleshooting">Troubleshooting<a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#troubleshooting" class="hash-link" aria-label="Direct link to Troubleshooting" title="Direct link to Troubleshooting" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="my-datasource-save-and-test-fails">My datasource "Save and Test" fails<a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#my-datasource-save-and-test-fails" class="hash-link" aria-label="Direct link to My datasource &quot;Save and Test&quot; fails" title="Direct link to My datasource &quot;Save and Test&quot; fails" translate="no">​</a></h3>
<ul>
<li class="">Check that the Prometheus server URL is correct. The name of the Prometheus host should match what is set under <code>links</code> in the <code>docker-compose.yml</code> for the <code>grafana</code> service, and the port should match the Prometheus container port - default being 9090.</li>
<li class="">Check that the Prometheus container is running.</li>
</ul>
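<p>If the hostname still does not resolve, it may help to know that Docker Compose already puts all services in one file on a shared default network, where each service is reachable by its service name. The sketch below is an alternative to the <code>grafana</code> service above, not a required change: it drops the legacy <code>links</code> option and adds <code>depends_on</code>, which is my addition here to start Prometheus first:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv">  grafana:
    image: grafana/grafana-enterprise:11.4.0-ubuntu
    ports:
      - 3005:3000
    volumes:
      - grafana-data:/var/lib/grafana
    restart: always
    # Only controls start order; the hostname "prometheus" comes from the
    # service name on the default Compose network
    depends_on:
      - prometheus
</code></pre></div></div>
<p>Either way, the data source URL stays <code>http://prometheus:9090</code>, because the hostname is the service name in both variants.</p>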
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="i-cannot-see-any-data-in-my-dashboard">I cannot see any data in my dashboard<a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#i-cannot-see-any-data-in-my-dashboard" class="hash-link" aria-label="Direct link to I cannot see any data in my dashboard" title="Direct link to I cannot see any data in my dashboard" translate="no">​</a></h3>
<ul>
<li class="">Check that you have the correct datasource set for the dashboard. One way to troubleshoot is to query the datasource directly from Grafana for a known metric.</li>
<li class="">Edit one of the panels in the dashboard and check if you can see the metric. If not, then check if your Prometheus container is collecting metrics by checking the Prometheus UI.</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="only-some-of-the-panels-in-my-dashboard-show-data">Only some of the panels in my dashboard show data<a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#only-some-of-the-panels-in-my-dashboard-show-data" class="hash-link" aria-label="Direct link to Only some of the panels in my dashboard show data" title="Direct link to Only some of the panels in my dashboard show data" translate="no">​</a></h3>
<ul>
<li class="">This is most likely because some of the panels are querying metrics that don't exist</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="if-i-restart-my-containers-the-dashboard-is-gone">If I restart my containers, the dashboard is gone<a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#if-i-restart-my-containers-the-dashboard-is-gone" class="hash-link" aria-label="Direct link to If I restart my containers, the dashboard is gone" title="Direct link to If I restart my containers, the dashboard is gone" translate="no">​</a></h3>
<ul>
<li class="">Grafana stores the dashboard configuration in the <code>grafana-data</code> volume. If you deleted the volume, or changed its name in the <code>docker-compose.yml</code>, your setting will be lost.</li>
</ul>
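<p>One way to make dashboards survive a lost volume is to let Grafana load them from files at startup. This is a minimal sketch of a dashboard provider, assuming you mount it from a hypothetical <code>./grafana-provisioning/dashboards/default.yml</code> into <code>/etc/grafana/provisioning/dashboards/</code>, and mount a directory with your dashboard JSON files at the <code>path</code> shown (also a path chosen for illustration):</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"># Assumed file: ./grafana-provisioning/dashboards/default.yml (hypothetical path)
apiVersion: 1

providers:
  - name: default
    type: file
    options:
      # Directory inside the container that holds dashboard JSON files;
      # it must be mounted from the host for this to work
      path: /etc/grafana/dashboards
</code></pre></div></div>
<p>Dashboards downloaded from grafana.com may reference a data source placeholder that is normally filled in during import, so you may need to adjust the JSON before Grafana can load it from a file.</p>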
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>Grafana is powerful dashboarding software that you can use to visualize metrics from many different sources, not just Prometheus.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="references">References<a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup#references" class="hash-link" aria-label="Direct link to References" title="Direct link to References" translate="no">​</a></h2>
<ul>
<li class=""><a href="https://grafana.com/grafana/dashboards/" target="_blank" rel="noopener noreferrer" class="">Grafana Dashboards</a></li>
<li class=""><a href="https://hub.docker.com/u/grafana" target="_blank" rel="noopener noreferrer" class="">Official Grafana container images</a></li>
<li class=""><a href="https://docs.docker.com/reference/cli/docker/volume/" target="_blank" rel="noopener noreferrer" class="">Docker Volume commands</a></li>
<li class=""><a href="https://www.yamllint.com/" target="_blank" rel="noopener noreferrer" class="">YAML Validator</a></li>
</ul>
<p style="background-color:var(--ifm-color-emphasis-200);padding:10px;border-radius:10px;border-left:3px solid #60a5fa"></p><p>You might also like:</p><ul><li><a href="https://blog.incidenthub.cloud/A-Beginners-Guide-To-Service-Discovery-in-Prometheus">A Beginner's Guide To Service Discovery in Prometheus</a></li><li><a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager">Sending Alerts Using Prometheus and Alertmanager</a></li><li><a href="https://blog.incidenthub.cloud/deploying-prometheus-with-docker">Deploying Prometheus With Docker</a></li><li><a href="https://blog.incidenthub.cloud/how-to-configure-a-remote-data-store-for-prometheus">How to Configure a Remote Data Store for Prometheus</a></li></ul><p></p>
<p>This article was originally published on the <a href="https://blog.incidenthub.cloud/adding-a-grafana-dashboard-to-your-prometheus-setup" target="_blank" rel="noopener noreferrer" class="">IncidentHub blog</a>.</p>
<p>All product names, company names, logos and trademarks are the property of their respective owners. IncidentHub is not affiliated with any of the vendors mentioned in this article.</p>]]></content:encoded>
            <category>Grafana</category>
            <category>Prometheus</category>
            <category>Monitoring</category>
            <category>Docker</category>
        </item>
        <item>
            <title><![CDATA[How To Decide Between Hosting Your Own Status Page Versus Using a Managed One]]></title>
            <link>https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one</link>
            <guid>https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one</guid>
            <pubDate>Tue, 17 Dec 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[This article is a guide on which factors to consider when deciding between hosting your own status page versus using a managed one.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p>A status page forms a key part of your incident communication strategy. When it comes to setting up a status page, you have two options:</p>
<ul>
<li class="">Host your own - using either an open source project or a custom solution.</li>
<li class="">Use a managed status page provider.</li>
</ul>
<p>We will examine the <a class="" href="https://blog.incidenthub.cloud/Best-Practices-Choosing-Status-Page-Provider">pros and cons</a> of each option along these dimensions:</p>
<ol>
<li class="">Feature Set</li>
<li class="">Service Related</li>
</ol>
<p>For 1, if you choose a self-managed, <a class="" href="https://blog.incidenthub.cloud/The-2024-Guide-to-Open-Source-Status-Page-Providers">open-source</a> or custom solution, it's in your control. For a managed solution, you are limited by the provider's feature set.</p>
<p>For 2, if you choose a self-managed solution, your team is responsible for the quality of the service. For a managed solution, you are dependent on the provider's service quality.</p>
<p>In most cases, you are better off using a managed solution from a reputable provider, unless you have:</p>
<ul>
<li class="">Specific requirements that are not met by the vendor.</li>
<li class="">Budget constraints.</li>
</ul>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/status-page-managed-vs-self.jpg" alt="Shield">
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#feature-set-considerations" class="">Feature Set Considerations</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#integration-options" class="">Integration Options</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#customizability" class="">Customizability</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#reporting-and-analytics" class="">Reporting and Analytics</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#service-related-considerations" class="">Service Related Considerations</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#cost-and-maintenance" class="">Cost and Maintenance</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#availability" class="">Availability</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#support" class="">Support</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#conclusion" class="">Conclusion</a></li>
</ul>
<p>This article is not a comparison of specific managed vs self-managed status page providers but is meant to be a set of guidelines to help you make a decision.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="feature-set-considerations">Feature Set Considerations<a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#feature-set-considerations" class="hash-link" aria-label="Direct link to Feature Set Considerations" title="Direct link to Feature Set Considerations" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="integration-options">Integration Options<a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#integration-options" class="hash-link" aria-label="Direct link to Integration Options" title="Direct link to Integration Options" translate="no">​</a></h3>
<p>A status page should be able to integrate with your existing tools and incident management workflows. The minimum integration options you should look for are:</p>
<ul>
<li class="">API access for updates. You would want the option of automatically updating the status page from your incident management tools, or manually updating it, depending on business needs.</li>
<li class="">Easy integration with your existing toolset.</li>
</ul>
<p>These options are for your SRE/Ops/IT Team so that they can keep stakeholders updated by either automatically or manually updating the status page.</p>
<p>For your end users, you would want to have options for them to subscribe to updates via email, SMS, Slack, Discord, or other channels.</p>
<img style="border-radius:10px;border:1px" src="https://blog.incidenthub.cloud/img/github-status-page-subscriptions.png" alt="GitHub Status Page Subscriptions">
<p><em>Subscription options for GitHub's status page, which uses Atlassian Statuspage as a provider.</em></p>
<p>Some managed providers offer a complete stack of tools for monitoring, alerting, and status pages. However, if you have a monitoring stack already, you might not be able to use such providers since their status pages are usually
tied to their monitoring solutions. In such cases consider using a standalone status page provider.</p>
<p>Some standalone status page options are:</p>
<ul>
<li class=""><a href="https://www.atlassian.com/software/statuspage" target="_blank" rel="noopener noreferrer" class="">Atlassian Statuspage</a></li>
<li class=""><a href="https://instatus.com/" target="_blank" rel="noopener noreferrer" class="">Instatus</a></li>
<li class=""><a href="https://statushub.com/" target="_blank" rel="noopener noreferrer" class="">StatusHub</a></li>
<li class=""><a href="https://status.io/" target="_blank" rel="noopener noreferrer" class="">Status.io</a></li>
<li class=""><a href="https://www.statuspal.io/" target="_blank" rel="noopener noreferrer" class="">StatusPal</a></li>
</ul>
<p>If you are using an open source solution, you might be able to customize it to your needs. If you have a custom, homegrown status page, you are in complete control. The downside is that you need to spend time and resources to build as well as maintain it.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="customizability">Customizability<a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#customizability" class="hash-link" aria-label="Direct link to Customizability" title="Direct link to Customizability" translate="no">​</a></h3>
<p>Your status page should reflect your branding as well as support your service's architecture.</p>
<p>Branding includes the logo, colors, other visual elements, and a whitelabeled URL. With both managed and open source solutions, you can customize the branding according to your needs.</p>
<p>A status page should also be able to represent your service components and regions, and different kinds of events. The former is especially important if you have many services which are globally distributed. If you want to keep your users
updated about upcoming maintenance, you would want to be able to represent that in the status page. Most managed providers offer this feature.</p>
<p><img decoding="async" loading="lazy" alt="Google Cloud Status Page" src="https://blog.incidenthub.cloud/assets/images/google-status-page-617e2c5365c55f289f7200e6811d13d5.png" width="1400" height="900" class="img_ev3q">
<em>An example of a homegrown status page - Google Cloud Platform.</em></p>
<p>Depending on your business's global presence, i18n support can also be a requirement.</p>
<p>At the beginning of an incident, it's possible your teams won't have much information to share about the outage. As they work towards mitigation, more information will become available. A status page should allow you to post updates accordingly, and ideally add more information to older entries for that incident.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="reporting-and-analytics">Reporting and Analytics<a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#reporting-and-analytics" class="hash-link" aria-label="Direct link to Reporting and Analytics" title="Direct link to Reporting and Analytics" translate="no">​</a></h3>
<p>Every incident is an opportunity to learn. As part of your incident post-mortems, you can improve your incident response process by examining areas of improvement.
Incident metrics can help you identify possible areas. Your status page software should be able to give you such metrics and reports.</p>
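<p>As a rough illustration of the kind of metric to look for, the sketch below computes the mean time to resolution from a list of incidents. The <code>Incident</code> record and its field names are assumptions made for this example, not the export format of any particular status page provider.</p>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record; real status page exports will use different fields.
    title: str
    started_at: datetime
    resolved_at: datetime


def mean_time_to_resolution(incidents: list[Incident]) -> timedelta:
    """Average duration between the start and the resolution of each incident."""
    if not incidents:
        raise ValueError("need at least one resolved incident")
    total = sum(
        (i.resolved_at - i.started_at for i in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Made-up incidents, for illustration only.
incidents = [
    Incident("API errors", datetime(2024, 11, 4, 10, 0), datetime(2024, 11, 4, 10, 30)),
    Incident("Login latency", datetime(2024, 11, 18, 9, 0), datetime(2024, 11, 18, 10, 30)),
]
print(mean_time_to_resolution(incidents))  # 1:00:00
```

<p>If your status page software already reports a figure like this, a small script of this kind is still useful for cross-checking it against your own incident records.</p>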
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="service-related-considerations">Service Related Considerations<a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#service-related-considerations" class="hash-link" aria-label="Direct link to Service Related Considerations" title="Direct link to Service Related Considerations" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="cost-and-maintenance">Cost and Maintenance<a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#cost-and-maintenance" class="hash-link" aria-label="Direct link to Cost and Maintenance" title="Direct link to Cost and Maintenance" translate="no">​</a></h3>
<p>For a managed status page provider that meets all your needs, your only remaining consideration is budgetary.</p>
<p>If you opt for a self-managed solution, you need to consider these costs:</p>
<ul>
<li class="">Ongoing bug fixes and improvements. If it's in-house, you have more control over the codebase.</li>
<li class="">Hosting and availability costs.</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="availability">Availability<a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#availability" class="hash-link" aria-label="Direct link to Availability" title="Direct link to Availability" translate="no">​</a></h3>
<p>Your status page should have good availability. If you are using a managed solution, check that the provider has a good uptime record. They should also have a way of notifying you if there is an outage with their service.</p>
<p><img decoding="async" loading="lazy" alt="StatusPal Status Page" src="https://blog.incidenthub.cloud/assets/images/statuspal-status-page-394cd534422d182974e9c9cf95747111.png" width="900" height="1000" class="img_ev3q">
<em>An example of a <a href="https://meta.statuspal.io/" target="_blank" rel="noopener noreferrer" class="">status page provider's status page</a>.</em></p>
<p>If you're self-hosting, it's your team's responsibility to ensure availability. A rule of thumb here is to keep the status page hosting logically and physically separate from the rest of your production environment.
You should monitor the status page's availability and performance in the same way you monitor your production environment.</p>
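<p>A minimal sketch of such a check, assuming a plain HTTP probe is enough for your setup. The function name, the three result labels, and the placeholder URL are choices made for this example. Ideally the probe runs from outside the environment that hosts the status page.</p>

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> str:
    """Classify a single HTTP check as "up", "degraded" or "down"."""
    try:
        # urlopen only returns normally for successful responses.
        with opener(url, timeout=timeout):
            return "up"
    except urllib.error.HTTPError:
        # The server answered, but with an error status (4xx or 5xx).
        return "degraded"
    except OSError:
        # DNS failure, refused connection or timeout.
        return "down"


# Example call; status.example.com is a placeholder, not a real status page.
# print(probe("https://status.example.com"))
```

<p>The <code>opener</code> parameter exists only so that the function can be exercised without network access. In practice you would feed the result into the same alerting pipeline that watches your production services.</p>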
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="support">Support<a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#support" class="hash-link" aria-label="Direct link to Support" title="Direct link to Support" translate="no">​</a></h3>
<p>For a managed provider, check their SLA and support options.</p>
<p>For a self-managed solution, it's a good idea to have a dedicated owner - whether it's a person or a team - who is responsible for the status page.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/how-to-decide-between-hosting-your-own-status-page-versus-using-a-managed-one#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>Deciding between hosting your own status page and using a managed solution ultimately depends on your organization's specific needs, resources, and priorities. A self-hosted status page offers greater control and customization, making it suitable for teams with unique requirements. However, it can also demand significant time and resources for development and maintenance.</p>
<p>On the other hand, a managed status page provider takes the headache of maintenance and availability off your hands. With the right provider, you can ensure high availability, robust support, and seamless integration with your existing tools.</p>
<p><em>IncidentHub is not affiliated with any of the vendors mentioned in this article.</em></p>
<p>Image credits: Image by <a href="https://pixabay.com/users/joshuaworoniecki-12734309/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=5017973">Joshua Woroniecki</a> from <a href="https://pixabay.com//?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=5017973">Pixabay</a></p>]]></content:encoded>
            <category>Status Pages</category>
            <category>Monitoring</category>
        </item>
        <item>
            <title><![CDATA[Monitoring Security Vulnerabilities in Your Cloud Vendors]]></title>
            <link>https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors</link>
            <guid>https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors</guid>
            <pubDate>Thu, 12 Dec 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[This article is a guide on how to keep track of security vulnerabilities in your Cloud vendors to ensure that your apps are not exposed to risk.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p>If you manage applications running on cloud platforms, you likely depend on multiple cloud vendors and services. These could be infrastructure providers like AWS, GCP or Azure. A vulnerability in any of these services could potentially impact your applications and your users.</p>
<p>A cloud platform has many moving parts, many of which are dependent on other third-party providers. For example:</p>
<ul>
<li class="">Operating system images for VMs which are maintained by third-party vendors.</li>
<li class="">Container images which are hosted on external repositories.</li>
<li class="">Software stacks which are maintained by other vendors but available for deployment on the cloud provider.</li>
<li class="">Libraries used by the cloud provider's internal software which are maintained by other developers or organizations.</li>
<li class="">Control plane software like Kubernetes.</li>
<li class="">Hardware, like processors, which are provided by the manufacturer.</li>
<li class="">Hypervisors which are developed and maintained by third-party vendors.</li>
<li class="">Networking hardware manufactured by other vendors.</li>
</ul>
<img style="border-radius:10px;border:1px" src="https://cdn.incidenthub.cloud/blog/security-vulnerabilities-cloud-saas.jpg" alt="Shield">
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#why-monitoring-cloud-vendors-security-is-important" class="">Why Monitoring Cloud Vendors' Security is Important</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#security-risks-in-cloud-services" class="">Security Risks in Cloud Services</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#real-world-examples" class="">Real-World Examples</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#how-to-stay-updated-about-cloud-vendor-security-issues" class="">How To Stay Updated About Cloud Vendor Security Issues</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#links-to-some-cloud-vendor-security-advisories" class="">Links to Some Cloud Vendor Security Advisories</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#tools-and-services-to-monitor-cloud-vendor-security" class="">Tools and Services to Monitor Cloud Vendor Security</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#rss-feeds" class="">RSS Feeds</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#email-alerts" class="">Email Alerts</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#cloud-dashboard-alerts" class="">Cloud Dashboard Alerts</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#best-practices" class="">Best Practices</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#conclusion" class="">Conclusion</a></li>
</ul>
<p>This guide is particularly relevant for:</p>
<ul>
<li class="">Ops/SRE folks managing cloud applications on cloud platforms.</li>
<li class="">Security teams responsible for application security.</li>
<li class="">Engineering managers and VPs overseeing cloud operations.</li>
</ul>
<p>It is important to distinguish between the security issues affecting the cloud vendor's services and the issues affecting the applications running on the cloud. The cloud vendor is responsible for the security of its infrastructure, including:</p>
<ul>
<li class="">Mitigating vulnerabilities in its control, including datacenter, network, hardware and control plane software.</li>
<li class="">Making patches available for third-party hardware and software that are part of its platform once they are provided by the third-parties. E.g hypervisors, processors, VM images.</li>
</ul>
<p>The cloud vendor is not responsible for the security of the application code that users run on it. As a cloud user, you are responsible for that.</p>
<p>However, your responsibility as a cloud user also includes:</p>
<ul>
<li class="">Staying abreast of security vulnerabilities in the cloud vendor's infrastructure.</li>
<li class="">Updating your applications to use the latest versions of the libraries and frameworks that you depend on.</li>
<li class="">Updating to the latest and secure versions of the software stacks that you use but are provided by the cloud vendor. E.g. VM images, Kubernetes versions, etc.</li>
</ul>
<p>In this article, we will cover how to stay on top of security vulnerabilities in the cloud vendor's infrastructure.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-monitoring-cloud-vendors-security-is-important">Why Monitoring Cloud Vendors' Security is Important<a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#why-monitoring-cloud-vendors-security-is-important" class="hash-link" aria-label="Direct link to Why Monitoring Cloud Vendors' Security is Important" title="Direct link to Why Monitoring Cloud Vendors' Security is Important" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="security-risks-in-cloud-services">Security Risks in Cloud Services<a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#security-risks-in-cloud-services" class="hash-link" aria-label="Direct link to Security Risks in Cloud Services" title="Direct link to Security Risks in Cloud Services" translate="no">​</a></h3>
<p>Cloud vendors face security challenges that could impact your applications:</p>
<ol>
<li class="">
<p><strong>Infrastructure Vulnerabilities</strong></p>
<ul>
<li class="">Zero-day exploits in <a href="https://www.securityweek.com/microsoft-warns-of-windows-hyper-v-zero-day-being-exploited/" target="_blank" rel="noopener noreferrer" class="">hypervisors</a>.</li>
<li class="">Vulnerabilities in <a href="https://security.snyk.io/vuln/SNYK-SLES150-CONTAINERD-3168412" target="_blank" rel="noopener noreferrer" class="">container runtimes</a>.</li>
<li class="">Network protocol <a href="https://blog.cloudflare.com/cloudflare-1111-incident-on-june-27-2024/" target="_blank" rel="noopener noreferrer" class="">weaknesses</a>.</li>
<li class="">Hardware-level security flaws.</li>
</ul>
</li>
<li class="">
<p><strong>Software Supply Chain Issues</strong></p>
<ul>
<li class="">Compromised dependencies being used by the cloud vendor's software.</li>
<li class="">Outdated and vulnerable third-party libraries.</li>
<li class="">Malicious container images.</li>
</ul>
</li>
<li class="">
<p><strong>Configuration Risks</strong></p>
<ul>
<li class="">Default settings that may be insecure.</li>
<li class="">Misconfigurations in service integrations, especially when services are developed by different vendors.</li>
<li class="">Access control weaknesses.</li>
<li class="">Gaps in API security.</li>
<li class="">Expired certificates.</li>
</ul>
</li>
<li class="">
<p><strong>Compliance and Regulatory Risks</strong></p>
<ul>
<li class="">Data privacy violations.</li>
<li class="">Regulatory non-compliance.</li>
<li class="">Geographic data residency issues.</li>
<li class="">Certification lapses.</li>
</ul>
</li>
</ol>
<p>Although Compliance and Regulatory issues are an important part of security, we will not cover them in this article because they are not directly related to the building blocks of the cloud vendor's infrastructure.</p>
<p>The impact of these vulnerabilities can be severe:</p>
<ul>
<li class="">Data breaches exposing sensitive customer and company information.</li>
<li class="">Service disruptions affecting SLAs.</li>
<li class="">Financial losses from security incidents.</li>
<li class="">Damage to company reputation.</li>
<li class="">Legal and regulatory consequences.</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="real-world-examples">Real-World Examples<a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#real-world-examples" class="hash-link" aria-label="Direct link to Real-World Examples" title="Direct link to Real-World Examples" translate="no">​</a></h3>
<p>Some recent incidents highlight the importance of monitoring cloud vendor security:</p>
<ul>
<li class=""><a href="https://en.wikipedia.org/wiki/Meltdown_(security_vulnerability)" target="_blank" rel="noopener noreferrer" class="">Meltdown CPU vulnerability (2018)</a>.</li>
<li class=""><a href="https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)" target="_blank" rel="noopener noreferrer" class="">Spectre CPU vulnerability (2017)</a>.</li>
<li class=""><a href="https://www.fortinet.com/resources/cyberglossary/solarwinds-cyber-attack" target="_blank" rel="noopener noreferrer" class="">Solarwinds supply chain attack (2019-2020)</a>.</li>
<li class=""><a href="https://www.paloaltonetworks.com/blog/prisma-cloud/protect-against-critical-azure-cosmos-db-vulnerability/" target="_blank" rel="noopener noreferrer" class="">Azure Cosmos DB vulnerability (2021)</a>.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="how-to-stay-updated-about-cloud-vendor-security-issues">How To Stay Updated About Cloud Vendor Security Issues<a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#how-to-stay-updated-about-cloud-vendor-security-issues" class="hash-link" aria-label="Direct link to How To Stay Updated About Cloud Vendor Security Issues" title="Direct link to How To Stay Updated About Cloud Vendor Security Issues" translate="no">​</a></h2>
<p>Cloud vendors publish security advisories on their websites, in the form of blog posts, emails, and sometimes security bulletins. You can subscribe to these advisories to stay updated.
You might also want to monitor vulnerabilities in the software stacks that you use, such as Kubernetes and Docker. Even if you use managed versions of these, monitoring them directly lets you learn about security issues before your cloud vendor
reports them and releases an updated version of the stack.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="links-to-some-cloud-vendor-security-advisories">Links to Some Cloud Vendor Security Advisories<a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#links-to-some-cloud-vendor-security-advisories" class="hash-link" aria-label="Direct link to Links to Some Cloud Vendor Security Advisories" title="Direct link to Links to Some Cloud Vendor Security Advisories" translate="no">​</a></h3>
<ul>
<li class=""><a href="https://aws.amazon.com/security/security-bulletins/" target="_blank" rel="noopener noreferrer" class="">AWS Security Bulletins</a></li>
<li class=""><a href="https://learn.microsoft.com/en-us/security-updates/" target="_blank" rel="noopener noreferrer" class="">Azure Security Advisories</a></li>
<li class=""><a href="https://cloud.google.com/support/bulletins" target="_blank" rel="noopener noreferrer" class="">Google Cloud Security Bulletins</a></li>
<li class=""><a href="https://www.oracle.com/security-alerts/" target="_blank" rel="noopener noreferrer" class="">Oracle Cloud Security Advisories</a></li>
<li class=""><a href="https://cloud.ibm.com/status/security" target="_blank" rel="noopener noreferrer" class="">IBM Cloud Security Advisories</a></li>
<li class=""><a href="https://access.redhat.com/security/vulnerabilities" target="_blank" rel="noopener noreferrer" class="">Red Hat Cloud Security Advisories</a></li>
<li class=""><a href="https://sec.cloudapps.cisco.com/security/center/publicationListing.x" target="_blank" rel="noopener noreferrer" class="">Cisco Security Advisories</a></li>
<li class=""><a href="https://www.broadcom.com/support/vmware-security-advisories" target="_blank" rel="noopener noreferrer" class="">Broadcom VMware Cloud Security Advisories</a></li>
<li class=""><a href="https://www.dell.com/support/security/en-us" target="_blank" rel="noopener noreferrer" class="">Dell Security Advisories</a></li>
<li class=""><a href="https://www.tencentcloud.com/document/product/627/38433" target="_blank" rel="noopener noreferrer" class="">TencentCloud Security Advisories</a></li>
<li class=""><a href="https://www.fastly.com/security-advisories" target="_blank" rel="noopener noreferrer" class="">Fastly Security Advisories</a></li>
</ul>
<p>These links are valid as of this writing (December 2024).</p>
<p>Based on the publicly available data in the above links, we can see trends in different cloud vendors' security advisories.</p>
<img style="border-radius:10px;border:1px solid black" src="https://blog.incidenthub.cloud/img/gcp_bulletins_by_year.png" alt="GCP Bulletins by Year">
<p>Source: Generated from the <a href="https://cloud.google.com/support/bulletins" target="_blank" rel="noopener noreferrer" class="">Google Cloud Security Bulletins page data</a>.</p>
<img style="border-radius:10px;border:1px solid black" src="https://blog.incidenthub.cloud/img/aws_security_bulletins.png" alt="AWS Security Bulletins">
<p>Source: Generated from the <a href="https://aws.amazon.com/security/security-bulletins/" target="_blank" rel="noopener noreferrer" class="">AWS Security Bulletins page data</a>.</p>
<img style="border-radius:10px;border:1px solid black" src="https://blog.incidenthub.cloud/img/k8s_bulletins_by_year.png" alt="Kubernetes Bulletins by Year">
<p>Source: Generated from the <a href="https://kubernetes.io/docs/reference/issues-security/official-cve-feed/" target="_blank" rel="noopener noreferrer" class="">Kubernetes Security Bulletins page data</a>.</p>
<p>Note that a rising trend does not necessarily mean that the cloud vendor is doing a poor job. Multiple factors affect the advisory count:</p>
<ul>
<li class="">The number of vulnerabilities in dependent software.</li>
<li class="">The number of vulnerabilities in dependent hardware.</li>
<li class="">The number of vulnerabilities actually discovered.</li>
</ul>
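<p>The article does not show how the charts above were generated. One way to produce a similar per-year tally, assuming you have already extracted the publication dates from a bulletin page, is sketched below. The function name and the sample dates are made up for illustration.</p>

```python
from collections import Counter
from datetime import date


def bulletins_per_year(published: list[date]) -> dict[int, int]:
    """Count bulletins per calendar year, ordered by year."""
    counts = Counter(d.year for d in published)
    return dict(sorted(counts.items()))


# Made-up publication dates, for illustration only.
sample = [date(2022, 3, 1), date(2023, 1, 15), date(2023, 7, 9), date(2024, 2, 2)]
print(bulletins_per_year(sample))  # {2022: 1, 2023: 2, 2024: 1}
```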
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tools-and-services-to-monitor-cloud-vendor-security">Tools and Services to Monitor Cloud Vendor Security<a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#tools-and-services-to-monitor-cloud-vendor-security" class="hash-link" aria-label="Direct link to Tools and Services to Monitor Cloud Vendor Security" title="Direct link to Tools and Services to Monitor Cloud Vendor Security" translate="no">​</a></h2>
<p>When it comes to staying updated, there are different approaches. Choose the one that works for you.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="rss-feeds">RSS Feeds<a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#rss-feeds" class="hash-link" aria-label="Direct link to RSS Feeds" title="Direct link to RSS Feeds" translate="no">​</a></h3>
<p>If the bulletin page offers an RSS feed, you can subscribe to it to get updates. You can tie the RSS feed into your email client or a Slack channel.</p>
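<p>If you prefer to filter a feed yourself before forwarding anything, a sketch along these lines may help. It assumes a plain RSS 2.0 feed with <code>title</code> and <code>link</code> elements on each item. The function names and the sample entries are invented for this example, and fetching the feed and posting to email or Slack are left out.</p>

```python
import xml.etree.ElementTree as ET


def matching_items(rss_xml: str, keywords: list[str]) -> list[dict[str, str]]:
    """Return title and link of each RSS 2.0 item whose title mentions a keyword."""
    root = ET.fromstring(rss_xml)
    wanted = [k.lower() for k in keywords]
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if any(k in title.lower() for k in wanted):
            matches.append({"title": title, "link": link})
    return matches


def build_sample_feed() -> str:
    """Build a tiny RSS 2.0 document so the example runs without network access."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    entries = [
        ("Kubernetes control plane advisory", "https://example.com/bulletins/1"),
        ("Billing console update", "https://example.com/bulletins/2"),
    ]
    for title, link in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")


for match in matching_items(build_sample_feed(), ["kubernetes"]):
    print(match["title"], match["link"])
```

<p>Matching on the title alone is a deliberate simplification. Depending on the vendor, the affected product may only be named in the item description.</p>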
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="email-alerts">Email Alerts<a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#email-alerts" class="hash-link" aria-label="Direct link to Email Alerts" title="Direct link to Email Alerts" translate="no">​</a></h3>
<p>If the bulletin page has the option of subscribing to email alerts, you can use that to get updates.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="cloud-dashboard-alerts">Cloud Dashboard Alerts<a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#cloud-dashboard-alerts" class="hash-link" aria-label="Direct link to Cloud Dashboard Alerts" title="Direct link to Cloud Dashboard Alerts" translate="no">​</a></h3>
<p>If your cloud vendor has the option of sending you security notifications in your dashboard, you can use that. However, this has the drawback that you won't see the alert unless you are signed in and monitoring your dashboard.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="best-practices">Best Practices<a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#best-practices" class="hash-link" aria-label="Direct link to Best Practices" title="Direct link to Best Practices" translate="no">​</a></h2>
<ul>
<li class="">Do periodic reviews of which security advisories you are monitoring. Your team(s) might have started using a new cloud vendor or dropped an existing one.</li>
<li class="">If mitigation requires a migration - e.g. to new VMs, or a new version, track it like any other development task. This is especially important if you have many applications or a distributed architecture.</li>
<li class="">When a security advisory is published, identify potential attack vectors in your applications and if possible test for them. You might have to take alternative measures until the vendor releases a patch.</li>
<li class="">Maintain communication with stakeholders throughout the process.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/monitoring-security-vulnerabilities-in-cloud-vendors#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>Keeping track of your cloud vendors' security updates and vulnerabilities is important to your business. It helps you ensure that your applications are not exposed to risk.</p>
<p>Image credits: Photo by <a href="https://unsplash.com/@pawel_czerwinski?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Pawel Czerwinski</a> on <a href="https://unsplash.com/photos/brown-metal-shield-wall-decor-RovCBKMfK_k?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Unsplash</a></p>]]></content:encoded>
            <category>Security</category>
            <category>Monitoring</category>
        </item>
        <item>
            <title><![CDATA[Sending Alerts Using Prometheus and Alertmanager]]></title>
            <link>https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager</link>
            <guid>https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager</guid>
            <pubDate>Tue, 03 Dec 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[This article is a guide to configuring a Prometheus and Alertmanager deployment on Docker to send alerts to Slack.]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p><em>This article is part of a <a class="" href="https://blog.incidenthub.cloud/tags/prometheus">series</a> on setting up an end-to-end monitoring and alerting stack using Prometheus.</em></p>
<p>Continuing our series on setting up Prometheus in a container, this article provides a step-by-step guide for how to configure alerts in Prometheus.
We will add alerting rules and deploy Prometheus Alertmanager with Slack integration.</p>
<p>If you follow the steps in this article, you will end up with a containerized setup for:</p>
<ol>
<li class="">A Prometheus instance with alerting rules.</li>
<li class="">An Alertmanager instance which can send alerts originating from those rules to a Slack channel.</li>
</ol>
<p>Let's get started.</p>
<p><img decoding="async" loading="lazy" src="https://cdn.incidenthub.cloud/blog/prometheus-alerts.png" alt="Prometheus alerts" class="img_ev3q"></p>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#introduction" class="">Introduction</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#using-docker-compose" class="">Using Docker Compose</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#moving-the-prometheus-container-to-docker-compose" class="">Moving the Prometheus Container to Docker Compose</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#adding-alerting-rules-to-the-prometheus-container" class="">Adding Alerting Rules to the Prometheus container</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#setting-up-prometheus-alertmanager" class="">Setting up Prometheus Alertmanager</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#setting-up-the-alertmanager-container" class="">Setting up the Alertmanager Container</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#link-alertmanager-to-prometheus" class="">Link Alertmanager to Prometheus</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#troubleshooting" class="">Troubleshooting</a>
<ul>
<li class=""><a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#useful-tip" class="">Useful Tip</a></li>
</ul>
</li>
<li class=""><a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#conclusion" class="">Conclusion</a></li>
<li class=""><a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#references" class="">References</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="using-docker-compose">Using Docker Compose<a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#using-docker-compose" class="hash-link" aria-label="Direct link to Using Docker Compose" title="Direct link to Using Docker Compose" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="moving-the-prometheus-container-to-docker-compose">Moving the Prometheus Container to Docker Compose<a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#moving-the-prometheus-container-to-docker-compose" class="hash-link" aria-label="Direct link to Moving the Prometheus Container to Docker Compose" title="Direct link to Moving the Prometheus Container to Docker Compose" translate="no">​</a></h3>
<p>In a <a class="" href="https://blog.incidenthub.cloud/deploying-prometheus-with-docker">previous article</a>, we looked at how to setup Prometheus in a Docker container. We will now deploy Prometheus Alertmanager in
a different container. To manage multiple containers with a single command, we will move our entire deployment to Docker compose.</p>
<p>In the prometheus directory (refer to the previous article), create a file called <code>docker-compose.yml</code> and add the following content to it:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">external</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">services</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"> prom/prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">v3.0.0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> ./config</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/etc/prometheus</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/prometheus</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">ports</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> 9000</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">9090</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">restart</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> always</span><br></span></code></pre></div></div>
<p>This keeps the previous configuration file as well as the Docker volume. To run it, you need Docker Compose installed; installing it is beyond the scope of this article.
Note that we set <code>external: true</code> so that Compose reuses the volume we created earlier instead of creating a new one.</p>
<p>Bring up the container by executing</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker compose up --detach</span><br></span></code></pre></div></div>
<p>Access the Prometheus UI by visiting <code>http://localhost:9000</code>, the host port that the compose file maps to the container's port 9090. The <code>--detach</code> flag runs the containers in the background. To stop the container(s), execute:</p>
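<p>The <code>ports</code> entry in the compose file has the form <code>HOST:CONTAINER</code>, so the UI is reached on the host side of the mapping. The snippet below is a small sketch of that rule; the function name <code>host_url_for_mapping</code> is made up for illustration, and it assumes the simple two-part mapping used in this article.</p>

```shell
# Given a Compose port mapping of the form "HOST:CONTAINER",
# print the URL to open on the host machine.
# Assumption: the two-part form used in this article (no bind IP, no protocol suffix).
host_url_for_mapping() {
  host_port=${1%%:*}   # keep everything before the first colon
  printf 'http://localhost:%s\n' "$host_port"
}

host_url_for_mapping "9000:9090"   # prints http://localhost:9000
```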
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker compose down</span><br></span></code></pre></div></div>
<p>You can tail the logs of the running containers by using</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker compose logs --follow</span><br></span></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="adding-alerting-rules-to-the-prometheus-container">Adding Alerting Rules to the Prometheus container<a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#adding-alerting-rules-to-the-prometheus-container" class="hash-link" aria-label="Direct link to Adding Alerting Rules to the Prometheus container" title="Direct link to Adding Alerting Rules to the Prometheus container" translate="no">​</a></h2>
<p>Prometheus can process alert rules written in PromQL. Let's create a basic rule file that alerts when CPU usage stays above 20% or
80% for 1 minute, with a different severity level for each threshold. In the config directory, create a file called <code>alert_rules.yml</code> with the
following content:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">groups</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">labels</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">tier</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> infra</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">rules</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">alert</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 
ModeratelyHighCPU</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">expr</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> rate(process_cpu_seconds_total</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">1m</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain">) </span><span class="token punctuation" style="color:#393A34">&gt;</span><span class="token plain"> 0.2</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">for</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 1m</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">keep_firing_for</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 1m</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">labels</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">severity</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> warning</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">annotations</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  
    </span><span class="token key atrule" style="color:#00a4db">summary</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Moderately High CPU</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">alert</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> VeryHighCPU</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">expr</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> rate(process_cpu_seconds_total</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">1m</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain">) </span><span class="token punctuation" style="color:#393A34">&gt;</span><span class="token plain"> 0.8</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">for</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 1m</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">keep_firing_for</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 1m</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">labels</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"> 
     </span><span class="token key atrule" style="color:#00a4db">severity</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> critical</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">annotations</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">summary</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Very High CPU</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span></code></pre></div></div>
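<p>This defines two alerts on the metric <code>process_cpu_seconds_total</code>, the CPU time consumed by the scraped process (here, Prometheus itself). They fire when its per-second rate stays above 0.2 or 0.8, that is, 20% or 80% of one CPU core, for 1 minute.</p>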
<p>The thresholds are just for illustration.</p>
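<p>To make the arithmetic behind the two expressions concrete, here is a rough sketch that computes a per-second rate from two counter samples and maps it to the severities used above. It is only an approximation: PromQL's <code>rate()</code> also handles counter resets and extrapolates to the window edges. The function names and sample values are invented for illustration.</p>

```shell
# Approximate per-second rate of a counter from two samples taken
# <seconds> apart. Simplified: no counter-reset handling or extrapolation.
cpu_rate() {
  awk -v first="$1" -v last="$2" -v seconds="$3" \
    'BEGIN { printf "%.3f\n", (last - first) / seconds }'
}

# Map a rate to the severity the rules above would assign.
severity_for() {
  awk -v r="$1" 'BEGIN {
    if (r > 0.8)      print "critical"
    else if (r > 0.2) print "warning"
    else              print "none"
  }'
}

# Hypothetical samples: the counter went from 100s to 115s of CPU time in 60s.
rate=$(cpu_rate 100 115 60)
echo "rate=$rate severity=$(severity_for "$rate")"   # rate=0.250 severity=warning
```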
<p>Note that we have assigned labels with different values for the <code>severity</code> key to the alerts. We will see how they are used to route alerts later in the Alertmanager configuration.</p>
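<p>Because routing will depend on the <code>severity</code> label, it can help to check that every alert in the rule file carries one. The sketch below is a naive text scan, not a YAML parser, and assumes the file is laid out like the example above; the function name <code>alerts_missing_severity</code> is hypothetical.</p>

```shell
# Print the name of every alert rule in the given file that has no
# "severity:" line before the next "- alert:" entry.
# Naive line-based check; assumes the layout used in this article.
alerts_missing_severity() {
  awk '
    /^[[:space:]]*-[[:space:]]*alert:/ {
      if (name != "" && !has_sev) print name
      name = $3; has_sev = 0
    }
    /^[[:space:]]*severity:/ { has_sev = 1 }
    END { if (name != "" && !has_sev) print name }
  ' "$1"
}

# Usage: prints nothing when every alert has a severity label.
# alerts_missing_severity config/alert_rules.yml
```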
<p>Now add this rule file's name to the <code>prometheus.yml</code> file so that the latter looks like this:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">global</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">scrape_interval</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 15s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">evaluation_interval</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 15s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">alerting</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">alertmanagers</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">static_configs</span><span class="token 
punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">targets</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">rule_files</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"alert_rules.yml"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">scrape_configs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">job_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"prometheus"</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">static_configs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">targets</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"localhost:9090"</span><span class="token punctuation" style="color:#393A34">]</span><br></span></code></pre></div></div>
<p>The path to the rules file is relative to the location of <code>prometheus.yml</code>. You can also give an absolute path if the file lives elsewhere. Now restart the containers and you should be able to see the rules
in your UI at <code>http://localhost:9000/rules</code>:</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker compose down</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">docker compose up --detach</span><br></span></code></pre></div></div>
<p>These alerts won't fire until the CPU usage of the Prometheus process exceeds the thresholds, and that depends on how busy it is. To make sure the rules work, you can temporarily change a threshold to a very low number like 0.0002 and test it. You can see the alerts at <code>http://localhost:9000/alerts</code>.</p>
<p>Now that we have a rule engine that can generate alerts, we need a way to send these alerts to an endpoint such as Slack, email, or PagerDuty so that we receive notifications. To do this,
we will use Prometheus Alertmanager.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="setting-up-prometheus-alertmanager">Setting up Prometheus Alertmanager<a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#setting-up-prometheus-alertmanager" class="hash-link" aria-label="Direct link to Setting up Prometheus Alertmanager" title="Direct link to Setting up Prometheus Alertmanager" translate="no">​</a></h2>
<p>Alertmanager is a Prometheus project that acts as a gateway between your Prometheus rule engine and external tools that can process alerts. You can configure routing rules as well as silence alerts in Alertmanager.
At the end of this section you will have your Prometheus container sending alerts to your Alertmanager container.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="setting-up-the-alertmanager-container">Setting up the Alertmanager Container<a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#setting-up-the-alertmanager-container" class="hash-link" aria-label="Direct link to Setting up the Alertmanager Container" title="Direct link to Setting up the Alertmanager Container" translate="no">​</a></h3>
<p>First, to keep things consistent, let us create a volume for Alertmanager's data:</p>
<div class="language-sh codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-sh codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker volume create alertmanager</span><br></span></code></pre></div></div>
<p>Alertmanager will use this volume to store alert silences among other things. So if you silence an alert, and your Alertmanager container restarts, the silence will still be active.</p>
<p>Next, create a directory called <code>alertmanager-config</code> at the same level as <code>prometheus</code>. This will be our Alertmanager configuration directory. Inside it, create a basic Alertmanager configuration in a file named <code>alertmanager.yml</code>:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">global</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># The directory from which notification templates are read.</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># We won't be using this now</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">templates</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'/etc/alertmanager/template/*.tmpl'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">route</span><span 
class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">group_by</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'service'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">group_wait</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 30s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">group_interval</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 5m</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">repeat_interval</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 3h</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" 
style="color:#00a4db">receiver</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> team</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">slack</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">inhibit_rules</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">source_matchers</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">severity="critical"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">target_matchers</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">severity="warning"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">receivers</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'team-slack'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">slack_configs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">api_url</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'&lt;webhook_url&gt;'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">channel</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'#&lt;channel_name&gt;'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">send_resolved</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><br></span></code></pre></div></div>
<p>Replace the <code>channel_name</code> with the Slack channel where you want to receive alerts, and the <code>webhook_url</code> with a Slack webhook URL. Be careful <em>not</em> to commit this file with the webhook URL in it.</p>
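<p>One way to keep the webhook URL out of version control is to commit the file with its placeholders intact and generate the real configuration locally. The following is a minimal sketch of that idea, not part of Alertmanager itself: the function name and the <code>SLACK_WEBHOOK_URL</code> and <code>SLACK_CHANNEL</code> environment variables are assumptions, and it relies on the values containing no <code>|</code> or <code>&amp;</code> characters.</p>

```shell
# Render an Alertmanager config from a template that still contains the
# <webhook_url> and <channel_name> placeholders.
# Assumptions: SLACK_WEBHOOK_URL and SLACK_CHANNEL are set in the environment
# and contain no "|" or "&" characters (both are special to sed here).
render_alertmanager_config() {
  template=$1
  output=$2
  : "${SLACK_WEBHOOK_URL:?set SLACK_WEBHOOK_URL first}"
  : "${SLACK_CHANNEL:?set SLACK_CHANNEL first}"
  sed -e "s|<webhook_url>|${SLACK_WEBHOOK_URL}|" \
      -e "s|<channel_name>|${SLACK_CHANNEL}|" \
      "$template" > "$output"
}

# Usage (hypothetical file names): keep the template in git, ignore the output.
# render_alertmanager_config alertmanager.yml.tmpl alertmanager-config/alertmanager.yml
```

<p>With this approach only the rendered file holds the secret, so it is the one to list in <code>.gitignore</code>.</p>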
<p>This configuration just routes all alerts into a Slack channel irrespective of their severity or other labels. In a later article we will see how to route alerts to different teams and channels based on labels.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="link-alertmanager-to-prometheus">Link Alertmanager to Prometheus<a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#link-alertmanager-to-prometheus" class="hash-link" aria-label="Direct link to Link Alertmanager to Prometheus" title="Direct link to Link Alertmanager to Prometheus" translate="no">​</a></h3>
<p>Now add the Alertmanager configuration to the docker-compose.yml so that it becomes part of our Prometheus setup:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">external</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">external</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain"></span><span class="token key atrule" style="color:#00a4db">services</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> prom/prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">v3.0.0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> ./config</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/etc/prometheus</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/prometheus</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">ports</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> 9000</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">9090</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">restart</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> always</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">links</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">alertmanager</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> prom/alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">v0.27.0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    
</span><span class="token key atrule" style="color:#00a4db">ports</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> 9001</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">9093</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> ./alertmanager</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">config/</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/etc/alertmanager/</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/alertmanager</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">restart</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> always</span><br></span></code></pre></div></div>
<p>We've added a <code>links</code> entry to the Prometheus service so that it can find the Alertmanager container by name.</p>
<p>Now there's one more piece to the puzzle. Prometheus needs to know which Alertmanager endpoint to send alerts to. Think of the link in the <code>docker-compose.yml</code> as an internal DNS name that is reachable by the Prometheus container. Modify the <code>prometheus.yml</code> file to add this name under the <code>alertmanagers</code> key:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">global</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">scrape_interval</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 15s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">evaluation_interval</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 15s</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">alerting</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">alertmanagers</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">static_configs</span><span class="token 
punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">targets</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">9093</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">rule_files</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"alert_rules.yml"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">scrape_configs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" 
style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">job_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"prometheus"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">static_configs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">targets</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"localhost:9090"</span><span class="token punctuation" style="color:#393A34">]</span><br></span></code></pre></div></div>
<p>We are referring to Alertmanager by its internal link name and its default port, <code>9093</code>. If you change the link name in <code>docker-compose.yml</code>, you have to change it here too.</p>
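<p>As a minimal sketch of that dependency, suppose the link alias were renamed to <code>am</code> (a hypothetical name used only for illustration). The alias in <code>docker-compose.yml</code> and the target in <code>prometheus.yml</code> would then have to change together:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># docker-compose.yml (excerpt)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">services:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  prometheus:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    links:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - alertmanager:am   # SERVICE:ALIAS; "am" is a hypothetical alias</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># prometheus.yml (excerpt)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">alerting:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  alertmanagers:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    - static_configs:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        - targets:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          - am:9093   # must match the alias above</span><br></span></code></pre></div></div>
<p>Note that the target uses the container port <code>9093</code>, not the host port <code>9001</code>, because this traffic stays inside the Docker network.</p>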
<p>Once you restart your containers, you should be able to access the Alertmanager UI at <code>http://localhost:9001</code>.</p>
<p>To test this end-to-end, modify your <code>alert_rules.yml</code> to lower the threshold. Within 1 minute (the value of "<code>for</code>" in the <code>alert_rules.yml</code>) you should see the alert message in your Slack channel.</p>
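<p>If you want a rule that is easy to trigger, the sketch below may help. It is a hypothetical stand-in, not the rule defined earlier in this walkthrough: the group name, alert name, label, and annotation are made up for illustration. It relies on <code>prometheus_http_requests_total</code>, a metric that Prometheus exposes about itself, with a threshold low enough that it should fire on any instance that is being scraped.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># alert_rules.yml (hypothetical test rule)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">groups:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - name: pipeline-test          # hypothetical group name</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    rules:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - alert: PipelineTestAlert # hypothetical alert name</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        # Any request rate above zero matches, so this should fire quickly</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        expr: rate(prometheus_http_requests_total[5m]) &gt; 0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        for: 1m                  # the condition must hold this long before the alert fires</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        labels:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          severity: warning</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        annotations:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          summary: "Test alert for the Prometheus to Alertmanager pipeline"</span><br></span></code></pre></div></div>
<p>Remember to restore your original threshold once you have confirmed that the message arrives.</p>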
<p>Congratulations! You have just set up a minimal end-to-end alerting pipeline with Prometheus and Alertmanager. From here, we can do many more things:</p>
<ul>
<li class="">Set up advanced routing rules for Alertmanager, including routing to different teams on different channels, routing a subset of one team's alerts to another, and routing alerts of a certain severity to a different endpoint like PagerDuty.</li>
<li class="">Add basic authentication and Google SSO to our Prometheus UI.</li>
<li class="">Configure our own custom email templates for Alertmanager emails.</li>
<li class="">Store secrets securely.</li>
</ul>
<p>Stay tuned for updates on these topics.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="troubleshooting">Troubleshooting<a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#troubleshooting" class="hash-link" aria-label="Direct link to Troubleshooting" title="Direct link to Troubleshooting" translate="no">​</a></h2>
<p>If you don't see any alerts in your Slack channel, follow this step-by-step guide to troubleshoot:</p>
<ul>
<li class="">Check all the config files, including the <code>docker-compose.yml</code>, and verify that they match the ones in this walkthrough. You might want to run a YAML linter on all your files.</li>
<li class="">Check the container logs for any errors. Run <code>docker compose logs --follow</code>.</li>
<li class="">Are all your containers running? Run <code>docker ps</code> to check.</li>
<li class="">Is the alert firing in Prometheus? Check the Prometheus UI at <code>http://localhost:9000/alerts</code>.</li>
<li class="">If yes, is the alert reaching Alertmanager? Check the Alertmanager UI at <code>http://localhost:9001/#/alerts</code>.</li>
<li class="">If yes, is your Slack webhook URL correct? Is the channel correct? Note that the channel name should have a leading '#'.</li>
</ul>
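<p>For the last check, the fragment below sketches what the Slack part of <code>alertmanager.yml</code> might look like. The receiver name, channel name, and webhook URL are placeholders, not values from this setup. The point it illustrates is the quoting: an unquoted <code>#</code> after the colon starts a YAML comment, which would leave the channel empty.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># alertmanager.yml (excerpt, placeholder values)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">route:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  receiver: slack-notifications   # hypothetical receiver name</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">receivers:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  - name: slack-notifications</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    slack_configs:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      # Placeholder webhook URL; use the one Slack generated for you</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      - api_url: 'https://hooks.slack.com/services/REPLACE/WITH/YOURS'</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        # Hypothetical channel; the quotes stop YAML from reading # as a comment</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        channel: '#alerts'</span><br></span></code></pre></div></div>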
<p>If all of these things are correct, turn on debug logging for both Prometheus and Alertmanager in <code>docker-compose.yml</code>:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">external</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">external</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain"></span><span class="token key atrule" style="color:#00a4db">services</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> prom/prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">v3.0.0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> ./config</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/etc/prometheus</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> prometheus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/prometheus</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">ports</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> 9000</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">9090</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">restart</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> always</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">links</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">alertmanager</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">command</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'--config.file=/etc/prometheus/prometheus.yml'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token string" 
style="color:#e3116c">'--storage.tsdb.path=/prometheus'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'--log.level=debug'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> prom/alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">v0.27.0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">ports</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> 9001</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">9093</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> ./alertmanager</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">config/</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/etc/alertmanager/</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> alertmanager</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/alertmanager</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">restart</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> always</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">command</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'--config.file=/etc/alertmanager/alertmanager.yml'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'--storage.path=/alertmanager'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  
    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'--log.level=debug'</span><br></span></code></pre></div></div>
<p>This produces more verbose output, which should help you debug. If you are still stuck, reach out to me on <a href="https://x.com/talonx" target="_blank" rel="noopener noreferrer" class="">Twitter</a> and I'll try my best to help.</p>
<div style="border:1px solid blue;border-radius:10px;margin:10px;padding:10px"><h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="useful-tip">Useful Tip<a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#useful-tip" class="hash-link" aria-label="Direct link to Useful Tip" title="Direct link to Useful Tip" translate="no">​</a></h4><p>If you enable debug logging, don't add only the <code>log.level</code> setting. You also have to specify the other two options. This is because the <a href="https://github.com/prometheus/prometheus/blob/main/Dockerfile" target="_blank" rel="noopener noreferrer" class="">Prometheus image</a> ships with the storage and config directories set to <code>/prometheus</code> and <code>/etc/prometheus/</code> respectively. We mount these directories in our compose file. If you override the command option with only the log level, the container will fall back on the default values for these, <a href="https://prometheus.io/docs/prometheus/3.0/command-line/prometheus/" target="_blank" rel="noopener noreferrer" class="">which are different</a>. The same thing applies to Alertmanager.</p></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>Prometheus and Alertmanager together form a powerful monitoring and alerting stack. They are easy to set up for basic cases, but can require significant work for more sophisticated use cases. This article is a guide to a basic setup of these tools.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="references">References<a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager#references" class="hash-link" aria-label="Direct link to References" title="Direct link to References" translate="no">​</a></h2>
<ul>
<li class=""><a href="https://prometheus.io/docs/prometheus/3.0/getting_started/" target="_blank" rel="noopener noreferrer" class="">Prometheus 3.0.0 documentation</a></li>
<li class=""><a href="https://prometheus.io/docs/alerting/0.27/configuration/" target="_blank" rel="noopener noreferrer" class="">Alertmanager 0.27.0 documentation</a></li>
<li class=""><a href="https://docs.docker.com/reference/cli/docker/container/" target="_blank" rel="noopener noreferrer" class="">Docker Container Management commands</a></li>
<li class=""><a href="https://docs.docker.com/compose/install/" target="_blank" rel="noopener noreferrer" class="">Docker Compose Installation</a></li>
<li class=""><a href="https://docs.docker.com/reference/cli/docker/volume/" target="_blank" rel="noopener noreferrer" class="">Docker Volume commands</a></li>
<li class=""><a href="https://www.yamllint.com/" target="_blank" rel="noopener noreferrer" class="">YAML Validator</a></li>
</ul>
<p>This article was originally published on the <a href="https://blog.incidenthub.cloud/sending-alerts-using-prometheus-and-alertmanager" target="_blank" rel="noopener noreferrer" class="">IncidentHub blog</a>.</p>
<p>All product names, company names, logos and trademarks are the property of their respective owners.</p>
<p>Photo credits: <a href="https://pixabay.com/users/elisariva-1348268/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=1911678">Elisa</a> from <a href="https://pixabay.com//?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=1911678">Pixabay</a>.</p>]]></content:encoded>
            <category>Prometheus</category>
            <category>Monitoring</category>
            <category>Docker</category>
            <category>Alerting</category>
        </item>
    </channel>
</rss>