
2 posts tagged with "status-pages"

IncidentHub posts related to status-pages


How To Monitor Public Status Pages of Cloud Providers - a Step-by-Step Approach

· 8 min read
Hrishikesh Barua
Founder @IncidentHub.cloud

Introduction

Incident updates on the public status pages of your cloud providers are often the first indication that they might have an outage. Providers also post updates about upcoming and ongoing maintenance on their status pages. Thus, monitoring your cloud status pages becomes crucial to your business operations. This article will guide you through the process of effectively monitoring such status pages.

  1. Identify Your Cloud Providers
  2. Locate Their Public Status Pages
  3. Understand the Status Page Structure
  4. Configure Notifications
  5. Best Practices
  6. Include in Your Incident Response Plan
  7. Use a Monitoring Tool
  8. Conclusion
  9. FAQ

Identify Your Cloud Providers

Work with your Dev/Ops/SRE and IT teams to build a comprehensive list of your cloud providers. For the purposes of this article, any service that your teams do not manage themselves counts as a cloud service. Although we focus on cloud providers - i.e. providers that let you deploy your services on their infrastructure - this article applies equally to any of your external SaaS vendors.

Locate Their Public Status Pages

Most cloud providers have a public status page. You can find the link on their company website or with a web search. The status page is either run by the provider itself or hosted on a third-party service like Atlassian Statuspage or Instatus. Many observability and incident management providers, such as Incident.io and BetterStack, also offer public status pages.

Understand the Status Page Structure

There is no official standard for status page formats, but most of them use a similar visual layout. Common terms for incident states include "Major/Minor outage", "Maintenance", "Informational", "Monitoring", and "Resolved".

Most status pages show ongoing incidents at the top, followed by a list of components or services, and then past incidents. Clicking an ongoing incident takes you to a detailed description of it.

An example from the Twilio status page:

[Screenshot: Twilio status page]
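If a provider's page is hosted on Atlassian Statuspage, the same structure (overall status, ongoing incidents, components) is usually also available as JSON, which is handy for scripting. The sketch below assumes the page exposes Statuspage's public `/api/v2/summary.json` endpoint; pages built on other software need a different parser. `fetch_summary` and `describe` are illustrative names, not part of any library.

```python
import json
import urllib.request


def fetch_summary(base_url: str, timeout: float = 10.0) -> dict:
    """Download /api/v2/summary.json from an (assumed) Statuspage-hosted page."""
    url = base_url.rstrip("/") + "/api/v2/summary.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def describe(summary: dict) -> dict:
    """Extract the overall status, ongoing incidents, and degraded components."""
    return {
        # Overall indicator, e.g. "none", "minor", "major" or "critical".
        "overall": summary["status"]["indicator"],
        # Unresolved incidents, which the page shows at the top.
        "ongoing": [inc["name"] for inc in summary.get("incidents", [])],
        # Components whose status is anything other than "operational".
        "degraded": {
            comp["name"]: comp["status"]
            for comp in summary.get("components", [])
            if comp["status"] != "operational"
        },
    }
```

For example, `describe(fetch_summary("https://www.githubstatus.com"))` would summarize GitHub's Statuspage-hosted page, assuming the endpoint is reachable from your network.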

Configure Notifications

Instead of visiting status pages periodically, you can sign up to receive notifications when an incident is created, updated, or resolved. The notification modes on offer vary by provider and may include:

  • SMS
  • Slack
  • Email
  • RSS feed
  • Google Chat
  • Discord
  • Webhooks
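Of these, an RSS feed is the easiest to consume from your own tooling. A minimal sketch, assuming the provider publishes a standard RSS 2.0 feed (pages hosted on Atlassian Statuspage commonly expose one at `/history.rss`); the function names are hypothetical:

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_feed(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw feed, e.g. https://<status page>/history.rss (assumed)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def parse_items(feed) -> list:
    """Return one dict per <item> in an RSS 2.0 document (str or bytes)."""
    root = ET.fromstring(feed)
    items = []
    for item in root.iter("item"):
        items.append({
            # Prefer <guid>, fall back to <link>, as a stable identifier.
            "id": item.findtext("guid") or item.findtext("link") or "",
            "title": (item.findtext("title") or "").strip(),
            "link": item.findtext("link") or "",
        })
    return items


def new_items(feed, seen: set) -> list:
    """Return items not seen before and remember them for the next poll."""
    fresh = [item for item in parse_items(feed) if item["id"] not in seen]
    seen.update(item["id"] for item in fresh)
    return fresh
```

Calling `new_items(fetch_feed(url), seen)` on a schedule, and keeping `seen` between calls (persisted to disk if the process restarts), gives you only the incident updates you have not forwarded yet.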

Some status pages offer only one or two of these options, and some offer none at all. If the status page is hosted on a third-party service, your cloud provider can choose which of that service's notification options to enable. For example, both DigitalOcean and Mailgun use Atlassian Statuspage. DigitalOcean lets you subscribe through many channels:

[Screenshot: DigitalOcean status page]

whereas Mailgun has disabled all of them:

[Screenshot: Mailgun status page]

This was the case at the time of writing; providers can change these options over time to suit their business requirements.

Notification Challenges

Your notifications should be delivered so that the right team in your organization receives them. If the team uses Slack, that is where the notifications should go. If it uses Discord, they should go to a Discord channel.

Because your providers' status pages offer different notification options, they might not support the one your team uses: some providers may offer your chosen option while others do not. See the "Use a Monitoring Tool" section for how to mitigate this.
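One way to work around mismatched options is to accept whatever a provider does offer, such as a webhook, and forward it to the channel your team actually uses. The sketch below assumes Atlassian Statuspage-style webhook bodies with a top-level `incident` or `component` object; `ROUTES`, `to_message`, `route`, and the `senders` mapping are hypothetical stand-ins for your own Slack, email, or Discord integrations.

```python
from typing import Callable, Dict, Optional

# Hypothetical routing table: which channel each provider's alerts go to.
ROUTES = {"DigitalOcean": "slack", "Mailgun": "email"}
DEFAULT_CHANNEL = "email"


def to_message(provider: str, payload: dict) -> Optional[str]:
    """Turn an (assumed) Statuspage-style webhook body into one line of text."""
    if "incident" in payload:
        inc = payload["incident"]
        text = (f"[{provider}] {inc.get('name')}: status={inc.get('status')}, "
                f"impact={inc.get('impact')}")
        link = inc.get("shortlink")
        return f"{text} {link}" if link else text
    if "component" in payload:
        comp = payload["component"]
        return f"[{provider}] {comp.get('name')} is now {comp.get('status')}"
    return None  # Unrecognized payload: ignore it rather than guess.


def route(provider: str, payload: dict,
          senders: Dict[str, Callable[[str], None]]) -> bool:
    """Send the message to the channel configured for this provider."""
    message = to_message(provider, payload)
    if message is None:
        return False
    senders[ROUTES.get(provider, DEFAULT_CHANNEL)](message)
    return True
```

Keeping the formatting (`to_message`) separate from delivery (`senders`) means adding a new channel is a matter of registering one more sender function.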

Best Practices

Filtering Your Monitors

Cloud providers run many services, sometimes hundreds, in locations across the globe, and a provider's status page shows incidents for all of them. Your team should receive notifications only for the services it uses, in the regions where it uses them. Many status pages let you choose specific services and regions; use this feature so that your team is not flooded with irrelevant notifications.

For example, the Fastmail status page, which is hosted on Instatus, lets you sign up for notifications about specific components:

[Screenshot: Fastmail status notifications]
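If a provider does not offer this filtering, you can apply it yourself after collecting incidents from each page. A minimal sketch; the `Incident` shape and the `WATCHED` table (which services your team uses, and where) are hypothetical and would come from your own inventory:

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Incident:
    provider: str
    title: str
    services: Set[str]
    # Empty set means the page gave no region, so treat it as possibly global.
    regions: Set[str] = field(default_factory=set)


# Hypothetical inventory: (provider, service) -> regions your team uses.
WATCHED: Dict[Tuple[str, str], Set[str]] = {
    ("ExampleCloud", "Object Storage"): {"eu-west-1"},
    ("ExampleCloud", "Managed Postgres"): {"eu-west-1", "us-east-1"},
}


def relevant(inc: Incident, watched=WATCHED) -> bool:
    """True if the incident touches a service you use in a region you use."""
    for service in inc.services:
        regions = watched.get((inc.provider, service))
        if regions is None:
            continue  # Your team does not use this service at all.
        if not inc.regions or inc.regions & regions:
            return True
    return False
```

Treating region-less incidents as relevant errs on the side of notifying; flip that rule if your providers routinely omit regions for global noise.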

With some large cloud providers like Google Cloud, signing up for specific components and regions can be difficult. Say you use Google Kubernetes Engine (GKE) in us-central1: currently, the Google Cloud status page offers no way to receive notifications only for GKE in us-central1.
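One workaround is to filter Google Cloud's machine-readable incident feed yourself. The sketch below assumes the feed at `https://status.cloud.google.com/incidents.json` lists incidents with `affected_products` (each with a `title`), `currently_affected_locations` (each with an `id` such as `us-central1`), and an `end` field that is set once an incident is resolved; verify these field names against the live feed before relying on them. `fetch_incidents` and `ongoing_for` are hypothetical helpers.

```python
import json
import urllib.request

# Assumed public feed; verify the URL and field names against the live data.
FEED_URL = "https://status.cloud.google.com/incidents.json"


def fetch_incidents(url: str = FEED_URL, timeout: float = 10.0) -> list:
    """Download the incident list as parsed JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def ongoing_for(incidents: list, product: str, location: str) -> list:
    """Unresolved incidents affecting `product` in `location`."""
    matches = []
    for inc in incidents:
        if inc.get("end"):
            continue  # Assumed: resolved incidents have an "end" timestamp.
        products = {p.get("title") for p in inc.get("affected_products", [])}
        locations = {loc.get("id")
                     for loc in inc.get("currently_affected_locations", [])}
        if product in products and location in locations:
            matches.append(inc)
    return matches
```

Running `ongoing_for(fetch_incidents(), "Google Kubernetes Engine", "us-central1")` on a schedule and alerting on any new match approximates the GKE-in-us-central1 subscription the status page itself does not offer.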

Do Periodic Reviews

Status pages keep changing. Your cloud provider may add or remove services, switch to a different status page provider, or add or remove notification modes. Review your subscriptions periodically to make sure they still cover the services and regions you use.
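Part of such a review can be automated by comparing the components a page publishes today against a saved snapshot. A minimal sketch; the snapshot file and function names are hypothetical, and the current component names would come from whatever parser you use for that provider:

```python
import json
from pathlib import Path
from typing import Dict, List, Set


def diff_components(previous: Set[str], current: Set[str]) -> Dict[str, List[str]]:
    """Components that appeared or disappeared since the last review."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


def review(snapshot: Path, current: Set[str]) -> Dict[str, List[str]]:
    """Compare against the saved snapshot, then overwrite it with `current`."""
    previous = set(json.loads(snapshot.read_text())) if snapshot.exists() else set()
    changes = diff_components(previous, current)
    snapshot.write_text(json.dumps(sorted(current)))
    return changes
```

On the first run every component shows up as `added`; after that, a non-empty `added` or `removed` list is a cue to revisit your subscriptions and filters for that provider.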

Have a Single View Across All Providers

To check whether any of your cloud providers has an outage, you need a single view that shows all of them. Without a dedicated tool that monitors your cloud providers' status pages, your notification channel is a poor substitute. If it's Slack, you can send all the notifications to a dedicated Slack channel. However, Slack makes it difficult both to search past incidents and to see which incidents are still ongoing.
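Even a plain text summary, sorted so the worst status comes first, is easier to scan than a channel history. A minimal sketch, assuming you already have one Statuspage-style indicator (`none`, `minor`, `major`, `critical`) per provider; `single_view` is a hypothetical helper:

```python
from typing import Dict, List

# Statuspage-style indicators ranked by severity (higher is worse).
# Unknown indicators are ranked like "none".
SEVERITY = {"critical": 3, "major": 2, "minor": 1, "none": 0}


def single_view(statuses: Dict[str, str]) -> List[str]:
    """One aligned line per provider, worst status first, then by name."""
    rows = sorted(statuses.items(),
                  key=lambda kv: (-SEVERITY.get(kv[1], 0), kv[0]))
    width = max((len(name) for name in statuses), default=0)
    return [f"{name.ljust(width)}  {status}" for name, status in rows]
```

For example, `single_view({"Twilio": "none", "DigitalOcean": "major", "Mailgun": "minor"})` puts the `DigitalOcean  major` line first.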

Include in Your Incident Response Plan

Whatever notification mode you choose, make sure your incident response plan covers cloud provider alerts. Decide on the right priority for such alerts so that your team can respond effectively. With provider alerts in the plan, teams can correlate alerts from other parts of your systems with cloud provider incidents and get to the root cause faster.
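What "the right priority" means is specific to your organization. As an illustration only, the sketch below maps a provider-reported impact level (Statuspage uses `none`, `minor`, `major`, and `critical`) to a hypothetical P1 to P4 scale, escalating one level when the affected service is critical to you:

```python
# Hypothetical mapping from provider-reported impact to internal priority.
IMPACT_TO_PRIORITY = {"critical": 1, "major": 2, "minor": 3, "none": 4}


def priority(impact: str, service_is_critical: bool) -> str:
    """Map a provider's impact level to a P1 (highest) to P4 (lowest) label."""
    level = IMPACT_TO_PRIORITY.get(impact, 3)  # Unknown impact: treat as minor.
    if service_is_critical and level > 1:
        level -= 1  # Escalate one step for services you cannot run without.
    return f"P{level}"
```

Writing the mapping down, in code or in the plan itself, keeps on-call engineers from re-deciding priority during every provider incident.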

Use a Monitoring Tool

As the previous sections show, monitoring cloud providers' status pages yourself is challenging unless you have only one or two providers. Several tools aim to solve these pain points. IncidentHub is a SaaS tool built specifically for the challenges faced by Dev/Ops/SRE and IT teams. You can try it out with a free account, which comes with 20 status page monitors.

IncidentHub periodically monitors hundreds of cloud provider status pages. It can notify you over the medium you choose - Email, Slack, PagerDuty, Discord, MS Teams, etc. IncidentHub also gives you a single dashboard where you can view ongoing and past incidents across your cloud providers:

[Screenshot: Availability page]

The Benefits of Using a Monitoring Tool

The benefits of using a dedicated tool which monitors cloud status pages:

  • Offers a single normalized view across cloud providers' status pages
  • Hides the complexity of different status page formats
  • Detects and handles changing status page formats over time
  • Lets you choose the notification mode you want for alerts
  • Offers notification modes not available in the status page

Conclusion

Monitoring the public status pages of your cloud providers should be a key part of your monitoring strategy, both for operational effectiveness and for customer trust. It keeps your team informed and responsive during cloud service disruptions. Doing this yourself comes with several challenges - heterogeneous status page formats, non-overlapping notification modes, non-standard incident updates, and changing status page structures. A status page monitoring tool like IncidentHub can mitigate these issues.

FAQ

Why should I monitor my cloud provider status pages?

Your cloud providers publish information about ongoing incidents and maintenance on their public status pages. Such disruptions can affect your business operations.

What if I am not able to locate a cloud provider's status page?

Cloud providers have a link to their status page on their website or you can find it using web search. If you are unable to locate it please get in touch with us at support@incidenthub.cloud and we will try our best to help you.

What is the best way to receive notifications?

The best way to receive notifications about cloud provider incidents is specific to your team. Discuss with your team what would make it most effective.

Is there a standard status page format?

There is no standard for a status page format. However, many cloud providers use one of the popular status page services like Atlassian Statuspage or Instatus. Providers using the same status page service will have a similar format. Some providers have their own format - like Google Cloud and Amazon Web Services.

What are the benefits of using a dedicated status page monitoring tool?

A dedicated status page monitoring tool smooths out the differences between cloud providers' status pages and lets you receive notifications in the way you choose.

A Step by Step Guide to Checking if a SaaS is Down

· 6 min read
Hrishikesh Barua
Founder @IncidentHub.cloud

Introduction

Modern businesses depend heavily on Software as a Service (SaaS). Almost all aspects of business operations - accounting, HR, payroll, marketing, IT, sales, support - depend on one or more SaaS applications. SaaS is not limited to being used by software development teams. Given this dependency on SaaS applications, their uptime becomes tightly tied to a business's uptime. Any SaaS downtime can affect both a business's daily operations as well as the user experience.

How can you check whether a SaaS is experiencing downtime? Follow the steps below:

  1. Visit the SaaS Provider's Status Page
  2. Use External Monitoring Services
  3. Check Social Media
  4. Run Manual Tests
  5. Incident Communication
  6. Conclusion
  7. FAQ
  8. Popular SaaS Service Statuses

Visit the SaaS Provider's Status Page

The SaaS provider's status page will have first-hand information about ongoing issues.

Locate the SaaS provider's Status Page

You can find it with a web search like "Zoom status page" or "OpenAI status page". You can also visit the SaaS provider's website and look for the status page link, which is usually in the footer. Another option is to check their documentation. If you still can't find it, ask the provider through their social media accounts.
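Status pages often live at predictable addresses, so a quick script can try a few common patterns before you fall back to a web search. The patterns below are heuristics rather than guarantees, a reachable URL still needs a human check that it is the vendor's official page, and `candidates` and `first_reachable` are hypothetical helpers:

```python
import urllib.request
from typing import List, Optional


def candidates(domain: str) -> List[str]:
    """Common status page URL patterns for a vendor domain (heuristic only)."""
    domain = domain.removeprefix("www.")
    name = domain.split(".")[0]
    return [
        f"https://status.{domain}",       # e.g. status.example.com
        f"https://{name}.statuspage.io",  # Atlassian Statuspage hosted subdomain
        f"https://{name}status.com",      # e.g. examplestatus.com
    ]


def first_reachable(domain: str, timeout: float = 5.0) -> Optional[str]:
    """Return the first candidate that answers without an HTTP error."""
    for url in candidates(domain):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status < 400:
                    return resp.url  # Final URL after any redirects.
        except OSError:
            continue  # DNS failure, timeout, or HTTP error: try the next one.
    return None
```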

Understanding Status Pages

A SaaS provider's status page indicates whether the service is experiencing any downtime. Common status indicators are:

  • Degraded performance
  • Service disruption
  • Partial outage

For example, take a look at the OpenAI status page:

[Screenshot: OpenAI status page]

Status pages also show you past incidents:

[Screenshot: OpenAI past incidents]

You can find more information about an outage by clicking its link on the status page. The details list which components or services are affected. If your SaaS operates in many independent locations, like a cloud provider, look for region/zone information as well, because an outage may be limited to some components or locations. Check whether any of the components or services you use are in the list and, for a cloud provider or similar service, whether the affected locations include the ones you use.

For example, this Google Cloud outage affected Google Compute Engine in the asia-northeast1 region:

[Screenshot: Google Cloud incident]
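If the provider uses Atlassian Statuspage, this check can be automated against its `/api/v2/incidents/unresolved.json` endpoint. The sketch assumes each incident in that response carries a `components` list whose entries have a `name`; `fetch_unresolved`, `affecting_me`, and the component names in the test data are hypothetical.

```python
import json
import urllib.request
from typing import List, Set, Tuple


def fetch_unresolved(base_url: str, timeout: float = 10.0) -> dict:
    """Download unresolved incidents from an (assumed) Statuspage page."""
    url = base_url.rstrip("/") + "/api/v2/incidents/unresolved.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def affecting_me(unresolved: dict, mine: Set[str]) -> List[Tuple[str, List[str]]]:
    """(incident name, overlapping components) for incidents that touch `mine`."""
    hits = []
    for inc in unresolved.get("incidents", []):
        affected = {c.get("name") for c in inc.get("components", [])}
        overlap = sorted(affected & mine)
        if overlap:
            hits.append((inc.get("name", ""), overlap))
    return hits
```

An empty result means none of the ongoing incidents list a component you depend on, though providers do not always tag every affected component.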

Use External Monitoring Services

There are many monitoring tools that track SaaS uptime by continuously checking the availability of SaaS services. They spare you from checking uptime manually, which is cumbersome when you have many SaaS applications and would otherwise have to visit each status page. A status page monitoring tool like IncidentHub makes this easy by showing the overall status of all your SaaS providers in one place.
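Under the hood, such a tool repeatedly checks many status pages. A minimal sketch of the polling step, assuming every page in the list exposes Atlassian Statuspage's `/api/v2/status.json` endpoint (a real tool also has to handle other formats); `fetch_indicator` and `poll_all` are hypothetical names, and the `fetch` parameter lets you swap out the network call:

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def fetch_indicator(base_url: str, timeout: float = 10.0) -> str:
    """Overall indicator from an (assumed) Statuspage /api/v2/status.json."""
    url = base_url.rstrip("/") + "/api/v2/status.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["status"]["indicator"]


def poll_all(pages: Dict[str, str],
             fetch: Callable[[str], str] = fetch_indicator,
             workers: int = 8) -> Dict[str, str]:
    """Check every page concurrently; one failing page doesn't stop the rest."""
    def check(item):
        name, url = item
        try:
            return name, fetch(url)
        except Exception as exc:  # Report the failure instead of crashing.
            return name, f"error: {type(exc).__name__}"

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(check, pages.items()))
```

For example, `poll_all({"GitHub": "https://www.githubstatus.com"})` run on a schedule returns one indicator per provider, which you can then display or alert on.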

Setting Up a Monitoring Tool

IncidentHub is a monitoring tool that checks the official status pages of hundreds of SaaS applications and notifies you in real time if there is an outage or downtime. Setting up IncidentHub takes just a few steps.

Check Social Media

Twitter and Reddit are popular platforms to find SaaS outage information. Users post on these platforms to find more information and to check if others are also experiencing similar downtimes with the service. Such platforms can often provide real-time updates from other users. A caveat here is that if the outage is localized to some components or regions, you may not always find information about it on social media.

If your SaaS has a subreddit, check the latest posts there for information.

Other community forums where users of the SaaS hang out can also provide important outage information.

Run Manual Tests

Running manual tests is another way to check if your SaaS is experiencing downtime. Check common functionality issues such as login failures, API errors, resource creation issues, and other specific functionalities. Correlate these with the official status page data, and what other users are reporting on social media. This is more of an ad-hoc method but it can add valuable information.
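A manual check can be as simple as requesting an endpoint you depend on and classifying the result. A minimal sketch; the categories and the two-second "slow" threshold are arbitrary assumptions, `classify` and `probe` are hypothetical helpers, and a real check should exercise the specific feature you care about (for example, logging in):

```python
import time
import urllib.error
import urllib.request
from typing import Optional


def classify(status: Optional[int], elapsed: float, slow_after: float = 2.0) -> str:
    """Bucket an HTTP result; `status` is None when no response arrived."""
    if status is None:
        return "unreachable"
    if status >= 500:
        return "server error"
    if status in (401, 403):
        return "auth failure"
    if status >= 400:
        return "client error"
    if elapsed > slow_after:
        return "slow"
    return "ok"


def probe(url: str, timeout: float = 10.0) -> str:
    """Request `url` once and classify the outcome."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx responses are raised as HTTPError.
    except OSError:
        status = None  # DNS failure, refused connection, or timeout.
    return classify(status, time.monotonic() - start)
```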

Incident Communication

It's very important to communicate with your team and your stakeholders about ongoing SaaS incidents. Your users and other business stakeholders should be notified as soon as you know there is an outage. This helps them to plan their work accordingly, and also decreases the number of user requests and helpdesk tickets you might get.

Incident communication is effective when you continuously share updates as they occur. It builds trust with your users. It's even better if users can check the status of their SaaS applications themselves on a status page or a dashboard.

[Screenshot: Incident dashboard]

Create alerts in your monitoring tool to inform your team about the status of services. Monitoring tools can integrate with most communication tools like email, Slack, Discord, etc.
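For Slack, one common option is an incoming webhook, which accepts a JSON body with a `text` field. A minimal sketch, assuming you have already created an incoming webhook URL for your channel; `slack_payload` and `post_to_slack` are hypothetical helpers:

```python
import json
import urllib.request
from typing import Optional


def slack_payload(service: str, state: str, detail: str,
                  link: Optional[str] = None) -> dict:
    """Build an incoming-webhook body; Slack renders *bold* and <url|text>."""
    text = f"*{service}*: {state} - {detail}"
    if link:
        text += f" (<{link}|details>)"
    return {"text": text}


def post_to_slack(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the JSON payload to the webhook and return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Email or Discord alerts follow the same pattern: build a channel-specific payload, then send it to that channel's endpoint.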

Conclusion

In summary, you can check if your SaaS applications are down by checking the official status pages, using a monitoring tool, checking social media, and running manual tests. Keep communicating with your users about the current status.

This guide offers a clear method for users to quickly determine if their SaaS applications are down.

FAQ

How can you locate a SaaS provider's status page?

Check the SaaS provider's website, or run a web search.

Why is an external monitoring service important to track SaaS outages?

External monitoring tools continuously check SaaS status pages and other sources for incidents. They also check multiple SaaS providers at the same time. Doing this yourself is impractical and time-consuming.

How can you use social media to find out about SaaS downtime?

Popular social media channels like Twitter and Reddit often have real-time updates about SaaS outages from users who are experiencing downtime. SaaS-specific subreddits can be a good source of such information.

Popular SaaS Service Statuses

Airtable status
Akamai status
Azure status
Cloudflare status
Coinbase status
Discord status
Dropbox status
Fortnite status
GitHub status
Google Cloud status
Hetzner status
npm status
OpenAI status
PayPal status
Railway status
Reddit status
Rollbar status
SendGrid status
Twilio status
Vercel status
Zapier status