3 posts tagged with "saas"

IncidentHub posts related to Software-as-a-Service (SaaS)

A Step by Step Guide to Checking if a SaaS is Down

· 6 min read
Hrishikesh Barua
Founder @IncidentHub.cloud

Introduction

Modern businesses depend heavily on Software as a Service (SaaS). Almost all aspects of business operations - accounting, HR, payroll, marketing, IT, sales, support - depend on one or more SaaS applications. SaaS is not limited to being used by software development teams. Given this dependency on SaaS applications, their uptime becomes tightly tied to a business's uptime. Any SaaS downtime can affect both a business's daily operations and the user experience.

How do you check if a SaaS is experiencing downtime? Follow the steps below:

  1. Visit the SaaS Provider's Status Page
  2. Use External Monitoring Services
  3. Check Social Media
  4. Run Manual Tests
  5. Incident Communication
  6. Conclusion
  7. FAQ
  8. Popular SaaS Service Statuses

Visit the SaaS Provider's Status Page

The SaaS provider's status page will have first-hand information about ongoing issues.

Locate the SaaS provider's Status Page

You can find this with a web search like "Zoom status page" or "OpenAI status page". You can also visit the SaaS provider's website and look for the status page link, which is usually in the footer. Another option is to check their documentation. If you still can't find it, ask on their social media handles.

Understanding Status Pages

A SaaS provider's status page will indicate if the service is experiencing any downtime. Common status indicators are:

  • Degraded performance
  • Service disruption
  • Partial outage

For example, take a look at the OpenAI status page:

OpenAI status

Status pages also show you past incidents:

OpenAI past incidents

You can find more information about the outage by clicking on the downtime link on the status page. It will have details about which components or services are affected by the outage. If your SaaS has many independent locations, like a cloud provider, look for region/zone information as well. It's possible that the outage is limited to some components or locations. Check if any of the components or services you use are in the list. If it's a cloud provider or a similar service, check if the affected locations are among the ones that you use.

E.g. this Google Cloud outage affected Google Compute Engine in the asia-northeast1 region.

Google Cloud incident
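If the status page also exposes its data in a machine-readable form, the component and region check described above can be sketched in a few lines. This is a minimal sketch, not any provider's actual API: the `components`, `name`, and `status` field names and the sample data are assumptions modeled on the JSON summaries that many hosted status pages publish.

```python
from typing import Iterable

# Hypothetical sample shaped like a status page JSON summary.
# The field names ("components", "name", "status") are assumptions.
SAMPLE_SUMMARY = {
    "components": [
        {"name": "Compute Engine (asia-northeast1)", "status": "partial_outage"},
        {"name": "Compute Engine (us-east1)", "status": "operational"},
        {"name": "Cloud Storage (us-east1)", "status": "degraded_performance"},
    ]
}


def affected_components(summary: dict, watched: Iterable[str]) -> list:
    """Return the watched components whose status is not 'operational'.

    A component counts as watched if its name contains any of the
    given keywords (case-insensitive), e.g. a region name.
    """
    keywords = [w.lower() for w in watched]
    hits = []
    for component in summary.get("components", []):
        name = component.get("name", "")
        status = component.get("status", "unknown")
        is_watched = any(k in name.lower() for k in keywords)
        if is_watched and status != "operational":
            hits.append({"name": name, "status": status})
    return hits
```

Calling `affected_components(SAMPLE_SUMMARY, ["us-east1"])` on the sample data reports only the Cloud Storage entry, because the asia-northeast1 outage is outside the regions being watched.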

Use External Monitoring Services

There are many monitoring tools that can track SaaS uptime. They are designed to continuously check the availability of SaaS services. These tools take away the hassle of checking uptime manually, especially if you have many SaaS applications. Checking the status page of each SaaS application is cumbersome. A status page monitoring tool like IncidentHub can make this very easy by showing you the overall status of all your SaaS providers in one place.
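To illustrate what "overall status in one place" can mean, here is a minimal sketch that rolls several per-provider statuses up into a single worst-case status. The status names and their severity ordering are assumptions for illustration, and this is not a description of how IncidentHub or any other tool works internally.

```python
# Assumed status names, ordered from least to most severe.
SEVERITY = {
    "operational": 0,
    "degraded_performance": 1,
    "partial_outage": 2,
    "major_outage": 3,
}


def overall_status(provider_statuses: dict) -> tuple:
    """Return (worst_status, impacted_providers) for a provider -> status map.

    Unknown status strings are ranked above every known status so that
    they are surfaced rather than silently ignored.
    """
    impacted = sorted(
        provider
        for provider, status in provider_statuses.items()
        if status != "operational"
    )
    worst = max(
        provider_statuses.values(),
        key=lambda status: SEVERITY.get(status, len(SEVERITY)),
        default="operational",
    )
    return worst, impacted
```

For example, a map of `{"Zoom": "operational", "OpenAI": "partial_outage", "GitHub": "degraded_performance"}` rolls up to `partial_outage`, with GitHub and OpenAI listed as impacted.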

Setting Up a Monitoring Tool

IncidentHub is a monitoring tool that checks the official status pages of hundreds of SaaS applications. It notifies you in real time if there is an outage or downtime. Setting up IncidentHub takes just a few steps.

Check Social Media

Twitter and Reddit are popular platforms to find SaaS outage information. Users post on these platforms to find more information and to check if others are also experiencing similar downtimes with the service. Such platforms can often provide real-time updates from other users. A caveat here is that if the outage is localized to some components or regions, you may not always find information about it on social media.

If your SaaS has a subreddit, check the latest posts there for information.

Other community forums where users of the SaaS hang out can also provide important outage information.

Run Manual Tests

Running manual tests is another way to check if your SaaS is experiencing downtime. Check common functionality issues such as login failures, API errors, resource creation issues, and other specific functionalities. Correlate these with the official status page data, and what other users are reporting on social media. This is more of an ad-hoc method but it can add valuable information.
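A manual test can be as simple as requesting an endpoint you normally use and noting how it fails. The sketch below uses only the Python standard library; the category names and the 3-second slowness threshold are arbitrary assumptions, and the URL you probe should be one your provider permits you to call.

```python
import time
import urllib.error
import urllib.request
from typing import Optional

SLOW_THRESHOLD_SECONDS = 3.0  # assumed cutoff for "slow"; tune for your service


def classify(status_code: Optional[int], elapsed: float) -> str:
    """Map an HTTP status code (None = no response) and latency to a label."""
    if status_code is None:
        return "unreachable"
    if status_code >= 500:
        return "server error"
    if status_code in (401, 403):
        return "auth failure"
    if status_code >= 400:
        return "client error"
    if elapsed > SLOW_THRESHOLD_SECONDS:
        return "slow"
    return "ok"


def probe(url: str, timeout: float = 10.0) -> str:
    """Send one GET request to the URL and classify the outcome."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            code = response.status
    except urllib.error.HTTPError as err:
        code = err.code  # the server answered, but with a 4xx/5xx status
    except OSError:
        code = None  # DNS failure, refused connection, timeout, etc.
    return classify(code, time.monotonic() - start)
```

A single probe from one machine only tells you what you see from your own network, which is why it helps to correlate the result with the status page and with what other users report.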

Incident Communication

It's very important to communicate with your team and your stakeholders about ongoing SaaS incidents. Your users and other business stakeholders should be notified as soon as you know there is an outage. This helps them to plan their work accordingly, and also decreases the number of user requests and helpdesk tickets you might get.

Incident communication is effective when you continuously share updates as they occur. It builds trust with your users. It's even better if users can check the status of their SaaS applications themselves on a status page or a dashboard.

Incident dashboard

Create alerts in your monitoring tool to inform your team about the status of services. Monitoring tools can integrate with most communication tools like email, Slack, Discord, etc.
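If you are wiring up a notification yourself, a Slack incoming webhook accepts a JSON body with a `text` field. The sketch below is a minimal illustration under that assumption: the message wording and function names are made up for this example, and the webhook URL must be one you have created in your own Slack workspace.

```python
import json
import urllib.request


def build_alert(service: str, status: str, details: str = "") -> dict:
    """Build a webhook payload describing a status change."""
    text = f"{service} status changed to {status}"
    if details:
        text += f": {details}"
    return {"text": text}


def send_alert(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload as JSON to the webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping payload construction separate from sending makes it easy to reuse the same message for other channels such as email or Discord.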

Conclusion

In summary, you can check if your SaaS applications are down by checking the official status pages, using a monitoring tool, checking social media, and running manual tests. Keep communicating with your users about the current status.

This guide offers a clear method for users to quickly determine if their SaaS applications are down.

FAQ

How can you locate a SaaS provider's status page?

Check the SaaS provider's website, or run a web search.

Why is an external monitoring service important to track SaaS outages?

External monitoring tools continuously check SaaS status pages and other sources for incidents. They also check multiple SaaS providers at the same time. Doing this yourself is impractical and time-consuming.

How can you use social media to find out about SaaS downtime?

Popular social media channels like Twitter and Reddit often have real-time updates about SaaS outages from users who are experiencing downtime. SaaS-specific subreddits can be a good source of such information.

Popular SaaS Service Statuses

Airtable status
Akamai status
Azure status
Cloudflare status
Coinbase status
Discord status
Dropbox status
Fortnite status
GitHub status
Google Cloud status
Hetzner status
npm status
OpenAI status
PayPal status
Railway status
Reddit status
Rollbar status
SendGrid status
Twilio status
Vercel status
Zapier status

Monitoring Specific Components and Regions in Your Third-Party Services

· 3 min read
Hrishikesh Barua
Founder @IncidentHub.cloud

Chances are, most of your third-party cloud and SaaS dependencies are globally distributed and have many regions of operation. Chances are, your applications use a subset of a cloud or SaaS service. If you are monitoring such a service, why should you receive alerts for all regions or every single component in the service?

E.g. if you use DigitalOcean, you might be using Kubernetes in their US locations (NYC and SFO). You would want to know only when there is an outage in one of these locations. DigitalOcean's status page gives you the option to subscribe to outages across the board - it’s all or nothing. This is the case with most services, with a few exceptions.

Choosing Specific Components to Monitor

You can now choose which components/regions you wish to monitor in IncidentHub. Let us continue with our DigitalOcean example.

You can choose to monitor all components:

Monitor all components

or a subset that is relevant to you:

Monitor specific components

Once you save this configuration, you will be alerted only for outages that affect these components.
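Conceptually, this kind of filtering comes down to checking whether an incident's affected components overlap with the ones you selected. The following is only an illustrative sketch of that idea, not IncidentHub's implementation; the `Incident` type and the component names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Incident:
    """A reported incident and the components it affects (hypothetical shape)."""
    title: str
    components: FrozenSet[str]


def should_alert(incident: Incident, monitored: Optional[FrozenSet[str]]) -> bool:
    """Decide whether an incident is relevant to the chosen components.

    monitored=None means "monitor all components", so every incident alerts.
    Otherwise alert only if at least one affected component is monitored.
    """
    if monitored is None:
        return True
    return bool(incident.components & monitored)


# Hypothetical component names, for illustration only.
MONITORED = frozenset({"Kubernetes NYC", "Kubernetes SFO"})
k8s_incident = Incident("Kubernetes API errors", frozenset({"Kubernetes NYC"}))
other_incident = Incident("Droplet networking issue", frozenset({"Droplets AMS"}))
```

With this selection, `should_alert(k8s_incident, MONITORED)` is true while `should_alert(other_incident, MONITORED)` is false, so only the Kubernetes incident would produce an alert.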

Adding/Removing Components

You can always go back and edit the components later. This is helpful when you start using, say, Kubernetes in a new region, or new components. In your IncidentHub dashboard, you should see the "Edit Components" button next to your list of services.

Edit components

Benefits

  • This new feature will help you to receive only relevant and actionable alerts. If you are a developer you need not worry about receiving irrelevant alerts for components your application does not even use.
  • SRE/Ops teams can react to infrastructure issues quicker without wading through noise and correlate those with outages reported in their own applications.
  • If you are in an IT Team with hundreds or thousands of users depending on tools like Zoom, Slack, or Google Workspace, you can react to issues before your users start logging helpdesk tickets.

This powerful new feature, which significantly reduces alert noise, is being rolled out to eligible services as of this writing. Log in to your IncidentHub account today to start customizing your monitoring settings. For a step-by-step guide on how to set up your custom monitoring preferences, check out our knowledge base article. We would love to hear how this new feature is working for you.

Watch this blog or our X/LinkedIn feeds for updates on more exciting new features.

Monitoring Third Party Vendors as an Ops Engineer/SRE

· 3 min read
Hrishikesh Barua
Founder @IncidentHub.cloud

Why should you monitor your third-party Cloud and SaaS vendors if you are in SRE/Ops?

As part of an SRE team, your primary responsibility is ensuring the reliability of your applications. What makes you responsible for monitoring services that you don't even manage? Third-party services are just like yours - with SLAs. And outages happen, affecting you as well as many others who depend on them.

It's a no-brainer that you should know when such outages happen, so that you are on top of things if and when they affect your running applications.

Most of your third-party dependencies will have a public status page or a Twitter account where they publish updates on their outages. Here are some seemingly easy ways to monitor these pages:

  • Subscribe to the RSS feed of these pages
  • Follow the Twitter account
  • Sign up for Slack, Email, SMS notifications on the status page itself if the page supports these

But if you have tried it, you know it's not that easy:

  • Not all pages have RSS feeds
  • Some have Slack, Email, SMS integration - some don't
  • Some don't have a Twitter account
  • You need to sign up on all of these pages one by one, and all services may not support the same notification channel

You can easily end up doing this one by one for 10-15 or more service providers. Let's do a quick check. Which services in this list below do you use in your stack?

  • DNS - GCP/GoDaddy/UltraDNS/Route53
  • Cloud/PaaS - GCP/AWS/Azure/DigitalOcean/Heroku/Render/Railway/Hetzner
  • Monitoring - Grafana Cloud/DataDog/New Relic/SolarWinds
  • On-call management - PagerDuty/OpsGenie
  • Email - Google Workspace/Zoho
  • Communication - Zoom/Slack
  • Collaboration - Atlassian Jira/Confluence
  • Source code - GitLab/GitHub
  • CI/CD/GitOps - TravisCI/CircleCI/CodeFresh
  • CDN/Content delivery - Cloudflare/CDNJS/Fastly/Akamai
  • SMTP providers - SMTP.com/SendGrid
  • Payments - PayPal/Stripe
  • Artifact Repo - Maven/DockerHub/Quay.io
  • Others - OpenAI/Apple Dev Platform/Meta Platform
  • Marketing - MailChimp/Hubspot
  • Auth - Okta/Clerk/Auth0

This is a small list. You may not have all of these, or may have more/others, but you get the point.

Like any self-respecting Ops Engineer/SRE, you would probably want to whip up a script and write this check-pages-and-notify-in-one-place tool by yourself. I know, because I've worked in Ops/SRE roles for the better part of my career, and NIH (Not Invented Here) is a very real thing. Here's why it's not a great idea:

  • Any software you write has to be maintained. Say your org starts using a new service which does not have an RSS feed on the status page. What now?
  • Who monitors the monitor? How do you know when your script is not running?
  • You probably have better uses for your time

IncidentHub was built to solve precisely these problems - so you can focus on what's important, and hand off monitoring third-party services to something that was built with that goal in mind. So stop hacking together scripts to monitor public status pages, and try it out.