
12 posts tagged with "product"

IncidentHub posts related to the product


Monitoring Third Party Vendors as an Ops Engineer/SRE

· 4 min read
Hrishikesh Barua
Founder, IncidentHub

Why should you monitor your third-party Cloud and SaaS vendors if you are in SRE/Ops?

As part of an SRE team, your primary responsibility is ensuring the reliability of your applications. So why should you monitor services that you don't even manage? Third-party services are just like yours: they come with SLAs, and they have outages. Those outages affect you as well as everyone else who depends on them.

It's a no-brainer: you should know when such outages happen, so that you are on top of things if and when they affect your running applications.

Most of your third-party dependencies will have a public status page or a Twitter account where they publish updates on their outages. Here are some seemingly easy ways to monitor these pages:

  • Subscribe to the RSS feed of these pages
  • Follow the Twitter account
  • Sign up for Slack, email, or SMS notifications on the status page itself, if the page supports them
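
As a rough illustration of the RSS option, here is a minimal sketch that parses an RSS 2.0 status feed and picks out incidents that have not been seen before. The sample feed, the `example.com` links, and the helper names `parse_status_feed` and `new_incidents` are all hypothetical. Real status pages may publish Atom instead of RSS or use different fields, and in practice you would fetch the feed over HTTP on a schedule rather than read it from a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real vendor's feed may differ in structure.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Example Vendor Status</title>
    <item>
      <title>Elevated API error rates</title>
      <link>https://status.example.com/incidents/1</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Delayed webhook deliveries</title>
      <link>https://status.example.com/incidents/2</link>
      <pubDate>Tue, 02 Jan 2024 08:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>
""".strip()


def parse_status_feed(rss_xml: str) -> list[dict]:
    """Return one dict per <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return items


def new_incidents(items: list[dict], seen_links: set[str]) -> list[dict]:
    """Keep only the items whose link has not been recorded yet."""
    return [item for item in items if item["link"] not in seen_links]


if __name__ == "__main__":
    # Pretend incident 1 was already reported on an earlier poll.
    seen = {"https://status.example.com/incidents/1"}
    for incident in new_incidents(parse_status_feed(SAMPLE_FEED), seen):
        print(incident["published"], incident["title"], incident["link"])
```

Tracking seen links is the simplest way to avoid repeat alerts between polls. It is an assumption of this sketch that each incident keeps a stable link, and updates to an existing incident would need separate handling.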

Monitoring Your Third-Party Cloud and SaaS Services is Critical

· 3 min read
Hrishikesh Barua
Founder, IncidentHub

If you have a software-based business, you are using at least a few cloud-based tools. It does not matter whether you are a solo developer or part of a 50-member team in a large organization. Take this random list and chances are you are using at least half of them:

Your entire business - irrespective of org or market size - including your development tools, collaboration/communication tools, infrastructure and hosting, monitoring, even email - is dependent on services that you don’t control. They are provided by other vendors.

Of course, you pay for some of them and they all have SLAs. Having an SLA does not translate to 100% uptime. Companies will try their best to meet SLAs - which promise a percentage of uptime (usually 99.xx%). There are going to be incidents at your providers at some point, and the effect will cascade to the service that you provide to your customers. This means that your own product's SLA can be breached due to causes outside your control.
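
To make the SLA arithmetic concrete, here is a small sketch that converts an uptime percentage into a downtime budget and estimates the best-case availability of a service that depends on several vendors. The function names and the 99.9% figures are illustrative only. The combined estimate assumes that every dependency is on the critical path and that vendors fail independently, which real systems may not satisfy.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted by an uptime SLA over a period of `days`."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def composite_availability(sla_percents: list[float]) -> float:
    """Best-case availability (in percent) when every dependency must be up.

    Assumes independent failures, so the availabilities multiply.
    """
    availability = 1.0
    for percent in sla_percents:
        availability *= percent / 100
    return availability * 100


if __name__ == "__main__":
    # A single vendor promising 99.9% uptime over a 30-day month.
    print(f"{allowed_downtime_minutes(99.9):.1f} minutes per 30 days")

    # Two such vendors, both required for your service to work.
    combined = composite_availability([99.9, 99.9])
    print(f"{combined:.4f}% combined availability ceiling")
```

Under these assumptions, one 99.9% dependency allows about 43.2 minutes of downtime in a 30-day month, and two of them put a ceiling of about 99.8% on your own availability before counting any failures of your own.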