Best Practices for Planning for Upcoming Cloud Maintenance
Introduction
Cloud maintenance is a common practice in the tech industry. Whether you manage your own infrastructure or use a cloud provider, you will need to plan for maintenance and include it as part of your operational readiness. This ensures that your team is prepared for potential downtime and can deal with any incidents in a timely manner. This article will cover some best practices for planning for upcoming cloud maintenance.

- Introduction
- Types of Maintenance
- Maintenance Planning
- During the Maintenance Window
- Tracking Upcoming Maintenance
- Conclusion
Types of Maintenance
Based on advance preparedness, there are two types of maintenance - scheduled and emergency. It is important to understand that a maintenance need not always cause downtime. However, being prepared for downtime is one of the key aspects of maintenance planning.
Scheduled Maintenance
A scheduled maintenance is planned and announced in advance, sometimes days, weeks, or even months ahead. The advance notice gives you time to plan for any downtime.

Scheduled maintenances can be modified or cancelled. Some cloud providers give you a way to reschedule or control the maintenance window if it affects only your resources (and not other customers'). You can use this to minimize the impact of the maintenance.
Some examples:
- Amazon Web Services EC2 - AWS EC2 maintenance events involve starting and stopping the instances. You can schedule the maintenance during off-peak hours to minimize the impact. You can also trigger the start/stop yourself at a chosen time before the scheduled window.
- Render - Render's scheduled maintenance can be rescheduled to a different time, and you can also choose to trigger it at a time of your choosing.
Emergency Maintenance
An emergency maintenance is not planned and is triggered as a way to mitigate a critical issue. Users will still be notified but they may not have enough time to plan for it.

Maintenance Planning
Planning for maintenance events in advance is an important part of your incident management strategy. Unlike incidents, maintenance events are known beforehand, so your team can assess and mitigate any possible impact before it happens.
Impact Assessment
A maintenance announcement will have at least these details:
- The expected start and end time of the maintenance - the "maintenance window".
- The cloud services affected.
Based on the services affected, your team can determine if there will be any impact on your own applications or users.
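As a rough illustration of this step, the sketch below matches the services named in an announcement against a map of what each application depends on. The `MaintenanceAnnouncement` shape, the `affected_applications` helper, and all service and application names are hypothetical and chosen only for the example; real announcements differ from provider to provider.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MaintenanceAnnouncement:
    """Hypothetical shape for a provider's maintenance notice."""
    provider: str
    services: frozenset   # cloud services named in the announcement
    start: datetime       # start of the maintenance window
    end: datetime         # expected end of the maintenance window


def affected_applications(announcement, dependencies):
    """Map each impacted application to the affected services it depends on."""
    impacted = {}
    for app, services in dependencies.items():
        overlap = announcement.services & set(services)
        if overlap:
            impacted[app] = sorted(overlap)
    return impacted


# Illustrative data only; every name below is made up.
announcement = MaintenanceAnnouncement(
    provider="example-cloud",
    services=frozenset({"managed-postgres", "object-storage"}),
    start=datetime(2025, 1, 15, 2, 0, tzinfo=timezone.utc),
    end=datetime(2025, 1, 15, 4, 0, tzinfo=timezone.utc),
)
dependencies = {
    "checkout-api": ["managed-postgres", "message-queue"],
    "marketing-site": ["cdn"],
    "reports": ["object-storage", "managed-postgres"],
}

impact = affected_applications(announcement, dependencies)
for app, services in impact.items():
    print(f"{app}: {', '.join(services)}")
```

Applications that do not appear in the result have no declared dependency on the affected services, though the result is only as accurate as the dependency map you maintain.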
Operational Readiness
If your own applications or users will be affected, you can plan for it by:
- Identifying the applications and services that will be affected. If it's a cloud service that your applications or services depend on, inform the affected teams so that they have workarounds in place. These can be measures like keeping standby servers in a different region as a fallback, or actively routing client traffic to a different region in your load balancer.
- Identifying the impact on your users. If it's a SaaS service like a communication suite, or an office productivity suite, informing your users in an org-wide channel lets them plan their work. They can ensure that no critical work is scheduled during the maintenance window.
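One of the workarounds mentioned above, routing client traffic away from a region, can be sketched as a small weight calculation. This assumes a load balancer that accepts per-region traffic weights; the `drain_regions` helper and the region names are hypothetical, and applying the resulting weights depends entirely on your load balancer.

```python
def drain_regions(weights, regions_in_maintenance):
    """Set maintenance regions to zero and spread their share of traffic
    proportionally across the remaining regions."""
    remaining = {
        region: weight
        for region, weight in weights.items()
        if region not in regions_in_maintenance
    }
    total_remaining = sum(remaining.values())
    if total_remaining <= 0:
        raise ValueError("no region left to receive traffic")

    total = sum(weights.values())
    return {
        region: (weights[region] * total / total_remaining
                 if region in remaining else 0.0)
        for region in weights
    }


# Illustrative weights (percent of traffic per region).
weights = {"us-east": 50, "us-west": 30, "eu-west": 20}
plan = drain_regions(weights, {"us-east"})
print(plan)
```

Scaling the remaining regions proportionally keeps their relative balance unchanged; check that they have enough capacity for the extra load before the window starts.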
During the Maintenance Window
During an ongoing maintenance, these are the important things to keep in mind:
Communication
Keep your team informed about the status of the maintenance. This can be done using your existing communication tools like Slack or MS Teams. Another option is to use a status page that is accessible to everyone in your organization and provides real-time updates.
Dealing With Unexpected Issues
You can run into unexpected pitfalls due to various reasons:
- The maintenance window was longer than expected: In such cases, cloud resources may be affected for longer than planned. For your own applications, teams should continue with their original mitigation plan and adjust as needed. For SaaS applications, inform your users as soon as possible so that they can adjust their plans too.
- The maintenance activity affected resources other than the ones that were announced: This is rare but possible. In such cases, track the cloud provider's status page and get in touch with their support. If you have a support contract with them, contact your support representative.
- Your team missed planning for one or more of the affected applications or services: Treat this like any other incident and put your incident response plan into action. It's also an opportunity to improve your incident management process.
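The first two cases can be detected with simple checks, sketched below under some assumptions: you know the announced window, you have a flag for whether the provider has marked the maintenance complete, and you have your own list of services currently showing problems. The function names and data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def maintenance_state(now, start, end, completed):
    """Classify a maintenance window relative to the current time."""
    if completed:
        return "completed"
    if now < start:
        return "upcoming"
    if now <= end:
        return "in_progress"
    # Past the announced end time and not yet marked complete.
    return "overrun"


def unannounced_services(announced, degraded):
    """Services showing problems that the announcement did not mention."""
    return sorted(set(degraded) - set(announced))


# Illustrative values: a two-hour window, checked 30 minutes after its end.
start = datetime(2025, 1, 15, 2, 0, tzinfo=timezone.utc)
end = start + timedelta(hours=2)
now = end + timedelta(minutes=30)

state = maintenance_state(now, start, end, completed=False)
extra = unannounced_services(
    announced={"managed-postgres"},
    degraded={"managed-postgres", "object-storage"},
)
print(state, extra)
```

An `overrun` state or a non-empty `extra` list is a reasonable trigger for updating your users and contacting the provider's support.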
Tracking Upcoming Maintenance
Most cloud and SaaS services announce scheduled maintenance beforehand. You can track these announcements easily by using a status page monitor/aggregator like IncidentHub.
Maintenance Notifications
Email Alerts
You can receive email notifications by signing up on the cloud or SaaS provider's status page. However, not all status pages offer this feature, and signing up individually becomes cumbersome when you depend on many services. It's also difficult to inform users in real time with this approach.
Dashboard Push Notifications
Some cloud providers show you notifications on their dashboard. However, you need to be logged in and keep the browser tab open to see them. It's also hard to track all your services this way, or to communicate such updates to your team.
Use a Status Page Monitor/Aggregator like IncidentHub
IncidentHub tracks and shows you a maintenance feed of all upcoming and scheduled maintenances across your services.

You can also set advance reminders and customize when you wish to receive them.
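To make the idea of advance reminders concrete, here is a generic sketch of how reminder times could be derived from a maintenance window's start time. It is not IncidentHub's API or implementation; the `reminder_times` helper and the lead times are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def reminder_times(window_start, lead_times, now):
    """Return the reminder times that are still in the future, earliest first."""
    candidates = sorted(window_start - lead for lead in lead_times)
    return [t for t in candidates if t > now]


# Illustrative values: remind 7 days, 1 day, and 1 hour before the window.
window_start = datetime(2025, 1, 15, 2, 0, tzinfo=timezone.utc)
now = datetime(2025, 1, 13, 12, 0, tzinfo=timezone.utc)
lead_times = [timedelta(days=7), timedelta(days=1), timedelta(hours=1)]

reminders = reminder_times(window_start, lead_times, now)
for t in reminders:
    print(t.isoformat())
```

Reminders whose time has already passed are dropped, which matters when a maintenance is announced on short notice.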

Sign up for an IncidentHub account to track your cloud and SaaS service maintenances in one place.
Conclusion
Maintenance tracking is an important part of your incident management process. It helps you stay informed about the status of your services and plan for any potential downtime, and lets your users plan their work in a better way.
Photo Credits: Ivan N on Unsplash
IncidentHub is not affiliated with any of the services and vendors mentioned in this article.