AlertOps outage

Incident Report for AlertOps

Postmortem

Service Disruption Related to Microsoft Azure Outage

Summary

On October 29, 2025, beginning at approximately 10:36 CDT, AlertOps experienced service disruptions across several components due to a regional outage affecting Microsoft Azure infrastructure.

During this time, the AlertOps web application and APIs were unavailable, and multiple connected services (Inbound and Outbound Integrations, Notifications Delivery Service, and the Mobile App) operated intermittently.

To maintain continuity of alert delivery and inbound event processing while Azure services were impacted, we implemented a failover for our inbound API services. This allowed inbound integrations and notifications to keep functioning while the web application remained offline.

Once Azure restored full functionality and the AlertOps web application stabilized, we reverted to the primary API infrastructure. All systems were verified as operational and stable by 17:01 CDT on October 29, 2025.

What Happened

On October 29, 2025, at approximately 10:36 CDT, AlertOps began to experience widespread service degradation due to a Microsoft Azure outage affecting infrastructure resources used by our web application and APIs.

Between 10:36 CDT and 12:00 CDT, users may have been unable to access the AlertOps web application, and integrations, notifications, and mobile access showed degraded or intermittent behavior.

As Azure continued to report on and mitigate the upstream issue, AlertOps engineers determined that inbound API paths, which handle alert ingestion and routing, could be temporarily redirected to an alternate failover configuration. This kept inbound processing and notification delivery running even while the main web and API endpoints were impaired.

Through this failover, alert creation, routing, and delivery continued to operate, so customers kept receiving notifications and inbound integrations remained active.

Once Azure restored stability and the AlertOps web application returned to normal operation, we performed a controlled reversion from the failover environment to our primary API services. Post-reversion validation confirmed that all components were functioning as expected.

What We Are Doing About This

Following resolution of this incident, we conducted a full review of our service continuity and failover procedures. We are taking the following actions to strengthen resilience and response during future provider-level outages:

Enhance Automation: We are implementing improved monitoring and automation to trigger failover and recovery actions more rapidly and safely.
Improve Health and Recovery Monitoring: We are expanding observability and alerting coverage to better detect upstream degradation and validate failover transitions.
Evaluate Additional Redundancy for Web Application Access: We are exploring resilient hosting and alternate access methods to maintain basic operational functionality during cloud provider disruptions.

We sincerely apologize for the impact this incident may have caused. We understand how critical AlertOps is for managing your operations and incident response, and we remain committed to providing a resilient, reliable, and transparent platform.

For any additional questions or concerns, please contact support@alertops.com.

Posted Oct 31, 2025 - 13:34 CDT

Resolved

We can confirm full recovery across all AlertOps services. All systems are now fully operational and stable.

Posted Oct 29, 2025 - 17:01 CDT

Update

The AlertOps web application and APIs are currently experiencing degraded performance.
Other components, including Inbound Integrations, Outbound Integrations, Notifications Delivery Service, and Mobile App, remain partially unavailable.
Our team is working toward full service restoration.

Posted Oct 29, 2025 - 15:56 CDT

Update

Inbound and Outbound Integrations, Notifications Delivery Service, and the Mobile App are operating intermittently.
The web application remains unavailable.
Our engineering team is actively implementing mitigations and monitoring recovery.

Posted Oct 29, 2025 - 13:58 CDT

Update

We continue to work on restoring all affected services following the Microsoft Azure outage.
Mitigation efforts remain in progress to stabilize integrations and notification delivery.

Posted Oct 29, 2025 - 12:02 CDT

Identified

We are monitoring a widespread outage with Microsoft Azure that is impacting multiple AlertOps services.

Posted Oct 29, 2025 - 12:01 CDT

Investigating

We are investigating an ongoing service disruption affecting the AlertOps web application and APIs.
Our team is working to identify the cause and restore full functionality as quickly as possible.

Posted Oct 29, 2025 - 10:36 CDT

This incident affected: Inbound Integrations, Web Application, Notifications Delivery Service, Outbound Integrations, and Mobile App.