On October 28, 2025, beginning at approximately 10:36 CDT, AlertOps experienced service disruptions across several components due to a regional outage affecting Microsoft Azure infrastructure.
During this time, the AlertOps web application and APIs were unavailable, and multiple connected services, including Inbound and Outbound Integrations, the Notifications Delivery Service, and the Mobile App, operated intermittently.
To maintain alert delivery and inbound event processing while Azure services were impacted, we implemented a failover for inbound API services. This allowed inbound integrations and notifications to continue functioning while the web application remained offline.
Once Azure restored full functionality and the AlertOps web application became stable, we reverted to the primary API infrastructure. All systems were verified to be operational and stable by 17:01 CDT on October 29, 2025.
On October 28, 2025, at approximately 10:36 CDT, AlertOps began to experience widespread service degradation due to a Microsoft Azure outage impacting infrastructure resources used by our web application and APIs.
Between 10:36 CDT and 12:00 CDT, users may have been unable to access the AlertOps web application, and integrations, notifications, and mobile access were degraded or intermittent.
As Azure continued to report on and mitigate the upstream issue, AlertOps engineers identified that the inbound API paths, which manage alert ingestion and routing, could be temporarily redirected to an alternate failover configuration. This approach maintained operational continuity for inbound processing and notification delivery while the main web and API endpoints were impaired.
Through this failover, alert creation, routing, and delivery continued to operate, so customers kept receiving notifications and inbound integrations remained active.
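For readers who want a concrete picture of this kind of redirection, the following is a minimal illustrative sketch, not the AlertOps implementation. It assumes a caller-supplied health probe and uses hypothetical names throughout (`InboundRouter`, the thresholds, and any URLs passed in). It shows traffic moving to a failover endpoint after repeated failed probes of the primary, and moving back only after the primary has been healthy for several consecutive probes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InboundRouter:
    """Chooses where inbound alert traffic is sent (hypothetical sketch)."""

    primary: str
    failover: str
    is_healthy: Callable[[str], bool]  # health probe supplied by the caller
    failures_to_trip: int = 3          # consecutive failed probes before failing over
    successes_to_revert: int = 5       # consecutive good probes before reverting
    on_failover: bool = False
    _streak: int = 0

    def tick(self) -> str:
        """Probe the primary once and return the endpoint to use."""
        healthy = self.is_healthy(self.primary)
        # Count only probes that argue for leaving the current state:
        # failures while on the primary, successes while on the failover.
        if healthy == self.on_failover:
            self._streak += 1
        else:
            self._streak = 0
        needed = (
            self.successes_to_revert if self.on_failover else self.failures_to_trip
        )
        if self._streak >= needed:
            self.on_failover = not self.on_failover
            self._streak = 0
        return self.failover if self.on_failover else self.primary
```

Calling `tick()` on a schedule and sending inbound requests to the returned endpoint gives the behavior described above. Requiring more consecutive successes to revert than failures to trip is one way to avoid switching back and forth while the upstream provider is still unstable.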
Once Azure restored stability and the AlertOps web application returned to normal operation, we performed a controlled reversion from the failover environment to our primary API services. Post-reversion validation confirmed that all components were functioning as expected.
Following resolution of this incident, we conducted a full review of our service continuity and failover procedures. We are taking the following actions to strengthen resilience and response for future provider-level outages:
We sincerely apologize for the impact this incident may have caused. We understand how critical AlertOps is for managing your operations and incident response, and we remain committed to providing a resilient, reliable, and transparent platform.
For any additional questions or concerns, please contact support@alertops.com