WAF Service Availability
Resolved
Oct 15 at 09:37pm BST
RFO
Date and Time of Incident:
15-Oct-2025, 20:28 - 20:49 BST
Timeline of Events
- 20:28: Availability alerts triggered, indicating a loss of Web Application Firewall cluster 1.
- 20:32: Technical teams begin fault investigation.
- 20:40: Issue identified as a locked-up worker node.
- 20:45: Worker node removed from WAF cluster.
- 20:49: Service availability returns to normal.
Summary of Impact
One of the WAF worker nodes in cluster 1 experienced a software failure that prevented the TCP session state table from synchronising correctly across the WAF cluster. As a result, the cluster fell out of synchronisation and could no longer pass traffic correctly.
Services routed through the affected WAF cluster were unavailable for 21 minutes, from 20:28 to 20:49 BST.
Resolution Status
We have multiple redundancies in place to prevent a total outage should a worker node fail entirely. In this case, however, a worker node suffered an unusual partial failure: it continued to respond OK to health checks even though it was not healthy, and was therefore not treated as failed.
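To illustrate this failure mode, the sketch below compares a liveness-only health check with one that also verifies state-table synchronisation. It is illustrative only: the NodeStatus record, its version counters and the max_lag threshold are hypothetical assumptions, not a description of our WAF vendor's actual health-check mechanism. A node that still answers probes but whose session state table has stalled passes the first check and fails the second.

```python
from dataclasses import dataclass


# Hypothetical snapshot of a worker node; these fields are illustrative
# assumptions and are not taken from the WAF vendor's product.
@dataclass
class NodeStatus:
    responds_to_probe: bool      # node answers the health-check request
    state_table_version: int     # version of the node's local TCP session state table
    cluster_table_version: int   # version the rest of the cluster agrees on


def shallow_check(node: NodeStatus) -> bool:
    """Liveness only: passes as long as the node answers the probe."""
    return node.responds_to_probe


def deep_check(node: NodeStatus, max_lag: int = 0) -> bool:
    """Also require the node's state table to be in sync with the cluster."""
    lag = abs(node.cluster_table_version - node.state_table_version)
    return node.responds_to_probe and lag <= max_lag


# A partially failed node: it still answers probes, but its state table has stalled.
stuck = NodeStatus(responds_to_probe=True, state_table_version=1040, cluster_table_version=1187)
print(shallow_check(stuck))  # True
print(deep_check(stuck))     # False
```

A real deep check would depend on what synchronisation telemetry the vendor exposes. The point of the sketch is only that "answers the probe" and "is passing traffic correctly" are separate conditions.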
We have engaged our WAF vendor's Technical Assistance Centre (TAC) and will deploy remedial fixes as soon as they are made available.
As a precaution, we will be reloading all worker nodes overnight. No downtime is expected from this change.
Affected services
Updated
Oct 15 at 08:49pm BST
Services are now back online.
Updated
Oct 15 at 08:42pm BST
We have identified the issue and are working to bring the services back online.
Created
Oct 15 at 08:32pm BST
We are investigating availability issues with our Web Application Firewall.