Downtime

WAF Service Availability

Oct 15 at 08:32pm BST
Affected services
Web Application Firewall (WAF)

Resolved
Oct 15 at 09:37pm BST

RFO

Date and Time of Incident:

15-Oct-2025, 20:28 - 20:49 BST

Timeline of Events

  • 20:28: Availability alerts triggered, indicating a loss of Web Application Firewall cluster 1.
  • 20:32: Technical teams began fault investigation.
  • 20:40: Issue identified as a locked-up worker node.
  • 20:45: Worker node removed from WAF cluster.
  • 20:49: Service availability returned to normal.

Summary of impact

One of the WAF worker nodes in cluster 1 experienced a software failure that caused the TCP session state table to fail to synchronise correctly across the WAF cluster. As a result, the cluster fell out of synchronisation and stopped passing traffic correctly.

Services routing through the affected WAF cluster were unavailable for 21 minutes.
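For readers who want to check the figures, the intervals can be derived directly from the timeline timestamps. The snippet below is only an illustrative sketch: the helper name `minutes_between` is hypothetical, and it assumes all timestamps fall on the same day.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M"  # timeline entries use 24-hour HH:MM


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, assuming both are on the same day."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return int(delta.total_seconds() // 60)


# Timestamps taken from the timeline of events.
alert_triggered = "20:28"
issue_identified = "20:40"
service_restored = "20:49"

time_to_identify = minutes_between(alert_triggered, issue_identified)
total_outage = minutes_between(alert_triggered, service_restored)
print(f"Time to identify: {time_to_identify} min, total outage: {total_outage} min")
```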

Resolution Status

We have multiple redundancies in place to prevent a total outage should a worker node fail entirely. In this case, however, a worker node experienced an unusual partial failure: it continued to return OK to health checks even though it was not healthy.
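To illustrate this class of failure, the sketch below shows one general way to tell a node that merely answers health checks apart from one that is actually passing traffic. It is not our production health check or the vendor's implementation. All names (`ProbeResult`, `classify_node`, `nodes_to_eject`) and the 0.9 success threshold are hypothetical, and the probe outcomes are assumed to be collected elsewhere.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProbeResult:
    """Outcome of probing one worker node (all fields are hypothetical)."""
    liveness_ok: bool              # node answered the basic health check with OK
    data_path_ok: Sequence[bool]   # outcomes of synthetic requests sent through the node


def classify_node(result: ProbeResult, min_success_ratio: float = 0.9) -> str:
    """Return 'down', 'degraded', or 'healthy' for a single worker node."""
    if not result.liveness_ok:
        return "down"
    if not result.data_path_ok:
        # No end-to-end evidence: do not trust the liveness OK on its own.
        return "degraded"
    ratio = sum(result.data_path_ok) / len(result.data_path_ok)
    return "healthy" if ratio >= min_success_ratio else "degraded"


def nodes_to_eject(results: dict[str, ProbeResult]) -> list[str]:
    """Names of nodes that are not fully healthy, sorted alphabetically."""
    return sorted(name for name, r in results.items() if classify_node(r) != "healthy")
```

Under these assumptions, a node that reports OK but fails its synthetic requests is classified as "degraded" rather than "healthy", so it can be removed from rotation without waiting for a complete failure.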

We have engaged our WAF vendor's TAC and will deploy remedial fixes as soon as they are made available.

As a precaution, we will be reloading all worker nodes overnight. No downtime is expected from this change.

Updated
Oct 15 at 08:49pm BST

Services are now back online.

Updated
Oct 15 at 08:42pm BST

We have identified the issue and are working to bring the services back online.

Created
Oct 15 at 08:32pm BST

We are investigating availability issues with our Web Application Firewall.