Downtime

Network Connectivity Issues

Aug 11 at 02:50pm BST

Resolved
Aug 24 at 01:31pm BST

Post Incident Report:

Executive Summary

On Friday 11th August at 14:30, monitoring systems detected a significant issue with traffic routing via DDoS mitigation systems, triggering a Major Incident response. Core network devices failed to handle traffic as expected.

Under normal operating conditions, traffic would have been routed via our resilient infrastructure. However, due to a third-party supplier failure, this route was unavailable.

This had a cascading effect on downstream services, leading to service degradation on our platform.

To rectify the situation and ensure continuity of services, we implemented fixes to our DDoS mitigation protocols, which restored traffic routing and stabilised services in the London Region.

Root Cause

The service disruption was caused by a routing fault. Under normal operating conditions, traffic would have been routed via resilient infrastructure. However, a failure by a third-party supplier meant that the route to the resilient infrastructure was unavailable for the full duration of the incident.

We encountered a significant issue with the routing of traffic through DDoS mitigation systems. The issue was complex and required a lengthy investigation. Our investigations confirmed that the fault was in the network layer, and we made the necessary amendments, which restored service.

Next Steps

The root cause was analysed and technical teams defined a detailed action plan, which includes an immediate review of DDoS/routing appliance configuration, software upgrades, resiliency validation and process improvements.

Updated
Aug 12 at 08:26am BST

Network connectivity has been online and stable during the last 12 hours.

We will continue to monitor our services and will provide updates as required.

Updated
Aug 11 at 09:40pm BST

We have observed stable connectivity for the past 60 minutes with no packet loss or disruption to service.

We will continue to monitor service health closely over the next 24 hours, and will provide updates as required.

Updated
Aug 11 at 08:30pm BST

We have performed emergency maintenance on our core and distribution infrastructure to address the issues experienced earlier today.

The maintenance was performed under maintenance note 244702.

We will be providing a post-mortem analysis once we have gathered all available information.

Our focus remains on ensuring full functionality and stability are restored.

Updated
Aug 11 at 03:05pm BST

We have observed restoration of all services.

Updated
Aug 11 at 02:55pm BST

We are working on rerouting traffic through alternate PoPs. The issue currently appears to be caused by a number of Tier 1 providers not routing traffic correctly.

We will provide updates shortly.

Created
Aug 11 at 02:50pm BST

We are aware of an issue with network connectivity to our services in DC1.

We are investigating the cause of these issues and will provide more updates shortly.