Network Connectivity Issues
Resolved
Aug 24 at 01:31pm BST
Post Incident Report:
Executive Summary
On Friday 11th August at 14:30, monitoring systems detected a significant issue with traffic routing via DDoS mitigation systems, triggering a Major Incident response. Core network devices failed to handle traffic as expected.
Under normal operating conditions, traffic would have been routed via our resilient infrastructure. However, due to a third-party supplier failure, this route was unavailable.
This had a cascading effect on downstream services, leading to service degradation on our platform.
To rectify the situation and ensure continuity of services, we implemented fixes to our DDoS mitigation protocols, which restored traffic routing and stabilised service in the London Region.
Root Cause
The service disruption was caused by a routing fault. Under normal operating conditions, traffic would have been routed via resilient infrastructure. However, a failure by a third-party supplier meant that the route to the resilient infrastructure was unavailable for the full duration of the incident.
We encountered a significant issue with the routing of traffic through DDoS mitigation systems. The issue was complex and required a lengthy investigation. Our investigations confirmed that the issue was in the network layer, and we made the necessary amendments, which led to service restoration.
Next Steps
The root cause was analysed and technical teams defined a detailed action plan, which includes an immediate review of DDoS/routing appliance configuration, software upgrades, resiliency validation and process improvements.
Affected services
Updated
Aug 12 at 08:26am BST
Network connectivity has been online and stable for the last 12 hours.
We will continue to monitor our services and will provide updates as required.
Updated
Aug 11 at 09:40pm BST
We have observed stable connectivity for the past 60 minutes with no packet loss or disruption to service.
We will continue to monitor service health closely over the next 24 hours, and will provide updates as required.
Updated
Aug 11 at 08:30pm BST
We have performed emergency maintenance on our core and distribution infrastructure to address the issues experienced earlier today.
The maintenance was performed under maintenance note 244702.
We will be providing a post-mortem analysis once we have gathered all available information.
Our focus remains on ensuring full functionality and stability are restored.
Updated
Aug 11 at 03:05pm BST
We have observed restoration of all services.
Updated
Aug 11 at 02:55pm BST
We are working on rerouting traffic through alternate PoPs. The issue currently appears to be caused by a number of Tier 1 providers not routing traffic correctly.
We will provide updates shortly.
Created
Aug 11 at 02:50pm BST
We are aware of an issue with network connectivity to our services in DC1.
We are investigating the cause of these issues and will provide more updates shortly.