Previous incidents
October 2024
Oct 25, 2024
1 incident
LON1: Core Switch Lockup
Downtime
Resolved Oct 25 at 05:04pm BST
Executive Summary
On 25th October at 09:15 GMT, monitoring systems detected a loss of network availability to several services connected to access switches in a Virtual Chassis configuration, in racks A2, A3 and A4, triggering a major incident response.
Upon investigation, we found that the pair of access switches had entered a ‘soft-lockup’ state: the switches were still advertising their availability to carry traffic but were unable to do so.
To rectify the situation, our on-si...
Oct 07, 2024
1 incident
DC1: Power Feed Issues
Downtime
Resolved Oct 08 at 03:18pm BST
What happened?
Tuesday 08-October-24 at
- 19:00: We received alerts from our monitoring systems that our primary power feed had dropped offline. No disruption to service availability at this point.
- 19:20: We received alerts that several access (feed) switches had gone offline and were unreachable internally.
- 20:59: We received alerts that the offline access switches had come back online and were carrying traffic correctly.
- 21:05: We manually verified that workloads were back online...
September 2024