New Horizon Hosting LLC - Network outage – Incident details

Network outage

Resolved
Major outage
Started 2 months ago
Lasted 4 days

Affected

Client/Billing Area

Major outage from 8:38 PM to 1:03 AM, Degraded performance from 1:03 AM to 9:58 PM, Operational from 9:58 PM to 8:40 AM

Main Website

Major outage from 8:38 PM to 1:03 AM, Degraded performance from 1:03 AM to 9:58 PM, Operational from 9:58 PM to 8:40 AM

Web

Major outage from 8:38 PM to 1:03 AM, Degraded performance from 1:03 AM to 9:58 PM, Operational from 9:58 PM to 8:40 AM

US-CHI-W01 (Web Hosting)

Major outage from 8:38 PM to 1:03 AM, Degraded performance from 1:03 AM to 9:58 PM, Operational from 9:58 PM to 8:40 AM

US-CHI-W02 (Web Builder)

Major outage from 8:38 PM to 1:03 AM, Degraded performance from 1:03 AM to 9:58 PM, Operational from 9:58 PM to 8:40 AM

US-CHI-W03 (Web Radio Hosting)

Major outage from 8:38 PM to 1:03 AM, Degraded performance from 1:03 AM to 9:58 PM, Operational from 9:58 PM to 8:40 AM

Updates
  • Resolved
    This incident has been resolved.
  • Monitoring

    The cables have been replaced, and traffic is passing through the network as expected. We will continue to monitor logs for interface errors over the next 24 hours. As stated previously, we sincerely apologize for any inconvenience this incident has caused.

  • Update

    Our network team has concluded its investigation and determined the root cause to be failing direct-attach copper (DAC) cables between our spines and edge routers. Because of how severely these cables were failing, the impact was felt throughout the network. We will replace the failing DACs with backups from our storage in the early morning of Wednesday, April 30th, at 10:00 UTC, during off-peak hours to minimize disruptions. We will provide an update when this is completed. We apologize for any inconvenience this incident has caused.

  • Identified

    Our network team is still investigating, but we can confirm that all LACP interfaces on one of our core spine switches flapped. The physical interfaces remained online.

    The flapping of the LACP interfaces caused some of the connections attached to that spine switch, between our top-of-rack switches and carriers, to flap as well. More information will be posted soon.

  • Investigating

    We are experiencing a network-wide outage of unknown cause and are currently investigating this incident.
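
For readers curious what monitoring for interface errors after a cable replacement can look like, here is a minimal sketch in Python. It is not the tooling used in this incident. The `CounterSample` shape, the interface names, and the `interfaces_with_new_errors` helper are all hypothetical, and the counters are assumed to have been collected by some other means (for example, polled from the switches). The idea is simply to compare two snapshots and flag any interface whose error counters grew.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """Error counters for one interface at one point in time (hypothetical shape)."""
    interface: str
    rx_errors: int
    crc_errors: int


def interfaces_with_new_errors(before, after, threshold=0):
    """Return the sorted names of interfaces whose combined error
    counters grew by more than `threshold` between two snapshots."""
    baseline = {sample.interface: sample for sample in before}
    flagged = []
    for sample in after:
        prev = baseline.get(sample.interface)
        if prev is None:
            continue  # no baseline for this interface, nothing to compare
        delta = (sample.rx_errors - prev.rx_errors) + (
            sample.crc_errors - prev.crc_errors
        )
        if delta > threshold:
            flagged.append(sample.interface)
    return sorted(flagged)


# Illustrative data only: made-up interface names and counter values.
before = [CounterSample("spine1:eth1", 10, 2), CounterSample("spine1:eth2", 0, 0)]
after = [CounterSample("spine1:eth1", 10, 2), CounterSample("spine1:eth2", 7, 3)]
print(interfaces_with_new_errors(before, after))
```

Comparing deltas rather than absolute values matters here because counters accumulated during the outage would otherwise keep flagging links that are healthy after the replacement.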