Server Monitoring customers may experience connectivity issues
Incident Report for Scout
Postmortem

Below is the postmortem supplied by our host, RailsMachine. Again, here at Scout, we apologize for the outage.

November 4, 2016 Outage Follow Up

As you know, we suffered a network outage this morning lasting approximately 21 minutes, starting at 10:37am Eastern. Our investigation indicates that this outage was directly related to the same firmware defect in our core switch that caused a previous outage earlier this year. That defect is why we purchased a new redundant switching solution, which should have been installed during our maintenance window two weeks ago. However, during that scheduled maintenance window, we discovered that the hardware recommended by our vendors had an undocumented limitation that could not be overcome. We worked with their engineers for several days to no avail, and so decided to purchase a more robust solution. As a result, all of that hardware had to be sent back to the vendor, the new hardware had to be tested to verify that the limitation was absent, and the replacements had to be ordered. We are currently awaiting delivery of that new hardware.

It is estimated that the new hardware will be delivered within the next 14-21 days. As soon as it arrives, we will schedule a new maintenance window to resolve this issue permanently, while also adding some great new features, speed, bandwidth, and redundancy. In the meantime, please accept our apologies for today's inconvenience, and know that we are working diligently to continue providing outstanding availability, service, and support. If you have any questions, please reach out to us at support@railsmachine.com, or to me directly at dustin@railsmachine.com, and we'd be happy as always to help!

Zayo Upgrades

As we near the holidays, I also wanted to take this opportunity to update you on Zayo's network upgrade progress. In just the past two weeks, one of our customers was hit by several DDoS attacks. These attacks did not affect any of our other customers; they were absorbed and rate-limited by the new equipment installed since the New Year's Day outages. I spoke with Zayo's Lead Network Architect, and this equipment should go a long way toward protecting us, and you, from these growing threats.

Posted Nov 04, 2016 - 14:50 MDT

Resolved
This incident has been resolved.
Posted Nov 04, 2016 - 09:54 MDT
Update
Unfortunately, incoming data from 8:35 AM to 9:00 AM (Mountain / GMT-6) was lost due to the datacenter's network failure. We will publish a postmortem as soon as RailsMachine makes it available. Note: This affects server monitoring accounts only, not APM accounts.
Posted Nov 04, 2016 - 09:09 MDT
Monitoring
The datacenter seems to have restored connectivity.
Posted Nov 04, 2016 - 09:06 MDT
Investigating
From RailsMachine, our host: "Our monitoring systems have detected network connectivity issues to the datacenter. We are working with Zayo to identify the cause of the problem right now and will provide more details as soon as they are available." More detail and updates here: http://status.railsmachine.com/incidents/30gb5hnzmd1w
Posted Nov 04, 2016 - 08:47 MDT