[Server Monitoring] Connectivity issues to scoutapp.com
Incident Report for Scout
Postmortem

From our hosting provider:

"On further investigation, this issue was due to an additionally damaged PDU from Friday's surge. It has been replaced with a new PDU and service has been restored. We are still awaiting an RCA from Zayo, and will be reaching out with more information as it becomes available."

Later information:

Zayo has just released an RCA regarding the original incident on Nov 12th, which triggered this follow-up incident. The RCA mentioned a PDU inspection, but it appears that one PDU passed the inspection and later failed.

Scout's Next Steps:

This incident reinforces our efforts to build better redundancy against networking issues impacting our datacenter. Most of the incidents server monitoring has experienced in 2016 have been network-related rather than application issues. Improving that redundancy is our priority for server monitoring entering 2017.

Posted Nov 16, 2016 - 15:33 MST

Resolved
Our host has confirmed service has been restored.
Posted Nov 14, 2016 - 16:05 MST
Monitoring
From Rails Machine: "The power interruption on Friday has damaged the uplink equipment at our network's edge causing it to spontaneously die without warning. We are in the process of replacing the uplink now."
Posted Nov 14, 2016 - 14:41 MST
Investigating
Our datacenter is investigating a data connectivity loss. We'll update as we hear more.

The Rails Machine Status Page: http://status.railsmachine.com/incidents/22th49clrdfk
Posted Nov 14, 2016 - 14:16 MST