We had some downtime yesterday. Here is what happened and what we are doing to avoid it in the future.
On November 2nd, we received notifications that the database cluster had crashed. On further investigation, we determined that the iSCSI storage was unresponsive. After reaching out to the hosting provider, we found out that they had experienced multiple drive failures on the SAN and that the disk had become locked during the array rebuild process.
They managed to unlock it. However, the array was still rebuilding and running at reduced I/O capacity, so turning the database cluster back on caused the disk to lock again, which led to more crashes.
At that point, we decided to switch to a more modern, SSD-backed storage solution with a higher I/O operations per second (IOPS) capacity. This resolved the issue and has also given us a significant performance boost.
Social and Community Manager,