Monday, April 6, 2015
This past Friday through Sunday, ANI experienced the worst mass outage in the company’s history. Serious operational and equipment issues left more than half of our customers with an extended period of downtime while we worked with our vendor to troubleshoot and eventually replace the failed hardware.
As of right now, everything is running normally. On Saturday, April 4, the failed hardware was replaced and our team spent the day configuring and testing the new part to make sure the configuration was correct. Saturday night we began bringing up each production instance carefully to ensure that no further issues arose. This extended into Sunday for some clients. We have been monitoring performance throughout the night and are confident the issue has been permanently resolved.
So what happened?
The central storage server that houses client data suffered a hardware failure in its NVRAM, which caused the service interruption. Our storage server is a fully redundant system with no single point of failure, designed to provide customers with the best service experience. This means that a single component failure should not stop the service. This time, however, the fail-over to the backup controller was not fully successful. As a result, in order to restore full service, we had to install a replacement part in the primary controller and restore the configuration. Saturday morning we installed the replacement part and reloaded the configuration from backup. Unfortunately, the reload was not successful, and the storage system vendor had us proceed with a manual restore, which took several more hours. By 7 pm MST, the configuration was restored and we began restoring service client by client.
Will this happen again?
We will work closely with the storage system vendor to ensure a successful fail-over if one is needed in the future. The system is designed for this, and we are following up to understand why the fail-over did not complete and why the backup configuration could not be reloaded successfully. We will also look at what steps we can take to ensure a faster recovery if it’s ever needed, whether that is an automatic fail-over, a part replacement and configuration reload, or a full manual recovery.
Again, I apologize to those of you who experienced this difficult and protracted outage. I know it’s not acceptable for you or for us. We remain committed to giving you a stable, high-performing experience, and we are putting together a team to research a different path forward to ensure this never happens again.
Asahi Net International, Inc. (ANI)
1955 S Val Vista Dr., Suite 126
Mesa, AZ 85204