Thursday, May 14, 2009

Dealing with Brownouts

In addition to my work for Major League Media and Loneliest Road Consulting LLC, I work for a K-12 educational organization in southeast Indiana as the assistant technology coordinator.

Day-to-day operations for this school district are generally tame. Even so, we use RAID, a strict backup regimen, UPS devices on all critical systems, and the whole nine yards.

It's amazing how something as trivial as a small Mylar balloon can cause a major electrical nightmare.

Said balloon flew into a transformer owned by our local utility company. The resulting voltage spike and brownout were mostly contained by our array of UPS devices. The switch to our backbone was spared an untimely end, as were the application servers running the school office and gradebook software.

Unfortunately, five Pentium 4 workstations, connected only to consumer-grade surge protectors, suffered electrical damage to their motherboards. Our Linux firewall suffered a corrupted file system, and every boot attempt ended in a kernel panic.

This event is a lesson in disaster preparedness. Despite your best attempts at protecting your network and attached computer systems, electrical faults can still cause damage. It is important to note:

1. Distributing your key services among several machines is ideal. Had we lost a DNS server or a domain controller, at least two ancillary machines would have taken over through failover.

2. Hot spares are essential. Our local hardware vendor did an excellent job repairing our firewall, but the repair still took time, and we were without Internet access until it was finished. Having no hot spare for our firewall was our Achilles' heel.

3. Cold spares are better than nothing. As for the destroyed workstations, we had some legacy Dell OptiPlex GX270 workstations on hand. We installed our key applications on them within minutes and could use them despite the lack of Internet access, since they depended only on resources on our LAN.

This brings up another important observation.

4. If possible and practical, keep essential services on your LAN. We were able to operate because we did not remotely host core services. If you require Internet access to get to your data and your ISP or remote host loses a router or has a power outage, you don't want to be crippled as a result. It may not always be possible to avoid outsourcing the hosting of applications and data, but at the very least you should make sure you are informed of your provider's disaster mitigation and recovery policies.
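For anyone who wants to check their own network against points 1, 2, and 4, here is a minimal sketch in Python. It is not what we run in our district. The LAN range, service names, and addresses are all made up for illustration, and the audit function is my own hypothetical helper. It flags any service that lives on a single machine (no spare) and any service that has no host on the LAN at all.

```python
import ipaddress

# Assumption: the LAN uses this address range. Substitute your own.
LAN = ipaddress.ip_network("10.0.0.0/16")

# Hypothetical inventory: service name -> addresses of machines that can provide it.
INVENTORY = {
    "dns": ["10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "firewall": ["10.0.0.1"],
    "gradebook": ["10.0.0.20", "10.0.0.21"],
    "email": ["198.51.100.7"],
}


def audit(inventory, lan=LAN):
    """Return (service, warning) pairs for services with no spare or no LAN host."""
    findings = []
    for service, hosts in inventory.items():
        addresses = [ipaddress.ip_address(host) for host in hosts]
        # A single machine means no hot or cold spare is listed for this service.
        if len(addresses) < 2:
            findings.append((service, "no spare"))
        # No address inside the LAN range means the service needs the Internet.
        if not any(address in lan for address in addresses):
            findings.append((service, "off-LAN only"))
    return findings


if __name__ == "__main__":
    for service, warning in audit(INVENTORY):
        print(f"{service}: {warning}")
```

With the made-up inventory above, the firewall is flagged for having no spare, which is exactly the gap that bit us, and the remotely hosted email is flagged on both counts.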

Fortunately, this story has a happy ending. We were prepared for a far worse disaster. Even so, it is good to constantly re-evaluate disaster recovery plans for flaws and potential improvements. If a small Mylar balloon hitting a transformer can derail your operation, consider hiring someone to develop a plan for you if you are unable to do so yourself.
