Building Storage Proof Applications
A storage system failure in which more drives have failed than the RAID protection scheme can cover, or in which the storage software itself has crashed, is a disaster. The amount of work required to return the system to service can be daunting, and applications are likely to experience downtime that in some cases can be significant. We have to start working on building storage proof applications.
While this type of storage system failure is still rare, I do believe that we are seeing a slight increase in its occurrence. I also feel that the impact of such a failure, application unavailability, is more significant than ever. We count on applications more than we did in the past, and the size and reach of those applications are larger than ever. In short, more users are impacted for a longer period of time as IT scrambles to return the application to service. Recovery from any type of backup device may be too slow. In either case, it is important to start considering how to build storage proof applications.
As drive capacities increase, rebuilding a RAID set after a drive failure can now take days in some cases. The chance of a second or even third drive failing during that rebuild also increases. There is also the impact on performance during the rebuild. The more storage processing you allocate to the rebuild, the faster it completes but the slower the application performs. If you allocate more processing to the application, the rebuild slows down and you are exposed to additional drive failures for a longer period of time.
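To put rough numbers on that trade-off, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the 4 TB drive, the 100 MB/s full rebuild rate, the 3% annual failure rate, the seven surviving drives, and the simplification that drives fail independently at a constant rate. Real arrays will behave differently, and correlated failures can make the exposure worse than this model suggests.

```python
import math

HOURS_PER_YEAR = 8760


def rebuild_hours(capacity_tb, full_rate_mb_s, rebuild_share):
    """Hours to rebuild one drive when `rebuild_share` (0 < share <= 1)
    of the array's rebuild throughput is given to the rebuild."""
    if not 0 < rebuild_share <= 1:
        raise ValueError("rebuild_share must be in (0, 1]")
    capacity_mb = capacity_tb * 1_000_000  # decimal TB to MB
    return capacity_mb / (full_rate_mb_s * rebuild_share) / 3600


def p_additional_failure(hours, surviving_drives, afr):
    """Probability that at least one surviving drive fails within `hours`.
    Assumes independent drives with a constant annual failure rate `afr`."""
    rate_per_hour = -math.log(1 - afr) / HOURS_PER_YEAR
    return 1 - math.exp(-rate_per_hour * surviving_drives * hours)


def trade_off(capacity_tb=4.0, full_rate_mb_s=100.0, surviving_drives=7, afr=0.03):
    """Rebuild time and exposure for several rebuild shares (all inputs hypothetical)."""
    rows = []
    for share in (1.0, 0.5, 0.25, 0.1):
        hours = rebuild_hours(capacity_tb, full_rate_mb_s, share)
        risk = p_additional_failure(hours, surviving_drives, afr)
        rows.append((share, hours, risk))
    return rows


if __name__ == "__main__":
    for share, hours, risk in trade_off():
        print(f"rebuild share {share:>4.0%}: {hours:6.1f} h, "
              f"additional-failure risk {risk:.3%}")
```

Halving the share given to the rebuild doubles the rebuild window in this model, and the chance of losing another drive grows along with it.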
As we discuss in our recent article "What's Missing From Your Disaster Recovery Plan?", application or operating system clusters often won't help much here. Most rely on shared storage. If that storage fails, there is a chance that your application cluster fails along with it. Most operating system level clustering technologies won't detect a specific application failure, nor will they monitor performance conditions.
There are a few ways to protect your application from its storage. The first is a better storage system with multiple (more than two) controllers that are resilient to a storage software failure, meaning you can roll a storage software upgrade to each processor in turn. The second is the growing number of backup applications that allow data to be served directly from the backup device. The third option is to use failover applications that make sure application data is written to two separate storage systems at the same time. Using software in this way would allow a more mid-range storage solution to do the work of an enterprise class storage system. Most of these software solutions work across applications and do not require special versions of operating systems. Some are even application aware, so they can detect an in-application failure or performance degradation.
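The idea behind the third option, writing application data to two separate storage systems at the same time, can be illustrated with a small Python sketch. This is not how any particular failover product works. The `MirroredStore` class is hypothetical, and two local directories stand in for two independent storage systems. A write is only acknowledged once the data has been flushed to both targets, and a read falls back to the second target if the first is unavailable.

```python
import os
from pathlib import Path


class MirroredStore:
    """Hypothetical dual-write store; each directory stands in for a storage system."""

    def __init__(self, primary, secondary):
        self.targets = [Path(primary), Path(secondary)]

    def write(self, name, data):
        """Return only after `data` has been flushed to both targets."""
        for root in self.targets:
            root.mkdir(parents=True, exist_ok=True)
            with open(root / name, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # ask the OS to push the data to the device

    def read(self, name):
        """Read from the first target that still has the data."""
        last_error = None
        for root in self.targets:
            try:
                return (root / name).read_bytes()
            except OSError as exc:  # target missing or unreadable, try the next one
                last_error = exc
        raise FileNotFoundError(name) from last_error
```

The design choice to note is that the write path pays for the protection: every write waits on the slower of the two targets. In exchange, the application can keep reading if one storage system is lost entirely.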
Armed with this level of resiliency, applications can be kept available even if the worst case local disaster, a storage system failure, occurs. Too often we focus on getting data out of the data center when in reality the data center is fine. It's these inside-the-data-center failures that really get you into trouble, and a software based tool is worth looking into to make those troubles go away.
George Crump is lead analyst of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. Find Storage Switzerland's disclosure statement here.