Getting To The Last Copy Of Data
One of the storage management challenges we see every day in customer data centers is that there are too many copies of data in circulation. Ironically, it's this fact that created much of the value and motivation behind data deduplication. It should not be this way. Why should you get to a last copy of data?
One of the downsides of inexpensive capacity is that storage practices don't have to be as strict. You can store hundreds of versions of the same or similar data and suffer limited hard cost impact. Deduplication further enhances the affordability of capacity, making this practice even more forgivable from an expense standpoint.
Of course, the data is not just stored multiple times on the file server; versions of it exist on laptops, thumb drives, tape media, replicated disk and a host of other "just in case" storage locations. Ironically, especially as this data ages, having this many copies of the same piece of data makes it no easier to find and no faster to recover. It just means there are that many more places to look for the data.
Ideally, a best practice would be that as data ages there are fewer copies of it, and the final copy moves to a known good location, potentially a disk archive solution, which is then replicated to a disk archive at a disaster recovery site. This means part of the policy will be to move inactive data to an archive much sooner. Disk archive, as we discuss in our article on Archiving Basics, enables a much more aggressive migration policy because data can be recalled with almost no noticeable performance impact on the user. In addition, the backup application will need to be set so that tape media ages out and is retired much sooner.
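To make that kind of aggressive migration policy concrete, here is a minimal sketch of an age-based sweep that moves files untouched for a set number of days from primary storage to a disk archive mount. The paths and the 180-day threshold are assumptions for illustration only, not a recommendation or any vendor's implementation.

    # Illustrative sketch: move files not modified in MAX_IDLE_DAYS from a
    # hypothetical primary share to a hypothetical disk archive mount.
    import os
    import shutil
    import time

    PRIMARY = "/mnt/primary/projects"        # assumed primary storage path
    ARCHIVE = "/mnt/disk-archive/projects"   # assumed disk archive mount
    MAX_IDLE_DAYS = 180                      # aggressive policy, enabled by fast recall

    cutoff = time.time() - MAX_IDLE_DAYS * 86400

    for root, _dirs, files in os.walk(PRIMARY):
        for name in files:
            src = os.path.join(root, name)
            if os.path.getmtime(src) < cutoff:
                dest_dir = os.path.join(ARCHIVE, os.path.relpath(root, PRIMARY))
                os.makedirs(dest_dir, exist_ok=True)
                # after the move, the archive holds the single known-good copy
                shutil.move(src, os.path.join(dest_dir, name))

In practice this logic lives inside the archiving product or an HSM-style policy engine rather than a script, but the decision it encodes, that inactive data leaves primary storage for one known location, is the same.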
Archiving solutions like those from EMC, Nexsan or Permabit can set the retention time of these files as they are being stored. For example, they can be set to make the data unmodifiable for 7 years and then have it deleted after 10 years. The key is that once you decide you need information from a data set, or that you need to get rid of a data set, you know exactly where to go to find that data.
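As a rough sketch of what such a retention policy records, the snippet below writes a sidecar file noting when a stored object becomes modifiable again and when it becomes eligible for deletion. The sidecar format and field names are purely illustrative assumptions; the vendors above enforce retention inside their own platforms rather than with a file like this.

    # Illustrative sketch: record a 7-year immutability window and a 10-year
    # delete-by date for an archived file. Not any vendor's retention API.
    import json
    import time

    SEVEN_YEARS = 7 * 365 * 86400
    TEN_YEARS = 10 * 365 * 86400

    def write_retention_sidecar(archived_path):
        now = time.time()
        policy = {
            "file": archived_path,
            "immutable_until": now + SEVEN_YEARS,  # no modifications before this
            "delete_after": now + TEN_YEARS,       # eligible for disposition after this
        }
        with open(archived_path + ".retention.json", "w") as f:
            json.dump(policy, f, indent=2)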
Most of these archives can be indexed by solutions like those offered by Index Engines or Kazeon, which not only help find data on the archive itself but also help identify the data on primary storage that needs to be archived. In our next entry we will discuss how eDiscovery is evolving from a litigation readiness application into a more mainstream application that helps storage managers achieve goals like the last copy of data.
At some point in a file's life you have two decisions to make: keep it or delete it. If it is a keep-it decision, that means you think someday you will need that data again. If so, you don't want to hunt all over the data center for that data; you want it in one place. Even more so, if it is a delete-it decision, you want to know you have removed all the copies and versions of that file. Both are best enabled by an IT discovery application and a final storage location like a disk archive.
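For a sense of what "removing all the copies" involves, here is a minimal sketch that hashes file contents across several mount points so every location holding the same data can be found before a delete-it decision is carried out. The search roots are assumptions, and a real discovery tool would also cover laptops, tape catalogs and replicas.

    # Illustrative sketch: group files by content hash across assumed mount
    # points to surface every copy of the same data.
    import hashlib
    import os
    from collections import defaultdict

    ROOTS = ["/mnt/primary", "/mnt/disk-archive", "/mnt/backup-staging"]

    def sha256_of(path, chunk=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while data := f.read(chunk):
                h.update(data)
        return h.hexdigest()

    copies = defaultdict(list)
    for root_dir in ROOTS:
        for root, _dirs, files in os.walk(root_dir):
            for name in files:
                path = os.path.join(root, name)
                try:
                    copies[sha256_of(path)].append(path)
                except OSError:
                    pass  # skip unreadable files

    for digest, paths in copies.items():
        if len(paths) > 1:
            print(digest, paths)  # every location holding identical content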
Track us on Twitter: http://twitter.com/storageswiss
Subscribe to our RSS feed.
About the Author
George Crump is founder of Storage Switzerland, an analyst firm focused on the virtualization and storage marketplaces. It provides strategic consulting and analysis to storage users, suppliers, and integrators. An industry veteran of more than 25 years, Crump has held engineering and sales positions at various IT industry manufacturers and integrators. Prior to Storage Switzerland, he was CTO at one of the nation's largest integrators.