Cleaning The Digital Dump

One of the challenges IT faces is getting rid of all the old, unused files clogging up primary storage. Primary storage can hold data that has not been modified, or even opened, for years. How do you deal with the digital dump, especially when most IT people don't have the authority to delete other people's files?

George Crump, President, Storage Switzerland

August 11, 2010

3 Min Read

While there are many things we can do to minimize the junk on primary storage, deduplication and compression for example, this data should be removed from primary storage as soon as possible. Nor does this conflict with a keep-it-forever strategy: you may want to keep the data forever, but you don't want to keep it on your most expensive storage.

Beyond space optimization there is also the concept of auto-tiering. Again, this capability serves a purpose, but for long-term retention it is typically still less expensive to move the data to a different platform. Primary storage may also not be the safest place to hold it: longer-term storage platforms have built-in capabilities to verify data integrity for years into the future. So at some point it makes sense to move this digital junk to a separate storage platform designed for the task.
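Purpose-built archive platforms handle that integrity checking internally, but the idea is easy to illustrate. Below is a minimal sketch, not any vendor's implementation, that assumes a hypothetical archive tier mounted at /archive: it records a SHA-256 checksum for every file at ingest and re-verifies the files on demand so silent corruption is caught before the data is needed again.

```python
import hashlib
import json
import os
from pathlib import Path

ARCHIVE_ROOT = Path("/archive")           # hypothetical long-term archive mount
MANIFEST = ARCHIVE_ROOT / "manifest.json" # hypothetical checksum manifest

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MB chunks so large archives don't exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_checksums():
    """Capture a checksum for every archived file at ingest time."""
    manifest = {
        str(p): sha256_of(p)
        for p in ARCHIVE_ROOT.rglob("*")
        if p.is_file() and p != MANIFEST
    }
    MANIFEST.write_text(json.dumps(manifest, indent=2))

def verify_checksums():
    """Re-hash each file and report anything missing or silently changed."""
    manifest = json.loads(MANIFEST.read_text())
    for path, expected in manifest.items():
        if not os.path.exists(path):
            print(f"MISSING: {path}")
        elif sha256_of(path) != expected:
            print(f"CORRUPT: {path}")
```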

The first step is deciding how you are going to get it there. This can be as simple as manually copying the data to the secondary storage platform, which involves using some sort of tool to identify that data and, in most cases, means there is no transparent link to bring it back if a user needs it. The alternative is an automated technology. As we discuss in our article "What is File Virtualization?", one of the best ways to accomplish this is with file virtualization. These technologies can transparently move data from one storage platform to another and then set up a transparent link to the file, in most cases without the use of stub files. They operate much like a DNS server: you don't need to know the IP address of every web site, you reference it by name. With file virtualization you don't need to know where the file is, just its name, and the file virtualization appliance handles the rest.
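To make the DNS analogy concrete, here is a toy sketch of the mapping idea only, not how any file virtualization product actually works. The class and paths are hypothetical: clients keep asking for the same logical name while the physical copy migrates between tiers, and only the index knows where the file lives today.

```python
from pathlib import Path
from shutil import copy2

class FileVirtualizationIndex:
    """Toy name-to-location index, analogous to DNS for files."""

    def __init__(self):
        self._locations = {}  # logical name -> current physical path

    def register(self, name, physical_path):
        """Record where a logical file name currently lives."""
        self._locations[name] = Path(physical_path)

    def migrate(self, name, new_path):
        """Move the physical file; the logical name the user sees never changes."""
        old = self._locations[name]
        new = Path(new_path)
        new.parent.mkdir(parents=True, exist_ok=True)
        copy2(old, new)
        old.unlink()
        self._locations[name] = new

    def resolve(self, name):
        """Like a DNS lookup: translate the name into today's location."""
        return self._locations[name]

# Hypothetical usage: the user still opens "budget.xls" after it is tiered off.
index = FileVirtualizationIndex()
index.register("budget.xls", "/primary/finance/budget.xls")
index.migrate("budget.xls", "/archive/finance/budget.xls")
print(index.resolve("budget.xls"))  # /archive/finance/budget.xls
```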

The second step is deciding what platform to put this data on. Most often you are looking for something that is reliable, scalable, and of course cost effective. Depending on your environment, you may also want some level of power management. To work with file virtualization, these products need to present themselves as a NAS device, although some file virtualization vendors are working on directly supporting object-based storage. That would allow you to use a NAS for primary storage and an object storage system for long-term data retention.
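If the archive tier is object based, aged files end up addressed by key rather than by path. The sketch below is only an illustration of that pattern, assuming an S3-compatible object store and the boto3 library with credentials already configured; the bucket name and functions are hypothetical.

```python
import boto3

s3 = boto3.client("s3")               # assumes S3-compatible credentials are configured
ARCHIVE_BUCKET = "long-term-archive"  # hypothetical bucket for retired files

def archive_file(local_path, key):
    """Copy an aged file from the NAS into the object store under a flat key."""
    s3.upload_file(local_path, ARCHIVE_BUCKET, key)

def restore_file(key, local_path):
    """Pull the file back to primary storage if a user ever asks for it."""
    s3.download_file(ARCHIVE_BUCKET, key, local_path)
```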

The final step is deciding what to move. The obvious target is data that has not been accessed in years. The problem is that, if you are like many customers, over 50% of your data has not been accessed in years, and freeing up that much storage may be too big an initial jump. What we recommend is deciding how much storage you need to free up and then using a measurement tool to determine what age group of files equals that amount. This lets you start slowly, on data that is not being accessed at all, while you get comfortable with the process.
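A commercial measurement tool will do this for you, but the logic is simple enough to sketch. The example below, with a hypothetical file share and target, walks the filesystem, orders files oldest-first by last access time, and reports the age cutoff that frees roughly the amount of space you decided on. (Note that many filesystems mount with noatime, in which case modification time is the practical fallback.)

```python
import time
from pathlib import Path

def find_cutoff(root, target_bytes):
    """Find the access-time cutoff that frees roughly target_bytes under root."""
    files = []
    for path in Path(root).rglob("*"):
        try:
            if path.is_file():
                st = path.stat()
                files.append((st.st_atime, st.st_size, path))
        except OSError:
            continue                      # skip files we can't read
    files.sort()                          # oldest access time first

    freed, cutoff, candidates = 0, None, []
    for atime, size, path in files:
        if freed >= target_bytes:
            break
        freed += size
        cutoff = atime
        candidates.append(path)

    age_days = (time.time() - cutoff) / 86400 if cutoff else 0
    return age_days, freed, candidates

if __name__ == "__main__":
    # Hypothetical share and goal: free about 500 GB from /export/home.
    age, freed, files = find_cutoff("/export/home", 500 * 1024**3)
    print(f"Moving files untouched for {age:.0f}+ days frees "
          f"{freed / 1024**3:.1f} GB across {len(files)} files")
```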

Track us on Twitter: http://twitter.com/storageswiss

Subscribe to our RSS feed.

George Crump is lead analyst of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. Find Storage Switzerland's disclosure statement here.

About the Author

George Crump

President, Storage Switzerland

George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for datacenters across the US, he has seen the birth of such technologies as RAID, NAS, and SAN. Prior to founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection. George is responsible for the storage blog on InformationWeek's website and is a regular contributor to publications such as Byte and Switch, SearchStorage, eWeek, SearchServerVirtualization, and SearchDataBackup.
