Optimizing Primary Storage
Data deduplication has done much to optimize disk backup storage, but can those same efforts be successful in primary storage? Primary storage is, of course, different from secondary storage: any added latency can cause problems for applications and users. Thin provisioning, which I wrote about last week, can help a great deal, but once the data is actually written, the space is allocated. How can you make primary storage take up less space?
Three of the approaches on the market today are in-line compression from companies such as Storwize, a post-process crawl approach from companies like Network Appliance, and the mixed-mode approach of companies like Ocarina Networks.
In-line compression is pretty straightforward; an appliance is put between you and the storage. Everything going through the box onto that storage is compressed; everything going through the box from storage is decompressed. The post-process crawl is also straightforward. During idle times on the system, files are scanned for duplicate blocks of data stored elsewhere in the array and are deduplicated.
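To make the post-process idea concrete, here is a minimal Python sketch of detecting duplicate blocks during an idle-time crawl. The block size, hashing scheme, and directory walk are my own illustrative choices, not any vendor's implementation.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # hypothetical fixed block size for the crawl


def find_duplicate_blocks(root_dir):
    """Post-process crawl: hash every block and note which ones already exist."""
    seen = {}          # block hash -> (file, offset) where the block was first seen
    duplicates = []    # blocks that could be replaced with a reference to the original

    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                offset = 0
                while True:
                    block = f.read(BLOCK_SIZE)
                    if not block:
                        break
                    digest = hashlib.sha256(block).hexdigest()
                    if digest in seen:
                        duplicates.append((path, offset, seen[digest]))
                    else:
                        seen[digest] = (path, offset)
                    offset += len(block)
    return duplicates
```

A real array would do this against its own block map rather than crawling files from the outside, but the principle is the same: identical blocks are stored once and referenced many times.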
We will explore the other methods in future entries, but today I wanted to focus on the mixed-mode approach from Ocarina Networks. Its system does a post-process crawl of the data, deduplicating blocks and choosing the best compression method for each file. This crawling process gives Ocarina the time needed to be data-format aware. Decompression is done in-line, on access. This allows control over which files are optimized for space savings and lets that functionality be carried across multiple platforms. You can set up the system to scan the file system and only compress or deduplicate files that have not been accessed in the last week, leaving the very active files unmodified. Different levels of optimization can be chosen based on file type or age, allowing you to decide between highly optimized access (less compression, so less performance impact) and highly optimized storage efficiency. Since this decision can be made on a file-by-file basis, it gives you the freedom to "dial in" the correct level of optimization for each file.
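As a rough sketch of how such a policy might be expressed, the hypothetical Python fragment below assigns an optimization level per file based on age and extension. The thresholds, extensions, and level names are invented for illustration and do not reflect Ocarina's actual rules or interfaces.

```python
import os
import time

WEEK = 7 * 24 * 3600  # seconds; illustrative threshold only


def pick_optimization_level(path, now=None):
    """Choose a per-file space-optimization level, trading access speed for savings."""
    now = now or time.time()
    age = now - os.path.getmtime(path)   # last-access time could be used where reliable
    ext = os.path.splitext(path)[1].lower()

    if age < WEEK:
        return "none"        # very active files are left unmodified
    if ext in (".jpg", ".mp4", ".zip"):
        return "light"       # already-compressed formats gain little from more effort
    if age > 26 * WEEK:      # roughly six months untouched
        return "maximum"     # highest space savings, slowest first access
    return "balanced"        # moderate compression, modest performance impact
```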
The Ocarina solution offers data movement as well. For example, you could have a rule that any file not accessed in six months is space optimized to its highest level and then moved off of your expensive NAS to a disk-based archive that uses SATA drives. Because of Ocarina's mixed-mode architecture, spanning different storage manufacturers' platforms is simple.
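A simple illustration of that kind of rule, again with invented paths and thresholds rather than Ocarina's actual mechanism, might look like this:

```python
import os
import shutil
import time

SIX_MONTHS = 182 * 24 * 3600  # rough figure, for illustration only


def archive_cold_files(primary_root, archive_root, now=None):
    """Move files untouched for six months to a cheaper SATA-backed archive tier."""
    now = now or time.time()
    for dirpath, _, filenames in os.walk(primary_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if now - os.path.getatime(src) > SIX_MONTHS:
                dest = os.path.join(archive_root, os.path.relpath(src, primary_root))
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.move(src, dest)  # a real system would leave a stub or link behind
```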
The challenge in offering data optimization solutions for primary storage is that it's so easy not to optimize at all and just keep buying more shelves of drives. To be a compelling alternative, primary storage optimization suppliers need to do more than just offer 2:1 space savings; they need to leverage their implementation to offer other capabilities, data movement being a good example.
For a detailed examination of Primary Storage Optimization, see our article on InformationWeek's sister site, Byte and Switch.
George Crump is founder of Storage Switzerland, an analyst firm focused on the virtualization and storage marketplaces. It provides strategic consulting and analysis to storage users, suppliers, and integrators. An industry veteran of more than 25 years, Crump has held engineering and sales positions at various IT industry manufacturers and integrators. Prior to Storage Switzerland, he was CTO at one of the nation's largest integrators.