Sunday 25 July 2010

The Question of Storage and IT

Over the past several years, something insidiously unstoppable has been growing in the data centres and server rooms around the globe; we shall call it storage capacity. In the early days storage capacity grew slowly: an application owner might require 10 or 20 gigabytes for a CRM application or an SQL database. The IT manager would, as always, resist and ask for further justification; though, as always, the storage would eventually be procured and provisioned for the application, either by connecting an additional disk drive or by moving data to another server. Disk expansion in those days was controlled by the physical capacity of the server or the attached storage unit, and in many cases this limit was absolute: if the drive bays were full, larger drives might be installed in place of the existing ones and the array expanded once again. The next obstacle was often the operating system, which prevented RAID arrays from being easily expanded and the extra space made accessible; these allocation limits imposed a real restriction on data growth. If additional storage capacity cost more than the data was worth, the data simply didn't get stored.
In traditional IT departments, email and database servers were often the worst consumers of 'difficult' storage, and IT departments fought a constant battle to keep these complex applications in check, not to mention the storage they consumed. Many now 'aged' IT administrators have been involved in the thankless search for the largest mailboxes and the 'power' users who owned them, back when disks were Direct Attached Storage (DAS) and the production Microsoft Exchange 5.5 server had stopped its Information Store service because there was not enough space left to continue processing email.

Technology had been evolving to solve this conundrum, but the cost of the solution remained prohibitive until a few years ago.
The solution was the Storage Area Network (SAN) for databases and high-transaction applications, and Network Attached Storage (NAS) for file storage where transactions per second were not critical.
It is clear that SAN and NAS technology has revolutionised IT: capacity is now easier than ever to acquire, provision and re-provision between servers fitted with a Host Bus Adapter (HBA), though this flexibility made SAN/NAS more expensive than DAS.

SAN/NAS technology means storage can be spread evenly amongst the servers that need it most, then reclaimed and re-provisioned when any server requires capacity. This is a large step up from the scenario where a server has a disk array attached and only that single server can access the disks. One issue remains: the available capacity can still be exhausted. Some of you have seen what happens when the SAN runs out of space and another storage unit needs to be procured; the price can be astronomical compared with the individual disk drives consumers purchase for home use.

To this problem, storage vendors and clever developers came up with a further solution. Potentially inefficient storage assignment through DAS had given way to expensive, centrally managed (albeit easy to assign and manage) SAN; finally, the concept of Storage Virtualisation was born.
SAN provisioning became more flexible yet again. Storage administrators often over-allocate the storage 'assigned' to a host machine to allow for data growth; now SAN storage could be allocated so that the OS believed it had all the storage it desired, removing the OS limitations on resizing volumes, since volumes could be sized generously at the outset without regard to the physical SAN capacity actually available. Many amongst you will realise that allocating storage before it exists comes with a risk (there can be no power without risk).
Thin Provisioning relies on the fact that storage is generally over-allocated to servers (OS drives will have at least 5-10 GB unused to start with), and in most cases this space might as well sit in the general storage pool until it is used. Thin provisioning has a good use case, but it can be catastrophic if the underlying pool actually runs out of physical storage.
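The mechanics described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's API: the `ThinPool` class and its method names are invented for the example. The point is that logical allocations (what the hosts believe they have) can exceed physical capacity, and the pool only fails when real writes exhaust the real disks.

```python
# Illustrative sketch of thin provisioning. Class and method names are
# hypothetical, not any storage vendor's actual interface.

class ThinPool:
    def __init__(self, physical_gb):
        self.physical_gb = physical_gb   # real disk behind the pool
        self.used_gb = 0                 # physically consumed so far
        self.allocated_gb = 0            # logically promised to hosts

    def provision(self, logical_gb):
        # Over-allocation is allowed: the host "sees" the full volume
        # even though the physical space may not exist yet.
        self.allocated_gb += logical_gb

    def write(self, gb):
        # Physical space is consumed only when data is actually written.
        if self.used_gb + gb > self.physical_gb:
            raise RuntimeError("pool out of physical space")
        self.used_gb += gb

    @property
    def oversubscription(self):
        return self.allocated_gb / self.physical_gb


pool = ThinPool(physical_gb=100)
pool.provision(80)   # server A believes it has 80 GB
pool.provision(80)   # server B believes it has 80 GB: 160 GB promised
pool.write(60)       # only 60 GB physically used so far
print(pool.oversubscription)   # 1.6x over-subscribed
```

The catastrophic case in the text is exactly the `RuntimeError` branch: both servers' operating systems still report free space, yet the next write fails because the pool itself is full.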

As illustrated, technology has always solved the issue of storage in the enterprise: capacity keeps growing and the business keeps using the capacity available.
Once storage stopped being an issue, users started storing everything without consideration of cost; after all, it was easy enough to add more capacity.
Added to this primary usage of storage, backups suddenly went from Production Server disk -> Backup Tape (two copies) to Production Server disk -> DR Server disk -> Backup Staging disk -> Backup Tape, often up to four times the initial data size, all of which had to be stored, and this doesn't take into account the built-in RAID overhead.
There are obvious benefits to storing data more than once: recovery is faster, the secondary site is there in case the primary site fails, and the staging disk speeds up the backup by allowing the 'backup agents' to write to disk cache for deduplication and serialisation. It is still a very expensive use of capacity.
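The multiplication above is worth making explicit with a back-of-the-envelope calculation. The stage names and figures below are made up for illustration; each stage in the chain holds roughly a full copy of the production data, so the total footprint is a small multiple of the original, before any RAID overhead is counted.

```python
# Back-of-the-envelope sketch of the backup chain described above.
# Figures and stage names are illustrative only.

production_gb = 500

# Each stage holds approximately one full copy of the data.
stages = {
    "production disk": 1.0,
    "DR server disk": 1.0,
    "backup staging disk": 1.0,
    "backup tape": 1.0,
}

total_gb = sum(production_gb * copies for copies in stages.values())
print(total_gb)   # 2000.0 GB: roughly 4x the original data
```

RAID would inflate each disk stage further (e.g. double it for mirroring), which is why the article notes the 4x figure is an underestimate.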

What are we storing and why are we storing it? That is the question to ask. We are all living in the equivalent of a 'Hoarders' household: our IT networks and servers are choked with information that is completely and utterly redundant, and yet it all gets retained just in case. In case of what? What is the use case for reusing or retrieving this data? It is understandable that in the public sector or construction, data may need to be retained for a long period, but most data in most businesses becomes the equivalent of household trash after three months; it will most likely never be accessed again. And how expensive is the infrastructure being used to store this trash?

Did SAN/NAS/Thin Provisioning place the issue of storage into the history books, or did it just create a brand new issue, one that generated a lot more revenue for storage companies and more cost for IT departments in hardware, energy and floor space?


There is a solution, and it's easy: chargeback on the capacity used by each business unit, and spring cleaning of the file systems.
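A chargeback scheme can be as simple as a rate card applied to measured usage. The sketch below is hypothetical: the business unit names, usage figures, and the per-GB rate are all invented for the example, and real chargeback would draw usage from actual monitoring data.

```python
# Illustrative chargeback sketch: bill each business unit for the
# capacity it consumes at a flat rate per GB per month. All figures
# and names are assumptions for the example.

RATE_PER_GB_MONTH = 0.50   # assumed internal rate, not a real price

usage_gb = {"Finance": 1200, "Sales": 800, "Engineering": 4000}

bills = {unit: gb * RATE_PER_GB_MONTH for unit, gb in usage_gb.items()}
for unit, cost in sorted(bills.items(), key=lambda kv: -kv[1]):
    print(f"{unit}: {cost:.2f}/month for {usage_gb[unit]} GB")
```

Once a business unit sees its own storage bill, the incentive to spring-clean the file systems follows naturally, which is the point of the proposal.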
