How the Cloud Is Changing the Face of Disaster Recovery
By Sash Sunkara, CEO, RackWare Inc.
The advent of Cloud infrastructure, both public and private, has completely changed the way the data center operates. It has increased data center agility, reliability, and scalability while lowering costs. There is one area, however, where the Cloud can play a vital role that almost no one is taking advantage of today: availability.
Data center workloads are typically divided into two camps: (1) critical workloads and (2) non-critical workloads. Critical workloads, the ones that can’t tolerate even a few minutes of downtime, are usually protected with real-time replication solutions that require a duplicate system to act as a recovery instance should the production workload experience an outage. Non-critical workloads can tolerate a wide range of outage times and are typically unprotected, or backed up with image or tape archive solutions. Cloud technology has introduced the possibility of an intermediate solution, where non-critical workloads can gain the benefits of high-availability features, such as failover, without the high cost and complexity of traditional solutions.
There are 4 ways that Cloud infrastructure can be used to improve your data center’s availability today:
1. Prevent Downtime by Reducing Resource Contention
Unplanned downtime occurs for many reasons, one of which is processes fighting over resources (resource contention). As a business grows, its demand for resources usually grows proportionally. Data center workloads are typically not architected to handle that variable demand, and outages may occur when workloads cannot handle peak load. That’s where cloud scaling and cloud bursting come into play. The Cloud gives data center managers a way to accommodate drastically changing demand by allowing additional workload instances to be created easily and automatically in the Cloud, without changing or customizing the applications. This is especially true of public clouds, since a data center can “burst,” or extend its infrastructure into a public cloud, when needed. Scaling out to the Cloud automatically when necessary alleviates resource contention, ensures that resources are available to accommodate spikes in demand, and prevents downtime, increasing overall availability.
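To make the mechanism concrete, here is a minimal sketch of a bursting control loop in Python: watch utilization, provision cloud capacity above a threshold, and release it when demand subsides. The helpers (get_cpu_utilization, provision_cloud_instance, release_instance) are hypothetical stand-ins for whatever monitoring API and cloud SDK a given environment exposes, and the thresholds are illustrative, not prescriptive.

```python
import time

# Hypothetical stand-ins for a real monitoring API and cloud SDK.
def get_cpu_utilization() -> float:
    """Return average CPU utilization (0-100) across the production pool."""
    return 42.0  # placeholder reading

def provision_cloud_instance() -> str:
    """Clone the workload image into the public cloud; return its instance ID."""
    return "burst-instance-1"

def release_instance(instance_id: str) -> None:
    """Tear down a burst instance once demand subsides."""

BURST_THRESHOLD = 80.0    # start bursting above 80% utilization
RELEASE_THRESHOLD = 40.0  # release burst capacity below 40%
burst_instances: list[str] = []

def autoscale_tick() -> None:
    """One iteration of the scale-out / scale-in decision."""
    load = get_cpu_utilization()
    if load > BURST_THRESHOLD:
        burst_instances.append(provision_cloud_instance())
    elif load < RELEASE_THRESHOLD and burst_instances:
        release_instance(burst_instances.pop())

while True:
    autoscale_tick()
    time.sleep(60)  # re-evaluate once a minute
```

A real implementation would add cool-down periods and a cap on burst capacity, but the decision logic stays this simple.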
2. Replicate workloads into the Cloud to create Asymmetric “Hot Backups”
Cloud infrastructure has created the wondrous ability to clone the complete workload stack (OS, applications, data). When combined with a technology that can decouple the workload stack from the underlying infrastructure, this “portable” workload can be imported into public or private clouds. In the case of downtime on the production workload, sessions can reconnect to the Cloud instance, where processing can resume, even if the production and recovery workloads are on differing infrastructures. The Cloud allows data centers to move beyond traditional ‘cold’ backups, where only data is protected and the OS and applications must be restored manually before the data is restored. The notion of the asymmetric “hot backup” is made possible by the Cloud because every workload stored as an image can be “booted” into a live, functioning virtual machine, which can take over processing while the production server is being repaired. This differs from traditional replication solutions, which require a duplicate set of hardware to take over should the production workload fail. Changes that occur to the production instance are replicated into the recovery instance on a periodic basis to keep it up to date. The Cloud also adds flexibility in sizing the recovery instance to save on costs: the hot backup can be “parked” when not in use, or a smaller instance can be provisioned.
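As a rough sketch of the periodic replication described above, the loop below wakes a parked recovery instance, pushes only the changes since the last pass, and parks it again. The start_instance/stop_instance stubs, the instance name, and the paths are all hypothetical; rsync stands in for whatever delta-replication mechanism a given solution uses.

```python
import subprocess
import time

# Hypothetical cloud SDK calls; substitute your provider's equivalents.
def start_instance(instance_id: str) -> None:
    """Boot the parked recovery image into a live VM."""

def stop_instance(instance_id: str) -> None:
    """'Park' the recovery instance so only storage is billed."""

RECOVERY_INSTANCE = "dr-recovery-01"  # hypothetical instance name
SYNC_INTERVAL = 6 * 60 * 60           # replicate deltas every six hours

def replicate_deltas(source: str, target: str) -> None:
    # rsync transfers only what changed since the last pass, keeping the
    # recovery image current at a low bandwidth cost.
    subprocess.run(["rsync", "-az", "--delete", source, target], check=True)

while True:
    start_instance(RECOVERY_INSTANCE)   # wake the parked image
    replicate_deltas("/srv/app/", f"{RECOVERY_INSTANCE}:/srv/app/")
    stop_instance(RECOVERY_INSTANCE)    # park it again until the next sync
    time.sleep(SYNC_INTERVAL)
```

Parking the instance between syncs is what makes the recovery copy cheap: compute is billed only during the brief replication window.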
3. Introducing the concept of “Failover” and “Failback”, typically reserved only for Critical Workloads
Software replication technology has existed for decades, used to protect workloads by replicating in “real time” between production and recovery instances. These setups are typically extremely expensive from a software and services perspective, as they often require a duplicate, identical recovery environment, doubling the cost of running and maintaining the infrastructure. Meanwhile, the rest of the workloads in the data center are under-protected, typically covered only by slow-to-restore image and tape schemes that take days to restore and demand significant manual effort. By automating the switching of users or processes from production to recovery instances, downtime can be reduced by up to 80% for the majority of under-protected workloads in the data center.
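A minimal sketch of that automated switch, assuming a hypothetical health endpoint and a hypothetical repoint_traffic hook (in practice, a DNS update or a load-balancer pool change), might look like this:

```python
import time
import urllib.error
import urllib.request

PRODUCTION_URL = "http://prod.example.internal/health"  # hypothetical endpoint
FAILURE_LIMIT = 3  # consecutive failed probes before declaring an outage

def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Probe a health endpoint; any error or non-200 response is a failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def repoint_traffic(target: str) -> None:
    """Hypothetical hook: update DNS or a load-balancer pool to 'target'."""
    print(f"Redirecting traffic to {target}")

failures = 0
active = "production"
while True:
    if active == "production" and not is_healthy(PRODUCTION_URL):
        failures += 1
        if failures >= FAILURE_LIMIT:
            repoint_traffic("recovery")    # failover to the Cloud instance
            active = "recovery"
    elif active == "recovery" and is_healthy(PRODUCTION_URL):
        repoint_traffic("production")      # failback once production recovers
        active = "production"
        failures = 0
    else:
        failures = 0                       # a healthy probe resets the count
    time.sleep(30)  # probe every 30 seconds
```

Requiring several consecutive failed probes avoids flapping on a single transient error, and the same loop fails back automatically once production reports healthy again.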
4. Using Dissimilar Infrastructure for “Off-Premises” Redundancy
For added protection, data centers should consider using dissimilar cloud infrastructure as part of their DR strategy as well. Cloud infrastructure can itself be prone to failure, and for data centers that require an extra level of protection, workloads should be replicated off-site to a different cloud provider. Physical-to-Physical, Physical-to-Cloud, or Cloud-to-Cloud replication can offer protection robust enough to overcome site-wide Denial-of-Service attacks, hacking, or natural disasters.
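As a closing sketch, replicating to two dissimilar providers can be as simple as fanning the same image out to multiple targets. The host names and paths below are hypothetical, and rsync again stands in for the replication engine:

```python
import subprocess

# Hypothetical targets on two dissimilar providers; keeping copies on both
# means no single provider outage or compromise strands the DR copy.
TARGETS = [
    "dr-provider-a.example.com:/backups/app/",
    "dr-provider-b.example.com:/backups/app/",
]

def replicate_everywhere(source: str) -> None:
    """Push the workload image to every off-premises target in turn."""
    for target in TARGETS:
        subprocess.run(["rsync", "-az", source, target], check=True)

replicate_everywhere("/srv/app/")  # hypothetical source path
```

The design choice here is deliberate redundancy across providers: even if one provider suffers a site-wide attack or outage, a current copy of the workload survives elsewhere.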