Everyone likes to talk about the 5 or 10 pitfalls and challenges of Disaster Recovery planning; see, for example, this piece from Jeff Byrne of the Taneja Group: http://searchsmbstorage.techtarget.com/tip/Common-pitfalls-in-disaster-recovery-procedures. I say there are simply 3: Planning, Don’t just do the easy stuff, and Don’t set and forget.
Planning
It is critical to architect your Disaster Recovery solution with a focus on each application’s importance to your business. Don’t assume the classic assignments of Tier 1 and Tier 2 applications. Discuss with the business owners how they use each application and how important it is to delivering revenue and critical services.
Understand the impact of lost data for each application and, more importantly, for each file associated with an application. This might seem like a daunting task, but it will save your business and reduce the cost of your DR implementation. Managing your DR at the application file level saves time and money by avoiding unneeded synchronization cycles and targeting only the critical files that affect the viability of the application and its performance. This reduces the burden on the WAN and, in a cloud DR implementation, saves considerably on storage while ensuring a tight Recovery Point Objective (RPO) is met.
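To make the file-level idea concrete, here is a minimal sketch in Python of selecting only the critical files for replication. The patterns and file paths are hypothetical; which files are actually critical has to come from your conversations with the application owners, and a real DR product will have its own way of expressing this.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for the files an application cannot live without.
# These are illustrative assumptions, not a recommendation for any product.
CRITICAL_PATTERNS = ["*.db", "*.wal", "config/*.yaml"]


def files_to_replicate(paths, patterns=CRITICAL_PATTERNS):
    """Return only the paths that match at least one critical pattern."""
    return [
        path
        for path in paths
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    candidates = [
        "data/orders.db",
        "logs/app.log",
        "config/app.yaml",
        "tmp/cache.bin",
    ]
    # Logs and caches are skipped, so they never consume WAN or DR storage.
    for path in files_to_replicate(candidates):
        print(path)
```

Everything that does not match a pattern is simply left out of the synchronization cycle, which is where the WAN and storage savings come from.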
Understand each application’s tolerance for downtime after a DR event. How quickly do you need an application up and running? Maybe 10% of your business’s applications need sub-5-minute Recovery Time Objectives (RTO). The rest may tolerate as much as 24 hours. Balancing cost and availability lets you selectively determine the staging and provisioning of each application’s DR instance. An application that can tolerate an RTO greater than an hour can live in stasis, saving compute resources, so that only the hyper-critical applications consume compute and networking overhead on an ongoing basis.
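As a rough illustration of that decision, the sketch below sorts applications into a running DR instance or one held in stasis based on their RTO. The one-hour cut-off comes from the paragraph above; the application names, RTO values, and the two mode labels are made up for the example.

```python
from dataclasses import dataclass

# Assumption: apps with an RTO greater than one hour can sit in stasis.
STASIS_THRESHOLD_MINUTES = 60


@dataclass(frozen=True)
class App:
    name: str          # hypothetical application name
    rto_minutes: int   # RTO agreed with the business owner


def standby_mode(app: App) -> str:
    """Return "stasis" if the app tolerates more than the threshold, else "running"."""
    if app.rto_minutes > STASIS_THRESHOLD_MINUTES:
        return "stasis"
    return "running"


def plan(apps):
    """Map each application name to its DR standby mode."""
    return {app.name: standby_mode(app) for app in apps}


if __name__ == "__main__":
    apps = [
        App("order-entry", 5),
        App("payroll", 240),
        App("intranet-wiki", 1440),
    ]
    for name, mode in plan(apps).items():
        print(f"{name}: {mode}")
```

In practice you would likely want more than two tiers, but even this split makes it obvious which DR instances are paying for compute around the clock.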
What is your overall IT architecture? In a hybrid environment you will have physical and virtual servers across data centers, public clouds, and private clouds. After a DR event, where do you want each application’s primary instance to live? Does it remain in the cloud, or does it need to rehydrate back to a physical instance and location? Answering these questions lets you make decisions about how you provision each application’s DR instance. This is important for ensuring performance expectations are met, and it helps in balancing data center and cloud resource costs.
Don’t just do the easy stuff
I talk with so many customers who tell me they have 80% of their applications in a Disaster Recovery plan. When I ask about the other 20%, they tell me: “It was the hard stuff”, “the applications are on physical servers”, or “It is a legacy application”. When I ask how critical those missing applications are, customers tell me they are the crown jewels, and that they are relying on simple backup copies or, worse yet, on tape! This is insane. There is no application that can’t be included in a Disaster Recovery plan, and it doesn’t have to be as hard as you think. There are Cloud Management Platform (CMP) solutions that include Disaster Recovery as a feature. These CMP solutions accommodate all applications, physical and virtualized, in data centers and clouds. Protect your crown jewels, not just the easy stuff.
These CMP solutions also save your IT department time and resources. They eliminate multiple point products and their support costs. They simplify the environment and reduce resource needs. They also provide cloud resource management, improving cloud governance and cost management. Some even let application owners self-provision cloud resources while ensuring application templates and distribution adhere to IT and business standards.
Don’t set and forget
Another frightening trend in IT departments is the set-and-forget habit. Since (hopefully) DR events don’t happen all the time, testing the system isn’t a high priority. Some DR implementations are disruptive and require downtime to test, so upgrades and maintenance take precedence. Just like leaving 20% of applications out of the DR plan, not testing on a quarterly basis is ludicrous. Take a lesson from anyone who has attempted to restore a critical application from a tape backup and had it fail. There are solutions that are non-invasive and non-disruptive, so there is no excuse not to Test, Test, Test, and Test.
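One simple way to keep yourself honest is to track when each application’s DR instance was last tested and flag anything that has slipped past a quarter. The sketch below is only an illustration: the application names and dates are invented, and "quarterly" is approximated as 90 days.

```python
from datetime import date, timedelta

# Assumption: "quarterly" is treated as a 90-day window.
MAX_TEST_AGE = timedelta(days=90)


def overdue_tests(last_tested, today):
    """Return sorted names of apps never tested or tested more than 90 days ago.

    last_tested maps an application name to the date of its last DR test,
    or to None if it has never been tested.
    """
    return sorted(
        name
        for name, tested_on in last_tested.items()
        if tested_on is None or today - tested_on > MAX_TEST_AGE
    )


if __name__ == "__main__":
    # Hypothetical test log.
    log = {
        "order-entry": date(2024, 6, 1),
        "payroll": date(2024, 1, 15),
        "legacy-erp": None,
    }
    for name in overdue_tests(log, today=date(2024, 6, 30)):
        print(f"DR test overdue: {name}")
```

An application that has never been tested is treated as overdue, which is exactly the case the crown-jewel legacy systems tend to fall into.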
In Conclusion
Plan, and build a Run Book so the process for each application is documented rather than handled with a one-size-fits-all approach. Don’t leave any application behind; this is your company’s lifeblood, and there is no excuse for not having every application protected. Inspect what you expect. Watching the clock tick away with fingers crossed, waiting for your DR instance to come alive, is not the time to test the system, but it probably will be the time to update your resume.