This article is part 7 of a 10-part series explaining the most common mistakes that I have seen with SQL Server performance and resiliency. This post is not all-inclusive.
Most common mistake #7: Disaster Recovery Plans
Often when people hear “disaster recovery plan,” their first concern is cost. Disaster recovery plans don’t have to be expensive; the expense comes from strict requirements.
About 10 years ago, when I started as an independent consultant, one of my first clients contacted me to help build a disaster recovery plan. After our initial discussion I learned that some consulting firms had forecast one hundred thousand dollars for their solutions. Many large companies would look at that number as a bargain; however, this client’s company made less than 50k a year. The data changed about once a year, and if the database was down for a week or two, it was questionable whether anyone would even notice. It was easy to see that the hundred-thousand-dollar solution was extremely over-engineered for this client.
Don’t ignore the basics
Disaster recovery solutions should start with two basic questions: what is the recovery point objective, and what is the recovery time objective?
• RPO – Recovery Point Objective – To what point must the database be restored after a disaster? Another way to ask this question would be, how much data can be lost?
• RTO – Recovery Time Objective – How much time can elapse after the disaster has occurred before the system is back online? Or, how long can your system be down?
Depending on these answers, additional questions will arise; however, these two questions help determine which potential solutions will work. SQL Server offers a number of solutions, from transaction log shipping to AlwaysOn Availability Groups.
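As a rough illustration of how the two answers can be checked against a simple backup-and-restore strategy, here is a minimal Python sketch. The names (Objectives, meets_objectives) and the sample numbers are made up for this example. It also assumes that worst-case data loss is roughly one transaction log backup interval, which ignores how long the backup itself takes to run and copy.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Objectives:
    rpo: timedelta  # maximum tolerable data loss
    rto: timedelta  # maximum tolerable downtime


def meets_objectives(objectives, log_backup_interval, estimated_restore_time):
    """Return (rpo_ok, rto_ok) for a simple backup-and-restore strategy.

    Assumption: worst-case data loss is about one log backup interval.
    """
    rpo_ok = log_backup_interval <= objectives.rpo
    rto_ok = estimated_restore_time <= objectives.rto
    return rpo_ok, rto_ok


# Hypothetical targets and schedule, for illustration only.
targets = Objectives(rpo=timedelta(minutes=15), rto=timedelta(hours=4))
result = meets_objectives(
    targets,
    log_backup_interval=timedelta(minutes=30),
    estimated_restore_time=timedelta(hours=2),
)
print(result)  # (False, True): the 30-minute interval misses the 15-minute RPO
```

In this made-up case the restore time fits the RTO, but the log backup interval would have to shrink to 15 minutes or less, or a different solution would be needed, to meet the RPO.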
Pay Attention to the Details
Whenever I visit a datacenter for a client, I make sure to take some time to review how the cages are wired. On more than one occasion I have seen servers with redundant power supplies that have both power cords plugged into the same circuit. This configuration will protect you if one of the power supplies goes bad; however, if the circuit goes down, the redundant power supply isn’t any help.
When executing a disaster recovery plan, ensure all the small details are double-checked. If there is a single point of failure in the system, Murphy is going to find it.
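The power-cord check can also be run against an inventory list instead of only by eye. The sketch below is a minimal Python example that assumes you keep a list of (server, circuit) pairs, one per power supply. The server names, circuit labels, and the function find_single_circuit_servers are hypothetical.

```python
from collections import defaultdict


def find_single_circuit_servers(power_feeds):
    """Return servers whose power supplies all share one circuit.

    power_feeds: iterable of (server, circuit) pairs, one per power supply.
    """
    circuits_by_server = defaultdict(set)
    for server, circuit in power_feeds:
        circuits_by_server[server].add(circuit)
    # Fewer than two distinct circuits means the circuit is a single point of failure.
    return sorted(
        server
        for server, circuits in circuits_by_server.items()
        if len(circuits) < 2
    )


# Hypothetical inventory: sql02 has both cords on circuit A.
feeds = [
    ("sql01", "A"), ("sql01", "B"),
    ("sql02", "A"), ("sql02", "A"),
]
at_risk = find_single_circuit_servers(feeds)
print(at_risk)  # ['sql02']
```

A server with only one power supply is flagged as well, since it also depends on a single circuit.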
Test
The most common mistake I see on a regular basis with disaster recovery solutions is the lack of testing. Some testing is better than no testing, but the best testing is testing that mimics actual disasters. If there is a power outage for your servers and you have 5 minutes to get everything moved, do you know the steps to complete before the uninterruptible power supply loses its charge? What steps must you take if you don’t have the 5 minutes? I was working with the chief technology officer of a major education facility, and another vendor was telling him he was safe and didn’t have to worry about it. The contract was for a 15-minute recovery point. We reached out to the vendor and asked them to prove it.
The lesson here: perform regular, realistic tests. If they don’t work, find out why and make the needed changes.
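One way to make a test produce a clear pass or fail is to record three timestamps during the drill and compare them with the objectives. The Python sketch below is a minimal example. The function evaluate_drill, the timestamps, and the 2-hour RTO are assumptions for illustration; only the 15-minute recovery point comes from the story above.

```python
from datetime import datetime, timedelta


def evaluate_drill(outage_start, last_recovered_change, service_restored, rpo, rto):
    """Compare what a drill actually achieved with the RPO and RTO targets."""
    data_loss = outage_start - last_recovered_change  # work that was lost
    downtime = service_restored - outage_start        # time the system was down
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical drill results.
report = evaluate_drill(
    outage_start=datetime(2024, 1, 1, 12, 0),
    last_recovered_change=datetime(2024, 1, 1, 11, 40),
    service_restored=datetime(2024, 1, 1, 13, 30),
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=2),
)
print(report["rpo_met"], report["rto_met"])  # False True
```

In this made-up drill, 20 minutes of changes were lost against a 15-minute recovery point, so the test fails on RPO even though the system came back within the assumed RTO.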
If you have questions about SQL Server disaster recovery plans or need assistance with SQL Server in general, reach out to us at XTIVIA (https://www.xtivia.com/contact-us), and we can assist you with adding resiliency for your business. Please don’t miss my other blogs regarding this topic.
Top 10 Tips for SQL Server Performance and Resiliency
1. Improper Backups
2. Improper Security
3. Improper Maintenance
4. Not having a baseline
5. Max Memory settings
6. Change History