This discussion provides an understanding of what is included in a sound backup strategy for your Sql Server data warehouse application.

Most data warehouse applications require a different backup and recovery strategy than a typical online transaction processing (OLTP) system.   While OLTP systems may perform insert, update and delete transactions frequently throughout the day and may archive off older historical data periodically, data warehouses typically experience a single high volume nightly load to refresh the state of the data through the prior day and contain a large amount of static historical data that may span ten years or more.   These differences puts less dependency on transaction logging to restore the database to a complete state, but creates new challenges on managing disk space and processing times.

 Business Needs

Let’s consider the problem from the business perspective.   Here are common needs that the recovery process must serve:

  • Quickly be able to restore from the latest database snapshot state, which in most cases should be as of the prior days’ data load, and catch the data loads up to the current state.
  • If a backup file becomes corrupt, you want to be able to use another available backup file and have an easy way to restore from that backup, and then add any missing day’s data.
  • You want to ensure the IT staff can quickly perform a database recovery and easily recover the missing scheduled loads.
  • You want to ensure your backup strategy will not consume your available storage space unexpectedly, adversely effecting performance or availability of the application.
  • You also may want to be able to present these procedures in the context of a Service Level Agreement to business management so that expectations are clear in case a database recovery is required.

 

Considerations for Your Recovery Strategy

Before we can settle on a good recovery strategy for your situation there are a few variables that you should define.

  • What is the current size and the growth rate of your database? Can you project the size of a full database 1, 2, and 3 years from now?
  •  How long does a full backup take? And do you have adequate disk space to maintain three full backups at any one time? Given the growth rate of your database can you plan to have sufficient storage for backup files 3 years from now?
  •  Is your historical data truly static?   In some cases historical data is re-run through the ETL periodically to accommodate for adjustments and corrections made in the source systems.   If this is the case for your application, then those data sets should not be treated as static.   Identify how far back in time datasets in your data warehouse truly are static, and the time period when they are dynamic. To simplify, look across all of your data sets and determine the latest point in time that you are assured the data will never change.
  •  ETL processes may be designed to aid in the recovery process by extracting more data than is actually needed for a 24 hour update.   For example, if 96 hours of data is extracted then the ETL processes can skip four days of processing and still “catch up” to the current state, without human intervention. This depends on the ETL process design, the size of the extracts, and your processing performance, but assuming you have the capacity then there may be some data recovery built-in to the ETL process.

These variables can help you determine which of the following categories your data warehouse may fall into.

 Size and Complexity affects Your Backup Strategy

Let’s consider three approaches that depend upon the size and complexity of your data warehouse environment.

Small Size, Low in Complexity – if your data warehouse is on the small side and low in complexity, meaning data updates only occur during a single batch load, then configuring nightly full backups is an excellent option. In Sql Server you choose ‘Simple’ as your Recovery Model to turn off transaction logging. Be sure to store the backup files on different media than the database.   And make sure old full backups are deleted to effectively manage disk space.   You can achieve this using a Maintenance Cleanup Task within your DB Maintenance plan in Sql Server.

Many make the choice for the full backup to occur before the nightly load process runs. This places the recovery point before the nightly data load in case there are issues with the ETL process.   With this timing a restore can be performed followed by a re-run of the nightly ETL process.

Medium Size, Moderately Complex –   go to a periodic Full backup (either monthly or weekly) with Daily Differential Backups.   In Sql Server your Recovery Model will still be simple. Differential backups are completed at the page level. This will create both full backup files when those are run, and pages changed backups on a nightly basis.   The decision to choose monthly or weekly will be affected by the size of your nightly loads relative to your available disk space. Large daily load which affects many pages will result in larger differential backups, but if a large load which affect less pages will result in smaller differential backups. Remember to store both the full and differential files on a different media than the database.   This will help save storage space on the database server as well with limiting the amount of time for the backupsrestores to complete.

Large in Size and Complexity – For large data warehouses, segmenting the database into different file groups can help reduce the overhead and time needed to complete a full backup of the system. If you can safely group your tables in to logical partitions which can be converted into physical groupings of read only, occasional changes, and nightly changes you can plan a backup strategy for the file groups accordingly. Instead of performing a single full back up, you would instead perform backups of the individual file groups.

Best Practices of Any Sound Recovery Strategy

  • Become familiar with Sql Servers Database Maintenance Plans. Here you will find graphical tools to design and schedule your backup processes.
  • Keep your recovery approach simple and aligned with your business goals. SQL Server provides a wide variety of backup and restore features. Keep your approach simple while meeting your needs.
  • Schedule periodic test recoveries.   Ask staff, both experienced and inexperienced, to participate.
  • Store backup files on a different media than the databases
  • For more complex applications have a DBA experienced with large data warehouse applications design the initial maintenance plan.
  • Schedule backups for off-peak hours.
  • Compress backups to help reduce the overall footprint of your recovery strategy.

 

 Summary

A carefully thought-through backup and recovery process should provide the following benefits:

  • Quickly be able to restore from the latest database snapshot state, which in most cases should be as of the prior days’ data load, and catch the data loads up to the current state.
  • If a backup file becomes corrupt, you will be able to use another available backup file to restore from, with the option of re-applying any missed processing days via your ETL processes or your transaction log files.
  • Your IT staff should be able to quickly perform a database restore and recover the missing scheduled loads.
  • By measuring the size of your backup files and moving them to media other than the one the database resides on you should be able mitigate the risks of consuming available storage and completely losing backup files to hardware failure.
  • You can present these procedures in the context of a Service Level Agreement to business management so that recovery expectations are clear in case a database recovery is required.

While we recommend having an experienced Sql Server DBA assess your backup and recovery needs and design a maintenance plan for you, hopefully this discussion provides a better understanding of what might be included in such a plan and how to go about matching the right approach to your own data warehouse application.

 

Collaborators:  XTIVIA’s Sql Server Database Administrator, Kat Meadows and  Business Intelligence Solutions Architect, Steve Loyd