Data Backup and Redundancy
As data volumes grow and systems become more complex, off-the-shelf backup tools often fall short of expectations. Companies need to know that their data is stored securely and can be recovered quickly – using the minimum of space for the maximum benefit.
Blueberry Consultants has strong expertise in the design of custom backup solutions for Windows and Linux servers. We can create database backup and database redundancy systems that are tailored to your infrastructure and optimised to suit your data. Assessing your specific backup issues from the outset, we can design a solution that works optimally with your IT systems and strikes the right balance between reliability, accessibility, security and cost.
Introduction - Database Backup and Database Redundancy
IT systems are now so complex and data volumes so large that achieving reliable database backup and redundancy can be a big challenge for many companies. Users are more mobile, and the number of different systems used within organisations is increasing. With data volumes growing by up to 50% year on year, many backup systems are struggling to cope. At the same time, achieving high levels of redundancy can be expensive, and companies risk paying a premium for high-spec solutions that far outperform their needs.
Blueberry Consultants understands that every customer is different, both in terms of architecture and budget. We look at everything from simple off-the-shelf tools to solutions that are custom fit to your systems – with a constant focus on increasing capacity and reducing costs. With strong expertise in Linux system configuration, Cisco firewall administration, SQL Server management, VMware administration and Windows development, we have the technical skills to get the maximum benefit from even the most complex customer infrastructure. This wide range of skills is concentrated in a tight-knit team, allowing us to work with any customer infrastructure and design effective backup strategies for even the most complex systems.
Of course, many large organisations spend considerable amounts of money on backup tools. In many cases, these products and systems work perfectly well. In what situations might Blueberry’s skills deliver business benefit?
Optimisation of backup strategy to suit the data using custom compression and incremental transfer approaches.
Many standard database backup systems are quite simple in approach – they effectively just copy files. But many file sets contain redundancy which can be exploited to dramatically reduce backup sizes and time to copy.
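To illustrate the principle, the sketch below (all names are hypothetical, not Blueberry's actual implementation) splits data into fixed-size blocks and stores each distinct block only once, so redundancy within a file set directly shrinks the backup:

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks; real systems often use variable, content-defined chunks

def backup_blocks(data: bytes, store: dict) -> list:
    """Split data into blocks, store only previously unseen blocks,
    and return a manifest of block hashes describing the file."""
    manifest = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:      # redundant blocks are stored only once
            store[digest] = block
        manifest.append(digest)
    return manifest

def restore(manifest: list, store: dict) -> bytes:
    """Rebuild the original data from its manifest."""
    return b"".join(store[d] for d in manifest)

store = {}
original = b"A" * 8192 + b"B" * 4096   # three blocks, but only two are distinct
manifest = backup_blocks(original, store)
```

Here the 12 KB input needs only 8 KB of stored blocks, because two of the three blocks are identical; the same mechanism lets a second day's backup store only the blocks that changed.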
Handling special cases – e.g. backup of extremely large files over unreliable links.
Again, standard products may fail to handle extreme cases. We can design solutions that handle these edge cases gracefully.
Off-site backup or replication of databases.
Conventional database backup strategies tend to make daily backups. For MS SQL Server, we can use transaction log shipping to achieve offsite backup with 15-minute resolution.
Integration of virtualisation technologies with backup systems.
Many companies are moving some of their servers onto virtualised platforms such as VMware. This provides a strong opportunity to achieve backup at the level of the virtual machine, thus eliminating human error in deciding which files to copy.
Special requirements can often be met at low cost using off-the-shelf free tools.
Using only free Linux tools, it’s possible to create a simple backup system that gives users easy access to versions of their files from previous days – without needing to make separate copies. In addition, Linux can take advantage of copy-on-write filesystems such as Btrfs and ZFS, which store snapshots of old data as new data is written; any data shared between old and new snapshots is stored only once, making efficient use of storage.
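The hardlink technique behind such a system (used by, for example, rsync’s `--link-dest` option) can be sketched in a few lines of Python; `snapshot` and its change test are illustrative simplifications for a flat directory:

```python
import os
import shutil
import tempfile

def snapshot(source_dir: str, prev_snap: str, new_snap: str) -> None:
    """Back up source_dir into new_snap: hard-link files unchanged since
    prev_snap (so they share on-disk storage), copy changed or new files.
    A simplified, flat-directory version of the rsync --link-dest idea."""
    os.makedirs(new_snap)
    for name in os.listdir(source_dir):
        src = os.path.join(source_dir, name)
        dst = os.path.join(new_snap, name)
        prev = os.path.join(prev_snap, name) if prev_snap else None
        unchanged = (prev is not None and os.path.exists(prev)
                     and os.path.getmtime(prev) >= os.path.getmtime(src)
                     and os.path.getsize(prev) == os.path.getsize(src))
        if unchanged:
            os.link(prev, dst)        # no extra space used for this version
        else:
            shutil.copy2(src, dst)    # copy2 preserves the modification time

base = tempfile.mkdtemp()
src_dir = os.path.join(base, "data")
os.makedirs(src_dir)
with open(os.path.join(src_dir, "report.txt"), "w") as f:
    f.write("day one contents")

snapshot(src_dir, None, os.path.join(base, "monday"))
snapshot(src_dir, os.path.join(base, "monday"), os.path.join(base, "tuesday"))
# The unchanged file occupies storage once but appears in both snapshots.
```

Each snapshot directory looks like a complete copy to the user, but unchanged files are the same inode on disk, which is exactly the “versions without separate copies” behaviour described above.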
Backup vs Redundancy vs Archiving
The technological overlap between database backup, database redundancy and archiving can often lead to confusion, but each has a different role to play in streamlining and safeguarding data. Backups essentially create a second copy of data at specific points in time, ideally keeping multiple historic copies. Redundancy establishes a straight copy of an entire system, ready to take over if the original system fails. Backup offers a certain level of redundancy, and redundancy a basic level of backup, but neither is a stand-alone solution.
Archiving makes a primary copy of selected data with the aim of retaining data in the long term. Not all of the data contained in a backup will ultimately end up in an archive, so archiving is rarely an adequate backup solution in itself, but as a complementary approach it can considerably optimize the data storage process.
Most backup strategies rely on a combination of backup, redundancy and archiving. An important factor to bear in mind when planning a backup schedule is prioritization of data. Not all data is created equal and a tiered backup strategy that restores the most critical applications first will get you back in business faster and cut data storage costs.
Full
A full backup copies every file in a system. Restore times are fast but backups are time-consuming and space-intensive so scheduling and data prioritization are important considerations.
Differential / Incremental
Differential and incremental backups fill in the gaps between full backups, storing any changes to data. They require a fraction of the server CPU cycles, bandwidth and storage space. The risk of data loss is greater than with full backups and restore times are slower, but Blueberry can use snapshot technology such as Amazon EBS to rebuild images more rapidly.
Synthetic
A synthetic backup consolidates a full backup and subsequent incremental backups into a single file. Recovery is fast, using fewer server cycles and less bandwidth.
Continuous Data Protection
In contrast to scheduled backup, continuous data protection (CDP) continuously tracks data modifications. CDP saves all changes and data can be recovered rapidly from any point in the past. Bandwidth burden is considerable but using compression techniques and block-level incremental backup, Blueberry can significantly reduce this load.
Mirroring
Mirroring is a redundancy solution that literally mirrors your systems by making a straight copy of data to two or more drives simultaneously. Thereafter only new and modified files are copied. Unlike a full backup, data is not usually compressed so recovery is faster.
Tools and Techniques
Rsync
A very commonly used tool for incremental backup of files across the WAN. Most commonly used on Linux, but will also work on Windows.
Duplicity
A more specific Linux file backup tool built on top of Rsync technologies. Has superior handling of network faults, and can use less CPU.
MD5 / SHA checksums
Modern checksum algorithms are used extensively in backup and file transfer to identify files and confirm successful transfer. We have access to optimised libraries that perform these calculations particularly quickly.
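As a simple illustration (standard Python `hashlib`, not the optimised libraries mentioned above), a checksum can be computed over a stream in fixed-size chunks, so even very large files never need to be held in memory:

```python
import hashlib
import io

def file_checksum(stream, algorithm: str = "sha256",
                  chunk_size: int = 65536) -> str:
    """Compute a checksum incrementally in fixed-size chunks, so that
    arbitrarily large files can be hashed in constant memory."""
    h = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

# In practice the stream would be an open file; BytesIO keeps this self-contained.
digest = file_checksum(io.BytesIO(b"backup payload"))
```

Comparing such digests before and after transfer is the standard way to confirm a file arrived intact, and comparing digests between source and destination identifies which files need copying at all.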
BB FTE (File Transfer Engine)
A Blueberry-developed library that supports reliable block-based transfer of binary files over HTTP/S, with per-block checksums and strong resume capabilities. FTE is superior to file transports such as FTP and plain HTTP because it detects block level errors and retries at the block level.
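FTE itself is proprietary, but the block-level verify-and-retry idea can be sketched as follows; `flaky_channel` simulates an unreliable link, and all names here are hypothetical rather than part of the real FTE API:

```python
import hashlib
import random

BLOCK = 1024  # block size in bytes; illustrative only

def send_with_retries(data: bytes, channel, max_retries: int = 5) -> bytes:
    """Transfer data block by block, verify each block's checksum on
    arrival, and retry only the blocks that arrive corrupted."""
    received = []
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        expected = hashlib.md5(block).hexdigest()
        for _ in range(max_retries):
            candidate = channel(block)                 # one transfer attempt
            if hashlib.md5(candidate).hexdigest() == expected:
                received.append(candidate)             # block verified
                break
        else:
            raise IOError(f"block {i // BLOCK} still corrupt "
                          f"after {max_retries} attempts")
    return b"".join(received)

rng = random.Random(42)

def flaky_channel(block: bytes) -> bytes:
    """Simulated link: roughly 30% of transmissions flip the first byte."""
    if rng.random() < 0.3:
        return bytes([block[0] ^ 0xFF]) + block[1:]
    return block

payload = bytes(range(256)) * 16          # a 4 KB test payload
result = send_with_retries(payload, flaky_channel)
```

The key property, as with FTE, is that a corrupted or interrupted transfer costs only the affected blocks, not a restart of the whole file.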
MS SQL Server Transaction Log Shipping
This is an MS SQL Server feature which allows efficient continuous backup of SQL Server databases. With correct configuration, SQL Server will write out a file containing all the changes to a database every 15 minutes. This file can be sent to an offsite server and restored. The backup server does not require an MS SQL licence.
VMware
A popular virtualisation tool. VMware allows multiple virtual PCs to run on a single host machine. The relevance for backup is that the virtual machines can be suspended and then copied to a backup server.
DRBD
A Linux technology used to allow reliable replication of disk volumes across a LAN. We’ve used this to establish auto-failover for VMware servers.
Backing Up To The Cloud
Cloud-based services such as Amazon’s EC2 offer a cost-efficient and scalable option for offsite backup. Users can easily access data remotely and businesses can expand their storage requirements as needed. There are still some issues to consider when choosing a solution. Pricing models vary from provider to provider, ranging from tiered pay-as-you-go options to basic flat fees. Some vendors may also charge for additional backup services. Sufficient bandwidth is a crucial consideration, although many providers will only send changed data over the network after the first full backup. Data security is also a valid concern, and Service Level Agreements (SLAs) should be carefully scrutinized to ensure the proper measures are in place.
The same is true for all backup offerings, particularly when delivered as part of a package by hosting companies or Internet Service Providers. It pays to find out exactly the level of backup on offer. SLAs should stipulate specific levels of data availability and set timeframes for recovery.
Common Database Backup Issues
Live data presents a number of challenges for database backup, particularly in the case of database files that are continuously being written to. Ensuring no changes are lost in the backup process can require considerable configuration. Blueberry can apply a number of optimization techniques to ensure continuous backup of live data.
Backing up Web or database servers can also be tricky. Traditionally, a list of folders is selected for backup, but this leaves the process open to human error. A full backup is the failsafe option but with storage space at a premium, solutions that combine one full backup with subsequent incremental backups offer a cost-efficient alternative. One example is the incremental snapshot system provided with the Cloud-based Amazon EBS service.
Backing up in virtual environments demands a different approach from physical servers. The critical factors are storage availability, configuration and management. If these are properly addressed, the benefits can be considerable. Blueberry has experience in engineering unusual backup systems in virtual environments and mirroring virtual machines from one server to another.
Achieving high levels of redundancy usually requires two servers. This can be costly and companies should consider whether instant redundancy is really vital for their business. With solutions such as Amazon EBS, for example, a new system can be set up from a snapshot in just 30 minutes without the need for two servers. Redundancy of database servers is more complex to configure and usually requires some level of mirroring or replication. Blueberry Consultants leverages a range of tools and techniques to optimize this process. These include custom compression, encryption, data deduplication and incremental transfer approaches.
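As a minimal illustration of the first of those techniques (standard zlib compression here, not Blueberry's custom schemes), compressing a payload before transfer can cut the replication cost dramatically when the data is redundant:

```python
import zlib

def compress_backup(data: bytes, level: int = 9) -> bytes:
    """Compress a backup payload before it is transferred or stored."""
    return zlib.compress(data, level)

def restore_backup(blob: bytes) -> bytes:
    """Inverse operation, applied at restore time."""
    return zlib.decompress(blob)

payload = b"customer-record;" * 4096      # 64 KB of highly repetitive data
blob = compress_backup(payload)
ratio = len(blob) / len(payload)          # far below 1.0 for redundant data
```

Real database pages and log files are rarely this repetitive, but structured data generally compresses well, which is why compression pairs naturally with deduplication and incremental transfer in replication pipelines.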
Testing is often the missing link in database backup strategies. The time and money invested in backing up data is too great to risk your recovery plan failing at the critical moment. Companies should schedule regular testing of their backup and restore processes. Cloud setups offer the most convenient test environment with the capacity to restore to a second server instance temporarily. Backup reporting tools can also help safeguard your data by tracking backup failures and determining their causes.
These reporting tools look at the whole data protection lifecycle. This type of holistic approach is perhaps the key to optimising backup on an ongoing basis. The big picture is critical. You need to know your business, know your provider and know your limits in terms of budget, bandwidth and storage space before you go looking for a solution. This not only means analysing your data volumes and usage, but also assessing how well your current backup tools are aligned with your business priorities. And don’t let technology stand in the way of these priorities. Whatever works best for your company can be made to work best for your IT systems.
Case Study Example – ABC Ltd
ABC Ltd has a heterogeneous collection of Windows and Linux servers located at a central data centre, including a number of systems running VMware, and some SQL Servers. The company needed to demonstrate to clients that they had a disaster recovery plan in place, and that all key data was backed up offsite.
Blueberry designed a backup plan based on three different data types used within the ABC network – conventional user files, SQL databases and large virtual machine images. A single new server was installed at a remote location, and configured with a Cisco firewall and a dedicated 24 Mbps DSL line. For the most important user files, a daily sync job was used to replicate the files securely over SSH. For the SQL databases, transaction log shipping was used in conjunction with Blueberry’s FTE system to replicate the databases to a parallel SQL server running on the backup system. For the large VM images, duplicity was configured to run on a slow incremental cycle, establishing backups of 200 GB of VM images on a rotating monthly basis.