June 07, 2017
Fast Failback BMR: Bare Metal Restore without Downtime
If you’ve been in IT for any length of time, you know that bare metal restores can be challenging. A bare metal restore is the process of restoring data, applications, and an operating system to a completely unconfigured server. This is typically necessary when a primary server must be replaced with an entirely new server or when one or more hard drives need to be replaced. Server virtualization has considerably reduced the challenges associated with bare metal restores, particularly when it comes to failover. However, the failback process still requires significant scheduled downtime.
“With traditional bare metal restores, you have to [stop the secondary virtual machine] and copy data back to the production server when you need to fail back,” said CEO Austin McChord in his keynote address Tuesday. “If you have hundreds of gigs or terabytes of data it can take hours. I’ve heard lots of stories from managed service providers about long, sleepless nights waiting for BMR processes to happen because their customer is down.”
That’s why we announced a new feature on SIRIS devices called Fast Failback™, which is aimed at eliminating the downtime associated with bare metal restores. When a disaster recovery scenario requires a bare metal restore, Fast Failback BMR shortens the failback step by combining the Rescue Agent and Bare Metal Restore (BMR) features of a SIRIS device.
When starting a Local Virtualization, SIRIS users have the option to create a Rescue Agent VM from a selected snapshot. This essentially creates a fork in the backup chain. The original backup chain is preserved, but a second chain is created to track the changes to data that occur on the Rescue Agent. “That’s the key: users can continuously take snapshots of the Rescue VM while it is running to track the changes on the VM,” said Principal Engineer Phil Heckel during a demo of the Fast Failback BMR process yesterday.
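To make the idea of a forked backup chain concrete, here is a minimal Python sketch. It is a toy model only, not the SIRIS implementation: the `Snapshot` class, the `lineage` helper, and the snapshot names are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snapshot:
    """Hypothetical snapshot record: a name plus a link to its parent."""
    name: str
    parent: Optional["Snapshot"] = None


def lineage(snap: Optional[Snapshot]) -> List[str]:
    """Return snapshot names from the root of the chain down to `snap`."""
    names = []
    while snap is not None:
        names.append(snap.name)
        snap = snap.parent
    return list(reversed(names))


# Original backup chain of the production server (illustrative names).
prod_1 = Snapshot("prod-1")
prod_2 = Snapshot("prod-2", prod_1)
prod_3 = Snapshot("prod-3", prod_2)

# Creating a rescue VM from the selected snapshot prod-2 forks the chain:
# new snapshots hang off prod-2, and the original chain is left untouched.
rescue_1 = Snapshot("rescue-1", prod_2)
rescue_2 = Snapshot("rescue-2", rescue_1)

original_chain = lineage(prod_3)
rescue_chain = lineage(rescue_2)
```

In this model both chains share the history up to the selected snapshot, and only the rescue chain grows while the virtual machine is running.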
Fast Failback BMR allows users to perform a Bare Metal Restore from the snapshot of the original backup chain while further backup operations continue on the Rescue Agent. When the initial BMR is complete, changes on the virtual machine captured by the Rescue Agent can be mirrored to the production machine, ensuring that no data is lost. Once the production server and Rescue Agent are synchronized, users can fail back with very little downtime by simply shutting down the virtual server and transferring just the last delta from the Rescue Agent.
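The sequence described above (restore the base snapshot, mirror the accumulated changes, then transfer one final delta) can be sketched in a few lines of Python. This is an illustrative simulation under simplifying assumptions, not how a SIRIS device works internally: disks are modeled as dictionaries of block number to contents, deleted blocks are ignored, and the function names `diff` and `fast_failback` are made up for this example.

```python
def diff(old, new):
    """Toy delta: blocks in `new` that are missing from or differ in `old`."""
    return {blk: data for blk, data in new.items() if old.get(blk) != data}


def fast_failback(base_snapshot, rescue_states):
    """Simulate failback onto a fresh production disk.

    base_snapshot: disk state restored by the initial bare metal restore.
    rescue_states: successive snapshots of the rescue VM; the last one is
                   assumed to be taken after the VM has been shut down.
    Returns the final production disk and the size of each delta applied.
    """
    production = dict(base_snapshot)      # initial BMR from the original chain
    last_synced = base_snapshot
    delta_sizes = []
    for state in rescue_states:
        delta = diff(last_synced, state)  # only blocks changed since last sync
        production.update(delta)          # mirror them to production
        delta_sizes.append(len(delta))
        last_synced = state
    return production, delta_sizes


base = {0: "os", 1: "app", 2: "data-v1"}
rescue_states = [
    {0: "os", 1: "app", 2: "data-v2", 3: "log-a"},  # taken while the BMR runs
    {0: "os", 1: "app", 2: "data-v2", 3: "log-b"},  # taken after VM shutdown
]
production, delta_sizes = fast_failback(base, rescue_states)
```

In this example the bulk of the data moves while the rescue VM is still serving users, and the final transfer covers only the single block that changed after the previous sync, which is why the last step can be short.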
“The downtime you incur when you switch back to the production machine is just a single reboot,” said McChord.