
Disaster recovery - when things go Mad Max

Mathew Matt , Principal HANA Architect, Bluefin Solutions

Published Thursday 22 December 2016

Let’s be brutally honest and face it: at some point in time, things will go bad. It’s happened to many corporations, large and small. The question is simple: are you prepared for a worst-case scenario? If your primary datacenter were pummelled, destroyed, obliterated, incinerated, mashed, flattened, or otherwise rendered inoperable, could you recover? In short: can you still do business in the post-apocalyptic wasteland? Can you still sell widgets?

Admittedly, this is all pretty dramatic, but I had to try to hook you somehow, didn’t I? Still, it’s not inconceivable that a true disaster, natural or otherwise, could render your primary datacenter completely useless. With a little foresight and planning, you can sleep a little better at night.

We went through the concepts of High Availability (HA) and Disaster Recovery (DR) in the first post of this series, and then covered, at a high level, how to make your SAP applications highly available. Now it’s time to plan for when things really go bad; that is, when the datacenter and all of the servers within it are, effectively, toast.

… But this is hard 

Building a truly effective disaster recovery solution for any application can be an arduous process, at best. Building one for SAP can be brutal. Each of the components that we considered in the HA guide also has to be replicated here. Add to that the implications of name services (DNS) and the network throughput required for the replication pieces. Alongside these, you have segmentation: have you ever had two production instances with the same name up at the same time? It’s foul! Finally, there are all sorts of other traps that can cause a level of havoc unprecedented within your organization. Be extremely thoughtful about everything that must take place when you first test your DR strategy.

DR on the DB… 10-4 

Disaster recovery on an SAP HANA scale-up database is pretty straightforward. Essentially, you set up a server that’s geographically disparate (i.e. far away) from your primary datacenter, then use asynchronous HANA system replication to create a copy of the database at that location. With the proper network connection and throughput, you can have a fully synchronized database in a matter of hours. The asynchronous copy is not functional for queries, as it only keeps the row store loaded into memory. What this means, however, is that I can put this secondary (or even tertiary) copy of the database onto, say, a second instance on my Quality Assurance system, and it will have a comparatively minimal memory footprint.
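As a rough sketch, registering a DR secondary with asynchronous system replication looks something like this. The hostnames, site names, and instance number 00 below are hypothetical; verify the exact `hdbnsutil` options against your HANA revision before using them:

```shell
# On the primary, as the <sid>adm user: enable system replication.
hdbnsutil -sr_enable --name=SITE_PRIMARY

# On the DR host (HANA installed, database stopped): register it as an
# asynchronous secondary pointing at the primary.
hdbnsutil -sr_register --remoteHost=hana-primary --remoteInstance=00 \
    --replicationMode=async --name=SITE_DR

# Start the secondary and watch the replication state as it catches up.
HDB start
hdbnsutil -sr_state
```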

Things to keep in mind: we cannot perform a proper takeover of the primary database without a little manual intervention. In a disaster, we generally try to avoid automatic failovers because we want to exert control over things like DNS, the network, and the order in which the pieces of our environment come back online.
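When you do declare a disaster, the takeover itself is a single command on the secondary; the real work is everything around it. A hedged sketch (again, verify the command form against your HANA revision):

```shell
# On the DR secondary, as the <sid>adm user: promote it to primary.
hdbnsutil -sr_takeover

# Only then, and in a controlled order: repoint DNS/virtual hostnames,
# bring up the application servers, and validate connectivity.
```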

Clustering Inside Linux: at this point, neither SUSE Linux (SLES) nor Red Hat (RHEL) supports a clustered replication chain (i.e. HA Node 1 -> HA Node 2 -> DR Node). While such a chain can be set up, and can work, we lose some of the automation that the cluster resource agents provide. The best advice I can give anyone here is: test, test, and when you think you’re done, test some more. Practice failovers from nearly every scenario, and rehearse how to recreate the replication chain after each failover. You’ll thank me for it later.
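For the HA pair itself, the SLES SAPHanaSR resource agent shows where that manual control lives. A configuration fragment like the one below (the SID `HDB` and instance number are illustrative values, not from any real system) keeps the cluster from automatically re-registering a former primary, which is exactly the kind of checkpoint you want before a DR node enters the picture:

```shell
# crm configure fragment (SLES HAE with SAPHanaSR) -- illustrative values only.
# AUTOMATED_REGISTER=false forces a human to re-register the old primary
# after a takeover, preserving manual control over the replication chain.
primitive rsc_SAPHana_HDB_HDB00 ocf:suse:SAPHana \
    params SID=HDB InstanceNumber=00 \
           PREFER_SITE_TAKEOVER=true \
           AUTOMATED_REGISTER=false
```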

Application tier tumbles

The application tier introduces its own set of complexities. Remember DVEBMGS from the HA piece in this series? Each of those components must be duplicated somehow in your environment for it to work properly in the case of a disaster. File system replication can (and arguably should) be used to accommodate this, but be very careful about the solution you pick. Most virtual machine (and even cloud) providers have solutions for this very scenario.

Fuzzy bits 

The “fuzzy bits” of your SAP solution are the flotsam and jetsam that generally sit around your core instance and tend to get very little attention. That is, they don’t get attention until they go down. These can include anything from your front-end providers (Portals, UI, etc.) to mission-critical tax calculation applications that help you avoid the rubber glove treatment from the IRS. Be incredibly mindful of these applications and ensure that they’re also accounted for in your disaster recovery plan. Needless to say, we could do an entire series on the fuzzy bits and only scratch the surface.

‘I have a (nerdy) dream’ 

At some point in the future, I think we end up somewhere in between HA and DR. A dream of mine is a low-latency, high-throughput connection fast enough to allow a geo-clustered SAP application: think message server with failover to a secondary datacenter, and application servers performing as if they were in the same rack. On top of that, the HANA database would replicate synchronously to readable replicas (HANA 2.0 introduced the concept of a “readable” secondary). That is my dream: essentially a geographically disparate HA cluster. Frankly, I’m pretty sure we’re not that far off from it, but for now, it’s just a dream.

And… we’re done 

While some of this may seem trivial, and even somewhat basic, the nuts and bolts of disaster recovery are anything but. With a lot of thoughtful design and some well-placed testing, though, you too can have a near-bulletproof SAP environment.
