March 10, 2021

With cloud providers offering flexibility, elasticity and availability, enterprises are shifting not only their primary workloads to the cloud but also the associated disaster recovery (DR) environments. Hosting the DR environment in the cloud is also a good choice for enterprises that run production on-premises, because the cloud enables on-demand consumption of services for DR-related requirements.

Disasters are unpredictable and can lead to loss of data. Natural disasters, unexpected breakdowns, node failures, cyber-attacks, server corruption, configuration drift, unstable patches and similar events make it almost mandatory for enterprises to keep a replica of the production environment at an alternate location. DR on cloud involves storing critical data and applications in secure cloud storage at a secondary site, so that systems can fail over to it in case of a disaster.

An always-on disaster recovery (DR) environment is critical for businesses to manage any unforeseen incidents. Besides the DR assessment, planning and implementation exercise, the ability to recover in a predictable manner and return to normal business after any disaster scenario depends on flawless maintenance of the DR setup and diligence in performing DR drills.

The cost of downtime is one of the important factors when defining and establishing DR environments. There are various direct and indirect costs to be considered such as revenue loss, damaged reputation, and loss of employee productivity. Planning for a DR environment on cloud has many advantages:

  • Recovery at a lower total cost of ownership (TCO)

  • On-demand services reduce upfront investment for setting up a DR site

  • Reliable and scalable infrastructure

  • Automated solutions and service management

  • Faster time to market

According to a Gartner report, hybrid infrastructure increases the complexity of DR, and legacy recovery strategies may fall short of addressing the full extent of operating scenarios.

Given changing enterprise architectures and evolving application designs, DR strategies need to account for workloads running in on-premises data centers, in the cloud, and at the edge. Disaster resiliency requirements must be evaluated at the design stage to ensure that they are achievable.

DR planning and design 

When it comes to DR planning, a one-size-fits-all approach does not always work. However, any business can leverage the benefit of geographic diversity offered by the cloud providers (for example, AWS offers cross-region disaster recovery). For effective DR planning, it is important to identify critical IT systems and associated steps to restart, reconfigure, and recover systems and networks in the DR setup. The DR plan should be based on risk and business impact analysis, which helps determine where to focus resources as per the required recovery time objective (RTO) and recovery point objective (RPO) metrics.
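To make the role of RTO and RPO in the plan concrete, the following minimal sketch maps each workload's targets to a DR tier. The tier names are commonly used DR strategy labels rather than terms from this article, and the thresholds, workload names, and the `choose_strategy` helper are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """A system covered by the DR plan, with its recovery targets in minutes."""
    name: str
    rto_minutes: float  # maximum tolerable time to restore service
    rpo_minutes: float  # maximum tolerable window of lost data


def choose_strategy(workload: Workload) -> str:
    """Pick a DR tier from the tighter of the two targets.

    The thresholds below are hypothetical; a real plan would derive them
    from the risk and business impact analysis.
    """
    tightest = min(workload.rto_minutes, workload.rpo_minutes)
    if tightest <= 1:
        return "multi-site active/active"
    if tightest <= 15:
        return "warm standby"
    if tightest <= 240:
        return "pilot light"
    return "backup and restore"


if __name__ == "__main__":
    # Hypothetical workloads, for illustration only.
    for w in (
        Workload("payments", rto_minutes=1, rpo_minutes=0.02),
        Workload("crm", rto_minutes=60, rpo_minutes=15),
        Workload("reporting", rto_minutes=1440, rpo_minutes=720),
    ):
        print(f"{w.name}: {choose_strategy(w)}")
```

Tighter targets generally mean more resources kept running at the DR site, which is why the analysis should drive the tier for each system instead of applying one tier everywhere.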

Cloud DR planning involves estimating the DR resources required, selecting a replication solution, and deciding what to automate. Cloud DR runs most efficiently when the disaster recovery plan records details such as the location of production servers, VM instances, storage, and the network and security setup on cloud. A detailed assessment is key to planning and designing a DR setup well. Tools that give a detailed view of the application landscape, including all integrations and the mapping of applications to infrastructure, are critical to creating an accurate and efficient DR design.
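One way to keep such a plan checkable is to hold it as structured data. The sketch below is a hypothetical record layout built from the details listed above (production location, VM instances, storage, network, security setup); the field names and the `missing_details` helper are assumptions for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DRPlanEntry:
    """One system's entry in the DR plan (hypothetical layout)."""
    system: str
    production_location: str = ""
    vm_instances: List[str] = field(default_factory=list)
    storage: List[str] = field(default_factory=list)
    network: str = ""
    security_setup: str = ""


def missing_details(entry: DRPlanEntry) -> List[str]:
    """Return the names of fields that are still empty for this entry."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


if __name__ == "__main__":
    # Hypothetical, partially documented system.
    billing = DRPlanEntry(
        system="billing",
        production_location="on-premises data center",
        vm_instances=["billing-app-01"],
    )
    print(missing_details(billing))
```

Running a check like this across all entries before a DR drill shows which systems still lack the information needed to recover them.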

DR on cloud

Leading cloud providers such as AWS offer multiple ways to configure cross-region recovery to address DR requirements. With a geographically diverse recovery plan in place, a business can continue to operate with little disruption even if an entire region goes offline. Advanced features can also be leveraged when running DR on the cloud. For example, AWS offers:

  • Two-way replication of S3 storage across regions, which can play an important role during system fail-back. Because object metadata changes are also replicated between the buckets, it adds considerable flexibility from a DR perspective.

  • Aurora Global Database, which allows a single database to span multiple AWS regions. It provides a low RPO (~1 second) and RTO (< 1 minute), which creates a strong foundation for a global business continuity plan.

  • DataSync, which can be used for automated replication (including from on-premises storage) and for high-speed recovery. It is very effective for two-way replication of data between NFS file shares, S3 buckets and EFS file systems across regions.

  • Multi-region replication, which is possible with DynamoDB, a NoSQL database that is ideal for massively scaled applications with globally dispersed users. DynamoDB global tables provide automatic multi-active replication across AWS regions worldwide, which is useful in running a successful DR on cloud.
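As an illustration of the first item in the list above, two-way S3 replication can be thought of as a pair of replication rules, one per direction. The sketch below only builds the configuration dictionaries in the shape that boto3's `put_bucket_replication` call accepts; it does not call AWS. The bucket names, the IAM role ARN, and the helper names are hypothetical, and both buckets are assumed to have versioning enabled.

```python
def replication_config(dest_bucket: str, role_arn: str) -> dict:
    """Build a one-direction replication configuration toward dest_bucket."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate objects
        "Rules": [
            {
                "ID": f"replicate-to-{dest_bucket}",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                # Replicate metadata changes made on replicas back to the
                # other side, which is what makes the pair two-way.
                "SourceSelectionCriteria": {
                    "ReplicaModifications": {"Status": "Enabled"}
                },
                "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
            }
        ],
    }


def two_way_configs(bucket_a: str, bucket_b: str, role_arn: str) -> dict:
    """Map each source bucket to the configuration replicating to the other."""
    return {
        bucket_a: replication_config(bucket_b, role_arn),
        bucket_b: replication_config(bucket_a, role_arn),
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    configs = two_way_configs(
        "example-primary-bucket",
        "example-dr-bucket",
        "arn:aws:iam::123456789012:role/example-replication-role",
    )
    for source, config in configs.items():
        # Applying a configuration would look like:
        #   boto3.client("s3").put_bucket_replication(
        #       Bucket=source, ReplicationConfiguration=config)
        print(source, "->", config["Rules"][0]["Destination"]["Bucket"])
```

Keeping the two directions symmetric means that, after a failover, writes made in the DR region flow back to the primary bucket once it is available again, which simplifies fail-back.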

Taking comprehensive DR measures helps enterprises minimize business downtime and service disruption and, in turn, damage to their reputation. DR helps organizations protect enterprise applications, guard against data loss, and recover from cyber-attacks.

Ashish Vyas leads Cloud Infra Strategy and Modernization in TCS’ AWS business unit. With over 20 years in TCS, he has led many strategic customer engagements globally and provided key solutions to address clients’ needs. His specializations include enterprise architecture and cloud technologies.

Sumitha Rao is a technical architect in TCS’ AWS business unit. With over 14 years in TCS, she has worked on multiple customer requirements from different geographies. Her specialization includes cloud technologies and microservices.