Amazon’s regional outage in April 2011 caused havoc among companies that are migrating, or planning to migrate, their IT infrastructure to the EC2 cloud. Even after that major failure at the EC2 East Coast facility, the Amazon cloud still offers many benefits, such as scalability, a CDN, hardware redundancy and flexibility. For almost any web-centric business looking to scale up or out, the cloud is the most effective and quickest way to go live. But that doesn’t get us off the hook for avoiding failure. It’s still up to companies to understand the limitations and create a strategy that achieves redundancy even if a complete region goes down.

One way to achieve regional redundancy is to run a replica infrastructure in another region, possibly behind a single load balancer, so that if one region goes down, redundant instances in the other region can take all the traffic. However, such an infrastructure is very expensive and difficult to maintain, because Amazon counts inter-region data transfer as internet data transfer.

Here is a more cost-effective sample plan that should come in handy for most environments with a web app, a database and other components. I am going to consider a simple LAMP-type application with two instances behind a load balancer, and plan for intra-regional as well as inter-regional redundancy.

Here is what the planned infrastructure looks like:

EC2 Regional Redundancy

Approach:

  1. Use one EBS volume in the first region to share data across instances using NFS. All data resides on that one volume. For intra-regional resilience, snapshot the EBS volume automatically, nightly or so.
  2. Mount an S3 bucket from Region 2 on one of the instances and rsync the data to it every 10 minutes or so. Because rsync copies incrementally, only the changes made since the previous run are transferred.
  3. Take database dumps every few hours and copy them to S3 storage in Region 2.
  4. Create an AMI from the application server in Region 1. Bundle it and copy it over to Region 2 to store in S3. This will not be a live instance, just an image.
  5. Use tools such as Scalr or Puppet to automatically launch and scale small web servers up or down in Region
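
Steps 2 and 3 can be sketched as a small script run from cron. This is only a minimal sketch under stated assumptions: the paths `/var/www/` and `/mnt/s3-region2/`, the wrapper script name, the MySQL database, and the 4-hour dump interval are all hypothetical choices, not part of the plan above, and `--single-transaction` only gives a consistent dump for transactional tables such as InnoDB.

```python
import shlex
from datetime import datetime, timezone

# Hypothetical locations, not from the original plan: adjust to your layout.
WEB_ROOT = "/var/www/"          # web data on the NFS-shared EBS volume
S3_MOUNT = "/mnt/s3-region2/"   # Region 2 S3 bucket mounted with s3fs


def rsync_command(src=WEB_ROOT, dest=S3_MOUNT + "web/"):
    """Step 2: incremental copy of the web data to the Region 2 mount."""
    # -a keeps the tree structure and metadata; --delete removes files from
    # the copy that no longer exist in the source.
    return ["rsync", "-a", "--delete", src, dest]


def dump_command(now, dest_dir=S3_MOUNT + "db/"):
    """Step 3: shell pipeline that writes a timestamped, compressed dump."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    target = shlex.quote(f"{dest_dir}dump-{stamp}.sql.gz")
    return f"mysqldump --single-transaction --all-databases | gzip > {target}"


def crontab_entries():
    """Assumed schedule: rsync every 10 minutes, dump every 4 hours."""
    script = "/usr/local/bin/region2_backup.py"  # hypothetical wrapper script
    return [
        f"*/10 * * * * {script} rsync",
        f"0 */4 * * * {script} dump",
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    print(shlex.join(rsync_command()))
    print(dump_command(datetime.now(timezone.utc)))
    print("\n".join(crontab_entries()))
```

Timestamped dump names sort chronologically, which makes it easy to find the newest dump during recovery.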

Now, when Region 1 goes down:

  1. Launch an instance from the stored AMI.
  2. Mount the S3 storage holding the web data on the instance with s3fs.
  3. Restore the database from the latest dump.
  4. GO LIVE!! (within a few minutes)
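
The first three recovery steps could be scripted ahead of time so nobody has to improvise during an outage. The sketch below only builds the commands; the region, AMI ID, bucket name, mount point, instance type and dump naming scheme are hypothetical placeholders, and the launch command uses the current AWS CLI rather than whatever tooling you have in place. Step 1 would run from an admin machine, steps 2 and 3 on the new instance.

```python
import shlex

# All identifiers below are hypothetical placeholders.
REGION2 = "us-west-1"
AMI_ID = "ami-00000000"
INSTANCE_TYPE = "m1.small"
BUCKET = "example-region2-backup"
MOUNT = "/mnt/s3-region2"


def latest_dump(names):
    """Pick the newest dump; names like dump-YYYYMMDDTHHMMSSZ.sql.gz sort by time."""
    dumps = sorted(n for n in names if n.startswith("dump-") and n.endswith(".sql.gz"))
    if not dumps:
        raise ValueError("no database dump found")
    return dumps[-1]


def failover_steps(dump_names):
    """Return the recovery commands in the order of steps 1 to 3 above."""
    dump = shlex.quote(f"{MOUNT}/db/{latest_dump(dump_names)}")
    return [
        # 1. Launch an instance from the stored AMI.
        f"aws ec2 run-instances --region {REGION2} --image-id {AMI_ID} "
        f"--instance-type {INSTANCE_TYPE} --count 1",
        # 2. Mount the bucket holding the web data with s3fs.
        f"s3fs {BUCKET} {MOUNT}",
        # 3. Restore the database from the latest dump.
        f"gunzip -c {dump} | mysql",
    ]


if __name__ == "__main__":
    example = ["dump-20110420T200000Z.sql.gz", "dump-20110421T000000Z.sql.gz"]
    for command in failover_steps(example):
        print(command)
```

Going live (step 4) is left out on purpose, since how traffic is redirected depends on your DNS or load balancer setup.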

Disadvantages

a. Web data is only as recent as the last rsync. We lose at most about 10 minutes of data if rsync runs every 10 minutes.
b. We lose any database changes made since the last dump was taken. That could be a few hours of data if dumps are taken every few hours.
c. It takes a few minutes for the environment to come live, so there is no seamless migration to Region 2.
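
The loss figures above treat each copy as instantaneous. As a rough worked example, a slightly more cautious bound adds the time the job itself takes to run, since a change made just after a job starts is only safe once the next job has finished. The job durations and the 4-hour dump interval below are illustrative assumptions, not measurements.

```python
from datetime import timedelta


def worst_case_loss(interval, job_duration=timedelta(0)):
    """Upper bound on lost changes if Region 1 fails just before a job completes."""
    return interval + job_duration


# Illustrative numbers only: a 2-minute rsync run and a 15-minute dump.
web_loss = worst_case_loss(timedelta(minutes=10), timedelta(minutes=2))
db_loss = worst_case_loss(timedelta(hours=4), timedelta(minutes=15))
print(web_loss)  # 0:12:00
print(db_loss)   # 4:15:00
```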

Advantages

  1. We save the significant inter-regional bandwidth cost of maintaining live backups.
  2. This is a simple solution to implement.
  3. We have an off-site backup of the complete data.

Summary

If your app can withstand a few minutes of downtime during a regional outage, this could be a good cost-effective solution.
