Posted by: Joe Devietti on: April 11, 2009
Editor’s Note: This whole rigmarole is unnecessary now that you can boot from EBS-backed AMIs that can have essentially unbounded size. But this trick was fun while it lasted!
I love Amazon’s Elastic Compute Cloud, and have been using it to run research experiments without having to worry about multiplexing computing resources among other members of my research group. No running top after I log in to make sure I’m not stepping on someone else’s experiments: I launch an instance and I get it all to myself.
Sharing storage across instances, however, is tricky. For my purposes, having a read-only copy distributed among my instances is sufficient; of course, adding read/write access makes things substantially trickier. Yet even given that I was fine with read-only access, none of the solutions that immediately came to mind were satisfactory.
What I really wanted was the ability to mount an EBS volume read-only on multiple instances. Since everything is read-only, there won’t be any consistency issues, but Amazon still doesn’t support this. Then I discovered a hack that makes it possible, using EBS snapshots.
The basic idea is to have a master EBS volume V that you want to replicate with read-only copies across a number of instances. Upon bootup, each instance makes a snapshot of V and then its own personal volume Vp based on that snapshot. Each instance can then attach the volume Vp and voila – we’ve got our data replicated across our instances. No fancy network filesystem or S3 hacks necessary.
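To make the sequence concrete, here is a minimal sketch of the snapshot, create, attach steps. It is not the code from my scripts: it assumes an `ec2` object that behaves like a boto3 EC2 client (the original scripts used Boto 1.7a, whose API differs), and the function names, the `/dev/sdf` device and the `/mnt/data` mount point are all made up for illustration.

```python
def replicate_volume(ec2, master_volume_id, instance_id, zone, device="/dev/sdf"):
    """Snapshot the master volume V, build a private volume Vp, attach it.

    `ec2` is assumed to expose the boto3 EC2 client methods used below.
    Returns (snapshot_id, volume_id) so both can be discarded later.
    """
    # 1. Snapshot the master volume V.
    snap = ec2.create_snapshot(
        VolumeId=master_volume_id,
        Description="read-only replica for " + instance_id,
    )
    snapshot_id = snap["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # 2. Create the personal volume Vp from that snapshot. It must live in
    #    the same availability zone as the instance that will attach it.
    vol = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=zone)
    volume_id = vol["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # 3. Attach Vp to this instance.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return snapshot_id, volume_id


def readonly_mount_command(device, mount_point):
    """Build the command that mounts the attached device read-only."""
    return ["mount", "-o", "ro", device, mount_point]
```

On a real instance you would pass something like `boto3.client("ec2")` as `ec2`, then hand the result of `readonly_mount_command("/dev/sdf", "/mnt/data")` to `subprocess.run` once the device shows up.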
What makes all this go is that EBS snapshots are very fast (because they’re lazily constructed). My master volume V is 10GB in size, and about 7GB full at the moment. And this whole take-a-snapshot-and-mount-it routine takes less than 10 seconds. After I’m done with an instance, I have it throw away the snapshot and volume Vp to save space. But since snapshots are built on diffs, having a bunch of snapshots doesn’t take up much room in S3 (i.e. cost much money) anyway. Ultimately, EBS is doing exactly what I would want to provide a high performance read-only version of the volume: lazy creation of snapshots makes replication fast, and each snapshot volume functions as a cache to increase read bandwidth. And all this without any extra engineering on my part!
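The cleanup at shutdown can be sketched the same way. Again, this is an illustration rather than the code from my scripts: it assumes an `ec2` object with the boto3 EC2 client interface, the function name `discard_replica` is hypothetical, and it expects the filesystem to have been unmounted already.

```python
def discard_replica(ec2, volume_id, snapshot_id):
    """Throw away the per-instance volume Vp and the snapshot behind it.

    `ec2` is assumed to expose the boto3 EC2 client methods used below.
    Unmount the filesystem before calling this.
    """
    # Detach Vp and wait until EC2 reports it as available again;
    # a volume that is still attached cannot be deleted.
    ec2.detach_volume(VolumeId=volume_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Delete the volume, then the snapshot it was created from.
    ec2.delete_volume(VolumeId=volume_id)
    ec2.delete_snapshot(SnapshotId=snapshot_id)
```

The master volume V is never touched here, so the next instance to boot simply takes a fresh snapshot of it.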
I put together some Python scripts (with the help of the excellent Boto library) to automate this read-only replication of an EBS volume. All you have to do is edit some parameters in ec2lib.py and then link these scripts into your distro’s boot/shutdown routines; this code is designed to be run from the instance itself. The code is available under the MIT license (like Boto itself). The repository includes a copy of Boto 1.7a to keep things self-contained.
Awesome, thanks Joe! Could you elaborate on how EBS-backed AMIs supersede this? Your approach allows you to point to a Volume, so that you get a snapshot of the latest contents of that Volume at boot time, while the AWS approach seems to require you to point at a Snapshot, so if your Volume changes, you’d have to manually snapshot and update the AMI each time, correct?
If I’m wrong please let me know, as I’d love to be doing this the “correct” way.
March 15, 2011 at 4:01 pm
This is great. It helped me out just now, since NFS kernel support seems to be broken in the version of Ubuntu 10.10 I have running on my EC2 installs.