AWS Lambda Experiments

I recently discovered AWS Lambda – a way to run a small-ish piece of code in the cloud without having to provision any infrastructure. You give AWS your code (e.g., a Java class that implements the required Lambda interface) and they run it for you. You can run the function on a periodic schedule, in response to various AWS events, or from a web page (via the Lambda JavaScript API).

I realized that some of the little web services I’ve built on Google App Engine (such as the UPenn Calendar Fixer-Upper, and the Architecture & Compilers Conference Map) could actually run nicely via AWS Lambda instead. The data for these web services changes only very rarely, but with my GAE setup I was recomputing everything from scratch on each request. For example, with the Arch Conference Map, every request for the page triggered several requests to wikicfp.com to scrape CFPs. But new CFPs are posted only a couple of times per year, so it’s a bit wasteful to recompute everything from scratch on each request. The Arch Conference Map page was also slow to load as a result, which was always annoying to me. GAE has various request caching features I could use, but this would add more complexity.

Instead, I’ve ported over the Arch Conference Map to AWS Lambda. I wrote a little Java function that scrapes CFPs from wikicfp.com, extracts their information, and builds a static HTML page with these results. The static HTML page then gets uploaded to Amazon S3, which can be configured to serve static web content. As a bonus, it’s easier for me to debug issues with the site because I can run the Lambda function on my local machine under a Java debugger to see how it’s behaving. My “web app” isn’t part of some big framework with lots of abstractions; it’s really just a plain Java function.
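A minimal sketch of the page-building step described above, assuming the scraper has already produced a list of CFP entries. The class name ConferencePage, the Cfp fields, and the table layout are all hypothetical and chosen for illustration; the post does not show the actual code. The scraping and the S3 upload are left out because they need network access and the AWS SDK.

```java
import java.util.List;

/** Hypothetical sketch: names and fields are illustrative, not the post's actual code. */
final class ConferencePage {

    /** One scraped call for papers; the fields are assumed for illustration. */
    static final class Cfp {
        final String name;
        final String deadline;
        final String location;

        Cfp(String name, String deadline, String location) {
            this.name = name;
            this.deadline = deadline;
            this.location = location;
        }
    }

    /** Escapes the characters that are unsafe inside HTML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Renders the whole static page as one string, one table row per CFP. */
    static String render(List<Cfp> cfps) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">")
            .append("<title>Conference deadlines</title></head><body>\n<table>\n")
            .append("<tr><th>Conference</th><th>Deadline</th><th>Location</th></tr>\n");
        for (Cfp c : cfps) {
            html.append("<tr><td>").append(escape(c.name))
                .append("</td><td>").append(escape(c.deadline))
                .append("</td><td>").append(escape(c.location))
                .append("</td></tr>\n");
        }
        html.append("</table>\n</body></html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        // In a deployed function the list would come from the scraper and the
        // result would be uploaded to S3; here it is just printed.
        List<Cfp> sample = List.of(new Cfp("ExampleConf", "2030-01-15", "Somewhere"));
        System.out.print(render(sample));
    }
}
```

Because render is a plain static method with no framework around it, it can be called from a main method or stepped through in a debugger on a local machine, which is the debugging convenience mentioned above.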

Serving static content from S3 is also pretty fast – in my informal testing the page loads several times faster now. Plus, the Lambda function only runs once per day, instead of on each request. If I could measure the carbon footprint of the app, I imagine it’d be a lot lower now.

I ported the Penn Calendar Fixer-Upper over to AWS Lambda as well, to generate a slightly nicer iCal feed than what the university provides. There are iCal feeds for the CIS and ESE departments now.
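For a sense of what generating an iCal feed involves, here is a minimal sketch that emits a calendar with a single all-day event in the iCalendar (RFC 5545) text format. The class name ICalFeed, the PRODID value, and the method parameters are hypothetical; this is not the Fixer-Upper's actual code, and it skips details such as folding long lines at 75 octets.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical sketch of an iCalendar writer; not the post's actual code. */
final class ICalFeed {

    // iCalendar content lines are terminated by CRLF.
    private static final String CRLF = "\r\n";

    /** Escapes backslash, semicolon, comma, and newline in TEXT values. */
    static String escapeText(String s) {
        return s.replace("\\", "\\\\")
                .replace(";", "\\;")
                .replace(",", "\\,")
                .replace("\n", "\\n");
    }

    /**
     * Builds a calendar containing one all-day event.
     * dtstamp is passed in (e.g. "20300101T000000Z") so the output is deterministic.
     */
    static String singleAllDayEvent(String uid, String dtstamp, LocalDate date, String summary) {
        String day = date.format(DateTimeFormatter.BASIC_ISO_DATE); // yyyyMMdd
        return "BEGIN:VCALENDAR" + CRLF
             + "VERSION:2.0" + CRLF
             + "PRODID:-//example//feed sketch//EN" + CRLF // assumed identifier
             + "BEGIN:VEVENT" + CRLF
             + "UID:" + uid + CRLF
             + "DTSTAMP:" + dtstamp + CRLF
             + "DTSTART;VALUE=DATE:" + day + CRLF
             + "SUMMARY:" + escapeText(summary) + CRLF
             + "END:VEVENT" + CRLF
             + "END:VCALENDAR" + CRLF;
    }

    public static void main(String[] args) {
        System.out.print(singleAllDayEvent(
                "event-1@example.invalid", "20300101T000000Z",
                LocalDate.of(2030, 1, 15), "Colloquium, room 101"));
    }
}
```

As with the conference page, the feed is just a string that can be written to a static file and served from S3.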

The only hiccup I’ve run into with Lambda is the limit on code size. The jar file I upload has to be under 50MB, which is a real limitation since the jar needs to include all library dependencies outside of the JDK. While my code is only a couple of hundred lines, once the AWS Java SDK and some Apache libraries are added in, it’s quite easy to exceed 50MB. I had to use some Maven magic to include only the code I actually import. Even with that, my simple programs end up generating ~20MB jars.

My apps were operating entirely for free on GAE. In contrast, using AWS costs some money. Lambda is free for my low-rate usage, but there is a nominal cost (a few cents per month) for storing and serving the S3 content. However, for me the ease of working with and debugging Lambda functions makes it worthwhile.

 

 

Sharing Amazon Elastic Block Store among multiple instances

Editor’s Note: This whole rigmarole is unnecessary now that you can boot from EBS-backed AMIs that can have essentially unbounded size.  But this trick was fun while it lasted!

I love Amazon’s Elastic Compute Cloud, and have been using it to run research experiments without having to worry about multiplexing computing resources among other members of my research group.  No running top after I log in to make sure I’m not stepping on someone else’s experiments: I launch an instance and I get it all to myself.

Sharing storage across instances, however, is tricky.  For my purposes, having a read-only copy distributed among my instances is sufficient; of course adding read/write access makes things substantially trickier.  Yet even given that I was fine with read-only access, none of the solutions that immediately came to mind were satisfactory.