3 Things I've Learned about AWS Lambda
why use lambda
Recently I've been working on a side project that involves an image processing pipeline, where I take a sequence of images, do some number crunching, and then squish them into a video.
The thing is, since I'm a chump at marketing and it's a very niche tool (in biz speak: a tiny total addressable market), I don't expect a lot of people to use it. In fact, it'll probably just be me most of the time. Maybe a few friends will try it out once or twice. So why reserve a full-time server that will rarely ever be used?
On the other hand, what'll happen if it unexpectedly becomes popular? You see, when the pipeline's running, it uses all the resources available. That means 100% CPU usage for the duration. My 15 minutes of fame would immediately turn sour as soon as a second concurrent user tried to play with it.
If only there were something that handled both extremes... oh, there is. Thanks, Amazon! Keyword: serverless computing
Let me lay out some of my top reasons for using it:
- Unlimited scale just works: everything is transparent whether there are no users at all or 1,000 concurrent requests
- Performance: AWS Lambda consistently outperforms the offerings from Google and Azure in both cold-start time and execution benchmarks. Important note: Lambda's CPU performance also scales with the allocated RAM, which is not apparent from the UI! This holds at least up to the 2 GB configuration.
- 15-minute timeout: Azure only goes up to 5 minutes, and Google Cloud Functions only allows 9 minutes
- Ample free tier: 400,000 GB-seconds per month!
what I've learned
Now to fulfill the promise of this post, I want to share some of the things I've learned on the platform. These are all problems I hadn't expected when I began, so hopefully I can spare you a bit of headache.
- Know the size limits and plan ahead: In particular, note that the function package must be no larger than 250 MB unzipped. Originally, I had written my handler function in JavaScript for the Node runtime, but most of the meat of my application was compiled native code. The handler function is just a thin layer that executes a pre-compiled Rust program, and this Rust program invokes `ffmpeg` as the last step of the image pipeline. A quick accounting of the budget at this point puts me at about 100 MB (80 for ffmpeg, 20 for Rust).
But of course plans change, scope creeps, and before I knew it I was reworking the image pipeline to involve processing in Python with OpenCV as well. That meant including OpenCV (40 MB) plus NumPy (another 60 MB), putting me right at the threshold but still fine. What I hadn't accounted for, though, was that the Node runtime doesn't have Python installed (I kind of assumed any Linux box would have it, I suppose). Bundling a copy of Python would definitely take me over the limit (it's another 70 MB), so I had to switch to the Python runtime, which of course necessitated rewriting the handler routine (a minimal sketch of the resulting thin handler follows this list). All told, not the worst thing in the world, but definitely a cost I could have avoided with a wiser choice at the outset.
- Build binaries with Docker: In the beginning I tried porting regular builds of my programs over to Lambda. I spent half my time dealing with missing libraries and the other half fighting linker errors. It was awful and I hated it. Then I found out about the lambci/docker-lambda project. The blessed authors of this project maintain Docker images that faithfully replicate the AWS Lambda runtime environments. That means if you can build and run your program in one of those containers, it'll run on Lambda too. So build in the container, copy out the artifacts, and upload them in your bundle to Amazon (a sketch of the workflow follows this list). No more clang flag voodoo and cross-toolchain sorcery required.
Here's how I'm building an AWS-compatible build of OpenCV: Dockerfile.
- Websockets to communicate progress: One shortcoming of long-running serverless functions is the lack of insight into their status during execution, since you only get a response at the end. To solve this, I added a websocket handler on a known path of my web server, which each launched Lambda invocation connects to in order to report progress (a sketch follows this list). Using socket.io makes it easy to pipe this progress back to the client browser for rendering.
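For what it's worth, here's roughly what the thin Python handler looks like. This is a minimal sketch, not my exact code: the `bin/pipeline` path, the event keys, and the error handling are all illustrative.

```python
# handler.py -- a thin Python handler that shells out to the pre-compiled
# Rust program, which in turn invokes ffmpeg as its last step.
import os
import subprocess

# Illustrative path; the binary ships inside the deployment bundle.
PIPELINE_BIN = os.path.join(
    os.path.dirname(os.path.abspath(__file__)), "bin", "pipeline"
)

def handler(event, context):
    # The event keys ("input_key", "output_key") are hypothetical.
    result = subprocess.run(
        [PIPELINE_BIN, event["input_key"], event["output_key"]],
        capture_output=True,
        text=True,
        timeout=840,  # leave headroom under Lambda's 15-minute ceiling
    )
    if result.returncode != 0:
        raise RuntimeError(f"pipeline failed: {result.stderr}")
    return {"status": "ok"}
```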
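And here's the shape of the build-in-container workflow, sketched for the Python runtime. The image tag and `requirements.txt` are assumptions for illustration; the OpenCV Dockerfile linked above does considerably more.

```sh
# Build dependencies inside the Lambda-replica image so native code links
# against the same libraries that exist at runtime on AWS.
docker run --rm \
    -v "$PWD":/var/task \
    lambci/lambda:build-python3.7 \
    pip install -r requirements.txt -t package/

# The artifacts now sit in ./package on the host; zip them up with the
# handler and that's the bundle you upload.
cd package && zip -r ../bundle.zip . && cd -
zip -g bundle.zip handler.py
```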
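On the Lambda side, the progress reporting can be as simple as the following sketch. I'm using the python-socketio client here purely as an illustration; the server URL, the "progress" event name, and the payload shape are all made up.

```python
# Sketch of a Lambda-side progress reporter using python-socketio.
import socketio

SERVER_URL = "https://example.com"  # hypothetical; a known path on my web server

sio = socketio.Client()
sio.connect(SERVER_URL)

def report_progress(job_id, step, total):
    # The web server relays these events over its own socket.io
    # connection to the client browser, which renders a progress bar.
    sio.emit("progress", {"job": job_id, "step": step, "total": total})

# In the real pipeline, each stage would report as it finishes.
total_steps = 4
for step in range(1, total_steps + 1):
    # ... one stage of image processing here ...
    report_progress("demo-job", step, total_steps)

sio.disconnect()
```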
Hope this helps, and good luck!