Moving from monoliths to Lambda functions

AWS Lambda offers strong financial incentives for keeping code modular and loosely coupled, and lets developers move gradually from monolithic servers to microservices using the Strangler Application pattern. This makes it compelling to create microservices and single-task functions. However, breaking a complex block of inter-dependent code into many isolated functions requires careful thought about coordination and dependency management. Lambda offers several ways to decouple and invoke code, so choosing the right way to break dependencies is important. In this tutorial, we explore typical ways of coordinating and managing inter-dependent tasks.

Important considerations

There are several important aspects you should consider when choosing the dependency and deployment options:

The deployment options and considerations differ slightly based on whether you are using just Lambda functions, or Lambda and API Gateway:

Options for deploying without a Web API

Two tasks, sharing some code, can be packaged in several ways for Lambda functions:

  $ claudia create --config s3-function.json --name s3-processor --region us-east-1
  $ claudia create --config sns-function.json --name sns-processor --region us-east-1
  ...
  $ claudia update --config sns-function.json
  
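To make the --config approach concrete, here is a minimal TypeScript sketch of one project that exports two entry points sharing a single helper, so both functions can be created from the same code bundle. The module name `main`, the exports `s3Handler` and `snsHandler`, and the `processSource` helper are hypothetical. The event types are trimmed to the few S3 and SNS notification fields the sketch reads. In practice, each `claudia create` call would also point `--handler` at a different export (for example `main.s3Handler`); that wiring is an assumption and is not shown in the commands above.

```typescript
// Hypothetical shared project deployed twice: once as s3-processor, once as sns-processor.
// Event types are trimmed to the fields this sketch reads.
interface S3Notification {
  Records: { s3: { bucket: { name: string }; object: { key: string } } }[];
}
interface SnsNotification {
  Records: { Sns: { Message: string } }[];
}

// Shared code: both entry points reduce their input to a source location
// and run the same (here, stubbed) processing step.
function processSource(source: string): string {
  return `processed ${source}`;
}

// Entry point for the s3-processor function (e.g. --handler main.s3Handler, an assumption).
export async function s3Handler(event: S3Notification): Promise<string[]> {
  return event.Records.map((record) =>
    processSource(`s3://${record.s3.bucket.name}/${record.s3.object.key}`)
  );
}

// Entry point for the sns-processor function (e.g. --handler main.snsHandler, an assumption).
export async function snsHandler(event: SnsNotification): Promise<string[]> {
  return event.Records.map((record) => processSource(record.Sns.Message));
}
```

Because both functions ship the same bundle, anything one of them needs is deployed with the other as well.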

Example: file conversion for MindMup

MindMup uses very similar code to convert maps to SVG and PDF. In fact, the PDF exporter works by first converting to SVG, and then uses an external tool (rsvg) to make the PDF file. The rsvg binary is huge. SVG conversion itself requires access to font files, which are also huge (30 MB). The converted files need to be saved to a well-known location on S3 so that clients can pick them up. The external PDF converter requires a lot of memory, which the SVG processor does not need. The incoming data format is relatively stable, and already requires multi-versioning support for other uses. Clients use the PDF exporter significantly more often than the SVG one.

Because the two processors have very different memory needs, splitting them is financially more beneficial. They share the same input data format, but because they do not change the same persistent data records, there is no risk of data inconsistency if two versions of the shared code run at the same time. The SVG exporter is much less risky than the PDF one, so keeping them separate also allows us to validate changes on the low-risk format first, then deploy the update to the higher-risk one. The solution we chose for decoupling is two functions, two projects, with a shared NPM dependency. Using the same function would increase cost, as the SVG converter could run with less memory. Having the PDF exporter invoke the SVG exporter first would require us to either overcomplicate the code and somehow post-process the response, or invoke the SVG exporter synchronously and pay double, since the PDF function keeps running (and being billed) while it waits. Direct invocation of the same function would also not allow us to manage risk separately. Keeping both in the same project and deploying two functions using --config would unnecessarily increase the size of the (already huge) SVG exporter, because it does not need the external tool for PDF conversions.
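Lambda bills compute time in GB-seconds (allocated memory multiplied by billed duration), which is why a combined function sized for the memory-hungry PDF path also charges that memory for every SVG conversion. The TypeScript sketch below compares the two layouts. All memory sizes, durations and invocation counts are made-up illustrative figures, not MindMup's. It also simplifies by assuming duration does not change with memory (in reality Lambda allocates CPU in proportion to memory) and by leaving out the per-request charge, which is the same in both layouts.

```typescript
// Illustrative workload figures only; replace with your own measurements.
interface Workload {
  invocationsPerMonth: number;
  avgDurationMs: number;
}

// Compute charge is proportional to GB-seconds: memory (GB) x duration (s) x invocations.
function gbSeconds(memoryMb: number, workload: Workload): number {
  return (memoryMb / 1024) * (workload.avgDurationMs / 1000) * workload.invocationsPerMonth;
}

const svgExports: Workload = { invocationsPerMonth: 10000, avgDurationMs: 2000 };
// PDF exports are requested more often and run longer (rsvg step on top of SVG conversion).
const pdfExports: Workload = { invocationsPerMonth: 40000, avgDurationMs: 4000 };

const PDF_MEMORY_MB = 1536; // hypothetical size the rsvg step needs
const SVG_MEMORY_MB = 512; // hypothetical size the SVG step needs

// One function for both tasks must be sized for the PDF path.
const combined = gbSeconds(PDF_MEMORY_MB, svgExports) + gbSeconds(PDF_MEMORY_MB, pdfExports);
// Separate functions let each task run with only the memory it needs.
const split = gbSeconds(SVG_MEMORY_MB, svgExports) + gbSeconds(PDF_MEMORY_MB, pdfExports);

console.log(`combined: ${combined} GB-s, split: ${split} GB-s`);
// prints "combined: 270000 GB-s, split: 250000 GB-s"
```

The saving comes entirely from the SVG traffic no longer paying for PDF-sized memory, so it grows with the share of SVG exports and with the gap between the two memory settings.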

Options for deploying with a Web API

Having API Gateway in the picture complicates things a bit more. Reconfiguring API Gateway endpoints takes quite a bit of time because API Gateway rate-limits configuration requests, so bundling together two different tasks that use different endpoints increases deployment time significantly. Web API endpoint tasks often need to share configuration as well as code. For example, GET, POST and PUT to /documents/8761 all deal with the same persistent storage. Splitting such tasks into different Lambda functions would complicate maintaining and updating their configuration. As with the concerns outlined above, using different functions would also increase the risk of data inconsistency if several endpoints write to the same persistent data entity. Finally, API consumers expect the various methods of the same API to share the same base URL and hostname. With API Gateway, each API instance has a unique URL, so bundling endpoints into the same API is often beneficial.

Claudia allows you to serve multiple API Gateway endpoints from the same Lambda function, and can handle routing internally using claudia-api-builder. For simpler scenarios, where you do not need complex routing, you can even use a proxy API. Although it’s technically possible to create an API that invokes different Lambda functions for different endpoints, Claudia assumes that a single Lambda function completely controls a single API Gateway API instance. This simplifies API deployment and reconfiguration in most cases, but makes it more difficult for several functions to share a common URL.
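claudia-api-builder does this routing for you. The sketch below is not its API, just a hand-rolled TypeScript illustration of the same idea, assuming a proxy integration where API Gateway passes `httpMethod`, `path` and `body` on the event. One function dispatches GET and PUT on /documents/{id} against a single shared store, so both methods always run the same code version. The `Router` class and the in-memory `Map` standing in for persistent storage are hypothetical.

```typescript
// Minimal proxy-integration event: only the fields this sketch reads.
interface ProxyEvent {
  httpMethod: string;
  path: string;
  body: string | null;
}
interface ApiResponse {
  statusCode: number;
  body: string;
}
type RouteHandler = (params: Record<string, string>, body: string | null) => ApiResponse;
interface Route {
  method: string;
  pattern: RegExp;
  keys: string[];
  handler: RouteHandler;
}

// Hypothetical router: matches METHOD + path templates such as /documents/{id}.
class Router {
  private routes: Route[] = [];

  add(method: string, template: string, handler: RouteHandler): this {
    const keys: string[] = [];
    // Escape regex metacharacters (except braces), then turn {name} into a capture group.
    const source = template
      .replace(/[.*+?^$()|[\]\\]/g, "\\$&")
      .replace(/\{(\w+)\}/g, (_match: string, key: string) => {
        keys.push(key);
        return "([^/]+)";
      });
    this.routes.push({ method, pattern: new RegExp(`^${source}$`), keys, handler });
    return this;
  }

  dispatch(event: ProxyEvent): ApiResponse {
    for (const route of this.routes) {
      if (route.method !== event.httpMethod) continue;
      const match = route.pattern.exec(event.path);
      if (!match) continue;
      const params: Record<string, string> = {};
      for (let i = 0; i < route.keys.length; i++) params[route.keys[i]] = match[i + 1];
      return route.handler(params, event.body);
    }
    return { statusCode: 404, body: "Not found" };
  }
}

// Stand-in for the shared persistent store (e.g. a DynamoDB table in a real deployment).
const documents = new Map<string, string>();

const router = new Router()
  .add("GET", "/documents/{id}", (params) => {
    const doc = documents.get(params.id);
    return doc === undefined ? { statusCode: 404, body: "Not found" } : { statusCode: 200, body: doc };
  })
  .add("PUT", "/documents/{id}", (params, body) => {
    documents.set(params.id, body ?? "");
    return { statusCode: 200, body: "saved" };
  });

// The single Lambda entry point behind every endpoint of this API.
export async function handler(event: ProxyEvent): Promise<ApiResponse> {
  return router.dispatch(event);
}
```

Keeping all methods for one resource behind one entry point means a change to the document format is deployed to GET and PUT at the same moment.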

Here are the options for sharing code and URLs:

Example: Migrating a web server to Lambda

With web applications running on a dedicated VM, such as Heroku apps, a single web server process normally handles all the web endpoints, so that it can be deployed quickly, scaled easily and run cheaply. This server then handles user authentication, document management, serving web assets and so on. Groups of endpoints share authorization and authentication: for example, all /public paths are open, all /user endpoints require end-user authentication, and all /admin endpoints require admin rights. Inside each group, some endpoints apply additional authorization; for example, a POST to /user/123/preferences should only be allowed for the user with ID 123. Almost all endpoints require some shared configuration for external resources, such as a common DynamoDB database. Some endpoints work on the same piece of data; for example, user registration and updating user preferences share user records.
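The authorization rules described above can be written down as a single function, which also shows what a monolith couples together: every group's rules live in one place. This TypeScript sketch assumes a hypothetical `Caller` shape (an optional user ID and an admin flag) produced by whatever authentication runs earlier; it illustrates the rules in the text and is not code from any particular framework.

```typescript
// Hypothetical result of authentication; userId is absent for anonymous callers.
interface Caller {
  userId?: string;
  isAdmin: boolean;
}

// True if path is the group root or anything below it, e.g. /user or /user/123.
function inGroup(path: string, group: string): boolean {
  return path === group || path.startsWith(`${group}/`);
}

function isAllowed(method: string, path: string, caller: Caller): boolean {
  if (inGroup(path, "/public")) return true; // open to everyone
  if (inGroup(path, "/admin")) return caller.isAdmin; // admin rights required
  if (inGroup(path, "/user")) {
    if (caller.userId === undefined) return false; // end-user authentication required
    // Extra rule inside the group: only user 123 may POST to /user/123/preferences.
    const own = /^\/user\/([^/]+)\/preferences$/.exec(path);
    if (method === "POST" && own !== null) return own[1] === caller.userId;
    return true;
  }
  return false; // deny anything no rule covers
}
```

The group-level checks depend only on the path prefix, while the preferences rule needs the user ID from the path, which is why the two kinds of rules tend to end up in different places once the server is split.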

A typical way to convert this to Lambda functions would be to move static assets (images, static HTML) completely to S3, and to use Lambda functions for groups of API endpoints, based on the data they manage. For example, all /user/... endpoints would go to a single Lambda function, and all /document/... endpoints would go to another function. Splitting GET, POST and PUT for /user/xxx/preferences into separate Lambda functions would just overcomplicate deployment and configuration, with no particular benefit. If the structure of a user record changes, all /user API endpoints would likely have to be redeployed, so grouping them together saves time, and removes the need for each method to handle edge cases that might arise if individual endpoints run different versions at the same time. That keeps the code simple. Likewise, there isn’t much point keeping /document endpoints together with /user endpoints if they control different pieces of data. If endpoints need to go live together, deploy them as a single function. If they don’t, split them. An API Gateway custom domain name could then group all the various APIs under a common host name.
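The grouping rule above (endpoints that manage the same data go live together, so they share a function) is mechanical enough to sketch. The TypeScript below takes a hypothetical endpoint inventory, each entry tagged with the data entity it reads or writes, and produces one deployment unit per entity. The `-api` naming scheme for the resulting functions is an assumption for illustration.

```typescript
interface Endpoint {
  method: string;
  path: string;
  entity: string; // the persistent record type the endpoint reads or writes
}

// Group endpoints into one Lambda function (and API) per data entity.
function planFunctions(endpoints: Endpoint[]): Map<string, string[]> {
  const plan = new Map<string, string[]>();
  for (const e of endpoints) {
    const fn = `${e.entity}-api`; // hypothetical naming convention
    const routes = plan.get(fn) ?? [];
    routes.push(`${e.method} ${e.path}`);
    plan.set(fn, routes);
  }
  return plan;
}

// Hypothetical inventory based on the endpoints mentioned in the text.
const inventory: Endpoint[] = [
  { method: "POST", path: "/user", entity: "user" }, // registration
  { method: "GET", path: "/user/{id}/preferences", entity: "user" },
  { method: "POST", path: "/user/{id}/preferences", entity: "user" },
  { method: "PUT", path: "/user/{id}/preferences", entity: "user" },
  { method: "GET", path: "/document/{id}", entity: "document" },
  { method: "PUT", path: "/document/{id}", entity: "document" },
];

planFunctions(inventory).forEach((routes, fn) => {
  console.log(fn, routes);
});
```

Each resulting API could then be mapped to its own base path under the shared custom domain name.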

An exception to this would be APIs intended for third-party consumption with billing and throttling. If you want to use shared API configuration across endpoints, for example the same client API keys, throttling configuration or shared API usage plans, there is a significant benefit to keeping all those endpoints in a single API (and, by implication with Claudia.js, in a single Lambda function).

Shared authentication and URL-level authorization could be extracted into a custom authorizer, or even a Cognito User Pool. Any additional shared functionality could be packaged as an NPM dependency.
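As a rough sketch of what extracting the shared checks into a custom authorizer could look like, here is a TypeScript TOKEN authorizer. API Gateway passes it the caller's token and the `methodArn` of the requested method, and it answers with an IAM policy allowing or denying `execute-api:Invoke`; the event and response shapes follow the API Gateway Lambda authorizer format. The in-memory token table stands in for real token verification and is hypothetical, as are the user IDs. Only the /admin group rule is shown; per-user checks can stay in the backend, which receives the `context` values.

```typescript
// Shapes of a TOKEN authorizer event and response (only the fields used here).
interface TokenAuthorizerEvent {
  type: "TOKEN";
  authorizationToken: string;
  methodArn: string;
}
interface PolicyStatement {
  Action: string;
  Effect: "Allow" | "Deny";
  Resource: string;
}
interface AuthorizerResponse {
  principalId: string;
  policyDocument: { Version: string; Statement: PolicyStatement[] };
  context?: Record<string, string | number | boolean>;
}

// Hypothetical stand-in for real token verification (e.g. validating a signed token).
const KNOWN_TOKENS: Record<string, { userId: string; isAdmin: boolean }> = {
  "token-user-123": { userId: "123", isAdmin: false },
  "token-admin": { userId: "1", isAdmin: true },
};

// methodArn looks like arn:aws:execute-api:region:account:apiId/stage/METHOD/resource/path
function resourcePath(methodArn: string): string {
  return "/" + methodArn.split("/").slice(3).join("/");
}

export async function authorizer(event: TokenAuthorizerEvent): Promise<AuthorizerResponse> {
  const caller = KNOWN_TOKENS[event.authorizationToken];
  // Failing with the message "Unauthorized" makes API Gateway respond with 401.
  if (caller === undefined) throw new Error("Unauthorized");
  const path = resourcePath(event.methodArn);
  const isAdminPath = path === "/admin" || path.startsWith("/admin/");
  const effect = isAdminPath && !caller.isAdmin ? "Deny" : "Allow";
  return {
    principalId: caller.userId,
    policyDocument: {
      Version: "2012-10-17",
      Statement: [{ Action: "execute-api:Invoke", Effect: effect, Resource: event.methodArn }],
    },
    // Made available to the backend function, which can then do per-user checks.
    context: { userId: caller.userId },
  };
}
```

API Gateway can cache authorizer results per token. With caching on, a policy scoped to a single `methodArn` like this one is reused for other methods called with the same token, so real authorizers often return a broader resource or disable caching.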
