AWS Lambda offers strong financial incentives for keeping code modular and loosely coupled, and allows developers to gradually move from monolithic servers to microservices using a Strangler Application Pattern. This makes it compelling to create microservices and single-task functions. Breaking down a complex block of inter-dependent code into many isolated functions requires carefully considering coordination and dependency management, and Lambda offers several ways to decouple and invoke code, so choosing the right way to break dependencies is quite important. In this tutorial, we explore typical ways of coordinating and managing inter-dependent tasks.
There are several important aspects you should consider when choosing the dependency and deployment options:
- Reserved memory pricing
- Lambda pricing depends on the time spent executing and the reserved memory capacity. If two tasks share similar code, but one needs a lot more memory than the other, bundling them into the same Lambda function makes execution more expensive for the smaller task. Splitting them into two Lambda functions allows you to reduce the reserved memory for the smaller task and pay less for it.
- Data consistency needs
- It’s not possible to atomically deploy two separate functions, or even atomically reassign version aliases for two functions. Splitting code that touches the same piece of persistent data (files, database records) into two functions introduces a risk of data inconsistency. Essentially, if you change the common code that controls some data format, then deploy both functions, there will be a short period of time when two different versions of the common code run at the same time. This requires thinking about data versioning and handling data backwards/forwards compatibility in your function code to avoid data corruption. If you keep all the code that touches a common piece of data in a single function, this is not necessary, so the code will be simpler.
- Package size
- A single function package is limited to 50 MB, so two tasks that share common code but have a lot of different additional dependencies might not even fit into the same function package. You can work around this limitation for external resources and data files by storing them outside the package (for example in an S3 bucket) and downloading (and caching) them when the Lambda function loads, but that will increase startup time and may introduce significant latency for initial request processing.
- Deployment risk
- If two tasks depend on the same code but carry different levels of risk, splitting them into two Lambda functions makes it easier to de-risk deployments. When the low-risk task changes, there's no need to deploy and test the high-risk task as well.
- Deployment speed
- Uploading code, configuring event sources and re-wiring aliases all take time, and the process is not atomic. Keeping two tasks in the same Lambda function allows you to deploy changes faster and worry less about external configuration. This is particularly important if you need to reconfigure API Gateway, as it has very low rate limits on configuration requests.
The deployment options and considerations differ slightly based on whether you are using just Lambda functions, or Lambda and API Gateway:
Options for deploying without a Web API
Two tasks, sharing some code, can be packaged in several ways for Lambda functions:
- One function, single project, two different event types
- The tasks could share the same source code, and the Lambda event handler could decide which task to invoke based on the type or properties of the incoming event. For example, if the event comes from an S3 bucket, save the response back to the bucket; if it comes from an SNS topic, publish the result to a topic. The Lambda handler needs to choose the right event parser and result processor. This makes it easy to update both tasks at once, but makes the package larger. It also increases the risk of a bug in one task impacting the other.
- Two functions, same project, different handlers
- The tasks could share the same source code, but use two different Lambda entry points. For example, `s3.js` could parse S3 events, invoke common code to process them, and then save the results back to S3, while `sns.js` could use a different parser and result processor. This makes it possible to share the same source code and have two different functions that can be deployed and configured separately. This makes it easy to manage shared code, but includes all the files in both packages. If one task depends on a large set of external resources or additional dependencies, the other task will include lots of unnecessary files in its package as well (they are deployed from the same source tree).
To achieve this with Claudia.js, pass `--config` when creating the functions so the function properties get saved to two different files, and use the same `--config` argument when updating the functions. For example:
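A sketch of what this might look like, run from the shared project directory. The function names, handler names and config file names here are hypothetical:

```shell
# Create two functions from the same source tree, each with its own config file.
claudia create --name exporter-s3 --handler s3.handler --region us-east-1 --config claudia-s3.json
claudia create --name exporter-sns --handler sns.handler --region us-east-1 --config claudia-sns.json

# Later, update each function independently by pointing at its config file.
claudia update --config claudia-s3.json
```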
- Two functions, two projects, shared dependency
- The two tasks could share the common code as a dependent library, either published to NPM or GitHub, or included as a local directory. Claudia supports using local references such as `../shared-lib` in NPM project dependencies, so you don't have to publish the shared code to NPM or GitHub just to include it in two places. This allows you to use two different functions but manage common code easily, and keep everything in the same version control repository. Each function would have a separate NPM project, so individual dependencies and data files won't be replicated twice. Because Claudia requires you to specify the Lambda handler separately from the NPM `package.json` main entry point, you could even keep the shared code in one project, and include it in the other project as an NPM dependency.
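Assuming the shared code lives in a sibling directory called `shared-lib` (a hypothetical name), each function's `package.json` could reference it with NPM's local path syntax:

```json
{
  "name": "s3-exporter",
  "version": "1.0.0",
  "dependencies": {
    "shared-lib": "file:../shared-lib"
  }
}
```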
- Multiple functions invoking each other, no shared code
- The two tasks could be deployed as separate functions, and one could invoke another, or the common code could be a separate Lambda function that gets invoked by both tasks. For example, instead of including code to manage records in a shared database in two functions, put all data management code in a single Lambda function, and have the other tasks invoke it directly. This allows you to reduce dependency on shared code and assets, simplify data management, lock down and configure access to the database in a single place and centralise configuration. However, it introduces additional latency to each call, and one more potential point of failure. If dependent functions have to synchronously wait for the common code to complete, you'll also be paying double for each call. You can invoke one Lambda from another easily using the AWS SDK, or use an external resource such as an SNS topic or an S3 bucket to decouple the Lambda functions even further.
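A minimal sketch of the direct-invocation approach, assuming a hypothetical shared function called `data-manager` and the aws-sdk v2 `Lambda.invoke` API (the SDK is only loaded when the invocation actually happens, so the parameter-building helper stays testable offline):

```javascript
// Build the parameters for invoking a shared Lambda function.
function buildInvocationParams(functionName, payload) {
  return {
    FunctionName: functionName,
    InvocationType: 'RequestResponse', // synchronous; use 'Event' for fire-and-forget
    Payload: JSON.stringify(payload)
  };
}

// Invoke the (hypothetical) shared data-manager function and return a promise.
function invokeDataManager(record) {
  const AWS = require('aws-sdk'); // loaded lazily; requires aws-sdk v2 at runtime
  const lambda = new AWS.Lambda();
  return lambda
    .invoke(buildInvocationParams('data-manager', { action: 'save', record: record }))
    .promise();
}
```

Choosing `'Event'` as the invocation type avoids paying for the caller's wait time, at the cost of not seeing the result.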
Example: file conversion for MindMup
MindMup uses very similar code to convert maps to SVG and PDF. In fact, the PDF exporter works by first converting to SVG, and then uses an external tool (rsvg) to make the PDF file. The rsvg binary is huge. SVG conversion itself requires access to font files, which are also huge (30 MB). The files need to be saved to a well-known location on S3 so that clients can pick them up. The external PDF converter requires a lot of memory, which isn't necessary for the SVG processor. The incoming data format is relatively stable, and already requires multi-versioning support for other uses. The PDF exporter is used by clients significantly more frequently than the SVG one.
Because the two processors have hugely different memory needs, it is financially more beneficial to split them. They share the same input data format, but because they are not changing the same persistent data records, there's no risk of data inconsistency if two versions of the shared code run at the same time. The SVG exporter is a lot less risky than the PDF one, so keeping them separate also allows us to first validate changes on the low-risk format, then deploy the update to the higher-risk one. The solution we chose for decoupling is two functions, two projects, with a shared NPM dependency. Using the same function would increase cost (as the SVG converter could use less memory). Having the PDF exporter invoke the SVG exporter first would require us to either overcomplicate the code and somehow post-process the response, or invoke the SVG exporter synchronously and pay double. Direct invocation of the same function would also not allow us to manage risk separately. Having both in the same project and deploying two functions using `--config` would unnecessarily increase the size of the (already huge) SVG exporter package, because it does not need the external tool for PDF conversion.
Options for deploying with a Web API
Having API Gateway in the picture complicates things a bit more. Reconfiguring API Gateway endpoints takes quite a bit of time because of the request rate limits, so bundling together two different tasks that use different endpoints increases deployment time significantly. Web API endpoint tasks often need to share configuration as well as code. For example, GET, POST and PUT to `/documents/8761` all deal with the same persistent storage. Splitting such tasks into different Lambda functions would complicate configuration maintenance and reconfiguration. Similar to the concerns outlined above, using different functions would also increase the risk of data inconsistency if various endpoints write to the same persistent data entity. Finally, API consumers expect the various methods of the same API to share the same base URL/hostname. With API Gateway, each API instance has a unique URL, so bundling endpoints into the same API is often beneficial.
Claudia allows you to serve multiple API Gateway endpoints from the same Lambda function, and can handle routing internally using `claudia-api-builder`. For simpler scenarios, where you do not need complex routing, you can even use a proxy API. Although it's technically possible to create an API that invokes multiple Lambda functions for different endpoints, Claudia assumes that a single Lambda function completely controls a single API Gateway API instance. This allows Claudia to simplify API deployment and reconfiguration for most cases, but makes it more difficult to share a common URL.
Here are the options for sharing code and URLs:
- Single project, single function, single API, multiple endpoints
- The two tasks would share the same source code, and the Lambda function would choose internally which one to execute based on the HTTP request. Using `claudia-api-builder`, you just need to define two endpoint request processors. Using a proxy API, you can use the HTTP method and path of the request to decide how to handle it. Everything gets deployed at the same time, so you do not need to manage data consistency between different tasks/endpoints. Deployment of each update is a bit slower, because multiple endpoints need to be recreated; deployments where only the code changes, and the endpoint definitions stay the same, can be sped up.
- Single project, multiple functions, multiple APIs
- The tasks could share the same source code, but have different API definitions (for example `public-api.js`). This makes it easy to keep code consistent, update shared code easily, and deploy the two APIs separately. The functions would be deployed from the same project directory using different `--config` files. Deploying separately with Claudia means that there will be two APIs in API Gateway, so they will be on different URL domains. You can configure a custom API Gateway domain and map the various APIs to a common domain name. For example, the production stage of the admin API could be mapped to `https://api.myapp.com/admin`, and the production stage of the public API could be mapped to a different path on the same domain.
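Assuming the custom domain `api.myapp.com` has already been created in API Gateway, the mapping might be sketched with the AWS CLI like this (the REST API ID is a placeholder):

```shell
# Map the production stage of the admin API under /admin on the shared domain.
aws apigateway create-base-path-mapping \
    --domain-name api.myapp.com \
    --base-path admin \
    --rest-api-id abc123 \
    --stage production
```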
- Multiple projects, multiple functions, multiple APIs
- The tasks could share the same NPM dependency and be deployed completely separately. This decreases the risk of a problem in one project influencing another, but requires more complex dependency management for shared code. You can configure a custom API Gateway domain and map various APIs to a common domain name.
Example: Migrating a web server to Lambda
With web applications running on a dedicated VM, or on platforms such as Heroku, a single web server process normally handles all the web endpoints, so it can be deployed quickly, scaled easily, and run cheaply. This server then handles user authentication, document management, serving web assets and so on. Groups of endpoints share authorization and authentication; for example, all `/public` paths are open, all `/user` endpoints require end-user authentication, and all `/admin` endpoints require admin rights. Inside each group, some endpoints apply additional authorization; for example, a POST to `/user/123/preferences` should only be allowed for the user with ID 123. Almost all endpoints require some shared configuration for external resources, such as a common DynamoDB database. Some endpoints work on the same piece of data; for example, user registration and updating user preferences both touch user records.
A typical way to convert this to Lambda functions would be to move static asset serving completely to S3 (images, static HTML), and to use Lambda functions for groups of API endpoints, based on the data they manage. For example, all `/user/...` endpoints would go to a single Lambda function, and all `/document/...` endpoints would go to another function. Splitting GET, POST and PUT for `/user/xxx/preferences` into separate Lambda functions would just overcomplicate deployment and configuration, with no particular benefit. If the structure of a user record changes, all `/user` API endpoints would likely have to be redeployed, so grouping them together saves time, and removes the need for each method to handle edge cases that might arise if individual endpoints work on different data versions at the same time. That keeps the code simple. Likewise, there isn't much point keeping `/document` endpoints together with `/user` endpoints if they control different pieces of data. If endpoints need to go live together, deploy them as a single function. If they don't, split them. A custom API Gateway domain could then group all the various APIs under a common host name.
An exception to this would be APIs intended for third-party consumption, with billing and throttling. If you want to use shared API configuration across endpoints, for example the same client API keys, throttling configuration or shared API usage plans, there is a significant benefit to keeping all those endpoints in a single API (and, by implication with Claudia.js, in a single Lambda function).
Shared authentication and URL-level authorization could be extracted into a custom authorizer, or even a Cognito User Pool. Any additional shared functionality could be shared as an NPM dependency.