This is a follow-up to our previous article on Managing multi-environment serverless architecture using AWS API Gateway. To see an implementation of the approach described in this article, see the sample project on GitHub.
Error handling in microservices or serverless architectures can be tricky. Different components may be integrated using different protocols and built on different stacks, yet every client-facing error response should honour the same API contract.
The API Gateway pattern relies on a facade/gated proxy service.
In this article we are going to look at a simple serverless To-Do app built using SAM (Serverless Application Model), AWS CloudFormation, API Gateway and Lambda functions written in Go.
Our goal is to meet the following requirements in relation to error handling:
This is our API contract for error responses. The clients expect any non-2XX response to contain an application/json body with this shape.
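As a minimal sketch of that contract in Go, assuming the body carries the two fields `code` and `message` (the names used by the gateway mappings later in this article). The type name `apiError` and the sample message are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError is a hypothetical name for the client-facing error body.
// The field names "code" and "message" are assumed from the gateway
// mappings described later in the article.
type apiError struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

// encode renders the error body as a client would receive it.
func encode(e apiError) string {
	b, err := json.Marshal(e)
	if err != nil {
		// Marshalling a struct of two strings is not expected to fail.
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(apiError{Code: "TASK_NOT_FOUND", Message: "The task does not exist."}))
}
```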
Let’s investigate the request flow with AWS API Gateway and AWS Lambda. As we can see below, there are two sources of errors: the API Gateway itself (Gateway Responses) and the integration function (Integration Responses).
Errors can originate from various sources within a serverless application
Gateway Responses represent errors that occur before reaching the integration (such as access control errors, internal configuration errors, etc), or when the integration response cannot be mapped to a method response. These can be customised to fit our error schema by using a simple mapping template. Here’s the relevant section of our CloudFormation template.
AWS provides a full list of response types that can be used to define response mappings.
For the purpose of this example we chose to map only the catch-all 4XX and 5XX types. In the case of 4XX errors we return error.responseType as our code and error.messageString as our message, which provides validation and access control error details to clients. In the case of 5XX errors, however, we hardcode the code and message in order to avoid leaking internal configuration error details to clients.
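The policy behind those two mappings can be sketched in Go. This is not the CloudFormation template itself, only an illustration of the decision it encodes. The function name `gatewayBody` and the hardcoded `INTERNAL_ERROR` / `Internal error` values are assumptions made for the example.

```go
package main

import "fmt"

// gatewayBody mirrors the policy described above: 4XX details are passed
// through to the client, 5XX details are replaced with fixed values.
// The fixed code and message are placeholders, not the article's values.
func gatewayBody(status int, responseType, message string) (code, msg string) {
	if status >= 500 {
		return "INTERNAL_ERROR", "Internal error"
	}
	return responseType, message
}

func main() {
	code, msg := gatewayBody(403, "ACCESS_DENIED", "User is not authorized")
	fmt.Println(code, msg)

	code, msg = gatewayBody(500, "API_CONFIGURATION_ERROR", "Invalid mapping expression")
	fmt.Println(code, msg)
}
```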
The strategy for handling errors returned by a Lambda function is dependent on how the function is integrated with the endpoint, which can be done using either a proxy integration or a custom integration. The former proxies HTTP requests to the Lambda whereas the latter decouples the function from the original HTTP request further and completely relies on request and response mappings.
Our Lambda handlers use the custom integration type. This allows us to write handlers with clearly defined inputs and outputs, without any knowledge of the HTTP request initially made to the API gateway.
Not only does this keep the function simple, it also makes it easier to invoke and test our command with a simple event payload and inspect the result. We found sam-local to be a useful tool during development and leverage it for integration testing, as will be explained later.
AWS Lambda uses its own error schema which can later be inspected and modified by API Gateway. This is why the following Lambda handler may not do what you expect.
Instead of outputting a simple error string, the Go Lambda runtime wraps the message string in a custom error type which results in the following response.
The Lambda error response includes the actual type of the original error, see Go source, and the error value.
However, we want to retain error codes and reliably separate private and public error details. To do that, our handler needs to return a structured error whose type produces a JSON-encoded string when its Error() method is called, thereby resulting in a JSON-within-JSON integration response.
We introduced a lambdaError type, meant to be used by a Lambda handler function to wrap errors before returning them.
The error value is the JSON encoded structured error.
As you can see, our internal error schema contains a code, a public_message and a private_message. The code will be useful for matching on the gateway later, the public_message is a human-readable string that does not leak any technical details, and the private_message is the detailed error string.
Invoking our handler now returns the following error response.
The handler does not know how this response will be processed by the gateway before being sent to the client.
This is not particularly elegant, but it’s the only way to return a structured error message when using custom Lambda integration in API Gateway.
A response received from the Lambda function then gets mapped by API Gateway in order to conform to our external API contract. To map errors we rely on integration response mappings, which use regular expressions to match error codes.
Our example app’s matching strategy is to first create a matching rule for the absence of error (success response) and then for any expected errors that can be proxied to the client, and finally a catch-all for unexpected errors that simply maps to a hardcoded 500 Internal error.
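The ordering of that strategy can be simulated in Go. This is only a sketch of the idea: the patterns are illustrative rather than copied from the template, an absent error is modelled as an empty string, and the names `rule` and `statusFor` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a selection pattern with the status code it maps to.
type rule struct {
	pattern *regexp.Regexp
	status  int
}

// rules are evaluated in order: success first, then expected errors, then a
// catch-all. Patterns are anchored so they must match the whole string, and
// (?s) lets "." match newlines in multi-line error messages.
var rules = []rule{
	{regexp.MustCompile(`^$`), 200},
	{regexp.MustCompile(`(?s)^.*"code":"TASK_NOT_FOUND".*$`), 404},
	{regexp.MustCompile(`(?s)^.*$`), 500},
}

// statusFor returns the status of the first rule matching errorMessage.
func statusFor(errorMessage string) int {
	for _, r := range rules {
		if r.pattern.MatchString(errorMessage) {
			return r.status
		}
	}
	return 500 // unreachable with the catch-all above, kept as a safe default
}

func main() {
	fmt.Println(statusFor(""))
	fmt.Println(statusFor(`{"code":"TASK_NOT_FOUND","public_message":"Task not found","private_message":"no rows"}`))
	fmt.Println(statusFor("runtime error: index out of range"))
}
```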
Below is the snippet from the template that maps ‘not found’ errors in the integration response.
Note that a 404 method response must also be defined for that endpoint.
The regular expression matches a specific error code inside the errorMessage string value (which happens to be our JSON-encoded structured error), then maps it to a 404 status code and uses the Velocity Template Language to decode the JSON string and fit it to our expected output shape, keeping out the private message, which still gets logged.
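The body transformation performed by that mapping template can be approximated in Go: decode the outer Lambda envelope, decode the JSON string held in errorMessage, and emit only the public fields. This is a sketch of the logic, not the Velocity template; the type names and `mapToClient` are hypothetical, and the client body is assumed to use `code` and `message`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope is the outer Lambda error response; only errorMessage is needed.
type envelope struct {
	ErrorMessage string `json:"errorMessage"`
}

// internalError is the structured error encoded inside errorMessage.
type internalError struct {
	Code           string `json:"code"`
	PublicMessage  string `json:"public_message"`
	PrivateMessage string `json:"private_message"`
}

// clientError is the assumed client-facing shape.
type clientError struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

// mapToClient decodes the JSON-within-JSON payload and drops the private message.
func mapToClient(integrationResponse []byte) ([]byte, error) {
	var env envelope
	if err := json.Unmarshal(integrationResponse, &env); err != nil {
		return nil, fmt.Errorf("decoding envelope: %w", err)
	}
	var in internalError
	if err := json.Unmarshal([]byte(env.ErrorMessage), &in); err != nil {
		return nil, fmt.Errorf("decoding errorMessage: %w", err)
	}
	return json.Marshal(clientError{Code: in.Code, Message: in.PublicMessage})
}

func main() {
	resp := []byte(`{"errorMessage":"{\"code\":\"TASK_NOT_FOUND\",\"public_message\":\"Task not found\",\"private_message\":\"no rows\"}","errorType":"lambdaError"}`)
	out, err := mapToClient(resp)
	fmt.Println(string(out), err)
}
```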
The resulting error returned by the gateway to our client is just as we expect.
The most important thing to note here is that we can manipulate the integration errors in any way that we see fit, for example, changing a code like TASK_NOT_FOUND to RESOURCE_NOT_FOUND if that’s what the client expects.
Below is the flow diagram from before, now annotated with the error payloads at different stages:
The good news is that we were able to fulfil our error handling requirements using AWS API Gateway and Lambda, but we did find that aspects of the solution have some drawbacks.
The JSON-within-JSON hack used to return a structured error, along with the regex matching, seems particularly brittle. This could perhaps be ameliorated by the introduction of support for a protocol such as gRPC on the integration side.
In the end the trade-off is some clunkiness vs the time that it would take to roll out your own gateway service.
Full integration testing including all gateway mappings, validation, authorisers, etc. would have to be done in a testing CloudFormation stack. However, SAM Local helps considerably with testing the Lambda responses and gives us a sense of how our handler command will behave when executed in the real Lambda environment.
Using sam local invoke we can execute a handler cmd in a local environment by providing it with an event file similar to what API Gateway would send it.
We decided to automate this process by writing tests that execute sam local invoke with a prebuilt cmd binary and check the response payloads.
You can find our helper invoke function here.
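The linked helper is not reproduced here. As a rough sketch of the idea, assuming the `sam` CLI is on the PATH and that the handler's response is written to standard output, a helper could look like the following. The function names, the function logical ID and the event path are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// invokeCmd builds the sam CLI command for a single local invocation.
// function is the logical ID from the SAM template; eventPath points to a
// JSON event file similar to what API Gateway would send.
func invokeCmd(function, eventPath string) *exec.Cmd {
	return exec.Command("sam", "local", "invoke", function, "--event", eventPath)
}

// invoke runs the command and returns whatever was written to stdout.
// A test would then decode this payload and assert on its fields.
func invoke(function, eventPath string) (string, error) {
	cmd := invokeCmd(function, eventPath)
	var out bytes.Buffer
	cmd.Stdout = &out
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("sam local invoke %s: %w", function, err)
	}
	return out.String(), nil
}

func main() {
	// Both arguments are placeholders for illustration.
	out, err := invoke("GetTaskFunction", "testdata/get_task_event.json")
	if err != nil {
		fmt.Println("invoke failed:", err)
		return
	}
	fmt.Println(out)
}
```

Depending on the CLI version, log lines may be mixed into the output, so a real test would likely need to isolate the JSON payload before decoding it.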
Note: SAM Local is also able to bring up a local API gateway; however, custom integration is currently unsupported, which meant that we couldn’t use that functionality.
This investigation was carried out with Christian Klotz at 2PAx (a startup that aims to revolutionise how restaurants allocate covers). Thanks to Christian for helping with this article.
Sample App Source: https://github.com/smalleats/serverless-todo-example
AWS Lambda Go: https://github.com/aws/aws-lambda-go
AWS SAM Local: https://github.com/awslabs/aws-sam-local
Dave Cheney’s error package: https://github.com/pkg/errors
VTL Reference: http://velocity.apache.org/engine/devel/vtl-reference.html