I have been doing a lot more work with AWS Lambda and Go lately, primarily for automation and monitoring, and there are a few important recommendations I can share at this point. I’d like to do this in the context of real code, so I built an example of an AWS Lambda function written in Go and posted the code on GitHub: https://github.com/vmogilev/raptor/.
Let’s get to it (all links below take you to the relevant code snippets on GitHub):
Instrument Lambda Function
Prefix every line of your log with the Lambda RequestID. It has the following format: 72861f13-3ec2-11e8-b266-3fbfa0ef4b01. You can then use the first segment, 72861f13, to search the CloudWatch log stream.
Instrument your function with AWS X-Ray and annotate each trace with the same Request ID. You’ll then be able to find X-Ray traces with a simple query: service("raptor") AND annotation.RequestID = "72861f13-3ec2-11e8-b266-3fbfa0ef4b01".
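The query can also be generated from a request ID copied out of the logs. The sketch below only builds the filter expression string; `traceFilter` is a hypothetical helper, and it assumes the function has already written the annotation under the key `RequestID` (with the X-Ray SDK for Go this would typically be done through `xray.AddAnnotation`).

```go
package main

import "fmt"

// traceFilter (hypothetical helper) builds an X-Ray filter expression that
// matches traces of one service annotated with a given request ID.
// Assumption: the annotation key is "RequestID", as in the query above.
func traceFilter(service, requestID string) string {
	return fmt.Sprintf("service(%q) AND annotation.RequestID = %q", service, requestID)
}

func main() {
	fmt.Println(traceFilter("raptor", "72861f13-3ec2-11e8-b266-3fbfa0ef4b01"))
}
```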
Verify AWS Identity
Verify and log the AWS Identity your function is launched with. Having the AWS Identity ARN clearly displayed in the log helps with troubleshooting permission issues.
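One way to structure this check is sketched below. It is not the code from the repository: `identityProvider`, `staticIdentity` and `verifyIdentity` are hypothetical names, and the static provider with a made-up ARN stands in for a real lookup (which would normally be an STS GetCallerIdentity call) so that the example runs without AWS access.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"os"
)

// identityProvider is a hypothetical abstraction over the identity lookup;
// it is not part of any AWS SDK.
type identityProvider interface {
	CallerARN() (string, error)
}

// staticIdentity is a stand-in provider so the sketch runs without AWS.
type staticIdentity struct{ arn string }

func (s staticIdentity) CallerARN() (string, error) {
	if s.arn == "" {
		return "", errors.New("no identity available")
	}
	return s.arn, nil
}

// verifyIdentity logs the ARN the function runs as and returns it,
// or returns a wrapped error if the lookup fails.
func verifyIdentity(logf func(string, ...interface{}), p identityProvider) (string, error) {
	arn, err := p.CallerARN()
	if err != nil {
		return "", fmt.Errorf("verify identity: %w", err)
	}
	logf("running as %s", arn)
	return arn, nil
}

func main() {
	logger := log.New(os.Stderr, "", log.LstdFlags)

	// Made-up ARN for illustration only.
	p := staticIdentity{arn: "arn:aws:sts::123456789012:assumed-role/example-role/example"}

	if _, err := verifyIdentity(logger.Printf, p); err != nil {
		logger.Fatalf("cannot continue: %v", err)
	}
}
```

Failing fast when the identity cannot be resolved means a misconfigured role shows up in the first log lines rather than as a permission error deep inside the handler.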
Build an integration test and make it a prerequisite step for production deployment. I think a full-blown integration test is a necessity for any serious production code, but it’s especially important for high-velocity Lambda deployments. You want to catch bugs early instead of spending long hours sifting through the CloudWatch logs later.
Deploy the function, its IAM Role and the corresponding IAM Policies using an automation tool. Embrace Infrastructure as Code, and under no circumstances create any of it by hand. I don’t even recommend using the aws cli. Instead, I build a Terraform module and place it in the same repo. Anyone reviewing the code will clearly understand the deployment story, and it’ll be easy to destroy all artifacts if necessary.
There are a few things that you, the reader, can improve on:
- Set up a DLQ resource. AWS Lambda automatically retries failed executions for asynchronous invocations. But to really make it bulletproof, you can forward payloads that were not processed to a dead-letter queue (DLQ), such as an SQS queue or an SNS topic.
- Set up a CloudWatch alarm to monitor execution errors.
- Incorporate all of the above in the Terraform module.
- Move the Terraform state file to S3.