AWS GoLang Lambda with S3 XRAY and Terraform

I am doing a lot more work with AWS Lambda and GoLang, primarily for automation and monitoring.  There are a few important recommendations that I can provide at this point.  And I’d like to do this in the context of real code. So I built an example of an AWS Lambda function written in GoLang and posted the code on GitHub:

Raptor an AWS S3 evented GoLang Lambda Function

Let’s get to it (all links below take you to the relevant code snippets on GitHub):

Instrument Lambda Function

Prefix every line of your log with the Lambda RequestID.  It has the following format: 72861f13-3ec2-11e8-b266-3fbfa0ef4b01. You can then use the first segment, 72861f13, to search the CloudWatch log stream.

Instrument your function with AWS XRAY and annotate XRAY with the same Request ID. You’ll then be able to find XRAY traces with a simple query: service("raptor") AND annotation.RequestID = "72861f13-3ec2-11e8-b266-3fbfa0ef4b01".
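
To avoid typing the query by hand, a small helper can assemble the filter expression shown above from a service name and a request ID. This is only a string-building sketch; traceFilter is a hypothetical name, and the result is meant to be pasted into the XRAY console search box.

```go
package main

import "fmt"

// traceFilter builds the filter expression quoted in the text for a given
// service name and request ID. %q wraps each value in double quotes.
func traceFilter(service, requestID string) string {
	return fmt.Sprintf("service(%q) AND annotation.RequestID = %q", service, requestID)
}

func main() {
	fmt.Println(traceFilter("raptor", "72861f13-3ec2-11e8-b266-3fbfa0ef4b01"))
}
```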

Verify AWS Identity

Verify and log the AWS Identity your function is launched with. Having the AWS Identity ARN clearly displayed in the log helps troubleshoot permission issues.
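
Once the identity ARN is available as a string (for example from an STS GetCallerIdentity call, which this sketch does not perform), splitting it into its parts makes the log line easier to scan. The code below is a standard-library sketch; parseARN, the identity type and the sample ARN are hypothetical and only illustrate the general arn:partition:service:region:account:resource layout.

```go
package main

import (
	"fmt"
	"strings"
)

// identity holds the ARN fields that are most useful in a log line.
type identity struct {
	Partition string
	Service   string
	Account   string
	Resource  string
}

// parseARN splits an ARN of the form
// arn:partition:service:region:account:resource into its fields.
func parseARN(arn string) (identity, error) {
	parts := strings.SplitN(arn, ":", 6)
	if len(parts) != 6 || parts[0] != "arn" {
		return identity{}, fmt.Errorf("malformed ARN: %q", arn)
	}
	return identity{
		Partition: parts[1],
		Service:   parts[2],
		Account:   parts[4],
		Resource:  parts[5],
	}, nil
}

func main() {
	// Hypothetical ARN, used for illustration only.
	id, err := parseARN("arn:aws:sts::123456789012:assumed-role/example-role/example-session")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("account=%s resource=%s\n", id.Account, id.Resource)
}
```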

Integration Test

Build an integration test and make it a prerequisite step for production deployment.  I think a full-blown integration test is a necessity for any serious production code.  But it’s especially important for high-velocity Lambda deployments.  You want to catch bugs early on instead of spending long hours sifting through the CloudWatch logs later.

Automate Deployment

Deploy the function, its IAM Role and the corresponding IAM Policies using an automation tool.  Embrace Infrastructure as Code.  And under no circumstances create any of it by hand.  I don’t even recommend using the AWS CLI. Instead, I build a Terraform module and place it in the same repo.   Anyone reviewing the code will clearly understand the deployment story, and it’ll be easy to destroy all artifacts if necessary.


There are a few things that you, the reader, can improve on:

  1. Set up a DLQ Resource.  AWS Lambda will automatically retry failed executions for asynchronous invocations.  But to really make it bulletproof, you can forward payloads that were not processed to a dead-letter queue (DLQ), such as an SQS queue or an SNS topic.
  2. Set up a CloudWatch Alarm to monitor execution errors.
  3. Incorporate all of the above in the Terraform module.
  4. Move the Terraform state file to S3.

Blue/Green ECS-optimized AMI Update For ECS Instances

If you are running on Amazon EC2 Container Service (ECS), you are familiar with the concept of an “ECS-optimized AMI update”. This is a notification Amazon sends you when there is a new Docker Engine version that addresses a certain vulnerability. This means it’s time to update the Amazon ECS-optimized AMI and, in effect, switch to the new version of Docker Engine and ECS Agent.

What I’d like to show here is a safe method of updating the ECS AMI.  This method is a proven Blue/Green Deployment strategy that keeps your old AMI Instances on standby in case the new AMI Instances fail under load.  It gives you time to burn in the new Docker/ECS-Agent/AMI stack under real production load.  And when you feel solid, you simply terminate the old instances.


But before we look at the solution, let’s examine the problem.

The Problem – NO Blue/Green!

As of today (Jan-19-2017), if you simply swap the ECS Instances from underneath your ECS Cluster they go away for good.  There is no way to safely re-attach them back to the ECS Cluster.  Here’s an Issue I opened on this on GitHub.  Let me provide the summary here:

  1. I feel we should have a way to mark ECS Instances as StandBy and have the ECS Agent not schedule any tasks on them for as long as that status is active.  I don’t think “Deregister” functionality is sufficient here because there is no way that I know of to bring deregistered instances back into service.
  2. I also don’t like that a specific version of Docker/ECS Agent is not pinned to a specific version of Amazon ECS-optimized AMI.  If it were – this would not be an issue,  we could always bring back a known, good working set of versions into service.  But as it is now – even if we used an older AMI – it will pull in the most recent version of ECS Agent and Docker on instance launch.

However, the good news is that until the above two points are addressed, we have another solution. Read on.


Just one week after I reported this issue on GitHub, the AWS team implemented a solution called “Container Instance Draining”. I’m impressed! Way to go, AWS ECS Team!  See the GitHub issue update.  Now you have two solutions: the one below and the new Container Instance Draining.

Solution – Task Placement Constraints

The solution is to use Task Placement Constraints in combination with the ECS Instance Platform Attribute ecs.ami-id. This combination forces ECS to place Running Tasks only on the instances with the specific AMI that we designate for a Task. Here’s how it works:

Let’s say you are updating from amzn-ami-2015.09.g-amazon-ecs-optimized (ami-33b48a59) to amzn-ami-2016.09.d-amazon-ecs-optimized (ami-a58760b3).  And let’s say you register 4 NEW AMI Instances and 4 OLD AMI Instances to your ECS Cluster concurrently.  Now, a Task Placement Constraint can filter these 8 instances by AMI attribute directly. You can test how this works with the aws ecs list-container-instances command and its --filter flag:

## List 4 OLD AMI Instances
ubuntu@a01:~$ aws ecs list-container-instances --cluster "my-cluster" --filter "attribute:ecs.ami-id == ami-33b48a59"

    "containerInstanceArns": [

Armed with this knowledge, we can create two Task Definitions, each with placementConstraints set to a specific AMI, and then assign the desired Task Definition to the ECS Service. ECS then does all the heavy lifting by moving Running Tasks from one set of AMI Instances to the other, draining connections and updating the ELB.  The best part is that we can go back and forth, which is exactly the Blue/Green Deployment strategy.
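
The constraint itself is just an expression string plus a type, so it is easy to generate for either AMI. The following is a minimal sketch using only the Go standard library: it builds the memberOf expression for an AMI ID and prints the placementConstraints fragment as JSON. The type and function names are hypothetical, and the sketch does not call any AWS API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// placementConstraint mirrors one entry of the "placementConstraints"
// array in a task definition.
type placementConstraint struct {
	Expression string `json:"expression"`
	Type       string `json:"type"`
}

// amiConstraint pins task placement to instances running the given AMI.
func amiConstraint(amiID string) placementConstraint {
	return placementConstraint{
		Expression: "attribute:ecs.ami-id == " + amiID,
		Type:       "memberOf",
	}
}

// constraintsJSON renders the task definition fragment for one AMI.
func constraintsJSON(amiID string) (string, error) {
	doc := map[string][]placementConstraint{
		"placementConstraints": {amiConstraint(amiID)},
	}
	b, err := json.Marshal(doc)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	for _, ami := range []string{"ami-33b48a59", "ami-a58760b3"} {
		out, err := constraintsJSON(ami)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(out)
	}
}
```

Generating both fragments from one function keeps the OLD and NEW Task Definitions identical except for the AMI ID, which is what makes switching back and forth safe.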

The Runbook (AWS Console)

Here’s a step-by-step process using the AWS Console.  Once you know how to do this manually, it’s possible to automate it.

For this example, let’s pretend we have the following stack:

  1. We have an ECS Cluster with a Service and a Task that already runs under this Service.
  2. There is an Auto Scaling Group with 4 ECS Instances serving this ECS Cluster
  3. Each Task is placed on a single instance, this is achieved via a static Host Port / Container Port mapping (80->8080)
  4. The Service is defined with a Minimum healthy percent = 50 and a Maximum percent = 100
  5. These current ECS instances are running an “OLD AMI” ami-33b48a59
  6. The goal is to safely upgrade this ECS Cluster and switch it to “NEW AMI” ami-a58760b3

Let’s get on with it:

1. Create New Task Definition (OLD AMI)

Go to: AWS Console -> Amazon ECS -> Task Definitions

  1. Click on the Task Definition
  2. Click [x] Next to the latest Task Definition Revision
  3. Click Create Revision
  4. Click (+) next to “Add constraint”
  5. Fill in the following:
    Type: memberOf (already pre-filled/can’t change this)
    Expression: attribute:ecs.ami-id == ami-33b48a59
  6. Click Create

Resulting JSON (relevant section):

  "placementConstraints": [
    {
      "expression": "attribute:ecs.ami-id == ami-33b48a59",
      "type": "memberOf"
    }
  ]

2. Update ECS Service with new Task Definition Revision (OLD AMI ID)

Go to: AWS Console -> Amazon ECS -> Clusters

  1. Click on Cluster Name
  2. Under Services Tab – Click On Service Name
  3. This Brings up Service Detail page – Click Update Button
  4. Under Task Definition column pull down the drop list and pick the Task Definition we created in step 1
  5. Click Update Service


At this point ECS will start draining connections to the old Tasks and placing the new Revision of the tasks onto the same instances (it drops 4 tasks to 2 and then swaps them one at a time).
End result: all tasks are running at the latest revision, still on the OLD AMI ID.
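
The "4 tasks drop to 2" behaviour follows from the Minimum healthy percent and Maximum percent settings of the Service. The arithmetic can be sketched as below. The rounding directions (minimum rounded up, maximum rounded down) reflect my understanding of how ECS applies these percentages and should be treated as an assumption; deploymentBounds is a hypothetical helper name.

```go
package main

import "fmt"

// deploymentBounds returns the lowest and highest number of running tasks
// a deployment may have, given the desired count and the two percentages.
// Assumption: the lower bound rounds up and the upper bound rounds down.
func deploymentBounds(desired, minHealthyPct, maxPct int) (lower, upper int) {
	lower = (desired*minHealthyPct + 99) / 100 // integer ceiling
	upper = desired * maxPct / 100              // integer floor
	return lower, upper
}

func main() {
	lower, upper := deploymentBounds(4, 50, 100)
	fmt.Printf("running tasks may range from %d to %d\n", lower, upper)
}
```

With 4 desired tasks, 50 and 100 give a range of 2 to 4. Because each task needs its own instance (static host port), old tasks must be stopped first to free instances for the new revision.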

3. Launch 4 Additional EC2 Instances (NEW AMI)

Go to: AWS Console -> Amazon EC2 -> Launch Configuration

  1. Select [x] Next to your launch configuration
  2. In the detail pane Click Copy launch configuration button
  3. Edit AMI – change from amzn-ami-2015.09.g-amazon-ecs-optimized – ami-33b48a59 to amzn-ami-2016.09.d-amazon-ecs-optimized – ami-a58760b3
  4. Click Yes to confirm AMI change and warnings about possible changes to instance type selection, Spot Instance configuration, storage configuration, and security group configuration
  5. Leave selection on existing instance size/type
  6. Change name to something new (we append a number to basename)
  7. Leave everything else as is
  8. Click Next (Storage) – leave as is
  9. Click Next (Security Groups) – leave as is
  10. Click Review
  11. Click Create launch configuration
  12. Confirm you have Key Pair
  13. Create New launch configuration

Go to: AWS Console -> Amazon EC2 -> Auto Scaling Groups

  1. Select [x] next to your Auto Scaling Group (ASG)
  2. Pull Down “Actions”
  3. Select Edit
  4. Change Desired to 8 (from 4), Max to 8 (from 4), and Launch Configuration to the name you created in the previous step
  5. Click Save

Wait Until 4 new Instances are added and their status is InService.

At this point you have 8 instances: 4 with the OLD AMI and 4 with the NEW AMI.  This works because the ASG doesn’t do anything with existing running instances when you change its Launch Configuration (LC).  It just lets them run as-is unless you downsize the ASG, at which point it’ll scale in (terminate) the instances with the old LC, which is exactly what we’ll use in the last step.

An alternative method is to create a whole new ASG with a new LC. This is the way I would do it from now on because it’s a safer process.  However, changing the LC works as well, and that’s what I did here.

Regardless of the method you use to add the 4 new instances, they should register under the ECS Cluster.  Let’s verify this: go back to the ECS Cluster page and click on the Instances Tab. It should show 8 registered instances, 4 OLD AMI Instances and 4 NEW AMI Instances, with all tasks still running on the 4 OLD AMI Instances.

And now our next step is to migrate the Running Tasks to the 4 NEW AMI Instances.

4. Create New Task Definition (NEW AMI)

Go to: AWS Console -> Amazon ECS -> Task Definitions

  1. Click on the Task Definition
  2. Click [x] Next to the latest Task Definition Revision
  3. Click Create Revision
  4. Under “Constraint” Update memberOf to new AMI ID (ami-a58760b3)
  5. Click Create

Resulting JSON (relevant section):

  "placementConstraints": [
    {
      "expression": "attribute:ecs.ami-id == ami-a58760b3",
      "type": "memberOf"
    }
  ]

5. Update ECS Service with new Task Definition Revision (NEW AMI ID)

Go to: AWS Console -> Amazon ECS -> Clusters

  1. Click on Cluster Name
  2. Under Services Tab – Click On Service Name
  3. This Brings up Service Detail page – Click Update Button
  4. Under Task Definition column pull down the drop list and pick the Task Definition we created in step 4
  5. Click Update Service

End result:

  1. All tasks are running at latest revision and are placed on the NEW AMI Instances
  2. The 4 OLD AMI Instances are still in service, and we can switch back to them by updating the ECS Service with the old Task Definition, which is bound to the OLD AMI Instances via its Constraint


6. Finally Switch Back The Auto Scaling Group to 4 Instances

Once we feel solid that the new AMI/Docker/ECS-Agent stack performs under production load, we can terminate the old instances by setting the ASG’s Max and Desired back to 4.   This automatically terminates the instances with the OLD Launch Configuration and leaves the instances with the new Launch Configuration active.



So far I am very happy with Amazon EC2 Container Service. It has served us well for almost a year now. And during our first major upgrade we found a Zero-Downtime solution using a classic, well-proven Blue/Green Deployment strategy.

Upgrade Docker Engine to Specific Version

I always recommend upgrading Docker Engine to a specific version that matches the rest of your infrastructure. For example if you are running the latest Amazon ECS-optimized AMI (currently amzn-ami-2016.09.d-amazon-ecs-optimized) — it contains Docker Engine 1.12.6. This writeup will show you exactly how to match that version. The steps are specific to Ubuntu 14.04.3 LTS.

1. Update package information

This is to ensure that APT works with the https method, and that CA certificates are installed:

sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates

2. Add the new GPG key

FYI: This command downloads the key with the ID 58118E89F3A912897C070ADBF76221572C52609D from the keyserver hkp:// and adds it to the adv keychain. For more info, see the output of man apt-key.

sudo apt-key adv \
--keyserver hkp:// \
--recv-keys 58118E89F3A912897C070ADBF76221572C52609D

3. Add specific repo for your distro to docker.list

FYI available repos:

Ubuntu version         Repository
Precise 12.04 (LTS)    deb ubuntu-precise main
Trusty 14.04 (LTS)     deb ubuntu-trusty main
Wily 15.10             deb ubuntu-wily main
Xenial 16.04 (LTS)     deb ubuntu-xenial main

So in our case:

sudo mkdir -p /etc/apt/sources.list.d
echo "deb ubuntu-trusty main" | sudo tee /etc/apt/sources.list.d/docker.list

4. Update the APT package index

sudo apt-get update

5. Verify that APT is pulling from the right repository

FYI: When you run the following command, an entry is returned for each version of Docker that is available for you to install. Each entry should show the URL of the repository you added in step 3. The version currently installed is marked with ***.

apt-cache policy docker-engine

6. Install the linux-image-extra-* kernel packages

FYI: For Ubuntu Trusty, Wily, and Xenial, install the linux-image-extra-* kernel packages, which allow you to use the aufs storage driver.

sudo apt-get update
sudo apt-get install linux-image-extra-$(uname -r) linux-image-extra-virtual

7. Finally Install a specific version of Docker Engine

7.1 List all available versions using apt-cache madison

apt-cache madison docker-engine


ubuntu@dev01:~$ apt-cache madison docker-engine
docker-engine | 1.12.6-0~ubuntu-trusty | ubuntu-trusty/main amd64 Packages
docker-engine | 1.12.5-0~ubuntu-trusty | ubuntu-trusty/main amd64 Packages
docker-engine | 1.12.4-0~ubuntu-trusty | ubuntu-trusty/main amd64 Packages
docker-engine | 1.12.3-0~trusty | ubuntu-trusty/main amd64 Packages
docker-engine | 1.12.2-0~trusty | ubuntu-trusty/main amd64 Packages
docker-engine | 1.12.1-0~trusty | ubuntu-trusty/main amd64 Packages
docker-engine | 1.12.0-0~trusty | ubuntu-trusty/main amd64 Packages
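
Note that the package version suffix is not uniform in the listing above (1.12.6 uses ~ubuntu-trusty while 1.12.3 uses ~trusty), so the exact string has to be looked up rather than guessed. As a small sketch, the Go helper below scans madison-style output and returns the full package version for a given Docker release. findVersion is a hypothetical name, and the input is assumed to be the text printed by apt-cache madison.

```go
package main

import (
	"fmt"
	"strings"
)

// findVersion scans "apt-cache madison" style output and returns the full
// package version whose upstream part matches the requested release.
func findVersion(madison, release string) (string, bool) {
	for _, line := range strings.Split(madison, "\n") {
		fields := strings.Split(line, "|")
		if len(fields) < 2 {
			continue // not a "package | version | source" row
		}
		version := strings.TrimSpace(fields[1])
		if strings.HasPrefix(version, release+"-") {
			return version, true
		}
	}
	return "", false
}

func main() {
	listing := `docker-engine | 1.12.6-0~ubuntu-trusty | ubuntu-trusty/main amd64 Packages
docker-engine | 1.12.3-0~trusty | ubuntu-trusty/main amd64 Packages`

	if v, ok := findVersion(listing, "1.12.6"); ok {
		fmt.Println("sudo apt-get install docker-engine=" + v)
	}
}
```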

7.2 Install docker-engine 1.12.6

FYI: If you already have a newer version installed, you will be prompted to downgrade Docker. Otherwise, the specific version will be installed.

sudo apt-get install docker-engine=1.12.6-0~ubuntu-trusty

8. Start the docker daemon

sudo service docker start

9. Verify that docker is installed correctly by running the hello-world image

sudo docker run hello-world



GOLANG: CI using Git Post Receive Hook

Continuous Integration (CI) using automated build and test tools was well covered by Jason McVetta – read his blog post to learn how to get Status and Test-Coverage badges on your GitHub project page. What I’d like to document here is a process of deploying your GOLANG code to a staging server using git push and then physically QA the live project (in this case a website) before deploying it to production.

There are other ways of building out a staging/QA site. You could cross-compile the binary on your local development machine and then deploy a self-contained binary to staging. However, my goal is to build the binary on an identical copy of production directly from source and then deploy the binary to production. Plus I’d like to have the option of multiple developers pushing to a centralized repo.

The process consists of the following steps:

  1. set up a bare GIT repo on the staging server
  2. set up a remote repo called staging_qa on your dev machine and point it to 1)
  3. push to it from your development machine using git push staging_qa master
  4. have the post-receive hook on the staging server check out the code to a local directory
  5. compile the GO code using go get and go install
  6. refresh/bounce the QA live site with the new code (it’s served via NGINX)


  • Development: OS X [10.9.5]
  • Staging: Ubuntu 14.04.2 LTS

Let’s dive in!

Setup Staging

Setup GOLANG on Staging

Purge the default golang installation using apt-get --purge autoremove:

sudo apt-get --purge autoremove golang

Now install golang directly from the official release tarball to get the latest version (in this case it’s 1.4.2). Download go1.4.2.linux-amd64.tar.gz first, then unpack it:

sudo tar -C /usr/local -xzf go1.4.2.linux-amd64.tar.gz

Test installation:

vi hello.go
--------- paste --------------
package main

import "fmt"

func main() {
    fmt.Printf("hello, world\n")
}

# go run hello.go

setup GO workspace [maps to GOPATH – see below]

mkdir -p /apps/dev/golang

setup go/bin env in global profile

sudo vi /etc/profile

add the following:

export PATH=$PATH:/usr/local/go/bin

setup GOPATH env in local .profile

vi $HOME/.profile

edit/add as follows:

GOPATH=/apps/dev/golang; export GOPATH

Create Directory Structure

I will use my GitHub project DOP – Day One Parser for this example. The directory structure will live under APP_TOP (/apps) and will have three major components as follows:

/apps                        <— APP_TOP
├── dev
|    └── golang              <— GOPATH
|        ├── bin
|        ├── pkg
|        └── src
|            └──
|                └── vmogilev
|                    └── dop <— [1] Source Code
├── stage
|    └── git
|        └── vmogilev
|            └── dop.git     <— [3] BARE Git Repo
     └── dop                 <— [2] http root
         ├── conf
         ├── static
         │   ├── css
         │   ├── fonts
         │   └── js
         └── templates

Here’s how to create this structure:

First set APP_TOP:

APP_TOP=/apps; export APP_TOP

Next create three directories/structure:

  1. ${GOPATH}/src/ – project’s source code directory in GOPATH (also under APP_TOP) so we can compile using go install, the code will be copied here by the git post-receive hook right after we push from development:
    mkdir -p ${GOPATH}/src/
  2. ${APP_TOP}/ – project’s http root directory where the site config files will live. We’ll also stage and serve the website assets from this directory using Go’s http server and proxy it using NGINX (this allows running Go’s http server on a port other than 80, so multiple apps can run on the same server and share the port 80 end-point via NGINX). Some of the assets in this directory (Go templates, CSS and JavaScript) are versioned in the git repo, so they will be copied here by git’s post-receive hook (see further down the writeup):
    mkdir -p ${APP_TOP}/
  3. ${APP_TOP}/stage/git/vmogilev/dop.git – bare git repo that we’ll push to from development [must end with .git]:
    mkdir -p ${APP_TOP}/stage/git/vmogilev/dop.git
    cd ${APP_TOP}/stage/git/vmogilev/dop.git
    git init --bare

Create Post Receive Hook

cd ${APP_TOP}/stage/git/vmogilev/dop.git
touch hooks/post-receive
chmod +x hooks/post-receive
vi hooks/post-receive

paste the following:


#!/bin/sh

# ----------- EDIT BEGIN ----------- #

APP_TOP=/apps; export APP_TOP
GO=/usr/local/go/bin/go; export GO

## go [get|install] ${SRC_PATH}/${APP_NAME}; export SRC_PATH
APP_NAME=dop; export APP_NAME

## local http root directory served by go http - ${APP_TOP}/${WWW_PATH}
## for / directory use /root:
##        ->
##   ->
##    ->; export WWW_PATH

## local bare git repo path - ${SRC_NAME}/${APP_NAME}.git
SRC_NAME=vmogilev; export SRC_NAME

# ----------- EDIT END ----------- #

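GOPATH=${APP_TOP}/dev/golang; export GOPATH
GIT_DIR=${APP_TOP}/stage/git/${SRC_NAME}/${APP_NAME}.git; export GIT_DIR

## SOURCE and TARGET are used below; these definitions are inferred from
## the directory layout described in this writeup
SOURCE=${GOPATH}/src/${SRC_PATH}/${APP_NAME}; export SOURCE
TARGET=${APP_TOP}/${WWW_PATH}; export TARGET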

## pre-creating SOURCE DIR solves the issue with:
##  "remote: fatal: This operation must be run in a work tree"
mkdir -p ${SOURCE}
mkdir -p ${TARGET}

GIT_WORK_TREE=${SOURCE} git checkout -f

## do not prefix go get with GIT_WORK_TREE - it causes the following errors:
##  remote: # cd .; git clone /apps/dev/golang/src/
##  remote: fatal: working tree '/apps/dev/golang/src/' already exists.

unset GOBIN
unset GIT_DIR
$GO install ${SRC_PATH}/${APP_NAME}

if [ $? -gt 0 ]; then
    echo "ERROR: compiling ${APP_NAME} - exiting!"
    exit 1
fi

sudo setcap 'cap_net_bind_service=+ep' $GOPATH/bin/${APP_NAME}

# ----------- DEPLOY BEGIN ----------- #

cp -pr ${SOURCE}/static     ${TARGET}/
cp -pr ${SOURCE}/templates  ${TARGET}/
cp -p ${SOURCE}/*.sh        ${TARGET}/

. ${TARGET}/conf/${APP_NAME}.env
${TARGET}/ >> ${TARGET}/server.log 2>&1 </dev/null
${TARGET}/ >> ${TARGET}/server.log 2>&1 </dev/null

# ----------- DEPLOY END ----------- #

What’s happening here? Let’s break it down:

  1. APP_TOP – top level mount point under which everything lives
  2. GO – complete path to go binary
  3. SRC_PATH and APP_NAME – the combination of the two is what will be passed to go [get|install] ${SRC_PATH}/${APP_NAME}. APP_NAME is the actual binary name, $GOPATH/bin/${APP_NAME}, on which we run sudo setcap to set a capability that allows it to bind to privileged ports (<1024)
  4. WWW_PATH – since our app has static assets we need an http root directory to serve them from. Depending on your app you can serve these using GO’s http server or NGINX directly. I use GO’s http server and then proxy everything via NGINX to simplify configuration. These assets are part of the git repo and will be copied to ${APP_TOP}/${WWW_PATH} by the post-receive hook (see the DEPLOY BEGIN|END section). The convention for top level domain is
  5. SRC_NAME – this becomes part of the GIT’s bare repo path in the following format ${SRC_NAME}/${APP_NAME}.git – this is what you’ll map to on the development machine using git remote add … (see Setup Git Repo further down)
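
To make the relationship between these variables concrete, here is a small Go sketch that derives the same paths the hook builds from APP_TOP, SRC_NAME and APP_NAME. It only mirrors the string composition visible in the script (GOPATH, GIT_DIR and the binary location); the function and type names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// hookPaths holds the locations the post-receive hook derives.
type hookPaths struct {
	GoPath string // ${APP_TOP}/dev/golang
	GitDir string // ${APP_TOP}/stage/git/${SRC_NAME}/${APP_NAME}.git
	Binary string // ${GOPATH}/bin/${APP_NAME}
}

// derivePaths composes the paths the same way the shell script does.
func derivePaths(appTop, srcName, appName string) hookPaths {
	goPath := path.Join(appTop, "dev", "golang")
	return hookPaths{
		GoPath: goPath,
		GitDir: path.Join(appTop, "stage", "git", srcName, appName+".git"),
		Binary: path.Join(goPath, "bin", appName),
	}
}

func main() {
	p := derivePaths("/apps", "vmogilev", "dop")
	fmt.Println("GOPATH :", p.GoPath)
	fmt.Println("GIT_DIR:", p.GitDir)
	fmt.Println("binary :", p.Binary)
}
```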

Now let’s talk about what’s going on in the DEPLOY section:

  1. Part of the source code are startup/shutdown scripts named: and and two assets directories named: static and templates – we copy all of this from go’s project directory to target located in WWW_PATH.
  2. We then expect an env file to be present in ${TARGET}/conf/${APP_NAME}.env that sets up our environmental variables for the app’s runtime on this staging box so that when we execute and these envs are passed to our app. Here are the contents of the env file:
    DOPROOT="/apps/"; export DOPROOT
    HTTPHOST="http://localhost"; export HTTPHOST
    HTTPMOUNT="/dop"; export HTTPMOUNT
    HTTPPORT="3001"; export HTTPPORT
  3. Here’s an excerpt of the that passes these to the app:
    nohup $GOPATH/bin/dop \
        -dopRoot="${DOPROOT}" \
        -httpHost="${HTTPHOST}" \
        -httpMount="${HTTPMOUNT}" \
        -httpPort="${HTTPPORT}" \
        -httpHostExt="${HTTPHOSTEXT}" >> ${DOPROOT}/server.log 2>&1 </dev/null &
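
On the receiving side, flags like these can be read with Go’s standard flag package. The sketch below is not the actual DOP source; it only shows one way an app could accept the five flags from the excerpt above. The config type, parseFlags, baseURL and the empty defaults are all assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config collects the values passed by the startup script.
type config struct {
	Root, Host, Mount, Port, HostExt string
}

// parseFlags reads the flags named in the startup excerpt.
// Defaults are left empty here; that choice is an assumption.
func parseFlags(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("dop", flag.ContinueOnError)
	fs.StringVar(&c.Root, "dopRoot", "", "http root directory")
	fs.StringVar(&c.Host, "httpHost", "", "host the server answers on")
	fs.StringVar(&c.Mount, "httpMount", "", "mount point, e.g. /dop")
	fs.StringVar(&c.Port, "httpPort", "", "port the Go http server listens on")
	fs.StringVar(&c.HostExt, "httpHostExt", "", "externally visible host")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	return c, nil
}

// baseURL joins host, port and mount point into the local address.
func baseURL(c config) string {
	return c.Host + ":" + c.Port + c.Mount
}

func main() {
	c, err := parseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("serving at", baseURL(c))
}
```

Using a dedicated FlagSet instead of the global one keeps the parsing testable with an arbitrary argument slice.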


I am using NGINX on Port 80 and proxy GO’s HTTP server that runs on higher port number – this allows running multiple go apps on different ports yet all accessible via regular http port on the same server:

nginx:80/app1 -> app1:3001
nginx:80/app2 -> app2:3002
nginx:80/app[n] -> app[n]:300[n]

Install nginx:

sudo apt-get install nginx
sudo service nginx start
sudo service nginx stop

Make sure that nginx starts automatically:

sudo update-rc.d nginx defaults

Setting up NGINX can be as simple as this:

sudo vi /etc/nginx/sites-available/default

edit as follows which will setup proxy mount point for your app (in this case /dop via port 3001):

server {
    listen 80 default_server;
    listen [::]:80 default_server ipv6only=on;

    root /usr/share/nginx/html;
    index index.html index.htm;

    # Make site accessible from http://localhost/
    server_name localhost;

    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ =404;
        # Uncomment to enable naxsi on this location
        # include /etc/nginx/naxsi.rules
    }

    location /dop {
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $remote_addr;
            proxy_set_header Host $host;
    }
}

next bounce nginx server:

    sudo service nginx restart

if you run into any problems, check the log file at /var/log/nginx/error.log (this is defined in /etc/nginx/nginx.conf).

Setup Development

Setup Password-less SSH

In order to git push via ssh we’ll need to paste our personal SSH Public KEY into ~/.ssh/authorized_keys on the staging server. Here’s how to do this:

  1. On your development machine copy the contents of your ~/.ssh/
  2. Go back to the staging server and paste it to ~/.ssh/authorized_keys
  3. Back on development machine make sure you can ssh user@my-staging-box without supplying the password

As a bonus point setup a bastion host on your network and only allow ssh traffic to pass through it. That’s what I am doing in our infrastructure.

Setup Git Repo

first we need to set up global git prefs (if not already set):

git config --global user.name "myusername"
git config --global user.email ""
git config --global core.autocrlf input

next cd into your go project’s directory and set up a git repo with a remote named staging_qa pointing to the staging server’s bare repo we created earlier:

cd $GOPATH/src/
git init
git remote add staging_qa ubuntu@staging-box:/apps/stage/git/vmogilev/dop.git
git add .
git commit -a -m "Initial Commit"
git push staging_qa master