Upload AWS log files of various formats (Cloudfront, S3, Cloudtrail) to your log storage of choice via Winston in a best-effort manner using AWS Lambda and the Node.js v4.3 runtime.
Amazon allows you to log Cloudfront activity or S3 accesses to your buckets, but only stores them as log objects on S3, sometimes gzipped (as with Cloudfront logs). While you can set up CloudWatch alerts for e.g. 4xx errors in Cloudfront traffic, there's no easy way to tail or search these logs to track down issues. Now with this bit of Lambda glue, you can!
The module takes the following options and returns a function to serve as our Lambda handler:
- format - required, one of:
cloudfront
,s3
, orcloudtrail
- transport - required, a Winston transport object.
- reformatter - function that takes a json object. If null, the default format is a string of key=value pairs. If you wish to log json, just return the object.
- callback - function that takes an error param (may be null) and the Lambda function handler's callback. If this option is null, we log any error and call the handler's callback.
Test your script using the Lambda console, or run the function locally with a test event (step 2.3.2) that points to an object on S3:
// To run the code below, `npm install aws-sdk` then:
// node test.js [lambda_fn_file] test_event.txt
// test.js:
const lambda = require('./' + process.argv[2]);
require('fs').readFile(process.argv[3], (err, data) => {
if (err) throw err;
lambda.handler(JSON.parse(data), {}, (err) => {
if (err) console.log(`Error: ${err}`);
console.log('Finished.');
});
});
Ensure that AWS_SECRET_ACCESS_KEY
and AWS_ACCESS_KEY_ID
are set in your
environment for an IAM user with permissions to run these functions.
# ----- EXAMPLE -----
function=s3LogsToPapertrail
bucket=MY_BUCKET
accountid=MY_ACCOUNT_ID
region=us-west-2
indexname=papertrail
# Zip your js file and modules
zip -r $indexname.zip $indexname.js node_modules/
# Create your lambda function. Assumes you already created an execution role
# (see tutorial).
aws lambda create-function \
--region $region \
--function-name $function \
--zip-file fileb://`pwd`/$indexname.zip \
--role arn:aws:iam::$accountid:role/lambda-s3-execution-role \
--handler $indexname.handler \
--runtime nodejs4.3 \
--timeout 10 \
--memory-size 128
# Give S3 permission to invoke this lambda function. 'statement-id' is just
# some unique string.
aws lambda add-permission \
--function-name $function \
--region $region \
--statement-id $function \
--action "lambda:InvokeFunction" \
--principal s3.amazonaws.com \
--source-arn arn:aws:s3:::$bucket \
--source-account $accountid
# Need to update your package?
aws lambda update-function-code \
--region $region \
--function-name $function \
--zip-file fileb://`pwd`/$indexname.zip
For a static site hosted on S3 to publish logs to a bucket, the destination bucket must also be in the same region. For S3 to post event notifications to Lambda, the bucket must already be in a region supported by Lambda.
Therefore your static site must also be hosted in a lambda-supported region.
Currently in the US that is only us-east-1
and us-west-2
. Cloudfront
distributions fortunately can log to any S3 bucket, so it's easier to
reconfigure an existing setup to log to one of these regions.
You also can't have event notifications fire to two different lambda functions for overlapping object prefixes.
Lastly, there isn't any great error handling going on here. E.g. if we're unable to connect to the transport, we'll lose that object's batch of events. If you want to be more robust, you'll probably want to incorporate your own work queue solution.
From Amazon's docs:
The completeness and timeliness of server logging, however, is not guaranteed.
The log record for a particular request might be delivered long after the
request was actually processed, or it might not be delivered at all. The
purpose of server logs is to give you an idea of the nature of traffic against
your bucket. It is not meant to be a complete accounting of all requests. It
is rare to lose log records, but server logging is not meant to be a complete
accounting of all requests.
Adapted from WatchKeep and inspired by convox/papertrail; thanks!