In this post I’m going to show you how to use a step function to iterate over the objects in an S3 bucket.
AWS Step Functions is a service that helps you coordinate multiple AWS services into serverless workflows.
You can start an execution of a state machine from Amazon CloudWatch Events rules, AWS Lambda, or Amazon API Gateway, for example.
If you want to know more about Step Functions, check the AWS documentation, because this post only covers this use case.
Step function to iterate S3 use case
Imagine that you have an S3 bucket and you want to run a job with Step Functions that iterates over all the objects in the bucket. There is no built-in iterate function in a Step Functions workflow, so you have to build the loop yourself.
Populating the S3 bucket
We are going to create 20 empty files and upload them to S3. If you’re using a Unix-like operating system, you can use touch obj{1..20} to create the files.
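If touch isn’t available on your system, the same files can be created with a few lines of Node.js. This is only a minimal sketch: the createEmptyFiles helper and the temporary target directory are assumptions made for illustration, not part of the original setup.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Creates `count` empty files named obj1..objN inside `dir` and returns their paths.
function createEmptyFiles(dir, count) {
  const created = []
  for (let i = 1; i <= count; i++) {
    const filePath = path.join(dir, `obj${i}`)
    // Same effect as `touch` for a file that doesn't exist yet
    fs.writeFileSync(filePath, '')
    created.push(filePath)
  }
  return created
}

// Usage: write the 20 files into a fresh temporary directory.
const targetDir = fs.mkdtempSync(path.join(os.tmpdir(), 'iterate-s3-'))
const files = createEmptyFiles(targetDir, 20)
console.log(`Created ${files.length} files in ${targetDir}`)
```

The files end up in a temporary directory, so nothing in your working directory is overwritten.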
Now create an S3 bucket and upload all the files.
Creating the State Machine
It’s time to create the State Machine in AWS Step Functions. Give it a name, IterateS3 for example. For now, we are going to keep the default definition.
Next, create a new role for this State Machine. Give it a name, IterateS3Permissions for example, and it will be created with the default NoPermissionsAccessPolicy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
Your State machine is created. You can start an execution if you want, but it won’t do anything yet.
Creating the Lambda Function
We have to create the Lambda function that does the real job, so let’s get started.
Create a Lambda function from scratch. Give it a name, IterateS3Function for example, and choose the Node.js 10.x runtime. Create a new execution role with basic Lambda permissions for the function. This is the function that the step function is going to use to iterate over the S3 bucket.
We need to change the Lambda function role to give it permission to list the bucket, so we will add the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "1",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::your-bucket-name"
    }
  ]
}
Now we can write the code that iterates over the S3 bucket in the Lambda function:
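const AWS = require('aws-sdk')
const s3 = new AWS.S3()

exports.handler = async (event) => {
  // Request one page of objects, resuming from the previous page when a token is given
  const response = await s3.listObjectsV2({
    Bucket: process.env.S3_BUCKET,
    // Environment variables are strings, so convert MAX_KEYS to a number explicitly
    MaxKeys: Number(process.env.MAX_KEYS),
    // An empty token (first iteration) means "start from the beginning"
    ContinuationToken: event.NextContinuationToken || undefined
  }).promise()

  return {
    Files: response.Contents,
    // The state machine compares this with "" to decide whether to loop again
    NextContinuationToken: response.NextContinuationToken || ""
  }
};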
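In the handler, we invoke the GET Bucket (List Objects) Version 2 method (ListObjectsV2) from the AWS API. Each call returns one page of at most MAX_KEYS objects (S3 never returns more than 1,000 per call), not the whole bucket, so larger buckets have to be read page by page.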
You must set an environment variable called S3_BUCKET with your actual bucket name and another environment variable called MAX_KEYS with the maximum number of keys returned by each call to the API method.
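Because environment variables are always strings while MaxKeys is a numeric parameter, it can help to read and validate both settings in one place. The following is a minimal sketch, not part of the original function: the readConfig helper and its error messages are hypothetical, and rejecting values above 1,000 is a choice made here because S3 never returns more keys than that per call.

```javascript
// Reads and validates the settings the iterator function expects.
// `env` defaults to process.env so the helper can be tried with a plain object.
function readConfig(env = process.env) {
  const bucket = env.S3_BUCKET
  if (!bucket) {
    throw new Error('S3_BUCKET environment variable is required')
  }
  // Environment variables are always strings, so convert MAX_KEYS explicitly.
  const maxKeys = Number.parseInt(env.MAX_KEYS, 10)
  if (!Number.isInteger(maxKeys) || maxKeys < 1 || maxKeys > 1000) {
    throw new Error('MAX_KEYS must be an integer between 1 and 1000')
  }
  return { bucket, maxKeys }
}

// Usage with explicit values instead of the real environment.
console.log(readConfig({ S3_BUCKET: 'your-bucket-name', MAX_KEYS: '5' }))
```

Failing early with a clear message is usually easier to debug than a confusing error from the S3 call itself.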
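As you can see, we are returning the Files and the NextContinuationToken. The NextContinuationToken is an opaque token that S3 uses to know where to continue the listing on the next API call. When there are no more objects to list, S3 does not return it, so the function returns an empty string instead.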
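To see how the token drives the iteration, here is a sketch of the same loop written in plain Node.js, without Step Functions. It runs against a fake client so that it works without AWS credentials: createFakeS3 and listAllKeys are hypothetical helpers, and the fake uses an array index as its token, whereas real S3 tokens are opaque strings. Unlike the AWS SDK, the fake returns a promise directly, so there is no .promise() call.

```javascript
// A stand-in for S3 (hypothetical, for illustration only): it pages through
// an in-memory list of keys and uses the next index as the continuation token.
function createFakeS3(keys) {
  return {
    async listObjectsV2({ MaxKeys, ContinuationToken }) {
      const start = ContinuationToken ? Number(ContinuationToken) : 0
      const end = Math.min(start + MaxKeys, keys.length)
      const page = { Contents: keys.slice(start, end).map((Key) => ({ Key })) }
      // Like S3, only include a token when there is more to list
      if (end < keys.length) {
        page.NextContinuationToken = String(end)
      }
      return page
    }
  }
}

// Keeps requesting pages until the response carries no NextContinuationToken.
async function listAllKeys(client, maxKeys) {
  const allKeys = []
  let pages = 0
  let token = ''
  do {
    const response = await client.listObjectsV2({
      MaxKeys: maxKeys,
      ContinuationToken: token || undefined
    })
    allKeys.push(...response.Contents.map((object) => object.Key))
    token = response.NextContinuationToken || ''
    pages += 1
  } while (token !== '')
  return { allKeys, pages }
}

// Usage: 20 keys read in pages of at most 8 keys each.
const keys = Array.from({ length: 20 }, (_, i) => `obj${i + 1}`)
const result = listAllKeys(createFakeS3(keys), 8)
result.then(({ allKeys, pages }) => console.log(`${allKeys.length} keys in ${pages} pages`))
```

The state machine has to reproduce this do/while loop with its own states, because the workflow has no loop construct of its own.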
Updating the State Machine
We have the Lambda function, but we still need to wire it into the State machine. Let’s change the State machine definition as follows:
{
  "StartAt": "Configure",
  "States": {
    "Configure": {
      "Type": "Pass",
      "Result": {
        "NextContinuationToken": ""
      },
      "ResultPath": "$.iterator",
      "Next": "Iterator"
    },
    "Iterator": {
      "Type": "Task",
      "Resource": "your-iterator-lambda-function-arn",
      "Parameters": {
        "NextContinuationToken.$": "$.iterator.NextContinuationToken"
      },
      "ResultPath": "$.iterator",
      "Next": "DoYourThing"
    },
    "DoYourThing": {
      "Type": "Task",
      "Resource": "your-lambda-function-to-get-your-thing-with-files-arn",
      "Next": "HasNextElements"
    },
    "HasNextElements": {
      "Type": "Choice",
      "Choices": [
        {
          "Not": {
            "Variable": "$.iterator.NextContinuationToken",
            "StringEquals": ""
          },
          "Next": "Iterator"
        }
      ],
      "Default": "Done"
    },
    "Done": {
      "Type": "Pass",
      "End": true
    }
  }
}
In the State machine definition, we have the Configure state, where we set the initial variables. Then we have the Iterator state, which is responsible for invoking the Lambda function that iterates over the S3 bucket.
The DoYourThing state is where your actual job is done with the files. This state has no ResultPath, so whatever your function returns replaces the whole state input. Don’t forget to return the iterator object after you do your job, or at least its NextContinuationToken property, or the next state won’t work.
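As an illustration, a DoYourThing function could look like the sketch below. It is a hypothetical handler, not the one from a real project: the logging stands in for your actual job, and the only part that matters for the loop is the shape of the returned object.

```javascript
// Hypothetical DoYourThing handler: it receives the whole state, works on
// the current page of files, and passes the continuation token through.
const handler = async (event) => {
  const iterator = event.iterator || {}
  const files = iterator.Files || []

  for (const file of files) {
    // Replace this with the real job; here we only log each object key.
    console.log(`Processing ${file.Key}`)
  }

  return {
    iterator: {
      // HasNextElements reads this value, so it must survive this state.
      NextContinuationToken: iterator.NextContinuationToken || ''
    }
  }
}

exports.handler = handler
```

Returning only the token, and not the Files array, also keeps the data passed between states small.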
The HasNextElements state evaluates the NextContinuationToken property: if it has a value, the execution returns to the Iterator state. Otherwise, it finishes.
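We need to change the IAM role of the State machine to give it permission to invoke the Lambda functions. So, let’s do it by removing the NoPermissionsAccessPolicy and adding a new one. Remember that the role needs this permission for every Lambda function the State machine invokes (here, the iterator function and the DoYourThing function):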
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "1",
      "Effect": "Allow",
      "Action": "lambda:InvokeFunction",
      "Resource": "your-lambda-function-arn"
    }
  ]
}
Now you can execute your State machine and iterate through your S3 bucket.
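If you prefer to start the execution from code instead of the console, something like the following sketch should work. It assumes the AWS SDK for JavaScript v2 (the same aws-sdk package used in the iterator function); startIteration is a hypothetical helper name and your-state-machine-arn is a placeholder. The client is passed in as an argument so the helper can be tried with a stub.

```javascript
// Starts an execution of the state machine with an empty JSON object as input.
// `client` is expected to behave like an AWS.StepFunctions instance (AWS SDK v2).
async function startIteration(client, stateMachineArn) {
  const response = await client
    .startExecution({
      stateMachineArn,
      // The Configure state adds $.iterator to this empty object
      input: JSON.stringify({})
    })
    .promise()
  return response.executionArn
}

// Usage with the real SDK (not executed here):
//   const AWS = require('aws-sdk')
//   startIteration(new AWS.StepFunctions(), 'your-state-machine-arn').then(console.log)
```

The input has to be a JSON object, even an empty one, because the Configure state writes its result under $.iterator.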
Conclusion
In conclusion, we can use this pattern to create loops inside a state machine. Anyway, AWS is always updating its services, so let’s hope that they create a Loop Task or something like that.
You can see another example of a loop iteration in the official documentation.