Invoking a Lambda Function from Amazon S3 Batch Operations

Amazon S3 batch operations can invoke AWS Lambda functions to perform custom actions on objects that are listed in a manifest. This section describes how to create a Lambda function to use with Amazon S3 batch operations and how to create a job to invoke the function. The Amazon S3 batch operations job uses the LambdaInvoke operation to run a Lambda function on each object listed in a manifest.

You can work with Amazon S3 batch operations for Lambda using the AWS Management Console, AWS Command Line Interface (AWS CLI), AWS SDKs, or REST APIs. For more information about using Lambda, see Getting Started with AWS Lambda in the AWS Lambda Developer Guide.

The following sections explain how you can get started using Amazon S3 batch operations with Lambda.

Using Lambda with Amazon S3 Batch Operations

When using Amazon S3 batch operations with AWS Lambda, you must create new Lambda functions specifically for use with Amazon S3 batch operations. You can't reuse existing Amazon S3 event-based functions with Amazon S3 batch operations. Event functions can only receive messages; they don't return messages. The Lambda functions that are used with Amazon S3 batch operations must accept and return messages. For more information about using Lambda with Amazon S3 events, see Using AWS Lambda with Amazon S3 in the AWS Lambda Developer Guide.

You create an Amazon S3 batch operations job that invokes your Lambda function. The job runs the same Lambda function on all of the objects listed in your manifest. You can control which version of your Lambda function is used while the objects in your manifest are processed. Amazon S3 batch operations support unqualified Amazon Resource Names (ARNs), aliases, and specific versions. For more information, see Introduction to AWS Lambda Versioning in the AWS Lambda Developer Guide.

If you provide the Amazon S3 batch operations job with a function ARN that uses an alias or the $LATEST qualifier, and you update the version that either of those points to, Amazon S3 batch operations starts calling the new version of your Lambda function. This can be useful when you want to update functionality part of the way through a large job. If you don't want Amazon S3 batch operations to change the version that is used, provide the specific version in the FunctionArn parameter when you create your job.
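
As a rough illustration of the difference between a floating and a pinned function ARN, the following sketch appends a qualifier to an unqualified ARN. The helper name, the account ID, the function name, and the alias are hypothetical and are not part of Amazon S3 batch operations.

```python
from typing import Optional


def qualify_function_arn(base_arn: str, qualifier: Optional[str] = None) -> str:
    """Return the ARN unchanged, or with ':<qualifier>' appended.

    The qualifier can be a published version number (pinned) or an
    alias name (floating: it follows whatever version the alias points to).
    """
    if not qualifier:
        return base_arn
    return f"{base_arn}:{qualifier}"


# Hypothetical unqualified ARN used only for illustration.
base = "arn:aws:lambda:us-west-2:111122223333:function:my-batch-function"

pinned = qualify_function_arn(base, "3")       # always version 3
floating = qualify_function_arn(base, "prod")  # follows the 'prod' alias
unqualified = qualify_function_arn(base)       # no qualifier appended
```

Passing the pinned form when you create the job keeps the job on one version even if the alias is later moved.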

Response and Result Codes

There are two levels of codes that Amazon S3 batch operations expect from Lambda functions. The first is the response code for the entire request, and the second is a per-task result code. The following table contains the response codes.


Response Code | Description
Succeeded | The task completed normally. If you requested a job completion report, the task's result string is included in the report.
TemporaryFailure | The task suffered a temporary failure and will be redriven before the job completes. The result string is ignored. If this is the final redrive, the error message is included in the final report.
PermanentFailure | The task suffered a permanent failure. If you requested a job-completion report, the task is marked as Failed and includes the error message string. Result strings from failed tasks are ignored.
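
One way to keep result codes consistent inside a function is to build every per-task result through a small helper. The following is a minimal sketch, not the service's own logic: the helper names are hypothetical, and treating TimeoutError as the only retryable error is an assumption chosen for illustration.

```python
VALID_RESULT_CODES = {"Succeeded", "TemporaryFailure", "PermanentFailure"}


def make_result(task_id, result_code, result_string=""):
    """Build one per-task result entry, rejecting unknown result codes."""
    if result_code not in VALID_RESULT_CODES:
        raise ValueError("Unknown result code: {}".format(result_code))
    return {
        "taskId": task_id,
        "resultCode": result_code,
        "resultString": result_string,
    }


def result_for_exception(task_id, exc, retryable=(TimeoutError,)):
    """Map an exception to a result entry.

    Assumption: only exception types listed in 'retryable' are worth a
    redrive; everything else fails the task permanently.
    """
    if isinstance(exc, retryable):
        code = "TemporaryFailure"
    else:
        code = "PermanentFailure"
    return make_result(task_id, code, "{}: {}".format(type(exc).__name__, exc))


ok = make_result("task-1", "Succeeded", "copied")
retry = result_for_exception("task-2", TimeoutError("request timed out"))
failed = result_for_exception("task-3", ValueError("bad key"))
```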

Creating a Lambda Function to Use with Amazon S3 Batch Operations

This section provides example AWS Identity and Access Management (IAM) permissions that you must use with your Lambda function. It also contains an example Lambda function to use with Amazon S3 batch operations. If you have never created a Lambda function before, see Tutorial: Using AWS Lambda with Amazon S3 in the AWS Lambda Developer Guide.

You must create Lambda functions specifically for use with Amazon S3 batch operations. You can't reuse existing Amazon S3 event-based Lambda functions. This is because Lambda functions that are used for Amazon S3 batch operations must accept and return special data fields.

Example IAM Permissions

The following are examples of the IAM permissions that are necessary to use a Lambda function with Amazon S3 batch operations.

Example: Amazon S3 batch operations trust policy
The following is an example of the trust policy that you can use for the batch operations IAM role. This IAM role is specified when you create the job and gives batch operations permission to assume the IAM role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "batchoperations.s3.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Example: Lambda IAM policy
The following is an example of an IAM policy that gives Amazon S3 batch operations permission to invoke the Lambda function and read the input manifest.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BatchOperationsLambdaPolicy",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:PutObject",
                "lambda:InvokeFunction"
            ],
            "Resource": "*"
        }
    ]
}

Example Request and Response

This section provides request and response examples for the Lambda function.

Example Request
The following is a JSON example of a request for the Lambda function.

{
    "invocationSchemaVersion": "1.0",
    "invocationId": "YXNkbGZqYWRmaiBhc2.mdxW9hZHNmZGpmaGFzbGtkaGZza2RmaAo",
    "job": {
        "id": "f3cc4f60-61f6-4a2b-8a21-d07600c373ce"
    },
    "tasks": [
        {
            "taskId": "dGFza2lkZ29lc2hlcmUK",
            "s3Key": "customerImage1.jpg",
            "s3VersionId": "1",
            "s3BucketArn": "arn:aws:s3:::awsexamplebucket"
        }
    ]
}

Example Response
The following is a JSON example of a response for the Lambda function.

{
    "invocationSchemaVersion": "1.0",
    "treatMissingKeysAs": "PermanentFailure",
    "invocationId": "YXNkbGZqYWRmaiBhc2.mdxW9hZHNmZGpmaGFzbGtkaGZza2RmaAo",
    "results": [
        {
            "taskId": "dGFza2lkZ29lc2hlcmUK",
            "resultCode": "Succeeded",
            "resultString": "[\"Mary Major\", \"John Stiles\"]"
        }
    ]
}
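
The relationship between the two messages can be sketched in a few lines of Python: the response echoes invocationSchemaVersion and invocationId from the request, and each result carries the taskId of the task it answers. The function name build_response and the sample values are hypothetical; the field names come from the examples above.

```python
def build_response(event, results, treat_missing_keys_as="PermanentFailure"):
    """Wrap per-task results in a response that echoes the request's IDs."""
    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": treat_missing_keys_as,
        "invocationId": event["invocationId"],
        "results": results,
    }


# Sample request shaped like the example request (values are placeholders).
sample_event = {
    "invocationSchemaVersion": "1.0",
    "invocationId": "example-invocation-id",
    "job": {"id": "f3cc4f60-61f6-4a2b-8a21-d07600c373ce"},
    "tasks": [
        {
            "taskId": "dGFza2lkZ29lc2hlcmUK",
            "s3Key": "customerImage1.jpg",
            "s3VersionId": "1",
            "s3BucketArn": "arn:aws:s3:::awsexamplebucket",
        }
    ],
}

# One result per task, keyed by the task's own taskId.
results = [
    {"taskId": task["taskId"], "resultCode": "Succeeded", "resultString": "ok"}
    for task in sample_event["tasks"]
]
response = build_response(sample_event, results)
```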

Example Lambda Function for Amazon S3 Batch Operations

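The following example Python Lambda function copies the object named in the task it receives to a destination bucket under a renamed key. Because the job invokes the function for the objects listed in the manifest, each of those objects is copied and renamed.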

As the example shows, keys from Amazon S3 batch operations are URL encoded. To use Amazon S3 with other AWS services, it's important that you URL decode the key that is passed from Amazon S3 batch operations.

import boto3
from urllib.parse import unquote_plus
from botocore.exceptions import ClientError


def lambda_handler(event, context):
    # Instantiate boto client
    s3Client = boto3.client('s3')

    # Parse job parameters from Amazon S3 batch operations
    jobId = event['job']['id']
    invocationId = event['invocationId']
    invocationSchemaVersion = event['invocationSchemaVersion']

    # Prepare results
    results = []

    # Parse Amazon S3 Key, Key Version, and Bucket ARN
    taskId = event['tasks'][0]['taskId']
    # Keys arrive URL encoded, so decode before passing them to Amazon S3
    s3Key = unquote_plus(event['tasks'][0]['s3Key'], encoding='utf-8')
    s3VersionId = event['tasks'][0]['s3VersionId']
    s3BucketArn = event['tasks'][0]['s3BucketArn']
    s3Bucket = s3BucketArn.split(':::')[-1]

    # Construct CopySource with VersionId
    copySrc = {'Bucket': s3Bucket, 'Key': s3Key}
    if s3VersionId is not None:
        copySrc['VersionId'] = s3VersionId

    # Copy object to new bucket with new key name
    try:
        # Prepare result code and string
        resultCode = None
        resultString = None

        # Construct New Key
        newKey = rename_key(s3Key)
        newBucket = 'destination-bucket-name'

        # Copy Object to New Bucket
        response = s3Client.copy_object(
            CopySource=copySrc,
            Bucket=newBucket,
            Key=newKey
        )

        # Mark as succeeded
        resultCode = 'Succeeded'
        resultString = str(response)
    except ClientError as e:
        # If the request timed out, mark as a temporary failure and
        # Amazon S3 batch operations will mark the task for retry. If
        # any other exceptions are received, mark as permanent failure.
        errorCode = e.response['Error']['Code']
        errorMessage = e.response['Error']['Message']
        if errorCode == 'RequestTimeout':
            resultCode = 'TemporaryFailure'
            resultString = 'Retry request to Amazon S3 due to timeout.'
        else:
            resultCode = 'PermanentFailure'
            resultString = '{}: {}'.format(errorCode, errorMessage)
    except Exception as e:
        # Catch all exceptions to permanently fail the task
        resultCode = 'PermanentFailure'
        resultString = 'Exception: {}'.format(e)
    finally:
        results.append({
            'taskId': taskId,
            'resultCode': resultCode,
            'resultString': resultString
        })

    return {
        'invocationSchemaVersion': invocationSchemaVersion,
        'treatMissingKeysAs': 'PermanentFailure',
        'invocationId': invocationId,
        'results': results
    }


def rename_key(s3Key):
    # Rename the key by adding additional suffix
    return s3Key + '_new_suffix'
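
The URL-decoding step can be looked at in isolation. The following is a small Python 3 sketch using the standard library's urllib.parse.unquote_plus; the helper names and the sample key are made up for illustration, and the bucket ARN follows the arn:aws:s3:::bucket-name form that the function above splits on.

```python
from urllib.parse import unquote_plus


def decode_task_key(task):
    """Return the task's object key with URL encoding removed."""
    return unquote_plus(task["s3Key"], encoding="utf-8")


def bucket_name_from_arn(bucket_arn):
    """Extract the bucket name from an ARN such as arn:aws:s3:::my-bucket."""
    return bucket_arn.split(":::")[-1]


# Hypothetical task entry; %20 is a space and %C3%A9 is a UTF-8 encoded 'é'.
task = {
    "s3Key": "photos/summer%20trip/caf%C3%A9.jpg",
    "s3BucketArn": "arn:aws:s3:::awsexamplebucket",
}

decoded_key = decode_task_key(task)
bucket = bucket_name_from_arn(task["s3BucketArn"])
```

Passing the still-encoded key to another service call would look up an object named with literal percent sequences, which usually does not exist.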

Creating an Amazon S3 Batch Operations Job That Invokes a Lambda Function

When creating an Amazon S3 batch operations job to invoke a Lambda function, you must provide the following:

  • The ARN of your Lambda function (which might include the function alias or a specific version number)
  • An IAM role with permission to invoke the function
  • The LambdaInvoke operation, which names the function to run

For more information about creating an Amazon S3 batch operations job, see Creating an Amazon S3 Batch Operations Job and Operations.

The following example creates an Amazon S3 batch operations job that invokes a Lambda function using the AWS CLI.

aws s3control create-job \
    --account-id <AccountID> \
    --operation '{"LambdaInvoke": { "FunctionArn": "arn:aws:lambda:Region:AccountID:function:LambdaFunctionName" } }' \
    --manifest '{"Spec":{"Format":"S3BatchOperations_CSV_20180820","Fields":["Bucket","Key"]},"Location":{"ObjectArn":"arn:aws:s3:::ManifestLocation","ETag":"ManifestETag"}}' \
    --report '{"Bucket":"arn:aws:s3:::awsexamplebucket","Format":"Report_CSV_20180820","Enabled":true,"Prefix":"ReportPrefix","ReportScope":"AllTasks"}' \
    --priority 2 \
    --role-arn arn:aws:iam::AccountID:role/BatchOperationsRole \
    --region Region \
    --description "Lambda Function"
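
For readers who use the AWS SDK for Python, the same job definition can be assembled as keyword arguments for the boto3 s3control client's create_job call. This is a sketch under assumptions: the helper name and every argument value are placeholders, the code only builds the arguments, and the actual call is shown in a comment because it needs credentials and real resources.

```python
def build_create_job_kwargs(account_id, function_arn, manifest_arn,
                            manifest_etag, report_bucket_arn, role_arn):
    """Mirror the CLI options above as create_job keyword arguments."""
    return {
        "AccountId": account_id,
        "Operation": {"LambdaInvoke": {"FunctionArn": function_arn}},
        "Manifest": {
            "Spec": {
                "Format": "S3BatchOperations_CSV_20180820",
                "Fields": ["Bucket", "Key"],
            },
            "Location": {"ObjectArn": manifest_arn, "ETag": manifest_etag},
        },
        "Report": {
            "Bucket": report_bucket_arn,
            "Format": "Report_CSV_20180820",
            "Enabled": True,
            "Prefix": "ReportPrefix",
            "ReportScope": "AllTasks",
        },
        "Priority": 2,
        "RoleArn": role_arn,
        "Description": "Lambda Function",
    }


# Placeholder values; replace them with your own resources.
job_kwargs = build_create_job_kwargs(
    account_id="111122223333",
    function_arn="arn:aws:lambda:us-west-2:111122223333:function:LambdaFunctionName",
    manifest_arn="arn:aws:s3:::awsexamplebucket/manifest.csv",
    manifest_etag="ManifestETag",
    report_bucket_arn="arn:aws:s3:::awsexamplebucket",
    role_arn="arn:aws:iam::111122223333:role/BatchOperationsRole",
)

# With boto3 installed and credentials configured, the job could then be
# created along these lines:
#   import boto3
#   boto3.client("s3control", region_name="us-west-2").create_job(**job_kwargs)
```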

Providing Task-Level Information in Lambda Manifests

When you use AWS Lambda functions with Amazon S3 batch operations, you might want additional data to accompany each task/key that is operated on. For example, you might want to have both a source object key and new object key provided. Your Lambda function could then copy the source key to a new S3 bucket under a new name. By default, Amazon S3 batch operations let you specify only the destination bucket and a list of source keys in the input manifest to your job. The following describes how you can include additional data in your manifest so that you can run more complex Lambda functions.

To specify per-key parameters in your Amazon S3 batch operations manifest to use in your Lambda function's code, use the following URL-encoded JSON format. The key field is passed to your Lambda function as if it were an Amazon S3 object key. But it can be interpreted by the Lambda function to contain other values or multiple keys, as shown following.

Note
The maximum number of characters for the key field in the manifest is 1,024.

Example: Manifest substituting the "Amazon S3 keys" with JSON strings
The following manifest is shown before URL encoding, for readability only. The URL-encoded version must be provided to Amazon S3 batch operations.

my-bucket,{"origKey": "object1key", "newKey": "newObject1Key"}
my-bucket,{"origKey": "object2key", "newKey": "newObject2Key"}
my-bucket,{"origKey": "object3key", "newKey": "newObject3Key"}

Example: Manifest URL-encoded
This URL-encoded version must be provided to Amazon S3 batch operations. The non-URL-encoded version does not work.

my-bucket,%7B%22origKey%22%3A%20%22object1key%22%2C%20%22newKey%22%3A%20%22newObject1Key%22%7D
my-bucket,%7B%22origKey%22%3A%20%22object2key%22%2C%20%22newKey%22%3A%20%22newObject2Key%22%7D
my-bucket,%7B%22origKey%22%3A%20%22object3key%22%2C%20%22newKey%22%3A%20%22newObject3Key%22%7D
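
The URL-encoded lines above can be generated rather than written by hand. The following sketch uses json.dumps and urllib.parse.quote from the Python standard library. The helper name manifest_line is hypothetical, and applying the 1,024-character limit to the encoded string (rather than the decoded one) is an assumption made to stay on the safe side.

```python
import json
from urllib.parse import quote

MAX_KEY_FIELD_CHARS = 1024  # limit for the manifest key field, per the note above


def manifest_line(bucket, params):
    """Return one 'bucket,url-encoded-json' manifest line."""
    # safe="" forces characters such as '/' to be percent-encoded too.
    encoded = quote(json.dumps(params), safe="")
    # Assumption: the limit is checked against the encoded form.
    if len(encoded) > MAX_KEY_FIELD_CHARS:
        raise ValueError("Encoded key field exceeds 1,024 characters")
    return "{},{}".format(bucket, encoded)


pairs = [
    {"origKey": "object1key", "newKey": "newObject1Key"},
    {"origKey": "object2key", "newKey": "newObject2Key"},
    {"origKey": "object3key", "newKey": "newObject3Key"},
]
lines = [manifest_line("my-bucket", p) for p in pairs]
manifest_text = "\n".join(lines)
```

Because json.dumps preserves the insertion order of dictionary keys, the generated lines match the manifest shown above.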

Example: Lambda function with manifest format writing results to the job report
This Lambda function shows how to parse JSON that is encoded into the Amazon S3 batch operations manifest.

import json
from urllib.parse import unquote_plus

# This example Lambda function shows how to parse JSON that is encoded into the Amazon S3 batch
# operations manifest containing lines like this:
#
# bucket,encoded-json
# bucket,encoded-json
# bucket,encoded-json
#
# For example, if we wanted to send the following JSON to this Lambda function:
#
# bucket,{"origKey": "object1key", "newKey": "newObject1Key"}
# bucket,{"origKey": "object2key", "newKey": "newObject2Key"}
# bucket,{"origKey": "object3key", "newKey": "newObject3Key"}
#
# We would simply URL-encode the JSON like this to create the real manifest to create a batch
# operations job with:
#
# my-bucket,%7B%22origKey%22%3A%20%22object1key%22%2C%20%22newKey%22%3A%20%22newObject1Key%22%7D
# my-bucket,%7B%22origKey%22%3A%20%22object2key%22%2C%20%22newKey%22%3A%20%22newObject2Key%22%7D
# my-bucket,%7B%22origKey%22%3A%20%22object3key%22%2C%20%22newKey%22%3A%20%22newObject3Key%22%7D
#
def lambda_handler(event, context):
    # Parse job parameters from Amazon S3 batch operations
    jobId = event['job']['id']
    invocationId = event['invocationId']
    invocationSchemaVersion = event['invocationSchemaVersion']

    # Prepare results
    results = []

    # S3 batch operations currently only passes a single task at a time in the array of tasks.
    task = event['tasks'][0]

    # Extract the task values we might want to use
    taskId = task['taskId']
    s3Key = task['s3Key']
    s3VersionId = task['s3VersionId']
    s3BucketArn = task['s3BucketArn']
    s3BucketName = s3BucketArn.split(':::')[-1]

    try:
        # Assume it will succeed for now
        resultCode = 'Succeeded'
        resultString = ''

        # Decode the JSON string that was encoded into the S3 Key value and convert the
        # resulting string into a JSON structure.
        s3Key_decoded = unquote_plus(s3Key)
        keyJson = json.loads(s3Key_decoded)

        # Extract some values from the JSON that we might want to operate on. In this example
        # we won't do anything except return the concatenated string as a fake result.
        newKey = keyJson['newKey']
        origKey = keyJson['origKey']
        resultString = origKey + " --> " + newKey
    except Exception as e:
        # If we run into any exceptions, fail this task so batch operations does not retry it and
        # return the exception string so we can see the failure message in the final report
        # created by batch operations.
        resultCode = 'PermanentFailure'
        resultString = 'Exception: {}'.format(e)
    finally:
        # Send back the results for this task.
        results.append({
            'taskId': taskId,
            'resultCode': resultCode,
            'resultString': resultString
        })

    return {
        'invocationSchemaVersion': invocationSchemaVersion,
        'treatMissingKeysAs': 'PermanentFailure',
        'invocationId': invocationId,
        'results': results
    }