I have set up an SQS queue to which S3 object paths are pushed whenever a file is uploaded.
I'll receive tens of small CSV files, and I want to hold their messages in the SQS queue and trigger the Lambda only once, after all the files have arrived within a specific window, say 5 minutes.
Here is my CloudFormation template:
```yaml
LambdaFunctionEventSourceMapping:
  Type: AWS::Lambda::EventSourceMapping
  Properties:
    BatchSize: 5000
    MaximumBatchingWindowInSeconds: 300
    Enabled: true
    EventSourceArn: !GetAtt EventQueue.Arn
    FunctionName: !GetAtt QueueConsumerLambdaFunction.Arn

EventQueue:
  Type: AWS::SQS::Queue
  Properties:
    QueueName: Event-Queue
    DelaySeconds: 10
    VisibilityTimeout: 125
    ReceiveMessageWaitTimeSeconds: 10

QueueConsumerLambdaFunction:
  Type: AWS::Lambda::Function
  Properties:
    FunctionName: queue-consumer
    Runtime: python3.7
    Code: ./queue-consumer
    Handler: main.lambda_handler
    Role: !GetAtt QueueConsumerLambdaExecutionRole.Arn
    Timeout: 120
    MemorySize: 512
    ReservedConcurrentExecutions: 1
```
The deployment works fine, but if I push 3 files to the S3 bucket, SQS triggers 3 separate Lambda invocations, which I don't want. I need a single Lambda invocation to receive all the messages that the S3 events put on the queue and process them together. Is there something wrong with my SQS configuration?
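For reference, the consumer itself just iterates over whatever batch it receives; this is a simplified sketch of `main.lambda_handler` (the actual CSV processing is omitted):

```python
def lambda_handler(event, context):
    # Lambda delivers one batch of SQS messages per invocation in event["Records"]
    for record in event["Records"]:
        s3_path = record["body"]  # assumption: the message body is the S3 path pushed on upload
        print("processing", s3_path)
    return {"processed": len(event["Records"])}
```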
> and trigger the lambda only once when all the files have arrived

No, this is not how SQS and Lambda work. Yes, you may receive messages in a batch, but you have no control over which batch you get or which messages end up in it. If you need that kind of grouping, you can use DynamoDB to track the arriving files and trigger a Lambda on specific updates. (You could also use Kinesis Data Analytics, but IMHO that would be overkill.)
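Here is a minimal sketch of that DynamoDB idea, assuming a hypothetical `file-batches` table keyed on `batch_id` and a known number of expected files per batch (both assumptions, not part of your setup): each per-file invocation atomically increments a counter, and only the invocation that records the final file kicks off the actual processing.

```python
import boto3

EXPECTED_FILES = 10  # assumption: you know how many files make up one batch
table = boto3.resource("dynamodb").Table("file-batches")  # hypothetical table

def lambda_handler(event, context):
    for record in event["Records"]:
        # In practice, derive the batch key from the S3 path or arrival date;
        # a fixed placeholder is used here for illustration.
        batch_id = "example-batch"
        # Atomically count this file against its batch; ADD creates the item
        # on first use and increments the counter afterwards.
        resp = table.update_item(
            Key={"batch_id": batch_id},
            UpdateExpression="ADD files_seen :one",
            ExpressionAttributeValues={":one": 1},
            ReturnValues="UPDATED_NEW",
        )
        if resp["Attributes"]["files_seen"] == EXPECTED_FILES:
            # Only the invocation that records the final file starts the
            # actual processing, so it runs exactly once per batch.
            start_processing(batch_id)

def start_processing(batch_id):
    # Placeholder for kicking off the real work, e.g. invoking the
    # processing Lambda or publishing an event.
    print("all files arrived for", batch_id)
```

Alternatively, attach a DynamoDB Stream to the counting table so that the update which completes the batch itself triggers the processing Lambda; that matches the "trigger a lambda on specific updates" idea more directly.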