
I have set up an SQS queue to which S3 object paths are pushed whenever a file is uploaded.

So I have a setup where I'll receive tens of small CSV files, and I want to hold them in the SQS queue and trigger the Lambda only once, after all the files have arrived within a specific time window, let's say 5 minutes.

Here is my CloudFormation code:

  LambdaFunctionEventSourceMapping:
    Type: AWS::Lambda::EventSourceMapping
    Properties:
      BatchSize: 5000                        # up to 5000 messages per invocation
      MaximumBatchingWindowInSeconds: 300    # buffer messages for up to 5 minutes
      Enabled: true
      EventSourceArn: !GetAtt EventQueue.Arn
      FunctionName: !GetAtt QueueConsumerLambdaFunction.Arn

  EventQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: Event-Queue
      DelaySeconds: 10
      VisibilityTimeout: 125
      ReceiveMessageWaitTimeSeconds: 10

  QueueConsumerLambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: queue-consumer
      Runtime: python3.7
      Code: ./queue-consumer
      Handler: main.lambda_handler
      Role: !GetAtt QueueConsumerLambdaExecutionRole.Arn
      Timeout: 120
      MemorySize: 512
      ReservedConcurrentExecutions: 1

The deployment works fine, but if I push 3 files to the S3 bucket, SQS triggers 3 different Lambda invocations asynchronously, which I don't want. I need one Lambda invocation to receive all the messages in the queue resulting from the S3 events and process them together. Is there something wrong in my SQS configuration?

  • Have you set lambda concurrency to 1? Commented Jul 27, 2021 at 10:32
  • @Marcin Yes I did. Actually SQS is behaving inconsistently: sometimes it triggers 3 Lambda functions with one message each, and sometimes it triggers 2 Lambda functions, one receiving 2 messages and the other receiving 1. Commented Jul 27, 2021 at 10:50
  • "and trigger the lambda only once when all the files have arrived": no, this is not how SQS and Lambda work. Yes, you may receive messages in a batch, but you have no control over which batch contains which messages. If you need that kind of grouping, you could use DynamoDB to track the files and trigger a Lambda on specific updates. (You could also use Kinesis Analytics, but IMHO that would be overkill.) Commented Jul 27, 2021 at 11:27
  • How did it go? Still unclear what is happening? Commented Jul 28, 2021 at 9:23
  • Thanks for the help and answer. I am actually following both your and @gusto2's recommendation to use DynamoDB and scheduled events, which is simpler to implement. I use S3 notifications to trigger a Lambda that writes records to DynamoDB, and then every hour a scheduled event triggers another Lambda that processes all my files and truncates the table afterwards, so that new notifications don't get duplicated or overlap; a sketch of this pattern follows below. Commented Jul 28, 2021 at 9:40
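For reference, here is a minimal sketch of that DynamoDB pattern in Python with boto3. The table name FileRegistry, its s3_path key, and the process_files helper are illustrative assumptions, not part of the original setup:

    import boto3

    dynamodb = boto3.resource("dynamodb")
    # "FileRegistry" is a hypothetical table with "s3_path" as its partition key.
    table = dynamodb.Table("FileRegistry")

    def register_handler(event, context):
        """Triggered directly by S3 notifications: record each uploaded file."""
        for record in event["Records"]:
            s3_path = "s3://{}/{}".format(
                record["s3"]["bucket"]["name"], record["s3"]["object"]["key"]
            )
            table.put_item(Item={"s3_path": s3_path})

    def process_handler(event, context):
        """Triggered by an hourly schedule: process all recorded files, then truncate."""
        paths = [item["s3_path"] for item in table.scan()["Items"]]
        process_files(paths)  # hypothetical stand-in for the actual CSV processing
        # Delete the processed records so the next scheduled run only sees new uploads.
        with table.batch_writer() as batch:
            for path in paths:
                batch.delete_item(Key={"s3_path": path})

(For tens of files a single scan() page is enough; a larger table would need pagination.)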

1 Answer


What you are observing is most likely caused by the parallel pollers that the Lambda service uses to read from your SQS queue. Lambda starts with five of these pollers. They are separate from your reserved-concurrency setting, and you have no control over them.

So each poller picks up some of the messages from the queue, and your function is then invoked with those messages in turn. Unfortunately you can't change this behaviour, as this is simply how the SQS-to-Lambda event source mapping works on the AWS side.
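One practical consequence is that the handler should always be written to accept a batch of whatever size the pollers happened to assemble. A minimal sketch, assuming the queue messages are standard S3 event notifications:

    import json

    def lambda_handler(event, context):
        # A single invocation may carry anywhere from 1 message up to BatchSize;
        # which messages arrive together is decided by the pollers, not by you.
        s3_paths = []
        for sqs_record in event["Records"]:
            body = json.loads(sqs_record["body"])  # the embedded S3 notification
            for s3_record in body.get("Records", []):
                s3_paths.append("s3://{}/{}".format(
                    s3_record["s3"]["bucket"]["name"],
                    s3_record["s3"]["object"]["key"],
                ))
        print(f"Processing {len(s3_paths)} file(s) in this invocation")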


2 Comments

Then I think this solution might not work for me. Can you suggest a better approach for my use case, i.e. holding S3 events for a specific time and then triggering a Lambda to process all those S3 files at once?
@muazfaiz You could maybe still use Lambda, but don't set up the automatic event source mapping between SQS and Lambda. Instead, have a Lambda run on a schedule, every 5 minutes, and poll the queue yourself inside the function, along the lines of the sketch below.
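A minimal sketch of that schedule-and-poll approach, assuming the queue URL is supplied via configuration and process_messages is a hypothetical stand-in for the real work:

    import boto3

    sqs = boto3.client("sqs")
    # Assumption: in practice pass the queue URL in via an environment variable.
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/Event-Queue"

    def lambda_handler(event, context):
        """Runs on a 5-minute schedule and drains the queue itself."""
        messages = []
        while True:
            resp = sqs.receive_message(
                QueueUrl=QUEUE_URL,
                MaxNumberOfMessages=10,  # SQS hard limit per receive call
                WaitTimeSeconds=1,
            )
            batch = resp.get("Messages", [])
            if not batch:
                break  # nothing left that is currently visible
            messages.extend(batch)

        process_messages(messages)  # hypothetical: parse bodies, handle the files

        # Delete only after successful processing, within the visibility timeout.
        for msg in messages:
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])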
