
I want to build a product that can perform some Internet scans (in Python) to collect various kinds of data.

I want to design it around tasks that perform these collection jobs.

There can be multiple scans running in parallel on different inputs, so the same task may be instantiated several times, each instance operating on its own inputs.

I wonder which architecture would fit, and which technologies are best suited.

I thought of using RabbitMQ to queue the tasks and Redis to store the inputs.

The initial inputs trigger the scan; each task then emits output that may serve as input for other tasks.

What do you think of this possible design? Can it be improved? Other technologies?

  • During system design you should try to avoid committing to specific technologies. Instead, rely on abstract components (message queue, distributed cache, etc.). Drawing diagrams to depict the data/communication flow would also help you (and SO readers) better understand your problem domain and your proposed solution. Commented Feb 2, 2023 at 11:58

2 Answers


It depends on the size of the inputs. If they are relatively small, I would go with just a message broker and send everything in the message (i.e. the task type and its inputs); otherwise an external store is the better choice. Depending on the durability requirements, persistent storage (such as a database) may also be worth considering.
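For the small-inputs case, here is a minimal sketch of publishing such a self-contained task message to RabbitMQ with the pika client; the queue name, task type, and payload fields are illustrative assumptions, not a prescribed schema:

```python
import json
import pika

# Connect to a local RabbitMQ broker (default settings assumed).
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="scan_tasks", durable=True)

# Small inputs travel inside the message itself: task type plus its inputs.
task = {"type": "port_scan", "inputs": {"target": "198.51.100.7", "ports": [80, 443]}}
channel.basic_publish(
    exchange="",
    routing_key="scan_tasks",
    body=json.dumps(task),
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)
connection.close()
```

For larger inputs the message would instead carry only a reference (e.g. a Redis key or a database row id), and the worker would fetch the actual payload from that store.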




One option is to use an existing orchestrator that hides most of this complexity instead of crafting a custom solution based on queues and storage. Look at the temporal.io open-source project, which lets you orchestrate tasks in a high-level programming language.
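As a rough illustration of what that can look like with Temporal's Python SDK (temporalio); the workflow, activity, and names below are assumptions made for the sketch, not part of the answer:

```python
from datetime import timedelta
from temporalio import activity, workflow


@activity.defn
async def collect_data(target: str) -> list[str]:
    # One scanning/collection step; its results can feed further activities.
    return [f"{target}:record"]


@workflow.defn
class ScanWorkflow:
    @workflow.run
    async def run(self, target: str) -> list[str]:
        # Temporal handles queuing, retries, and state for this call.
        return await workflow.execute_activity(
            collect_data,
            target,
            start_to_close_timeout=timedelta(minutes=5),
        )
```

A Temporal Worker and Client (not shown) execute and start the workflow; the orchestrator then takes care of queuing, retries, and persisting intermediate state, which is exactly the plumbing a custom RabbitMQ/Redis design would have to implement by hand.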

