
I currently have a scenario where I need to extract content from a number of web pages, process the content, then insert it into MongoDB. I was pondering what would be the best way to do this. It's got me a bit confused, as there are a bunch of ways to approach it.

For instance, I could do something like this:

jobs = group(visit.s(i) | process.s() | save.s() for i in payload)
res = jobs()

As far as I understand, this would visit each page, then process it, then save it, all done async (maybe even at the same time?). This way it's fully async with really no waiting going on, but each item is individually inserted into MongoDB.
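
For context, here's a minimal sketch of what the tasks might look like; the requests-based fetch, the Mongo connection details, and the processing step are all placeholder assumptions, not my actual code:

import requests
from celery import Celery
from pymongo import MongoClient

# Broker/backend URLs are placeholders; the result backend matters
# later for the chord variant.
app = Celery("scraper", broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/0")

# Placeholder MongoDB connection details.
mongo = MongoClient("mongodb://localhost:27017")
collection = mongo["scraping"]["pages"]

@app.task
def visit(url):
    # Fetch the raw page content.
    return requests.get(url, timeout=10).text

@app.task
def process(html):
    # Stand-in processing step; extract whatever fields you need.
    return {"length": len(html)}

@app.task
def save(doc):
    # Per-item write: one insert per processed page.
    collection.insert_one(doc)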

On the other hand, I could use a callback:

jobs = group(visit.s(i) | process.s() for i in payload)
res = chord(jobs, save.s())()

This would do the async fetching and processing, but then wait for all of the tasks to complete before asynchronously saving the results, which leads to a single bulk insert into MongoDB.
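
In the chord variant, save receives the list of all processed results at once rather than one document at a time, so it could do a single bulk write. A sketch, reusing the hypothetical collection from above (note that chords need a result backend configured so Celery can collect the group's results):

@app.task
def save(docs):
    # As a chord callback, save() receives the list of results from
    # the whole group, so one insert_many() covers the batch.
    if docs:
        collection.insert_many(docs)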

So I was wondering if anyone has had a similar experience, and which approach is better in general. Should I just go fully async, or keep a synchronization point in there?

Also, as a side question: any comments on using chunking instead of single-page tasks would be great.
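
For reference, Celery's built-in chunking turns the per-page tasks into a smaller number of batch tasks; a sketch, with the chunk size of 10 being an arbitrary assumption:

# Each element of the iterable must be a tuple of task arguments,
# hence zip(). This enqueues len(payload) / 10 tasks, each of which
# runs visit() sequentially on 10 URLs.
res = visit.chunks(zip(payload), 10)()

Fewer, larger tasks mean less broker overhead, at the cost of coarser parallelism, and a failed chunk retries all of its pages.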

Thanks

1 Answer

Related question, which also contains an answer. It's not directly related, but the gist of it is that a bulk insert is still just a single insert in a loop, and that Mongo can handle a large number of connections, so it seems it doesn't matter and going fully async should work fine.
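
To make that concrete, a sketch of the two write paths using the hypothetical collection from the question's setup; per the linked answer, Mongo copes with either:

# Per-item writes, issued concurrently by many workers:
collection.insert_one(doc)

# One bulk write from the chord callback:
collection.insert_many(docs)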

