
I'm running Celery with Django and import a stream of objects into my database using tasks; each task imports one object, and concurrency is 2. The stream can contain duplicate objects, but they must not be duplicated inside my database.

The code I'm running:

# qs is a queryset filtered to the incoming object's identifying fields
if qs.exists() and qs.count() == 1:
    return qs.get()
elif qs.exists():
    # logger.exception() only attaches a traceback inside an except
    # block; logger.error() is the right call here
    logger.error('Multiple venues for same place')
    raise ValueError('Multiple venues for same place')
else:
    obj = self.create(**defaults)
    return obj

The problem is that when duplicate objects arrive in the stream very close together, the app still imports the same object twice.

I assume the database checks are not atomic under this concurrency setup. What architecture do you recommend to resolve this issue?

1 Answer

You have to use a locking architecture that blocks the two tasks from executing the object-fetching part at the same time; you can use python-redis-lock to do that.
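A minimal sketch of the pattern: serialize the whole check-then-create section under one mutex, so a second task cannot slip in between the existence check and the insert. The helper below uses `threading.Lock` so it runs standalone; with Celery workers in separate processes you would pass `redis_lock.Lock(redis_conn, key)` from python-redis-lock instead, which has the same context-manager interface. The names `locked_get_or_create` and the dict-backed `store` are illustrative, not from the original code.

```python
import threading

def locked_get_or_create(lock, store, key, create):
    """Fetch-or-create `key` in `store` while holding `lock`.

    Serializes the check-then-create critical section so two
    concurrent tasks cannot both miss the lookup and insert a
    duplicate. In production, `lock` would be a cross-process
    lock such as redis_lock.Lock(redis_conn, key); the dict
    `store` stands in for the database table.
    """
    with lock:
        if key in store:
            return store[key]     # another task already created it
        obj = create()            # safe: no one else is in this section
        store[key] = obj
        return obj
```

The key design point is that the lock must cover both the lookup and the insert; locking only the insert still allows two tasks to pass the existence check simultaneously.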


1 Comment

Wow, great advice, will go for it! FYI, my interim solution was to put a unique constraint on two columns to at least ensure the database stays consistent. The raised error kills the Celery task, so for now it's more of a brute-force attempt.
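The interim approach above can be made less brute-force by catching the constraint violation and falling back to fetching the existing row instead of letting the error kill the task. A runnable sketch using stdlib `sqlite3` to stand in for the Django model (the column names `place_id` and `source` are assumed for illustration; the comment doesn't say which two columns carry the constraint):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two-column unique constraint, as in the interim solution
conn.execute(
    "CREATE TABLE venue (place_id TEXT, source TEXT, name TEXT,"
    " UNIQUE (place_id, source))"
)

def import_venue(conn, place_id, source, name):
    # The unique constraint is the last line of defense: a duplicate
    # insert raises IntegrityError instead of creating a second row.
    # Catch it and fetch the winner's row rather than failing the task.
    try:
        with conn:  # transaction: commits on success, rolls back on error
            conn.execute(
                "INSERT INTO venue VALUES (?, ?, ?)",
                (place_id, source, name),
            )
    except sqlite3.IntegrityError:
        pass  # another task inserted it first; fall through to the fetch
    return conn.execute(
        "SELECT place_id, source, name FROM venue"
        " WHERE place_id = ? AND source = ?",
        (place_id, source),
    ).fetchone()
```

In Django the equivalent would be catching `django.db.IntegrityError` around `self.create(**defaults)` and retrying the `qs.get()`; the constraint then guarantees consistency even if the lock ever fails.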
