
I have a Celery periodic task, but the task runs only at the moment I start the Celery worker or Celery beat. I have configured the task to run every 20 minutes in Django's settings.py, but when I check after 20 minutes the Celery worker hasn't received any task.

Celery beat console output:

celery beat v4.4.4 (cliffs) is starting.
__    -    ... __   -        _
LocalTime -> 2020-07-16 12:10:18
Configuration ->
    . broker -> amqp://guest:**@localhost:5672//
    . loader -> celery.loaders.app.AppLoader
    . scheduler -> celery.beat.PersistentScheduler
    . db -> celerybeat-schedule
    . logfile -> [stderr]@%INFO
    . maxinterval -> 5.00 minutes (300s)
[2020-07-16 12:10:18,835: INFO/MainProcess] beat: Starting...

When I stop beat and restart it, the Celery worker receives the task immediately and executes it.

I want the Celery worker to receive and execute the task periodically, every 20 minutes. How can I do this?

I run the Celery worker and Celery beat in two separate consoles, with these commands:

celery -A myproj worker -l info
celery -A myproj beat -l info --pidfile=

tasks.py

from urllib.parse import urlparse

from celery import Celery
from django.db.models import Q
from django.utils import timezone
from scrapyd_api import ScrapydAPI  # assuming the python-scrapyd-api client is in use

from .models import Task

app = Celery('myproj')
scrapyd = ScrapydAPI('http://localhost:6800')  # assumed scrapyd endpoint


@app.task
def schedule_task():
    # Pick up tasks that are pending (0) or running (1)
    running_tasks = Task.objects.filter(Q(status=0) | Q(status=1))
    print(running_tasks)
    for task in running_tasks:
        unique_id = task.unique_id
        keywords = task.keywords.all()
        if task.scraping_end_date > timezone.now().date():
            settings = {
                'spider_count': len(task.targets.all()),
                'keywords': keywords,
                'scraping_end': task.scraping_end_date,
                'unique_id': unique_id,  # unique ID for each record for DB
                'USER_AGENT': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
            }

            for site_url in task.targets.all():
                domain = urlparse(site_url.domain).netloc
                spider_name = domain.replace('.com', '')
                # Bind the result to a new name: reassigning `task` here would
                # shadow the outer loop variable
                job_id = scrapyd.schedule('default', spider_name, settings=settings,
                                          url=site_url.domain, domain=domain, keywords=keywords)
        else:
            task.status = 2  # mark as finished
            task.save()
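
For context, the standard Django/Celery integration defines the app in a dedicated myproj/celery.py so that both the worker and beat load the Django settings and discover tasks. A minimal sketch, with module paths assumed from the project name above; if the config_from_object call is missing, beat never sees a schedule defined in settings.py:

# myproj/celery.py -- minimal sketch of the standard Django integration
import os

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproj.settings')

app = Celery('myproj')
# Read all CELERY_-prefixed keys from Django settings (including the beat schedule)
app.config_from_object('django.conf:settings', namespace='CELERY')
# Find tasks.py modules in every installed Django app
app.autodiscover_tasks()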

settings.py

from celery.schedules import crontab

CELERY_BROKER_URL = 'amqp://localhost'
CELERY_RESULT_BACKEND = 'redis://localhost:6379'
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TASK_SERIALIZER = 'json'
CELERYBEAT_SCHEDULE = {
    'crawl_sites': {
        'task': 'crawler.tasks.schedule_task',
        'schedule': crontab(minute='*/20'),
    },
}
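
One thing worth double-checking: if the Celery app is configured with app.config_from_object('django.conf:settings', namespace='CELERY') (the usual Django pattern), Celery 4.x only reads settings whose names start with CELERY_. The old CELERYBEAT_SCHEDULE spelling does not match that prefix and is silently ignored, leaving beat with an empty schedule. A sketch of the new-style key:

from celery.schedules import crontab

# Matches the CELERY_ namespace, unlike the legacy CELERYBEAT_SCHEDULE
CELERY_BEAT_SCHEDULE = {
    'crawl_sites': {
        'task': 'crawler.tasks.schedule_task',
        'schedule': crontab(minute='*/20'),
    },
}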

2 Comments
  • How do you run it? Pass the -B parameter to celery beat. Commented Jul 21, 2020 at 17:02
  • Start the worker and beat in one call, set the log level to debug, and since you use Django, try: celery -A myproj worker --beat --loglevel=debug --scheduler django_celery_beat.schedulers:DatabaseScheduler Commented Jul 22, 2020 at 20:08

1 Answer


Replace the line

'schedule': crontab(minute=20),

with

'schedule': crontab(minute='*/20'),

Then restart celery beat.
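
For what it's worth, the two crontab forms mean different things, which can explain an "only runs once an hour" symptom:

from celery.schedules import crontab

crontab(minute=20)      # fires once per hour, at HH:20
crontab(minute='*/20')  # fires every 20 minutes: HH:00, HH:20, HH:40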


2 Comments

Didn't work. Is there anything else that might be missing?
I need to restart celery beat every time, and only then does the celery worker receive and execute the task. Why is this happening?
