Celery eats up RAM.
We're using Celery in a Django REST Framework project with Redis as the broker. Celery is used to send callbacks and to retry sending them when they fail (the retry policy used an exponentially growing delay between attempts; it has already been removed).
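For context, this is roughly what the removed retry policy looked like (a sketch from memory; the exact backoff numbers are approximate):

from celery import shared_task
from requests.exceptions import RequestException

# Old version (already removed): retry automatically on request errors,
# with an exponentially growing delay between attempts.
@shared_task(
    bind=True,
    autoretry_for=(RequestException,),
    retry_backoff=True,      # exponential backoff between retries
    retry_backoff_max=600,   # approximate cap, from memory
    max_retries=10,          # approximate
)
def send_callback_task(self, url: str, data):
    ...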
Roughly every 1 minute 40 seconds, RAM usage grows by about 48 MB while the logs are spammed with entries like this for a few seconds:
celery_worker-1 | [2024-06-06 19:17:32,442: INFO/MainProcess] Task core.requests.celery_tasks.send_callback_task[6135c471-7f7e-471c-9f05-ba126418c002] received
celery_worker-1 | [2024-06-06 19:17:32,444: INFO/MainProcess] Task core.requests.celery_tasks.send_callback_task[6135c471-7f7e-471c-9f05-ba126418c002] received
celery_worker-1 | [2024-06-06 19:17:32,445: INFO/MainProcess] Task core.requests.celery_tasks.send_callback_task[6135c471-7f7e-471c-9f05-ba126418c002] received
... 14 more times
celery_worker-1 | [2024-06-06 19:17:32,468: INFO/MainProcess] Task core.requests.celery_tasks.send_callback_task[f4aaa14f-8bda-442a-8856-af40f1d68e6d] received
celery_worker-1 | [2024-06-06 19:17:32,469: INFO/MainProcess] Task core.requests.celery_tasks.send_callback_task[f4aaa14f-8bda-442a-8856-af40f1d68e6d] received
celery_worker-1 | [2024-06-06 19:17:32,471: INFO/MainProcess] Task core.requests.celery_tasks.send_callback_task[f4aaa14f-8bda-442a-8856-af40f1d68e6d] received
celery_worker-1 | [2024-06-06 19:17:32,472: INFO/MainProcess] Task core.requests.celery_tasks.send_callback_task[f4aaa14f-8bda-442a-8856-af40f1d68e6d] received
... 55 more times, and many more with different task IDs
After some time we even get this in the logs:
celery_worker-1 | [2024-06-07 17:32:56,200: INFO/MainProcess] Task core.requests.celery_tasks.send_callback_task[bb4b48c5-92d1-4226-aadf-bfdc387d4baf] received
celery_worker-1 | [2024-06-07 17:32:56,200: WARNING/MainProcess] QoS: Disabled: prefetch_count exceeds 65535
Newly submitted tasks still get executed instantly, with logs like this (and no warnings about prefetch_count):
celery_worker-1 | [2024-06-07 16:31:28,595: INFO/MainProcess] Task core.requests.celery_tasks.send_callback_task[7cdda9a2-ad10-434e-82c2-5232e64dc3b1] received
celery_worker-1 | [2024-06-07 16:31:28,736: INFO/ForkPoolWorker-72] core.requests.celery_tasks.send_callback_task[7cdda9a2-ad10-434e-82c2-5232e64dc3b1]: {'msg': 'Callback sent!'}
So I'm guessing the tasks from the spammed "received" lines are never executed, because no "sent" or "failed" messages with those task IDs appear in the logs.
After a server reboot, the logs show this (the 22 is on a test server; in production there may be many more):
[2024-06-07 14:36:56,163: WARNING/MainProcess] Restoring 22 unacknowledged message(s)
[2024-06-07 14:41:37,392: INFO/MainProcess] Task core.requests.celery_tasks.send_callback_task[6b56c3b0-0828-46d6-86aa-34d0747ac30b] received
[2024-06-07 14:41:37,392: DEBUG/MainProcess] basic.qos: prefetch_count->9
[2024-06-07 14:41:37,394: INFO/MainProcess] Task core.requests.celery_tasks.send_callback_task[6b56c3b0-0828-46d6-86aa-34d0747ac30b] received
[2024-06-07 14:41:37,394: DEBUG/MainProcess] basic.qos: prefetch_count->10
...
In an attempt to resolve this, we've tried:
- removing retries for send_callback_task
- adding a timeout for the POST request
Code:
import logging

import requests
from celery import shared_task
from requests.exceptions import RequestException

logger = logging.getLogger(__name__)


@shared_task(bind=True)
def send_callback_task(self, url: str, data):
    # POST the callback payload; the timeout was added while debugging this issue
    response = requests.post(url, json=data, timeout=5)
    log_msg = {
        "msg": "Callback sent!",
    }
    logger.info(msg=log_msg)
    if response.status_code not in (200, 201, 202):
        log_msg = {
            "msg": "Callback failed!",
        }
        logger.info(msg=log_msg)
        raise RequestException
New tasks are triggered using apply_async.
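The call site looks roughly like this (callback_url and payload are illustrative names, not the actual variables):

send_callback_task.apply_async(args=(callback_url, payload))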
We run the Celery worker like this:
celery -A project worker -l info
16-core machine.
CELERY_WORKER_MAX_TASKS_PER_CHILD = 100
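For completeness, the Celery app follows the standard Django integration, roughly like this (a sketch; the module names just match the -A project flag above):

import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "project.settings")

app = Celery("project")
# CELERY_-prefixed Django settings (including CELERY_WORKER_MAX_TASKS_PER_CHILD) are picked up here
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()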
I am willing to provide more information if needed.