I have a Scrapy project and I want to run my spider every day, so I use Celery to do that. This is my tasks.py file:
from celery import Celery, shared_task
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy_project.scrapy_project.spiders import myspider

app = Celery('tasks', broker='redis://localhost:6379/0')

@shared_task
def scrape_news_website():
    print('SCRAPING RIGHT NOW!')
    settings = get_project_settings()
    process = CrawlerProcess(settings)
    process.crawl(myspider)
    process.start(stop_after_crawl=False)
I've set stop_after_crawl=False because when it is True, I get this error after the first scrape:
raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
Now, with stop_after_crawl set to False, another problem shows up: after four scrapes (four because the worker concurrency is four), the Celery worker stops executing tasks, because the previous crawl processes are still running and no worker child process is free. I don't know how to fix this. I would appreciate your help.
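For context, a common workaround I've seen suggested (not yet tried here) is to run each crawl in a throwaway child process, so the Twisted reactor starts and stops with that process instead of living inside the long-lived Celery worker. This is only a sketch under that assumption, reusing myspider from my project; it assumes a POSIX system where the fork start method is available:

```python
from multiprocessing import get_context

def run_in_child(target, *args):
    """Run target(*args) in a fresh child process and wait for it to exit.

    Using the 'fork' start method (POSIX only) so the child does not
    re-import the parent module.
    """
    ctx = get_context("fork")
    p = ctx.Process(target=target, args=args)
    p.start()
    p.join()
    return p.exitcode

def crawl():
    # Imports happen inside the child, so the parent process (the Celery
    # worker) never touches the Twisted reactor at all.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings
    from scrapy_project.scrapy_project.spiders import myspider

    process = CrawlerProcess(get_project_settings())
    process.crawl(myspider)
    # stop_after_crawl=True (the default) is fine here: the reactor
    # dies together with the child process, so it never needs restarting.
    process.start()
```

The task body would then shrink to a single `run_in_child(crawl)` call, and each scheduled run would get its own fresh reactor. I'm not sure whether this plays well with Celery's own process pool, which is part of my question.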