I'm writing a Python program that needs data from the internet, so I wrote some Scrapy spiders that visit several pages and scrape the data. Afterwards they store the data in an Excel file, which serves as a kind of database. For that I wrote my own class that handles the data inside the Excel file the way I need it. That part works. Now to my question:
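Just to give an idea of that part, here is a simplified sketch of what the class does (not my real code; it assumes openpyxl and a single worksheet):

from openpyxl import Workbook, load_workbook

class ExcelStore:
    # keeps scraped items in an Excel file, one item per row
    def __init__(self, path):
        self.path = path
        try:
            self.workbook = load_workbook(path)
        except FileNotFoundError:
            self.workbook = Workbook()
        self.sheet = self.workbook.active

    def append_row(self, values):
        self.sheet.append(list(values))

    def save(self):
        self.workbook.save(self.path)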
Now I want to start the spiders from another Python script. I found some code that lets me do that, but I also need to import all the settings from the Scrapy project, along with the pipelines, items, etc. I can't use get_project_settings() because the script is in another directory (the Scrapy project folder is in the same directory as the script I want to start the spiders from). That's what I have so far:
from scrapy.crawler import CrawlerProcess
from desktop.Project.bots.question.spider import spider_test

# here I somehow need the settings from the spiders' project
process = CrawlerProcess(settings={})
process.crawl(spider_test)
process.start()
The spider runs, but without my settings. It works completely fine when I put the script in the same project folder as my settings and use the following code:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from desktop.question.spider import spider_test

process = CrawlerProcess(get_project_settings())
process.crawl(spider_test)
process.start()
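As far as I understand, get_project_settings() only works there because Scrapy can find the project (via a scrapy.cfg it finds from the working directory, or the SCRAPY_SETTINGS_MODULE environment variable). One thing I considered is faking that from the outside script, roughly like this, but it feels hacky and the paths and the module name are just guesses for my layout:

import os
import sys

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# make the package that contains settings.py importable (path is a guess)
sys.path.insert(0, os.path.abspath("desktop/Project/bots"))
# tell Scrapy which settings module to load before asking for the project settings
os.environ["SCRAPY_SETTINGS_MODULE"] = "question.settings"

from desktop.Project.bots.question.spider import spider_test

process = CrawlerProcess(get_project_settings())
process.crawl(spider_test)
process.start()

Is that actually a sane way to do it, or is there something cleaner?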
I also do not want to rewrite all the settings from the settings file as a dict and pass them in manually like this:
process = CrawlerProcess(settings={
    "FEEDS": {
        "items.json": {"format": "json"},
    },
})
That last snippet is just an example from the Scrapy docs; obviously I don't need a feed exporter. I already tried to just import the settings file I need and pass it as the settings parameter, but the settings parameter expects a Python dictionary, so all I have is this:
process = CrawlerProcess(settings={})
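What I imagine is something along these lines (building a Settings object directly from my project's settings module instead of writing a dict), but I don't know whether setmodule() is the right way to do this, and the module path is again only a guess for my layout:

import sys

from scrapy.crawler import CrawlerProcess
from scrapy.settings import Settings

# make the package that contains settings.py importable (path is a guess)
sys.path.insert(0, "desktop/Project/bots")

from desktop.Project.bots.question.spider import spider_test

# load every value from the project's settings.py into a Settings object
settings = Settings()
settings.setmodule("question.settings", priority="project")

process = CrawlerProcess(settings)
process.crawl(spider_test)
process.start()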
I also don't want to duplicate the settings as custom_settings inside each spider, and I would rather not start the spiders through subprocess. I really hope somebody can help me with an explanation of how to solve this.