
Hello, I am trying to build a simple crawler with Scrapy.

The code works fine in the Scrapy shell, but when I run it from the console it doesn't write anything to the JSON file.

I am running it from the project's top directory as:

scrapy crawl filemare -o filemare.json


import scrapy


class FilemareSpider(scrapy.Spider):
    name = "filemare"
    allowed_domains = ['https://filemare.com/']
    start_urls = ["https://filemare.com/en-us/search/firmware%20download/632913359"]

    def parse(self, response):
        items = response.xpath('//div[@class="f"]/text()').extract()
        #items = response.css('div.f::text').extract()

        for url in items:
            print(url)
            yield url

1 Answer

The parse method has to yield (or return) a dict, a Scrapy Item, or a Request object (see the documentation). In your case, you yield a plain string. If you run the spider, you'll see an error in the output.

Change the corresponding part of the code like this:

...
def parse(self, response):
    items = response.xpath('//div[@class="f"]/text()').extract()

    for url in items:
        print(url)
        yield {'url': url}
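To see why this fixes the empty JSON file: with `-o filemare.json`, Scrapy's feed exporter serializes each yielded dict, while plain strings are rejected. Here is a minimal pure-Python sketch of that behavior (no Scrapy required; `parse_like` is a hypothetical stand-in for the corrected parse method, not part of the original code):

```python
import json

def parse_like(texts):
    # Stand-in for the corrected parse(): wrap each extracted
    # string in a dict so a JSON exporter can serialize it.
    for url in texts:
        yield {'url': url}

items = list(parse_like(['ftp://example.com/a', 'ftp://example.com/b']))
print(json.dumps(items))
# → [{"url": "ftp://example.com/a"}, {"url": "ftp://example.com/b"}]
```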

2 Comments

Thanks, that was useful. But my crawler was being blocked because of the robots.txt file; I needed to change the settings file to ROBOTSTXT_OBEY = False.
Really? When I tried your code myself, crawling proceeded and yielding was the only problem.
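For the robots.txt issue mentioned in the comment above: ROBOTSTXT_OBEY is a real Scrapy setting that lives in the project's settings.py and defaults to True in generated projects. Disabling it is the asker's workaround for this site, not a general recommendation:

```python
# settings.py (project settings module)
# Scrapy projects generated by `scrapy startproject` set this to True,
# which makes the crawler honor the target site's robots.txt rules.
# Setting it to False tells Scrapy to ignore robots.txt.
ROBOTSTXT_OBEY = False
```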
