
Hello, I am trying to build a simple crawler with Scrapy.

The code works fine in the Scrapy shell, but when I run it from the console it doesn't write anything to the JSON file.

I am running it from the project's top directory as:

scrapy crawl filemare -o filemare.json


import scrapy


class FilemareSpider(scrapy.Spider):
    name = "filemare"
    allowed_domains = ['https://filemare.com/']
    start_urls = ["https://filemare.com/en-us/search/firmware%20download/632913359"]

    def parse(self, response):
        items = response.xpath('//div[@class="f"]/text()').extract()
        #items = response.css('div.f::text').extract()

        for url in items:
            print(url)
            yield url

1 Answer

The parse method has to yield (or return) a dict, a Scrapy Item, or a Request object (see the documentation). In your case, you yield a plain string. If you run the spider, you'll see an error in the output.

Change the corresponding part of the code like this:

...
def parse(self, response):
    items = response.xpath('//div[@class="f"]/text()').extract()

    for url in items:
        print(url)
        yield {'url': url}
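To see why this fixes the empty JSON file: with `-o filemare.json`, Scrapy's feed exporter serializes each yielded dict, while plain strings are rejected. Here is a minimal pure-Python sketch of that behavior (no Scrapy required; `parse_like` is a hypothetical stand-in for the corrected parse method, not part of the original code):

```python
import json

def parse_like(texts):
    # Stand-in for the corrected parse(): wrap each extracted
    # string in a dict so a JSON exporter can serialize it.
    for url in texts:
        yield {'url': url}

items = list(parse_like(['ftp://example.com/a', 'ftp://example.com/b']))
print(json.dumps(items))
# → [{"url": "ftp://example.com/a"}, {"url": "ftp://example.com/b"}]
```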

2 Comments

Thanks, that was useful. But my crawler was being blocked because of the robots.txt file; I needed to change the settings file to ROBOTSTXT_OBEY = False.
Really? When I tried your code myself, crawling proceeded and yielding was the only problem.
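For the robots.txt issue mentioned in the comment above: ROBOTSTXT_OBEY is a real Scrapy setting that lives in the project's settings.py and defaults to True in generated projects. Disabling it is the asker's workaround for this site, not a general recommendation:

```python
# settings.py (project settings module)
# Scrapy projects generated by `scrapy startproject` set this to True,
# which makes the crawler honor the target site's robots.txt rules.
# Setting it to False tells Scrapy to ignore robots.txt.
ROBOTSTXT_OBEY = False
```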
