Scrapy Xpath returning null but working fine in Chrome

Question

I am quite new to Scrapy but am designing a web scraper to pull certain information from GoFundMe, specifically in this case the number of people who have donated to a project. I have written an XPath expression which works fine in Chrome but returns null in Scrapy.

A random example project is https://www.gofundme.com/f/passage/donations, which at present has 22 donations. Entering the XPath below in Chrome's inspector gives me "Donations(22)", which is what I need -

//h2[@class="heading-5 mb0"]/text()

However in my Scrapy spider the following yields null -

import scrapy


class DonationsSpider(scrapy.Spider):
    name = 'get_donations'

    start_urls = [
        'https://www.gofundme.com/f/passage/donations'
    ]

    def parse(self, response):
        # Text of the heading that shows "Donations(22)" in Chrome
        amount_of_donations = response.xpath('//h2[@class="heading-5 mb0"]/text()').extract_first()

        yield {
            'Donations': amount_of_donations
        }

Does anyone know why Scrapy is unable to see this value?

I am doing this in an attempt to find out how many times the rest of the spider needs to loop, as when I hard code this value it works with no problems and yields all of the donations.

First, Scrapy can't run JavaScript, and this page may use JavaScript to create elements. Second, the server may check the "User-Agent" header and send different content to scripts and bots, so you should check what you actually get from the server. You can use scrapy shell http://... to open an interactive shell with your URL.
– furas, Commented Feb 12, 2020 at 19:46
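
One way to act on this comment is to look for the heading in the HTML exactly as the server sends it, before any JavaScript runs. The sketch below uses only the standard library; `H2Collector` and `h2_texts` are hypothetical helper names, and in a Scrapy callback the raw HTML would typically be `response.text`.

```python
from html.parser import HTMLParser


class H2Collector(HTMLParser):
    """Collect the text of every <h2> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        # Only keep text that appears between <h2> and </h2>
        if self._in_h2:
            self.headings[-1] += data


def h2_texts(raw_html):
    """Return the stripped <h2> texts found in the HTML as the server sent it."""
    collector = H2Collector()
    collector.feed(raw_html)
    collector.close()
    return [text.strip() for text in collector.headings]
```

If `h2_texts(raw_html)` does not contain the heading you see in Chrome, the element is probably created client-side, and no XPath will match it in the raw response.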

Yash Pokar · Accepted Answer · 2020-02-14 14:12:16Z

That's because many requests are made to fulfil the page "https://www.gofundme.com/f/passage/donations".

Chrome understands JavaScript: it executes the page's JavaScript code and fetches the responses from several different endpoints to build the page you see.

One of those requests goes to the endpoint "https://gateway.gofundme.com/web-gateway/v1/feed/passage/counts", which loads the data you're looking for. Your Python script can't do that, and trying to make it do so is not recommended.

Instead, you can call that API directly and you'll get the data. The good news is that the endpoint responds with JSON, which is well structured and easy to parse.

I'm sure you're also looking for the data that comes from this endpoint: "https://gateway.gofundme.com/web-gateway/v1/feed/passage/donations?limit=20&offset=0&sort=recent"
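
As a rough illustration of calling the API directly, the following standard-library sketch downloads the counts endpoint and looks up a field by name. The layout of the JSON is not shown in this answer, so `find_key` (a hypothetical helper) searches the whole document instead of assuming a path. The `total_donations` field name comes from the comments on this answer, and the User-Agent value is only a placeholder.

```python
import json
from urllib.request import Request, urlopen

COUNTS_URL = "https://gateway.gofundme.com/web-gateway/v1/feed/passage/counts"


def find_key(data, key):
    """Depth-first search for `key` in nested dicts/lists; None if absent."""
    if isinstance(data, dict):
        if key in data:
            return data[key]
        children = data.values()
    elif isinstance(data, list):
        children = data
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def fetch_json(url):
    """Download `url` and decode its JSON body (performs a network request)."""
    # Placeholder User-Agent; an assumption, not a documented requirement
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `find_key(fetch_json(COUNTS_URL), "total_donations")` would then return the count if the field is present. Inside a Scrapy callback the same helper could be applied to `json.loads(response.text)`.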

For more information, you may refer to one of my blog posts.

3 Comments

Andrew Bruce Over a year ago

Hi Yash, thank you for your reply. You're exactly right; however, the maximum number of responses per request seems to be 100. For this reason I have code that loops over this URL, increasing the offset by 100 each time and parsing the 100 responses into a file. This works fine if I manually set the number of loops, but I don't know how to stop the loop when it's done. My idea was to find out how many donations have been made and base the number of loops on that, hence the question. If you know a better way to do this, your help would be much appreciated!

Yash Pokar Over a year ago

@AndrewBruce if you request gateway.gofundme.com/web-gateway/v1/feed/passage/counts, it will reply with total_donations. You can then simply read that value and loop until that count is reached.
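
The loop described here could be sketched as follows, assuming the page size of 100 mentioned in the comment above and the `limit`, `offset` and `sort` query parameters from the donations URL in the answer. `page_urls` is a hypothetical helper, not part of Scrapy or of the GoFundMe API.

```python
from urllib.parse import urlencode

DONATIONS_URL = "https://gateway.gofundme.com/web-gateway/v1/feed/passage/donations"


def page_urls(total_donations, page_size=100):
    """Return one URL per page, stepping the offset until the total is covered."""
    urls = []
    for offset in range(0, total_donations, page_size):
        query = urlencode({"limit": page_size, "offset": offset, "sort": "recent"})
        urls.append(f"{DONATIONS_URL}?{query}")
    return urls
```

In a spider, each URL would become one request, for example `for url in page_urls(total): yield scrapy.Request(url, callback=self.parse_page)`, where `parse_page` is a callback you define yourself.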

Moses Schwartz Over a year ago

@YashPokar Any idea which call would respond with the total raised?
