Python script using lxml, xpath returning null list

Question

I am trying to scrape href links from an HTML page using XPath with lxml. In my script the XPath returns an empty list, even though it matches elements when I test it separately in the browser.

page = self.opener.open(link).read()
doc=html.fromstring(str(page))
ref = doc.xpath('//ul[@class="s-result-list s-col-1 s-col-ws-1 s-result-list-hgrid s-height-equalized s-list-view s-text-condensed s-item-container-height-auto"]/li/div/div[@class="a-fixed-left-grid"]/div/div[@class="a-fixed-left-grid-col a-col-left"]/div/div/a')
for post in ref:
    print(post.get("href"))

I'm using a proxy server to access the links, and that part seems to work: the "doc" variable is populated with the HTML content. I've checked the links, and I'm on the right page for this XPath.

This is the link from which I'm trying to fetch data: https://www.amazon.com/s/ref=lp_266162_nr_n_0?fst=as%3Aoff&rh=n%3A283155%2Cn%3A%211000%2Cn%3A1%2Cn%3A173508%2Cn%3A266162%2Cn%3A3564986011&bbn=266162&ie=UTF8&qid=1550120216&rnid=266162

SIM · Accepted Answer · 2019-02-14 08:55:38Z

1

I suppose you are after the links within Books : Arts & Photography : Architecture : Buildings : Landmarks & Monuments. I used xpath within the script to fetch the links. Give it a go:

import requests
from lxml.html import fromstring

link = 'https://www.amazon.com/s/ref=lp_266162_nr_n_0?fst=as%3Aoff&rh=n%3A283155%2Cn%3A%211000%2Cn%3A1%2Cn%3A173508%2Cn%3A266162%2Cn%3A3564986011&bbn=266162&ie=UTF8&qid=1550120216&rnid=266162'
r = requests.get(link,headers={"User-Agent":"Mozilla/5.0"})
htmlcontent = fromstring(r.text)
itemlinks = htmlcontent.xpath('//*[@id="mainResults"]//*[contains(@class,"s-access-detail-page")]')
for link in itemlinks:
    print(link.get('href'))
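
One likely reason a shorter expression is more robust: in XPath, `@class="..."` compares the whole attribute string, so a single extra or missing class name in the served HTML leaves nothing to match, while `contains(@class, ...)` only needs the name to appear somewhere in the attribute. The following is a minimal sketch of that difference. The markup is a hypothetical, simplified stand-in, not the real Amazon page.

```python
from lxml.html import fromstring

# Hypothetical, simplified markup; the real results page is far larger
# and its class lists change over time.
SAMPLE = """<div id="mainResults">
  <ul class="s-result-list s-col-1 s-list-view">
    <li><a class="a-link-normal s-access-detail-page a-text-normal" href="/dp/111">Book one</a></li>
    <li><a class="a-link-normal s-access-detail-page a-text-normal" href="/dp/222">Book two</a></li>
  </ul>
</div>"""

doc = fromstring(SAMPLE)

# Exact comparison: the attribute must equal the whole string, so the
# extra "s-list-view" class on the <ul> prevents any match.
exact = doc.xpath('//ul[@class="s-result-list s-col-1"]/li/a')

# Substring comparison: matches whenever the class name appears in the attribute.
loose = doc.xpath('//*[@id="mainResults"]//*[contains(@class, "s-access-detail-page")]')

exact_hrefs = [a.get("href") for a in exact]
loose_hrefs = [a.get("href") for a in loose]
print(exact_hrefs)
print(loose_hrefs)
```

Note that `contains()` is a plain substring test, so it would also match a class such as `s-access-detail-page-old`; for scraping that is usually an acceptable trade-off.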

If you would rather use a CSS selector, the following should work:

'#mainResults .s-access-detail-page'


anjaneyulubatta505 · Accepted Answer · 2019-02-14 05:46:47Z

1

Your XPath selector is invalid. Try a CSS selector like the one below:

import requests
import lxml, lxml.html

url = 'https://www.amazon.com/s/ref=lp_266162_nr_n_0?fst=as%3Aoff&rh=n%3A283155%2Cn%3A%211000%2Cn%3A1%2Cn%3A173508%2Cn%3A266162%2Cn%3A3564986011&bbn=266162&ie=UTF8&qid=1550120216&rnid=266162'
r = requests.get(url)
html = lxml.html.fromstring(r.content)
links = html.cssselect('.a-fixed-left-grid-col .a-col-left a')
for link in links:
    print(link.attrib['href'])

output

https://www.amazon.com/Top-500-Instant-Pot-Recipes/dp/1730885209
https://www.amazon.com/Monthly-Budget-Planner-Organizer-Notebook/dp/1978202865
https://www.amazon.com/Edge-Order-Daniel-Libeskind/dp/045149735X
https://www.amazon.com/Man-Glass-House-Johnson-Architect/dp/0316126438
https://www.amazon.com/Versailles-Private-Invitation-Guillaume-Picon/dp/2080203371
https://www.amazon.com/Palm-Springs-Modernist-Tim-Street-Porter/dp/0847861872
https://www.amazon.com/Building-Chicago-Architectural-John-Zukowsky/dp/0847848701
https://www.amazon.com/Taverns-American-Revolution-Adrian-Covert/dp/160887785X
https://www.amazon.com/TRAVEL-MOSAIC-Color-Number-Relaxation/dp/1717562221
https://www.amazon.com/Understanding-Cemetery-Symbols-Historic-Graveyards/dp/1547047216
https://www.amazon.com/Soviet-Bus-Stops-Christopher-Herwig/dp/099319110X
https://www.amazon.com/Famous-Movie-Scenes-Dot-Dot/dp/1977747043

pip requirements

certifi==2018.11.29
chardet==3.0.4
cssselect==1.0.3
idna==2.8
lxml==4.3.1
requests==2.21.0
urllib3==1.24.1


4 Comments

Ajay Victor Over a year ago

But the XPath selector gives results when I test it with JS in the browser console; please refer to the attached image.

anjaneyulubatta505 Over a year ago

@AjayVictor I tried it with a JS selector too, and it didn't work for me either. Try refreshing the page and testing again.

Ajay Victor Over a year ago

It still returns results for me. I'm looking for an XPath solution; if I don't get one, I'll accept your answer. Thanks for your efforts.

anjaneyulubatta505 Over a year ago

Open the above link, inspect the element, right-click it, choose selector, then XPath, and try that. I'm unable to get a match, which might be due to dynamic JavaScript.
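
One way to settle a disagreement like this is to test each candidate XPath against the HTML the script actually received, rather than against the browser's live DOM, which may have been modified by JavaScript. Below is a minimal sketch, assuming the response body is available as bytes; `count_matches` and the sample markup are hypothetical and for illustration only.

```python
from lxml import html

def count_matches(raw_html, xpaths):
    """Return {xpath: number of matching nodes} for the given markup."""
    # Pass bytes or text straight to the parser. In Python 3, str() on a
    # bytes object yields its repr ("b'...'"), not the decoded markup.
    doc = html.fromstring(raw_html)
    return {xp: len(doc.xpath(xp)) for xp in xpaths}

# Hypothetical stand-in for the bytes returned by the HTTP client.
raw = (b'<html><body><div id="mainResults">'
       b'<a class="s-access-detail-page" href="/dp/111">x</a>'
       b'</div></body></html>')

counts = count_matches(raw, [
    '//div[@id="mainResults"]//a',
    '//ul[@class="s-result-list"]/li//a',
])
print(counts)
```

An expression that counts zero here but matches in the browser console points to a difference between the served HTML and the rendered page.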
