Extracting text from a CSS node with Scrapy

Question

I'm trying to scrape a catalog id number from this page:

from scrapy.selector import Selector
from scrapy.http import HtmlResponse

url = 'http://www.enciclovida.mx/busquedas/resultados?utf8=%E2%9C%93&busqueda=basica&id=&nombre=astomiopsis+exserta&button='

response = HtmlResponse(url=url)

I'm using this CSS selector, which works in R with rvest::html_nodes:

".result-nombre-container > h5:nth-child(2) > a:nth-child(1)"

I would like to retrieve the catalog id, which in this case should be:

I'm fine with an XPath solution if that is easier.

Can you post the complete code you are using? Maybe I can help.
– kartheek, commented Aug 12, 2018 at 4:55

Kevin Kamonseki · Accepted Answer · 2018-08-12 06:05:12Z

1

I don't have Scrapy here, but I tested this XPath and it will get you the href:

//div[contains(@class, 'result-nombre-container')]/h5[2]/a/@href

If you're having too much trouble with Scrapy and its CSS selector syntax, I would also suggest trying the BeautifulSoup Python package. With BeautifulSoup, once you have located the link element, you can read its attribute like this:

link.get('href')
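
For context, here is a minimal sketch of how `link` might be obtained before calling `link.get('href')`. The HTML snippet, the href format, and the id 123456 are made up for illustration and are not taken from the real page. It assumes the `bs4` package is installed and reuses the CSS selector from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real search-result page.
html = """
<div class="result-nombre-container">
  <h5>Nombre</h5>
  <h5><a href="/especies/123456">Astomiopsis exserta</a></h5>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one takes a CSS selector and returns the first match, or None.
link = soup.select_one(".result-nombre-container > h5:nth-child(2) > a:nth-child(1)")

# Guard against a missing element before reading the attribute.
href = link.get("href") if link is not None else None
print(href)
```

On the real page you would pass the downloaded HTML to `BeautifulSoup` instead of the sample string.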


gangabass · Accepted Answer · 2018-08-12 07:57:45Z

1

If you need to parse the id from the href:

catalog_id = response.xpath("//div[contains(@class, 'result-nombre-container')]/h5[2]/a/@href").re_first(r'(\d+)$')


Thomas Strub · Accepted Answer · 2018-08-13 07:47:24Z

0

There seems to be only one link in the h5 element. So in short:

response.css('h5 > a::attr(href)').re(r'(\d+)$')
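
Keep in mind that `.re()` returns a list of all matches, while `.re_first()` returns a single string or None. The pattern `(\d+)$` only matches digits at the very end of the href, which the following standard-library sketch illustrates. The example hrefs and the helper name `trailing_id` are hypothetical.

```python
import re

# Same pattern as in the answers: one or more digits anchored at the end.
PATTERN = re.compile(r"(\d+)$")


def trailing_id(href):
    """Return the digits at the end of href, or None if there are none."""
    match = PATTERN.search(href)
    return match.group(1) if match else None


# Hypothetical hrefs: only the first one ends in digits.
hrefs = ["/especies/123456", "/especies/123456/fotos", "/especies/42-nombre"]
results = [trailing_id(h) for h in hrefs]
print(results)
```

If the real links carry a suffix after the id, the pattern would need to be adjusted accordingly.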
