Extracting links with Scrapy that have a specific CSS class

Question

Conceptually simple question/idea.

Using Scrapy, how do I use a LinkExtractor that extracts and follows only links with a given CSS class?

Seems trivial and like it should already be built in, but I don't see it? Is it?

It looks like I can use an XPath, but I'd prefer using CSS selectors. It seems like they are not supported?

Do I have to write a custom LinkExtractor to use CSS selectors?

alecxe · Accepted Answer · 2015-06-17 14:41:23Z

From what I understand, you want something similar to restrict_xpaths, but provide a CSS selector instead of an XPath expression.

This is actually a built-in feature as of Scrapy 1.0 (currently in a release candidate state); the argument is called restrict_css:

restrict_css

a CSS selector (or list of selectors) which defines regions inside the response where links should be extracted from. Has the same behaviour as restrict_xpaths.

The initial feature request:

CSS support in link extractors

edited Jun 17, 2015 at 14:41

answered Jun 17, 2015 at 14:35

alecxe

2 Comments

lostdorje Over a year ago

That's very good news! Thanks for the info. Also, any idea if 1.0 will support Python 3? I know this might not be possible yet because of the Twisted dependency, but I'm still curious. Would love to have this available in Python 3.

alecxe Over a year ago

@lostdorje Yeah, from what I know the Scrapy devs are working on Python 3 support, but Twisted is far from being there; see rawgit.com/mythmon/twisted-py3-graph/master/index.html. See also: github.com/scrapy/scrapy/issues/263.
