Conceptually simple question/idea.

Using Scrapy, how do I use a LinkExtractor that extracts and follows only the links matching a given CSS selector?

This seems trivial, like something that should already be built in, but I don't see it. Is it?

It looks like I can use an XPath, but I'd prefer using CSS selectors. It seems like they are not supported?

Do I have to write a custom LinkExtractor to use CSS selectors?

1 Answer

From what I understand, you want something similar to restrict_xpaths, but one that takes a CSS selector instead of an XPath expression.

This is a built-in feature in Scrapy 1.0 (currently in a release candidate state). The argument is called restrict_css:

restrict_css

a CSS selector (or list of selectors) which defines regions inside the response where links should be extracted from. Has the same behaviour as restrict_xpaths.

The initial feature request:

2 Comments

That's very good news! Thanks for the info. Also, any idea if 1.0 will support Python 3? I know this might not be possible yet because of the Twisted dependency, but I'm still curious. I would love to have this available in Python 3.
@lostdorje Yeah, from what I know, the Scrapy devs are working on Python 3 support, but Twisted is far from being there; see rawgit.com/mythmon/twisted-py3-graph/master/index.html. See also: github.com/scrapy/scrapy/issues/263.
