Is there an API for Node.js that can fetch HTML from a URL, or take static HTML, and query it?

I would like to do something like this for web scraping:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");

I had a look at this question and went through most of those APIs, but I haven't found (or perhaps couldn't identify) anything similar.

1 Answer

jsdom (https://github.com/tmpvar/jsdom) is probably what you want. You can use it in combination with jQuery to query the DOM. Here is an example of how I have been using it in one of my projects: https://github.com/gabesoft/seryth/blob/master/lib/sanitizer.js

You will probably also need request (https://github.com/request/request) to fetch the HTML from URLs.
