0

I need to write a scraper in Java + Groovy.

Is there a library that can parse HTML documents and select the information I need with simple CSS selectors, instead of walking the whole document tree and picking out nodes by hand? Something like Nokogiri for Ruby, to give you an idea of what I'm after.

Thanks in advance!

4
  • My first thought: finally, someone who didn't ask this question in relation to regular expressions. ;) Of course, this has been covered in detail. Commented Nov 15, 2010 at 22:40
  • possible duplicate of Options for HTML scraping? Commented Nov 15, 2010 at 22:40
  • I've been using C# for scraping. I've written a jQuery port, but I don't dare post it here for fear of being down-voted into oblivion due to self-promotion. Commented Nov 17, 2010 at 5:13
  • So what if you get marked down? I would be interested to see it, and I wouldn't be the only one. Commented Nov 18, 2010 at 4:55

3 Answers

1

I do something like this by loading the page with Qt WebKit and injecting jQuery.

It's a hack, but it works well for my use case. I needed a solution that requires no configuration: just run sudo apt-get install libqt4-webkit and you're ready to go.

0

If you can afford to have a browser behind the scraper (that is, use a real browser to load and render the pages), Selenium would be perfect. It has the added benefit of fully supporting Ajax websites.

If not, something like WebDriver would probably work.

I've only used Selenium.

0

I use Selenium RC + jQuery for screen scraping.

Example code: HERE

I use PHP as the client, but you can implement it in any language you like (as long as it can talk to Selenium RC).

I have tried several CSS selector libraries before, but honestly, the best parser is your browser. The Selenium RC approach is not fast, but it is extremely reliable.
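
Since the question is about Java, here is a rough sketch of the client-side half of that idea: building the jQuery expression that the browser will evaluate for a given CSS selector. The class and method names (JQuerySnippet, jsQuote, textOfAll) are made up for illustration, and it assumes jQuery is already loaded in the page. How the string actually gets handed to the browser (Selenium RC, WebDriver, Qt WebKit) is deliberately left out, because that depends on the tool and version you use.

```java
public class JQuerySnippet {

    // Escape a value so it can sit inside a single-quoted JavaScript string literal.
    static String jsQuote(String s) {
        StringBuilder sb = new StringBuilder("'");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'");  break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                default:   sb.append(c);
            }
        }
        return sb.append("'").toString();
    }

    // Build a JavaScript expression that returns the text of every element
    // matching the CSS selector, joined with newlines.
    // Assumption: the page exposes jQuery under the global name "jQuery".
    static String textOfAll(String cssSelector) {
        return "jQuery(" + jsQuote(cssSelector) + ")"
             + ".map(function () { return jQuery(this).text(); })"
             + ".get().join('\\n')";
    }

    public static void main(String[] args) {
        // Print the script that would be sent to the browser for evaluation.
        System.out.println(textOfAll("div.post a[rel='nofollow']"));
    }
}
```

The quoting step matters because selectors often contain quotes themselves (attribute selectors, for instance), and an unescaped quote would break the script before the browser ever runs it.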
