I found the .NET HTML Agility Pack useful and easy to use for screen scraping websites. What are the equivalent HTML screen scraping libraries in Java, Ruby, and Python?
2 Answers
Found what I was looking for: "Options for HTML scraping?"
Fatih Enes: The link is broken. Could you please share what you found back then?
BeautifulSoup is the standard Python library for screen scraping.
Recently, however, I used pyQuery (still incomplete at the moment), which is more or less a port of jQuery's API to Python, and found it very useful.
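For comparison with the HTML Agility Pack, here is a minimal sketch of link extraction with BeautifulSoup. It assumes the `beautifulsoup4` package is installed (imported as `bs4`) and uses Python's built-in `"html.parser"` backend. The sample HTML and the `extract_links` helper are hypothetical and for illustration only; a real scraper would download the page first.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; in practice this HTML would be fetched over HTTP.
SAMPLE_HTML = """
<html>
  <body>
    <div id="content">
      <a href="/first">First</a>
      <a href="/second">Second</a>
      <a name="anchor-without-href">No link</a>
    </div>
  </body>
</html>
"""


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (text, href) pairs for every <a> tag that has an href (hypothetical helper)."""
    # "html.parser" ships with Python, so no extra parser package is required.
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only <a> tags that actually carry an href attribute.
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    for text, href in extract_links(SAMPLE_HTML):
        print(f"{text}: {href}")
```

Passing `"lxml"` instead of `"html.parser"` is a common choice when parsing speed matters, but it requires the separate `lxml` package.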
emish: I would also suggest Scrapy if you need a more robust scraping infrastructure.