I have the following method:
public Article buildArticle(SNSpecific specific, String urlToScrape) throws IOException {
    Document page = Jsoup.connect(urlToScrape).timeout(10 * 1000).get();
    Article a = new Article();
    a.setWebsite("http://www.svensktnaringsliv.se/");
    a.setUrl(urlToScrape);
    a.setTitle(page.select(specific.getTitleSelector()).text());
    a.setDiscoveryTime(page.select(specific.getDateAndTimeSelector()).text());
    if (isPdfPage(urlToScrape)) {
        Elements e = page.select("div.indepth-content > div.content > ul.indepth-list a");
        a.setText(page.select("div.readmoreSummary").text() + "For full article: "
                + e.first().attr("href"));
    } else {
        a.setText(page.select(specific.getContentSelector()).text());
    }
    return a;
}
The problem is that sometimes it cannot connect to urlToScrape even though I changed the timeout, and I don't want to wait too long on any single page. That is why I am looking for an alternative to just raising the timeout() value. What could be another approach to handling this? (I have about 200 pages to scrape.)
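One direction I have been considering is to keep the per-connection timeout short and retry a failed page a few times instead of waiting a long time on a single attempt. Below is a minimal sketch of that idea; the class and method names (RetryFetch, withRetries) are mine, not part of Jsoup, and in real use the Callable would wrap the Jsoup.connect(...).get() call:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Sketch of a "fail fast, then retry" helper: instead of one long timeout,
// run the fetch with a short timeout and retry on IOException.
public class RetryFetch {

    // Runs op up to maxAttempts times; rethrows the last IOException
    // if every attempt fails. Assumes maxAttempts >= 1.
    static <T> T withRetries(Callable<T> op, int maxAttempts) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e; // remember the failure and try the next attempt
            }
        }
        throw last;
    }
}
```

With this helper the scraping call would look something like `withRetries(() -> Jsoup.connect(urlToScrape).timeout(3000).get(), 3)`, i.e. three attempts of 3 seconds each, which bounds the worst case per page while still tolerating transient connection failures. Pages that fail all attempts can be collected and retried in a second pass after the other ~200 pages are done.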