I am trying to extract the text and links from an HTML file. At the moment I can extract both easily using Jsoup, but I can only do it separately.

Here is my code:

try {
    // Jsoup.parse already returns a Document, so no casts are needed.
    doc = Jsoup.parse(new File(input), "UTF-8");
    Elements paragraphs = doc.select("td.text");

    for (Element p : paragraphs) {
        getGui().setTextVers(p.text() + "\r\n" + "***********************************************************" + "\r\n");
    }

    // This searches the whole document, so every link on the page is listed.
    Elements links = doc.getElementsByTag("a");
    for (Element link : links) {
        String linkHref = link.attr("href");
        String linkText = link.text();
        getGui().setTextVers("\n\n" + linkText + ">\r\n" + linkHref + "\r\n");
    }
} catch (IOException e) {
    e.printStackTrace();
}

I have placed a .text class on the outermost td of each block that contains text. What I would like to achieve is this: when the program finds a td with the .text class, it checks that td for any links and extracts them from that section in order. So you would have:

Text

Link

Text

Link

I tried putting an inner for-each loop inside the first for-each loop, but this only printed the full list of links for the page. Can anyone help?

1 Answer

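Call getElementsByTag on each td.text element instead of on the document, so that only the links inside that cell are returned. Try: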

// Jsoup.parse returns a Document, so the casts are unnecessary.
Document doc = Jsoup.parse(new File(input), "UTF-8");
Elements paragraphs = doc.select("td.text");

for (Element p : paragraphs) {
    System.out.println(p.text());
    // Search only inside this cell, not the whole document.
    Elements links = p.getElementsByTag("a");
    for (Element link : links) {
        String linkHref = link.attr("href");
        String linkText = link.text();
        System.out.println("\n\n" + linkText + ">\r\n" + linkHref + "\r\n");
    }
}
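
If you want to check the behaviour without your input file, here is a minimal self-contained sketch of the same idea. The class name CellTextAndLinks, the extract helper, and the sample HTML are made up for illustration, and it assumes Jsoup is on the classpath. It parses a string with Jsoup.parse(String) and uses select("a[href]") so that anchors without an href are skipped.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class CellTextAndLinks {

    // Hypothetical helper: returns one entry per td.text cell,
    // holding the cell's text followed by the links found inside that cell.
    static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> blocks = new ArrayList<>();
        for (Element cell : doc.select("td.text")) {
            StringBuilder sb = new StringBuilder(cell.text());
            // select() on the cell only looks at the cell's descendants.
            for (Element link : cell.select("a[href]")) {
                sb.append("\n").append(link.text())
                  .append(" > ").append(link.attr("href"));
            }
            blocks.add(sb.toString());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Sample input (an assumption): two td.text cells plus one link outside them.
        String html = "<table><tr>"
                + "<td class=\"text\">First <a href=\"http://a.example\">A</a></td>"
                + "<td class=\"text\">Second <a href=\"http://b.example\">B</a></td>"
                + "</tr></table>"
                + "<p><a href=\"http://c.example\">outside</a></p>";
        for (String block : extract(html)) {
            System.out.println(block);
            System.out.println("***");
        }
    }
}
```

The link outside the table is not reported, because each search starts from a td.text cell. Note that if one td.text cell is nested inside another, the inner cell's links would show up under both, so keep the class on the outermost cell only.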