1

I am trying to parse html tags here using jsoup. I am new to jsoup. Basically I need to parse the tags and get the text inside those tags and apply the style mentioned in the class attribute.

I am creating a SpannableStringBuilder for that I can create substrings, apply styles and append them together with texts that have no styles.

String str = "There are <span class='newStyle'> two </span> workers from the <span class='oldStyle'>Front of House</span>";

SpannableStringBuilder text = new SpannableStringBuilder();
    if (value.contains("</span>")) {
        Document document = Jsoup.parse(value);
        Elements elements = document.getElementsByTag("span");
        if (elements != null) {
            int i = 0;
            int start = 0;
            for (Element ele : elements) {
                String styleName =  type + "." + ele.attr("class");
                text.append(ele.text());
                int style = context.getResources().getIdentifier(styleName, "style", context.getPackageName());
                text.setSpan(new TextAppearanceSpan(context, style), start, text.length(), Spannable.SPAN_EXCLUSIVE_EXCLUSIVE);
                text.append(ele.nextSibling().toString());
                start = text.length();
                i++;
            }
        }
        return text;
    }

I am not sure how I can parse the strings that are not between any tags such as the "There are" and "worker from the".

Need output such as:

- There are
- <span class='newStyle'> two </span>
- workers from the
- <span class='oldStyle'>Front of House</span>
2
  • See stackoverflow.com/questions/59594261/… Commented Jan 9, 2020 at 21:34
  • @KrystianG: thanks for that. From a node how can I get the text stripping the html , like the text "two" ? Commented Jan 10, 2020 at 0:04

1 Answer 1

1

Full answer: you can get the text outside of the tags by getting childNodes(). This way you obtain List<Node>. Note I'm selecting body because your HTML fragment doesn't have any parent element and parsing HTML fragment with jsoup adds <html> and <body> automatically.
If Node contains only text it's of type TextNode and you can get the content using toString().
Otherwise you can cast it to Element and get the text usingelement.text().

    String str = "There are <span class='newStyle'> two </span> workers from the <span class='oldStyle'>Front of House</span>";
    Document doc = Jsoup.parse(str);
    Element body = doc.selectFirst("body");
    List<Node> childNodes = body.childNodes();
    for (int i = 0; i < childNodes.size(); i++) {
        Node node = body.childNodes().get(i);
        if (node instanceof TextNode) {
            System.out.println(i + " -> " + node.toString());
        } else {
            Element element = (Element) node;
            System.out.println(i + " -> " + element.text());
        }
    }

output:

0 -> 
There are 
1 -> two
2 ->  workers from the 
3 -> Front of House

By the way: I don't know how to get rid of the first line break before There are.

Sign up to request clarification or add additional context in comments.

1 Comment

thanks, I used Parser.xmlParser to avoid the tags Jsoup adds.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.