0

I want to extract the tags from an HTML document,
that is, everything contained within angle brackets, including the brackets themselves. How can I do this in Java? Thanks

1

2 Answers

3
<!-- Read carefully -->
<b><![CDATA[<Everything in angle brackets ("<>") is a tag?>]]></b>

... and use an HTML parser.


If you want to do it manually, iterate over the input characters and decide for every < and > whether it belongs to a tag or not. There are rules to follow for processing instructions, comments, CDATA content, and angle brackets inside attribute values(!).

Most parsers use a switch/case pattern to evaluate each token (each character, in your case).
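
To make the manual approach concrete, here is a minimal sketch of such a character-by-character scanner. The class name `TagScanner` and its helper methods are made up for illustration. It assumes comments, CDATA sections, and processing instructions should be skipped entirely, and that anything like `<!DOCTYPE ...>` is not reported as a tag. It does not follow the full HTML parsing rules (for example, `<script>` contents get no special treatment), so treat it as a demonstration of the switch/case idea, not a replacement for a real parser.

```java
import java.util.ArrayList;
import java.util.List;

public class TagScanner {
    private enum State { TEXT, TAG, DOUBLE_QUOTED, SINGLE_QUOTED }

    /** Returns each start/end tag, angle brackets included, in document order. */
    public static List<String> extractTags(String html) {
        List<String> tags = new ArrayList<>();
        State state = State.TEXT;
        int tagStart = -1;
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            switch (state) {
                case TEXT:
                    if (c == '<') {
                        // Skip constructs whose content must not be scanned for tags.
                        if (html.startsWith("<!--", i)) {
                            i = skipPast(html, "-->", i + 4);
                            continue;
                        }
                        if (html.startsWith("<![CDATA[", i)) {
                            i = skipPast(html, "]]>", i + 9);
                            continue;
                        }
                        if (html.startsWith("<?", i)) {
                            i = skipPast(html, "?>", i + 2);
                            continue;
                        }
                        // Only "<name" or "</name" opens a tag; a bare '<' is plain text.
                        if (startsTag(html, i)) {
                            tagStart = i;
                            state = State.TAG;
                        }
                    }
                    break;
                case TAG:
                    if (c == '"') {
                        state = State.DOUBLE_QUOTED;
                    } else if (c == '\'') {
                        state = State.SINGLE_QUOTED;
                    } else if (c == '>') {
                        tags.add(html.substring(tagStart, i + 1));
                        state = State.TEXT;
                    }
                    break;
                case DOUBLE_QUOTED:
                    // A '>' inside a quoted attribute value does not close the tag.
                    if (c == '"') state = State.TAG;
                    break;
                case SINGLE_QUOTED:
                    if (c == '\'') state = State.TAG;
                    break;
            }
            i++;
        }
        return tags;
    }

    /** Index just after the terminator, or end of input if it never appears. */
    private static int skipPast(String s, String terminator, int from) {
        int end = s.indexOf(terminator, from);
        return end < 0 ? s.length() : end + terminator.length();
    }

    /** True if the '<' at index lt is followed by a letter, or by '/' and a letter. */
    private static boolean startsTag(String s, int lt) {
        int next = lt + 1;
        if (next < s.length() && s.charAt(next) == '/') next++;
        return next < s.length() && Character.isLetter(s.charAt(next));
    }

    public static void main(String[] args) {
        String html = "<!-- Read carefully -->\n"
                + "<b><![CDATA[<Everything in angle brackets (\"<>\") is a tag?>]]></b>";
        for (String tag : extractTags(html)) {
            System.out.println(tag);
        }
    }
}
```

Run on the two-line example at the top of this answer, the scanner reports only `<b>` and `</b>`; the comment and the CDATA content are skipped even though they contain angle brackets.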

2

I used jsoup recently. Nice API, easy to use, and no problems so far. Don't even try to parse HTML yourself. See Andreas_D's answer.
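
For reference, a minimal sketch of what this can look like with jsoup, assuming the jsoup jar is on the classpath. The class name `JsoupTagNames` and the sample input are made up for illustration. It collects the tag name of every element in the parsed body. Note that jsoup normalizes the document while parsing (it adds `html`, `head` and `body` if they are missing), so this gives you the elements of the parsed tree, not the raw tag text as it appeared in the input.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class JsoupTagNames {
    /** Returns the tag name of every element in the parsed body, in document order. */
    public static List<String> tagNames(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        // getAllElements() includes the body element itself and all its descendants.
        for (Element el : doc.body().getAllElements()) {
            names.add(el.tagName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical input: a comment plus a '>' inside an attribute value.
        String html = "<!-- note --><p title=\"a > b\">Hello <b>world</b></p>";
        System.out.println(tagNames(html));
    }
}
```

If you need the markup of an element rather than just its name, `Element.outerHtml()` returns it as serialized by jsoup.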
