
If I have the following string:

< asd="testJava"><a href="/title/text/">BLA BLA <asddead>

How can I get only the string BLA BLA?

I tried split, but it removes all the characters. I need to remove only the tags, that is, everything from "<" to ">". Once I have the string, I'm going to add it to an ArrayList with array.add(). Can someone help me with the code that removes the tags? Thank you!

3
  • Is it HTML? Is it some other XML? Commented Jun 3, 2014 at 19:23
  • I'm going to use that in Java. I need to remove the HTML code and keep only the string. Commented Jun 3, 2014 at 19:24
  • Don't use regex to parse HTML. Use an HTML parser. Commented Jun 3, 2014 at 19:25

2 Answers 2

2

Use a regex to replace everything between < and > with nothing:

String newText = oldText.replaceAll("<[^>]*>", "").trim();

2 more notes:

  1. This won't work on something like <a href="foo>com">BLA BLA</a>, since the regex would stop at the > in foo>com rather than at the correct one. In such a case, I would recommend a proper HTML / XML parser.

  2. Add .trim() to remove any whitespace before or after your text. Without it, <img> <br> BLA BLA would resolve not to 'BLA BLA' but to '  BLA BLA', with the leading spaces left in.
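
As a rough illustration of both the approach and the first caveat, here is a minimal sketch. The class name TagStripper and the method stripTags are hypothetical names chosen for this example; only String.replaceAll and String.trim come from the answer itself.

```java
public class TagStripper {

    // Hypothetical helper: deletes every "<...>" run, then trims the ends.
    static String stripTags(String input) {
        return input.replaceAll("<[^>]*>", "").trim();
    }

    public static void main(String[] args) {
        // The string from the question.
        String fromQuestion =
                "< asd=\"testJava\"><a href=\"/title/text/\">BLA BLA <asddead>";
        System.out.println("[" + stripTags(fromQuestion) + "]");
        // prints [BLA BLA]

        // A '>' inside an attribute value ends the match too early,
        // so part of the attribute leaks into the result.
        String tricky = "<a href=\"foo>com\">BLA BLA</a>";
        System.out.println("[" + stripTags(tricky) + "]");
        // prints [com">BLA BLA]
    }
}
```

The second call shows why a real parser is the safer choice once attribute values can contain a > character.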

1

Ignoring the implications of expanding this solution to a full HTML parser... you could use replaceAll with a regex.

str = str.replaceAll("<[^>]*>","");

This should replace all the HTML tags with nothing, leaving just your label, BLA BLA.
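
Since the question also mentions putting the result into an ArrayList, the same replaceAll call could be wrapped in a loop, roughly as sketched here. TextCollector and collectText are hypothetical names, and skipping fragments that end up empty is an assumption of this sketch, not something the question asks for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TextCollector {

    // Hypothetical helper: strips tags from each fragment and keeps
    // only the fragments that still contain text afterwards.
    static List<String> collectText(List<String> fragments) {
        List<String> result = new ArrayList<>();
        for (String fragment : fragments) {
            String text = fragment.replaceAll("<[^>]*>", "").trim();
            if (!text.isEmpty()) {   // assumption: empty results are dropped
                result.add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fragments = Arrays.asList(
                "< asd=\"testJava\"><a href=\"/title/text/\">BLA BLA <asddead>",
                "<br>",
                "<p>second label</p>");
        System.out.println(collectText(fragments));
        // prints [BLA BLA, second label]
    }
}
```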
