
I have the following string:

text TOKEN text TOKEN text <elem attr="text TOKEN text">text TOKEN text TOKEN</elem> text TOKEN text

I need a regex that matches every TOKEN except those that appear anywhere inside an elem element, whether in its attributes or in its content.

EDIT: I already have a regex that matches all elem elements:

<elem(.*?)>(.*?)</elem>

What I cannot figure out is how to exclude those elements from the regex that finds TOKEN.
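One common way to reuse that element regex for exclusion is alternation (a sketch, and a different technique from the lookahead in the answer): match either a whole element, which is then ignored, or TOKEN in a capturing group, and keep only the matches where that group is set. Because the matcher consumes each element in a single step, a TOKEN inside it is never tried on its own. The class and method names are hypothetical. The capture groups from the question's pattern are made non-capturing so that group 1 is TOKEN, and Pattern.DOTALL is added as an assumption so that elements spanning several lines are skipped too. This assumes elem elements are not nested, because the lazy .*?</elem> stops at the first closing tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkipElemsByAlternation {
    // Left branch: the element regex from the question (groups made non-capturing);
    // a whole <elem ...>...</elem> is consumed and then ignored.
    // Right branch: TOKEN in group 1, only reached at positions outside an element.
    static final Pattern ELEM_OR_TOKEN =
            Pattern.compile("<elem(?:.*?)>(?:.*?)</elem>|(TOKEN)", Pattern.DOTALL);

    // Returns the start offset of every TOKEN that is not inside an elem element.
    public static List<Integer> tokenStarts(String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = ELEM_OR_TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) { // only the TOKEN branch sets group 1
                starts.add(m.start(1));
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        String s = "text TOKEN text TOKEN text <elem attr=\"text TOKEN text\">"
                 + "text TOKEN text TOKEN</elem> text TOKEN text";
        System.out.println(tokenStarts(s));
    }
}
```

Run on the sample string, this should print [5, 16, 90], the start offsets of the three TOKENs outside the element.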

  • What flavour of regex are you using? Commented Apr 16, 2013 at 16:45
  • Not sure what you mean by "flavour" but this will be used in Java Commented Apr 16, 2013 at 16:50
  • Don't use Regex to handle HTML parsing. Commented Apr 16, 2013 at 16:57
  • This is not pure HTML. This is text that happens to include some XML elements. I need a RegEx to exclude these from the search for TOKEN Commented Apr 16, 2013 at 17:04
  • You should include more details, maybe the actual TOKEN you are searching for. It's not very clear what you're asking here. Commented Apr 16, 2013 at 17:09

1 Answer


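The following pattern does not match a "TOKEN" if, looking ahead in the string, a "</elem>" appears before any "<elem". That is exactly the case when the TOKEN sits inside an element, either in an attribute or in the element's content.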

"TOKEN(?!(?:(?!<elem).)*</elem>)"

If the string may contain newline characters, add the Pattern.DOTALL flag.
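A minimal Java sketch of the pattern above, with Pattern.DOTALL set and a comment on each part of the lookahead. The class and method names are hypothetical; the pattern string is the one from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenOutsideElem {
    // TOKEN                the literal word to find
    // (?!                  ...but fail if the following can match from here:
    //   (?:(?!<elem).)*    any run of characters that never steps over the start of "<elem"
    //   </elem>            ...followed by a closing tag, i.e. TOKEN is still inside an element
    // )
    // DOTALL lets '.' also consume line terminators, so multi-line elements are handled.
    static final Pattern TOKEN_OUTSIDE_ELEM =
            Pattern.compile("TOKEN(?!(?:(?!<elem).)*</elem>)", Pattern.DOTALL);

    // Returns the start offset of every TOKEN that is not inside an elem element.
    public static List<Integer> tokenStarts(String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = TOKEN_OUTSIDE_ELEM.matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String s = "text TOKEN text TOKEN text <elem attr=\"text TOKEN text\">"
                 + "text TOKEN text TOKEN</elem> text TOKEN text";
        System.out.println(tokenStarts(s)); // [5, 16, 90]
    }
}
```

The two TOKENs before the element and the one after it are found; the three inside it (one in the attribute, two in the content) are skipped. Since the lookahead only inspects text after each TOKEN, this relies on every </elem> having a matching <elem earlier in the string.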

Further explanation on request.
