Say I have a string 'ad>ad>ad>>ad' and I want to split it on '>' (but not on '>>'). I just picked up regex and was wondering if there is a way (a special character) to split on a specific part of the matched expression rather than on the whole match. For example, the regex could be:

re.split('[^>]>[^>]', 'ad>ad>ad>>ad')

Can you get it to split only on the character in parentheses, i.e. the > in [^>](>)[^>]?
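Running the question's pattern in Python shows the problem. The two [^>] classes consume the letters on either side of each >, so those letters vanish from the pieces, and the >> run is never split. Wrapping > in a capturing group, as in [^>](>)[^>], does not change what is consumed. re.split just adds the captured text to the result list. A minimal sketch (variable names are illustrative only):

```python
import re

s = "ad>ad>ad>>ad"

# The character classes eat the letter before and after each single '>'.
plain = re.split(r"[^>]>[^>]", s)
print(plain)  # ['a', '', 'd>>ad']

# A capturing group only makes re.split also return the captured '>';
# the surrounding letters are still consumed by the match.
grouped = re.split(r"[^>](>)[^>]", s)
print(grouped)  # ['a', '>', '', '>', 'd>>ad']
```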

2 Answers

You need to use lookarounds:

re.split(r'(?<!>)>(?!>)', 'ad>ad>ad>>ad')

The (?<!>)>(?!>) pattern only matches a > that is not preceded by a > (due to the negative lookbehind (?<!>)) and that is not followed by a > (due to the negative lookahead (?!>)).
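A short Python check of the lookaround pattern, including a longer run of > and a lone > at each end of the string. The compiled name SINGLE_GT is a hypothetical helper chosen for this sketch:

```python
import re

# Hypothetical helper: a '>' with no '>' directly before it and none directly after it.
SINGLE_GT = re.compile(r"(?<!>)>(?!>)")

basic = SINGLE_GT.split("ad>ad>ad>>ad")
print(basic)  # ['ad', 'ad', 'ad>>ad']

# Runs of two or more '>' are never split, whatever their length.
run = SINGLE_GT.split("a>>>b>c")
print(run)  # ['a>>>b', 'c']

# A lone '>' at either end still counts as single (nothing '>' next to it).
edges = SINGLE_GT.split(">ad>")
print(edges)  # ['', 'ad', '']
```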

Since lookarounds do not consume characters (unlike negated and positive character classes such as [^>]), the pattern matches and splits on a single > without "touching" the characters around it.
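The difference in consumption can be seen directly by comparing match spans from re.finditer. Each lookaround match covers one character, while each character-class match covers three:

```python
import re

s = "ad>ad>ad>>ad"

# Lookarounds are zero-width: each match spans only the '>' itself.
lookaround_spans = [m.span() for m in re.finditer(r"(?<!>)>(?!>)", s)]
print(lookaround_spans)  # [(2, 3), (5, 6)]

# Character classes consume their characters: each match is three long.
class_spans = [m.span() for m in re.finditer(r"[^>]>[^>]", s)]
print(class_spans)  # [(1, 4), (4, 7)]
```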

Try with \b>\b

This matches a single > only when it sits between two word characters (letters, digits or underscore), because \b is a word boundary. As the string in the question is a continuous run of word characters separated by >, checking for word boundaries with \b is the simplest method.
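A quick Python comparison of \b>\b with the lookaround pattern (?<!>)>(?!>), including the case with spaces raised in the comments. Both give the same result on the question's string, but only the lookaround version splits when > is surrounded by spaces. The constant names are illustrative only:

```python
import re

# A '>' with a word character (letter, digit, underscore) on both sides.
WORD_GT = r"\b>\b"
# A '>' with no '>' immediately before or after it.
LOOKAROUND_GT = r"(?<!>)>(?!>)"

same = re.split(WORD_GT, "ad>ad>ad>>ad")
print(same)  # ['ad', 'ad', 'ad>>ad']

# Space and '>' are both non-word characters, so there is no boundary
# next to '>' and \b>\b finds nothing; the lookaround pattern still splits.
word_spaced = re.split(WORD_GT, "ad > ad")
look_spaced = re.split(LOOKAROUND_GT, "ad > ad")
print(word_spaced)  # ['ad > ad']
print(look_spaced)  # ['ad ', ' ad']
```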

4 Comments

What if a single > is surrounded by spaces, or there is a space only on its right or left? Then it will not work.
@WiktorStribiżew: That's not in OP's specification. If that is the case then OP should specify all such cases.
I marked the other answer as correct, as it reflects the exact match I was after (but without consumption, which is new to me).
@dpdenton: Whatever suits your case :-)