
My string looks like this:

bo_1
bo_1
bo_2
bo_2
bo_3
bo_3
bo_4
bo_4
bo_5
bo_5
bo_6
bo_6
bo_7
bo_7
bo_8
bo_8
bo_9
bo_9
bo_10
bo_10

I want to match the first instance of each value and ignore the duplicate line that follows it. My regex is as follows:

(bo_\d)(?![\s\S]*\1)

which returns the following:

'bo_2'
'bo_3'
'bo_4'
'bo_5'
'bo_6'
'bo_7'
'bo_8'
'bo_9'
'bo_1'

How would I modify the regex to return a result like this instead (to include 'bo_1' at the start and 'bo_10' at the end):

'bo_1'
'bo_2'
'bo_3'
'bo_4'
'bo_5'
'bo_6'
'bo_7'
'bo_8'
'bo_9'
'bo_10'

2 Answers


Technically you don't need regex for that. You can use set(), for instance, although a set does not preserve the original order of the lines:

>>> # Assume your string is in the variable called "text"
>>> result = set(text.split('\n'))
>>> result
{'bo_7', 'bo_3', 'bo_1', 'bo_6', 'bo_5', 'bo_8', 'bo_9', 'bo_2', 'bo_4', 'bo_10'}
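If the order of the lines matters, one possible variation (a sketch, not part of the original answer) is to deduplicate with dict.fromkeys(), since dictionaries keep insertion order in Python 3.7+. The sample text below is rebuilt from the question, and the name unique_in_order is only illustrative:

```python
# Rebuild the sample input from the question: bo_1 .. bo_10, each line twice.
text = "\n".join(f"bo_{n}" for n in range(1, 11) for _ in range(2))

# dict.fromkeys() keeps the first occurrence of each key, in insertion order.
unique_in_order = list(dict.fromkeys(text.splitlines()))

print(unique_in_order)
```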

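Anyway, the issue with your regex is that bo_1 also matches the start of bo_10, so the lookahead treats bo_1 as having a later duplicate. In addition, \d matches only a single digit, so bo_10 can never be captured in full. You can solve both problems by using \d+ and adding word boundaries so that the whole 'word' is tested for a match: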

\b(bo_\d+)\b(?![\s\S]*\b\1\b)
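As a quick check, here is a minimal sketch of how this pattern could be used from Python with re.findall(). The sample text is rebuilt from the question, and the variable names are only illustrative:

```python
import re

# Rebuild the sample input from the question: bo_1 .. bo_10, each line twice.
text = "\n".join(f"bo_{n}" for n in range(1, 11) for _ in range(2))

# \b...\b forces a whole-word match, so bo_1 is no longer found inside bo_10.
# The negative lookahead rejects a match if the same word appears again later.
pattern = re.compile(r"\b(bo_\d+)\b(?![\s\S]*\b\1\b)")

# With a single capturing group, findall() returns the group contents.
matches = pattern.findall(text)
print(matches)
```

Note that the lookahead keeps the last occurrence of each value rather than the first. For this input the resulting list is the same, because the duplicates sit next to each other.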



Use

(bo_\d+$)(?![\s\S]*^\1$)

Since you want to include bo_10, you should use \d+ and not just \d in the initial group. Then, in your negative lookahead, put the backreference between start-of-line and end-of-line anchors, so that, for example, bo_1 does not get excluded because it's followed by a bo_10. The ^ and $ anchors only match at line boundaries when multiline mode is enabled, so the pattern needs that flag.

https://regex101.com/r/8khbcc/1
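A minimal sketch of the same pattern in Python, assuming the input from the question. The re.MULTILINE flag makes ^ and $ match at the start and end of each line, and the variable names are only illustrative:

```python
import re

# Rebuild the sample input from the question: bo_1 .. bo_10, each line twice.
text = "\n".join(f"bo_{n}" for n in range(1, 11) for _ in range(2))

# re.MULTILINE lets ^ and $ anchor on every line, not just the whole string.
# The lookahead rejects a line if an identical full line appears later.
pattern = re.compile(r"(bo_\d+$)(?![\s\S]*^\1$)", re.MULTILINE)

anchored_matches = pattern.findall(text)
print(anchored_matches)
```

Without the flag, $ matches only at the end of the whole string, so the pattern would find just the final line.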

1 Comment

If the regex isn't important, you can add the lines to a set, which removes duplicates by nature. I typically wouldn't rely on regex for deduplication when there are a lot of other tools at your disposal, but that's outside the scope of the question.
