
I have the existing regex:

self.metadata = re.sub(r'(?<=<file_name>)\d{4,}', '', self.metadata)

This will do the following conversion:

<file_name>1232434_FILE.mov --> <file_name>FILE.mov

However, I do not want it to strip the leading digits if they are directly followed by a . (period).

So, the result should be:

<file_name>1232434_FILE.mov --> <file_name>FILE.mov
<file_name>123445.mov --> <file_name>123445.mov
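Running the current pattern against both samples makes the problem concrete. This is a minimal check; the helper name strip_original is hypothetical and used only for illustration. Note that the underscore is not part of \d{4,}, so it survives, and the second sample loses its digits entirely.

```python
import re

# The pattern from the question, written as a raw string so "\d" is not
# treated as an (invalid) string escape.
ORIGINAL = re.compile(r'(?<=<file_name>)\d{4,}')

def strip_original(metadata):
    """Hypothetical helper: apply the question's original pattern."""
    return ORIGINAL.sub('', metadata)

for sample in ('<file_name>1232434_FILE.mov', '<file_name>123445.mov'):
    print(sample, '->', strip_original(sample))
```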

What would be the new correct regular expression to use?


2 Answers


Add a negative lookahead so the digits are not stripped when they are followed by a period or another digit:

self.metadata = re.sub(r'(?<=<file_name>)\d{4,}(?![.\d])', '', self.metadata)
<file_name>1232434_FILE.mov => <file_name>_FILE.mov
<file_name>1232434FILE.mov  => <file_name>FILE.mov
<file_name>123445.mov       => <file_name>123445.mov
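A self-contained version of the negative-lookahead pattern, outside the class, is sketched below. The function name strip_numeric_prefix is hypothetical; in the question's code the same call would operate on self.metadata.

```python
import re

# Strip a run of 4+ digits right after <file_name>, unless the run is
# followed by a period or another digit. Including \d in the lookahead
# stops the engine from backtracking to a shorter run (e.g. "12344"
# followed by "5") and matching anyway.
PATTERN = re.compile(r'(?<=<file_name>)\d{4,}(?![.\d])')

def strip_numeric_prefix(metadata):
    """Hypothetical helper wrapping the negative-lookahead pattern."""
    return PATTERN.sub('', metadata)

for sample in ('<file_name>1232434_FILE.mov',
               '<file_name>1232434FILE.mov',
               '<file_name>123445.mov'):
    print(f'{sample:30} => {strip_numeric_prefix(sample)}')
```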


1 Comment

Lol, I guess a negative lookahead makes more sense than a lookahead with a negated character class. +1

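Add a lookahead to assert that the digit run is followed by a character that is neither a digit nor a period. Checking only for a non-period character is not enough: because \d{4,} matches four or more digits, the engine can backtrack to a shorter run whose next character is simply another digit, and the lookahead would still succeed.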
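The backtracking problem can be seen by comparing a lookahead that excludes only the period with one that excludes both periods and digits. This is a minimal demonstration; the names NAIVE and FIXED are illustrative only.

```python
import re

NAIVE = r'(?<=<file_name>)\d{4,}(?=[^.])'      # only excludes "."
FIXED = r'(?<=<file_name>)\d{4,}(?=[^\d.])'    # excludes "." and digits

sample = '<file_name>123445.mov'
# Greedy \d{4,} first grabs "123445", which is followed by "." and fails.
# The engine then backtracks to "12344", which is followed by "5" (not a
# period), so the naive lookahead succeeds and five digits are removed.
print(re.sub(NAIVE, '', sample))   # <file_name>5.mov
print(re.sub(FIXED, '', sample))   # <file_name>123445.mov
```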

self.metadata = re.sub(r'(?<=<file_name>)\d{4,}(?=[^\d.])', '', self.metadata)

RegEx

(?<=<file_name>)\d{4,}(?=[^\d.])
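One behavioral difference from the negative-lookahead form (?![.\d]) is worth knowing: a positive lookahead with a negated class needs an actual character after the digits, so a digit run at the very end of the string is never stripped. The sample value below is hypothetical; whether this edge case matters depends on what self.metadata can contain.

```python
import re

POSITIVE = r'(?<=<file_name>)\d{4,}(?=[^\d.])'  # requires a following character
NEGATIVE = r'(?<=<file_name>)\d{4,}(?![.\d])'   # also succeeds at end of string

sample = '<file_name>1232434'   # hypothetical value with nothing after the digits
print(re.sub(POSITIVE, '', sample))  # <file_name>1232434 (unchanged)
print(re.sub(NEGATIVE, '', sample))  # <file_name>
```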

Note: this turns <file_name>1232434_FILE.mov into <file_name>_FILE.mov, but your current solution does that as well.
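If the goal is the output shown in the question (<file_name>FILE.mov), one option is to also consume an optional underscore after the digits. This is a sketch that assumes the underscore is the only separator used; the variant pattern and the helper name are hypothetical, not part of either answer above.

```python
import re

# Hypothetical variant: also consume one optional "_" separator after the
# digits (assumes "_" is the only separator that appears in file names).
PATTERN = re.compile(r'(?<=<file_name>)\d{4,}(?![.\d])_?')

def strip_numeric_prefix(metadata):
    """Hypothetical helper: drop the numeric prefix and its separator."""
    return PATTERN.sub('', metadata)

print(strip_numeric_prefix('<file_name>1232434_FILE.mov'))  # <file_name>FILE.mov
print(strip_numeric_prefix('<file_name>123445.mov'))        # <file_name>123445.mov
```

The lookahead is evaluated right after the digits, before _? runs, so the period/digit check is unchanged and only the separator handling differs.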
