Python re.search re.sub string

Question

I have a file name that always ends with a number followed by its file extension, such as:

filename = 'photo_v_01_20415.jpg'

From this file name I need to extract the file extension and the last number, which sits right before the file extension itself. The split should give me two strings:

original_string = 'photo_v_01_20415.jpg'

string_result_01 = `photo_v_01_`  (first half of the file name)

string_result_02 = `20415.jpg`    (second half of the file name).

The problem is that the incoming file names will be inconsistent. The last number could be separated from the rest of the file name by an underscore "_", a space " ", a period "." or anything else. Examples of possible file names:

photo_v_01_20415.jpg
photo_v_01.20415.jpg
photo_v_01 20415.jpg
photo_v_01____20415.jpg

It appears I need to use regular expressions with re.search or re.sub. I would appreciate any suggestions!

Antti Haapala · Accepted Answer · 2013-09-25 21:16:25Z

3

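Use re.match instead of re.search so that the pattern is anchored at the start of the string, and end the pattern with $ so that it must also reach the end of the string. Without the $, a name such as photo_v_01.20415.jpg would be split into 'photo_v_' and '01.20415'. Thus

import re

def split_name(filename):
    match = re.match(r'(.*?)(\d+\.[^.]+)$', filename)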
    if match:
        return match.groups()
    else:
        return None, None

for name in [ 'foo123.jpg', 'bar;)234.png', 'baz^_^456.JPEG', 'notanumber.bmp' ]:
    prefix, suffix = split_name(name)
    print("prefix = %r, suffix = %r" % (prefix, suffix))

Prints:

prefix = 'foo', suffix = '123.jpg'
prefix = 'bar;)', suffix = '234.png'
prefix = 'baz^_^', suffix = '456.JPEG'
prefix = None, suffix = None

Works for arbitrary suffixes; if the filename does not match the pattern, then the match fails, and None, None is returned.
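As a variation on the function above, re.fullmatch (available since Python 3.4) succeeds only when the whole string matches, so no explicit end anchor is needed. A minimal sketch, where split_name_full is a hypothetical name chosen for illustration, run on the file names from the question:

```python
import re

def split_name_full(filename):
    # fullmatch succeeds only if the pattern consumes the entire string
    match = re.fullmatch(r'(.*?)(\d+\.[^.]+)', filename)
    return match.groups() if match else (None, None)

names = [
    'photo_v_01_20415.jpg',
    'photo_v_01.20415.jpg',
    'photo_v_01 20415.jpg',
    'photo_v_01____20415.jpg',
]
for name in names:
    prefix, suffix = split_name_full(name)
    print("prefix = %r, suffix = %r" % (prefix, suffix))
```

Because the first group is non-greedy, it stops as early as it can while the rest of the name still consists of digits, a period and an extension, so the separator stays in the prefix and the suffix is '20415.jpg' for every name.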

edited Sep 25, 2013 at 21:16

answered Sep 25, 2013 at 20:59

Antti Haapala

unutbu · Accepted Answer · 2013-09-25 21:16:21Z

3

import re

names = '''\
photo_v_01_20415.jpg
photo_v_01.20415.jpg
photo_v_01 20415.jpg
photo_v_01____20415.jpg'''.splitlines()

for name in names:
    prefix, suffix = re.match(r'(.+?[_. ])(\d+\.[^.]+)$', name).groups()
    print('{} --> {}\t{}'.format(name, prefix, suffix))

yields

photo_v_01_20415.jpg --> photo_v_01_    20415.jpg
photo_v_01.20415.jpg --> photo_v_01.    20415.jpg
photo_v_01 20415.jpg --> photo_v_01     20415.jpg
photo_v_01____20415.jpg --> photo_v_01____  20415.jpg

The regex pattern r'(.+?[_. ])(\d+\.[^.]+)$' means

r'             define a raw string
(              with first group
     .+?           non-greedily match 1-or-more of any character
     [_. ]         followed by a literal underscore, period or space
)              end first group 
(              followed by second group
     \d+           1-or-more digits in [0-9]
     \.            literal period
     [^.]+         1-or-more of anything but a period
)              end second group 
$              match the end of the string
'              end raw string
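The breakdown above can also live inside the pattern itself by compiling it with re.VERBOSE, which ignores whitespace outside character classes and allows # comments. A sketch of that, where NAME_PATTERN is a hypothetical name:

```python
import re

NAME_PATTERN = re.compile(r'''
    (               # first group: the prefix
        .+?         # non-greedily match one or more of any character
        [_. ]       # a literal underscore, period or space
    )
    (               # second group: the number plus extension
        \d+         # one or more digits
        \.          # a literal period
        [^.]+       # one or more characters that are not a period
    )
    $               # the end of the string
''', re.VERBOSE)

prefix, suffix = NAME_PATTERN.match('photo_v_01____20415.jpg').groups()
print('{!r} {!r}'.format(prefix, suffix))
```

The space inside [_. ] is kept because whitespace within a character class is not stripped in verbose mode.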

edited Sep 25, 2013 at 21:16

answered Sep 25, 2013 at 20:56

unutbu

1 Comment

unutbu Over a year ago

I've corrected my answer using parts of Antti Haapala's solution; apologies to Antti Haapala, I just couldn't stand my answer being wrong. I'll leave my answer up mainly because it explains what the regex means.

Michael Jones · Accepted Answer · 2013-09-25 21:03:49Z

0

import re

# raw string, with the period before jpg escaped so it matches only a literal '.'
matcher = re.compile(r'(.*[._ ])(\d+\.jpg)')
result = matcher.match(filename)

Add other options to the [._ ] as necessary.
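One caveat when adding separators: some characters, such as "-" or "]", have a special meaning inside [...]. A hedged sketch that builds the character class with re.escape so any separator can be added safely; build_splitter and its default separator list are assumptions made for illustration, and the period before jpg is escaped here:

```python
import re

def build_splitter(separators='._ -'):
    # re.escape protects characters that are special inside [...], such as '-'
    sep_class = '[' + re.escape(separators) + ']'
    return re.compile(r'(.*' + sep_class + r')(\d+\.jpg)')

matcher = build_splitter()
result = matcher.match('photo_v_01-20415.jpg')
print(result.groups())
```

Passing a different string to build_splitter changes the accepted separators without touching the rest of the pattern.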

answered Sep 25, 2013 at 21:03

Michael Jones

1 Comment

alphanumeric Over a year ago

This solution works very well: prefix, suffix = re.search(r'(.+?[_. ])(\d+.jpg)$', seq_name).groups() But the file extension will not always be 'jpg'. How could I tweak this expression so that it works for file formats other than jpg?
