How to extract specific string from filename?

Question

How can I extract a specific string from a file name using just "one line" of code? I can do it in two lines (counting only the lines that assign extracted and extracted2), but I can't figure out whether it is possible in one line.

I would like to extract "this" from the filename text__text_numberandtext_text_text_this.xlsx

Here is the "2 line" code I have at the moment:

s = "text__text_numberandtext_text_text_this.xlsx"
extracted = '_'.join(s.split('_')[6:7])
extracted2 = '.'.join(extracted.split('.')[:1])
print(extracted2)

Maroun · Answer · 2020-09-22 07:26:40Z

5

You can use regex and do something like:

>>> import re
>>> s
'text__text_numberandtext_text_text_this.xlsx'
>>> re.search(r'.*_(\w+)\.xlsx', s).group(1)
'this'

In the regex above, the greedy ".*" consumes as much as it can, so the group captures the run of word characters between the last "_" and the ".xlsx" extension.
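If the filename might not match at all, re.search returns None and calling .group(1) on it raises AttributeError. A minimal sketch of a guarded version follows; the names LAST_PART and last_part are made up for illustration, and the pattern is the one from this answer:

```python
import re

# Hypothetical names; the pattern is the one from the answer, written as a raw string.
LAST_PART = re.compile(r'.*_(\w+)\.xlsx')

def last_part(filename):
    """Return the text between the last '_' and '.xlsx', or None if nothing matches."""
    match = LAST_PART.search(filename)
    return match.group(1) if match else None

print(last_part("text__text_numberandtext_text_text_this.xlsx"))  # this
print(last_part("no-underscore.xlsx"))                            # None
```

Returning None is just one possible choice here; raising an exception with a clear message may suit stricter code better.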

Don't look for "one line" of code. Think about the cleanest solution instead.


2 Comments

Dustin Over a year ago

"Don't look for "one line" of code. Think about the cleanest solution instead." +1. Pythonic does not imply one line of code does all the work. Remember the pythonic mantra: "Simple is better than complex, flat is better than nested". @OP Your code is readable and everyone understands it. EDIT: It is called the Zen of Python, not mantra

Wups Over a year ago

If you count the added code from importing the re module, this is waaay more than one line.

buran · Accepted Answer · 2020-09-22 07:50:48Z

2

spam = "text__text_numberandtext_text_text_this.xlsx"
eggs = spam.split('_')[-1].split('.')[0]
print(eggs)

output

this

EDIT: it's interesting to benchmark the 3 alternatives.

from timeit import timeit

print(timeit("s.split('_')[-1].split('.')[0]", setup="s='text__text_numberandtext_text_text_this.xlsx'"))
print(timeit(r"re.search(r'.*_(\w+)\.xlsx', s).group(1)", setup="import re; s='text__text_numberandtext_text_text_this.xlsx'"))
print(timeit("s[s.rfind('_')+1:s.rfind('.')]", setup="s='text__text_numberandtext_text_text_this.xlsx'"))

output (seconds per 1,000,000 calls, which is timeit's default; figures will vary by machine):

0.8729359760000079
2.0453107610010193
0.6893644140000106
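Single timeit runs can be noisy, so a slightly more careful comparison might check that the variants agree first and then take the best of several repeats. This is only a sketch: the dictionary layout and the repeat counts are arbitrary choices, and wrapping each expression in a lambda adds the same call overhead to every figure.

```python
import re
from timeit import repeat

s = "text__text_numberandtext_text_text_this.xlsx"

# The three expressions from the answers, wrapped so they can be checked and timed alike.
candidates = {
    "split": lambda: s.split('_')[-1].split('.')[0],
    "regex": lambda: re.search(r'.*_(\w+)\.xlsx', s).group(1),
    "rfind": lambda: s[s.rfind('_') + 1:s.rfind('.')],
}

# Make sure every variant returns the same value before comparing speed.
results = {name: func() for name, func in candidates.items()}
assert set(results.values()) == {"this"}

for name, func in candidates.items():
    # Best of 5 runs of 50,000 calls each.
    best = min(repeat(func, repeat=5, number=50_000))
    print(f"{name}: {best:.4f} s")
```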

edited Sep 22, 2020 at 7:50

answered Sep 22, 2020 at 7:25


1 Comment

C.J1990 Over a year ago

Although all three solutions give desired results, I am accepting this as an answer as personally I find it easiest to understand. Thanks to all!

Jacob · Answer · 2020-09-22 07:26:18Z

2

s[s.rfind('_')+1:s.rfind('.')]

output:

'this'

It's not what your code does, but if I'm understanding correctly, it's what your description is asking for. This only works if you know the text you're looking for is immediately between the last underscore and the last period.
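To make that caveat concrete: rfind returns -1 when the character is missing, and the slice then quietly gives a wrong result instead of failing. A small sketch with a guard is shown below; the function name is hypothetical and returning None is just one way to signal the problem.

```python
def between_last_underscore_and_dot(name):
    """Hypothetical helper: text between the last '_' and the last '.', or None."""
    start = name.rfind('_')
    end = name.rfind('.')
    # Bail out if either character is missing or the '.' comes before the '_'.
    if start == -1 or end == -1 or end < start:
        return None
    return name[start + 1:end]

print(between_last_underscore_and_dot("text__text_numberandtext_text_text_this.xlsx"))  # this
print(between_last_underscore_and_dot("report.v2_final"))                              # None

# Without the guard, a missing '.' makes rfind return -1,
# so the slice silently drops the last character.
s = "text_this"
print(s[s.rfind('_') + 1:s.rfind('.')])  # thi
```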


Tomerikoo · Answer · 2020-09-29 12:41:20Z

Just to add another perspective: since you are ultimately dealing with a path (or a filename, for that matter), it can be a good idea to use pathlib. The stem attribute gives you the name without the extension, and then you only need to take the part after the last _ using rsplit:

from pathlib import Path

s = Path("text__text_numberandtext_text_text_this.xlsx")
print(s.stem.rsplit('_', 1)[-1])

A short explanation on that rsplit:

It works like split, except that it starts splitting from the right end. The difference only shows when you give it a maxsplit argument.
Because we are only interested in the last part, we use maxsplit=1.
rsplit returns a list, in this case with just 2 elements. We then take the last element with [-1].
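The points above can be checked directly on the stem of the example filename (the variable name stem is only for illustration):

```python
stem = "text__text_numberandtext_text_text_this"

# Without maxsplit, split and rsplit return the same list.
print(stem.split('_') == stem.rsplit('_'))  # True

# With maxsplit=1 the single cut is made at the first '_' (split)
# or at the last '_' (rsplit).
print(stem.split('_', 1))   # ['text', '_text_numberandtext_text_text_this']
print(stem.rsplit('_', 1))  # ['text__text_numberandtext_text_text', 'this']
```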

A somewhat more efficient version is using rpartition instead of rsplit:

s.stem.rpartition('_')[-1]
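For reference, here is a short sketch of what stem and rpartition return along the way. The directory name "some_dir" is made up, to show that underscores in the directory part do not interfere:

```python
from pathlib import Path

# "some_dir" is a made-up directory used only for this example.
p = Path("some_dir") / "text__text_numberandtext_text_text_this.xlsx"

# stem ignores the directory part and drops the final suffix.
print(p.stem)  # text__text_numberandtext_text_text_this

# rpartition always returns a 3-tuple: (head, separator, tail).
print(p.stem.rpartition('_'))  # ('text__text_numberandtext_text_text', '_', 'this')

# With no '_' present, head and separator are empty and the tail is the whole string.
print("this".rpartition('_'))  # ('', '', 'this')
```

Because the tuple always has three items, indexing with [-1] is safe even when the separator is missing; in that case you simply get the whole stem back.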
