How to split a string on multiple patterns in a Pythonic way (one-liner)?

Question

I am trying to extract the file name, without its extension, from a file object. My file names look like this:

this site:time.list, this.list, this site:time_sec.list, that site:time_sec.list, and so on. The part I need always comes before either a whitespace or a dot.

Currently I use these two expressions to get the part of the file name before the whitespace and before the dot:

search_term = os.path.basename(f.name).split(" ")[0]

and

search_term = os.path.basename(f.name).split(".")[0]

Expected file name output: this, this, this, that.

How can I combine the two expressions above into a single Pythonic one-liner?

Thanks in advance.

Case #2, perhaps; the re module is probably a better bet here.
– AChampion, Commented Jan 15, 2018 at 5:25
@cᴏʟᴅsᴘᴇᴇᴅ, to split the file-name string on dot.
– Om Prakash, Commented Jan 15, 2018 at 5:26
But if you're already splitting on space, then what more do you need?
– cs95, Commented Jan 15, 2018 at 5:27

Skycc · Accepted Answer · 2018-01-15 06:33:15Z

2

Use a regex as below; the character class [ .] splits on either a space or a dot:

re.split('[ .]', os.path.basename(f.name))[0]
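As a quick check, here is a minimal sketch that runs the same split over the sample names from the question. The helper name first_token is hypothetical, and the literal strings stand in for os.path.basename(f.name):

```python
import re

# Sample base names copied from the question; in real code each one
# would come from os.path.basename(f.name).
names = [
    "this site:time.list",
    "this.list",
    "this site:time_sec.list",
    "that site:time_sec.list",
]

def first_token(name):
    # Split on either a space or a dot and keep the first piece.
    # maxsplit=1 stops after the first separator, since only the
    # leading piece is needed.
    return re.split(r"[ .]", name, maxsplit=1)[0]

results = [first_token(n) for n in names]
print(results)  # ['this', 'this', 'this', 'that']
```

The maxsplit=1 argument is optional; it only avoids splitting the rest of the string.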


tripleee · Accepted Answer · 2018-01-15 05:48:31Z

1

Split on the space first, then split the result on the dot. If the second split finds a dot, it returns the shorter piece, which is the one you want; if not, it returns what the first split gave you unchanged. You don't need regex for this.

search_term = os.path.basename(f.name).split(" ")[0].split(".")[0]
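To see why chaining the two splits is enough, here is a small sketch over the question's sample names plus one made-up name ("a.b c") where the dot comes before the space. The helper name first_token is hypothetical:

```python
def first_token(name):
    # Take everything before the first space, then everything before
    # the first dot of that piece. A split that finds no separator
    # returns the whole string as its first element.
    return name.split(" ")[0].split(".")[0]

samples = [
    "this site:time.list",
    "this.list",
    "that site:time_sec.list",
    "a.b c",  # made-up case: dot appears before the space
]

tokens = [first_token(s) for s in samples]
print(tokens)  # ['this', 'this', 'that', 'a']
```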


Darkonaut · Accepted Answer · 2018-01-15 15:41:35Z

0

Use regex to get the first word at the beginning of the string:

import re

re.match(r"\w+", "this site:time_sec.list").group()
# 'this'

re.match(r"\w+", "this site:time.list").group()
# 'this'

re.match(r"\w+", "that site:time_sec.list").group()
# 'that'

re.match(r"\w+", "this.list").group()
# 'this'

Applied to your file name, try this:

pattern = re.compile(r"\w+")
pattern.match(os.path.basename(f.name)).group()

Make sure the part you want never contains whitespace if you rely on whitespace to separate it from the rest. Relying on implicit rules like that makes unexpected results much more likely than looking at the actual strings you want to extract and tailoring explicit expressions to fit their content.
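One detail to keep in mind with the match-based approach: re.match returns None when the string does not start with a word character, so calling .group() directly would raise AttributeError. A minimal sketch of a guarded version follows; leading_word is a hypothetical helper name and ".hidden" is a made-up input:

```python
import re

WORD = re.compile(r"\w+")

def leading_word(name):
    # re.match only matches at the start of the string and returns
    # None when the first character is not a word character.
    m = WORD.match(name)
    return m.group() if m else None

print(leading_word("this site:time_sec.list"))  # this
print(leading_word(".hidden"))                  # None
```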

edited Jan 15, 2018 at 15:41

answered Jan 15, 2018 at 5:34


8 Comments

tripleee Over a year ago

This will stop at any character which isn't \w, so semicolon, comma, dash, etc. Maybe specifically extract r'^[^ .]+' to avoid this.

Darkonaut Over a year ago

@tripleee Isn't this exactly what he wants? Just the filename?

tripleee Over a year ago

The example is probably not representative of any real-world data. If this or that could contain a dash, for example, you are extracting the wrong thing.

Darkonaut Over a year ago

@tripleee I understand now what you mean. Yes, he would write something like pattern = re.compile(r"[\w\-]+") to catch dashes. I think it's generally the better approach to be explicit about what you search for, if possible.

tripleee Over a year ago

Huh? No, being explicit about what you search for means pattern = re.compile(r'^[^ .]+'), like I wrote in a previous comment.
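The difference discussed in these comments shows up as soon as the wanted part contains a dash. A small sketch, using a made-up name ("this-one site:time.list") that is not in the question:

```python
import re

# Hypothetical file name whose leading part contains a dash.
name = "this-one site:time.list"

# \w+ stops at the first character that is not a letter, digit, or underscore.
word_only = re.match(r"\w+", name).group()

# ^[^ .]+ takes everything up to the first space or dot, whatever it is.
up_to_separator = re.match(r"^[^ .]+", name).group()

print(word_only)        # this
print(up_to_separator)  # this-one
```

On the question's own sample names both patterns return the same result, so the choice only matters if real file names contain other punctuation.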
