2

I am trying to extract a file name, without its extension, from a file object. My file names look like this:

this site:time.list, this.list, this site:time_sec.list, that site:time_sec.list and so on. The part I need always comes before either a whitespace or a dot.

Currently I use these two expressions to get the part of the file name before the whitespace and before the dot:

search_term = os.path.basename(f.name).split(" ")[0]

and

search_term = os.path.basename(f.name).split(".")[0]

Expected file name output: this, this, this, that.

How can I combine the two above into a single, Pythonic one-liner?

Thanks in advance.

9
  • Why are you splitting on a dot? Commented Jan 15, 2018 at 5:21
  • Case #2, perhaps, re module is probably a better bet here. Commented Jan 15, 2018 at 5:25
  • @cᴏʟᴅsᴘᴇᴇᴅ, to split the file-name string on dot. Commented Jan 15, 2018 at 5:26
  • But if you're already splitting on space, then what more do you need? Commented Jan 15, 2018 at 5:27
  • re.split('[ .]', os.path.basename(f.name))[0] ??? Commented Jan 15, 2018 at 5:30

3 Answers

2

Use a regex as below; the character class [ .] splits on either a space or a dot:

re.split('[ .]', os.path.basename(f.name))[0]
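
A minimal sketch of this approach, run against the sample names from the question. The helper name first_token is hypothetical, and plain strings stand in for f.name:

```python
import os
import re

def first_token(path):
    """Return the part of the base name before the first space or dot."""
    return re.split(r"[ .]", os.path.basename(path))[0]

# Sample names taken from the question.
names = [
    "this site:time.list",
    "this.list",
    "this site:time_sec.list",
    "that site:time_sec.list",
]

print([first_token(n) for n in names])
# ['this', 'this', 'this', 'that']
```

With an open file f, the call would be first_token(f.name).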

1

Split on the space first, then split the result on the dot. If the second split finds a dot, you get the shorter piece, which is the one you want; if not, you keep what the first split returned. You don't need a regex for this.

search_term = os.path.basename(f.name).split(" ")[0].split(".")[0]
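
As a rough check, the chained split can be wrapped in a small function and tried on the names from the question. The function name first_token_split is hypothetical, and plain strings stand in for f.name:

```python
import os

def first_token_split(path):
    """Chained str.split: cut at the first space, then at the first dot."""
    base = os.path.basename(path)
    return base.split(" ")[0].split(".")[0]

# Sample names taken from the question.
names = [
    "this site:time.list",
    "this.list",
    "this site:time_sec.list",
    "that site:time_sec.list",
]

print([first_token_split(n) for n in names])
# ['this', 'this', 'this', 'that']
```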

0

Use regex to get the first word at the beginning of the string:

import re

re.match(r"\w+", "this site:time_sec.list").group()
# 'this'

re.match(r"\w+", "this site:time.list").group()
# 'this'

re.match(r"\w+", "that site:time_sec.list").group()
# 'that'

re.match(r"\w+", "this.list").group()
# 'this'

Applied to your file object, try this:

pattern = re.compile(r"\w+")
pattern.match(os.path.basename(f.name)).group()

Make sure the part you want never contains whitespace if you rely on the assumption that whitespace separates it from the rest. Implicit rules like that make unexpected results much more likely; it is safer to look at the strings you actually want to extract and tailor explicit expressions to their content.
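
To make the difference between an implicit and an explicit rule concrete, here is a small sketch comparing \w+ with a negated character class that stops only at a space or a dot. The helper name leading and the file name "this-one site:time.list" are made up for illustration; they are not from the question:

```python
import re

WORD = re.compile(r"\w+")        # leading run of letters, digits, underscores
NOT_SEP = re.compile(r"[^ .]+")  # leading run of anything except space or dot

def leading(pattern, name):
    """Return the match at the start of name, or None if nothing matches."""
    m = pattern.match(name)
    return m.group() if m else None

print(leading(WORD, "this site:time.list"))         # this
print(leading(NOT_SEP, "this site:time.list"))      # this

# Hypothetical name containing a dash: the two patterns disagree.
print(leading(WORD, "this-one site:time.list"))     # this
print(leading(NOT_SEP, "this-one site:time.list"))  # this-one

# match() returns None when the name starts with a non-word character,
# so calling .group() directly would raise AttributeError here.
print(leading(WORD, ".hidden"))                     # None
```

Which pattern is right depends on what the real file names can contain, so it is worth checking a few actual names before choosing.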

8 Comments

This will stop at any character which isn't \w, so semicolon, comma, dash, etc. Maybe specifically extract r'^[^ .]+' to avoid this.
@tripleee Isn't this exactly what he wants? Just the filename?
The example is probably not representative of any real-world data. If this or that could contain a dash, for example, you are extracting the wrong thing.
@tripleee I understand now what you mean, yes he would write something like pattern = re.compile(r"[\w-]+") to catch dashes. I think it's in general the better approach to be explicit about what you search for, if possible.
Huh? No, being explicit about what you search means pattern = re.compile(r'^[^ .]+') like I wrote in a previous comment