
I'm trying to strip parts of the file names that a for loop prints.

from os import listdir
from os.path import isfile, join

if search == "List":
    onlyfiles = [f for f in listdir("path") if isfile(join("path", f))]
    for i in onlyfiles:
        print(i)

Now it outputs all the file names, as expected, but I want to strip the .json extension and a few other parts of each name so that I see only the base file name.

For example, for filename-IDENTIFIER.json I want to remove "-IDENTIFIER.json" from the loop's output, leaving just filename.

Thanks for any help

  • What is the format of IDENTIFIER? Commented Jan 16, 2019 at 14:29
  • If it always has a dash, you can split the file name using the dash as a separator. Commented Jan 16, 2019 at 14:35
  • @meowgoesthedog it can be a number or letters or a combination Commented Jan 16, 2019 at 15:02

3 Answers


There are a few approaches, depending on how much your file names can vary. Let's build a get_filename(f) function for each.

Quick and dirty

If you know that f always ends in exactly the same way (or at least with a suffix of the same length), you can remove those characters directly. Here "-IDENTIFIER.json" is 16 characters long, so we drop the last 16. In Python a string is an (immutable) sequence of characters, so you can slice it just like a list.

def get_filename(f: str) -> str:
    return f[:-16]  # drop the 16-character "-IDENTIFIER.json" suffix

This will, however, break if the identifier or the extension changes length.
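A slightly safer sketch of the same slicing idea, assuming every identifier has the same length as the literal IDENTIFIER (10 characters). SUFFIX_LEN is a hypothetical name introduced here: computing it with len() avoids the magic number 16, and the guard stops a too-short name from silently turning into an empty string.

```python
# Assumption: every name ends with "-" + a 10-character identifier + ".json".
SUFFIX_LEN = len("-IDENTIFIER.json")  # 1 + 10 + 5 = 16


def get_filename(f: str) -> str:
    """Strip a fixed-length suffix by slicing (the quick-and-dirty approach)."""
    if len(f) <= SUFFIX_LEN:
        # f[:-16] on a short string would quietly return "", so fail loudly instead.
        raise ValueError(f"{f!r} is too short to contain the suffix")
    return f[:-SUFFIX_LEN]


for name in ["filename-IDENTIFIER.json", "report-ABCDE12345.json"]:
    print(get_filename(name))
```

Both example names print their base name because both identifiers happen to be 10 characters long; a 9-character identifier would leave one stray character behind.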

Varying lengths

If the suffix varies in length, split the string on a fixed delimiter and return the relevant part. In this case, split on -.

def get_filename(f: str) -> str:
    return f.split("-")[0]  # keep everything before the first "-"

Note, however, that this breaks if the base name itself contains a -. You can fix that by dropping only the last piece and rejoining the earlier ones:

def get_filename(f: str) -> str:
    # Drop the last "-"-separated piece and glue the rest back together.
    return "-".join(f.split("-")[:-1])
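The split-and-rejoin version can also be written with str.rsplit and a maxsplit of 1, which splits only at the last dash. This is a swapped-in variant, not the answer's code; the two function names below are hypothetical. The sketch compares them, including their different behaviour on a name with no dash at all.

```python
def get_filename_join(f: str) -> str:
    # The answer's approach: drop the last "-" piece, rejoin the rest.
    return "-".join(f.split("-")[:-1])


def get_filename_rsplit(f: str) -> str:
    # Split once, from the right, so dashes earlier in the name stay intact.
    return f.rsplit("-", 1)[0]


for name in ["filename-IDENTIFIER.json", "my-file-abc123.json", "nodash.json"]:
    print(repr(get_filename_join(name)), repr(get_filename_rsplit(name)))
```

For "nodash.json" the join version returns an empty string, while rsplit returns the whole name unchanged; which one you want depends on whether a missing identifier should count as an error.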

Using regexes to match the format

The most general approach is to use Python regular expressions (the re module) to select the relevant part. They let you target a very specific pattern; the exact regex you need depends on how complex your file names are.
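A minimal regex sketch, assuming the format described in the question's comments: a name, a dash, an identifier made of letters and/or digits, then ".json". The pattern and the PATTERN name are assumptions; adjust the character class if real identifiers can contain other characters.

```python
import re
from typing import Optional

# Assumed format: <name>-<letters and/or digits>.json
# The greedy (?P<name>.+) backtracks to the LAST dash, so dashes in the name survive.
PATTERN = re.compile(r"(?P<name>.+)-[A-Za-z0-9]+\.json")


def get_filename(f: str) -> Optional[str]:
    """Return the base name, or None if f does not match the assumed format."""
    match = PATTERN.fullmatch(f)
    return match.group("name") if match else None


for name in ["filename-IDENTIFIER.json", "my-file-abc123.json", "notes.txt"]:
    print(name, "->", get_filename(name))
```

Returning None for non-matching names lets the loop skip or report unexpected files instead of printing a mangled name.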


2 Comments

So with this technique the file name can only contain one "-", correct?
It did, but I've added an alternative that can deal with multiple "-".

Split the string on "-" and get the first element:

filename = f.split("-")[0]

This will break if the file name itself contains a "-", though.



This should work:

i.split('-')[0].split('.')[0]

Case 1: filename-IDENTIFIER.json

It takes the substring before the dash, so output will become filename

Case 2: filename.json

There is no dash in the string, so the first split does nothing (the whole string ends up in element 0); the second split then takes the substring before the dot. Output will be filename

Case 3: filename

Nothing to split, output will be filename
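Applied to the loop from the question, the extension-stripping step can also be done with pathlib, swapped in here as an alternative to the second split. Path.stem removes only the last extension ("a-ID.json" becomes "a-ID"; "archive.tar.gz" would become "archive.tar"), and rsplit then drops the identifier. The function name list_display_names is hypothetical, and the sketch assumes the same name-dash-identifier format.

```python
from pathlib import Path
from typing import List


def list_display_names(directory: str) -> List[str]:
    """Return each regular file's name without its extension and -IDENTIFIER part."""
    names = []
    for p in sorted(Path(directory).iterdir()):
        if p.is_file():  # skip subdirectories, like isfile() in the question
            # p.stem drops the last extension; rsplit drops the final "-" piece.
            names.append(p.stem.rsplit("-", 1)[0])
    return names


if __name__ == "__main__":
    for name in list_display_names("."):
        print(name)
```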

If it's always .json and -IDENTIFIER, then it's safer to use this:

i.split('-IDENTIFIER')[0].split('.json')[0]

Case 4: filename-blabla.json

An extra dash in the file name is not a problem here; the output will be filename-blabla
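On Python 3.9 and later, str.removesuffix is a swapped-in alternative to the '-IDENTIFIER' / '.json' split chain, again assuming the suffix really is the literal text -IDENTIFIER.json. Unlike split(), it only removes text at the very end of the string, so an earlier occurrence of "-IDENTIFIER" inside the name is left alone.

```python
def get_filename(i: str) -> str:
    # Assumes the literal suffix "-IDENTIFIER.json"; removesuffix needs Python 3.9+.
    # Each call is a no-op if the string does not end with that exact text.
    return i.removesuffix(".json").removesuffix("-IDENTIFIER")


for name in ["filename-IDENTIFIER.json", "filename-blabla-IDENTIFIER.json", "filename-blabla.json"]:
    print(get_filename(name))
```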

1 Comment

File names will always have the -IDENTIFIER, so I don't need the '.' split, but it's good to know.
