0

I have a directory with a large number of files. The file names are similar to the following: the(number)one(number), where (number) can be any number. There are also files with the name: the(number), where (number) can be any number. I was wondering how I can count the number of files with the additional "one(number)" at the end of their file name.

Let's say I have the list of file names. I was thinking of doing:

for n in list:
    if n.startswith(the(number)one):
        add one to a counter

Is there any way for it to accept any number in the (number) space when doing a startswith?

Example: the34one5 the37one2 the444one3 the87one8 the34 the32

This should return 4.

3 Answers

8

Use a regex matching 'one\d+' using the re module.

import re

# names is the list of file names
count = 0
for n in names:
    if re.search(r"one\d+", n):  # "one" followed by at least one digit
        count += 1
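As a quick check, here is a minimal sketch that runs this pattern over the sample names from the question. The variable names `names`, `pattern` and `count` are only illustrative.

```python
import re

# Sample file names copied from the question.
names = ["the34one5", "the37one2", "the444one3", "the87one8", "the34", "the32"]

# "one" followed by at least one digit, anywhere in the name.
pattern = re.compile(r"one\d+")

# Count the names in which the pattern is found.
count = sum(1 for name in names if pattern.search(name))
print(count)  # 4
```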

If you want to make it very accurate, you can even do:

# names is the list of file names
count = 0
for n in names:
    if re.search(r"^the\d+one\d+$", n):  # whole name must be the<digits>one<digits>
        count += 1

This also rejects names that have non-digit characters between "the" and "one", and it allows nothing before "the" or after the last digit.
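To see the difference between the loose and the strict pattern, here is a small sketch on some made-up names (the list `tricky` is hypothetical, not from the question). It uses `re.fullmatch`, which requires the whole string to match and so plays the same role as anchoring the pattern with `^` and `$`.

```python
import re

LOOSE = re.compile(r"one\d+")          # "one<digits>" anywhere in the name
STRICT = re.compile(r"the\d+one\d+")   # used with fullmatch: whole name must match

# Hypothetical names chosen to show where the two patterns disagree.
tricky = ["the34one5", "theXXone5", "the34one5.bak", "xthe1one2", "the34"]

loose_hits = [n for n in tricky if LOOSE.search(n)]
strict_hits = [n for n in tricky if STRICT.fullmatch(n)]

print(len(loose_hits))   # 4
print(len(strict_hits))  # 1
```

Only "the34one5" passes the strict check; the loose pattern also accepts the three names with extra characters around or inside them.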

You should start learning regexp now:

  • they let you do complex text analysis quickly that would be hard to code by hand
  • they work almost the same from one language to another, which makes you more flexible
  • if you run into code that uses them and you haven't learned them, you will be puzzled, because their syntax is not something you can guess
  • the sooner you know them, the sooner you'll learn when NOT (hint) to use them, which is ultimately as important as knowing them

2 Comments

I think the regex should be "one\d+$", because the OP specified that they want to match "one(number)" at the end of the file name. Or maybe use a complete regex, "the\d+one\d+", with match() instead of search().
You are right, I added a second example with a more accurate matching.
0

The easiest way to do this is probably glob.glob():

import glob
number = len(glob.glob("/path/to/files/the*one*"))

Note that * here will match any string, not just numbers.
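To illustrate that caveat without touching the file system, here is a sketch using `fnmatch.filter`, which applies the same shell-style wildcard rules to a list of strings. The sample names are made up. If the digits matter, one option is to filter the wildcard matches again with a regex.

```python
import fnmatch
import re

# Hypothetical names; only the first one has digits in both places.
names = ["the34one5", "theXXoneYY", "the34", "the1one"]

# '*' accepts any run of characters, including letters or nothing at all.
wildcard_hits = fnmatch.filter(names, "the*one*")
print(wildcard_hits)  # ['the34one5', 'theXXoneYY', 'the1one']

# Narrow the wildcard matches down to the<digits>one<digits>.
digits_only = [n for n in wildcard_hits if re.fullmatch(r"the\d+one\d+", n)]
print(digits_only)  # ['the34one5']
```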

4 Comments

This is clever, but it will fail if any char that is not a number is between "the" and "one"
glob.glob() will match files in the current working directory, though. You probably mean fnmatch.fnmatch().
@Thomas: No, I don't mean fnmatch.fnmatch(), because glob.glob() is much easier to use here. Thanks for pointing out the issue with the directory!
@e-satis: I'm puzzled how your answer will be any different in case there is a non-digit character between the and one.
0

The same as a one-liner and also answering the question as it should match 'the' as well:

import re
# names is the list of file names; the raw string keeps \d intact
count = len([name for name in names if re.match(r'the\d+one\d+', name)])
