How to ignore numbers when doing string.startswith() in python?

Question

I have a directory with a large number of files. The file names are similar to the following: the(number)one(number), where (number) can be any number. There are also files with the name: the(number), where (number) can be any number. I was wondering how I can count the number of files with the additional "one(number)" at the end of their file name.

Let's say I have the list of file names, I was thinking of doing

for n in list:
    if n.startswith(the(number)one):
        add one to a counter

Is there any way for it to accept any number in the (number) space when doing a startswith?

Example: the34one5 the37one2 the444one3 the87one8 the34 the32

This should return 4.

Community · Accepted Answer · 2017-05-23 11:55:37Z

8

Use the re module with a regex that matches 'one\d+'.

import re

count = 0
for n in names:  # names: the list of file names (avoids shadowing the built-in list)
    if re.search(r"one\d+", n):
        count += 1

If you want to make it very accurate, you can even do:

count = 0
for n in names:  # names: the list of file names
    # ^ and $ anchor the pattern to the whole name
    if re.search(r"^the\d+one\d+$", n):
        count += 1

This also rejects any non-digit characters between "the" and "one", and allows nothing before "the" or after the last digit.
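As a rough check, the two patterns can be compared on the example names from the question. The helper name count_matching and the extra name "theXone5" are made up for illustration; this is a sketch, not part of the original answer.

```python
import re

def count_matching(names, pattern):
    """Count the names in which re.search finds the pattern."""
    return sum(1 for name in names if re.search(pattern, name))

# Example names taken from the question.
names = ["the34one5", "the37one2", "the444one3", "the87one8", "the34", "the32"]
loose = count_matching(names, r"one\d+")
strict = count_matching(names, r"^the\d+one\d+$")

# Hypothetical name with a non-digit between "the" and "one":
# only the loose pattern accepts it.
odd = ["theXone5"]
loose_odd = count_matching(odd, r"one\d+")
strict_odd = count_matching(odd, r"^the\d+one\d+$")
```

Both patterns agree on the question's names; they differ only on malformed names such as the hypothetical one.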

You should start learning regexp now:

they let you do complex text analysis in a blink that would be hard to code manually
they work almost the same from one language to another, making you more flexible
if you encounter code that uses them and you don't know them, you will be puzzled, because it's not something you can guess
the sooner you know them, the sooner you'll learn when NOT (hint) to use them, which is eventually as important as knowing them

edited May 23, 2017 at 11:55

CommunityBot

answered Jun 10, 2011 at 15:50

Bite code

2 Comments

mouad Over a year ago

I think the regex should be "one\d+$", because the OP specified that he wants to match "one(number)" at the end of the file name, or maybe a complete regex "the\d+one\d+" with match() instead of search().

Bite code Over a year ago

You are right, I added a second example with a more accurate matching.

Sven Marnach · Accepted Answer · 2011-06-10 15:49:17Z

0

The easiest way to do this probably is glob.glob():

import glob
number = len(glob.glob("/path/to/files/the*one*"))

Note that * here will match any string, not just numbers.
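If only digits should count, one possible approach is to let glob collect the candidates and then filter the base names with a regex. The function name count_the_one_files and the file names in the demonstration are assumptions made up for this sketch.

```python
import glob
import os
import re
import tempfile

def count_the_one_files(directory):
    """Count files named the<digits>one<digits> inside directory."""
    # The glob narrows the candidates; the regex enforces digits only.
    # glob.escape protects against special characters in the directory path.
    candidates = glob.glob(os.path.join(glob.escape(directory), "the*one*"))
    pattern = re.compile(r"the\d+one\d+")
    return sum(1 for path in candidates
               if pattern.fullmatch(os.path.basename(path)))

# Demonstration in a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["the34one5", "the37one2", "the34", "theXoneY"]:
        open(os.path.join(tmp, name), "w").close()
    glob_only = len(glob.glob(os.path.join(glob.escape(tmp), "the*one*")))
    filtered = count_the_one_files(tmp)
```

Here the plain glob also counts "theXoneY", while the filtered count keeps only the two names with digits in both positions.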

answered Jun 10, 2011 at 15:49

Sven Marnach

4 Comments

Bite code Over a year ago

This is clever, but it will fail if any character that is not a digit is between "the" and "one".

Thomas Wouters Over a year ago

glob.glob() will match files in the current working directory, though. You probably mean fnmatch.fnmatch().

Sven Marnach Over a year ago

@Thomas: No, I don't mean fnmatch.fnmatch(), because glob.glob() is much easier to use here. Thanks for pointing out the issue with the directory!

Sven Marnach Over a year ago

@e-satis: I'm puzzled how your answer will be any different in case there is a non-digit character between the and one.

badzil · Accepted Answer · 2011-06-10 15:59:46Z

0

The same as a one-liner, which also matches the leading 'the' as the question requires:

import re
# names is the list of file names (avoids shadowing the built-in list)
count = len([name for name in names if re.match(r'the\d+one', name)])
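Note that re.match anchors only at the start of the string, so the pattern above also accepts names with no number after "one". A stricter variant could use re.fullmatch (available since Python 3.4). In this sketch, the last two entries of names are hypothetical and added only to show the difference.

```python
import re

# The question's examples plus two hypothetical names without a trailing number.
names = ["the34one5", "the37one2", "the444one3", "the87one8",
         "the34", "the32", "the1one", "the2onex"]

# re.match anchors at the start only, so "the1one" and "the2onex" are counted.
prefix_count = sum(1 for name in names if re.match(r"the\d+one", name))

# re.fullmatch requires the whole name to fit the pattern.
full_count = sum(1 for name in names if re.fullmatch(r"the\d+one\d+", name))
```

With these names, the prefix match counts six entries and the full match counts four.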

answered Jun 10, 2011 at 15:59

badzil
