2

I want to read all files in a folder except a file named "xyz". When I reach this file, I want to skip it and read the next one.

Currently I have the following code:

for file in glob.glob('*.xml'):
    data = open(file).read()
    print(file)

Obviously, this will read all files in that folder. How should I skip the file "xyz.xml"?

    There are a bunch of simple ways to do this. You should think about it yourself first. Commented Aug 25, 2014 at 20:57

5 Answers

5

The continue keyword is useful for skipping an iteration of a for loop:

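import glob

for file in glob.glob('*.xml'):
    if file == "xyz.xml":
        continue  # skip this file and move on to the next one
    with open(file) as f:  # the with-block closes the file after reading
        data = f.read()
    print(file)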

1 Comment

As per suggestion of @SylvainLeroux (a comment in my answer), you can use glob.iglob to use an iterator, if that is a concern. +1
2
for file in [f for f in glob.glob('*.xml') if f != "xyz.xml"]:
    do_stuff()

2 Comments

Use a generator expression, so you don't create the whole array.
@utdemir Might even use iglob so it will never store the entire list into memory.
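The two comments above can be combined into one short sketch: a generator expression filtering the lazy iterator returned by glob.iglob. The helper name read_xml_except, its parameters, and the temporary demo directory are assumptions made for illustration; they are not part of the original answer.

```python
import glob
import os
import tempfile


def read_xml_except(directory, skip_name="xyz.xml"):
    """Return {file name: contents} for every *.xml in directory except skip_name.

    Hypothetical helper, named here only for illustration.
    """
    pattern = os.path.join(directory, "*.xml")
    # iglob yields paths one at a time, and the generator expression
    # filters them lazily, so no intermediate list of paths is built.
    wanted = (p for p in glob.iglob(pattern) if os.path.basename(p) != skip_name)
    contents = {}
    for path in wanted:
        with open(path) as handle:  # the with-block closes the file
            contents[os.path.basename(path)] = handle.read()
    return contents


# Small demo in a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.xml", "b.xml", "xyz.xml"):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write("<root/>")
    print(sorted(read_xml_except(tmp)))  # ['a.xml', 'b.xml']
```

Comparing os.path.basename(p) rather than the full path keeps the filter working when the pattern includes a directory prefix.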
2

For the sake of completeness, as no one has posted the most obvious version:

for file in glob.glob('*.xml'):
    if file != 'xyz.xml':
        data = open(file).read()
        print(file)

4 Comments

I like this option too, but in Python it unfortunately requires an extra level of indentation for the whole block.
@AndrewJohnson Yes. But I don't know if the OP cares about that, though ;)
I'm wondering about the difference between this option and Andrew's answer. Are there any advantages in running time or memory allocation?
@ahri According to dis, Andrew's answer takes 3 extra bytes once compiled (an extra JUMP_FORWARD opcode). But honestly, this is sooooo marginal...
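To check the bytecode claim on your own interpreter, something like the following sketch may help. The function names with_continue, with_if_block, and instruction_count are made up for this example, and the exact instruction counts depend on the CPython version, so no particular numbers are claimed here.

```python
import dis


def with_continue(names, skip="xyz.xml"):
    """Skip the unwanted name with an early continue."""
    kept = []
    for name in names:
        if name == skip:
            continue
        kept.append(name)
    return kept


def with_if_block(names, skip="xyz.xml"):
    """Process a name only when it is not the unwanted one."""
    kept = []
    for name in names:
        if name != skip:
            kept.append(name)
    return kept


def instruction_count(func):
    """Count the bytecode instructions that dis reports for func."""
    return sum(1 for _ in dis.get_instructions(func))


sample = ["a.xml", "xyz.xml", "b.xml"]
# Both variants produce the same result; only the compiled form may differ.
assert with_continue(sample) == with_if_block(sample)
print(instruction_count(with_continue), instruction_count(with_if_block))
```

Calling dis.dis(with_continue) prints the full listing if you want to see which opcodes differ.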
2

Try this, assuming that the element to be removed is in the list returned by glob.glob() (if that's not guaranteed, put the remove() line inside a try block):

lst = glob.glob('*.xml')
lst.remove('xyz.xml') # assuming that the element is present in the list
for file in lst:
    pass

Or, if you care about memory usage, use a generator expression so that no second, filtered list is built:

for file in (file for file in glob.glob('*.xml') if file != 'xyz.xml'):
    pass
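For the remove() variant above, the try block mentioned at the start of this answer could look roughly like this. The helper name without_name is hypothetical, and it works on a copy of the list, which is a choice made for this sketch rather than something the answer prescribes.

```python
def without_name(paths, skip="xyz.xml"):
    """Return a copy of paths with the first occurrence of skip removed, if any."""
    remaining = list(paths)  # copy, so the caller's list is left untouched
    try:
        remaining.remove(skip)
    except ValueError:
        # list.remove raises ValueError when the element is absent;
        # in that case there is simply nothing to skip.
        pass
    return remaining


print(without_name(["a.xml", "xyz.xml", "b.xml"]))  # ['a.xml', 'b.xml']
print(without_name(["a.xml", "b.xml"]))             # ['a.xml', 'b.xml']
```

In the loop from the answer, you would pass glob.glob('*.xml') as paths and iterate over the returned list.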

4 Comments

Might be because this throws an error if xyz.xml isn't in the list? I'm not the downvoter, though.
Downvoter here. Allocating the list and removing an element afterwards is pretty unnecessary (glob probably returns a list, but one shouldn't rely on it). You're allocating O(n) memory, traversing the list to search for "xyz.xml", and then calling remove, which shifts data in memory and also has linear complexity. The whole thing can be done in constant memory with a single pass over the results.
@utdemir Read the documentation of glob.glob(): "Return a possibly-empty list of path names". The list is already allocated, because that function returns a list. It's even cheaper to remove the element up front (as I did above) than to create a new list comprehension or generator expression.
@ÓscarLópez list.remove may reallocate the whole array. It probably won't, but why rely on that when we can easily skip the element with a generator or a continue statement? I think one should always plan for worst-case complexity.
1

You can use glob for fairly simple pattern matching, but remember that glob pattern rules are not regular expressions! The code below excludes all XML files whose names start with 'X':

files = glob.glob('[!X]*.xml')
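To see what that pattern accepts without touching the filesystem, here is a small sketch using fnmatch.fnmatchcase, which applies the same shell-style pattern rules in a strictly case-sensitive way. The file names are invented for the example. Note that glob itself may match case-insensitively on some platforms, such as Windows, so results there can differ.

```python
import fnmatch

# Hypothetical file names, chosen only to exercise the pattern.
names = ["Xfoo.xml", "xyz.xml", "abc.xml", "notes.txt"]

# [!X] matches any single character except an uppercase X.
matched = [n for n in names if fnmatch.fnmatchcase(n, "[!X]*.xml")]
print(matched)  # ['xyz.xml', 'abc.xml']
```

With case-sensitive matching, the lowercase "xyz.xml" still passes the pattern, so skipping that specific file would still need an explicit name comparison like the ones in the other answers.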
