0

I'm attempting to loop through a directory and any nested directories within. It seemed like recursion would be a good way to go about it.

I ended up with this code:

def get_file_list(directory=os.getcwd()):
    for i in os.listdir(directory):
        if os.path.isdir(i):
            get_file_list(i)
            continue
        print i

This prints everything beautifully -- exactly the output I expected. However, I wanted to take this list of files and pass it to another function for further processing. So I tried compiling everything into a list.

def get_file_list(directory=os.getcwd()):
    files = []
    for i in os.listdir(directory):
        if os.path.isdir(i):
            get_file_list(i)
            continue
        files.append(i)
    return files

So now, the problem is that it only returns the files from the current working directory. After some thinking, I guess this is a scoping issue. A new files variable is being created in a unique piece of memory each time get_file_list() is called, right? So how do you get around something like this? How do you assemble the results from nested calls?

    You're just throwing away the results of all but the first call. Commented May 20, 2013 at 2:30

5 Answers

4
import os

all_files = []
# os.walk yields (dirpath, dirnames, filenames) for every directory it visits
for current_dir, directories, files in os.walk("C:\\"):
    current_files = [os.path.join(current_dir, file) for file in files]
    all_files.extend(current_files)

print(all_files)

I would think os.walk would work better.


3 Comments

any reason you picked map with lambda over a list comprehension here?
unless its homework maybe .... but even then I think ... still I suppose it doesnt hurt to know how to implement yourself recursively
@JeffTratner flipped a coin in my head :P
3

Use extend:

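import os

def get_file_list(directory='.'):
    files = []
    for i in os.listdir(directory):
        # Join with the parent directory; a bare name would be checked against the cwd
        path = os.path.join(directory, i)
        if os.path.isdir(path):
            files.extend(get_file_list(path))
        else:
            files.append(path)
    return files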

Also, I changed your os.getcwd() call to just '.', since you probably want it to default to the current working directory at the time of the call, not the working directory at the point at which the function was defined.
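To see the difference between the two defaults, here is a minimal sketch. The function names cwd_at_definition and cwd_at_call are made up for illustration, and the temporary directory only stands in for "some other directory the program moved to".

```python
import os
import tempfile

def cwd_at_definition(directory=os.getcwd()):
    # The default was computed once, when this def statement ran.
    return directory

def cwd_at_call(directory='.'):
    # '.' is resolved against whatever the working directory is right now.
    return os.path.abspath(directory)

original = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        frozen = cwd_at_definition()  # still the directory from definition time
        live = cwd_at_call()          # follows the chdir
        moved = os.getcwd()
    finally:
        os.chdir(original)            # restore before the temp dir is removed
```

After this runs, frozen still holds the original directory while live matches the directory the process moved into.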

2

Use generators! They're very powerful and make things easy to read. Here are some references.

Basically, you use "yield" to produce values instead of "return". When the function encounters a "yield" statement, it hands back the value and pauses execution, so when the caller asks for the next value, the function picks up where it left off!

And to top it off, you can tell python to iterate over generator functions using "for x in my_generator_function()". Very handy.
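A tiny sketch of that pause-and-resume behaviour, separate from the directory problem. count_up_to is a hypothetical example function, not something from the standard library.

```python
def count_up_to(limit):
    n = 1
    while n <= limit:
        yield n   # hand back n and pause here until the next value is requested
        n += 1    # execution resumes on this line

gen = count_up_to(3)   # calling the function only creates the generator object
first = next(gen)      # runs the body up to the first yield
second = next(gen)     # resumes after the yield, loops, and yields again
rest = list(gen)       # drains whatever is left
```

Each next() call resumes the same generator; calling count_up_to(3) again would start a fresh one from the beginning.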

import os


#this is a "generator function"
def get_files(directory='.'):
    for item in os.listdir(directory):
        item = os.path.join(directory, item)
        if os.path.isdir(item):
            for subitem in get_files(item):
                yield subitem
                # The fact that there's a "yield" statement here
                #     tells python that this is a generator function
        else:
            yield item

for item in get_files():
    print item  # Do something besides printing here, obviously ;)

1 Comment

This would be a perfect place to use Python 3.3's yield from.
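As a sketch of what that comment suggests, assuming Python 3.3 or later, the inner for loop in get_files can be replaced by a single yield from. The rest mirrors the generator in the answer.

```python
import os

def get_files(directory='.'):
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            # Delegate to the recursive generator and re-yield everything it produces
            yield from get_files(path)
        else:
            yield path
```

If a real list is needed for further processing, list(get_files()) collects the results.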
1

A common way to do this recursively in the spirit of your original question is to pass in the list you are appending to as a parameter. Pass the empty list to the very first call to the function. A recursive "helper" (often implemented as a nested function) can accumulate the files.

EDIT:

Here is a complete script (fixed from a previous version):

import os

def get_file_list(directory=os.getcwd()):
    def file_list(directory, files):
        for i in os.listdir(directory):
            # Join with the parent directory; a bare name would be checked against the cwd
            path = os.path.join(directory, i)
            if os.path.isdir(path):
                file_list(path, files)
                continue
            files.append(path)
        return files
    return file_list(directory, [])

print(get_file_list())

11 Comments

That'll work the first time, but a second time, it'll do something completely unexpected due to default arguments being evaluated at function definition. Try it.
@ZackYoshyaro: The first time you call it, you should get the expected results. Let's say that was ['a', 'b']. The second time you call it, you'll get ['a', 'b', 'a', 'b']. The third time, ['a', 'b', 'a', 'b', 'a', 'b'], and so on.
@ZackYoshyaro The first solution I gave you was wrong (sorry!) because the default parameters are evaluated at function definition time. So there is only ONE list! It started off empty, but each time you call the function you are appending filenames to the exact same list. Not pretty. Sorry about that, I tested it only with one call. :( The new one will work over multiple calls, but see the other answers for more Pythonic approaches.
It does indeed work now, but there was a much simpler solution: make it default to None and start it out with an if files is None: files = [].
@RayToal: No, because someone might want to pass in an empty list for it to populate, ignoring the return value. That's not even just a theoretical: you'll do that yourself if the first item of the root is a directory.
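The pitfall these comments describe can be reproduced in a few lines. This is only a sketch: collect_shared and collect_fresh are hypothetical names, and the second function shows the "default to None" idiom mentioned above.

```python
def collect_shared(item, bucket=[]):
    # The default list is created once, at definition time, and reused by every call.
    bucket.append(item)
    return bucket

def collect_fresh(item, bucket=None):
    # Checking "is None" (rather than truthiness) keeps a caller-supplied empty list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

shared_first = list(collect_shared('a'))   # copy, since the same list keeps growing
shared_second = list(collect_shared('b'))  # the earlier 'a' is still in there
fresh_first = collect_fresh('a')
fresh_second = collect_fresh('b')          # a new list each time

caller_list = []
collect_fresh('c', caller_list)            # populates the caller's own empty list
```

Testing with "is None" instead of something like "bucket or []" matters for the case raised in the last comment: an empty list passed in by the caller is falsy, yet it should still be the one that gets filled.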
0
import os

def get_file_list(files, directory=os.getcwd()):
    for i in os.listdir(directory):
        path = os.path.join(directory, i)  # full path, so isdir() checks the right entry
        if os.path.isdir(path):
            get_file_list(files, path)  # note: pass the same list down through the recursive calls
            continue
        files.append(path)  # add the file's path to the shared list

myfiles = []  # the list we want to insert all the file paths into
get_file_list(myfiles)  # call the function and pass a reference to myfiles in
print('\n'.join(myfiles))
