1

Hey all, this is my first time working with the file and os parts of Python. I am trying to search a directory and find all of its subdirectories. If a directory contains no folders, all of its files should be added to a list, and everything should be organized in a nested dict.

So for instance a tree could look like this

  • Starting Path
    • Dir 1
      • Subdir 1
      • Subdir 2
      • Subdir 3
        • subsubdir
          • file.jpg
          • folder1
            • file1.jpg
            • file2.jpg
          • folder2
            • file3.jpg
            • file4.jpg

Even if subsubdir has a file in it, it should be skipped because it has folders in it.

I can normally do this if I know how many directory levels I am going to be looking at, using os.listdir and os.path.isdir. However, if I want this to be dynamic, it has to handle any number of folders and subfolders. I have tried os.walk, and it finds all the files easily. The only trouble I am having is creating the dicts keyed by the path names that contain files. I need the folder names organized by dict, all the way up to the starting path.

So in the end, using the example above, the dict should look like this with the files in it:

dict['dir1']['subdir3']['subsubdir']['folder1'] = ['file1.jpg', 'file2.jpg']

dict['dir1']['subdir3']['subsubdir']['folder2'] = ['file3.jpg', 'file4.jpg']

Would appreciate any help on this or better ideas on organizing the information. Thanks.

1
  • What are you going to use the directory tree for? Commented Dec 10, 2009 at 2:35

3 Answers

5

Maybe you want something like:

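import os

def explore(starting_path):
  # Drop any trailing separator: with one, the relative paths below would not
  # start with os.sep and the lookup d[subd] would raise KeyError.
  starting_path = starting_path.rstrip(os.sep)
  alld = {'': {}}

  for dirpath, dirnames, filenames in os.walk(starting_path):
    d = alld
    # Path relative to the start, e.g. '' for the root or '/b/z' below it.
    dirpath = dirpath[len(starting_path):]
    for subd in dirpath.split(os.sep):
      based = d
      d = d[subd]
    if dirnames:
      # The directory has subfolders: register them and ignore its files.
      for dn in dirnames:
        d[dn] = {}
    else:
      # Leaf directory: replace its placeholder dict with the file list.
      based[subd] = filenames
  return alld['']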

For example, given a /tmp/a such that:

$ ls -FR /tmp/a
b/  c/  d/

/tmp/a/b:
z/

/tmp/a/b/z:

/tmp/a/c:
za  zu

/tmp/a/d:

print(explore('/tmp/a')) emits: {'c': ['za', 'zu'], 'b': {'z': []}, 'd': []} (the key order may vary).

If this isn't exactly what you're after, maybe you can show us specifically what the differences are supposed to be? I suspect they can probably be easily fixed, if need be.
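
To consume a result shaped like the one above, a small recursive helper can flatten it into (path, files) pairs. This is only a sketch: `iter_leaves` is a hypothetical name that is not part of the code above, and it assumes every non-dict value is the file list of a leaf folder.

```python
import os

def iter_leaves(tree, prefix=''):
    """Yield (relative_path, files) for every leaf folder in a nested dict."""
    for name in sorted(tree):
        value = tree[name]
        path = os.path.join(prefix, name)
        if isinstance(value, dict):
            # Intermediate folder: recurse into its children.
            yield from iter_leaves(value, path)
        else:
            # Leaf folder: the value is its list of file names.
            yield path, value

# The dict from the /tmp/a example, written out literally.
tree = {'c': ['za', 'zu'], 'b': {'z': []}, 'd': []}
leaves = list(iter_leaves(tree))
```

Each pair holds the folder path relative to the starting path and that folder's files, which may be easier to loop over than the nested form.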


2 Comments

Unfortunately this produces KeyErrors so I am unable to test this. However from the returned dict in your example, sounds about right.
This was the closest clear solution to my very similar problem. Thanks.
2

I don't know why you would want to do this. You should be able to do your processing using os.walk, but in case you really need such a structure, you can do (untested):

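import os

def dirfunc(fdict, top, dirname, dirnames, fnames):
    # Skip any directory that contains subfolders, even if it also has files.
    if dirnames:
        return
    # Key path relative to the starting directory, one key per folder level.
    keys = os.path.relpath(dirname, top).split(os.sep)
    tmpdict = fdict
    for k in keys[:-1]:
        tmpdict = tmpdict.setdefault(k, {})
    # Store the file list under the folder's own name only.
    tmpdict[keys[-1]] = fnames

mydict = {}
# directory_to_search must hold the starting path before this loop runs.
for dirname, dirnames, fnames in os.walk(directory_to_search):
    dirfunc(mydict, directory_to_search, dirname, dirnames, fnames)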

Also, you should not name your variable dict because it's a Python built-in. It is a very bad idea to rebind the name dict to something other than Python's dict type.
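
A quick illustration of why rebinding the name is risky, as a minimal sketch; `shadowing_demo` and `tree` are names made up for this example:

```python
def shadowing_demo():
    dict = {'dir1': {}}      # rebinds the built-in name inside this scope
    try:
        dict(a=1)            # the built-in constructor is no longer reachable
    except TypeError as exc:
        return str(exc)

message = shadowing_demo()

# A descriptive name avoids the problem and leaves dict() usable.
tree = dict(dir1={})
```

Inside the function the call fails because `dict` now refers to a dictionary instance rather than the type.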

Edit: edited to fix the "double last key" error and to use os.walk.

5 Comments

Aye. The dict variable was just me being lazy. Anyways this does work except it creates a duplicate key if there is a file. In other words (using the above example) dictvar['dir1']['subdir3']['subsubdir']['folder2']['folder2'] = ['file3.jpg', 'file4.jpg']
That's the danger in pasting untested code. You can fix it by doing: keys = dirname.split(os.sep)[:-1].
use os.walk(), not os.path.walk
Thanks for the comment. Should have used os.walk().
Also that's not the proper use of os.walk(), it is different from os.path.walk.
1

There is a basic problem with the way you want to structure the data. If dir1/subdir1 contains subdirectories and files, should dict['dir1']['subdir1'] be a list or a dictionary? To access further subdirectories with ...['subdir2'] it needs to be a dictionary, but on the other hand dict['dir1']['subdir1'] should return a list of files.

Either you have to build the tree from custom objects that combine these two aspects in some way, or you have to change the tree structure to treat files differently.
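
One possible reading of the first option is a small node type that carries both a file list and a mapping of subdirectories. The sketch below is an assumption about what such an object could look like, not a prescribed design; `DirNode` and `build` are hypothetical names, and unlike the rule in the question it keeps the files of every directory.

```python
import os
import tempfile

class DirNode:
    """A directory with both its own files and its subdirectories."""
    def __init__(self):
        self.files = []
        self.dirs = {}

    def __getitem__(self, name):
        # Allows tree['dir1']['subdir1'] style access to subdirectories.
        return self.dirs[name]

def build(starting_path):
    root = DirNode()
    nodes = {starting_path: root}
    for dirpath, dirnames, filenames in os.walk(starting_path):
        node = nodes[dirpath]
        node.files = sorted(filenames)
        for dn in dirnames:
            child = DirNode()
            node.dirs[dn] = child
            # os.walk visits children under this joined path later on.
            nodes[os.path.join(dirpath, dn)] = child
    return root

# Small throwaway tree: dir1/subdir1 holds a file and a subfolder.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'dir1', 'subdir1', 'subdir2'))
    open(os.path.join(top, 'dir1', 'subdir1', 'a.jpg'), 'w').close()
    tree = build(top)
    subdir1 = tree['dir1']['subdir1']
    files, children = subdir1.files, sorted(subdir1.dirs)
```

Here `subdir1.files` gives the list of files while `subdir1['subdir2']` still reaches the next level, so the list-or-dictionary conflict does not arise.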

1 Comment

Well that's why I want it so if it finds a folder it will just skip adding files.
