I am trying to build a script that traverses all the files in a directory and identifies the type of each one. At the end, it should print the total count of each file type that was identified. I am using the magic library to identify the file type based on its MIME type.

import os
import magic

for filename in os.listdir(os.getcwd()):
    print(filename)
    with magic.Magic(flags=magic.MAGIC_MIME_TYPE) as m:
        t = m.id_filename(filename)
        print(t)

The identification code above seems to work fine, but I am not sure how to store the identified file types and their counts. The output should look like:

filetype1 count
filetype2 count
...

What would be the ideal way to do this?

  • Throw all your filenames into a list, new_list, then use from collections import Counter and call Counter(new_list). Commented Mar 13, 2017 at 17:56
  • And by filenames I meant file types :P Commented Mar 13, 2017 at 18:02

2 Answers

You can create a dictionary that maps each file type to its count, e.g.

file_types = {'filetype1': 10, 'filetype2': 20, ...}

Note that your current solution only covers the current directory, not its subdirectories.

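import os
import magic

file_types = {}

# Create the Magic object once and reuse it for every file.
with magic.Magic(flags=magic.MAGIC_MIME_TYPE) as m:
    for filename in os.listdir(os.getcwd()):
        t = m.id_filename(filename)
        # Start the type at 0 the first time it is seen, then count it.
        file_types.setdefault(t, 0)
        file_types[t] += 1
...

This adds each new file type to the dictionary the first time it is seen and keeps a running count for it.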
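If subdirectories need to be covered as well, one option is os.walk from the standard library. The following is only a sketch: count_file_types and its identify parameter are names made up for illustration, and identify stands for whatever callable turns a path into a type string.

```python
import os


def count_file_types(root, identify):
    """Count file types under root, including all subdirectories.

    identify is a hypothetical parameter: any callable that takes a
    file path and returns a type string.
    """
    file_types = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # os.walk yields bare names, so join them with their directory.
            path = os.path.join(dirpath, name)
            t = identify(path)
            file_types[t] = file_types.get(t, 0) + 1
    return file_types
```

With the library from the question, this would presumably be called as count_file_types(os.getcwd(), m.id_filename) inside the with block. I have not tested that combination.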

You could use the Counter class from the collections module. It is basically a variant of a dictionary, with a few additional methods and the advantage that you don't need to initialize it with 0 when counting.
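To illustrate the point about not having to initialize with 0, here is a small comparison of a plain dict and a Counter. The variable names and the MIME strings are just sample values for the sketch.

```python
import collections

plain = {}
counter = collections.Counter()

# A plain dict needs a starting value before it can be incremented.
plain["text/plain"] = plain.get("text/plain", 0) + 1

# A Counter treats a missing key as 0, so += works on the first occurrence.
counter["text/plain"] += 1
counter["text/plain"] += 1

# Looking up a type that was never counted gives 0 instead of a KeyError,
# and the lookup does not add the key to the Counter.
unseen = counter["image/png"]
```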

I don't have the magic library you mention installed, so here's an example using my_magic as a substitute:

import collections
import os

def my_magic(filename):
    """
    This function is just a placeholder to be used in place of your id_filename()
    method.
    """
    if filename.endswith(".txt"): 
        return "TXT"
    elif filename.endswith(".pdf"):
        return "PDF"
    else:
        return "other"

# initialize the counter object:
counter = collections.Counter()

for filename in os.listdir(os.getcwd()):
    print(filename)

    # substitute the next line with whatever you use to determine the
    # type of the file:
    t = my_magic(filename)
    print(t)

    # increase the count for the current value of 't':
    counter[t] += 1

# output what is in counter:
for ext, n in counter.items():
    print("%s %d" % (ext, n))
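If you want the most frequent types listed first, Counter.most_common() returns the pairs sorted by count. A rough sketch, where format_counts is a made-up helper name and the sample list stands in for real identification results. It also shows that a Counter can be built directly from a list of types, as suggested in the comments under the question.

```python
import collections


def format_counts(counter):
    """Return one 'filetype count' line per type, most frequent first."""
    return ["%s %d" % (file_type, n) for file_type, n in counter.most_common()]


# Hypothetical sample data, not output from a real directory.
sample = collections.Counter(["TXT", "PDF", "TXT", "other", "TXT", "PDF"])

for line in format_counts(sample):
    print(line)
```

Returning the lines as a list instead of printing inside the helper keeps the formatting separate from the output, so the same lines could also be written to a file.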
