I have this code that counts files in a directory that share the same first two letters. I want to amend it so that it does this by date modified. So if there were 10 files that started with PR and 10 files that started with FM, 5 each on 5/17/2013 and 5 each on 5/18/2013, the output would be:

17
FM 5
PR 5
18
FM 5
PR 5
import os
from collections import Counter

path = '/My/path/to/the/directory/test'

counts = Counter(fname[:2] for fname in os.listdir(path) if
                 os.path.isfile(os.path.join(path, fname))
                 and 'blue' in fname
                 or 'green' in fname
                 or 'yellow' in fname
                 or 'red' in fname
                 or 'purple' in fname)

for initials, count in counts.most_common():
    print('{}: {:>20}'.format(initials, count))

I can print out the date modified, but not in conjunction with the count. I would appreciate any help. I originally wanted to use a scheduler (I have a good example to follow), but got bogged down in its usage and in getting it to trigger. Since then I have been reading about regular expressions and how to extract the day of the month from the filename, but I am mostly confused about how it would all connect.

  • All the or tests could be simplified to: and any(c in fname for c in ('blue', 'green', 'yellow', 'red', 'purple')) Commented May 31, 2013 at 21:37
  • Actually, that would be more correct; as it stands, it's (isfile and blue) or ... Commented May 31, 2013 at 21:50
  • I get a syntax error at the end of counts.most_common() when I change it Commented May 31, 2013 at 22:10
  • You're only printing out the day of the month. What would you want printing and order-wise if some of the files had modification dates of 2013-05-31 and others had 2013-06-01? Commented Jun 1, 2013 at 0:23
  • The 'test' at the end of the path is actually a month folder that I change from month to month (April to May, etc.). Going to try the reply below. Commented Jun 1, 2013 at 4:15
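
The precedence issue raised in the comments is easy to see with a small sketch. The two functions below are hypothetical stand-ins for the question's filter (an is_file flag replaces the os.path.isfile call). Because and binds more tightly than or, the original expression applies the isfile check only to the 'blue' test, so anything whose name contains another color passes even when it is not a file:

```python
COLORS = ('blue', 'green', 'yellow', 'red', 'purple')

def old_filter(is_file, fname):
    # Parsed as: (is_file and 'blue' in fname) or 'green' in fname or ...
    return (is_file
            and 'blue' in fname
            or 'green' in fname
            or 'yellow' in fname
            or 'red' in fname
            or 'purple' in fname)

def new_filter(is_file, fname):
    # The isfile check AND at least one color must hold.
    return is_file and any(color in fname for color in COLORS)

# A hypothetical subdirectory named "green_archive" slips through the old filter.
print(old_filter(False, 'green_archive'))
print(new_filter(False, 'green_archive'))
```

The first call prints True and the second prints False, which is the difference the any() rewrite makes.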

2 Answers

One approach would be to build a dictionary from the files, keyed by their modification date, with an associated Counter object similar to what you're doing in your code. To simplify things slightly, I also used a defaultdict of Counters.
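
As a quick illustration of why defaultdict(Counter) helps here: looking up a date that is not yet a key creates an empty Counter on the spot, so each file can be counted without an "if key not in dict" check. The (date, filename) pairs below are made up and stand in for the os.listdir / os.stat results; the += 1 form is equivalent to the .update((prefix,)) call used in the answer's code:

```python
from collections import Counter, defaultdict
from datetime import date

date_counters = defaultdict(Counter)

# Hypothetical (modification date, filename) pairs.
samples = [
    (date(2013, 5, 17), 'PR_a'),
    (date(2013, 5, 17), 'FM_a'),
    (date(2013, 5, 18), 'PR_b'),
    (date(2013, 5, 17), 'PR_c'),
]

for mod_date, name in samples:
    # A missing date key gets a fresh Counter automatically.
    date_counters[mod_date][name[:2]] += 1

for mod_date in sorted(date_counters):
    print(mod_date.day, dict(date_counters[mod_date]))
```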

So, given a folder with these files & modification dates in it for testing:

blue1       05/30/2013  06:37 PM
green1      05/30/2013  06:37 PM
green2      05/30/2013  06:37 PM
purple1     05/30/2013  06:37 PM
purple2     05/30/2013  06:37 PM
purple3     05/30/2013  06:37 PM
purple4     05/30/2013  06:37 PM
purple5     05/30/2013  06:37 PM
red1        05/31/2013  06:38 PM
red2        05/31/2013  06:38 PM
red3        05/31/2013  06:38 PM
red4        05/31/2013  06:38 PM
yellow1     05/31/2013  06:38 PM
yellow2     05/31/2013  06:38 PM
yellow3     05/31/2013  06:38 PM

This code:

from collections import defaultdict, Counter
from datetime import date
from operator import itemgetter
import os

COLORS = ('blue', 'green', 'yellow', 'red', 'purple')
NUM_LETTERS = 2
path = 'testdir'

date_counters = defaultdict(Counter)

for filename, filepath in ((name, os.path.join(path, name))
                                for name in os.listdir(path)):
    if (os.path.isfile(filepath) and any(color in filename for color in COLORS)):
        mod_date = date.fromtimestamp(os.stat(filepath).st_mtime)
        date_counters[mod_date].update((filename[:NUM_LETTERS],))

for mod_date in sorted(date_counters):  # sort by file group's modification date
    print(mod_date.day)
    for initials, count in sorted(date_counters[mod_date].items(),
                                  key=itemgetter(1)):  # ascending by count
        print('{} {}'.format(initials, count))

Produced this output:

30
bl 1
gr 2
pu 5
31
ye 3
re 4
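
Printing only mod_date.day becomes ambiguous if the folder ever holds files from two months, as one of the comments asks. Since the keys are datetime.date objects, sorted() already orders them chronologically across a month boundary. A minimal variation, with made-up counts, prints the full date instead and sorts prefixes alphabetically, which matches the FM/PR order in the question's example output:

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical counts spanning a month boundary.
date_counters = defaultdict(Counter)
date_counters[date(2013, 6, 1)].update(['re', 're'])
date_counters[date(2013, 5, 31)].update(['bl', 'gr', 'gr'])

lines = []
for mod_date in sorted(date_counters):           # date objects sort chronologically
    lines.append(mod_date.strftime('%m/%d/%Y'))  # full date, not just the day of month
    for initials, count in sorted(date_counters[mod_date].items()):  # alphabetical by prefix
        lines.append('{} {}'.format(initials, count))

print('\n'.join(lines))
```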

You can use groupby to organize the files:

First you need a function that maps a filename to the date of its mtime; then get a list of the files, sorted by that value:

from collections import Counter
from itertools import groupby
import os
import datetime

def find_mod_date(basedir):
    return lambda filename: datetime.date.fromtimestamp(
                            os.stat(os.path.join(basedir, filename)).st_mtime)

path="/tmp"
mod_dates_in_path = find_mod_date(path)

files = [fname for fname in os.listdir(path) 
         if os.path.isfile(os.path.join(path, fname))
             and any(name in fname for name in ['red', 'blue'])]
files = sorted(files, key=mod_dates_in_path)

Then group the files by modification date:

grouping_by_date = groupby(files, key=mod_dates_in_path)
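
The sort in the previous step matters: itertools.groupby only merges adjacent items that share a key, so unsorted input yields one group per run rather than one group per day. A small sketch with made-up filenames and day numbers shows the difference:

```python
from itertools import groupby

# Hypothetical filenames mapped to a day of month, listed out of order.
days = {'red1': 31, 'blue1': 30, 'red2': 31, 'blue2': 30}
names = ['red1', 'blue1', 'red2', 'blue2']

unsorted_groups = [(k, list(g)) for k, g in groupby(names, key=days.get)]
sorted_groups = [(k, list(g))
                 for k, g in groupby(sorted(names, key=days.get), key=days.get)]

print(unsorted_groups)  # four one-element groups: only adjacent items are merged
print(sorted_groups)    # two groups, one per day
```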

Iterate over the results and count by name prefix:

results = {}
for day, group in grouping_by_date:
    results[day] = Counter(name[:2] for name in group)

for day, prefix_counts in sorted(results.items()):  # plain dicts may not keep day order
    print(day)
    for prefix, count in prefix_counts.items():
        print("{}: {}".format(prefix, count))
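
To try the groupby approach without a real month folder, a sketch like the one below wraps the same sort/group/count steps in a helper (count_prefixes_by_date is a hypothetical name), creates a throwaway directory, and back-dates made-up files with os.utime. Timestamps are set at local noon so that date.fromtimestamp lands on the intended day:

```python
import datetime
import os
import tempfile
from collections import Counter
from itertools import groupby

def count_prefixes_by_date(path, colors, num_letters=2):
    """Return {date: Counter(prefix -> count)} for matching files in path."""
    def mod_date(fname):
        return datetime.date.fromtimestamp(
            os.stat(os.path.join(path, fname)).st_mtime)

    files = sorted((f for f in os.listdir(path)
                    if os.path.isfile(os.path.join(path, f))
                    and any(c in f for c in colors)),
                   key=mod_date)  # groupby needs the input sorted by the same key
    return {day: Counter(f[:num_letters] for f in group)
            for day, group in groupby(files, key=mod_date)}

# Hypothetical test data: (filename, local date it should appear modified on).
test_files = [('blue1', (2013, 5, 30)), ('red1', (2013, 5, 31)),
              ('red2', (2013, 5, 31)), ('green1', (2013, 5, 30))]

with tempfile.TemporaryDirectory() as tmp:
    for name, (y, m, d) in test_files:
        fpath = os.path.join(tmp, name)
        open(fpath, 'w').close()
        ts = datetime.datetime(y, m, d, 12).timestamp()  # local noon
        os.utime(fpath, (ts, ts))  # set access and modification times
    result = count_prefixes_by_date(tmp, ('red', 'blue', 'green'))

for day in sorted(result):
    print(day)
    for prefix, count in sorted(result[day].items()):
        print('{}: {}'.format(prefix, count))
```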
