I want to take a path for a file, open the file and read the data within it. Upon doing so, I would like to count the number of occurrences of each letter in the alphabet.

From what I have read and heard, using try/except would be best here. I've tried my best, but I have only managed to count the occurrences of letters in a string defined within the program, not in the file.

I haven't a clue how to do this now, and my brain is starting to hurt. This is what I have so far:

import sys
print "Enter the file path:"
thefile = raw_input()
f = open(thefile, "r")
chars = {}
for c in f:
    try:
        chars[c]+=1
    except:
        chars[c]=1
print chars

Any help will be highly appreciated. Thank you.

EDIT: I forgot to say that the result I get at the minute says that the whole file is one character. The file consists of "abcdefghijklmnopqrstuvwxyz" and the resulting output is: {'"abcdefghijklmnopqrstuvwxyz"\n': 1} which it shouldn't be.

4 Answers

A slightly more elegant approach is this:

from __future__ import with_statement

from collections import defaultdict

print "Enter the file path:"
thefile = raw_input()

with open(thefile, "r") as f:
    chars = defaultdict(int)

    for line in f:
        for c in line:
            chars[c] += 1

    print dict(chars)

This uses a defaultdict to simplify the counting process, uses two loops to make sure we read each character separately without needing to read the entire file into memory, and uses a with block to ensure that the file is closed properly.

Edit:

To compute a histogram of the letters, you can use this version:

from __future__ import with_statement

from string import ascii_letters

print "Enter the file path:"
thefile = raw_input()

chars = dict(zip(ascii_letters, [0] * len(ascii_letters)))

with open(thefile, "r") as f:

    for line in f:
        for c in line:
            if c in ascii_letters:
                chars[c] += 1

for c in ascii_letters:
    print "%s: %d" % (c, chars[c])

This uses the handy string.ascii_letters constant, and shows a neat way to build the empty dictionary using zip() as well.
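As a possible alternative to the zip() construction, dict.fromkeys can build the same zero-filled dictionary. The following is only a sketch, written for Python 3 (note the print() call); the function name letter_histogram and the sample text are made up for illustration and are not part of the answer above. It also tests membership against the dictionary rather than the ascii_letters string.

```python
from string import ascii_letters


def letter_histogram(lines):
    """Count ASCII letters in an iterable of strings, e.g. an open file."""
    # dict.fromkeys gives every letter the same starting value, 0.
    counts = dict.fromkeys(ascii_letters, 0)
    for line in lines:
        for c in line:
            # Dictionary membership is a hash lookup; digits, spaces and
            # punctuation are simply skipped.
            if c in counts:
                counts[c] += 1
    return counts


# A list of strings stands in for the file here.
hist = letter_histogram(["Hello. How are you?\n"])
print(hist["H"], hist["o"], hist["b"])  # 2 3 0
```

Because every letter is present from the start, letters that never occur in the input still show up with a count of 0.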


6 Comments

It's faster and a bit shorter to use defaultdict(int) since that doesn't involve calling a Python function every time.
Hmm, I receive an error when running that: Traceback (most recent call last): File "********************", line 14, in <module> chars[c] += 1 KeyError: 'a' I'm pretty new to Python, so it's taking a while to sink in. And it's 2am!
@Emlyn: Works for me. Are you perhaps overwriting the chars dictionary with an empty dict?
@Daniel It's working for me now. I used the version before you updated it and received the error. I'll read more into what you have shown me, so I fully understand it. So using this method is better than using try and except?
@Emlyn: There's nothing wrong with using except KeyError if you really aren't sure what keys are in the dictionary. But if you know ahead of time exactly what keys there are, then it can be faster to check. So instead of the if c in ascii_letters line, we could have used try: chars[c] += 1; except KeyError: pass to ignore characters that weren't in ascii_letters. Also note that my if c in ascii_letters is actually sub-optimal as well -- if c in chars would actually be faster because it can use a hashtable lookup instead. (Although the difference is minimal in this case.)

The for c in f: statement is processing your file line by line (that's what the for operation on a file object is designed to do). Since you want to process it character by character, try changing that to:

data = f.read()
for c in data:

The .read() method reads the entire contents of the file into one string, assigns it to data, then the for loop considers each individual character of that string.
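Put together, the whole-file approach might look like the sketch below. It is written for Python 3, and the function name count_chars is made up for illustration. An in-memory io.StringIO object stands in for the opened file so that the example runs without a real path; with a real file you would pass the object returned by open().

```python
import io


def count_chars(f):
    """Count every character in an open text file (or file-like object)."""
    data = f.read()  # the whole content as one string
    chars = {}
    for c in data:  # iterating over a string yields single characters
        try:
            chars[c] += 1
        except KeyError:  # first time this character is seen
            chars[c] = 1
    return chars


# io.StringIO behaves like a file opened in text mode.
sample = io.StringIO("abcab\n")
counts = count_chars(sample)
print(counts)
```

Note that the newline at the end of the sample is counted as a character too, just as it would be in a real file.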

Comments

You're almost there, actually; the most important thing you're missing is that your c is not a character but a line: iterating over a Python file gives you one line at a time. You can solve the problem by adding another loop:

print "Enter the file path:"
thefile = raw_input()
f = open(thefile, "r")
chars = {}
for line in f:
    for c in line:
        try:
            chars[c]+=1
        except:
            chars[c]=1
print chars

(Reading the entire file into a string also works, as another answer mentions, if your file is small enough to fit in memory.)

While it does work in this case, it's not a good idea to use a bare except: unless you're actually trying to catch all possible errors, because it also hides unrelated errors such as a misspelled variable name. Instead, use except KeyError:.
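To illustrate the risk, here is a small sketch (Python 3) with a deliberately planted typo. The names count_buggy, count_strict and the misspelled chrs are hypothetical and exist only for this demonstration. With a bare except: the typo goes unnoticed and every count silently ends up as 1; with except KeyError: the NameError surfaces immediately.

```python
def count_buggy(text):
    chars = {}
    for c in text:
        try:
            chrs[c] += 1  # typo for "chars": raises NameError every time
        except:  # bare except swallows the NameError as well
            chars[c] = 1
    return chars


def count_strict(text):
    chars = {}
    for c in text:
        try:
            chrs[c] += 1  # same typo
        except KeyError:  # only handles the "key not seen yet" case
            chars[c] = 1
    return chars


print(count_buggy("aab"))  # {'a': 1, 'b': 1}, wrong but no error shown

try:
    count_strict("aab")
    strict_failed = False
except NameError:
    strict_failed = True  # the typo is reported instead of hidden
print(strict_failed)  # True
```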

What you're trying to do is pretty common, so there's a Python dictionary method and data type that can remove the try/except from your code entirely. Take a look at the setdefault method and the defaultdict type. With either, you can essentially specify that missing values start at 0.
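A minimal sketch of both options follows, written for Python 3 and using a short made-up string in place of the file contents. The variable names are only for illustration.

```python
from collections import defaultdict

text = "hello"  # stand-in for characters read from the file

# Option 1: dict.setdefault inserts the key with value 0 only if it is missing.
chars = {}
for c in text:
    chars.setdefault(c, 0)
    chars[c] += 1

# Option 2: defaultdict(int) calls int() to create a 0 for any missing key.
chars_dd = defaultdict(int)
for c in text:
    chars_dd[c] += 1

print(chars)
print(chars == dict(chars_dd))  # True
```

Both loops produce the same counts; the defaultdict version just moves the "start at 0" rule into the data type.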

4 Comments

Thank you all for the quick responses. Nicholas, thanks. It does work. :) How would I go about displaying the stats for all occurrences of the alphabet, even if there aren't any occurrences? For example, if the file contained the text "Hello. How are you?", I would like it to show that there are 0 occurrences of the letter b, etc. Ahh, would the setdefault method and defaultdict type solve that?
Nope, but you can do something like this: from string import ascii_letters; for letter in ascii_letters: chars[letter] = 0. That'll give you a-z, A-Z.
@Emlyn: What Nicholas said. But a simpler way to do the same thing is chars = dict(zip(ascii_letters, [0] * len(ascii_letters))) like I show in my updated answer.
Heh, yeah, good point (that's getting a bit scary though. :-)
Here's a more Pythonic way, using the standard library:

import collections 
with open(raw_input(), 'rb') as f:
    count = collections.Counter(f.read())
    print count

Batteries included! :)
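A Counter can also report letters that never occur, since looking up a missing key returns 0 instead of raising KeyError. The sketch below is for Python 3 and uses a made-up string in place of f.read(). Folding everything to lower case is an assumption made here to keep the output short; it is not something the question asks for.

```python
from collections import Counter
from string import ascii_lowercase

text = "Hello. How are you?"  # stand-in for the file contents

# Count every character after folding to lower case.
count = Counter(text.lower())

# Missing letters read as 0 and are not added to the Counter by the lookup.
for letter in ascii_lowercase:
    print("%s: %d" % (letter, count[letter]))
```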

2 Comments

collections.Counter is only available if you have Python >= 2.7
@mike: I don't think we are talking about a production environment here, so I'd say that's not an issue.
