2

(I am brand new to any kind of programming so please be as specific as you can when you answer) Problem: I have written a program to solve pythonchallenge.com level 2. The program works but the results are messy. I want to sort the results of the character count into a nice looking list. When I try to sort the results of the character count using sorted() it removes all the counts and just gives me a list of the characters that were in my string. I need to be able to keep the ability to see how much of each character was in my file. Anyway here is the code:

countstring = open('pagesource.txt').read()

charcount = {}

for x in countstring:
    charcount[x] = charcount.get(x, 0) + 1

print charcount

This is what I get in cmd:

>>> {'\n': 1219, '!': 6079, '#': 6115, '%': 6104, '$': 6046, '&': 6043, ')': 6186, '
(': 6154, '+': 6066, '*': 6034, '@': 6157, '[': 6108, ']': 6152, '_': 6112, '^':
 6030, 'a': 1, 'e': 1, 'i': 1, 'l': 1, 'q': 1, 'u': 1, 't': 1, 'y': 1, '{': 6046
, '}': 6105}

If I add a sorted() call such as print sorted(charcount), I get this in cmd:

>>> ['\n', '!', '#', '$', '%', '&', '(', ')', '*', '+', '@', '[', ']', '^', '_', 'a'
, 'e', 'i', 'l', 'q', 't', 'u', 'y', '{', '}']

Thanks for your solutions and if you can take the time to add comments to your code explaining what everything does I would greatly appreciate it!

6 Answers

3

You should really use the Counter class instead of reinventing your own wheel.
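A minimal sketch of the Counter approach, assuming Python 2.7 or later (where collections.Counter is available). The sample string is invented for illustration; in the question it would be the contents of pagesource.txt.

```python
from collections import Counter

# Stand-in for open('pagesource.txt').read(); this text is made up.
countstring = "aabbbc"

# Counter tallies every character of the string in one step.
charcount = Counter(countstring)

# most_common() returns (character, count) pairs, highest count first.
pairs = charcount.most_common()
print(pairs)
```

Because a Counter is a dictionary subclass, the rest of this answer applies to it unchanged.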

charcount is a dictionary, and dictionaries have no implicit sort order. Therefore, we'll have to convert it to a list, which can be sorted. Each entry in that list will be a tuple of count and character.

charcount.items() already gives us a list that looks like [('\n', 1219), ('!', 6079)]. Unfortunately, if we sorted this list as it is, it would be ordered by character first and then (if characters were ever equal) by count, instead of the other way round. Therefore, we need a key function to tell sorted to look at the count first, and then (if counts are equal) at the character. Fortunately, our key function is really simple; it just swaps the two elements of the tuple:

lambda (char,count): (count, char)

Alternatively, we could use a list comprehension to swap the values, to get something like [(1219, '\n'), (6079, '!')], then sort, and then swap the values again.

charcount_list = sorted(charcount.items(), key=lambda (char,count):(count, char))

charcount_list will now be:

[('a', 1), ('e', 1), ('i', 1), ('l', 1), ('q', 1), ('t', 1), ('u', 1), ('y', 1),
 ('\n', 1219), ('^', 6030), ('*', 6034), ('&', 6043), ('$', 6046), ('{', 6046),
 ('+', 6066), ('!', 6079), ('%', 6104), ('}', 6105), ('[', 6108), ('_', 6112),
 ('#', 6115), (']', 6152), ('(', 6154), ('@', 6157), (')', 6186)]

If you want the reverse order, simply specify the reverse=True argument to sorted.
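The tuple-unpacking lambda used in this answer is Python 2 syntax and is rejected by Python 3. Here is a sketch of the same sort that should run on both, using index access instead of unpacking. The small dictionary is made-up sample data, not the asker's file.

```python
# Invented sample counts standing in for the real charcount dictionary.
charcount = {'a': 1, 'b': 3, 'c': 1, 'd': 2}

# Key is (count, character): sort by count, break ties by character.
by_count = sorted(charcount.items(), key=lambda item: (item[1], item[0]))

# reverse=True flips the whole ordering, so the largest count comes first.
by_count_desc = sorted(charcount.items(),
                       key=lambda item: (item[1], item[0]),
                       reverse=True)

print(by_count)
print(by_count_desc)
```

Note that reverse=True also reverses the tie-breaking order, so characters with equal counts come out in descending order.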

4 Comments

Or at least a defaultdict on pre-2.7 versions.
That is an ugly key function -- either use the list comprehension you originally had here, or itemgetter(1, 0) if you really want to sort by both the value and key.
@agf Why exactly is that lambda ugly? Every Python programmer will immediately understand what's happening here. I agree that using itemgetter makes it a little bit shorter (if you don't count the import), but not necessarily more readable.
It just looks ugly to me compared to sorted((v, k) for k, v in charcount.iteritems()) or the itemgetter version.
2
>>> from operator import itemgetter
>>> sorted(charcount.items(), key=itemgetter(1))
[('a', 1), ('e', 1), ('i', 1), ('l', 1), ('q', 1), ('u', 1), ('t', 1), ('y', 1), ('\n', 1219), ('^', 6030), ('*', 6034), ('&', 6043), ('$', 6046), ('{', 6046), ('+', 6066), ('!', 6079), ('%', 6104), ('}', 6105), ('[', 6108), ('_', 6112), ('#', 6115), (']', 6152), ('(', 6154), ('@', 6157), (')', 6186)]
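In the output above, 'u' comes before 't' because itemgetter(1) compares the counts only. sorted is stable, so entries with equal counts keep whatever order the dictionary yielded them in. A short sketch of the difference between itemgetter(1) and the itemgetter(1, 0) variant mentioned in the comments, using an invented list of pairs so that the input order is fixed:

```python
from operator import itemgetter

# Made-up (character, count) pairs in a deliberately unsorted order.
pairs = [('u', 1), ('t', 1), ('a', 1), ('!', 5)]

# Sort by count only: ties stay in their original input order.
by_count = sorted(pairs, key=itemgetter(1))

# Sort by count, then by character: ties become alphabetical.
by_count_then_char = sorted(pairs, key=itemgetter(1, 0))

print(by_count)
print(by_count_then_char)
```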

2 Comments

While this is neat and short, the question asks: "I need to be able to keep the ability to see how much of each character was in my file".
@phihag, yeah i saw that and fixed it
0

charcount is a dict (dictionary). Iterating over a dictionary iterates over its keys, which is why sorted() returns a sorted list of keys.

You need to get the list of items and then sort it by the second value of each item:

sorted(charcount.items(), key=lambda t: t[1])
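As a rough illustration of both points, the sketch below first shows that sorting the dictionary itself yields only keys, then sorts the items by count and prints one character per line. The sample dictionary and the column width are arbitrary choices made for this example.

```python
# Invented sample counts; the real ones come from the asker's file.
charcount = {'\n': 2, 'a': 1, '!': 5}

# Sorting the dictionary directly only looks at its keys.
keys_only = sorted(charcount)

# Sorting the (character, count) items by the count keeps both values.
items = sorted(charcount.items(), key=lambda t: t[1])

lines = []
for char, count in items:
    # repr() makes invisible characters such as '\n' readable.
    lines.append("%-6s %d" % (repr(char), count))

print("\n".join(lines))
```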


0

Dictionaries (what {} means) are unordered collections, which means you can't sort them in place in any kind of meaningful way. I suggest storing the information as a list of tuples [(), ...] and then sorting that list.

foo = [('a', 123), ('b', 345)]

def key_function(x):
    return x[1]

# key must be passed by name; in Python 2 the second positional argument is cmp.
sorted_list = sorted(foo, key=key_function)
print sorted_list

As you can see, sorted takes an optional key parameter. The purpose of that parameter is to provide a function that tells sorted which value to sort each element by. All you're doing is breaking down each tuple in the list to return the value you want to order by, since by default tuples are compared starting with their first element (here, the character) rather than the count.

Make sense?

It can also be written like: print sorted(foo, key=lambda (x,y): y)

lambda just means an inline function with no name, and it allows you to break down the tuple in a different way.

You can see how this works by doing print [y for (x,y) in sorted_list]

You can even redefine the key function from before like this:

def key_function(x):
    x,y = x
    return y

BTW, I only put in the parentheses before for clarity. If you're not defining a function then the comma is the tuple constructor.

1 Comment

You should really call it key_function as it's not sorting, just returning the key.
0
sorted(charcount.items(), key=lambda item: item[1])

1 Comment

you would have to use charcount.items() instead of charcount
0

A dictionary is iterated by key, so you get a sorted list of keys when you pass the dictionary to sorted. Sort the dictionary's item tuples by value instead to get a sorted list of (character, count) tuples.

sorted_charcount = sorted(charcount.items(), key=lambda item: item[1])

If you're using Python 2.7+, then you can use the list of tuples to initialize an OrderedDict, which will maintain the sorted order of item tuples.
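A minimal sketch of the OrderedDict idea, assuming Python 2.7 or later. The sample dictionary is invented for illustration.

```python
from collections import OrderedDict

# Made-up sample counts standing in for the real charcount dictionary.
charcount = {'a': 1, '!': 5, '\n': 2}

# Sort the (character, count) tuples by count, lowest first.
sorted_charcount = sorted(charcount.items(), key=lambda item: item[1])

# An OrderedDict remembers insertion order, so it keeps the sorted order.
ordered = OrderedDict(sorted_charcount)

for char, count in ordered.items():
    print("%r: %d" % (char, count))
```

Lookups such as ordered['a'] still work as with a normal dictionary, but entries added later are appended at the end rather than placed in sorted position.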

3 Comments

He said "I want to sort the results of the character count into a nice looking list." Also, "You cannot sort a dictionary" is wrong, you can sort a dictionary (as you and every other answer shows), dictionaries just aren't sorted.
There are already several other answers giving this exact solution.
When I started writing my answer, the question was unanswered. Not much point in either deleting or keeping the answer, so I just kept it.
