I have a file, data.txt, that looks like this:

{'wood', 'iron', 'gold', 'silver'}
{'tungsten', 'iron', 'gold', 'timber'}

I want to get two types of result, as shown below:

#FIRST TYPE: sorted by item
gold: 33.3%
iron: 33.3%
silver: 16.7%
timber: 16.7%
tungsten: 16.7%

#SECOND TYPE: sorted by percentage
silver: 16.7%
timber: 16.7%
tungsten: 16.7%
gold: 33.3%
iron: 33.3%

Here is my code so far:

import collections
counter = collections.Counter()

keywords = []
with open("data.txt") as f:
     for line in f:
         if line.strip():
             for keyword in line.split(','):
                 keywords.append(keyword.strip())
     counter.update(keywords)

     for key in counter:
         print "%s: %.1f%s" %(key, (counter[key]*1.0 / len(counter))*100, '%')

However, my output looks like this:

'silver'}: 16.7%
'iron': 33.3%
....

I want to get rid of the curly brackets and apostrophes in the result.

How should I change or rewrite the code to get the result I want? Thanks in advance for your help!

3 Answers

2

Dictionaries, Counters and sets do not keep their items in sorted order. You must first convert the items to a list and sort that list.

For example:

for key, val in sorted(counter.items()):  #or with key=lambda x:x[0]
    print "%s: %.1f%s" % (key, float(val) * 100 / len(counter), "%")

Prints the values sorted by key, while:

for key, val in sorted(counter.items(), key=lambda x: (x[1], x[0])):
    print "%s: %.1f%s" % (key, float(val) * 100 / len(counter), "%")

sorts them by percentage (items with the same percentage are then sorted by name).
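
As a minimal, self-contained sketch of the two orderings: the counts below are hard-coded from the sample data rather than read from data.txt (an assumption made so the snippet runs on its own), and print() is written with parentheses so it also runs on Python 3.

```python
from collections import Counter

# Counts hard-coded from the two sample lines in the question (an assumption
# made so the sketch runs without data.txt).
counter = Counter({'gold': 2, 'iron': 2, 'wood': 1,
                   'silver': 1, 'tungsten': 1, 'timber': 1})

# Tuples compare element by element, so sorting the (key, count) pairs
# directly orders them by key.
by_name = sorted(counter.items())

# Returning (count, key) from the key function sorts by count first and
# breaks ties by name.
by_count = sorted(counter.items(), key=lambda x: (x[1], x[0]))

for key, val in by_name:
    print("%s: %.1f%%" % (key, float(val) * 100 / len(counter)))
print("")
for key, val in by_count:
    print("%s: %.1f%%" % (key, float(val) * 100 / len(counter)))
```

The percentage here is relative to len(counter), the number of distinct items, as in the question's code.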

Update

Regarding your parsing problem, you also have to strip the { and }:

for line in f:
    if line.strip():
        for keyword in line.strip().strip('{}').split(','):
            # drop the surrounding spaces and apostrophes
            keywords.append(keyword.strip(" '"))

If you are using Python 3.2 or later, you can use ast.literal_eval instead, which parses each line as a set literal (older versions, including 2.7, reject set literals):

import ast
...
for line in f:
    stripped = line.strip()
    if stripped:
        for keyword in ast.literal_eval(stripped):
            keywords.append(keyword)

Note however that this will remove duplicate keys on the same line! (From your example this seems okay...)

Otherwise you could do:

import ast
...
for line in f:
    stripped = line.strip()
    if stripped:
        # swap the braces for brackets so the line parses as a list
        for keyword in ast.literal_eval('[' + stripped[1:-1] + ']'):
            keywords.append(keyword)

This version preserves duplicates.
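
To make the difference concrete, here is a small sketch using a hypothetical line with a repeated item (not taken from the sample file). The set-literal form assumes Python 3.2 or later.

```python
import ast

# Hypothetical line with a repeated item, to make the difference visible.
line = "{'iron', 'gold', 'iron'}"

# Parsed as a set literal (needs Python 3.2+): the duplicate collapses.
as_set = ast.literal_eval(line)

# Swapping the braces for brackets makes it a list literal, which keeps
# every occurrence in order.
as_list = ast.literal_eval('[' + line[1:-1] + ']')

print(sorted(as_set))
print(as_list)
```

Feeding as_set into the counter would count 'iron' once for this line, while as_list would count it twice.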


5 Comments

Why a downvote? If the OP later modified his question you should have commented that my answer is now incomplete
key=lambda x: x[1], x[0] this method got error like this: for key, val in sorted(counter.items(), key=lambda x: x[0],x[1]): SyntaxError: non-keyword arg after keyword arg
@PrimingRyan Fixed. You have to put the brackets around the lambda expression: key=(lambda x: x[1], x[0]).
for key, val in sorted(counter.items(), key=(lambda x: x[1], x[0])): NameError: name 'x' is not defined - another error message was shown.
@PrimingRyan Ouch. The parentheses go inside the lambda, sorry: key=lambda x: (x[1], x[0]).
1

Use sorted to sort the items based on keys/percentage, because dicts don't have any order.

from collections import Counter

counter = Counter()
with open("data.txt") as f:
    for line in f:
        # strip whitespace and {}; skip blank lines
        line = line.strip().strip("{}")
        if not line:
            continue
        # split the line at ", " and drop the quotes around each item
        counter += Counter(x.strip("'\"") for x in line.split(", "))

le = len(counter)  # number of distinct items
for key, val in sorted(counter.items()):
    print "%s: %.1f%s" % (key, (val * 1.0 / le) * 100, '%')

print

for key, val in sorted(counter.items(), key=lambda x: (x[1], x[0])):
    print "%s: %.1f%s" % (key, (val * 1.0 / le) * 100, '%')

output:

gold: 33.3%
iron: 33.3%
silver: 16.7%
timber: 16.7%
tungsten: 16.7%
wood: 16.7%

silver: 16.7%
timber: 16.7%
tungsten: 16.7%
wood: 16.7%
gold: 33.3%
iron: 33.3%

1 Comment

@PrimingRyan It is 6 only.
1

The reason for the stray { and } is that you are not getting rid of them.
To do that just change your for loop to something like:

 for line in f:
     line = line.strip().strip('{}') # get rid of curly braces
     if line:
         ....
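
A short sketch of why the stray characters appear, using the first sample line as a hard-coded string (the variable names are illustrative only):

```python
raw = "{'wood', 'iron', 'gold', 'silver'}\n"

# Splitting on ',' alone leaves the braces and apostrophes attached
# to the tokens.
naive = [kw.strip() for kw in raw.split(',')]

# Strip whitespace, then the braces, then the spaces and apostrophes
# around each token.
clean = [kw.strip(" '") for kw in raw.strip().strip('{}').split(',')]

print(naive)
print(clean)
```

The first list still contains tokens such as 'silver'} with the brace attached, which is what shows up in the question's output; the second contains the bare names.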

As far as printing is concerned:

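# counter is the collections.Counter built while reading the file
print "Sorted by Percentage"
for k, v in sorted(counter.items(), key=lambda x: x[1]):
    # .1% gives one decimal place, e.g. 33.3%
    print '{0}: {1:.1%}'.format(k, float(v) / len(counter))
print
print "Sorted by Name"
for k, v in sorted(counter.items(), key=lambda x: x[0]):
    print '{0}: {1:.1%}'.format(k, float(v) / len(counter))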
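
The % presentation type in str.format multiplies the value by 100 and appends the percent sign, and the precision before it sets the number of decimals. A minimal sketch, where format_share is a hypothetical helper introduced only for illustration:

```python
def format_share(name, count, total):
    """Hypothetical helper: render 'name: NN.N%' for count out of total."""
    # The '%' presentation type multiplies by 100 and appends '%';
    # '.1' keeps one digit after the decimal point.
    return '{0}: {1:.1%}'.format(name, float(count) / total)


print(format_share('gold', 2, 6))
print(format_share('silver', 1, 6))
```

With a precision of .2 instead, the same values would be rendered with two decimals.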

1 Comment

This breaks if a line has only tabs(\t) in it. Use line = line.strip().strip('{}') to catch all the whitespace.
