
I have a large list of strings and I would like to create a dictionary from it.

Each distinct word should be a key, and its value should be the number of times that word occurs across all of the strings.

I am still new to Python and a bit lost. I assume I need a loop in which I would:

  1. Check whether the next word has already been seen
  2. Keep a count of how many times each word appears, stored in a dictionary

What if I use set() first to get all the unique words, and then loop through them and count the frequency of each?
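A minimal sketch of that set()-first idea, assuming the words have already been collected into one flat list named words (a hypothetical name), with a hypothetical helper count_with_set. It produces correct counts, but words.count(w) scans the whole list once for every unique word, so on a large list it does far more work than a single counting pass such as the loop or Counter approaches in the answers.

```python
def count_with_set(words):
    """Hypothetical helper: count words by collecting the unique ones with set().

    Correct, but words.count(w) rescans the entire list for each unique word,
    so the cost grows with len(set(words)) * len(words).
    """
    counts = {}
    for w in set(words):
        counts[w] = words.count(w)
    return counts


# Hypothetical sample taken from one of the token lists in the question.
words = [u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'is']
print(count_with_set(words)[u'is'])  # 2
```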

I would greatly appreciate any advice.

[u'retw', u'folivi_jochan', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc']
[u'retw', u'chr1sa', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc']
[u'retw', u'olutosinfashusi', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc']
[u'retw', u'shakycode', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc']
[u'an', u'interesting', u'read', u'manhattan', u'is', u'the', u'best', u'tv', u'show', u'that', u'hardly', u'anybody', u'is', u'watching', u'http', u':', u'//t.co/psfmauuwfg']
[u'tmr', u'am', u':', u'lunch', u'at', u'the', u'arts', u'!', u'from', u'11-2pm', u'at', u'1935', u'manhattan', u'beach', u'blvd', u'in', u'redondo', u'beach', u'!', u'map', u':', u'http', u':', u'//t.co/x6x2eeijbh']
[u's1', u'was', u'superb', u'.', u'``', u'manhattan', u'is', u'the', u'best', u'tv', u'show', u'that', u'hardly', u'anybody', u'is', u'watching', u"''", u'http', u':', u'//t.co/q6iazmtaam']
[u'taylor', u'swift', u'seen', u'leaving', u'msr', u'studios', u'in', u'manhattan', u'on', u'october', u'07', u',', u'2015', u'in', u'new', u'york', u',', u'new', u'york', u'.', u'http', u':', u'//t.co/3cwxrapr38']
[u'viva', u'a1054665', u'manhattan', u'acc', u'estimated', u'to', u'be', u'7', u'yrs', u'old', u'american', u'staff', u'mix', u',', u'white', u'/', u'brown', u',', u'spayed', u'female', u'...', u'http', u':', u'//t.co/sloopljyxq']
[u'#', u'3d', u'taevision', u"'showroom", u'in', u'the', u'night', u'#', u'porsche', u'996', u"'", u'#', u'automotive', u'#', u'fashion', u'#', u'makeup', u'#', u'ny', u'#', u'nyc', u'#', u'manhattan', u'http', u':', u'//t.co/eftvytqedk']

Thank you


2 Answers


For Python 2.7 and above, use Counter from the collections module:

from collections import Counter
mylist = [u'retw', u'folivi_jochan', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc', u'retw', u'chr1sa', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc', u'retw', u'olutosinfashusi', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of']
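c = Counter(mylist)
word_counts = dict(c)  # plain dict mapping each word to its count
print(c.most_common())  # (word, count) pairs, most frequent first; order among ties may vary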
[(u':', 8),
 (u'rt', 3), 
 (u'uber', 3), 
 (u'newsycombinator', 3), 
 (u'of', 3), 
 (u'is', 3), 
 (u'retw', 3), 
 (u'taking', 3), 
 (u'millions', 3), 
 (u'from', 2), 
 (u'//t.co/zluyq3f6cc', 2), 
 (u'manhattan', 2), 
 (u'away', 2),
 (u'http', 2),
 (u'taxis', 2), 
 (u'rides', 2),
 (u'olutosinfashusi', 1),
 (u'chr1sa', 1), 
 (u'folivi_jochan', 1)]

If you have several separate lists, use chain from itertools to count across all of them at once:

from itertools import chain
from collections import Counter

one, two, three = [u'retw', u'folivi_jochan', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc'], [u'retw', u'chr1sa', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of', u'manhattan', u'rides', u'away', u'from', u'taxis', u'http', u':', u'//t.co/zluyq3f6cc'], [u'retw', u'olutosinfashusi', u':', u'rt', u'newsycombinator', u':', u'uber', u'is', u'taking', u'millions', u'of']
# chain() yields the items of one, then two, then three, as a single stream
c = Counter(chain(one, two, three))

Counter is a dict subclass for counting hashable elements from an iterable, so the Counter itself already maps each word to its count; dict(c) converts it to a plain dict if you need one. Its most_common() method returns a list of (element, count) tuples sorted from most to least common.
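A short sketch of that behaviour, assuming the tweets are held in one list of token lists (the name tweets and its contents are hypothetical samples trimmed from the question's data). itertools.chain.from_iterable flattens any number of lists, which avoids having to name each list separately as in chain(one, two, three).

```python
from collections import Counter
from itertools import chain

# Hypothetical sample: one token list per tweet, like the lists in the question.
tweets = [
    [u'retw', u'chr1sa', u':', u'rt', u'uber', u'is', u'taking'],
    [u'manhattan', u'is', u'the', u'best', u'tv', u'show'],
    [u'taylor', u'swift', u'seen', u'in', u'manhattan'],
]

# chain.from_iterable flattens the outer list lazily, however many inner lists it has.
c = Counter(chain.from_iterable(tweets))

print(isinstance(c, dict))  # True: a Counter is already a dict subclass
print(c[u'manhattan'])      # 2
print(c[u'missing'])        # 0: unseen keys count as zero instead of raising KeyError
word_counts = dict(c)       # plain dict, only if one is really required
print(c.most_common(2))     # the two most frequent (word, count) pairs
```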


6 Comments

This will only get me the most common elements. I need a complete dictionary where each key is a unique word from the list of strings and each value is that word's frequency in the list of strings.
Awesome! For some reason, on the same data set I am getting: {'!': 2, ' ': 209, '#': 8, '"': 6, "'": 418, '-': 1 ... I cannot see the error, which I am sure I made somewhere. This is such a GREAT solution! Thanks!
Good answer. But what's the point of most_common()? Counter is already a subclass of dict. There's not much need to convert to dict at all; and if you want to you can do it directly: d = dict(c).
@SebastianWozny - I still cannot replicate it for some reason. I have tried everything I can think of, including chaining my 9 lists, which is not a very practical approach!
Can you edit your first post to show your input data exactly? It looks like you have no lists, just one huge string, so Counter counts the occurrences of the individual characters. Maybe you're missing a .split(" ")?
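A small illustration of the pitfall described in that comment, assuming one tweet is stored as a plain string (the variable text is hypothetical). Passing a string to Counter counts its characters; splitting it first counts words. str.split() with no argument splits on any run of whitespace and drops empty strings, which is usually safer than split(" ").

```python
from collections import Counter

# Hypothetical input: one tweet kept as a single string instead of a token list.
text = u"uber is taking millions of manhattan rides"

# Iterating over a string yields characters, so this counts letters and spaces.
char_counts = Counter(text)
print(char_counts[u' '])  # 6

# Splitting on whitespace first yields words, which is what the question wants.
word_counts = Counter(text.split())
print(word_counts[u'manhattan'])  # 1
```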

An alternative approach, using a for loop as you suggested:

counts = {}  # word -> number of occurrences
for word in strings:
    if word not in counts:  # first time this word is seen
        counts[word] = 1
    else:
        counts[word] += 1

The above assumes that strings is the list of words you want to iterate over.
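The same loop can be written without the if/else branch. A minimal sketch of two common variants, dict.get() with a default and collections.defaultdict(int), assuming strings is a flat list of words (the sample below is hypothetical, taken from one of the question's lists):

```python
from collections import defaultdict

# Hypothetical sample taken from one of the question's token lists.
strings = [u'manhattan', u'is', u'the', u'best', u'tv', u'show',
           u'that', u'hardly', u'anybody', u'is', u'watching']

# dict.get(key, 0) returns 0 for unseen words, so no membership test is needed.
counts = {}
for word in strings:
    counts[word] = counts.get(word, 0) + 1

# defaultdict(int) creates a 0 entry automatically the first time a word is used.
counts_dd = defaultdict(int)
for word in strings:
    counts_dd[word] += 1

print(counts[u'is'])              # 2
print(dict(counts_dd) == counts)  # True
```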

1 Comment

No need for .keys(). Just check for membership directly against the dict.
