7

I have a list of lists as follows.

sentences = [
    ["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"], 
    ["my", "favorite", "language", "is", "python"]
]

I want to get the count of each word in the sentences list. So, my output should look as follows.

{
    'stackoverflow': 1,
     'question': 1,
     'is': 2,
     'language': 1,
     'first': 1,
     'in': 1,
     'favorite': 2,
     'python': 1,
     'my': 3
}

I am currently doing it as follows.

frequency_input = [item for sublist in sentences for item in sublist]
frequency_output = dict(
    (x,frequency_input.count(x)) 
    for x in set(frequency_input)
)

However, it is not efficient at all for long lists. I have a really long list with about 1 million sentences. It has been running for two days and is still not finished.

So I would like to make my programme more efficient. The first line of my code (the flattening) is O(n), but the second line calls count, which is itself O(n), once for every distinct word, so it is O(n·m) for m distinct words and O(n^2) in the worst case. Please let me know if there is a more efficient way of doing this in Python. It would be really ideal if I could run it in less time than now. I am not worried about space complexity.

I am happy to provide more details if needed.

3 Answers

9

A simpler and more performant approach would be to flatten the lists using itertools.chain, and to count the strings with collections.Counter:

from collections import Counter
from itertools import chain

Counter(chain.from_iterable(sentences))

Counter({'my': 3,
         'first': 1,
         'question': 1,
         'in': 1,
         'stackoverflow': 1,
         'is': 2,
         'favorite': 2,
         'language': 1,
         'python': 1})
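
As a minimal sketch, the same idea can be wrapped in a helper and checked against the output requested in the question. The function name `count_words` and the `expected` dictionary are introduced here for illustration only; they are not part of the original answer.

```python
from collections import Counter
from itertools import chain


def count_words(sentences):
    """Count every word across all sentences in a single pass."""
    # chain.from_iterable yields the words lazily, so no flattened copy is built
    return Counter(chain.from_iterable(sentences))


sentences = [
    ["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"],
    ["my", "favorite", "language", "is", "python"],
]

word_counts = count_words(sentences)

# The output requested in the question, written as a plain dict
expected = {
    "stackoverflow": 1, "question": 1, "is": 2, "language": 1, "first": 1,
    "in": 1, "favorite": 2, "python": 1, "my": 3,
}

# Converting the Counter to a dict gives exactly the requested structure
print(dict(word_counts) == expected)  # True
```

Because a Counter is a dict subclass, the conversion to `dict` is only needed if a plain dictionary is specifically required.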

3 Comments

@yatu thank you for the great answer. I tried your code on my real dataset and it works like magic. It's definitely effective. I think it is O(n). Thank you very much :)
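@EmJ It would be O(n log m), where m is the number of distinct words to be counted, if the counts were stored in a tree-based map (where inserting is an O(log m) operation). Counter is a dict subclass backed by a hash table, so each insertion is O(1) on average and the whole count is O(n) on average.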
@TobySpeight thank you. It is interesting to know that :)
1

You can use the Counter class from the collections module.

If you want the word counts for each sentence separately, you can do the following:

from collections import Counter

sentences = [["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"], ["my", "favorite", "language", "is", "python"]]

counter_list = [dict(Counter(sentence)) for sentence in sentences]
print(counter_list)

Output:

[{'my': 2, 'first': 1, 'question': 1, 'in': 1, 'stackoverflow': 1, 'is': 1, 'favorite': 1}, {'my': 1, 'favorite': 1, 'language': 1, 'is': 1, 'python': 1}]

Or, if you want the total word counts, you can use chain from the itertools module.

import itertools
from collections import Counter

sentences = [["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"], ["my", "favorite", "language", "is", "python"]]

# Flatten lazily; no intermediate list is needed and sentences is left unchanged
words = itertools.chain.from_iterable(sentences)
word_counts = Counter(words)
print(word_counts)

Output:

Counter({'my': 3, 'is': 2, 'favorite': 2, 'first': 1, 'question': 1, 'in': 1, 'stackoverflow': 1, 'language': 1, 'python': 1})
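
If the sentences arrive one at a time (for example, when they are read from a file) rather than as one list in memory, a running total can be kept with `Counter.update`. This is only a sketch under that assumption; the variable name `total_counts` is illustrative.

```python
from collections import Counter

sentences = [
    ["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"],
    ["my", "favorite", "language", "is", "python"],
]

total_counts = Counter()
for sentence in sentences:
    # update adds the counts of this sentence's words to the running total
    total_counts.update(sentence)

# most_common(k) returns the k most frequent words with their counts
print(total_counts.most_common(1))  # [('my', 3)]
```

This gives the same totals as the chain-based version, and it fits naturally into a loop that produces one sentence per iteration.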

As for complexity: as the documentation states, Counter is a dict subclass for counting hashable objects, so constructing a Counter from an iterable of n items takes O(n) time on average.
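
To illustrate why the counting is linear, here is a rough sketch of single-pass counting with a plain dict. It is not Counter's actual implementation, only an equivalent loop, and the function name `count_with_dict` is hypothetical. Each word costs one hash-table lookup and one store, both O(1) on average, so n words cost O(n) on average in total.

```python
def count_with_dict(sentences):
    """Count words in one pass using a plain dict (illustrative only)."""
    counts = {}
    for sentence in sentences:
        for word in sentence:
            # One average O(1) lookup plus one average O(1) store per word
            counts[word] = counts.get(word, 0) + 1
    return counts


sentences = [
    ["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"],
    ["my", "favorite", "language", "is", "python"],
]

counts = count_with_dict(sentences)
print(counts["my"])  # 3
```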

0

sentences = [["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"], ["my", "favorite", "language", "is", "python"]]

combinedList = []

# Combine the list of word lists into a single list
def my_function(my_list):
    for sublist in my_list:
        combinedList.extend(sublist)
    print(combinedList)

my_function(sentences)

# Use the list's count method for each word
countData = {}

for word in combinedList:
    countData[word] = combinedList.count(word)

# countData now holds the count for each word
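
One thing to keep in mind is that `list.count` rescans the whole list on every call, so calling it once per word does work proportional to the square of the number of words. The sketch below compares that approach with `collections.Counter` on synthetic data. The data, the sizes, and the function names `count_with_list_count` and `count_with_counter` are assumptions made for illustration, and the timings will vary from machine to machine.

```python
import random
import timeit
from collections import Counter

random.seed(0)
# Hypothetical synthetic data: 3,000 words drawn from a 500-word vocabulary
vocabulary = [f"word{i}" for i in range(500)]
words = [random.choice(vocabulary) for _ in range(3000)]


def count_with_list_count(words):
    # list.count scans the entire list once per call
    return {word: words.count(word) for word in words}


def count_with_counter(words):
    # Counter makes a single pass over the words
    return dict(Counter(words))


# Both approaches produce the same dictionary
same_result = count_with_list_count(words) == count_with_counter(words)
print(same_result)  # True

slow = timeit.timeit(lambda: count_with_list_count(words), number=1)
fast = timeit.timeit(lambda: count_with_counter(words), number=1)
print(f"list.count: {slow:.4f} s, Counter: {fast:.4f} s")
```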

1 Comment

The approach here is to flatten the nested lists into a single list, then simply use the count method to get the number of occurrences of each word.
