Count strings in nested list

Question

I have a list of lists as follows.

sentences = [
    ["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"], 
    ["my", "favorite", "language", "is", "python"]
]

I want to get the count of each word in the sentences list. So, my output should look as follows.

{
    'stackoverflow': 1,
     'question': 1,
     'is': 2,
     'language': 1,
     'first': 1,
     'in': 1,
     'favorite': 2,
     'python': 1,
     'my': 3
}

I am currently doing it as follows.

frequency_input = [item for sublist in sentences for item in sublist]
frequency_output = dict(
    (x,frequency_input.count(x)) 
    for x in set(frequency_input)
)

However, this is not efficient at all for long lists. I have a really long list with about 1 million sentences. The code has been running for two days and still has not finished.

I would therefore like to make my program more efficient. My current first line of the code is O(n^2) and my second line is O(n). Please let me know if there is a more efficient way of doing this in Python. It would be really ideal if I could run it in less time than now. I am not worried about space complexity.

I am happy to provide more details if needed.

yatu · Accepted Answer · 2019-09-06 07:57:03Z

9

A simpler and more performant approach would be to flatten the lists using itertools.chain, and to count the strings with collections.Counter:

from collections import Counter
from itertools import chain

Counter(chain.from_iterable(sentences))

Counter({'my': 3,
         'first': 1,
         'question': 1,
         'in': 1,
         'stackoverflow': 1,
         'is': 2,
         'favorite': 2,
         'language': 1,
         'python': 1})
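
If a plain dictionary like the one in the question is needed, the result can be converted, since Counter is a dict subclass. The following is only a minimal sketch; the names word_counts and top_word are chosen here for illustration, and frequency_output is borrowed from the question.

```python
from collections import Counter
from itertools import chain

sentences = [
    ["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"],
    ["my", "favorite", "language", "is", "python"],
]

# Count every word across all sentences in a single pass.
word_counts = Counter(chain.from_iterable(sentences))

# Counter is a dict subclass, so dict() yields a plain dictionary.
frequency_output = dict(word_counts)

# most_common(n) returns the n highest counts as (word, count) pairs.
top_word = word_counts.most_common(1)
```

Because chain.from_iterable is lazy, no intermediate flattened list is built before counting.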

answered Sep 6, 2019 at 7:57

yatu


3 Comments

EmJ Over a year ago

@yatu thank you for the great answer. I tried your code on my real dataset and it works like magic. It's definitely effective. I think it is O(n). Thank you very much :)

Toby Speight Over a year ago

@EmJ It's actually O(n log m), where m is the number of distinct words to be counted (inserting into a set is a O(log n) operation).

EmJ Over a year ago

@TobySpeight thank you. It is interesting to know that :)

EmreAydin · Accepted Answer · 2019-09-06 08:34:32Z

You can use the Counter class from the collections module.

If you want the word counts for each sentence separately, you can do the following:

from collections import Counter

sentences = [["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"], ["my", "favorite", "language", "is", "python"]]

counter_list = [dict(Counter(sentence)) for sentence in sentences]
print(counter_list)

Output:

[{'my': 2, 'first': 1, 'question': 1, 'in': 1, 'stackoverflow': 1, 'is': 1, 'favorite': 1}, {'my': 1, 'favorite': 1, 'language': 1, 'is': 1, 'python': 1}]

Or, if you want the total word counts, you can use the chain.from_iterable method from the itertools module.

import itertools
from collections import Counter

sentences = [["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"], ["my", "favorite", "language", "is", "python"]]

sentences = list(itertools.chain.from_iterable(sentences))
word_counts = Counter(sentences)
print(word_counts)

Output:

Counter({'my': 3, 'is': 2, 'favorite': 2, 'first': 1, 'question': 1, 'in': 1, 'stackoverflow': 1, 'language': 1, 'python': 1})

Regarding complexity: as the documentation states, Counter is a dict subclass for counting hashable objects, so constructing a Counter from an iterable of n items has a time complexity of O(n).
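
For a very large input, the totals can probably also be accumulated one sentence at a time, without building a flattened list first. This is a minimal sketch that assumes the same sentences structure as above; Counter.update adds the counts of an iterable to the existing totals, and word_counts is just an illustrative name.

```python
from collections import Counter

sentences = [
    ["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"],
    ["my", "favorite", "language", "is", "python"],
]

word_counts = Counter()
for sentence in sentences:
    # update() adds each word's occurrences to the running totals
    # instead of replacing them, so every word is visited exactly once.
    word_counts.update(sentence)

print(word_counts)
```

The loop also works when sentences is a generator, for example one that reads sentences from a file lazily.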

Sabin Yadav · Accepted Answer · 2019-09-06 08:52:29Z

0

sentences = [["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"], ["my", "favorite", "language", "is", "python"]]

# Combine the list of word lists into a single flat list.
combinedList = []

def my_function(my_list):
    for sublist in my_list:
        combinedList.extend(sublist)
    print(combinedList)

my_function(sentences)

# Use the list's count method for each word.
countData = {}

for word in combinedList:
    countData[word] = combinedList.count(word)

# countData now holds the count for each word.
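
Note that calling count inside the loop rescans the whole list for every word, which is likely to be slow on the 1 million sentences mentioned in the question. A single-pass variant with a plain dictionary might look like the sketch below; it is not part of the original answer, and count_data is an illustrative name.

```python
sentences = [
    ["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"],
    ["my", "favorite", "language", "is", "python"],
]

count_data = {}
for sentence in sentences:
    for word in sentence:
        # dict.get returns 0 the first time a word is seen,
        # so each word costs one lookup and one assignment.
        count_data[word] = count_data.get(word, 0) + 1

print(count_data)
```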

edited Sep 6, 2019 at 8:52

answered Sep 6, 2019 at 8:36

Sabin Yadav


1 Comment

Sabin Yadav Over a year ago

The approach here is to flatten the nested list, then simply use the count method to get the number of occurrences of each word.
