I have a list of lists as follows.
sentences = [
    ["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"],
    ["my", "favorite", "language", "is", "python"]
]
I want to get the count of each word in the sentences list. So, my output should look as follows.
{
    'stackoverflow': 1,
    'question': 1,
    'is': 2,
    'language': 1,
    'first': 1,
    'in': 1,
    'favorite': 2,
    'python': 1,
    'my': 3
}
I am currently doing it as follows.
frequency_input = [item for sublist in sentences for item in sublist]
frequency_output = dict(
    (x, frequency_input.count(x))
    for x in set(frequency_input)
)
However, this is not efficient at all for long lists. My real list contains about 1 million sentences, and the code has been running for two days without finishing.
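I would therefore like to make my program more efficient. As far as I can tell, the first line of my code (the flattening) is O(n) in the total number of words, while the second statement calls count() once per unique word, so it is O(n * u) for u unique words, which approaches O(n^2) when most words are distinct. Please let me know if there is a more efficient way of doing this in Python. It would be really ideal if it could run in less time than it does now. I am not worried about space complexity.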
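To make clear what I am after: something that counts every word in a single pass over the data. A minimal sketch of that idea, assuming collections.Counter and itertools.chain from the standard library are acceptable here (I have not timed this on my full data set), might look like this:

```python
from collections import Counter
from itertools import chain

sentences = [
    ["my", "first", "question", "in", "stackoverflow", "is", "my", "favorite"],
    ["my", "favorite", "language", "is", "python"],
]

# chain.from_iterable walks the inner lists lazily, so no flattened
# copy of the data is built before counting.
# Counter increments one dictionary entry per word, so each word is
# visited exactly once.
frequency_output = Counter(chain.from_iterable(sentences))

# Counter is a dict subclass; convert only if a plain dict is required.
print(dict(frequency_output))
```

Since each word is looked at once, this should scale roughly linearly with the total number of words rather than with the number of words times the number of unique words.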
I am happy to provide more details if needed.