Count frequency of words and string

Question

I need to count the number of times each word occurs in a sentence. I do it with

word_matrix[i][j] = sentences[i].count([*words_dict][j])

But it also counts a word when it is contained in another word; for example, 'in' is contained in 'interactive'. How can I avoid this?
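A minimal sketch of the behaviour described above, using a made-up sentence (the question does not show its data). str.count counts substrings, while comparing whole tokens from split() does not:

```python
# Hypothetical sample sentence, not taken from the question.
sentence = "an interactive demo in python"

# str.count looks for substrings, so the 'in' at the start of 'interactive' is counted too.
substring_hits = sentence.count("in")

# Splitting first and counting in the resulting list compares whole tokens only.
whole_word_hits = sentence.split().count("in")

print(substring_hits, whole_word_hits)  # 2 1
```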

Please provide full code together with sample data. Most probably you're doing it in an inefficient way.
– Slam, Commented Feb 11, 2019 at 13:17
word_matrix = np.zeros(shape=(n, d))
for i in range(n):
    for j in range(d):
        word_matrix[i][j] = sentences[i].count([*words_dict][j])
– Debra, Commented Feb 11, 2019 at 13:34
I am trying to get a matrix where element [i][j] is the number of times word j occurs in sentence i.
– Debra, Commented Feb 11, 2019 at 13:36

yatu · Accepted Answer · 2019-02-11 13:20:22Z

1

You could use collections.Counter for this:

from collections import Counter
s = 'This is a sentence'

Counter(s.lower().split())

# Counter({'this': 1, 'is': 1, 'a': 1, 'sentence': 1})
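Applied to the matrix from the question, the same idea could look like the sketch below. The contents of sentences and words_dict are assumptions, since the question does not show them, and plain nested lists stand in for the NumPy array to keep the sketch dependency-free:

```python
from collections import Counter

# Hypothetical sample data; the question does not show its sentences or words_dict.
sentences = ["in an interactive session", "the cat sat in the hat"]
words_dict = {"in": 0, "the": 1, "interactive": 2}

vocab = list(words_dict)  # fix the column order once instead of rebuilding it per cell

word_matrix = []
for sentence in sentences:
    counts = Counter(sentence.lower().split())  # whole-word counts for this sentence
    # A Counter returns 0 for words it has not seen, so no membership check is needed.
    word_matrix.append([counts[word] for word in vocab])

print(word_matrix)  # [[1, 0, 1], [1, 2, 0]]
```

Each sentence is split only once here, rather than being scanned once per vocabulary word. If a NumPy array is needed, np.array(word_matrix) would convert the result.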


6 Comments

hhaefliger Over a year ago

I don't think Counter is the most efficient way to do this.

yatu Over a year ago

No, it's not, if the purpose is only to count the number of words, which is a trivial task. As my answer shows, I understood the question as asking how many times each word occurs. I might have misunderstood.

hhaefliger Over a year ago

It is more efficient to use len() on the list returned by split(), as both are built in and no import is required.

yatu Over a year ago

Yes, I'm aware of that. As I've already stated, the purpose of using Counter is not to count how many words there are, but rather how many times each word occurs.

yatu Over a year ago

And from what the OP has posted, I suspect that is what they want. Again, I might be wrong. So if you downvoted me because you thought I was trying to obtain the same result as your solution, I'll point out that the downvote is unjustified: the question is ambiguous, and I clearly interpreted it differently than you did.


hhaefliger · Accepted Answer · 2019-02-11 13:32:07Z

0

You can just do this:

sentence = 'this is a test sentence'
word_count = len(sentence.split(' '))

In this case, word_count would be 5.
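One detail worth checking: split(' ') and split() can give different counts when the text contains repeated or trailing spaces. A small sketch with a made-up sentence:

```python
# Hypothetical input with a double space and a trailing space.
sentence = "this  is a test sentence "

by_space = sentence.split(" ")    # splits at every single space, keeping empty strings
by_whitespace = sentence.split()  # splits on runs of whitespace, dropping empty strings

print(len(by_space))       # 7
print(len(by_whitespace))  # 5
```

With no argument, split() is usually the safer choice for counting words.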


Pradeep Pandey · Accepted Answer · 2019-02-11 13:50:31Z

0

Use split to tokenise the text into words. Then, for each word, increment its count if it is already in the dict; otherwise add it with a count of one:

paragraph = 'Nory was a Catholic because her mother was a Catholic, and Nory’s mother was a Catholic because her father was a Catholic, and her father was a Catholic because his mother was a Catholic, or had been'
words = paragraph.split()  # split on whitespace
word_count = {}
for word in words:
    if word in word_count:
        word_count[word] += 1  # seen before: increment
    else:
        word_count[word] = 1   # first occurrence

print(word_count)
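Note that split() leaves punctuation attached, so 'Catholic' and 'Catholic,' end up as separate keys. A possible variant, sketched here on a shortened, made-up sentence, strips leading and trailing punctuation first. Lowercasing is an extra assumption and can be dropped if case matters:

```python
import string

# Shortened, hypothetical sample text for illustration.
text = "Nory was a Catholic because her mother was a Catholic, and so was her father."

word_count = {}
for token in text.split():
    # Remove punctuation from both ends of the token, then normalise case.
    word = token.strip(string.punctuation).lower()
    if not word:
        continue  # the token was punctuation only
    word_count[word] = word_count.get(word, 0) + 1

print(word_count["catholic"])  # 2
print(word_count["was"])       # 3
```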


Gsk · Accepted Answer · 2019-02-11 14:06:48Z

0

Depending on the situation, the most efficient solution would be collections.Counter, but it treats a word with attached punctuation as a different word:
in will be different from interactive (as you want), but it will also be different from in:.
An alternative that handles this is to count the matches of a regular expression:

import re

word = re.escape([*words_dict][j])  # escape any regex metacharacters in the word
my_count = re.findall(r"(?:\s|^)({0})(?=[\s.,;:]|$)".format(word), sentences[i])
print(len(my_count))

What is the RegEx doing?
For a given word, you match:
the same word preceded by whitespace or the start of the string (\s|^)
and followed by whitespace, a dot, comma, semicolon, colon, or the end of the string (?=[\s.,;:]|$). The $ sits outside the square brackets because inside them it would only match a literal dollar sign, and the lookahead (?=...) does not consume the following space, so a repeated word right after it can still be matched.
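A shorter pattern that is often used for the same purpose relies on the word-boundary anchor \b. The sketch below is an alternative to the pattern above, not the answer's own method; count_word is a hypothetical helper name, and the case-insensitive flag is an assumption:

```python
import re

def count_word(word, sentence):
    """Count whole-word, case-insensitive occurrences of `word` in `sentence`."""
    # re.escape keeps characters such as '.' in the word from acting as regex syntax.
    pattern = r"\b{}\b".format(re.escape(word))
    return len(re.findall(pattern, sentence, flags=re.IGNORECASE))

print(count_word("in", "In an interactive demo, type in: in."))  # 3
```

Because \b treats any non-word character as a boundary, this would also count the 'in' of a hyphenated word such as 'built-in', and it may not behave as expected for words that begin or end with a non-word character.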

edited Feb 11, 2019 at 14:06

answered Feb 11, 2019 at 13:46
