1

I need to count the number of words in a sentence. I do it with

word_matrix[i][j] = sentences[i].count([*words_dict][j])

But it also counts a word when it is contained in another word; for example, 'in' is counted inside 'interactive'. How can I avoid this?

3
  • Please provide the full code together with sample data. Most probably you're doing it in an inefficient way. Commented Feb 11, 2019 at 13:17
  • word_matrix = np.zeros(shape=(n, d)) for i in range(n): for j in range(d): word_matrix[i][j] = sentences[i].count([*words_dict][j]) Commented Feb 11, 2019 at 13:34
  • I am trying to get a matrix where element [i][j] is the number of occurrences of word j in sentence i. Commented Feb 11, 2019 at 13:36

4 Answers

1

You could use collections.Counter for this:

from collections import Counter
s = 'This is a sentence'

Counter(s.lower().split())

# Counter({'this': 1, 'is': 1, 'a': 1, 'sentence': 1})
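To get the per-sentence matrix the question asks for, the same idea can be applied once per sentence and then looked up for each vocabulary word. This is only a minimal sketch: it assumes sentences is a list of strings and words_dict is a dict whose keys are the vocabulary, and the sample values below are made up.

```python
from collections import Counter

# Hypothetical sample data; the question does not show the real values.
sentences = ["in an interactive session", "the cat sat in the hat in the hall"]
words_dict = {"in": 0, "the": 1, "interactive": 2}

vocab = list(words_dict)  # same order as [*words_dict]

# One Counter per sentence, built from whole whitespace-separated tokens,
# so 'in' is never counted inside 'interactive'.
counts = [Counter(s.lower().split()) for s in sentences]

# word_matrix[i][j] = occurrences of vocab[j] as a whole token in sentences[i]
word_matrix = [[c[w] for w in vocab] for c in counts]

print(word_matrix)
```

A Counter returns 0 for a missing key, so no membership check is needed. Tokens with punctuation attached (such as 'in:') are still counted as separate words here.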

6 Comments

I don't think Counter is the most efficient way to do this.
No, it's not, if the purpose is only to count the number of words, which is a very trivial task. From what I've posted, I've obviously understood counting in the sense of per-word counts. I might have misunderstood.
It is more efficient to use len() on the list returned by the split() function, as this is a built-in function and no import is required.
Yes, I'm aware of that. As I've already stated, the purpose of using Counter is not to count how many words there are, but rather how many times each word occurs.
And from what OP has posted, I suspect that is what he wants. Again, I might be wrong. So if you downvoted me because you thought my attempt was to obtain the same as your solution, I'll point out that your downvote is unjustified, as the question is ambiguous and I clearly interpreted it differently than you did.
0

You can just do this:

sentence = 'this is a test sentence'
word_count = len(sentence.split(' '))

In this case, word_count would be 5.
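One caveat, shown as a small sketch with a made-up string: split(' ') splits on every single space, so repeated spaces produce empty strings that inflate the count, whereas split() with no argument splits on runs of whitespace.

```python
sentence = 'this  is a   test sentence'  # note the repeated spaces

count_single_space = len(sentence.split(' '))  # empty strings are included
count_whitespace = len(sentence.split())       # runs of whitespace collapse

print(count_single_space, count_whitespace)
```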


0

Use split to tokenise the sentence into words. Then, if a word already exists in the dict, increment its value by one; otherwise add the word with a count of one:

paragraph='Nory was a Catholic because her mother was a Catholic, and Nory’s mother was a Catholic because her father was a Catholic, and her father was a Catholic because his mother was a Catholic, or had been' 
words = paragraph.split()
word_count = {}
for word in words:
    # Increment the count for a word already seen, otherwise start it at one
    if word in word_count:
        word_count[word] += 1
    else:
        word_count[word] = 1

print(word_count)
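Note that the loop above counts 'Catholic,' (with the trailing comma) and 'Catholic' as different keys. A possible variant, assuming it is acceptable to strip leading and trailing punctuation from each token, uses dict.get to avoid the if/else. The paragraph here is a shortened sample.

```python
import string

# Shortened sample text; one 'Catholic' carries a trailing comma.
paragraph = ('Nory was a Catholic because her mother was a Catholic, '
             'and her father was a Catholic')

word_count = {}
for token in paragraph.split():
    # Drop punctuation at the edges of the token, e.g. 'Catholic,' -> 'Catholic'
    word = token.strip(string.punctuation)
    if not word:
        continue  # the token was punctuation only
    # get() returns 0 for a word not seen yet
    word_count[word] = word_count.get(word, 0) + 1

print(word_count['Catholic'])
```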


0

Depending on the situation, the most efficient solution would be collections.Counter, but it treats a word with attached punctuation as a different word:
i.e. in will be different from interactive (as you want), but it will also be different from in:.
An alternative that handles this problem is to count the matches of a RegEx:

import re

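word = re.escape([*words_dict][j])
my_count = re.findall(r"(?<!\S){0}(?=[\s.,;:]|$)".format(word), sentences[i])
print(len(my_count))

What is the RegEx doing?
For a given word, you match:
the same word preceded by whitespace or the start of the string, (?<!\S)
and followed by whitespace, a dot, comma, semicolon, colon, or the end of the string, (?=[\s.,;:]|$)
Both checks are lookarounds, so they do not consume the delimiter and two adjacent occurrences are both counted. The end-of-string $ has to sit outside the square brackets, because inside a character class it only matches a literal dollar sign. re.escape keeps any special characters in the word from being read as RegEx syntax.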
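Putting this together for the matrix in the question, here is a rough sketch that swaps in the word-boundary anchor \b as a simpler alternative to listing the delimiters explicitly. The sample sentences and words_dict are made up, and count_whole_word is a hypothetical helper name.

```python
import re

# Hypothetical sample data; the question does not show the real values.
sentences = ["in: an interactive session", "log in, then stay in"]
words_dict = {"in": 0, "interactive": 1}

def count_whole_word(word, text):
    """Count occurrences of word in text that are not part of a longer word."""
    # re.escape guards against RegEx metacharacters inside the word itself
    pattern = r"\b{0}\b".format(re.escape(word))
    return len(re.findall(pattern, text))

vocab = [*words_dict]

# word_matrix[i][j] = whole-word occurrences of vocab[j] in sentences[i]
word_matrix = [[count_whole_word(w, s) for w in vocab] for s in sentences]

print(word_matrix)
```

Keep in mind that \b treats letters, digits, and the underscore as word characters, so 'in' would still be counted inside a hyphenated token such as 'log-in'.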

