I want to find the frequencies of the specific words listed in wanted. The code does find those frequencies, but the displayed result also contains a lot of unnecessary data.

Code:

from collections import Counter
import re
wanted = "whereby also thus"
cnt = Counter()
words = re.findall('\w+', open('C:/Users/user/desktop/text.txt').read().lower())
for word in words:
    if word in wanted:
        cnt[word] += 1
print (cnt)

Results:

Counter({'e': 131, 'a': 119, 'by': 38, 'where': 16, 's': 14, 'also': 13, 'he': 4, 'whereby': 2, 'al': 2, 'b': 2, 'o': 1, 't': 1})

Questions:

  1. How do I omit all those 'e', 'a', 'by', 'where', etc.?
  2. If I then wanted to sum up the frequencies of words (also, thus, whereby) and divide them by total number of words in text, would that be possible?

Disclaimer: this is not a school assignment. I just have a lot of free time at work right now, and since I spend much of it reading texts, I decided to do this little project to remind myself of what I was taught a couple of years ago.

Thanks in advance for any help.

3 Comments
  • Please properly indent your code. Commented Dec 26, 2017 at 14:00
  • The problem is that word in wanted checks to see if the word can be found anywhere in the string. You can find "t" in there at the start of "thus", and the same for the other words. Try wanted = ["whereby", "also", "thus"] instead. Commented Dec 26, 2017 at 14:05
  • yes, thank you, that helped a lot. Commented Dec 26, 2017 at 14:16
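
The difference described in the comment above can be seen in a minimal sketch. The names wanted_str and wanted_list are only illustrative and do not appear in the original code:

```python
# "in" on a string is a substring test; "in" on a list compares whole items.
wanted_str = "whereby also thus"
wanted_list = wanted_str.split()  # ['whereby', 'also', 'thus']

in_string = "by" in wanted_str      # True: "by" occurs inside "whereby"
in_list = "by" in wanted_list       # False: no list item equals "by"
whole_word = "thus" in wanted_list  # True: exact match with a list item

print(in_string, in_list, whole_word)
```

This is why fragments such as 'e', 'a' and 'by' were counted: each of them is a substring of the original wanted string.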

2 Answers

As others have pointed out, you need to change wanted from a string to a list. I hardcoded a list, but you could use str.split(" ") if you were passed a string in a function. I also implemented the frequency calculation for you. As a side note, make sure you close your files; it's easier (and recommended) to open them in a with statement, which closes the file for you.

from collections import Counter
import re
wanted = ["whereby", "also", "thus"]
cnt = Counter()
with open('C:/Users/user/desktop/text.txt', 'r') as fp:
    fp_contents = fp.read().lower()
words = re.findall(r'\w+', fp_contents)
for word in words:
    if word in wanted:
        cnt[word] += 1
print (cnt)

total_cnt = sum(cnt.values())

print(float(total_cnt)/len(cnt))
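
Note that len(cnt) is the number of distinct wanted words that were found, so the last line prints the average count per matched word. The second question asks for the share of the whole text instead, which means dividing by the total number of words. A minimal sketch of that, assuming Python 3 and using a short made-up sample string in place of the file (sample, total_wanted and share are illustrative names):

```python
import re
from collections import Counter

wanted = {"whereby", "also", "thus"}

# Hypothetical sample text standing in for the contents of the file.
sample = "Thus it began. It was also late, and thus they left."
words = re.findall(r"\w+", sample.lower())

# Count only the words that are in the wanted set.
cnt = Counter(w for w in words if w in wanted)

total_wanted = sum(cnt.values())
# Guard against an empty text to avoid dividing by zero.
share = total_wanted / len(words) if words else 0.0

print(cnt)
print(share)
```

Here the three wanted occurrences are divided by all eleven words of the sample, not by the number of distinct matches.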

3 Comments

Appreciated, all it took was to change a string to a list. Big thanks!
I think you mean float(total_cnt)/len(cnt) instead of float(total_cnt)/len(fp_contents). Also, you could add parentheses to that print statement for Python 3 compatibility.
@anonymoose yes, you're completely right, I made the changes. I'm still stuck on Python 2.7* ;)

Reading from the web

I made this small modification of Axel's code so that it reads a text file from the web (Alice in Wonderland), since I don't have your text file and wanted to try the code. I'm posting it here in case someone needs something like this.

from collections import Counter
import re
from urllib.request import urlopen
testo = str(urlopen("https://www.gutenberg.org/files/11/11.txt").read())
wanted = ["whereby", "also", "thus", "Alice", "down", "up", "cup"]
cnt = Counter()
words = re.findall(r'\w+', testo)
for word in words:
    if word in wanted:
        cnt[word] += 1
print(cnt)

total_cnt = sum(cnt.values())

print(float(total_cnt) / len(cnt))

output

Counter({'Alice': 334, 'up': 97, 'down': 90, 'also': 4, 'cup': 2})
105.4
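
One caveat about the code above: in Python 3, urlopen(...).read() returns bytes, and calling str() on bytes yields the printable representation (with a b'...' wrapper and escaped line breaks) rather than the text itself. Words that start a line can then be glued to the escape letters, so the counts may be slightly off. A minimal sketch of the effect, using a short made-up bytes value and assuming the data is UTF-8 encoded:

```python
import re

# Hypothetical bytes, similar to what urlopen(...).read() returns.
raw = b"down the hole.\r\nAlice fell"

as_repr = str(raw)             # "b'down the hole.\\r\\nAlice fell'"
as_text = raw.decode("utf-8")  # real text with an actual line break

words_repr = re.findall(r"\w+", as_repr)
words_text = re.findall(r"\w+", as_text)

print(words_repr)  # ['b', 'down', 'the', 'hole', 'r', 'nAlice', 'fell']
print(words_text)  # ['down', 'the', 'hole', 'Alice', 'fell']
```

Decoding the bytes before tokenizing avoids the stray 'b', 'r' and 'nAlice' tokens.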

How many times the same word is found in adjacent sentences

This answers the request (from the author of the question) to count how many times a word is found in adjacent sentences. If a word occurs several times in one sentence (e.g. 'had') and also appears in the next sentence, I count that as one repetition. That is why I used the wordfound list.

from collections import Counter
import re


testo = """There was nothing so VERY remarkable in that; nor did Alice think it so? Thanks VERY much. Out of the way to hear the Rabbit say to itself, 'Oh dear! Oh dear! I shall be late!' (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed. Quite natural); but when the Rabbit actually TOOK A WATCH OUT OF ITS? WAISTCOAT-POCKET, and looked at it, and then hurried on.
Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit. with either a waistcoat-pocket, or a watch to take out of it! and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop? Down a large rabbit-hole under the hedge.
Alice opened the door and found that it led into a small passage, not much larger than a rat-hole: she knelt down and looked along the passage into the loveliest garden you ever saw. How she longed to get out of that dark hall, and wander about among those beds of bright flowers and those cool fountains, but she could not even get her head through the doorway; 'and even if my head would go through,' thought poor Alice, 'it would be of very little use without my shoulders. Oh, how I wish I could shut up like a telescope! I think I could, if I only knew how to begin.'For, you see, so many out-of-the-way things had happened lately, that Alice had begun to think that very few things indeed were really impossible. There seemed to be no use in waiting by the little door, so she went back to the table, half hoping she might find another key on it, or at any rate a book of rules for shutting people up like telescopes: this time she found a little bottle on it, ('which certainly was not here before,' said Alice,) and round the neck of the bottle was a paper label, with the words 'DRINK ME' beautifully printed on it in large letters. It was all very well to say 'Drink me,' but the wise little Alice was not going to do THAT in a hurry. 'No, I'll look first,' she said, 'and see whether it's marked "poison" or not'; for she had read several nice little histories about children who had got burnt, and eaten up by wild beasts and other unpleasant things, all because they WOULD not remember the simple rules their friends had taught them: such as, that a red-hot poker will burn you if you hold it too long; and that if you cut your finger VERY deeply with a knife, it usually bleeds; and she had never forgotten that, if you drink much from a bottle marked 'poison,' it is almost certain to disagree with you, sooner or later. 
However, this bottle was NOT marked 'poison,' so Alice ventured to taste it, and finding it very nice, (it had, in fact, a sort of mixed flavour of cherry-tart, custard, pine-apple, roast turkey, toffee, and hot buttered toast,) she very soon finished it off. """


frasi = re.findall(r"[A-Z].*?[\.!?]", testo, re.MULTILINE | re.DOTALL)

print("How many times this words are repeated in adjacent sentences:")
cnt2 = Counter()
for n, s in enumerate(frasi):
    words = re.findall(r"\w+", s)
    wordfound = []
    for word in words:
        try:
            if word in frasi[n + 1]:
                wordfound.append(word)
                if wordfound.count(word) < 2:
                    cnt2[word] += 1
        except IndexError:
            pass
for k, v in cnt2.items():
    print(k, v)

output

How many times this words are repeated in adjacent sentences:
had 1
hole 1
or 1
as 1
little 2
that 1
hot 1
large 1
it 5
to 5
a 6
not 3
and 2
s 1
me 1
bottle 1
is 1
no 1
the 6
how 1
Oh 1
she 2
at 1
marked 1
think 1
VERY 1
I 2
door 1
red 1
of 1
dear 1
see 1
could 2
in 2
so 1
was 1
poison 1
A 1
Alice 3
all 1
nice 1
rabbit 1
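
Note that word in frasi[n + 1] is a substring test against the text of the next sentence, which is why one-letter entries such as 'a' and 's' show up in the output above. A possible variant, sketched here on a short made-up sample, compares sets of whole words for each pair of neighbouring sentences. The simplified sentence pattern and the lowercasing are assumptions of this sketch, not part of the code above:

```python
import re
from collections import Counter

# Hypothetical sample text; each sentence ends with '.', '!' or '?'.
sample = "The cat sat. The dog ran! A bird sang? A bird flew."

# Simplified sentence splitting: text up to and including an end mark.
sentences = re.findall(r"[^.!?]+[.!?]", sample)

# One set of lowercased whole words per sentence.
word_sets = [set(re.findall(r"\w+", s.lower())) for s in sentences]

shared = Counter()
# Compare sentence 1 with 2, 2 with 3, and so on.
for current, following in zip(word_sets, word_sets[1:]):
    # The set intersection keeps only words present in both sentences,
    # so each word counts at most once per adjacent pair.
    shared.update(current & following)

print(shared)
```

Using sets also removes the need for a helper list like wordfound, because a set holds each word only once.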

7 Comments

Cool feature, thanks! While doing this I wondered whether I can check sentence 1 and 2 to find repeating words, if there are none, check 2 and 3, 3 and 4 and so on. I know how to find similar words across the file, but I fail to come up with the steps to narrow the search to neighbouring sentences.
Can you explain better what you want to do? You want to know if there are similar words in 2 sentences that are adjacent or near?
Yes, I want to know whether there are similar words in 2 adjacent sentences (not 2 specific sentences, but across the whole text: it checks sentences 1 and 2, finds no similarities, then checks 2 and 3, and so on. If there is a match, the counter gets +1 and it goes on). Hope that sounds clearer.
And you want to know if there are specific words that are similar or any words ... ?
I want to look for any similar words