
My code is behaving strangely, and I suspect it has to do with the regular expressions I'm using.

I'm trying to determine the total number of words, the number of unique words, and the number of sentences in a text file.

Here is my code:

import sys
import re

file = open('sample.txt', 'r')


def word_count(file):
    words = []
    reg_ex = r"[A-Za-z0-9']+"
    p = re.compile(reg_ex)
    for l in file:
        for i in p.findall(l):
            words.append(i)
    return len(words), len(set(words))

def sentence_count(file):
    sentences = []
    reg_ex = r'[a-zA-Z0-9][.!?]'
    p = re.compile(reg_ex)
    for l in file: 
        for i in p.findall(l):
            sentences.append(i)
    return sentences, len(sentences)

sentence, sentence_count = sentence_count(file)
word_count, unique_word_count = word_count(file)

print('Total word count:  {}\n'.format(word_count) +
      'Unique words:  {}\n'.format(unique_word_count) +
      'Sentences:  {}'.format(sentence_count))

The output is the following:

Total word count:  0
Unique words:  0
Sentences:  5

What's really strange is that if I comment out the call to sentence_count(), word_count() starts working and outputs the correct numbers.

Why is this inconsistency happening? Only one of the two functions ever outputs the correct value; the other outputs 0s. How can I get both functions to work?

Comment (Jul 31, 2018 at 19:43): Add contents = file.read() and pass contents to your methods.
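A minimal sketch of what this comment suggests, assuming the two functions are changed to run findall() on the whole string at once. This change is needed because looping over a string with for l in contents yields one character at a time, so the original line-by-line loops would only ever match single characters. The sample text, the precompiled WORD_RE and SENTENCE_RE names, and the result variable names are illustrative choices, not from the question.

```python
import re

# Same patterns as in the question, compiled once.
WORD_RE = re.compile(r"[A-Za-z0-9']+")
SENTENCE_RE = re.compile(r"[a-zA-Z0-9][.!?]")

def word_count(text):
    # Search the whole string in one call instead of looping over it.
    words = WORD_RE.findall(text)
    return len(words), len(set(words))

def sentence_count(text):
    sentences = SENTENCE_RE.findall(text)
    return sentences, len(sentences)

# Hypothetical stand-in for the contents of sample.txt. With a real file:
#     with open('sample.txt', 'r') as f:
#         contents = f.read()
contents = "The cat sat. The dog ran! Did the cat see the dog?"

sentences, num_sentences = sentence_count(contents)
total_words, unique_words = word_count(contents)
print(total_words, unique_words, num_sentences)  # 12 8 3
```

Because contents is a plain string rather than a file object, it can be passed to both functions in any order. Storing the results in new names also avoids rebinding sentence_count and word_count, which the question's script does.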

2 Answers


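The issue is that you can only iterate over an open file once. Iterating reads the file forward line by line, and after the first function's loop has read to the end, the file position stays at end-of-file, so the second function's loop gets no lines at all. You need to either reopen or rewind the file to iterate over it again.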

For example:

with open('sample.txt', 'r') as f:
    sentence, sentence_count = sentence_count(f)
with open('sample.txt', 'r') as f:
    word_count, unique_word_count = word_count(f)

Alternatively, f.seek(0) would rewind the file.
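A small sketch of both the exhaustion and the rewind. It uses io.StringIO as an assumed stand-in for open('sample.txt', 'r') so it runs without a file on disk; a text-mode file object is iterated and rewound the same way.

```python
import io

# io.StringIO stands in for open('sample.txt', 'r') in this sketch.
f = io.StringIO("The cat sat.\nThe dog ran!\n")

first_pass = list(f)   # reads every line; the position is now at end-of-file
second_pass = list(f)  # nothing left to read, so this is empty

f.seek(0)              # rewind to the beginning
third_pass = list(f)   # all lines are available again

print(len(first_pass), len(second_pass), len(third_pass))  # 2 0 2
```

In the question's script, this means calling file.seek(0) between sentence_count(file) and word_count(file).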


Comment: Thanks! I definitely didn't know that you can only iterate over an open file once. Really helpful to know.

Make sure to open and close your file properly. One way to do this is to read all of the text into memory first:

with open('sample.txt', 'r') as f:
    file = f.read()

The with statement opens the file and closes the file handle automatically when the block ends, even if an error occurs inside it. Since all of the contents are now stored in file, the file no longer needs to be open.
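One way to keep the question's line-by-line functions unchanged with this approach is to split the string into a list of lines, since a bare string would be iterated character by character. This is a sketch under that assumption. The sample text below is a hypothetical stand-in for the result of file = f.read(), and the result names are chosen so they do not overwrite the functions.

```python
import re

def word_count(lines):
    # Logic unchanged from the question: find words line by line.
    words = []
    p = re.compile(r"[A-Za-z0-9']+")
    for l in lines:
        for i in p.findall(l):
            words.append(i)
    return len(words), len(set(words))

def sentence_count(lines):
    # Logic unchanged from the question: find sentence endings line by line.
    sentences = []
    p = re.compile(r'[a-zA-Z0-9][.!?]')
    for l in lines:
        for i in p.findall(l):
            sentences.append(i)
    return sentences, len(sentences)

# Hypothetical stand-in for the answer's  file = f.read()
file = "The cat sat.\nThe dog ran!\n"
lines = file.splitlines()  # a list of lines, which can be looped over repeatedly

sentences, num_sentences = sentence_count(lines)
total_words, unique_words = word_count(lines)
print(total_words, unique_words, num_sentences)  # 6 5 2
```

Unlike a file object, a list is not consumed by iteration, so the order of the two calls no longer matters.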
