
I have the following Python script that is meant to find words with two or more vowels and write them to a txt file. The script runs, but the output file is empty. I have tried several different approaches to no avail. Any idea why the output file is blank? I am using the re module to match the vowels in each word with a regular expression.

#!C:\Python33\python.exe

import re

file = open("Text of Steve Jobs' Commencement address (2005).htm");
output = open('twoVoweledWordList.txt', 'w');

for word in file.read():
   if len(re.findall('[aeiouy]', word)) >= 2:
      match == True;
      while True :
        output.write(word, '\n');

        file.close()
        output.close()
  • Iterating over file.read() yields one character at a time, and no single character can contain two vowels. Commented Oct 29, 2013 at 1:22
  • The while True will cause an infinite loop. Watch out! Commented Oct 29, 2013 at 1:23
  • That makes sense! What would be a better way to read in one word at a time? Commented Oct 29, 2013 at 1:23
  • match == True is a comparison, not an assignment. Also, in Python you don't need a semicolon at the end of a line. Commented Oct 29, 2013 at 1:23
  • There is no need to loop, set a match flag, or anything else like that. if len(re.findall('[aeiouy]', word)) >= 2 is already exactly the condition under which we want to write the word to the output file, and we want to write each such word exactly once. Commented Oct 29, 2013 at 1:52

3 Answers


You asked for a better way to read a word at a time. Here you go:

with open(input_file_name, "rt") as f:
    for line in f:
        for word in line.split():
            pass  # do something with each word here

Comments:

  • In general I try to avoid using the names of Python built-ins as variable names. Since file is a built-in in Python 2.x, syntax-coloring text editors will flag it in a different color... might as well just use f for the variable name.
  • It's best to use the with statement. It is very clear, and in all versions of Python it makes sure your file is properly closed when you are done. (Here it won't matter, but it's really a best practice.)
  • open() returns an object that you can use in a for loop. You will get one line of input from the file at a time.
  • line.split() splits the line into words on any run of "white space" (spaces, tabs, newlines).
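The points above can be sketched end to end as follows. This is only an illustration: the temporary file and its contents are made up so the example is self-contained, and input_file_name is simply set to that file's path.

```python
import os
import tempfile

# Write a small throwaway file (hypothetical contents) so the example runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Stay  hungry,\tstay foolish.\n\nConnect the dots.\n")
    input_file_name = tmp.name

collected = []
with open(input_file_name, "rt") as f:
    for line in f:                       # the file object yields one line per iteration
        collected.extend(line.split())   # no argument: split on any run of whitespace

closed_after_with = f.closed             # the with block has already closed the file
os.remove(input_file_name)               # clean up the throwaway file
```

Note that the blank line contributes nothing, because splitting a whitespace-only string gives an empty list.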

I don't know if you have seen generator functions yet, but you can wrap up the above doubly-nested for loops into a generator function like this:

def words(f):
    for line in f:
        for word in line.split():
            yield word

with open(input_file_name, "rt") as f:
    for word in words(f):
        pass  # do something with word

I like hiding the machinery like this. And if you ever needed to make the word-splitting more complicated, the complex part is nicely separated from the part that actually handles the words.
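As one possible illustration of "more complicated" word-splitting, here is a sketch in which the generator pulls out runs of letters and apostrophes instead of splitting on whitespace, so trailing punctuation is dropped. The pattern WORD_RE is an assumption about what counts as a word, not something from the question, and it makes no attempt to strip HTML markup.

```python
import io
import re

# Assumed definition of a "word": a run of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")

def words(f):
    """Yield each word from an iterable of lines, without punctuation."""
    for line in f:
        for word in WORD_RE.findall(line):
            yield word

# io.StringIO stands in for an opened file; the text is made up.
sample = io.StringIO("Stay hungry, stay foolish.\nDon't settle.\n")
result = list(words(sample))
```

The loop that consumes words(f) does not change at all; only the generator body does.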


3 Comments

I implemented the function as you listed above and my output is the entire html file. I am using if len(re.findall('aeiou]', word)) >= 2: output.write(word + '\n')
Double-check your regex :)
Found out the issue, it wasn't overwriting the old file :) Less of a code issue more of a dumb human one, thanks for the help!

When you use the with statement, you don't have to worry about closing the files explicitly. Also, I don't believe y is a vowel, so I removed it from the pattern in my answer.

import re

with open("Input.txt") as inputFile, open("Output.txt", "w") as output:
    for line in inputFile:
        for word in line.split():
            if len(re.findall('[aeiou]', word)) >= 2:
                output.write(word + '\n')
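One caveat with a lowercase-only character class is that capitalized vowels are not counted. A minimal sketch of a case-insensitive variant, assuming that is the behavior you want (the helper name has_two_vowels is made up for this example):

```python
import re

# re.IGNORECASE makes the character class match both upper- and lowercase vowels.
VOWELS = re.compile('[aeiou]', re.IGNORECASE)

def has_two_vowels(word):
    """Return True if word contains at least two vowels, ignoring case."""
    return len(VOWELS.findall(word)) >= 2

# With the lowercase-only pattern, "Apple" counts just one vowel (the e).
lowercase_only_count = len(re.findall('[aeiou]', "Apple"))
```

Inside the loop above, the condition would then become if has_two_vowels(word):.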



While steveha says it nicely, here is an alternative in case you like plain for loops better:

import re

file = open("Text of Steve Jobs' Commencement address (2005).htm")
output = open('twoVoweledWordList.txt', 'w')

for line in file:
    for word in line.split():
        if len(re.findall('[aeiouy]', word)) >= 2:
            output.write(word + '\n')

# Close both files so the output is flushed to disk.
file.close()
output.close()

2 Comments

I recommend rewriting the first for loop as simply: for line in file: The file.readlines() method reads the entire file into memory, but we only need one line at a time. Simply using the opened file object as an iterator reads one line at a time. This won't matter much for small files, but what if the file were 10 GB of data? Then it would matter a lot.
Thanks. I think it is best done the way in your answer. I just put it as an alternative and because I had written it all up :p
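The difference described in the first comment can be sketched with a tiny in-memory example. io.StringIO stands in for a real file here, and the three lines of text are made up.

```python
import io

# Hypothetical three-line "file" held in memory.
sample = io.StringIO("line one\nline two\nline three\n")

all_lines = sample.readlines()   # builds a list holding every line at once

sample.seek(0)                   # rewind to the start
first = next(iter(sample))       # iteration hands back one line at a time
```

For a file this small the two approaches are equivalent; the list built by readlines() only becomes a problem when the file is large.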
