I am attempting to count the number of 'difficult words' in a file, which requires me to count the number of letters in each word. For now, I am only trying to get single words, one at a time, from a file. I've written the following:

file = open('infile.txt', 'r+')
fileinput = file.read()

for line in fileinput:
    for word in line.split():
        print(word)

Output:

t
h
e

o
r
i
g
i
n

.
.
.

It seems to be printing one character at a time instead of one word at a time. I'd really like to know more about what is actually happening here. Any suggestions?

    Try to print each line and see what it does ;) Commented Nov 4, 2015 at 18:30

3 Answers

6

Use splitlines():

fopen = open('infile.txt', 'r+')
fileinput = fopen.read()

for line in fileinput.splitlines():
    for word in line.split():
        print(word)

fopen.close()

Without splitlines():

You can also use a with statement to open the file and iterate over the file object directly. It closes the file automatically:

with open('infile.txt', 'r+') as fopen:
    for line in fopen:
        for word in line.split():
            print(word)

1 Comment

This worked perfectly; thank you. I wasn't aware of that method.
3

A file object supports the iteration protocol, yielding one line at a time. For bigger files this is much better than reading the whole content into memory in one go:

with open('infile.txt', 'r+') as f:
    for line in f:
        for word in line.split():
            print(word)

Assuming you are going to define a filter function, you could do something along these lines:

def is_difficult(word):
    return len(word) > 5

with open('infile.txt', 'r+') as f:
    words = (w for line in f for w in line.split() if is_difficult(w))
    for w in words:
        print(w)

which, with an input file of

ciao come va
oggi meglio di domani
ieri peggio di oggi

produces

meglio
domani
peggio
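
Since the question is ultimately about counting difficult words, the same generator pattern can feed sum() instead of print(). This is only a minimal sketch: io.StringIO stands in for infile.txt so that it runs without a file on disk, and the names sample and count are assumptions made for illustration.

```python
import io

def is_difficult(word):
    # Same threshold as above: more than 5 characters.
    return len(word) > 5

# Stand-in for infile.txt, holding the sample input shown above.
sample = io.StringIO("ciao come va\noggi meglio di domani\nieri peggio di oggi\n")

# sum() consumes the generator, adding 1 for every difficult word.
count = sum(1 for line in sample for w in line.split() if is_difficult(w))
print(count)
```

With a real file, sample would be replaced by the object returned from open('infile.txt').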

3 Comments

Ah, I see. I am still fairly new to Python, so I'm picking up the proper way to write scripts as I go. Your example is very helpful; thank you.
You are welcome. BTW, why are you using mode r+ on the input file? Do you plan to write to it as well?
Initially, yes I did. But I think I'm going to use another file for output just to make things cleaner. I'll be sure to change it though.
0

Your code gives you single characters because you called .read(), which stores the whole content as a single string. When you write for line in fileinput, you are iterating over that string character by character. There is no good reason to use read and splitlines, as you can simply iterate over the file object. If you did want a list of lines, you would call readlines.
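
A minimal sketch of the difference, assuming a short sample text in place of the real file contents. io.StringIO is used as a stand-in for the open file so the example runs without infile.txt; the variable names are chosen for illustration only.

```python
import io

# Stand-in for the contents of infile.txt (hypothetical sample text).
text = "the origin\nof species\n"

# Iterating over a str yields one character at a time.
chars = [item for item in text]

# Iterating over a file-like object yields one line at a time.
lines = [line for line in io.StringIO(text)]

# Splitting each line on whitespace then gives whole words.
words = [word for line in io.StringIO(text) for word in line.split()]

print(chars[:3])
print(lines)
print(words)
```

The first list is what the loop in the question was walking over. The last one is what iterating over the file object and calling split() on each line produces.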

If you want to group words by length, use a dict with the length of the word as the key. You will also want to remove punctuation from the words, which you can do with str.strip:

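from collections import defaultdict
from string import punctuation

def words(n, fle):
    # Map each word length to the list of words of that length.
    d = defaultdict(list)
    with open(fle) as f:
        for line in f:
            for word in line.split():
                # Strip leading and trailing punctuation only.
                word = word.strip(punctuation)
                _len = len(word)
                # Keep only words of at least n characters.
                if _len >= n:
                    d[_len].append(word)
    return d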

The returned dict will contain the words in the file that are at least n characters long, grouped by length.
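
One detail worth checking before relying on the lengths: str.strip only trims characters from the two ends of a string, so punctuation inside a word survives, and a token made up entirely of punctuation becomes an empty string. A small sketch, with sample tokens invented for illustration:

```python
from string import punctuation

# Hypothetical tokens, as line.split() might produce them.
samples = ['"Hello,"', "don't", "end.", "(well-known)", "..."]

# strip(punctuation) removes punctuation from both ends only.
stripped = [w.strip(punctuation) for w in samples]
print(stripped)
```

If empty strings or inner apostrophes and hyphens would skew the count, they may need separate handling, for example by skipping words whose stripped form is empty.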
