Trying to count words in a file using Python

Question

I am attempting to count the number of 'difficult words' in a file, which requires me to count the number of letters in each word. For now, I am only trying to get single words, one at a time, from a file. I've written the following:

file = open('infile.txt', 'r+')
fileinput = file.read()

for line in fileinput:
    for word in line.split():
        print(word)

Output:

t
h
e

o
r
i
g
i
n

.
.
.

It seems to be printing one character at a time instead of one word at a time. I'd really like to know more about what is actually happening here. Any suggestions?

Try to print each line and see what it does ;)

– Nir Alfasi, Commented Nov 4, 2015 at 18:30

Andrés Pérez-Albela H. · Accepted Answer · 2015-11-04 19:04:02Z

6

Use splitlines():

fopen = open('infile.txt', 'r+')
fileinput = fopen.read()

for line in fileinput.splitlines():
    for word in line.split():
        print(word)

fopen.close()

Without splitlines():

You can also use a with statement to open the file. It closes the file automagically:

with open('infile.txt', 'r+') as fopen:
    for line in fopen:
        for word in line.split():
            print(word)
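
A minimal sketch to check both claims, assuming a throwaway file created in a temporary directory (the path and the sample text are made up for the demonstration): iterating over the file object yields whole lines, and the file is already closed once the with block ends.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for infile.txt, created only for this demo.
    path = os.path.join(tmp, "infile.txt")
    with open(path, "w") as out:
        out.write("the origin\nof species\n")

    # Iterating over the file object yields one line at a time.
    with open(path) as fopen:
        words = [word for line in fopen for word in line.split()]

    # The with block has ended, so the file is closed without calling close().
    was_closed = fopen.closed

print(words)
print(was_closed)
```

The file is opened in the default read mode here, since nothing is written back to it.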

edited Nov 4, 2015 at 19:04

answered Nov 4, 2015 at 18:29

1 Comment

AustinC Over a year ago

This worked perfectly; thank you. I wasn't aware of that method.

Pynchia · Accepted Answer · 2015-11-05 06:33:03Z

3

A file object supports the iteration protocol, which for bigger files is much better than reading the whole content into memory in one go:

with open('infile.txt', 'r+') as f:
    for line in f:
        for word in line.split():
            print(word)

Assuming you are going to define a filter function, you could do something along these lines:

def is_difficult(word):
    # A word counts as difficult when it has more than 5 characters.
    return len(word) > 5

with open('infile.txt', 'r+') as f:
    words = (w for line in f for w in line.split() if is_difficult(w))
    for w in words:
        print(w)

which, with an input file of

ciao come va
oggi meglio di domani
ieri peggio di oggi

produces

meglio
domani
peggio
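
Since the goal is to count the difficult words rather than print them, the same generator can be fed to sum(). This is only a sketch: io.StringIO stands in for the opened file so the snippet runs on its own, and the threshold of 5 is the same arbitrary choice as above.

```python
import io

def is_difficult(word):
    # Same arbitrary rule as above: more than 5 characters.
    return len(word) > 5

# io.StringIO mimics a file object; iterating over it yields lines.
sample = io.StringIO("ciao come va\noggi meglio di domani\nieri peggio di oggi\n")

# Add 1 for every word that passes the filter, without building a list.
count = sum(1 for line in sample for w in line.split() if is_difficult(w))
print(count)  # 3
```

With a real file, replace the StringIO object with the f from a with open(...) block.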

edited Nov 5, 2015 at 6:33

answered Nov 4, 2015 at 18:40

3 Comments

AustinC Over a year ago

Ah, I see. I am still fairly new to Python, so I'm picking up the proper way to write scripts as I go. Your example is very helpful; thank you.

Pynchia Over a year ago

You are welcome. BTW, why are you using mode r+ on the input file? Do you plan to write to it as well?

AustinC Over a year ago

Initially, yes I did. But I think I'm going to use another file for output just to make things cleaner. I'll be sure to change it though.

Padraic Cunningham · Accepted Answer · 2015-11-30 18:50:09Z

Your code gives you single characters because you called .read(), which stores the entire content as a single string, so when you write for line in fileinput you are iterating over that string character by character. There is no good reason to use read and splitlines, as you can simply iterate over the file object; if you did want a list of lines, you would call readlines.
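
A small sketch of the difference, using an in-memory string and io.StringIO as stand-ins for the real file (the sample text is made up):

```python
import io

text = "the origin\nof species\n"

# Iterating over a str yields single characters, which is what
# "for line in fileinput" did after .read().
first_chars = [ch for ch in text][:3]

# Iterating over a file-like object yields whole lines instead.
words_from_file = [w for line in io.StringIO(text) for w in line.split()]

# splitlines() also gives lines, but only after reading everything into memory.
words_from_split = [w for line in text.splitlines() for w in line.split()]

print(first_chars)
print(words_from_file)
```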

If you want to group words by length, use a dict with the length of the word as the key. You will also want to remove punctuation from the words, which you can do with str.strip (it removes leading and trailing characters only):

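from collections import defaultdict
from string import punctuation

def words(n, fle):
    # Map each word length to the list of words of that length.
    d = defaultdict(list)
    with open(fle) as f:
        for line in f:
            for word in line.split():
                # Strip leading and trailing punctuation only.
                word = word.strip(punctuation)
                _len = len(word)
                # Keep words that are at least n characters long.
                if _len >= n:
                    d[_len].append(word)
    return d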

The returned dict contains every word in the file that is at least n characters long, grouped by length.
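
To get from the grouped dict to actual counts, something like the following could work. It is a sketch only: group_by_length is a hypothetical in-memory variant of words() that takes any iterable of lines instead of a filename, so it runs without a file, and the sample sentences are invented.

```python
from collections import defaultdict
from string import punctuation

def group_by_length(lines, n):
    # Hypothetical variant of words(): accepts an iterable of lines.
    d = defaultdict(list)
    for line in lines:
        for word in line.split():
            # Only leading/trailing punctuation is removed; "It's" keeps its apostrophe.
            word = word.strip(punctuation)
            if len(word) >= n:
                d[len(word)].append(word)
    return d

sample = ["Hello, world!", "It's a (fairly) difficult sentence."]
grouped = group_by_length(sample, 5)

# Number of words per length, and the overall count of long words.
counts = {length: len(ws) for length, ws in grouped.items()}
total = sum(counts.values())
print(counts)
print(total)  # 5
```

An open file object can be passed as lines directly, since iterating over it yields lines.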
