0

I would like to count the occurrences of missing values (NA) on every line of a text file.

foo.txt file:

1 1 1 1 1 NA    # so, Missings: 1
1 1 1 NA 1 1    # so, Missings: 1
1 1 NA 1 1 NA   # so, Missings: 2  

I would also like to obtain the number of elements in the first line (assuming it is the same for all lines). Counting the missing values works:

miss = []
with open("foo.txt") as f:
    for line in f:
        miss.append(line.count("NA"))

>>> miss
[1, 1, 2]         # correct

The problem arises when I try to determine the number of elements. I did this with the following code:

miss = []
with open("foo.txt") as f:
    first_line = f.readline()
    elements = first_line.count(" ")  # given that values are separated by space
    for line in f:
        miss.append(line.count("NA"))

>>> (elements + 1)
6   # True, this is correct
>>> miss
[1, 2]  # misses the first item because readline() has already consumed the first line

How can I read the first line once without losing it for the loop that follows?

1
  • Premature optimization is the root of all evil. Just calculate the length for each line inside the loop: for line in f: ... elements = len(line.split()). Commented Jun 3, 2013 at 9:33

3 Answers

2

Try f.seek(0). This will reset the file handle to the beginning of the file.

Complete example would then be:

miss = []
with open("foo.txt") as f:
    first_line = f.readline()
    elements = first_line.count(" ")  # given that values are separated by space
    f.seek(0)
    for line in f:
        miss.append(line.count("NA"))

Even better would be to read every line, including the first, only once, and to check the number of elements only once:

miss = []
elements = None
with open("foo.txt") as f:
    for line in f:
        if elements is None:
            elements = line.count(" ")  # given that values are separated by space
        miss.append(line.count("NA"))

BTW: wouldn't the number of elements be line.count(" ") + 1?

I'd recommend using len(line.split()), as this also handles tabs, double spaces, leading/trailing spaces etc.
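
To illustrate the difference, here is a minimal sketch comparing the two counts. The sample lines and the helper name count_elements are made up for this example and are not taken from foo.txt:

```python
def count_elements(line):
    """Return the number of whitespace-separated items in a line."""
    # split() with no argument splits on any run of whitespace and
    # ignores leading/trailing whitespace, including the newline.
    return len(line.split())


# Hypothetical sample lines, each holding six items
samples = [
    "1 1 1 1 1 NA\n",       # single spaces only
    "1  1  1\tNA 1 1\n",    # double spaces and a tab
    " 1 1 NA 1 1 NA \n",    # leading and trailing spaces
]

for line in samples:
    by_spaces = line.count(" ") + 1    # only right for single-space separators
    by_split = count_elements(line)    # always 6 for these samples
    print(repr(line), by_spaces, by_split)
```

Only the first sample gives the same result for both approaches; the space-based count drifts as soon as the separators are not exactly one space each.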

2

Provided all lines have the same number of items, you can just count the items in the last line:

miss = []
with open("foo.txt") as f:
    for line in f:
        miss.append(line.count("NA"))
    # after the loop, line still holds the last line that was read
    elements = len(line.split())

A better way to count is probably:

elements = len(line.split())  

because this also counts items separated with multiple spaces or tabs.

2 Comments

Note that .count(" ") will be off by 1, so len(split) is the only correct one.
Thanks. Yes. That is the way I would do it. In addition, there is often more than one space or tab between items. Deleted the OP version.

0

You can also just treat the first line separately:

with open("foo.txt") as f:
    first_line = next(f)  # read the first line; the loop below continues from the second
    elements = first_line.count(" ")  # number of separators, given that values are separated by a single space
    miss = [first_line.count("NA")]
    for line in f:
        miss.append(line.count("NA"))
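
For anyone unfamiliar with next(): a file object is its own iterator, so next(f) returns the next unread line, and a for loop that follows picks up at the line after it. Below is a minimal sketch of that behaviour. It assumes io.StringIO as a stand-in for open("foo.txt") so that it runs without a file on disk, and it counts elements with len(first_line.split()) rather than by counting spaces:

```python
import io

# io.StringIO stands in for open("foo.txt"); contents mirror the question's sample
f = io.StringIO("1 1 1 1 1 NA\n1 1 1 NA 1 1\n1 1 NA 1 1 NA\n")

first_line = next(f)                  # consumes and returns the first line
elements = len(first_line.split())    # number of items on the first line
miss = [first_line.count("NA")]       # count the first line here...
for line in f:                        # ...because the loop resumes at line two
    miss.append(line.count("NA"))

print(elements, miss)
```

With the sample data this gives 6 elements and miss == [1, 1, 2], so the first line is no longer lost.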

1 Comment

What exactly is next then?
