3

I have 30 text files of 30 lines each. For some reason, I need to write a script that opens file 1, prints line 1 of file 1, closes it, opens file 2, prints line 2 of file 2, closes it, and so on. I tried this:

import glob

files = glob.glob('/Users/path/to/*/files.txt')             
for file in files:
    i = 0
    while i < 30:
        with open(file,'r') as f:
            for index, line in enumerate(f):
                if index == i:
                    print(line)
                    i += 1
                    f.close()
            continue 

Obviously, I got the following error:

ValueError: I/O operation on closed file.

That is because of the f.close() call. How can I move on to the next file after reading only the desired line?

3
  • You can use break to exit a loop; replace f.close() with that. The continue at the bottom is also unnecessary, and the outer loop can be a for i in range(0, 30): (or i, file in enumerate(files)?) without explicitly incrementing i. Commented Feb 15, 2017 at 2:40
  • Note following up on @Ryan: The f.close() isn't needed at all because you (correctly) used the with statement when opening the file, ensuring that it is automatically closed when you exit the block. Commented Feb 15, 2017 at 2:46
  • Side-note: You could remove the explicit inner loop entirely using itertools.islice. Replace the whole contents of the with block with print(next(itertools.islice(f, i, None))), no need for explicit looping of any kind. This requires @Ryan's suggested change of replacing the outer while loop with a for i, file in enumerate(files): (or to ensure you only process 30 files, for i, file in enumerate(islice(files, 30)):) so you're not manually tracking/incrementing i. Commented Feb 15, 2017 at 2:52

4 Answers

6

First off, to answer the question, as noted in the comments, your main problem is that you close the file then try to continue iterating it. The guilty code:

        for index, line in enumerate(f): # <-- Reads
            if index == i:
                print(line)
                i += 1
                f.close()                # <-- Closes when you get a hit
                                         # But loop is not terminated, so you'll loop again

The simplest fix is to just break instead of explicitly closing, since your with statement already guarantees deterministic closing when the block is exited:

        for index, line in enumerate(f):
            if index == i:
                print(line)
                i += 1
                break
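
As a quick check of the claim that with closes the file even when you break out early, here is a minimal sketch. The temporary file and the names found and closed_after_block are made up for illustration; they are not part of the original code.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file with three lines
    path = os.path.join(tmp, 'sample.txt')
    with open(path, 'w') as out:
        out.write('first\nsecond\nthird\n')

    with open(path, 'r') as f:
        for index, line in enumerate(f):
            if index == 1:
                found = line
                break  # leaves the for loop, and then the with block

    # The with block has been exited, so the file is already closed here
    closed_after_block = f.closed

print(repr(found), closed_after_block)
```

No explicit f.close() appears anywhere, yet f.closed is true once the block has been left.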

But because this was fun, here's a significantly cleaned up bit of code to accomplish the same task:

import glob
from itertools import islice

# May as well use iglob since we'll stop processing at 30 files anyway    
files = glob.iglob('/Users/path/to/*/files.txt')

# Stop after no more than 30 files, use enumerate to track file num
for i, file in enumerate(islice(files, 30)):
    with open(file,'r') as f:
        # Skip the first i lines of the file, then print the next line
        print(next(islice(f, i, None)))
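
One caveat with the snippet above: next() raises StopIteration if a file has fewer than i + 1 lines, and print adds a second newline after the one already in the line. A possible variant that handles both is sketched below; nth_line, make_sample_files and the sample data are hypothetical names used only for this illustration.

```python
import os
import tempfile
from itertools import islice


def nth_line(path, n, default=None):
    """Return line n (0-based) of path without its trailing newline,
    or default if the file has fewer than n + 1 lines."""
    with open(path, 'r') as f:
        # islice skips the first n lines; the loop body runs at most once
        for line in islice(f, n, None):
            return line.rstrip('\n')
    return default


def make_sample_files(directory, count=3, lines_per_file=3):
    """Create count small files (made-up sample data) and return their paths."""
    paths = []
    for k in range(count):
        path = os.path.join(directory, 'file%d.txt' % k)
        with open(path, 'w') as out:
            for j in range(lines_per_file):
                out.write('file %d, line %d\n' % (k, j))
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    paths = make_sample_files(tmp)
    # Line i of file i, as in the question
    picked = [nth_line(path, i) for i, path in enumerate(paths)]
    # Asking for a line that does not exist returns the default instead of raising
    missing = nth_line(paths[0], 10, default='<no such line>')

for text in picked:
    print(text)
```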

2

You can use the linecache module to get the line you need and save yourself a lot of headache:

import glob
import linecache

line = 1
for file in glob.glob('/Users/path/to/*/files.txt'):
    print(linecache.getline(file, line))
    line += 1
    if line > 30:  # if you really need to limit it to only 30
        break

6 Comments

Good suggestion, though I will note that linecache caches the whole file into memory to get a single line; this is usually not a problem for smallish files (e.g. the source files the module was originally designed for), particularly if you need to perform random access for multiple lines, but for arbitrary inputs, you can end up reading a GB file into memory (where the lines require far more than a GB of memory thanks to Python overhead) even if all you want is the first line of the file. It would also make sense to avoid manual line tracking, and just wrap the glob call in enumerate.
True, linecache is very convenient but can eat up memory; I didn't get the impression that OP will have large files to deal with, though. One can always call clearcache() afterwards if access to the files is no longer required. And if access to really huge files is required, going through them line by line (the traditional way) would probably have horrible performance as well. If that were the requirement, I'd rather suggest using the mmap module and letting the OS optimize access to the data.
Thanks! That worked perfectly, although I had to replace line 0 by line 1.
Ooops, forgot that linecache line index starts with 1. Fixed.
@zwer: Going through them line by line until you reach the target line would be fine if you're only accessing the first 30 lines or fewer; doesn't matter how large the file itself is, the time to read the first 30 lines is tied to the size of the first 30 lines, not the size of the file. There are line oriented uses for mmap, but it wouldn't help much here; you'd still need to scan for line breaks. You could skip an arbitrary number of bytes, then look for a nearby line, but w/o fixed length lines, that wouldn't get you a specific line number.
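
Putting the suggestions from these comments together (wrap the glob call in enumerate, call clearcache() when done), one possible sketch looks like this. The function name diagonal_lines, the sorting step and the temporary sample tree are assumptions for illustration, not part of the answer.

```python
import glob
import linecache
import os
import tempfile
from itertools import islice


def diagonal_lines(pattern, limit=30):
    """Return line k of the k-th matching file (1-based), for at most limit files."""
    paths = sorted(glob.glob(pattern))  # glob order is not guaranteed, so sort
    lines = []
    for line_number, path in enumerate(islice(paths, limit), start=1):
        # linecache line numbers start at 1; a missing line comes back as ''
        lines.append(linecache.getline(path, line_number))
    linecache.clearcache()  # drop the cached file contents once finished
    return lines


with tempfile.TemporaryDirectory() as tmp:
    # Made-up layout: tmp/a/files.txt, tmp/b/files.txt, tmp/c/files.txt
    for name in ('a', 'b', 'c'):
        os.mkdir(os.path.join(tmp, name))
        with open(os.path.join(tmp, name, 'files.txt'), 'w') as out:
            out.write(''.join('%s%d\n' % (name, k) for k in range(1, 4)))
    result = diagonal_lines(os.path.join(tmp, '*', 'files.txt'))

print(''.join(result), end='')
```

The memory caveat above still applies: linecache reads each whole file before returning the single line.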
0

I think something like this is what you want:

import glob

files = glob.glob('/Users/path/to/*/files.txt')
# enumerate pairs each file with its position, so file i yields line i (0-based)
for i, file in enumerate(files[:30]):
    with open(file, 'r') as f:
        for index, line in enumerate(f):
            if index == i:
                print(line)
                break  # stop reading; the with block closes the file

Currently you are closing the file in the middle of the for loop while the loop keeps trying to read from it. If you break out of the loop instead and let the with block close the file once you leave it, it should be OK.

0

Split your job into simpler steps, until the final step is trivial. Use functions.

Remember that a file object can be iterated over like a sequence of lines.

import glob

def nth(n, sequence):
  for position, item in enumerate(sequence):
    if position == n:
      return item
  return None  # if the sequence ended before position n

def printNthLines(glob_pattern):
  # Note: sort file names; glob guarantees no order.
  filenames = sorted(glob.glob(glob_pattern))
  for position, filename in enumerate(filenames):
    with open(filename) as f:
      line = nth(position, f)  # Pick the n-th line.
      if line is not None:
        print(line)
      # IDK what to do if there's no n-th line in n-th file

printNthLines('path/to/*/file.txt')

This necessarily scans the n-th file up to its n-th line. That is unavoidable: in a plain-text file with variable-length lines, there is no way to jump directly to the n-th line.
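
To illustrate that the scan stops at the target line instead of reading everything, here is a small sketch. It uses a generator as a stand-in for a file object, and counting is a hypothetical helper added only to count how many lines get consumed. (A real file object reads in buffered chunks, so somewhat more bytes than this may be read from disk.)

```python
def nth(n, sequence):
    for position, item in enumerate(sequence):
        if position == n:
            return item
    return None  # if the sequence ended before position n


def counting(lines, counter):
    """Yield lines unchanged while counting how many were consumed."""
    for line in lines:
        counter[0] += 1
        yield line


counter = [0]
# Stand-in for a file with a million lines; nothing is generated until requested
fake_file = counting(('line %d\n' % k for k in range(1000000)), counter)

result = nth(4, fake_file)
print(repr(result), counter[0])
```

Only the first five lines are ever produced, regardless of how long the input is.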
