Print specific lines of multiple files in Python

Question

I have 30 text files of 30 lines each. For some reason, I need to write a script that opens file 1, prints line 1 of file 1, closes it, opens file 2, prints line 2 of file 2, closes it, and so on. I tried this:

import glob

files = glob.glob('/Users/path/to/*/files.txt')             
for file in files:
    i = 0
    while i < 30:
        with open(file,'r') as f:
            for index, line in enumerate(f):
                if index == i:
                    print(line)
                    i += 1
                    f.close()
            continue

Obviously, I got the following error:

ValueError: I/O operation on closed file.

This is because of the f.close() call. How can I move on to the next file after reading only the desired line?

You can use break to exit a loop; replace f.close() with that. The continue at the bottom is also unnecessary, and the outer loop can be a for i in range(0, 30): (or i, file in enumerate(files)?) without explicitly incrementing i.
– Ry- ♦, Commented Feb 15, 2017 at 2:40
Note following up on @Ryan: The f.close() isn't needed at all because you (correctly) used the with statement when opening the file, ensuring that it is automatically closed when you exit the block.
– ShadowRanger, Commented Feb 15, 2017 at 2:46
Side-note: You could remove the explicit inner loop entirely using itertools.islice. Replace the whole contents of the with block with print(next(itertools.islice(f, i, None))), no need for explicit looping of any kind. This requires @Ryan's suggested change of replacing the outer while loop with a for i, file in enumerate(files): (or to ensure you only process 30 files, for i, file in enumerate(islice(files, 30)):) so you're not manually tracking/incrementing i.
– ShadowRanger, Commented Feb 15, 2017 at 2:52

ShadowRanger · Accepted Answer · 2017-02-15 03:11:06Z

First off, to answer the question, as noted in the comments, your main problem is that you close the file then try to continue iterating it. The guilty code:

        for index, line in enumerate(f): # <-- Reads
            if index == i:
                print(line)
                i += 1
                f.close()                # <-- Closes when you get a hit
                                         # But loop is not terminated, so you'll loop again

The simplest fix is to just break instead of explicitly closing, since your with statement already guarantees deterministic closing when the block is exited:

        for index, line in enumerate(f):
            if index == i:
                print(line)
                i += 1
                break

But because this was fun, here's a significantly cleaned up bit of code to accomplish the same task:

import glob
from itertools import islice

# May as well use iglob since we'll stop processing at 30 files anyway    
files = glob.iglob('/Users/path/to/*/files.txt')

# Stop after no more than 30 files, use enumerate to track file num
for i, file in enumerate(islice(files, 30)):
    with open(file,'r') as f:
        # Skip the first i lines of the file, then print the next line
        print(next(islice(f, i, None)))
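
One caveat with the snippet above: next() without a default raises StopIteration if a file has fewer than i + 1 lines. A minimal sketch of a variant that tolerates short files follows; line_at is a hypothetical helper name, not something from the original answer, and returning a default is just one possible policy.

```python
from itertools import islice


def line_at(path, index, default=None):
    """Return the line at 0-based `index` in `path`, or `default` if the file is shorter.

    `line_at` is a hypothetical helper name introduced for this sketch.
    """
    with open(path, 'r') as f:
        # islice skips the first `index` lines; passing a default to next()
        # avoids StopIteration when the file ends too early.
        return next(islice(f, index, None), default)
```

Inside the loop you would then write something like line = line_at(file, i) and decide what to do when it comes back as None.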

zwer · Accepted Answer · 2017-02-15 03:20:47Z

2

You can use the linecache module to get the line you need and save yourself a lot of headache:

import glob
import linecache

line = 1
for file in glob.glob('/Users/path/to/*/files.txt'):
    print(linecache.getline(file, line))
    line += 1
    if line > 30:  # if you really need to limit it to only 30
        break

edited Feb 15, 2017 at 3:20

answered Feb 15, 2017 at 2:49


6 Comments

ShadowRanger Over a year ago

Good suggestion, though I will note that linecache caches the whole file into memory to get a single line; this is usually not a problem for smallish files (e.g. the source files the module was originally designed for), particularly if you need to perform random access for multiple lines, but for arbitrary inputs, you can end up reading a GB file into memory (where the lines require far more than a GB of memory thanks to Python overhead) even if all you want is the first line of the file. It would also make sense to avoid manual line tracking, and just wrap the glob call in enumerate.

zwer Over a year ago

True, while very convenient, linecache can eat up memory, but I didn't get the impression that the OP will have large files to deal with. One can always call clearcache() afterwards if access to the files is no longer required. And if access to really huge files is required, going through them line by line (the traditional way) would probably have horrible performance too; if that was the requirement, I'd rather suggest using the mmap module and letting the OS optimize access to the data.

partialcorrelations Over a year ago

Thanks! That worked perfectly, although I had to replace line 0 by line 1.

zwer Over a year ago

Ooops, forgot that linecache line index starts with 1. Fixed.

ShadowRanger Over a year ago

@zwer: Going through them line by line until you reach the target line would be fine if you're only accessing the first 30 lines or fewer; doesn't matter how large the file itself is, the time to read the first 30 lines is tied to the size of the first 30 lines, not the size of the file. There are line oriented uses for mmap, but it wouldn't help much here; you'd still need to scan for line breaks. You could skip an arbitrary number of bytes, then look for a nearby line, but w/o fixed length lines, that wouldn't get you a specific line number.
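
The two suggestions from this comment thread (wrapping the glob call in enumerate instead of tracking the line number by hand, and calling clearcache() when done) could be combined roughly as below. This is only a sketch: print_lines_with_linecache and limit are hypothetical names, and sorting the file names is an added assumption to make the order predictable.

```python
import glob
import linecache


def print_lines_with_linecache(pattern, limit=30):
    """Print line 1 of the first file, line 2 of the second, and so on.

    `print_lines_with_linecache` and `limit` are hypothetical names for this sketch.
    """
    # Sorting is an assumption of this sketch; glob promises no particular order.
    paths = sorted(glob.glob(pattern))[:limit]
    # start=1 because linecache line numbers are 1-based.
    for lineno, path in enumerate(paths, start=1):
        line = linecache.getline(path, lineno)  # '' if that line does not exist
        if line:
            print(line.rstrip('\n'))
    # Release the cached file contents once they are no longer needed.
    linecache.clearcache()
```

Note that linecache still reads each whole file to serve a single line, so the memory caveat discussed above applies unchanged.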


Dan Vanatta · Accepted Answer · 2017-02-15 02:45:21Z

0

I think something like this is what you want:

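import glob

files = glob.glob('/Users/path/to/*/files.txt')
# i is both the file's position and the (0-based) line to print from it
for i, file in enumerate(files):
    with open(file, 'r') as f:
        for index, line in enumerate(f):
            if index == i:
                print(line)
                break  # stop reading; the with block closes the file

Currently you are closing the file in the middle of the for loop and then trying to keep reading from it. If you break out of the loop instead and let the with block close the file, it should be OK. Taking i from enumerate(files) rather than a while loop also means each file contributes only its own line.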

answered Feb 15, 2017 at 2:45


9000 · Accepted Answer · 2017-02-15 03:47:02Z

Split your job into simpler steps, until the final step is trivial. Use functions.

Remember that a file object works as a sequence of lines.

import glob

def nth(n, sequence):
  # Return the item at 0-based position n of any iterable.
  for position, item in enumerate(sequence):
    if position == n:
      return item
  return None  # if the sequence ended before position n

def printNthLines(glob_pattern):
  # Note: sort file names; glob guarantees no order.
  filenames = sorted(glob.glob(glob_pattern))
  for position, filename in enumerate(filenames):
    with open(filename) as f:
      line = nth(position, f)  # Pick the n-th line.
      if line is not None:
        print(line)
      # IDK what to do if there's no n-th line in n-th file

printNthLines('path/to/*/file.txt')

Obviously we scan the n-th file up to its n-th line, but this is inevitable: there's no way to get directly to the n-th line in a plaintext file.
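
Taking the "use functions" advice one step further, the question of what to do when a file has no n-th line can be left to the caller by yielding results instead of printing them. The following is a sketch under that assumption; nth_lines is a hypothetical name, and nth is repeated here only so the block is self-contained.

```python
import glob


def nth(n, sequence):
    """Return the item at 0-based position n, or None if the sequence is too short."""
    for position, item in enumerate(sequence):
        if position == n:
            return item
    return None


def nth_lines(glob_pattern):
    """Yield (filename, line) pairs; line is None when the file is too short.

    `nth_lines` is a hypothetical name introduced for this sketch.
    """
    # Sort the file names, since glob guarantees no order.
    for position, filename in enumerate(sorted(glob.glob(glob_pattern))):
        with open(filename) as f:
            yield filename, nth(position, f)
```

A caller can then loop with for filename, line in nth_lines('path/to/*/file.txt'): and print the line, skip the file, or report an error when line is None.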
