
I want to skip the first 17 lines while reading a text file.

Let's say the file looks like:

0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
good stuff

I just want the good stuff. What I'm doing is a lot more complicated, but this is the part I'm having trouble with.

9 Answers

Use a slice, like below:

with open('yourfile.txt') as f:
    lines_after_17 = f.readlines()[17:]

If the file is too big to load in memory:

with open('yourfile.txt') as f:
    for _ in range(17):
        next(f)
    for line in f:
        pass  # do stuff with the line

5 Comments

I used the second solution to read ten lines at the end of a file with 8 million (8e6) lines, and it takes ~22 seconds. Is this still the preferred (= fastest) way for such long files (~250 MB)?
I would use tail for that.
@wim: I guess tail doesn't work on Windows. Furthermore, I don't always want to read the last 10 lines; I want to be able to read some lines in the middle. (E.g. if I read 10 lines after ~4e6 lines in the same file, it still takes about half that time, ~11 seconds.)
The thing is, you need to read the entire content before line number ~4e6 in order to know where the line separator bytes are located, otherwise you don't know how many lines you've passed. There's no way to magically jump to a line number. ~250 MB should be OK to read entire file to memory though, that's not particularly big data.
@riddleculous see stackoverflow.com/q/3346430/2491761 for getting last lines
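As a sketch of the seek-based idea behind tail: read fixed-size blocks backwards from the end of the file until enough newlines have been seen, so the cost depends on the tail size rather than on the millions of preceding lines. This is not from any answer here; `tail_lines` and the chunk size are made-up names for illustration:

```python
import os

def tail_lines(path, n=10, chunk=4096):
    """Return the last n lines of a file without reading it from the front."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        data = b""
        # Read backwards in blocks until we have seen more than n newlines
        # (or we reach the start of the file).
        while end > 0 and data.count(b"\n") <= n:
            start = max(0, end - chunk)
            f.seek(start)
            data = f.read(end - start) + data
            end = start
        return [line.decode() for line in data.splitlines()[-n:]]
```

Note this only works for reading from the end; reading lines from the middle still requires counting newlines from the front, as the comment above explains.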

Use itertools.islice, starting at index 17. It will automatically skip the first 17 lines.

import itertools
with open('file.txt') as f:
    for line in itertools.islice(f, 17, None):  # start=17, stop=None
        pass  # process the line

2 Comments

Is this feasible for large text files that may not fit in the memory? That is, does itertools.islice load the entire file into the memory? I couldn't find this in the documentation.
@AdityaHarikrish - all functions within itertools return iterators, which only consume memory as the object is read - i.e. the whole file is not read into memory, only one line at a time. For the example provided, the only memory that will be allocated is the data required to read in the line content. Saving that line content is another matter entirely.
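A quick way to convince yourself of this: feed islice a generator that would be far too big to ever materialize; islice only pulls items as they are needed.

```python
from itertools import islice

def numbers():
    # A stand-in "file" with a billion lines; never fully consumed.
    for i in range(10**9):
        yield i

# Lazily skip the first 17 items, then take three.
first_three_after_17 = list(islice(numbers(), 17, 20))
print(first_three_after_17)  # [17, 18, 19]
```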
for line in itertools.dropwhile(isBadLine, lines):
    pass  # process as you see fit

Full demo:

from itertools import dropwhile

def isBadLine(line):
    # strip() so the trailing newline doesn't defeat the comparison
    return line.strip() == '0'

with open(...) as f:
    for line in dropwhile(isBadLine, f):
        pass  # process as you see fit

Advantages: This is easily extensible to cases where your prefix lines are more complicated than "0" (but not interdependent).
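For example, a predicate for a hypothetical header format of blank lines and '#' comments (the `is_header` name and the sample data are made up for illustration):

```python
from itertools import dropwhile
import io

def is_header(line):
    # Hypothetical prefix format: blank lines or '#' comment lines.
    s = line.strip()
    return not s or s.startswith("#")

# io.StringIO stands in for an open file handle.
sample = io.StringIO("# header\n\n# more header\ngood stuff\n# kept: dropwhile already stopped\n")
body = list(dropwhile(is_header, sample))
# Once one line fails the predicate, everything after it is kept,
# even lines that would have matched the predicate.
```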

1 Comment

Nice idea. Keeps it clean.

If you don't want to read the whole file into memory at once, you can use a few tricks:

With next(iterator) you can advance to the next line:

with open("filename.txt") as f:
    next(f)
    next(f)
    next(f)
    for line in f:
        print(line)

Of course, this is slightly ugly, so itertools has a better way of doing this:

from itertools import islice

with open("filename.txt") as f:
    # start at line 17 and never stop (None), until the end
    for line in islice(f, 17, None):
        print(line)

Comments


Here are the timeit results for the top 2 answers. Note that "file.txt" is a text file containing 100,000+ lines of random strings with a file size of 1 MB+.

Using itertools:

from timeit import timeit

# the statement string runs in its own namespace,
# so itertools must be imported in setup
timeit("""with open("file.txt", "r") as fo:
    for line in itertools.islice(fo, 90000, None):
        line.strip()""", setup="import itertools", number=100)

>>> 1.604976346003241

Using two for loops:

from timeit import timeit

timeit("""with open("file.txt", "r") as fo:
    for i in range(90000):
        next(fo)
    for j in fo:
        j.strip()""", number=100)

>>> 2.427317383000627

Clearly, the itertools method is more efficient when dealing with large files.

Comments


This solution helped me skip the number of lines specified by the linetostart variable. You also get the index (int) and the line (string) if you want to keep track of those. In your case, substitute linetostart with 18, or assign 18 to the linetostart variable.

linetostart = 18
with open("file.txt") as f:
    for i, line in enumerate(f, linetostart):
        pass  # your code

1 Comment

This won’t actually skip lines, it will just offset the enumerate counter.
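A sketch of a variant that does skip, while keeping a real 1-based line number (io.StringIO stands in for the opened file; `linetostart` follows the answer's naming; the `if` guard is the fix):

```python
import io

linetostart = 18  # first line to keep, 1-based

# Stand-in for open("file.txt"): 24 numbered lines.
f = io.StringIO("".join(f"line {n}\n" for n in range(1, 25)))

kept = []
for i, line in enumerate(f, start=1):
    if i < linetostart:
        continue  # actually skip, instead of just offsetting the counter
    kept.append((i, line))  # i is the true line number
```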

You can use a list comprehension to make it a one-liner:

[f.readline() for i in range(17)]

More about list comprehension in PEP 202 and in the Python documentation.

4 Comments

doesn't make much sense to store those lines in a list which will just get garbage collected.
@wim: The memory overhead is trivial (and probably unavoidable no matter which way you do it, since you will need to do O(n) processing of those lines unless you skip to an arbitrary point in the file); I just don't think it's very readable.
I agree with @wim, if you are throwing away the result, use a loop. The whole point of a list comprehension is that you meant to store the list; you can just as easily fit a for loop on one line.
or use a generator in a 0-memory deque.
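The zero-memory deque mentioned here is the `consume` recipe from the itertools documentation; a minimal sketch:

```python
from collections import deque
from itertools import islice

def consume(iterator, n):
    # Advance the iterator n steps; a maxlen=0 deque discards
    # everything fed into it, so nothing is stored.
    deque(islice(iterator, n), maxlen=0)

it = iter(range(100))  # stand-in for a file object
consume(it, 17)
print(next(it))  # 17
```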

Here is a method to get lines between two line numbers in a file:

import sys

def file_line(name, start=1, end=sys.maxsize):
    lc = 0
    with open(name) as f:
        for line in f:
            lc += 1
            if start <= lc <= end:
                yield line
            elif lc > end:
                break


s = '/usr/share/dict/words'
l1 = list(file_line(s, 235880))
l2 = list(file_line(s, 1, 10))
print(l1)
print(l2)

Output:

['Zyrian\n', 'Zyryan\n', 'zythem\n', 'Zythia\n', 'zythum\n', 'Zyzomys\n', 'Zyzzogeton\n']
['A\n', 'a\n', 'aa\n', 'aal\n', 'aalii\n', 'aam\n', 'Aani\n', 'aardvark\n', 'aardwolf\n', 'Aaron\n']

Just call it with one parameter to get lines from line n to EOF.

Comments


If the file is a table, pandas can skip the rows for you:

import pandas as pd

pd.read_table("path/to/file", sep="\t", index_col=0, skiprows=17)

Comments
