I have a few thousand huge CSV files (some running into GBs, others into MBs), but I'm only interested in the last n rows (say, 50 records) of each file. My question is a general one about speed and efficiency: would reading all the files with read_csv be faster if I use skiprows, slower, or would it make no difference?
1 Answer
You can use the timeit module to measure how long your code takes to run. It looks like read_csv() is slightly faster if you use skiprows.
```python
import timeit

import pandas as pd

def test():
    # Baseline: read the whole file.
    df = pd.read_csv('large.csv')

def test2():
    # Skip the first 10,000 rows; note that range(0, 10000)
    # also skips row 0, i.e. the header row.
    df = pd.read_csv('large.csv', skiprows=range(0, 10000))

if __name__ == "__main__":
    print(timeit.timeit("test()", globals=globals(), number=500))
    print(timeit.timeit("test2()", globals=globals(), number=500))
```
| # iterations | without skiprows (s) | with skiprows (s) |
|---|---|---|
| 100 | 4.880708541997592 | 4.318660000004456 |
| 500 | 23.931738541999948 | 21.48539920800249 |
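If the goal is specifically the last 50 rows, note that skiprows counts positions from the start of the file, so you first need the total row count. Here is a minimal sketch, assuming the file has a header on row 0 (the file name is a placeholder; skiprows also accepts a callable, which avoids materializing a huge range):

```python
import pandas as pd

n = 50
path = 'large.csv'

# One cheap pass to count data rows (excluding the header).
with open(path) as f:
    total = sum(1 for _ in f) - 1

# Keep row 0 (the header) and only the last n data rows;
# the callable returns True for every row index to skip.
df = pd.read_csv(path, skiprows=lambda i: 0 < i <= total - n)
```

The counting pass still reads the whole file once, but counting lines is much cheaper than parsing them as CSV.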
2 Comments
JubG
Marginally faster. Thanks, Gates. Just curious - would you also know how skiprows actually works under the hood? i.e., does it read the entire file and then discard the unnecessary data, or does the read pick up only the relevant data straight off?
mozway
Since the CSV has to be parsed, you can't magically jump to the end of the file: skiprows still reads and parses the whole file.
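That said, if you can assume that no field contains an embedded (quoted) newline, you don't have to hand the whole file to the parser: you can seek near the end yourself, keep only the last n lines, and parse just those plus the header. A rough sketch under that assumption (the helper tail_csv and the chunk size are illustrative, not pandas API):

```python
import io
import os

import pandas as pd

def tail_csv(path, n=50, chunk=1 << 20):
    """Parse only the last n rows, assuming no quoted newlines in fields."""
    with open(path, 'rb') as f:
        # Grab the header row for the column names.
        header = f.readline()
        # Read a chunk from the end of the file; grow it backwards
        # until it contains at least n complete lines.
        size = f.seek(0, os.SEEK_END)
        offset = max(len(header), size - chunk)
        f.seek(offset)
        data = f.read()
        while data.count(b'\n') <= n and offset > len(header):
            offset = max(len(header), offset - chunk)
            f.seek(offset)
            data = f.read()
    # The first line of the chunk may be partial; taking the
    # last n lines avoids it.
    lines = data.splitlines(keepends=True)[-n:]
    return pd.read_csv(io.BytesIO(header + b''.join(lines)))
```

This reads only a bounded amount of data from the end of each file regardless of its size, which is the behaviour the question is really after; it just isn't something skiprows can do, for the reason above.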