I have a csv file with many millions of rows. I want to start iterating from the 10,000,000th row. At the moment I have the code:
import csv

with open(csv_file, encoding='UTF-8') as f:
    r = csv.reader(f)
    for row_number, row in enumerate(r):
        if row_number < 10000000:
            continue
        else:
            process_row(row)
This works, however it takes several seconds to run before the rows of interest appear. Presumably all the unrequired rows are loaded into Python unnecessarily, slowing it down. Is there a way of starting the iteration at a certain row, i.e. without reading the start of the data in?
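For reference, the same skip can be written more compactly with itertools.islice (process_row and csv_file as above), though it still reads and parses every preceding row, so it does not address the slowness:

import csv
from itertools import islice

with open(csv_file, encoding='UTF-8') as f:
    r = csv.reader(f)
    # discard the first 10,000,000 parsed rows, then yield the rest
    for row in islice(r, 10000000, None):
        process_row(row)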
Could you use tail to skip the first N lines and pipe that to your Python script?

Pass newline='' to the open call; the csv module expects you to leave newline interpolation to it, and you don't want open performing line ending conversions.
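A minimal sketch of the tail suggestion (the data.csv and process.py names are hypothetical, and tail counts raw lines, so this assumes no quoted field contains an embedded newline):

# invoked as: tail -n +10000001 data.csv | python process.py
# tail -n +K starts output at line K, so +10000001 skips the first 10,000,000 lines
import csv
import sys

# the skipped lines never reach Python at all
r = csv.reader(sys.stdin)
for row in r:
    process_row(row)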