
I have a .csv file with over 50k rows. I would like to divide it into smaller chunks and save them as separate .csv files. I'm not sure if pandas is the best approach here (if not, I'm open to any suggestions).

My goal: read the file, identify the number of rows in the dataframe, and divide the dataframe into chunks of 3000 rows each (each file including the header row), saved as separate .csv files.

My code so far:

import os
import pandas as pd

i = 0
while os.path.exists("output/path/chunk%s.csv" % i):
    i += 1

size = 3000
df = pd.read_csv('/input/path/input.csv')
list_of_dfs = [df.loc[i:i+size-1,:] for i in range(0, len(df),size)]


for x in list_of_dfs:
    x.to_csv('/output/path/chunk%s.csv' % i, index=False)

The above code didn't throw any error, but it created only one file ('chunk0.csv') with 1439 rows instead of 3000.
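(For reference: in Python 3 the `i` inside the list comprehension is local to the comprehension, so the `i` used in the final loop still holds the value left over from the `while` loop. Every chunk is therefore written to the same file name, each write overwriting the previous one, which is why only one file with the final, partial chunk remains. A minimal sketch of a fix along the original lines, using a small toy frame and a temp directory in place of the real paths:)

```python
import os
import tempfile

import pandas as pd

size = 3
# small synthetic frame standing in for the real input.csv
df = pd.DataFrame({'a': range(10)})

# one sub-frame per chunk; iloc slices by position, and `start` is local here
list_of_dfs = [df.iloc[start:start + size] for start in range(0, len(df), size)]

# enumerate supplies a fresh index per chunk, so no file gets overwritten
out_dir = tempfile.mkdtemp()
for i, chunk in enumerate(list_of_dfs):
    chunk.to_csv(os.path.join(out_dir, 'chunk%s.csv' % i), index=False)
```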

Could someone help me with this? Thanks in advance!


2 Answers


Use DataFrame.groupby with integer division of the index values by size, then loop over the groups and write each to a file, using an f-string for the file name:

size = 3000
df = pd.read_csv('/input/path/input.csv')

for i, g in df.groupby(df.index // size):
    g.to_csv(f'/output/path/chunk{i}.csv', index=False)
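To see how the grouping key works, here is a quick sketch on a toy frame (this assumes the default RangeIndex you get from read_csv; the sizes are made up):

```python
import pandas as pd

size = 3
df = pd.DataFrame({'a': range(7)})

# integer division maps rows 0-2 to group 0, rows 3-5 to group 1, row 6 to group 2
key = df.index // size
groups = {i: len(g) for i, g in df.groupby(key)}
```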

You may be interested in the chunksize parameter of pd.read_csv.

You can use it this way:

size = 3000
filename = '/input/path/input.csv'
for i, chunk in enumerate(pd.read_csv(filename, chunksize=size)):
    chunk.to_csv(f"output/path/chunk{i}.csv", index=False)
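A quick end-to-end check of this approach, with a temporary file standing in for the real paths. Note that read_csv with chunksize returns an iterator of DataFrames, so the full 50k-row file is never held in memory at once, and each chunk file gets its own header row by default:

```python
import os
import tempfile

import pandas as pd

size = 3000
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'input.csv')

# synthetic 7000-row input file for the demo
pd.DataFrame({'a': range(7000)}).to_csv(src, index=False)

# chunks arrive lazily: 3000, 3000, then the 1000-row remainder
n_files = 0
for i, chunk in enumerate(pd.read_csv(src, chunksize=size)):
    chunk.to_csv(os.path.join(tmp, 'chunk%s.csv' % i), index=False)
    n_files += 1
```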
