
I have a .csv file with over 50k rows. I would like to divide it into smaller chunks and save them as separate .csv files. I'm not sure if pandas is the best approach here (if not, I'm open to any suggestions).

My goal: read the file, identify the number of rows in the dataframe, and divide the dataframe into chunks of 3000 rows each (each file including the header row), saved as separate .csv files.

My code so far:

import os
import pandas as pd

i = 0
while os.path.exists("output/path/chunk%s.csv" % i):
    i += 1

size = 3000
df = pd.read_csv('/input/path/input.csv')
list_of_dfs = [df.loc[i:i+size-1,:] for i in range(0, len(df),size)]


for x in list_of_dfs:
    x.to_csv('/output/path/chunk%s.csv' % i, index=False)

The above code didn't throw any error, but it created only one file ('chunk0.csv') with 1439 rows instead of 3000.
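(For reference: in Python 3 the `i` inside the list comprehension is local to the comprehension, so the `i` used in the final loop still holds the value left over from the `while` loop. Every chunk is therefore written to the same file name, each write overwriting the previous one, which is why only one file with the final, partial chunk remains. A minimal sketch of a fix along the original lines, using a small toy frame and a temp directory in place of the real paths:)

```python
import os
import tempfile

import pandas as pd

size = 3
# small synthetic frame standing in for the real input.csv
df = pd.DataFrame({'a': range(10)})

# one sub-frame per chunk; iloc slices by position, and `start` is local here
list_of_dfs = [df.iloc[start:start + size] for start in range(0, len(df), size)]

# enumerate supplies a fresh index per chunk, so no file gets overwritten
out_dir = tempfile.mkdtemp()
for i, chunk in enumerate(list_of_dfs):
    chunk.to_csv(os.path.join(out_dir, 'chunk%s.csv' % i), index=False)
```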

Could someone help me with this? Thanks in advance!


2 Answers


Use DataFrame.groupby with integer division of the index values by size, then loop over the groups and write each to a file, using an f-string for the file name:

size = 3000
df = pd.read_csv('/input/path/input.csv')

for i, g in df.groupby(df.index // size):
    g.to_csv(f'/output/path/chunk{i}.csv', index=False)
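To see how the grouping key works, here is a quick sketch on a toy frame (this assumes the default RangeIndex you get from read_csv; the sizes are made up):

```python
import pandas as pd

size = 3
df = pd.DataFrame({'a': range(7)})

# integer division maps rows 0-2 to group 0, rows 3-5 to group 1, row 6 to group 2
key = df.index // size
groups = {i: len(g) for i, g in df.groupby(key)}
```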

You may be interested in the chunksize parameter of pd.read_csv.

You can use it this way:

size = 3000
filename = '/input/path/input.csv'
for i, chunk in enumerate(pd.read_csv(filename, chunksize=size)):
    chunk.to_csv(f"output/path/chunk{i}.csv", index=False)
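A quick end-to-end check of this approach, with a temporary file standing in for the real paths. Note that read_csv with chunksize returns an iterator of DataFrames, so the full 50k-row file is never held in memory at once, and each chunk file gets its own header row by default:

```python
import os
import tempfile

import pandas as pd

size = 3000
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'input.csv')

# synthetic 7000-row input file for the demo
pd.DataFrame({'a': range(7000)}).to_csv(src, index=False)

# chunks arrive lazily: 3000, 3000, then the 1000-row remainder
n_files = 0
for i, chunk in enumerate(pd.read_csv(src, chunksize=size)):
    chunk.to_csv(os.path.join(tmp, 'chunk%s.csv' % i), index=False)
    n_files += 1
```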
