
I have this data from a .gov site:

import pandas as pd
import io
import requests

url = "https://download.bls.gov/pub/time.series/la/la.data.64.County"
s = requests.get(url).content
c = pd.read_csv(io.StringIO(s.decode('utf-8')))

The file has 4,942,096 rows. I want to split them across multiple CSV files.

I know how to get the first million like so:

c.to_csv('nick.csv', index=False, chunksize=1000000)

How do I get the rest?

1 Answer


You can loop through the file in chunks and save each one like so:

filename = io.StringIO(s.decode('utf-8'))
# ^ not tested, but assuming it works, for readability's sake

chunk_size = 10 ** 6
for chunk in pd.read_csv(filename, chunksize=chunk_size):
    chunk.to_csv('nick.csv.gz', compression='gzip', index=False)

You'll need to add some sort of naming convention, otherwise each chunk will overwrite the same file. I've also added gzip compression, which significantly speeds up write times.

I'd just add a counter, personally:

chunk_size = 10 ** 6
for counter, chunk in enumerate(pd.read_csv(filename, chunksize=chunk_size), start=1):
    chunk.to_csv(f'nick_{counter}.csv.gz', compression='gzip', index=False)
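Putting it together, here's a minimal self-contained sketch of the chunked-split pattern. It uses a small in-memory CSV (hypothetical data, not the BLS file) and a tiny chunk size so the splitting is easy to see; for the real download you'd pass the decoded response and `chunk_size = 10 ** 6` instead.

```python
import io
import pandas as pd

# Small in-memory CSV standing in for the real download (hypothetical data).
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

chunk_size = 4  # tiny for illustration; use 10 ** 6 for the real file
written = []
for counter, chunk in enumerate(pd.read_csv(io.StringIO(csv_text), chunksize=chunk_size), start=1):
    name = f'nick_{counter}.csv.gz'
    # Each chunk goes to its own numbered, gzip-compressed file.
    chunk.to_csv(name, compression='gzip', index=False)
    written.append(name)

print(written)  # 10 rows split into chunks of 4 + 4 + 2
```

Because each part keeps the `.csv.gz` extension, `pd.read_csv` will transparently decompress the pieces when you read them back, and concatenating them reproduces the original frame.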