Read specific csv file from zip using pandas

Question

Here is the data I am interested in:

http://fenixservices.fao.org/faostat/static/bulkdownloads/Production_Crops_E_All_Data.zip

It consists of 3 files:

I want to download the zip with pandas and create a DataFrame from one of the files, Production_Crops_E_All_Data.csv:

import pandas as pd
url="http://fenixservices.fao.org/faostat/static/bulkdownloads/Production_Crops_E_All_Data.zip"
df=pd.read_csv(url)

Pandas can download files, it can read zips, and of course it can read csv files. But how can I read one specific file from an archive that contains several files?

Right now I get this error:

ValueError: ('Multiple files found in compressed zip file %s)

This post doesn't answer my question because I have multiple files in one zip: Read a zipped file as a pandas DataFrame

Does this answer your question? Read a zipped file as a pandas DataFrame
– Pasindu Gamarachchi, Commented Jul 6, 2020 at 8:49
@Pasindu Gamarachchi no, the link you pointed to works well when the zip file contains only a single file, but the OP is talking about multiple files contained in a single zip file.
– learnToCode, Commented Aug 30, 2021 at 13:20

lytseeker · Accepted Answer · 2022-11-17 06:19:32Z

4

From this link

try this

from zipfile import ZipFile
import io
from urllib.request import urlopen
import pandas as pd

r = urlopen("http://fenixservices.fao.org/faostat/static/bulkdownloads/Production_Crops_E_All_Data.zip").read()
file = ZipFile(io.BytesIO(r))
data_df = pd.read_csv(file.open("Production_Crops_E_All_Data.csv"), encoding='latin1')
data_df_noflags = pd.read_csv(file.open("Production_Crops_E_All_Data_NOFLAG.csv"), encoding='latin1')
data_df_flags = pd.read_csv(file.open("Production_Crops_E_Flags.csv"), encoding='latin1')

Hope this helps!

EDIT: updated for Python 3 (StringIO to io.StringIO).

EDIT: updated the urllib import and changed StringIO to BytesIO. Also, your CSV files are not UTF-8 encoded; I tried latin1 and that worked.
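If the member names are not known in advance, the archive can be inspected before anything is handed to pandas. Below is a minimal sketch using only the standard library. `csv_members` and `open_member` are hypothetical helper names, and the small in-memory archive with made-up contents stands in for the downloaded bytes `r` above.

```python
import io
from zipfile import ZipFile


def csv_members(zip_bytes):
    """Return the names of all .csv members of a zip archive held in memory."""
    with ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [name for name in archive.namelist() if name.lower().endswith(".csv")]


def open_member(zip_bytes, member):
    """Return one member's contents as a BytesIO; raise KeyError listing what exists."""
    with ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
        if member not in names:
            raise KeyError(f"{member!r} not in archive; available: {names}")
        return io.BytesIO(archive.read(member))


# Build a stand-in archive with two members (made-up contents, not the FAO data).
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("a.csv", "x,y\n1,2\n")
    archive.writestr("b.csv", "x,y\n3,4\n")
demo_bytes = buffer.getvalue()

print(csv_members(demo_bytes))
print(open_member(demo_bytes, "b.csv").read())
```

The object returned by `open_member` is a binary file-like object, so it should be usable as the first argument of `pd.read_csv` together with `encoding='latin1'`, in the same way as `file.open(...)` above.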

edited Nov 17, 2022 at 6:19

answered Jul 6, 2020 at 9:18

lytseeker


4 Comments

Igor K. Over a year ago

import urllib should be edited to import urllib.request.

Igor K. Over a year ago

file = ZipFile(io.StringIO(r)) traceback: TypeError: initial_value must be str or None, not bytes

Igor K. Over a year ago

Thank you for updating the post but error 'TypeError: initial_value must be str or None, not bytes' still exists. //// .read().decode('utf8') doesn't help

lytseeker Over a year ago

Hey @IgorK. Updated the answer to fix that; please use BytesIO instead of StringIO. Cheers!

sammywemmy · Accepted Answer · 2020-07-06 09:45:46Z

1

You could use Python's datatable, which is a reimplementation of R's data.table in Python.

Read in data :

from datatable import fread

# The exact file to be extracted is known; simply append it to the zip name
# (this is a path to a local copy of the archive):
url = "Production_Crops_E_All_Data.zip/Production_Crops_E_All_Data.csv"

df = fread(url)

# convert to pandas
df = df.to_pandas()

You can equally work within datatable. Do note, however, that it is not as feature-rich as pandas, but it is a powerful and very fast tool.

Update: You can use the zipfile module as well :

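from zipfile import ZipFile
from io import BytesIO
import pandas as pd

# ZipFile needs the path of the archive itself (assumed here to be a local copy),
# not the "zip/member" string that was passed to fread
zip_path = "Production_Crops_E_All_Data.zip"

with ZipFile(zip_path) as myzip:
    with myzip.open("Production_Crops_E_All_Data.csv") as myfile:
        data = myfile.read()

#read data into pandas
#had to toy a bit with the encoding,
#thankfully it is a known issue on SO
#https://stackoverflow.com/a/51843284/7175713
df = pd.read_csv(BytesIO(data), encoding="iso-8859-1", low_memory=False)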
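The intermediate `data = myfile.read()` copy is optional, since `pd.read_csv` accepts a binary file-like object such as the handle returned by `ZipFile.open`. Here is a minimal sketch of that variant, wrapped in a hypothetical helper named `read_csv_from_zip`. The demo archive and its contents are made up so that the example runs without the FAO download.

```python
import io
from zipfile import ZipFile

import pandas as pd


def read_csv_from_zip(zip_source, member, **read_csv_kwargs):
    """Read one named CSV member of a zip archive into a DataFrame.

    zip_source is a path or a binary file-like object (anything ZipFile accepts);
    extra keyword arguments are forwarded to pd.read_csv.
    """
    with ZipFile(zip_source) as archive:
        with archive.open(member) as handle:
            return pd.read_csv(handle, **read_csv_kwargs)


# Stand-in archive with two members; the first contains a non-ASCII latin-1 byte.
demo = io.BytesIO()
with ZipFile(demo, "w") as archive:
    archive.writestr("crops.csv", "Area,Value\nCôte d'Ivoire,10\n".encode("latin1"))
    archive.writestr("flags.csv", "Flag,Description\nA,Official\n")

df = read_csv_from_zip(demo, "crops.csv", encoding="latin1")
print(df)
```

For the real archive, the call would presumably look like `read_csv_from_zip("Production_Crops_E_All_Data.zip", "Production_Crops_E_All_Data.csv", encoding="iso-8859-1", low_memory=False)`, assuming the zip has been saved locally under that name.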

edited Jul 6, 2020 at 9:45

answered Jul 6, 2020 at 9:17

sammywemmy


1 Comment

Igor K. Over a year ago

Unfortunately I cannot install this library with pip install: "SystemExit: Suitable C++ compiler cannot be determined. Please specify a compiler executable in the CXX environment variable."
