63

I'm trying to get data from a zipped CSV file. Is there a way to do this without unzipping the whole file? If not, how can I unzip the file and read it efficiently?

1

10 Answers

100

I used the zipfile module to read the CSV straight from the ZIP into a pandas DataFrame. Let's say the file is named "intfile.csv" and it's inside a ZIP archive named "THEZIPFILE.zip":

import pandas as pd
import zipfile

zf = zipfile.ZipFile('C:/Users/Desktop/THEZIPFILE.zip') 
df = pd.read_csv(zf.open('intfile.csv'))

57

If you aren't using pandas, it can be done entirely with the standard library. Here is Python 3.7 code:

import csv
from io import TextIOWrapper
from zipfile import ZipFile

with ZipFile('yourfile.zip') as zf:
    with zf.open('your_csv_inside_zip.csv', 'r') as infile:
        # newline='' lets the csv module handle line endings inside quoted fields
        reader = csv.reader(TextIOWrapper(infile, encoding='utf-8', newline=''))
        for row in reader:
            # process the CSV here
            print(row)

2 Comments

I tried doing this not realizing that I needed io.TextIOWrapper. How could I have known?
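@KenIngram ZipFile.open() returns a zipfile.ZipExtFile object, which yields bytes. The built-in open() function, in its default text mode, returns an _io.TextIOWrapper object directly, so csv.reader receives str lines without any extra wrapping.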
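
The bytes/str difference described in that comment can be seen with a small in-memory archive. This is only a minimal sketch: the archive, the member name data.csv, and its contents are made up for illustration, and UTF-8 is an assumed encoding.

```python
import csv
import io
import zipfile

# Build a tiny zip archive in memory (hypothetical member name and contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data.csv", "name,score\nalice,1\nbob,2\n")

with zipfile.ZipFile(buffer) as zf:
    # ZipFile.open() returns a binary file-like object, so each line is bytes.
    with zf.open("data.csv") as raw:
        first_line = raw.readline()

    # Wrapping it in TextIOWrapper decodes the bytes to str for csv.reader.
    with zf.open("data.csv") as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        rows = list(csv.reader(text))

print(type(first_line).__name__)
print(rows)
```

Passing the undecoded binary object straight to csv.reader is what fails in Python 3, which is why the wrapper is needed.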
36

A quick solution is the code below:

import pandas as pd

# pandas reads zip archives directly (the archive must contain a single data file)
df = pd.read_csv("/path/to/file.csv.zip")

1 Comment

Outstanding answer! I checked that the same solution also works without the ".csv" in the file name: df = pd.read_csv("/path/to/file.zip")
9

zipfile also supports the with statement.

So, building on Yaron's pandas answer:

import pandas as pd
import zipfile

with zipfile.ZipFile('file.zip') as myZip:
    with myZip.open('file.csv') as myZipCsv:
        df = pd.read_csv(myZipCsv)

1 Comment

Does this code create a file somewhere, or is it in-memory streaming?
9

I thought Yaron had the best answer, but I wanted to add code that iterates over multiple files inside a zip archive and concatenates the results:

import os
import pandas as pd
import zipfile

curDir = os.getcwd()
zf = zipfile.ZipFile(os.path.join(curDir, 'targetfolder.zip'))
text_files = zf.infolist()
list_ = []

print("Uncompressing and reading data... ")

for text_file in text_files:
    if text_file.is_dir():
        continue  # skip directory entries, which hold no CSV data
    print(text_file.filename)
    df = pd.read_csv(zf.open(text_file.filename))
    # do df manipulations
    list_.append(df)

# stack the per-file frames into one DataFrame
df = pd.concat(list_)


5

Yes. You want the zipfile module.

You open the zip file itself with zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])

You can then use ZipFile.infolist() to enumerate each file within the zip, and read it with ZipFile.open(name[, mode[, pwd]]), which returns a file-like object without extracting the member to disk.
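
The steps above can be sketched roughly as follows, using only the standard library. The helper name read_csv_members, the demo archive, and the UTF-8 encoding are assumptions made for illustration, not part of the zipfile API.

```python
import csv
import io
import zipfile

def read_csv_members(archive):
    """Return {member name: list of rows} for every .csv file in the archive."""
    tables = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # Skip directory entries and anything that is not a CSV file.
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            # open() streams the member; nothing is extracted to disk.
            with zf.open(info) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[info.filename] = list(csv.reader(text))
    return tables

# Demo on an in-memory archive with made-up contents.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("a.csv", "x,y\n1,2\n")
    zf.writestr("notes.txt", "not a csv")
    zf.writestr("sub/b.csv", "x,y\n3,4\n")

tables = read_csv_members(demo)
print(sorted(tables))
```

The same function should also accept a path to a zip file on disk, since ZipFile takes either a path or a file-like object.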


5

This is the simplest approach, and the one I always use:

import pandas as pd
df = pd.read_csv("Train.zip", compression='zip')


4

Suppose you are downloading a zip file that contains a CSV and you don't want to use temporary storage. Here is a sample implementation:

#!/usr/bin/env python3

from csv import DictReader
from io import TextIOWrapper, BytesIO
from zipfile import ZipFile

import requests

def all_tickers():
    url = "https://simfin.com/api/bulk/bulk.php?dataset=industries&variant=null"
    r = requests.get(url)
    zip_ref = ZipFile(BytesIO(r.content))
    for name in zip_ref.namelist():
        print(name)
        with zip_ref.open(name) as file_contents:
            reader = DictReader(TextIOWrapper(file_contents, 'utf-8'), delimiter=';')
            for item in reader:
                print(item)

This takes care of all the Python 3 bytes/str issues.

1 Comment

This is the one answer that handles in-memory zips. None of the others do.
3

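Modern pandas (since version 0.18.1) natively supports compressed CSV files: its read_csv function has a compression parameter: {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'. With 'infer', the compression is detected from the file extension, so a path ending in .zip needs no extra argument.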

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html


1

If you have a file named my_big_file.csv and you zip it under the same name, as my_big_file.zip,

you can simply do this:

df = pd.read_csv("my_big_file.zip")

Note: check your pandas version first; older versions do not support this.
