13

I have a .zip archive with filename.xlsx inside it, and I want to parse the Excel sheet line by line.

How do I properly pass the file to pandas.read_excel in this case?

I tried:

import zipfile
import pandas
myzip = zipfile.ZipFile('filename.zip')
for fname in myzip.namelist():
    with myzip.open(fname) as from_archive:
        with pandas.read_excel(from_archive) as fin:
            for line in fin:
                ....

but it doesn't seem to work, and the result was:

AttributeError: __exit__
5 Comments

  • What if your ZIP file contains multiple .XLS(X) files? Commented Mar 7, 2018 at 16:53
  • you should edit your question to include the declaration of myzip instead of adding that as a comment. Commented Mar 7, 2018 at 17:09
  • @MaxU, it does not matter now. The goal is to solve the simplest case. Commented Mar 7, 2018 at 19:22
  • @IvanVodopyanov, why do you want to read it line by line? Is it so huge that it can't fit into memory? Commented Mar 7, 2018 at 19:24
  • @MaxU, it does not matter. First of all I want to open it. Can you help me? Commented Mar 8, 2018 at 7:58

3 Answers

16

You can read the archived file into a variable in memory and parse it using io.BytesIO:

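import io
from zipfile import ZipFile
import pandas as pd


def read_zip(zip_fn, extract_fn=None):
    """Return the bytes of one member, or a {name: bytes} dict of all members."""
    # The context manager closes the archive once the data has been read.
    with ZipFile(zip_fn) as zf:
        if extract_fn:
            return zf.read(extract_fn)
        return {name: zf.read(name) for name in zf.namelist()}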

Usage:

df = pd.read_excel(io.BytesIO(read_zip(r'C:\download\test.xlsx.zip', 'test.xlsx')))
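If the archive holds several workbooks, as raised in the comments on the question, the same in-memory idea can be applied per member. The following is only a sketch: the helper name excel_buffers and the suffix filter are assumptions made for illustration, not part of the answer above.

```python
import io
from zipfile import ZipFile


def excel_buffers(zip_fn, suffixes=(".xlsx", ".xls")):
    """Map each Excel-looking member name to an in-memory buffer of its bytes.

    `suffixes` is an illustrative filter; adjust it to the files you expect.
    """
    with ZipFile(zip_fn) as zf:
        return {
            name: io.BytesIO(zf.read(name))
            for name in zf.namelist()
            if name.lower().endswith(suffixes)
        }


# Each buffer can then be handed to pandas, for example:
# frames = {name: pd.read_excel(buf) for name, buf in excel_buffers(path).items()}
```

Members with other extensions are skipped, so a stray text file in the archive is never passed to the Excel parser.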

Alternatively, you can extract the files from the zip archive to disk and parse them as regular files.

PS: there are plenty of examples on Stack Overflow showing how to extract a zip file...
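A minimal sketch of the extract-to-disk route, assuming a temporary directory is an acceptable destination. The helper name extract_member and its dest_dir parameter are hypothetical; ZipFile.extract is the standard-library call doing the work.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_member(zip_path, member, dest_dir=None):
    """Extract one archive member to disk and return the path of the new file."""
    # Fall back to a fresh temporary directory when no destination is given.
    dest = Path(dest_dir) if dest_dir else Path(tempfile.mkdtemp())
    with zipfile.ZipFile(zip_path) as archive:
        # ZipFile.extract returns the path of the file it created.
        return Path(archive.extract(member, path=dest))


# The returned path can be parsed like any regular file, for example:
# df = pd.read_excel(extract_member('filename.zip', 'filename.xlsx'))
```

The temporary directory is not cleaned up by this sketch, so remove it yourself once the file has been parsed.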


3 Comments

Nice work. What if the zip file was a URL? Apparently the read_excel method can't accept a URL that is also a zip (with an embedded .xlsx in it).
@leeprevost, I'd first download such a zip file using requests and then use the read_zip() function from the answer ;)
Any way to do this for xlrs instead of pandas?
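For the URL case discussed in these comments, here is a rough sketch that downloads the archive into memory first. Note that it uses the standard-library urllib.request instead of the requests package suggested above, and that the function name fetch_zip_member, the URL, and the member name are all placeholders.

```python
import io
import urllib.request
from zipfile import ZipFile


def fetch_zip_member(url, member):
    """Download a zip archive and return one member as an in-memory buffer."""
    with urllib.request.urlopen(url) as response:
        # ZipFile needs a seekable file, so buffer the whole download first.
        payload = io.BytesIO(response.read())
    with ZipFile(payload) as zf:
        return io.BytesIO(zf.read(member))


# Hypothetical usage (the URL is a placeholder):
# df = pd.read_excel(fetch_zip_member('https://example.com/test.xlsx.zip', 'test.xlsx'))
```

The whole archive is held in memory, which is fine for small files but may not suit very large downloads.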
6

Using zipfile

import zipfile
import pandas as pd

# Open the archive, then hand the member's file object straight to pandas.
archive = zipfile.ZipFile('filename.zip', 'r')
xlfile = archive.open('filename.xlsx')
df = pd.read_excel(xlfile)

2 Comments

they are already using zipfile, the problem was trying to use pd.read_excel as a context manager
@Floydian, I guess your answer is the same as my question. Am I right?
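To make the point from the first comment concrete: pandas.read_excel returns a DataFrame, which is not a context manager, so it has to be assigned rather than wrapped in a with statement. Below is a sketch of the question's loop rewritten that way. The function name iter_rows_from_zip and the reader parameter are assumptions added for illustration (reader only lets the same loop be tried on other member types).

```python
import zipfile

import pandas as pd


def iter_rows_from_zip(zip_path, member, reader=pd.read_excel):
    """Yield the rows of one archive member as named tuples."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as from_archive:
            # Plain assignment: the reader's result is data, not a context manager.
            df = reader(from_archive)
    # The frame is fully in memory, so the handles can be closed before iterating.
    yield from df.itertuples(index=False)


# Hypothetical usage with the names from the question:
# for row in iter_rows_from_zip('filename.zip', 'filename.xlsx'):
#     print(row)
```

Only the archive and the member's file object are opened with with; the parsed sheet is then walked row by row via DataFrame.itertuples.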
-2

A simple way is:

df = pd.read_csv('path to file', compression='zip')

If needed, you can add extra arguments: encoding = 'windows-1251' and sep = ''

1 Comment

This doesn't work for an Excel file, which was the original question.
