How to download a ZIP file and parse a CSV file from it in Python

Question

I wrote a script that requests a URL, downloads a ZIP file, and unzips it. Now I am having trouble parsing the CSV file I get after unzipping.

import csv
from requests import get
from io import BytesIO
from zipfile import ZipFile

request = get('https://example.com/some_file.zip')
zip_file = ZipFile(BytesIO(request.content))
files = zip_file.namelist()
with open(files[0], 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        print(row)

What is the error/problem?

– shahaf, Commented Mar 27, 2018 at 14:32

9000 · Accepted Answer · 2024-07-31 14:07:22Z

4

Modern answer for Python 3

See the answer by @joe-heffer: https://stackoverflow.com/a/53187751/223424

Old incomplete answer

When you call files = zip_file.namelist(), you only get the names of the files in the ZIP archive. These files have not been extracted to disk, so you cannot open them as local files the way you're doing.
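For comparison, here is a minimal sketch of the other way to make the question's open(...) call work: extract the member to disk first with ZipFile.extract, then open the returned path in text mode. The function name rows_via_extract is hypothetical, the temporary directory is an assumption about where you want the file, and the CSV is assumed to be UTF-8. An in-memory archive stands in for request.content so the example runs without a download.

```python
import csv
import io
import tempfile
import zipfile


def rows_via_extract(zip_bytes):
    """Hypothetical helper: extract the first member to a temporary
    directory, then read it as an ordinary local file in text mode."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zip_file:
        member = zip_file.namelist()[0]  # only a name; nothing is on disk yet
        with tempfile.TemporaryDirectory() as tmp_dir:
            # extract() writes the member to disk and returns its path
            path = zip_file.extract(member, path=tmp_dir)
            # Assumes the CSV is UTF-8; newline="" is what the csv module expects
            with open(path, newline="", encoding="utf-8") as csvfile:
                # Materialize the rows before the temporary directory is removed
                return list(csv.reader(csvfile))


# Build a small archive in memory instead of downloading one
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n")

print(rows_via_extract(buffer.getvalue()))  # [['a', 'b'], ['1', '2']]
```

Extracting touches the filesystem, so reading the member as a stream with ZipFile.open is usually simpler when you only need the rows.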

You can directly read a stream of data from a zip file using ZipFile.open.

So this should work:

zip_file = ZipFile(BytesIO(request.content))
files = zip_file.namelist()
with zip_file.open(files[0], 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    ...

edited Jul 31, 2024 at 14:07

answered Mar 27, 2018 at 14:38

9000


5 Comments

Magnotta Over a year ago

You are right, that was the issue, but now it's throwing this error: _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

Magnotta Over a year ago

Can you help me with the above error so I can accept the answer?

Ryan Over a year ago

You can coerce the bytes to a string, but you may need to specify the encoding. Something like this might work: csvreader = csv.reader(str(csvfile, "utf-8-sig"))

Ike348 Over a year ago

That doesn't work either: TypeError: decoding to str: need a bytes-like object, ZipExtFile found

Ike348 Over a year ago

stackoverflow.com/a/56762078/4947006 is the actual answer: you need to wrap the binary stream in io.TextIOWrapper.
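Following the io.TextIOWrapper suggestion in the last comment, a minimal sketch of the snippet from the answer with the bytes-versus-str error fixed might look like this. It assumes the archive's first member is a UTF-8 CSV. rows_from_first_member is a hypothetical name, and the in-memory archive replaces request.content so the example runs offline.

```python
import csv
import io
import zipfile


def rows_from_first_member(zip_bytes, encoding="utf-8"):
    """Hypothetical helper: parse the first file in a ZIP archive as CSV
    without writing anything to disk (encoding is an assumption)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zip_file:
        first_name = zip_file.namelist()[0]
        # ZipFile.open() yields bytes; TextIOWrapper decodes them to str,
        # which is what csv.reader needs
        with zip_file.open(first_name, "r") as binary_file:
            with io.TextIOWrapper(binary_file, encoding=encoding, newline="") as csvfile:
                return list(csv.reader(csvfile))


# In the question, zip_bytes would be request.content
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("report.csv", "name,qty\nwidget,3\n")

for row in rows_from_first_member(buffer.getvalue()):
    print(row)
```

Returning list(...) reads all rows while the member is still open; returning the bare reader would fail later because the file is closed when the with blocks exit.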

Joe Heffer · Accepted Answer · 2018-11-07 10:36:51Z

1

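import csv
import io
import logging
import zipfile

import requests

logger = logging.getLogger(__name__)


def read_first_csv(url):  # function name is illustrative
    response = requests.get(url)
    with io.BytesIO(response.content) as zip_bytes:
        with zipfile.ZipFile(zip_bytes) as zip_file:
            # Return the rows of the first file in the archive
            for zip_info in zip_file.infolist():
                logger.debug(zip_info)
                # Open file (binary stream)
                with zip_file.open(zip_info) as file:
                    # Load CSV file, decode binary to text (assumes UTF-8)
                    with io.TextIOWrapper(file, encoding="utf-8", newline="") as text:
                        # Read the rows now, before the file is closed
                        return list(csv.DictReader(text))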

answered Nov 7, 2018 at 10:36

Joe Heffer


Matthew Woodruff · Accepted Answer · 2018-03-27 14:34:14Z

0

Looks like you haven't imported the csv module. Try putting import csv at the top with your imports.

answered Mar 27, 2018 at 14:34

Matthew Woodruff


1 Comment

Magnotta Over a year ago

I forgot to mention it here, but that's not the issue. The problem is that it doesn't print anything, even though the file is not empty.

Akim Sissaoui · Accepted Answer · 2022-10-04 00:58:06Z

After some hours of searching and trying, I finally got something working. Here is my script.

My requirements were:

Download a ZIP file.
Find, in that ZIP file, a specific text file with "anystring" in its name.
Extract from that text file the first URL containing the string "csv".

#!/usr/bin/env python3
from io import BytesIO
from zipfile import ZipFile
import re
import sys

import requests

# define url value
url = "https://whateverurlyouneed"

# Define string to be found in the file name to be extracted
filestr = "anystring"

# Define string to be found in URL
urlstr = "anystring"

# Define regex to extract URL
regularex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"

# download zip file
response = requests.get(url)
response.raise_for_status()

# Open the downloaded archive from memory
with ZipFile(BytesIO(response.content)) as archive:
    # Find the first file in the ZIP archive containing
    # the filestr string in its name
    file_name = next((name for name in archive.namelist() if filestr in name), None)
    if file_name is None:
        sys.exit("No file name in the archive contains " + repr(filestr))

    # Read lines from the file and print the first URL that
    # contains urlstr, then stop
    with archive.open(file_name) as data:
        for raw_line in data:
            line = raw_line.decode("latin-1")
            if urlstr not in line:
                continue
            # findall returns one tuple per match; group 1 is the whole URL
            urls = [match[0] for match in re.findall(regularex, line)]
            matching = [found for found in urls if urlstr in found]
            if matching:
                print(matching[0])
                break
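The same three steps can be sketched as a reusable function that is easy to test without a network call. This is an illustration under stated assumptions, not the script above: first_matching_url and SIMPLE_URL_RE are hypothetical names, and the URL pattern is deliberately much simpler than the regex in the script (it treats any run of non-whitespace after http:// or https:// as a URL).

```python
import io
import re
import zipfile

# Simplified stand-in for the script's regex: http(s):// followed by non-whitespace
SIMPLE_URL_RE = re.compile(r"https?://\S+")


def first_matching_url(zip_bytes, filestr, urlstr):
    """Hypothetical helper: return the first URL containing urlstr from the
    first archive member whose name contains filestr, or None."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        name = next((n for n in archive.namelist() if filestr in n), None)
        if name is None:
            return None
        with archive.open(name) as data:
            # Iterating a ZipExtFile yields the member's lines as bytes
            for raw_line in data:
                line = raw_line.decode("latin-1")
                for url in SIMPLE_URL_RE.findall(line):
                    if urlstr in url:
                        return url
    return None


# In-memory archive standing in for the downloaded content
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("readme.txt", "nothing here\n")
    zf.writestr("links_anystring.txt",
                "see https://example.com/page\n"
                "data at https://example.com/export.csv\n")

print(first_matching_url(buffer.getvalue(), "anystring", "csv"))
# https://example.com/export.csv
```

Returning a value instead of calling sys.exit keeps the lookup usable from other code; the caller decides what to do when nothing matches.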
