5

I wrote a script that hits a URL, downloads a zip file, and unzips it. Now I am having trouble parsing the CSV file that I get after unzipping.

import csv
from requests import get
from io import BytesIO
from zipfile import ZipFile

request = get('https://example.com/some_file.zip')
zip_file = ZipFile(BytesIO(request.content))
files = zip_file.namelist()
with open(files[0], 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        print(row)
1
  • 2
    what is the error/problem? Commented Mar 27, 2018 at 14:32

4 Answers

4
Modern answer for Python 3

See the answer by @joe-heffer: https://stackoverflow.com/a/53187751/223424

Old incomplete answer

When you do files = zip_file.namelist(), you just list the names of the files in the zip archive; these files are not yet extracted from the zip and you cannot open them as local files, like you're doing.
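To illustrate the point, here is a minimal sketch that builds a small archive in memory instead of downloading one. The member name and its contents are made up for the example, and it assumes no file with that name exists in the current directory.

```python
import io
import os
import zipfile

# Build a tiny archive in memory; the member name and data are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("only_in_archive_example.csv", "a,b\n1,2\n")

with zipfile.ZipFile(buffer) as archive:
    # namelist() only returns the member names as strings.
    names = archive.namelist()
    # Nothing has been written to disk, so the built-in open() would fail here.
    exists_on_disk = os.path.exists(names[0])
    # The data is still reachable, but only through the archive object.
    first_bytes = archive.read(names[0])

print(names)
print(exists_on_disk)
print(first_bytes)
```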

You can directly read a stream of data from a zip file using ZipFile.open.

So this should work:

zip_file = ZipFile(BytesIO(request.content))
files = zip_file.namelist()
with zip_file.open(files[0], 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    ...
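Note that ZipFile.open returns a binary stream, while csv.reader expects text. A minimal sketch of one way to bridge the two, using an in-memory archive with made-up contents and assuming the CSV is UTF-8 encoded:

```python
import csv
import io
import zipfile

# Stand-in for the downloaded archive; the file name and rows are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "name,qty\nfoo,1\nbar,2\n")

with zipfile.ZipFile(buffer) as archive:
    first_name = archive.namelist()[0]
    with archive.open(first_name, "r") as binary_file:
        # Wrap the binary stream so csv.reader receives str, not bytes.
        # The UTF-8 encoding is an assumption about the data.
        with io.TextIOWrapper(binary_file, encoding="utf-8", newline="") as text_file:
            rows = list(csv.reader(text_file))

print(rows)
```

The rows are collected into a list inside the with block because the stream is closed as soon as the block exits.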

5 Comments

You are right, that was the issue, but now it throws this error: _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
Can you help me with the above error so I can accept the answer?
You can coerce the bytes to a string, but you may need to specify the encoding. Something like this might work: csvreader = csv.reader(str(csvfile, "utf-8-sig"))
That doesn't work either: TypeError: decoding to str: need a bytes-like object, ZipExtFile found
stackoverflow.com/a/56762078/4947006 is the actual answer, you need a call to io.TextIOWrapper
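If the file is small enough to hold in memory, the "coerce the bytes to a string" idea can also be made to work by reading the member's bytes first and decoding those, rather than passing the file object to str. A rough sketch, with a made-up in-memory archive and assuming the CSV is UTF-8 with an optional byte order mark:

```python
import csv
import io
import zipfile

# Stand-in for the downloaded archive; name and contents are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # "\ufeff" encoded as UTF-8 produces a byte order mark at the start.
    archive.writestr("data.csv", "\ufeffname,qty\nfoo,1\n".encode("utf-8"))

with zipfile.ZipFile(buffer) as archive:
    raw = archive.read("data.csv")  # bytes, not a file object

# "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present.
text = raw.decode("utf-8-sig")
rows = list(csv.reader(io.StringIO(text, newline="")))

print(rows)
```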
1
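import csv
import io
import logging
import zipfile
import requests

logger = logging.getLogger(__name__)

def read_first_csv(url):  # function name is illustrative
    response = requests.get(url)
    with io.BytesIO(response.content) as buffer:
        with zipfile.ZipFile(buffer) as zip_file:
            # Get first file in the archive
            for zip_info in zip_file.infolist():
                logger.debug(zip_info)
                # Open file
                with zip_file.open(zip_info) as file:
                    # Load CSV file, decode binary to text (assumes UTF-8)
                    with io.TextIOWrapper(file, encoding="utf-8", newline="") as text:
                        # Read all rows now; the stream closes on return
                        return list(csv.DictReader(text))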


0

Looks like you haven't imported the csv module. Try putting import csv at the top with your imports.

1 Comment

I forgot to mention it, but that is not the issue. The problem is that nothing is printed, even though the file is not empty.
0

So. After some hours of searching and trying, I finally got something working. Here is my script.

So my need was:

  • Download a ZIP file.
  • Find in that zip file a specific text file with "anystring" in the name
  • Extract from that text file the 1st URL containing the string "csv"
#!/usr/bin/env python
from io import BytesIO
from zipfile import ZipFile
import requests
import re
import sys

# define url value
url = "https://whateverurlyouneed"

# Define string to be found in the file name to be extracted
filestr = "anystring"

# Define string to be found in URL
urlstr = "anystring"

# Define regex to extract URL
regularex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"

# download zip file
content = requests.get(url)

# Open stream
zipfile = ZipFile(BytesIO(content.content))

# Open first file from the ZIP archive containing 
# the filestr string in the name
data = [zipfile.open(file_name) for file_name in zipfile.namelist() if filestr in file_name][0]

# Read lines from the file. On the first line containing urlstr,
# print the URLs found on that line and stop.
for line in data.readlines():
    if urlstr in line.decode("latin-1"):
        urls = re.findall(regularex,line.decode("latin-1"))
        print([url[0] for url in urls])
        break
sys.exit(0)
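The same three steps can also be wrapped in a function that is testable without a network call. This is only a sketch: the function name, the sample archive, and the deliberately simple URL pattern (which is not the regex used above) are all assumptions made for illustration.

```python
import io
import re
import zipfile

# Deliberately simple pattern for the example, not the regex from the script above.
SIMPLE_URL = re.compile(r"https?://[^\s\"'<>]+")


def first_matching_url(zip_bytes, filestr, urlstr, encoding="latin-1"):
    """Return the first URL on a line containing urlstr, taken from the first
    archive member whose name contains filestr, or None if nothing matches."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if filestr not in name:
                continue
            with archive.open(name) as member:
                for raw_line in member:
                    line = raw_line.decode(encoding)
                    if urlstr not in line:
                        continue
                    match = SIMPLE_URL.search(line)
                    if match:
                        return match.group(0)
            # Only the first matching member is inspected.
            return None
    return None


# Hypothetical archive standing in for the downloaded ZIP file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", "nothing here\n")
    archive.writestr(
        "links_anystring.txt",
        "see https://example.com/page\ndata: https://example.com/data.csv now\n",
    )

result = first_matching_url(buffer.getvalue(), "anystring", "csv")
print(result)
```

In the real script, the bytes would come from requests.get(url).content rather than from an in-memory archive.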
