I am trying to read the WGIData.csv file into a pandas DataFrame. WGIData.csv is inside a zip file that I am downloading from this URL:

http://databank.worldbank.org/data/download/WGI_csv.zip

But when I try to read it, it throws the error BadZipFile: File is not a zip file

Here is my Python code:

import pandas as pd
from urllib.request import urlopen
from zipfile import ZipFile

class Get_Data():

    def Return_csv_from_zip(self, url):
        self.zip = urlopen(url)
        self.myzip = ZipFile(self.zip)
        self.myzip = self.zip.extractall(self.myzip)
        self.file = pd.read_csv(self.myzip)
        self.zip.close()

        return self.file

url = 'http://databank.worldbank.org/data/download/WGI_csv.zip'
data = Get_Data()
df = data.Return_csv_from_zip(url)

5 Comments

  • Your zip has two files, ['WGIData.csv', 'WGISeries.csv']; this might be the problem. Commented May 28, 2018 at 19:18
  • It's not a bad zip file; I extracted it using WinRAR. @AChampion Commented May 28, 2018 at 19:20
  • Then what do I need to do? @coldspeed Commented May 28, 2018 at 19:21
  • I only want to read WGIData.csv. Commented May 28, 2018 at 19:21
  • Something like this might help: stackoverflow.com/q/21075999/4909087 Commented May 28, 2018 at 19:22

1 Answer

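urlopen() returns an HTTPResponse, which you cannot pass directly to ZipFile(): ZipFile needs a seekable file-like object, and the response stream is not seekable. You can read() the response and wrap the bytes in io.BytesIO() to do what you need: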

In []:
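from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile
import pandas as pd

z = urlopen('http://databank.worldbank.org/data/download/WGI_csv.zip')
# extract() writes WGIData.csv to the current directory and returns its path
myzip = ZipFile(BytesIO(z.read())).extract('WGIData.csv')
pd.read_csv(myzip)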

Out[]:
     Country Name Country Code                                     Indicator Name    Indicator Code       1996  \
0        Anguilla          AIA                    Control of Corruption: Estimate            CC.EST        NaN   
1        Anguilla          AIA           Control of Corruption: Number of Sources         CC.NO.SRC        NaN   
2        Anguilla          AIA             Control of Corruption: Percentile Rank        CC.PER.RNK        NaN   
3        Anguilla          AIA  Control of Corruption: Percentile Rank, Lower ...  CC.PER.RNK.LOWER        NaN   
4        Anguilla          AIA  Control of Corruption: Percentile Rank, Upper ...  CC.PER.RNK.UPPER        NaN   
5        Anguilla          AIA              Control of Corruption: Standard Error        CC.STD.ERR        NaN   
...
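
Note that extract() in the snippet above writes WGIData.csv into the current working directory. If you would rather not leave a file on disk, a minimal sketch is to open the member straight from the in-memory archive with ZipFile.open() and hand that file object to pd.read_csv(). The helper name read_csv_from_zip_bytes is made up for this illustration, and it assumes the whole archive fits in memory:

```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd


def read_csv_from_zip_bytes(zip_bytes, member):
    """Read one CSV member of a zip archive (given as bytes) into a DataFrame.

    Hypothetical helper for illustration; nothing is written to disk.
    """
    # BytesIO gives ZipFile the seekable file-like object it needs.
    with ZipFile(BytesIO(zip_bytes)) as archive:
        # archive.open() returns a binary file object for the named member.
        with archive.open(member) as fh:
            return pd.read_csv(fh)


# Usage (needs network access), with the URL from the question:
# from urllib.request import urlopen
# url = 'http://databank.worldbank.org/data/download/WGI_csv.zip'
# with urlopen(url) as response:
#     df = read_csv_from_zip_bytes(response.read(), 'WGIData.csv')
```

Taking the raw bytes as the argument keeps the download separate from the parsing, so the same helper also works on a zip you already have on disk (read it with open(path, 'rb').read()).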

2 Comments

Is there a way to do it without specifying the file name?
You can use extractall() to extract all the contents of the zip file, or you can get the namelist() and iterate through the names.
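
The namelist() approach from the comment above could look roughly like this. It is only a sketch: read_all_csvs is a hypothetical helper name, and it assumes every member whose name ends in .csv is a plain CSV that pd.read_csv() can parse with default settings:

```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd


def read_all_csvs(zip_bytes):
    """Return {member name: DataFrame} for every .csv member of the archive.

    Hypothetical helper for illustration; non-CSV members are skipped.
    """
    frames = {}
    with ZipFile(BytesIO(zip_bytes)) as archive:
        # namelist() returns the names of all members in the archive.
        for name in archive.namelist():
            if name.lower().endswith('.csv'):
                with archive.open(name) as fh:
                    frames[name] = pd.read_csv(fh)
    return frames


# Usage (needs network access), with the URL from the question:
# from urllib.request import urlopen
# url = 'http://databank.worldbank.org/data/download/WGI_csv.zip'
# with urlopen(url) as response:
#     frames = read_all_csvs(response.read())
# df = frames['WGIData.csv']
```

With the archive from the question, the returned dict would be keyed by the two member names mentioned in the comments, so you can pick the one you want after loading.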
