How to query a REST Storage API with Python

Question

I have little to no knowledge of APIs, so apologies for the vagueness of this.

I need to query data from here using the API, but I really don't have a clue where to even start. I've been learning Python for a few months and am fairly confident with the basics, but I haven't a clue about APIs and I really need to start using them. Is there anywhere I can go for a breakdown? Or can someone get me started? I'd want to query the data for Cornwall, for example...

Thanks

https://voaratinglists.blob.core.windows.net/html/rlidata.htm

Why do you want to use the REST API? There's an SDK available for Python. Use that instead of consuming the REST API directly, as the SDK is a wrapper over the REST API. – Gaurav Mantri, commented Apr 14, 2020 at 11:24
Agreed - you don't need to use the REST API; you can use simpler API libraries. learn.microsoft.com/en-us/azure/storage/blobs/… – Nick.Mc, commented Apr 14, 2020 at 11:34
Looks like you're right, but you need a storage account, which only lasts 12 months, and they want a credit card. I'm doing this for work, so that's not happening! Thanks :) – GlassShark1, commented Apr 20, 2020 at 7:48

Teemu · Accepted Answer · 2020-04-14 11:31:48Z

You can make HTTP requests to APIs with many libraries; one of the most popular and easiest to use is requests.

It works as simply as this:

import requests
response = requests.get('https://api.com/')  # placeholder URL
print(response.status_code)  # the response's HTTP status code, e.g. 200
print(response.json())  # the response's JSON body (raises an error if the body isn't JSON)
print(response.content)  # the raw bytes of the response body (in your case, the downloaded file)
print(dir(response))  # all the attributes and methods available on the response object

I went to the VOA ratings website and couldn't find any API specification for querying their database. Their technical guidance document does describe a file download API, so let's use that one.
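The files are served from Azure Blob Storage (voaratinglists.blob.core.windows.net), and Azure's "List Blobs" REST operation (a GET on `<account>/<container>?restype=container&comp=list`) returns an XML listing of a container's files; this is the kind of call the Python SDK mentioned in the comments wraps. Below is a minimal sketch, assuming the VOA `downloads` container allows anonymous listing, which I haven't verified: if it doesn't, the request fails with an HTTP error and you'd pick files from the HTML downloads page instead. The function names are hypothetical, and it uses urllib.request from the standard library so nothing needs installing.

```python
import urllib.request
import xml.etree.ElementTree as ET

ACCOUNT_URL = 'https://voaratinglists.blob.core.windows.net'


def list_blobs_url(account_url, container):
    """Build the List Blobs request URL for `container`."""
    return f'{account_url}/{container}?restype=container&comp=list'


def parse_blob_list(xml_bytes):
    """Return (name, size_in_bytes) pairs from a List Blobs XML response."""
    root = ET.fromstring(xml_bytes)
    blobs = []
    for blob in root.iter('Blob'):
        name = blob.findtext('Name')
        size = blob.findtext('Properties/Content-Length')  # None if missing
        blobs.append((name, int(size) if size else None))
    return blobs


def list_blobs(account_url, container):
    """Fetch and parse the listing (needs network access).

    Assumption: the container permits anonymous listing. Only the first page
    of results is read; long listings continue via a NextMarker value.
    """
    with urllib.request.urlopen(list_blobs_url(account_url, container)) as resp:
        return parse_blob_list(resp.read())
```

Calling `list_blobs(ACCOUNT_URL, 'downloads')` would, if listing is allowed, give you the file names and sizes to pass to a download step.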

Because the API they provide is a file API rather than a typical REST JSON API, we should save the response to a file so that we don't have to re-request it every time we run the script. (The .zip file I picked from the downloads page is 113MB, and it contains two .csv files of 544MB and 55MB.)
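With a 113MB download, reading the whole body through `response.content` holds it all in memory before anything is written. A streaming copy avoids that. With requests this is `requests.get(url, stream=True)` plus `response.iter_content(chunk_size=...)`; the sketch below is a standard-library variant using urllib.request and shutil.copyfileobj instead. The function name is hypothetical.

```python
import shutil
import urllib.request


def download_file_streaming(url, file_name, chunk_size=1024 * 1024):
    """Save the body at `url` to `file_name`, copying `chunk_size` bytes at a time."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses, so an error
    # page is never written to disk by mistake
    with urllib.request.urlopen(url) as response, open(file_name, 'wb') as f:
        # copyfileobj reads and writes in fixed-size chunks, so memory use stays
        # around chunk_size instead of the full body size
        shutil.copyfileobj(response, f, chunk_size)
    return file_name
```

It can stand in for a download step that takes `(url, file_name)`.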

import os
import sys
import time
import zipfile
import requests


def download_file(url, file_name):
    """Download file from `url` and save it as `file_name`."""
    print('Downloading')
    response = requests.get(url)
    response.raise_for_status()  # stop on HTTP errors instead of saving an error page
    print('Download finished')
    print('Saving response to file')
    with open(file_name, 'wb') as f:
        f.write(response.content)  # the whole body is held in memory at this point
    print('Response saved')


def process_row(row):
    """Do something here to read the row data."""
    time.sleep(0.5)  # half-second delay so the printed rows don't flood the terminal
    print(row)


def read_file(file_name):
    """Read file `file_name` and process it with `process_row()`."""
    print('Unzipping file')
    with zipfile.ZipFile(file_name) as unzipped:
        for csv_file in unzipped.infolist():
            print('Opening csv file')
            with unzipped.open(csv_file) as cf:
                print('Parsing csv file')
                # Iterate lazily; cf.readlines() would load the whole 544MB file into memory
                for row in cf:
                    process_row(row)


if __name__ == '__main__':
    # Some configuration: optional command-line arguments, with defaults
    file_name = sys.argv[1] if len(sys.argv) > 1 else 'ratings.csv.zip'
    url = sys.argv[2] if len(sys.argv) > 2 else 'https://voaratinglists.blob.core.windows.net/downloads/uk-englandwales-ndr-2017-listentries-compiled-epoch-0018-baseline-csv.zip'
    # Check if file already exists, if not, download it
    if os.path.exists(file_name):
        print('File already exists, skipping download')
    else:
        print('File not found, download file from API')
        download_file(url, file_name)
    # Work with the file
    read_file(file_name)

The sys.argv configuration lets you run this from the command line and pass the filename and the API address:

$ python voa.py <filename> <api>
$ python voa.py filename.csv.zip https://api.com/filename.csv.zip
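If the command-line handling grows, the standard-library argparse module handles optional positional arguments and their defaults, so there's no manual indexing into sys.argv. A minimal sketch; the `parse_args` helper and constant names are hypothetical, and the defaults mirror the ones in the script.

```python
import argparse

DEFAULT_FILE = 'ratings.csv.zip'
DEFAULT_URL = ('https://voaratinglists.blob.core.windows.net/downloads/'
               'uk-englandwales-ndr-2017-listentries-compiled-epoch-0018-baseline-csv.zip')


def parse_args(argv=None):
    """Parse `[filename] [api]`; argv=None means argparse reads sys.argv[1:]."""
    parser = argparse.ArgumentParser(description='Download and read a VOA rating list zip.')
    # nargs='?' makes each positional optional, falling back to its default
    parser.add_argument('filename', nargs='?', default=DEFAULT_FILE,
                        help='where to save the downloaded zip')
    parser.add_argument('api', nargs='?', default=DEFAULT_URL,
                        help='URL of the zip to download')
    return parser.parse_args(argv)
```

`args = parse_args()` then gives you `args.filename` and `args.api`, and `python voa.py --help` prints usage for free.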

I set the default configuration to point to the "2017 non domestic rating list entries" file from the Download section, and to save the downloaded file as ratings.csv.zip.

The script begins by checking whether the given file already exists; if it doesn't, the script downloads the file and saves it to disk. Then the script opens the zip package (decompressing in memory, without saving the unzipped contents to disk) and iterates through the csv files. Finally, it goes through every line in each file, and you can change what to do with or look for in those lines in process_row(). For example, I just made it print out the lines, but you can parse them however you wish.
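As one way to "parse them however you wish" toward the Cornwall goal, here is a sketch of splitting each raw line into fields and keeping only one area's rows. It rests on assumptions read off the sample output further down, not on the VOA specification: fields are separated by `*`, the text is UTF-8, and the second field (`0345` on every Reading row) is the billing authority code. Cornwall's code would need to be looked up in the VOA technical guidance. The helper names are hypothetical.

```python
def parse_row(row):
    """Split one raw line (bytes) from the list entries file into its fields."""
    # Assumption: the text is UTF-8; undecodable bytes become U+FFFD instead of raising
    text = row.decode('utf-8', errors='replace')
    # Strip the trailing line ending, then split on the '*' field separator
    return text.rstrip('\r\n').split('*')


def rows_for_authority(rows, authority_code):
    """Yield the parsed fields of each row whose second field equals `authority_code`."""
    for row in rows:
        fields = parse_row(row)
        # Assumption: index 1 holds the billing authority code ('0345' in the Reading rows)
        if len(fields) > 1 and fields[1] == authority_code:
            yield fields
```

Inside read_file(), the open `cf` file object can be passed straight in, e.g. `for fields in rows_for_authority(cf, code): process_row(fields)`, since iterating `cf` yields byte lines.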

There are also libraries for handling csv files, such as csv or pandas, but I couldn't get them to work properly, so I fell back to this simpler line-by-line parsing example.

Example of running the script:

$ python voa.py 
File not found, download file from API
Downloading
Download finished
Saving response to file
Response saved
Unzipping file
Opening csv file
Parsing csv file
b'1*0345**1007299058048*CW*WAREHOUSE AND PREMISES*6898341000*UNIT 1 THE MINSTER 58, PORTMAN ROAD, READING**UNIT 1 THE MINSTER 58*PORTMAN ROAD*READING***RG30 1EA***27500**18062324000**096G****21035872144*01-APR-2017**\r\n'
b'2*0345**1004697011002*CS*SHOP AND PREMISES*6931304000*GND FLR 11-12, GUN STREET, READING**GND FLR 11-12*GUN STREET*READING***RG1 2JR***18500**16134902000**249G****21063751144*01-APR-2017**\r\n'
b'3*0345**1004697011003*CO*OFFICES AND PREMISES*6931305000*BST FLR 11-12, GUN STREET, READING**BST FLR 11-12*GUN STREET*READING***RG1 2JR***3900**17143722000**203G****21027287144*01-APR-2017**\r\n'
b'4*0345**1005914008311*CO*OFFICES AND PREMISES*7008147000*83-85, LONDON STREET, READING**83-85*LONDON STREET*READING***RG1 4QA*01-APR-2017****19719807000*25-SEP-2017*203G****29775438144*25-SEP-2017**\r\n'
...
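On the csv module mentioned above: a likely reason it didn't cooperate is that these files aren't comma-separated. The sample rows use `*` between fields (the address fields themselves contain commas), there's no header row, and zipfile hands back bytes while csv.reader expects text. A sketch that wraps each zipped file in io.TextIOWrapper and sets `delimiter='*'`; the function name is hypothetical and UTF-8 is a guessed encoding.

```python
import csv
import io
import zipfile


def read_zip_rows(zip_path, encoding='utf-8'):
    """Yield each row of every file in the zip as a list of strings.

    Assumptions: the files are '*'-delimited with no header row, and `encoding`
    matches the data (UTF-8 is a guess; try 'latin-1' if decoding fails).
    """
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            with archive.open(info) as raw:
                # csv needs text, not bytes; newline='' is what the csv docs recommend
                text = io.TextIOWrapper(raw, encoding=encoding, newline='')
                yield from csv.reader(text, delimiter='*')
```

Because it's a generator, rows are decompressed and parsed one at a time, so the 544MB file is never loaded whole.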

Thanks for this, it gives me a starting point. I think the VOA API maybe isn't the best for learning this, though; I might start on something simpler. Thanks again for taking the time :)
