You can make HTTP requests to APIs with many libraries. One of the most popular, and easiest to use, is requests.
Basic usage is as simple as:
import requests
response = requests.get('https://api.com/')
print(response)  # prints e.g. <Response [200]>, which includes the HTTP status code
print(response.json())  # the response's JSON body, if it has one (raises an error otherwise)
print(response.content)  # the raw bytes of the response (in your case this is the downloaded file)
print(dir(response))  # lists all the attributes and methods available on this response object
I went to the voaratings website and couldn't find any specification for an API that could be used to query their database. Their technical guidance document does describe a file download API, so let's use that one for our case.
Because the API they provide is a file API and not a typical REST JSON API, we should save the response to a file so that we don't have to re-request it every time we run the script (the .zip file I picked from the downloads page is 113MB and contains two .csv files of 544MB and 55MB):
import os
import sys
import time
import zipfile

import requests


def download_file(url, file_name):
    """Download file from `url` and save it as `file_name`."""
    print('Downloading')
    response = requests.get(url)
    print('Download finished')
    print('Saving response to file')
    with open(file_name, 'wb') as f:
        f.write(response.content)
    print('Response saved')


def process_row(row):
    """Do something here to read the row data."""
    time.sleep(0.5)  # I put a half second delay here to prevent spam
    print(row)


def read_file(file_name):
    """Read file `file_name` and process it with `process_row()`."""
    print('Unzipping file')
    with zipfile.ZipFile(file_name) as unzipped:
        print('Opening csv file')
        for csv_file in unzipped.infolist():
            with unzipped.open(csv_file) as cf:
                print('Parsing csv file')
                # Iterate line by line instead of loading every line at once
                for row in cf:
                    process_row(row)


if __name__ == '__main__':
    # Some configuration: first argument is the file name, second is the URL
    file_name = sys.argv[1] if len(sys.argv) > 1 else 'ratings.csv.zip'
    url = sys.argv[2] if len(sys.argv) > 2 else 'https://voaratinglists.blob.core.windows.net/downloads/uk-englandwales-ndr-2017-listentries-compiled-epoch-0018-baseline-csv.zip'

    # Check if file already exists, if not, download it
    if os.path.exists(file_name):
        print('File already exists, skipping download')
    else:
        print('File not found, download file from API')
        download_file(url, file_name)

    # Work with the file
    read_file(file_name)
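Since the archive is 113MB, download_file() keeps the whole response in memory before writing it out. If that becomes a problem, here is a minimal sketch of a streamed variant, assuming requests' stream=True option and iter_content(). The names save_chunks and download_file_streamed, the 1MB chunk size and the 60 second timeout are my own choices for illustration:

```python
import requests


def save_chunks(chunks, file_name):
    """Write an iterable of byte chunks to `file_name`; return the bytes written."""
    written = 0
    with open(file_name, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


def download_file_streamed(url, file_name, chunk_size=1024 * 1024):
    """Download `url` to `file_name` one chunk at a time."""
    # stream=True defers fetching the body until we iterate over it
    with requests.get(url, stream=True, timeout=60) as response:
        # Fail on 4xx/5xx instead of saving an error page as the .zip
        response.raise_for_status()
        return save_chunks(response.iter_content(chunk_size=chunk_size), file_name)
```

It takes the same two arguments as download_file(), so it should work as a drop-in replacement in the script.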
The sys.argv configuration lets you run the script from the command line and pass in the filename and the API address:
$ python voa.py <filename> <api>
$ python voa.py filename.csv.zip https://api.com/filename.csv.zip
By default, the script points to the "2017 non domestic rating list entries" file from the Download section and saves the download as ratings.csv.zip.
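If you would rather not index into sys.argv by hand, the same two optional arguments can be sketched with the standard library's argparse. This is only an illustration: parse_args is a name I made up, and DEFAULT_URL is a placeholder taken from the usage example above, which you would replace with the real download address.

```python
import argparse

# Placeholder default (an assumption); swap in the real download URL
DEFAULT_URL = 'https://api.com/filename.csv.zip'


def parse_args(argv=None):
    """Parse two optional positional arguments: file name first, then URL."""
    parser = argparse.ArgumentParser(
        description='Download a zipped csv file and read it line by line.')
    parser.add_argument('file_name', nargs='?', default='ratings.csv.zip',
                        help='where to save (or find) the downloaded .zip')
    parser.add_argument('url', nargs='?', default=DEFAULT_URL,
                        help='address to download the .zip from')
    # argv=None makes argparse read the real command line (sys.argv[1:])
    return parser.parse_args(argv)
```

Calling parse_args() with no arguments reads the command line, and you also get a --help message for free.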
The script first checks whether the given filename already exists; if it doesn't, it downloads the file and saves it to disk. It then unzips the package (in memory, without saving the unzipped contents to disk) and iterates through the csv files. Finally, it goes through every line in each file. You can change what to do or look for in those lines in process_row(): I just made it print the lines, but you can parse them however you wish.
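As one way to fill in process_row(), here is a rough sketch that turns each raw line into a list of fields. It assumes the lines are UTF-8 encoded and that `*` is the field separator, which is what the sample output further down suggests; I haven't checked either against the technical guidance document. parse_row is a helper name I made up.

```python
def parse_row(row, encoding='utf-8'):
    """Decode one raw line from the csv file and split it into fields."""
    # Lines come out of the zip as bytes ending in \r\n; the encoding is an assumption
    text = row.decode(encoding).rstrip('\r\n')
    # The fields appear to be separated by '*' rather than commas
    return text.split('*')


def process_row(row):
    """Print only the rows that have a field equal to READING."""
    fields = parse_row(row)
    if 'READING' in fields:
        print(fields)
```

Empty fields come back as empty strings, so the position of each field stays the same from row to row.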
There are also libraries for handling csv files, such as csv or pandas, but I couldn't get them to work properly, so I fell back to this simpler line-by-line parsing example.
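For what it's worth, the csv module might work if it is given text instead of bytes and told that the delimiter is `*`. The following is an untested-against-the-real-file sketch under those assumptions: iter_zip_rows is a made-up name, and the UTF-8 encoding is a guess.

```python
import csv
import io
import zipfile


def iter_zip_rows(file_name, encoding='utf-8', delimiter='*'):
    """Yield each row of every file inside the zip as a list of strings."""
    with zipfile.ZipFile(file_name) as archive:
        for member in archive.infolist():
            with archive.open(member) as raw:
                # csv.reader needs text, but zipfile hands out bytes, so wrap the stream
                text = io.TextIOWrapper(raw, encoding=encoding, newline='')
                yield from csv.reader(text, delimiter=delimiter)
```

You could then loop with `for fields in iter_zip_rows('ratings.csv.zip'):` and work with each row as a list. If the data contains stray double quotes, the reader's quoting options may also need adjusting.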
Example of running the script:
$ python voa.py
File not found, download file from API
Downloading
Download finished
Saving response to file
Response saved
Unzipping file
Opening csv file
Parsing csv file
b'1*0345**1007299058048*CW*WAREHOUSE AND PREMISES*6898341000*UNIT 1 THE MINSTER 58, PORTMAN ROAD, READING**UNIT 1 THE MINSTER 58*PORTMAN ROAD*READING***RG30 1EA***27500**18062324000**096G****21035872144*01-APR-2017**\r\n'
b'2*0345**1004697011002*CS*SHOP AND PREMISES*6931304000*GND FLR 11-12, GUN STREET, READING**GND FLR 11-12*GUN STREET*READING***RG1 2JR***18500**16134902000**249G****21063751144*01-APR-2017**\r\n'
b'3*0345**1004697011003*CO*OFFICES AND PREMISES*6931305000*BST FLR 11-12, GUN STREET, READING**BST FLR 11-12*GUN STREET*READING***RG1 2JR***3900**17143722000**203G****21027287144*01-APR-2017**\r\n'
b'4*0345**1005914008311*CO*OFFICES AND PREMISES*7008147000*83-85, LONDON STREET, READING**83-85*LONDON STREET*READING***RG1 4QA*01-APR-2017****19719807000*25-SEP-2017*203G****29775438144*25-SEP-2017**\r\n'
...