Heyo, trying to download images from a site. I've set up a basic filter which works fine, but my aim is to automate this, and one of the steps is repeatedly re-downloading the site. I'm using wget, which works fine from the terminal, but it seems os.system() in Python spawns its own shell (couldn't think of the name earlier) which doesn't know about things I've installed, such as wget. I've tried gnome-terminal, but I might be doing something wrong :/ Any other solutions would be greatly appreciated, thanks!
1 Answer
Why are you trying to download the site by calling wget from the terminal? I think a better idea is to download the site the Python way:
import sys
import urllib.error
import urllib.request

def get_raw_webpage(url):
    """
    Download a web URL as raw bytes.
    """
    try:
        req = urllib.request.Request(url)
        response = urllib.request.urlopen(req)
        return response.read()
    except urllib.error.HTTPError as e:
        print('HTTPError:', e.code, file=sys.stderr)
        return None
    except urllib.error.URLError as e:
        print('URLError:', e.args, file=sys.stderr)
        return None
    except ValueError as e:
        print('Invalid URL:', e.args, file=sys.stderr)
        return None

def get_webpage(url):
    """
    Get a webpage as raw bytes, then
    decode it to readable text.
    """
    data = get_raw_webpage(url)
    if data is None:
        return None
    return data.decode('utf-8')
You can also pass a link to an image to the get_raw_webpage function to download it!
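A quick usage sketch (the data: URL is a stand-in so the example runs without a network; with the functions above you would pass a normal address such as "http://example.com"):

```python
import urllib.request

# Condensed version of get_webpage above: fetch raw bytes, then decode to text.
# The data: URL is a placeholder standing in for a real web address.
data = urllib.request.urlopen("data:,Hello%20there").read()
page = data.decode("utf-8")
print(page)  # -> Hello there
```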
3 Comments
Andy
Thanks for your answer, the code works great. You mentioned being able to use the get_raw_webpage function to download an image? Could you give some more detail on that? Thanks!
George TG
Yes, get_raw_webpage actually downloads whatever your link points to as raw byte data, so if you give it a link to an image, sound, or whatever file, and then save that data to a file in binary mode, you have thereby downloaded the image/sound/whatever.
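For example, a minimal sketch of that save step (the data: URL here is a placeholder standing in for a real image link, so the example runs offline):

```python
import urllib.request

# Placeholder URL; swap in a real image link such as "https://example.com/pic.png".
url = "data:,hello"

# Same idea as get_raw_webpage above: read the response as raw bytes.
raw = urllib.request.urlopen(url).read()

# "wb" writes the bytes untouched, which is what image/sound files need.
with open("downloaded.bin", "wb") as f:
    f.write(raw)
```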
Andy
Thanks heaps! Played around with it and got it to work perfectly.
Try running "which wget". And it would help if you posted your code.