4

I want to get the image dimensions as seen from a viewer on a website.

I'm using beautiful soup and I get image links like this:

links = soup.findAll('img', {"src":True})

The way I get the image dimensions is by using:

link.has_key('height')
height = link['height']

and similarly with width as well. However, some links only have one of these attributes. I tried PIL but that gives the actual image size if downloaded.

Is there any other way of finding the image dimensions as seen on a website?

1 Answer 1

17

Your main issue is you are searching the html source for references to height and width. In most cases (when things are done well), images don't have height and width specified in html, in which case they are rendered at the height and width of the image file itself.

To get the height and width of the image file, you need to actually query for and load that file, then check the height and width using image processing. If this is what you want, let me know, and I'll help you work through that process.

import urllib, cStringIO
from PIL import Image

# given an object called 'link'

SITE_URL = "http://www.targetsite.com"
URL = SITE_URL + link['src']
# Here's a sample url that works for demo purposes
# URL = "http://therealtomrose.therealrosefamily.com/wp-content/uploads/2012/08/headshot_tight.png"
file = cStringIO.StringIO(urllib.urlopen(URL).read())
im=Image.open(file)
width, height = im.size
if link.has_key('height'):
    height = link['height']  # set height if site modifies it
if link.has_key('width'):
    width = link['width']  # set width if site modifies it

Requirements: This method requires the PIL library for image processing.

# from command line in a virtual environment
pip install PIL
Sign up to request clarification or add additional context in comments.

6 Comments

I'm assuming this will give me the actual image dimensions. I'm actually looking for the image given from how it looks like on the site, do you know how I could do this? thanks!
Sure, I just added handling for sites with manually modified dimensions. I've optimized this script for sites where most image dimensions aren't specified, which is how most sites are.
thanks! This is very much what i did with my code. However, I ran into the issue with a modified image, but only having one of the keys present. Do you know another way to work around this? thanks!
Will this download the image?
In the part where you check if the HTML specifies image height or width, don't you need to rescale the other dimension if that happens? For example, if I have a 600x600 image and put in HTML width=300, it'll appear as 300x300.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.