How can I parse the following markup with Python's BeautifulSoup? I need to get each image together with its corresponding width and height properties (if they exist).
The markup below "means there are 3 images on this page, the first image is 300x300, the middle one has unspecified dimensions, and the last one is 1000px tall" (as explained here):
<meta property="og:image" content="http://example.com/rock.jpg" />
<meta property="og:image:width" content="300" />
<meta property="og:image:height" content="300" />
<meta property="og:image" content="http://example.com/rock2.jpg" />
<meta property="og:image" content="http://example.com/rock3.jpg" />
<meta property="og:image:height" content="1000" />
So far I have the following code, but it only returns the first set of dimensions:
images = []
img_list = soup.find_all('meta', {"property": 'og:image'})
for og_image in img_list:
    # Skip og:image tags that carry no URL
    if not og_image.get('content'):
        continue
    image = {'url': og_image['content']}
    width = soup.find('meta', {"property": 'og:image:width'})
    if width:
        image['width'] = width['content']
    height = soup.find('meta', {"property": 'og:image:height'})
    if height:
        image['height'] = height['content']
    images.append(image)
Thanks!
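One possible approach, offered as a minimal sketch rather than a definitive answer: `find()` always searches the whole document and returns the first match, which would explain why every image gets the first set of dimensions. Since the width and height tags apply to the `og:image` tag that precedes them, it may be enough to walk all `og:image*` tags in document order and attach each dimension to the most recent image. The function name `parse_og_images` and the `html` sample string are hypothetical, introduced only for illustration, and the sketch assumes the built-in `html.parser` backend.

```python
import re
from bs4 import BeautifulSoup

# Sample markup copied from the question (assumed to be the whole input here).
html = """
<meta property="og:image" content="http://example.com/rock.jpg" />
<meta property="og:image:width" content="300" />
<meta property="og:image:height" content="300" />
<meta property="og:image" content="http://example.com/rock2.jpg" />
<meta property="og:image" content="http://example.com/rock3.jpg" />
<meta property="og:image:height" content="1000" />
"""


def parse_og_images(soup):
    """Hypothetical helper: group og:image tags with the dimensions that follow them."""
    images = []
    current = None  # the image that later width/height tags belong to
    # find_all returns the matching tags in document order.
    for tag in soup.find_all("meta", attrs={"property": re.compile(r"^og:image")}):
        prop = tag.get("property")
        content = tag.get("content")
        if prop == "og:image":
            if content:
                current = {"url": content}
                images.append(current)
            else:
                # An image without a URL is skipped, along with its dimensions.
                current = None
        elif prop in ("og:image:width", "og:image:height"):
            if current is not None and content:
                # "og:image:width" -> "width", "og:image:height" -> "height"
                current[prop.rsplit(":", 1)[1]] = content
    return images


soup = BeautifulSoup(html, "html.parser")
images = parse_og_images(soup)
print(images)
```

With the sample markup, the first image should come back with both dimensions, the second with only a URL, and the third with only a height. The values stay as strings, as in the original attempt; converting them with `int()` is left to the caller.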