I have HTML pages that contain image tags with two 'src' attributes, and I want to use BeautifulSoup to extract the first 'src' rather than the second.

For example, when I use BeautifulSoup as follows:

from bs4 import BeautifulSoup

html_doc = '<img class="lazy" src="https://hips.hearstapps.com/hmg-prod.s3.amazonaws.com/images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg?crop=1.00xw:0.669xh;0,0.190xh&resize=980:*"                        src="https://www.mdf.qa/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/b/l/black_2.jpg"/>'

soup = BeautifulSoup(html_doc, 'html.parser')
bs_images = soup.find_all('img')
for bs_image in bs_images:
   attrs = bs_image.attrs
   image_path = attrs['src']

The path I get is the second src, "https://www.mdf.qa/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/b/l/black_2.jpg", but I need the first src, "https://hips.hearstapps.com/hmg-prod.s3.amazonaws.com/images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg?crop=1.00xw:0.669xh;0,0.190xh&resize=980:*".

1 Answer

It seems that BeautifulSoup overwrites the first src with the second, so the first value is not stored anywhere. I would suggest using a regex for this problem.

import re

html_doc = '<img class="lazy" src="https://hips.hearstapps.com/hmg-prod.s3.amazonaws.com/images/dog-puppy-on-garden-royalty-free-image-1586966191.jpg?crop=1.00xw:0.669xh;0,0.190xh&resize=980:*"                        src="https://www.mdf.qa/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/b/l/black_2.jpg"/>'

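# Find every <img ...> tag, then take the first src="..." inside each one.
bs_images = re.findall(r'<img[^<>]+>', html_doc)
for bs_image in bs_images:
    match = re.search(r'src="([^"]+)"', bs_image)
    if match:  # skip <img> tags that have no src attribute
        image_path = match.group(1)
        print(image_path)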

re.search returns only the first match for the src pattern, whereas re.findall would return all of them.
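
If you would rather not rely on a regex, here is a minimal sketch that uses only the standard library's html.parser module. Its handle_starttag callback receives the attributes as a list of (name, value) pairs in source order, so duplicate attributes are still visible and the first src can be picked out. The class name FirstSrcCollector and the example.com URLs are placeholders of mine, not anything from the question.

```python
from html.parser import HTMLParser


class FirstSrcCollector(HTMLParser):
    """Collect the first src value of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.first_srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs in source order,
        # so a repeated attribute appears here more than once.
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src":
                self.first_srcs.append(value)
                break  # keep the first src, ignore later duplicates


# Shortened placeholder markup, not the URLs from the question.
html_doc = (
    '<img class="lazy" src="https://example.com/first.jpg" '
    'src="https://example.com/second.jpg"/>'
)

parser = FirstSrcCollector()
parser.feed(html_doc)
parser.close()
print(parser.first_srcs)
```

As far as I know, newer Beautiful Soup releases (4.9.1 and later) also accept on_duplicate_attribute='ignore' in the BeautifulSoup constructor when the 'html.parser' builder is used, which keeps the first value instead of the last. Check the documentation for your installed version before relying on it.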
