
I am trying to use the BeautifulSoup library in Python to extract the jpg image names from an HTML page. In the page's HTML, every srcset attribute is followed by a jpg file URL, and I want to extract all of these jpg files. However, whenever I run the following code it prints None, even though a jpg file name always follows srcset. For example, srcset="https://img.shopstyle-cdn.com/pim/31/94/3194ec1ca5e3a56cb83f708533b9084d_best.jpg" appears in the HTML.

from urllib.request import urlopen  # Python 3 location of urllib2.urlopen
from bs4 import BeautifulSoup

html = urlopen("https://www.shopstyle.com/p/prada-notch-lapel-fitted-blazer/645742403").read()
soup = BeautifulSoup(html, 'html.parser')

print(soup.find(attrs={"img": "srcset"}))  # prints None
  • Can you post the entire tag in which the image appears? Commented Oct 5, 2017 at 9:52
  • Hope this helps: soup.find('img', attrs={'srcset': True}) Commented Oct 5, 2017 at 9:52

3 Answers


To find all URLs in srcset attributes, you can do this:

from urllib.request import urlopen  # Python 3 location of urllib2.urlopen
from bs4 import BeautifulSoup

html = urlopen("https://www.shopstyle.com/p/prada-notch-lapel-fitted-blazer/645742403").read()
soup = BeautifulSoup(html, 'html.parser')

# Every <img> tag that has a srcset attribute, whatever its value
for el in soup.find_all('img', attrs={'srcset': True}):
    print(el['srcset'])

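Your query returns None because the attrs argument expects a dictionary that maps attribute names to filters. {"img": "srcset"} asks for any tag with an attribute named img whose value is "srcset", and no tag on the page has such an attribute. To match <img> tags that have a srcset attribute, pass the tag name as the first argument and use True as the filter: {'srcset': True}. See the explanation in the bs4 docs.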
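The difference is easy to see on a small hand-written document. The sketch below makes no claim about the real ShopStyle markup: the HTML string and the example.com URLs are made up for illustration. It runs the original query and the corrected one side by side.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in document; the real page is not fetched here.
html = '''
<div>
  <img src="logo.png" alt="logo">
  <img srcset="https://img.example.com/a_best.jpg">
  <img srcset="https://img.example.com/b_best.jpg">
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# Looks for any tag with an attribute literally named "img" whose value is "srcset".
wrong = soup.find(attrs={"img": "srcset"})

# Looks for <img> tags that carry a srcset attribute, whatever its value.
right = soup.find_all('img', attrs={'srcset': True})

# Keyword form of the same filter.
same = soup.find_all('img', srcset=True)

print(wrong)                             # None
print([tag['srcset'] for tag in right])  # ['https://img.example.com/a_best.jpg', 'https://img.example.com/b_best.jpg']
```

The keyword form srcset=True and attrs={'srcset': True} select the same tags; attrs is mainly needed for attribute names that are not valid Python identifiers, such as data-src.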


Try this:

>>> soup.find('img')['srcset']
'https://img.shopstyle-cdn.com/pim/31/94/3194ec1ca5e3a56cb83f708533b9084d_best.jpg'
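soup.find('img') returns only the first <img> on the page, and indexing it with ['srcset'] raises a KeyError when that tag has no srcset attribute. A minimal sketch on hypothetical markup (the HTML and URL are made up) showing the pitfall and a safer lookup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <img> has no srcset attribute.
html = '''
<img src="logo.png">
<img srcset="https://img.example.com/a_best.jpg">
'''
soup = BeautifulSoup(html, 'html.parser')

first_img = soup.find('img')
# first_img['srcset'] would raise KeyError here; .get() returns None instead.
print(first_img.get('srcset'))  # None

# Restrict the search to <img> tags that actually have a srcset attribute.
first_with_srcset = soup.find('img', srcset=True)
print(first_with_srcset['srcset'])  # https://img.example.com/a_best.jpg
```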


Since you want to extract all the jpg files, use find_all() instead of find():

from bs4 import BeautifulSoup
import requests

html_doc = requests.get("https://www.shopstyle.com/p/prada-notch-lapel-fitted-blazer/645742403")
soup = BeautifulSoup(html_doc.content, 'html.parser')
imgs = [i.get('srcset') for i in soup.find_all('img', srcset=True)]

print(imgs)

The output:

['https://img.shopstyle-cdn.com/pim/31/94/3194ec1ca5e3a56cb83f708533b9084d_best.jpg', 'https://img.shopstyle-cdn.com/pim/16/c3/16c3e46d3547d6404ba29b61b8f229fd_best.jpg', 'https://img.shopstyle-cdn.com/pim/65/e6/65e6d0e3c0160f0aca361934b999f0c9_best.jpg', 'https://img.shopstyle-cdn.com/sim/31/94/3194ec1ca5e3a56cb83f708533b9084d/prada-notch-lapel-fitted-blazer.jpg', 'https://img.shopstyle-cdn.com/sim/16/c3/16c3e46d3547d6404ba29b61b8f229fd/prada-notch-lapel-fitted-blazer.jpg', 'https://img.shopstyle-cdn.com/sim/65/e6/65e6d0e3c0160f0aca361934b999f0c9/prada-notch-lapel-fitted-blazer.jpg', 'https://img.shopstyle-cdn.com/pim/73/76/737689fa284d6640f7619e5f2f3558a5_xlarge.jpg', 'https://img.shopstyle-cdn.com/pim/2c/b0/2cb0acb147bd20df78bc482d66d7218b_xlarge.jpg', 'https://img.shopstyle-cdn.com/pim/5c/20/5c20824543749df684f3264c5e976e8c_xlarge.jpg', 'https://img.shopstyle-cdn.com/pim/48/b8/48b81f60d61e5c23cdfa343940e43ce9_xlarge.jpg', 'https://img.shopstyle-cdn.com/pim/ff/08/ff081818581b0363d4c0ec02c2cba5d4_xlarge.jpg', 'https://img.shopstyle-cdn.com/pim/86/0a/860ae7abdde0bf40046d53668abbe126_xlarge.jpg', 'https://img.shopstyle-cdn.com/pim/2f/5c/2f5c78d017052b14fd2db0d886a2a326_xlarge.jpg', 'https://img.shopstyle-cdn.com/pim/49/d5/49d5de5b62e6ddc0864afee987dd5e67_xlarge.jpg', 'https://img.shopstyle-cdn.com/pim/50/04/5004bf25e97ac0e4564d8a219a3b34b4_xlarge.jpg', 'https://img.shopstyle-cdn.com/pim/a8/76/a876ac6696e140f34e4cf82b5dbcaadf_xlarge.jpg']
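A srcset value is not always a single URL: the HTML standard allows a comma-separated list of candidates, each optionally followed by a width or density descriptor such as 800w or 2x. Since the question asks for the jpg file names rather than full URLs, here is a standard-library-only sketch that splits such values and keeps the last path segment of each .jpg URL. The helper names srcset_urls and jpg_names are hypothetical, the example.com value is made up, and the comma split is a simplification that assumes no URL contains a comma. It can be fed the imgs list produced by the code above.

```python
import posixpath
from urllib.parse import urlparse


def srcset_urls(srcset):
    """Split a srcset value into its candidate URLs.

    Simplified (hypothetical helper): assumes no URL contains a comma,
    which holds for the ShopStyle URLs above but not for every URL.
    """
    urls = []
    for candidate in srcset.split(','):
        parts = candidate.split()
        if parts:                  # skip empty pieces, e.g. from a trailing comma
            urls.append(parts[0])  # drop the "2x" / "800w" descriptor, if any
    return urls


def jpg_names(srcset_values):
    """Return the .jpg file names (last URL path segment) found in srcset values."""
    names = []
    for value in srcset_values:
        for url in srcset_urls(value):
            name = posixpath.basename(urlparse(url).path)
            if name.lower().endswith('.jpg'):
                names.append(name)
    return names


imgs = [
    'https://img.shopstyle-cdn.com/pim/31/94/3194ec1ca5e3a56cb83f708533b9084d_best.jpg',
    # Hypothetical multi-candidate value, to show descriptor handling.
    'https://img.example.com/small.jpg 1x, https://img.example.com/large.jpg 2x',
]
print(jpg_names(imgs))
```

Running it prints ['3194ec1ca5e3a56cb83f708533b9084d_best.jpg', 'small.jpg', 'large.jpg']. Parsing the URL with urlparse before taking the basename keeps any query string (such as ?width=200) out of the file name.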
