How to extract an image link with Python

Question

I am trying to get a link from a webpage's HTML in Python. The problem is that when I open the Chrome inspect tool, I can see that the link looks like this:

<meta property="og:image" content="https://dkstatics-public.digikala.com/digikala-products/09edc9a95239bbd46cbf5d2f344fc45620166666_1620816530.jpg?x-oss-process=image/resize,m_lfit,h_350,w_350/quality,q_60">

But, when I get the HTML using this code, this line doesn't exist in the HTML.

import requests
from bs4 import BeautifulSoup
import pandas as pd
from urllib.request import Request, urlopen
import re
 
url = 'https://www.digikala.com/product/dkp-729879/%D8%B4%D9%88%D8%B1%D8%AA-%D9%85%D8%B1%D8%AF%D8%A7%D9%86%D9%87-%D8%A2%D8%B1%DB%8C%D8%A7%D9%86-%D9%86%D8%AE-%D8%A8%D8%A7%D9%81-%DA%A9%D8%AF-1312-%D9%85%D8%AC%D9%85%D9%88%D8%B9%D9%87-3-%D8%B9%D8%AF%D8%AF%DB%8C/'

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}
req = Request(url, headers=headers)
# Decode the response bytes; calling str() on bytes would produce a "b'...'" literal.
webpage = urlopen(req).read().decode("utf-8")
print(webpage)

The HTML that I get from Python seems to be a lot shorter and doesn't contain this element at all. What I want to know is: how can I get that element through Python?

Andrej Kesely · Accepted Answer · 2022-09-21 12:35:20Z

The tags you see are created programmatically by JavaScript, which loads the data from a different URL. This example uses the requests and json modules to get the data from that URL directly:

import re
import json
import requests

url = "https://www.digikala.com/product/dkp-729879/%D8%B4%D9%88%D8%B1%D8%AA-%D9%85%D8%B1%D8%AF%D8%A7%D9%86%D9%87-%D8%A2%D8%B1%DB%8C%D8%A7%D9%86-%D9%86%D8%AE-%D8%A8%D8%A7%D9%81-%DA%A9%D8%AF-1312-%D9%85%D8%AC%D9%85%D9%88%D8%B9%D9%87-3-%D8%B9%D8%AF%D8%AF%DB%8C/"

id_ = re.search(r"-(\d+)/", url).group(1)
product_url = f"https://api.digikala.com/v1/product/{id_}/"
data = requests.get(product_url).json()

# uncomment to print all data:
# print(json.dumps(data, indent=4))

print(data["data"]["seo"]["open_graph"]["image"])

Prints:

https://dkstatics-public.digikala.com/digikala-products/113697641.jpg?x-oss-process=image/resize,m_lfit,h_350,w_350/quality,q_60
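The two steps in the snippet above (pulling the product ID out of the URL, then walking the nested JSON) can fail if the URL has a different shape or the API omits a key. The following is a minimal sketch of more defensive helpers, not part of the original answer. The function names `extract_product_id` and `og_image_from_payload` are hypothetical, and the stricter `/dkp-<digits>/` pattern is an assumption based on the one URL shown in the question.

```python
import re
from typing import Any, Optional


def extract_product_id(url: str) -> Optional[str]:
    """Return the digits from a '/dkp-<digits>/' path segment, or None if absent.

    Assumption: product URLs follow the shape of the URL in the question.
    """
    match = re.search(r"/dkp-(\d+)/", url)
    return match.group(1) if match else None


def og_image_from_payload(payload: Any) -> Optional[str]:
    """Walk data -> seo -> open_graph -> image, returning None if a key is missing."""
    node = payload
    for key in ("data", "seo", "open_graph", "image"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node if isinstance(node, str) else None
```

With these in place, `og_image_from_payload(requests.get(product_url).json())` returns `None` instead of raising `KeyError` when the response lacks the expected structure.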

Franz Gastring · Accepted Answer · 2022-09-21 12:34:03Z


The page is rendered with JavaScript, so the HTML returned by the request has to be evaluated the way a browser would evaluate it before the tag appears.

Try running Splash in Docker and sending the request through it, which should work, or use another tool that works similarly.

Link to Splash
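As a rough sketch of the Splash approach, the code below builds a request to Splash's `render.html` endpoint and then reads the `og:image` meta tag out of the rendered HTML using only the standard library. It assumes a Splash container is already running locally on port 8050 (for example, started with `docker run -p 8050:8050 scrapinghub/splash`). The helper names and the two-second `wait` value are illustrative choices, and whether the site renders successfully under Splash has not been verified here.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Splash is listening locally on port 8050.
SPLASH_ENDPOINT = "http://localhost:8050/render.html"


def build_splash_url(page_url: str, wait: float = 2.0) -> str:
    """Build a render.html request that loads page_url and waits for scripts to run."""
    return f"{SPLASH_ENDPOINT}?{urlencode({'url': page_url, 'wait': wait})}"


class _OgImageParser(HTMLParser):
    """Remember the content of the first <meta property="og:image"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.image: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.image is not None:
            return
        attributes = dict(attrs)
        if attributes.get("property") == "og:image":
            self.image = attributes.get("content")


def extract_og_image(html: str) -> Optional[str]:
    """Return the og:image URL found in html, or None if the tag is missing."""
    parser = _OgImageParser()
    parser.feed(html)
    return parser.image


def fetch_rendered_og_image(page_url: str) -> Optional[str]:
    """Ask Splash to render page_url, then extract og:image from the result."""
    with urlopen(build_splash_url(page_url), timeout=60) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_og_image(html)
```

Calling `fetch_rendered_og_image(url)` with the product URL from the question would then return the image link if the rendered page contains the tag, and `None` otherwise. `extract_og_image` can also be run on the raw HTML to confirm that the tag really is missing before rendering.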
