
I am trying to get a link from a webpage's HTML in Python. The problem is that when I open the Chrome inspect tool, I can see that the link looks like this:

<meta property="og:image" content="https://dkstatics-public.digikala.com/digikala-products/09edc9a95239bbd46cbf5d2f344fc45620166666_1620816530.jpg?x-oss-process=image/resize,m_lfit,h_350,w_350/quality,q_60">

But when I fetch the HTML with this code, that line doesn't exist in it:

from urllib.request import Request, urlopen

url = 'https://www.digikala.com/product/dkp-729879/%D8%B4%D9%88%D8%B1%D8%AA-%D9%85%D8%B1%D8%AF%D8%A7%D9%86%D9%87-%D8%A2%D8%B1%DB%8C%D8%A7%D9%86-%D9%86%D8%AE-%D8%A8%D8%A7%D9%81-%DA%A9%D8%AF-1312-%D9%85%D8%AC%D9%85%D9%88%D8%B9%D9%87-3-%D8%B9%D8%AF%D8%AF%DB%8C/'

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}
req = Request(url, headers=headers)
# read() returns bytes; decode them instead of wrapping in str(), which would give "b'...'"
webpage = urlopen(req).read().decode('utf-8')
print(webpage)

The HTML I get from Python is much shorter and doesn't contain this element at all. How can I get that element through Python?
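Pulling the og:image value out of HTML is straightforward once the HTML actually contains the tag; the difficulty here is only that the static response lacks it. A minimal sketch using the standard library's html.parser (the names OgImageParser and find_og_image are hypothetical, and the example.com URL is a placeholder):

```python
from html.parser import HTMLParser
from typing import Optional


class OgImageParser(HTMLParser):
    """Records the content of the first <meta property="og:image"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.image: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; turn it into a dict for lookup
        if tag == "meta" and self.image is None:
            attributes = dict(attrs)
            if attributes.get("property") == "og:image":
                self.image = attributes.get("content")


def find_og_image(html: str) -> Optional[str]:
    """Return the og:image URL, or None if this HTML has no such tag."""
    parser = OgImageParser()
    parser.feed(html)
    parser.close()
    return parser.image


if __name__ == "__main__":
    sample = '<meta property="og:image" content="https://example.com/product.jpg">'
    print(find_og_image(sample))           # the placeholder URL
    print(find_og_image("<head></head>"))  # None: tag absent
```

Running find_og_image on the string printed by the code above would be expected to return None for this page, which points to the tag being added later by JavaScript rather than a parsing problem.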

2 Answers


The tags you see are created by JavaScript, which loads the data from a different URL (an API endpoint). This example uses the requests module to fetch that JSON directly:

import re
import json
import requests

url = "https://www.digikala.com/product/dkp-729879/%D8%B4%D9%88%D8%B1%D8%AA-%D9%85%D8%B1%D8%AF%D8%A7%D9%86%D9%87-%D8%A2%D8%B1%DB%8C%D8%A7%D9%86-%D9%86%D8%AE-%D8%A8%D8%A7%D9%81-%DA%A9%D8%AF-1312-%D9%85%D8%AC%D9%85%D9%88%D8%B9%D9%87-3-%D8%B9%D8%AF%D8%AF%DB%8C/"

id_ = re.search(r"-(\d+)/", url).group(1)
product_url = f"https://api.digikala.com/v1/product/{id_}/"
data = requests.get(product_url).json()

# uncomment to print all data:
# print(json.dumps(data, indent=4))

print(data["data"]["seo"]["open_graph"]["image"])

Prints:

https://dkstatics-public.digikala.com/digikala-products/113697641.jpg?x-oss-process=image/resize,m_lfit,h_350,w_350/quality,q_60
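The same lookup can be sketched as small reusable functions, assuming the API endpoint and JSON layout keep matching what this answer shows (neither is documented here, so either could change). This sketch swaps requests for the standard library's urllib.request and json so it has no third-party dependency, and it returns None instead of raising KeyError when a key is missing. The function names are hypothetical:

```python
import json
import re
from typing import Any, Optional
from urllib.request import Request, urlopen

# Endpoint taken from the answer above; it is not a documented, stable API.
API_TEMPLATE = "https://api.digikala.com/v1/product/{id}/"


def product_api_url(page_url: str) -> str:
    """Build the API URL from the numeric id in a .../dkp-<id>/... product URL."""
    # Assumption: product pages contain "/dkp-<digits>/", as in the question's URL.
    match = re.search(r"/dkp-(\d+)/", page_url)
    if match is None:
        raise ValueError(f"no product id found in {page_url!r}")
    return API_TEMPLATE.format(id=match.group(1))


def og_image(data: Any) -> Optional[str]:
    """Follow data -> seo -> open_graph -> image; return None if any level is missing."""
    node = data
    for key in ("data", "seo", "open_graph", "image"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def fetch_og_image(page_url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch the product JSON and return its Open Graph image URL."""
    # Assumption: send a browser-like User-Agent in case the default one is rejected.
    req = Request(product_api_url(page_url), headers={"User-Agent": "Mozilla/5.0"})
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urlopen(req, timeout=timeout) as resp:
        return og_image(json.load(resp))
```

Calling fetch_og_image(url) with the product URL from the question would return an image URL like the one printed above, provided the API still responds the same way.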

The page is rendered with JavaScript, so the HTML returned by the request has to be executed as JavaScript before the tag appears, just like a browser does!

Try running Splash in a Docker container and sending the request through it; that should work. Alternatively, use another tool that renders JavaScript in a similar way.

Link to Splash
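A minimal sketch of calling Splash's render.html HTTP endpoint from Python, assuming Splash is running locally on its default port 8050 (for example started with docker run -p 8050:8050 scrapinghub/splash). The 2-second wait is a guess at how long the page's scripts need, and the function names are hypothetical:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local Splash container on its default port, e.g. started with
#   docker run -p 8050:8050 scrapinghub/splash
SPLASH_URL = "http://localhost:8050"


def splash_render_url(target: str, splash: str = SPLASH_URL, wait: float = 2.0) -> str:
    """Build a render.html request asking Splash to load `target` and wait `wait` seconds."""
    query = urlencode({"url": target, "wait": wait})
    return f"{splash}/render.html?{query}"


def render(target: str, wait: float = 2.0, timeout: float = 60.0) -> str:
    """Return the HTML of `target` after Splash has executed its JavaScript."""
    with urlopen(splash_render_url(target, wait=wait), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With Splash running, 'og:image' in render(url) would be expected to be True for the product page, whereas the same check on the plain urlopen response is False; the rendered HTML can then be parsed with BeautifulSoup or any other HTML parser.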
