I need to scrape Pinterest, but instead of links to the pictures, I get unintelligible links that don't work.

import requests
from bs4 import BeautifulSoup

def parse():
    url = 'https://www.pinterest.ie/'
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')
    print(soup.find_all('a'))

parse()
    Have you LOOKED at the source code for that page, using View Source or by printing out r.text? The HTML you fetch contains little more than ads. The page is built dynamically with Javascript. You'd need to use something like Selenium to get a real browser involved. Commented Sep 3, 2022 at 6:19

1 Answer

The site builds its content with JavaScript, which never runs when you fetch the page with requests; BeautifulSoup only parses whatever HTML it is given. A workaround has been suggested here: use Selenium to open the page in an actual browser (so the JavaScript runs), and then use BeautifulSoup to parse the rendered HTML.

Something like this should work:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 style: point Service at your ChromeDriver binary (adjust the path).
driver = webdriver.Chrome(service=Service("../chromedriver.exe"))

def parse():
    url = 'https://www.pinterest.ie/'
    driver.get(url)  # a real browser loads the page and runs its JavaScript
    html = driver.page_source  # HTML as rendered by the browser
    soup = BeautifulSoup(html, 'lxml')
    print(soup.find_all('a'))

parse()
driver.quit()  # close the browser when finished

You will, of course, need some idea of how to use Selenium. The official docs should help.
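
Since the goal is links to pictures rather than every anchor, it may help to collect the `src` of each `img` tag once you have the rendered HTML. The following is only a rough sketch: it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages, and the names `ImageLinkParser`, `extract_image_urls` and `sample_html` are made up for illustration. The sample markup is hypothetical and stands in for `driver.page_source`; Pinterest's real markup may look quite different.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect absolute URLs from the src attribute of every <img> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the image URLs found in an HTML string, in document order."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.image_urls


# Hypothetical sample markup, standing in for driver.page_source.
sample_html = """
<a href="/pin/1/"><img src="https://example.com/a.jpg"></a>
<a href="/pin/2/"><img src="/images/b.png"></a>
<img alt="no source">
"""

print(extract_image_urls(sample_html, "https://www.pinterest.ie/"))
```

With BeautifulSoup the same idea would be roughly `[img.get('src') for img in soup.find_all('img')]`, applied to the soup built from the Selenium page source.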
