Newest 'web-scraping' Questions - Page 3

0 votes

2 answers

98 views

nodriver crashes with infinite recursion in headless mode when running browser.get()

Here is the code I'm trying to run: import nodriver as uc async def main(): browser = await uc.start(headless=True) page = await browser.get('https://www.nowsecure.nl') if __name__ == '...

grasswistle

505

asked May 15 at 3:05

-3 votes

1 answer

94 views

Issue in scraping data

I have an issue scraping school data. I need each school's email and website URL. I tried a lot, but it's returning empty results. What's the best way to do this? Here is the code: from selenium import ...

Omprakash S

1

asked May 14 at 9:50

1 vote

1 answer

47 views

Request breaks down with field change (scraping)

I am trying to download some data (in an efficient way); however, I encountered an unexpected problem. Here is the code that works just fine: import requests import os from bs4 import BeautifulSoup ...

IPII

72

asked May 12 at 19:15

2 votes

1 answer

77 views

Scrapling Save PDF File Locally

I have to scrape a website which returns a static PDF file. The only Python package that can access the document successfully is scrapling. However, the PDF file returned is not saved correctly in my ...

pythonuser43343

31

asked May 11 at 19:53

0 votes

0 answers

46 views

Selenium + ChromeDriver crash when scraping Google Maps entries on macOS

I’m building a Telegram bot in Python that scrapes points of interest (POIs) from Google Maps using Selenium. The overall flow works, but as soon as I try to scroll through and click on the .hfpxzc ...

Vazgen Hakobjanyan

1

asked May 10 at 22:02

0 votes

0 answers

38 views

Selenium wire failing to intercept request randomly

I have a scraping script that navigates to a website that contains the widget I want to scrape. The driver clicks a button that should trigger an XHR request and has never failed to do so in my manual ...

Cole Kuhlers

1

asked May 6 at 19:20

1 vote

0 answers

900 views

Why does yt-dlp work locally but fail with a bot error in production (live environment) even with cookies?

I'm trying to download YouTube videos using the yt-dlp Python library in a deployed web application. The same script works perfectly on my local machine, but in the production (live) environment, it ...

Jeba Angelline Mary M SNSIHUB

11

asked May 6 at 13:04

-7 votes

1 answer

71 views

Amazon web scraping with python [closed]

I am scraping an Amazon page using Python and saving the result into a CSV file. The code runs well, but the problem is that I get some product names without the first word. So for example ...

ellie_in_wonderland

13

asked May 5 at 9:21

0 votes

0 answers

36 views

Selenium doesn't detect a specific item when the browser tab is not active, but works when I am active in it

I am trying to scrape from a very specific site that has two things that don't work as expected: (1) reject the cookie banner using the reject button, (2) enter a specific section to get a view about the ...

CoMpUtEr1941

19

asked May 5 at 7:56

2 votes

1 answer

78 views

Scrapy CrawlSpider does not work with 507 status code

I have a Scrapy CrawlSpider that parses reviews, using scrapy-rotating-proxies. But when I tried to connect to the site I got the 507 status code. In ...

CollonelDain

31

asked May 3 at 13:08

2 votes

1 answer

164 views

How to programmatically zoom in on a webpage and improve <canvas> resolution (like Ctrl + does)? [closed]

On some webpages that contain a <canvas> element, I tried every single method of making the browser bigger, the page bigger... I did find some methods that make everything big, but ...

MOB

31

asked May 2 at 22:59

-2 votes

2 answers

80 views

BeautifulSoup Not Finding Table Headers on ClinicalTrials.gov Despite Inspect Element Showing Them

I want to use the BeautifulSoup library to fetch the clinical trials data ("mitochondrial diseases") for my research studies. Although they have an API, I want to use web scraping. URL = ...

Gautam Sharma

1

asked Apr 29 at 17:35

1 vote

2 answers

194 views

Can't close cookie pop up on website with selenium webdriver

I am trying to use Selenium to click the Accept all or Reject all button on a cookie pop-up for the website autotrader.co.uk, but I cannot get it to make the pop-up disappear for some reason. This ...

teeeeee

785

asked Apr 26 at 1:47

2 votes

1 answer

88 views

python-requests-html render inconsistent result

Background: by default the website shows only a few names, and there's a "moreBtn" to generate the full list. Code idea: create an HTML session, render with a script clicking the "moreBtn"...

Beginner

31

asked Apr 24 at 1:06

1 vote

1 answer

107 views

Rotating Authenticated Proxy in Selenium Using ChromeDriver

import zipfile import json import os from selenium import webdriver from selenium.webdriver.chrome.options import Options from selenium.webdriver.common.by import By import time def ...

fozan javaid

11

asked Apr 23 at 17:54

-1 votes

2 answers

56 views

Why am I getting no data using BeautifulSoup and requests when scraping a news website?

import requests from bs4 import BeautifulSoup url = "https://example-news-site.com" headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" } response =...

LANAYA88

1

asked Apr 23 at 13:45

1 vote

2 answers

87 views

Importing geographic data with WFS works in Chrome but not in Python

I am trying to pull a geojson file from here. The JSON appears as expected when I paste that link into Chrome or Safari. However, I get the following error every time when I run the following code on ...

opposity

121

asked Apr 21 at 8:21

0 votes

0 answers

169 views

yfinance returning empty dataframes

I have a simple yfinance program that is supposed to get financial statements for any company I pick. I run the code and it just returns an empty dataframe. My yfinance and everything else is up to ...

ridhamb

23

asked Apr 20 at 21:54

0 votes

0 answers

74 views

Scraper crate can't match <tr> without <table> – why?

I'm trying to write a simple web scraper using the scraper crate to learn Rust, and I encountered a weird (for me) problem. My find_element function can't find <tr> elements unless they're ...

crotylaldehyde

31

asked Apr 18 at 15:24

1 vote

2 answers

120 views

How to detect and scrape a specific language version of a multilingual publication, if available?

I wrote a Python script for scraping data from the WHO website. I wanted to retrieve the title, author name, date, PDF link and child page link from the parent page (I applied some filters on the parent page). I am ...

Mann Jain

11

asked Apr 17 at 4:42

0 votes

0 answers

30 views

How to fetch an automatically refreshed token from XHR?

Some websites update the JWT regularly to prevent scraping: in the browser, JS sends an XHR to the server to get a fresh token (see the Token XHR in the picture below). E.g. curl "https://www.nemlig.com/webapi/Token&...

Igor Savinkin

6,369

asked Apr 16 at 15:09

0 votes

0 answers

183 views

Crawl4AI token threshold not applied to raw html in arun

Here’s a brief overview of what I want to achieve: extract raw HTML pages and save them; use Crawl4AI to produce a ‘cleaner’ and smaller HTML that has a lot of information, including what I will eventually ...

Leksa99

115

asked Apr 13 at 13:10

2 votes

2 answers

154 views

Why does Scrapfly API sometimes return 422 and sometimes 200 for the same JS scenario?

I'm using the Scrapfly API to scrape a webpage using a GET request with a JavaScript scenario like this: "js_scenario": [ { "fill": { "selector": "form#...

Asjad Gohar

1

asked Apr 12 at 17:15

-3 votes

1 answer

78 views

How to switch to a popup cookie consent page?

I'm using Python 3.12.3, Selenium 4.31.0, Firefox driver in Ubuntu 24.04. When I try to open a URL, a cookie consent popup appears, offering to continue without accepting, to accept, or to see more options. How can I ...

Michael

117

asked Apr 12 at 12:04

-1 votes

2 answers

57 views

How to get attribute value from webpage by css_selector?

I'm using Python 3.12.3, Selenium 4.31.0, Firefox driver. How do I retrieve the class attribute in this HTML tag? <div class="costa-itinerary-tile FS03250425_BCN03A20" data-cc-cruise-id="...

Michael

117

asked Apr 11 at 12:20

2 votes

1 answer

210 views

How to extract data from leaflet-generated pages

Is it possible to scrape the polygon data from this interactive map in R? 🔗 https://fogocruzado.org.br/mapadosgruposarmados The map shows territories controlled by armed groups across different years....

Maria Mittelbach

105

asked Apr 9 at 14:12

0 votes

1 answer

102 views

Python Selenium scraping problem (combining multiple occurrences into N in a for loop, for n in x:)

I'm trying to do some scraping for educational purposes. I just started and am fairly new to Python. My problem is that, in Selenium, I am trying to scrape a product page, take the name, price, shipping ...

Lahearle

1

asked Apr 9 at 4:44

0 votes

2 answers

118 views

How do I web scrape this basic webpage to CSV?

https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html So far I have: scrape1 <- read_html('https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-...

dreddlord

11

asked Apr 8 at 19:29

0 votes

1 answer

42 views

XHR Endpoint Returning Loading Page Data Only

I want to access the tables of the following website: https://www.marketbeat.com/ratings/ However, pages can only be changed by setting the "Reporting Date". I do know that I can change the ...

AmyTheGhostHunter

3

asked Apr 8 at 15:29

0 votes

0 answers

155 views

Extract values from dynamic table using VBA

I would like to extract data from a few tables from the following website: https://www2.bmf.com.br/pages/portal/bmfbovespa/boletim1/SistemaPregao1.asp?pagetype=pop&caminho=Resumo%20Estat%EDstico%...

Osvaldo Assunção

177

asked Apr 7 at 18:11

0 votes

1 answer

40 views

Issue Consuming API on Port 448 with Azure WAF: Request Rate Limiting and Proxy Handling

Context and Problem: I'm developing a service that consumes an API hosted on port 448. The API is protected by Azure's WAF V2, which limits requests: it allows only 150 consecutive requests, after ...
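The limit described here (a fixed number of consecutive requests before the WAF intervenes) can be respected on the client side by pausing after each burst. Below is a minimal sketch of that idea, not the asker's code: the class name `BurstLimiter` is hypothetical, the figure of 150 comes from the question, and the cooldown length is an assumption, because the excerpt is cut off before it says what happens after the limit is reached.

```python
import time
from typing import Callable


class BurstLimiter:
    """Pause for `cooldown` seconds after every `burst` calls.

    Hypothetical helper: the real WAF rule may count requests differently.
    """

    def __init__(self, burst: int, cooldown: float,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if burst < 1:
            raise ValueError("burst must be at least 1")
        self.burst = burst
        self.cooldown = cooldown
        self._sleep = sleep   # injectable so the limiter can be tested without waiting
        self._count = 0       # requests made in the current burst

    def wait(self) -> None:
        """Call immediately before each request."""
        if self._count == self.burst:
            # The burst is used up: wait, then start a new burst.
            self._sleep(self.cooldown)
            self._count = 0
        self._count += 1


# 150 is taken from the question; 60.0 seconds is an assumed cooldown.
limiter = BurstLimiter(burst=150, cooldown=60.0)
```

Calling `limiter.wait()` before every request keeps each burst at 150 calls at most. Whether a pause alone is enough depends on how the WAF rule is actually configured, which the excerpt does not show.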

Alejandro Echeverria

1

asked Apr 7 at 13:59

0 votes

0 answers

132 views

Using the free-proxy library with requests to access general https websites

When I request a proxy, the library returns an HTTP proxy, which currently seems unusable to me because the vast majority of sites use HTTPS, and this causes the request to be ...

Digital Farmer

2,222

asked Apr 5 at 19:28

0 votes

0 answers

88 views

Is there any method for managing memory usage when scraping an infinite-scrolling page?

I'm writing a Python Selenium scraper for a web page that uses infinite scrolling to load content dynamically. Over time, as more posts are loaded, the JavaScript heap memory usage in ChromeDriver ...

mohammad jcm

31

asked Apr 4 at 21:08

-1 votes

1 answer

59 views

Need help scraping FAA N Number database as I can't seem to communicate with the url

Trying to pull data from FAA N Number results, but requests.get() doesn't seem to be working. I followed this tutorial (https://www.youtube.com/watch?v=QhD015WUMxE) and was able to scrape the website he ...

Emily Stauf

1

asked Apr 4 at 8:28

0 votes

1 answer

74 views

Python scraper not returning value to Excel

I have a Python web scraper that pulls a specific value every 1 second. The target website is AJAXed, so I'm not hitting it with too many requests. This is the Python code: import time import logging ...

Matteo

1,136

asked Apr 3 at 23:29

0 votes

1 answer

78 views

node-fetch maximum redirect reached

I have this issue with node-fetch which returns error: ERR FetchError { message: 'maximum redirect reached at: https://www.alsa.es/o/Alsa-main-theme/images/web2020/logo-alsa.svg', type: '...

dalvi

57

asked Apr 2 at 12:04

1 vote

0 answers

65 views

Problems downloading files with Selenium + Scrapy

This code is supposed to download some documents it must locate within a series of given links. While it does seemingly locate the link of the PDF file, it's failing to download it. What might be the ...

42WaysToAnswerThat

371

asked Apr 2 at 1:03

0 votes

1 answer

51 views

Selenium script for Amazon UK postcode entry triggers CAPTCHA and fails to apply zip code

Automating Amazon UK with Selenium: Handling CAPTCHA, Setting Postcode, and Extracting Product Data I'm automating Amazon UK (www.amazon.co.uk) using Selenium to: Decline cookies (if present). Click ...

Luis swift

63

asked Apr 1 at 12:30

0 votes

3 answers

95 views

How to extract particular tags from soup using python?

I would like to extract data from the webpages below: https://www.ams.usda.gov/services/enforcement/organic/settlements https://www.ams.usda.gov/services/enforcement/organic/settlements-2023 "03/19/2025"...

Anjali Kushwaha

51

asked Apr 1 at 5:03

1 vote

1 answer

26 views

Python Selenium clicking a jquery dropdown to expose more dropdowns

I'm new to Selenium and trying to undertake a live example of web scraping a list using the following URL - https://mcscertified.com/find-an-installer/ However, I'm struggling to click on a drop-down ...

Lee Murray

575

asked Mar 31 at 16:32

0 votes

1 answer

51 views

SSLCertVerificationError even when certificate file provided?

SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1018) This is my code: import requests s = requests....

Identicon

445

asked Mar 31 at 8:53

-1 votes

2 answers

65 views

bs4 cannot extract text from an element

import requests from bs4 import BeautifulSoup url = 'https://www.tori.fi/recommerce/forsale/item/22362242' headers = {"User-Agent": "Mozilla/5.0"} response = requests.get(url, ...

Dotres

9

asked Mar 29 at 14:20

-1 votes

1 answer

139 views

Using IMPORTXML to scrape dropdown data from webpage

I am not well versed in advanced data scraping and coding. However, there is a webpage ("https://www.ksestocks.com/HistoryHighLow") which carries a dropdown menu with various options. I ...

Huma Wakani

1

asked Mar 28 at 22:42

1 vote

1 answer

579 views

Power Automate Desktop: "The untyped object argument to the 'Text' function has an incorrect type" error

I am creating a flow in Microsoft Power Automate Desktop that retrieves the weather information for a user-specified city. The flow prompts the user to enter a city, then opens Google Search to get ...

Lucas Rijnberk

11

asked Mar 27 at 13:28

0 votes

0 answers

143 views

Accessing an element inside an iframe using nodriver

I have recently switched from selenium to nodriver for speed and stealth reasons. I am having trouble accessing elements inside an iframe even though material on this site and elsewhere says that '...

Stephen Smith

352

asked Mar 25 at 10:03

1 vote

1 answer

632 views

Can't take a screenshot using Crawl4AI

I am currently trying to take a screenshot of a given web page using Crawl4ai, however each time that I try to do it I get an error or I don't get anything. Here is the code I used that is the same ...

Bernardo

41

asked Mar 24 at 19:49

1 vote

1 answer

633 views

Playwright Chromium Executable Missing or Version Mismatch in Python (VSCode)

I'm trying to use an Ollama LLM in Python (VS Code) to scrape data from a website. But whenever I run the code it gives an error: ERROR [browser] Failed to initialize Playwright browser: BrowserType....

Mohammad Abdullah

33

asked Mar 24 at 9:27

0 votes

0 answers

67 views

Autogen: How to get a simple csv from the search instead of using

I am trying to get data from the web. Unfortunately, I get this error. I suppose that it identifies a big file using a search and then tries to have a conversation about that entire file. Is there any ...

Karel Macek

1,189

asked Mar 24 at 4:50

-1 votes

1 answer

29 views

How to parse a site with authorization in Kotlin using skrape it?

I am interested in a site that, after you enter a login, goes to a separate password entry page. How can I enter the data in such a case?

semorka

3

asked Mar 23 at 18:45

0 votes

3 answers

109 views

How to turn HTML text into multiple different columns in R

This is the code I wrote to generate the data: info <- html_nodes(manga, ".mt4") %>% html_text2() %>% strsplit("\n") It returns 50 rows of lists that look like this: [...

doot

1

asked Mar 23 at 6:54
