51,743 questions
0
votes
2
answers
98
views
nodriver crashes with infinite recursion in headless mode when running browser.get()
Here is the code I'm trying to run:
import nodriver as uc
async def main():
browser = await uc.start(headless=True)
page = await browser.get('https://www.nowsecure.nl')
if __name__ == '...
-3
votes
1
answer
94
views
Issue in scraping data
I have an issue in scraping schools data. I need their email and website URL. I tried a lot but it's returning empty results.
What's the best way to do this?
Here is the code:
from selenium import ...
1
vote
1
answer
47
views
Request breaks down with field change (scraping)
I am trying to download some data (in an efficient way), however I encountered an unexpected problem.
Here is the code that works just fine:
import requests
import os
from bs4 import BeautifulSoup
...
2
votes
1
answer
77
views
Scrapling Save PDF File Locally
I have to scrape a website which returns a static PDF file. The only Python package that can access the document successfully is scrapling. However, the PDF file returned is not saved correctly in my ...
0
votes
0
answers
46
views
Selenium + ChromeDriver crash when scraping Google Maps entries with Selenium on MacOS
I’m building a Telegram bot in Python that scrapes points of interest (POIs) from Google Maps using Selenium. The overall flow works, but as soon as I try to scroll through and click on the .hfpxzc ...
0
votes
0
answers
38
views
Selenium wire failing to intercept request randomly
I have a scraping script that navigates to a website that contains the widget I want to scrape. The driver clicks a button that should trigger an XHR request and has never failed to do so in my manual ...
1
vote
0
answers
900
views
Why does yt-dlp work locally but fail with a bot error in production (live environment) even with cookies?
I'm trying to download YouTube videos using the yt-dlp Python library in a deployed web application. The same script works perfectly on my local machine, but in the production (live) environment, it ...
-7
votes
1
answer
71
views
Amazon web scraping with python [closed]
I am scraping an Amazon page using Python and saving the result into a CSV file. This code is running well, but the problem is that I get some product names without the first word.
So for example ...
0
votes
0
answers
36
views
Selenium doesn't detect specific item when not being active in the tab in the browser but works if I am active
I am trying to scrape from a very specific site that has two things that don't work as expected:
reject the cookie banner using reject button
enter a specific section to get a view about the ...
2
votes
1
answer
78
views
Scrapy Crawlspider does not work with 507 status code
I have a Scrapy CrawlSpider that parses reviews, using scrapy-rotating-proxies.
But when I tried to connect to the site I got the 507 status code. In ...
2
votes
1
answer
164
views
How to programmatically zoom in on a webpage and improve <canvas> resolution (like Ctrl + does)? [closed]
On some webpages that contain a <canvas> element, I tried every single method of making the browser bigger, the page bigger... I did find some methods that will make everything big, but ...
-2
votes
2
answers
80
views
BeautifulSoup Not Finding Table Headers on ClinicalTrials.gov Despite Inspect Element Showing Them
I want to use the BeautifulSoup library to fetch the clinical trials data ("mitochondrial diseases") for my research studies. Although they have an API, I want to use web scraping.
URL = ...
1
vote
2
answers
194
views
Can't close cookie pop up on website with selenium webdriver
I am trying to use Selenium to click the Accept all or Reject all button on a cookie pop up for the website autotrader.co.uk, but I cannot get it to make the pop up disappear for some reason.
This ...
2
votes
1
answer
88
views
python-requests-html render inconsistent result
background:
by default the website shows only a few names, and there's a "moreBtn" to generate the full list
code idea:
create an HTML session, render with a script clicking the "moreBtn"...
1
vote
1
answer
107
views
Rotating Authenticate Proxy In Selenium Using Chrome Driver
import zipfile
import json
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import time
def ...
-1
votes
2
answers
56
views
Why am I getting no data using BeautifulSoup and requests when scraping a news website?
import requests
from bs4 import BeautifulSoup
url = "https://example-news-site.com"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
}
response =...
1
vote
2
answers
87
views
Importing geographic data with WFS works on Chrome but not on Python
I am trying to pull a geojson file from here.
The JSON appears as expected when I paste that link into Chrome or Safari. However, I get the following error every time when I run the following code on ...
0
votes
0
answers
169
views
yfinance returning empty dataframes
I have a simple yfinance program that is supposed to get financial statements for any company I pick. I run the code and it just returns an empty dataframe. My yfinance and everything else is up to ...
0
votes
0
answers
74
views
Scraper crate can't match <tr> without <table> – why?
I'm trying to write a simple web scraper using the scraper crate to learn Rust, and I encountered a weird (for me) problem. My find_element function can't find <tr> elements unless they're ...
1
vote
2
answers
120
views
How to detect and scrape a specific language version of a multilingual publication, if available?
I wrote a Python script for scraping data from the WHO website. I wanted to retrieve the title, author name, date, PDF link and child page link from the parent page (I applied some filters on the parent page).
I am ...
0
votes
0
answers
30
views
How to fetch automatic refreshed token from XHR?
Some websites refresh their JWT regularly to prevent scraping: in the browser, JavaScript sends an XHR to the server to get a fresh token (see the Token XHR in the picture below). E.g.
curl "https://www.nemlig.com/webapi/Token&...
0
votes
0
answers
183
views
Crawl4AI token threshold not applied to raw html in arun
Here’s a brief overview of what I want to achieve
Extract raw htmls and save them
Use Crawl4AI to produce a ‘cleaner’ and smaller HTML that has a lot of information, including what I will eventually ...
2
votes
2
answers
154
views
Why does Scrapfly API sometimes return 422 and sometimes 200 for the same JS scenario?
I'm using the Scrapfly API to scrape a webpage using a GET request with a JavaScript scenario like this:
"js_scenario": [
{ "fill": { "selector": "form#...
-3
votes
1
answer
78
views
How to switch to a popup cookie consent page?
I'm using Python 3.12.3, Selenium 4.31.0, Firefox driver in Ubuntu 24.04.
When I try to open a URL, a cookie consent popup appears, asking whether to continue without accepting, accept, or see more options. How can I ...
-1
votes
2
answers
57
views
How to get attribute value from webpage by css_selector?
I'm using Python 3.12.3, Selenium 4.31.0, Firefox driver.
How do I retrieve the class attribute in this html tag?
<div class="costa-itinerary-tile FS03250425_BCN03A20" data-cc-cruise-id=&...
2
votes
1
answer
210
views
How to extract data from leaflet-generated pages
Is it possible to scrape the polygon data from this interactive map in R?
🔗 https://fogocruzado.org.br/mapadosgruposarmados
The map shows territories controlled by armed groups across different years....
0
votes
1
answer
102
views
Python Selenium Scraping problem (combining multiple occurences into N in for loop, for n in x:
I'm trying to do some scraping for educational purposes. I just started and am a fairly big noob at Python.
My problem is, in selenium I am trying to scrape a product page, take the name, price, shipping ...
0
votes
2
answers
118
views
How do I web scrape this basic webpage to CSV?
https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-schedule-2025.html
So far I have:
scrape1 <- read_html('https://www.cruisetimetables.com/invergordon-scotland-cruise-ship-...
0
votes
1
answer
42
views
XHR Endpoint Returning Loading Page Data Only
I want to access the tables of the following website:
https://www.marketbeat.com/ratings/
However, pages can only be changed by setting the "Reporting Date".
I do know that I can change the ...
0
votes
0
answers
155
views
Extract values from dynamic table using VBA
I would like to extract data from a few tables from the following website:
https://www2.bmf.com.br/pages/portal/bmfbovespa/boletim1/SistemaPregao1.asp?pagetype=pop&caminho=Resumo%20Estat%EDstico%...
0
votes
1
answer
40
views
Issue Consuming API on Port 448 with Azure WAF: Request Rate Limiting and Proxy Handling
Context and Problem:
I'm developing a service that consumes an API hosted on port 448. The API is protected by Azure's WAF V2, which limits requests: it allows only 150 consecutive requests, after ...
0
votes
0
answers
132
views
Using the free-proxy library with requests to access general https websites
When I request a proxy, it delivers an HTTP proxy, which currently seems unusable to me because the vast majority of sites use HTTPS, and this causes the request to be ...
0
votes
0
answers
88
views
is there any method for managing memory usage in an infinite scrolling scraping?
I'm writing a Python Selenium scraper for a web page that uses infinite scrolling to load content dynamically. Over time, as more posts are loaded, the JavaScript heap memory usage in ChromeDriver ...
-1
votes
1
answer
59
views
Need help scraping FAA N Number database as I can't seem to communicate with the url
Trying to pull data from FAA N Number results but request.get() doesn't seem to be working.
I followed this tutorial (https://www.youtube.com/watch?v=QhD015WUMxE) and was able to scrape the website he ...
0
votes
1
answer
74
views
Python scraper not returning value to Excel
I have a Python web scraper that pulls a specific value every second. The target website is AJAXed, so I'm not hitting it with too many requests.
This is the Python code:
import time
import logging
...
0
votes
1
answer
78
views
node-fetch maximum redirect reached
I have this issue with node-fetch which returns error:
ERR FetchError {
message: 'maximum redirect reached at: https://www.alsa.es/o/Alsa-main-theme/images/web2020/logo-alsa.svg',
type: '...
1
vote
0
answers
65
views
Problems downloading files with Selenium + Scrapy
This code is supposed to download some documents it must locate within a series of given links.
While it does seemingly locate the link of the PDF file, it's failing to download it. What might be the ...
0
votes
1
answer
51
views
Selenium script for Amazon UK postcode entry triggers CAPTCHA and fails to apply zip code
Automating Amazon UK with Selenium: Handling CAPTCHA, Setting Postcode, and Extracting Product Data
I'm automating Amazon UK (www.amazon.co.uk) using Selenium to:
Decline cookies (if present).
Click ...
0
votes
3
answers
95
views
How to extract particular tags from soup using python?
I would like to extract data from the webpages below:
https://www.ams.usda.gov/services/enforcement/organic/settlements https://www.ams.usda.gov/services/enforcement/organic/settlements-2023
"03/19/2025&...
1
vote
1
answer
26
views
Python Selenium clicking a jquery dropdown to expose more dropdowns
I'm new to Selenium and trying to undertake a live example of web-scraping a list using the following URL - https://mcscertified.com/find-an-installer/
However, I'm struggling to click on a drop-down ...
0
votes
1
answer
51
views
SSLCertVerificationError even when certificate file provided?
SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1018)
This is my code:
import requests
s = requests....
-1
votes
2
answers
65
views
bs4 cannot extract text from an element
import requests
from bs4 import BeautifulSoup
url = 'https://www.tori.fi/recommerce/forsale/item/22362242'
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, ...
-1
votes
1
answer
139
views
Using IMPORTXML to scrape dropdown data from webpage
I am not well versed in advanced data scraping and coding. However, there is a webpage ("https://www.ksestocks.com/HistoryHighLow") which carries a dropdown menu with various options. I ...
1
vote
1
answer
579
views
Power Automate Desktop: "The untyped object argument to the 'Text' function has an incorrect type" error
I am creating a flow in Microsoft Power Automate Desktop that retrieves the weather information for a user-specified city. The flow prompts the user to enter a city, then opens Google Search to get ...
0
votes
0
answers
143
views
Accessing an element inside an iframe using nodriver
I have recently switched from selenium to nodriver for speed and stealth reasons. I am having trouble accessing elements inside an iframe even though material on this site and elsewhere says that '...
1
vote
1
answer
632
views
Cant take a Screenshot using Crawl4ai
I am currently trying to take a screenshot of a given web page using Crawl4ai, however each time that I try to do it I get an error or I don't get anything.
Here is the code I used that is the same ...
1
vote
1
answer
633
views
Playwright Chromium Executable Missing or Version Mismatch in Python (VSCode)
I'm trying to use an Ollama LLM in Python (VS Code) to scrape data from a website. But whenever I run the code it gives an error:
ERROR [browser] Failed to initialize Playwright browser: BrowserType....
0
votes
0
answers
67
views
Autogen: How to get a simple csv from the search instead of using
I am trying to get data from the web. Unfortunately, I get this error. I suppose that it identifies a big file using a search and then tries to have a conversation about that entire file. Is there any ...
-1
votes
1
answer
29
views
How to parse a site with authorization in kotlin using skrape it?
I am interested in a site that, after you enter a login, goes to a password entry page. How can I enter data in such a case?
0
votes
3
answers
109
views
how to turn html text into multiple different columns in r
This is the code I wrote to generate the data:
info <- html_nodes(manga, ".mt4") %>% html_text2() %>% strsplit("\n")
It returns 50 rows of lists that look like this:
[...