6,086 questions
0
votes
1
answer
35
views
BeautifulSoup - Extracting content blocks after specific subheadings within a larger section, ignoring document introduction
I am scraping the Dead by Daylight Fandom wiki (specifically TOME pages, e.g., https://deadbydaylight.fandom.com/wiki/Tome_1_-_Awakening) to extract memory logs.
The goal is to extract the Memory ...
0
votes
2
answers
202
views
Beautiful Soup, children are clearly inside but can't get it
From the structure below I only want the value of the href attribute. But rec_block is returning the h5 element without its children, so basically <h5 class="series">Recommendations</h5>.
<...
0
votes
0
answers
56
views
Issue With Jsoup Document Selector
I'm using Java Spring Boot and jsoup, and I recently upgraded jsoup to version 1.21.1.
My code builds a search query and searches for it in the document
Elements targetElements = document.select(...
Advice
1
vote
0
replies
96
views
Parsing with Python html.parser: accessing and using raw tags
I'm not a Python specialist, so bear with me. I'm trying to replace a Perl HTML::TokeParser based parser that I use for template foreign language translation to use Python html.parser. Here's the ...
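A minimal sketch of one way to get at raw tags with Python's html.parser, assuming the goal is to pass markup through unchanged and only transform the text between tags. `HTMLParser.get_starttag_text()` returns the source text of the most recently opened start tag. The class name `RawTagEcho`, the helper `rewrite`, and the `translate` callback are hypothetical names introduced here; end tags are rebuilt from the lowercased tag name, so their original casing is not preserved.

```python
from html.parser import HTMLParser


class RawTagEcho(HTMLParser):
    """Re-emit markup as written and pass text through a translate callback."""

    def __init__(self, translate):
        # convert_charrefs=False keeps entities such as &amp; out of handle_data
        super().__init__(convert_charrefs=False)
        self.translate = translate  # hypothetical callback: str -> str
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Raw source text of the start tag, attributes included
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # No raw text is exposed for end tags, so rebuild one
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(self.translate(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite(html, translate):
    parser = RawTagEcho(translate)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(rewrite('<p class="x">hello <b>world</b> &amp; more</p>', str.upper))
```

Text may reach `handle_data` in several chunks, so a real translation callback would likely need to buffer text per element before translating it.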
3
votes
1
answer
61
views
Beautiful Soup; splitting a paragraph only by <br> where stripped_strings is not working
I'm rather new to using Beautiful Soup and I'm having some issues splitting some html correctly by only looking at html breaks and ignoring other html elements such as changes in font color etc.
The ...
0
votes
1
answer
47
views
Can we combine '[class="a"]' and '[id="c d"]' in the same command?
I have an HTML document where I want to get elements with class="a" and id="c d". If I have only one of them, I can use soup.select('[class="a"]') and soup.select('[id="c d"...
0
votes
0
answers
99
views
parse marked customize for list
I've read the docs at https://marked.js.org/using_pro#renderer, but they have no example for the list I want to customize.
More detail: https://github.com/markedjs/marked/blob/master/src/Tokens.ts#L137 as the ...
4
votes
5
answers
184
views
How to extract links from an html page
I have an html page that has data like so:
<td><a href="test-2025-03-24_17-05.log">test-2025-03-24_17-05.log</a></td>
<td><a href="PASS_report_test_2025-...
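One possible approach, sketched with Python's standard-library html.parser (the question does not say which language or library is in use, so this is an assumption): collect the href attribute of every `<a>` start tag. `LinkCollector` and `extract_links` are hypothetical names, and the sample input reuses only the complete row shown above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<td><a href="test-2025-03-24_17-05.log">test-2025-03-24_17-05.log</a></td>'
print(extract_links(sample))  # ['test-2025-03-24_17-05.log']
```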
3
votes
1
answer
93
views
Why isn't the end tag included in an ASIDE.OuterHTML
My intent was to advise, on the question "Delete everything between two strings (inclusive)", using the HTMLDocument parser instead of a text-based replace command.
But somehow the OuterHTML ...
-1
votes
1
answer
51
views
Why is my HTML parser not outputting the wanted number
My programming teacher had us write a Python calculator for fuel consumption in L/100 km, and I decided to go further and also have it calculate the price per 100 km, but here's the ...
1
vote
0
answers
31
views
Passing CSRF token through Dart html parsing
I'm making an app where students can log in to their portal website and it shows their data; however, I'm having trouble authenticating users. When I did this project on another website I used ...
1
vote
2
answers
92
views
Extracting text from Wikisource using BeautifulSoup returns empty result
I'm trying to extract the text of a book from a Wikisource page using BeautifulSoup, but the result is always empty. The page I'm working on is Le Père Goriot by Balzac.
Here's the code I'm using:
...
-1
votes
2
answers
84
views
Python parser returns an empty list (I guess it's an HTML class selection issue)
The idea is: I want to collect the name and price of every flat on the website as a list.
I've made a simple parser in Python, but it looks like I can't get any values, since it returns an ...
1
vote
1
answer
150
views
How can I scrape a table from baseball reference using pandas and beautiful soup? [duplicate]
I am trying to scrape the pitching stats on this url and then save the dataframe to a csv file.
https://www.baseball-reference.com/boxes/ARI/ARI202204070.shtml
My current code is below (Python 3.9.7)
...
0
votes
1
answer
52
views
Duplicate extra data when webscraping fbref.com
I am trying to webscrape the league table for the EPL, but when I do that I am getting duplicate links as well as links to the teams that are not even in the premier league which makes no sense.
Here ...
-1
votes
2
answers
117
views
I'm trying to scrape payscale.com with BeautifulSoup to get some data, but I can't manage to get it no matter what I do [closed]
Here is my code:
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = "https://www.payscale.com/college-salary-report/majors-that-pay-you-back/bachelors/"
response = ...
1
vote
1
answer
168
views
How do I get to the root directory in C++?
I am building a web-server. I am trying to build a function handler that parses the index.html file in the root directory. It works but when I go to the website on my localhost 127.0.0.1:8080 I get ...
1
vote
1
answer
65
views
How do I properly import preformatted text from a web site into Excel and it still look like preformatted text?
My workplace uses the Fire Weather Forecast product from the National Weather Service to produce a product for fire management officers that has the fire weather specific to their area. We have been ...
0
votes
1
answer
31
views
Code will not scroll down playlist to parse song names
Using beautifulsoup and selenium in python, I am trying to scroll down a list of songs in a playlist to parse the song names. The code however will not get past the first 30 songs and scroll down ...
1
vote
1
answer
116
views
Internal Error while using angular compiler to parse html
I am creating an Angular schematics project to propose suggestions for my Angular project. I am trying to use the built-in Angular compiler to parse the code because libraries such as parse5 and ...
-7
votes
1
answer
124
views
Replace the querystring of an href declaration in an <a> tag
I want to replace the following hyperlinks dynamically
from
<a href="/xsearch2?q=some search/21">21</a>
to
<a href="/xsearch2?q=some search&page=21">21</a>...
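A rough sketch of the rewrite in Python, assuming a regular expression is acceptable for this narrow case and that every affected href has exactly the shape shown above (a `/xsearch2?q=` prefix, a query with no slash in it, then `/` and a page number). The question does not name a language, and `add_page_param` is a hypothetical helper.

```python
import re

# Captures everything up to the final "/<digits>" inside the href value.
# Assumes the search text itself contains no "/" or double quote.
HREF_PAGE_SUFFIX = re.compile(r'(href="/xsearch2\?q=[^"/]*)/(\d+)"')


def add_page_param(html):
    """Turn a trailing /N in the query into an explicit &page=N parameter."""
    return HREF_PAGE_SUFFIX.sub(r'\1&page=\2"', html)


before = '<a href="/xsearch2?q=some search/21">21</a>'
print(add_page_param(before))
```

For anything less regular than this, parsing the document and editing the attribute through a DOM API would likely be more robust than a pattern match.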
0
votes
0
answers
60
views
Python regular expression not working as expected [duplicate]
I am having trouble parsing an HTML page on Wikipedia. I want to get all text between two headings. I can get all text in the HTML wiki separated by newlines by executing the following in Python:
...
-1
votes
1
answer
30
views
Divs not being detected with BeautifulSoup
I am trying to parse https://rateyourmusic.com/release/album/tyler-the-creator/igor/reviews/1/
I can access the divs that have class_=review_body if I download the html files locally on to my system. ...
1
vote
1
answer
93
views
Perl's HTML::TableExtract does not see all the tables on Pro Football Reference pages
I am trying to extract data from an HTML table with Perl, using HTML::TableExtract.
Specifically I am trying to grab some rushing stats for the 2024 Baltimore Ravens from Pro Football Reference. The ...
0
votes
0
answers
78
views
jsoup converting '&' to '&amp;' when I set the Element
I am trying to parse an HTML input using jsoup (v1.18.1), extract elements, extract each attribute value and replace as follows:
> with &gt;
< with &lt;
The method I'm feeding this code ...
1
vote
0
answers
56
views
How to extract URL from HTML response in Android?
I have a @GET endpoint whose response is HTML code containing URLs. I need to reach the URL that comes after the 200 code. Do you have any idea how to do it in Android?
I already tried to use the ...
0
votes
0
answers
24
views
PHP Simple HTML DOM Parser not Returning Anything [duplicate]
I'm trying to use the PHP Simple HTML DOM Parser for the first time from here - https://simplehtmldom.sourceforge.io/docs/1.9/index.html
Unfortunately, I'm having an issue where it's not returning ...
1
vote
1
answer
2k
views
Getting a HTTP Response with nodriver
I'm using nodriver and it's not directly supporting network methods. But it does support for several CDP objects (network: https://ultrafunkamsterdam.github.io/nodriver/nodriver/cdp/network.html) and ...
-2
votes
1
answer
54
views
JavaScript for bookmarklet data extraction from an html monthly calendar schedule
I have a bookmarklet and JavaScript with which I am extracting data from an html table from a website.
For the most part the script works fine however it parses the date wrong. The date, in the HTML ...
0
votes
1
answer
73
views
Parse WhatsApp message read status [closed]
My question is more about HTML layout and parsing dynamic content.
My task: parse the contacts who read a particular message of mine in a group.
I tried to inspect the DOM structure of the DIV block that holds that ...
0
votes
1
answer
62
views
Xml Parse code working fine at my end, does not work at client region over same Html
I have written Apps Script code for Html Parsing using XmlParse. It works fine at my end, my browser and system language both are English as well as my Google Account's. But when I shared the same ...
0
votes
1
answer
36
views
How to get the page content of the link in right-click context menu
Hi everyone. I've never done any JS coding before, but I needed a certain extension that I couldn't find in the shop, so I decided to make my own. Here is the logic: when you right-click the link you get ...
2
votes
1
answer
133
views
How to handle self-closing tags without end-slash in html.parser.HTMLParser
By default it seems that html.parser.HTMLParser cannot handle self closing tags correctly, if they are not terminated using /. E.g. it handles <img src="asfd"/> fine, but it ...
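For illustration, a minimal sketch of one workaround, assuming the aim is for `<img src="asfd">` and `<img src="asfd"/>` to produce the same start/end events. html.parser calls `handle_startendtag` only for the slash form and `handle_starttag` alone otherwise, so the subclass below synthesizes the end event for HTML void elements. `VoidAwareParser`, `events_for`, and the `events` list are hypothetical names.

```python
from html.parser import HTMLParser

# Void elements as listed in the HTML Living Standard
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class VoidAwareParser(HTMLParser):
    """Record start/end events, closing void elements immediately."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))
        if tag in VOID_ELEMENTS:
            # No closing tag will follow, so emit the end event now
            self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Override the default, which would call both handlers above and below
        self.events.append(("start", tag))
        self.events.append(("end", tag))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # stray closing tag for a void element; already closed
        self.events.append(("end", tag))


def events_for(html):
    parser = VoidAwareParser()
    parser.feed(html)
    parser.close()
    return parser.events


print(events_for('<p><img src="asfd"></p>') == events_for('<p><img src="asfd"/></p>'))
```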
4
votes
1
answer
148
views
Treatment of superfluous closing tags depends on tag name
Unlike XHTML, HTML does not allow separate closing tags for empty-content elements like br and hr. The HTML validator gives an error
end tag for element "..." which is not open
in such ...
-1
votes
2
answers
36
views
code not running when webscraping weather data
I am trying to scrape earthquake weather data from USGS and my code runs up to the print(soup) line but nothing after that
import requests
from bs4 import BeautifulSoup
url="https://earthquake....
1
vote
1
answer
40
views
My Beautiful Soup library is not extracting all the anchor elements from a listed display
Hi so I am very new to web scraping and I am trying out the basics for it. Right now, I wanted to extract links from a root website (coventry.gov.uk). The problem was, however, I could not get the ...
0
votes
1
answer
60
views
Populating Spreadsheet(s) from email html table
I am not a programmer but I've been digging through the weeds to figure something out on my own and I'm stuck. I have a google spreadsheet with multiple sheets that I need to populate with content ...
0
votes
1
answer
112
views
Selenium doesn't find the popup button
I have this webpage (https://goldapple.ru/) on which I want to parse some data about cosmetics. However, when I open the webpage, the popup button appears, and I want to click the left "Да, верно"...
1
vote
0
answers
57
views
Error: peg$SyntaxError: Expected Character but "&" Found While Parsing SVG Path Data in JavaScript
I am working with an SVG file and converting it to JSON using svgson library. Additionally, I am using the svg-path-to-polygons library to decode the d attribute in the path element. However, I am ...
1
vote
1
answer
65
views
How to return or parse chart data properly in React from google apps script?
In my implementation of adding charts to a react frontend, from gsheets, using an apps script backend, there seems to be some sort of an issue where my constructed base64 png string fails to be parsed ...
0
votes
0
answers
538
views
"unstructured" and langchain's "HTMLHeaderTextSplitter" ignores "pre" and/or "code" HTML tags
I want to read a webpage and split it into chunks to feed a vector database in a RAG pipeline. This webpage has python code examples on it, but I cannot create chunks with that code text, it is ...
-1
votes
1
answer
110
views
JavaScript/HTML beautifier - remove newlines around `html` tags
I'm using js-beautify to beautify my HTML like this:
import { html_beautify } from 'js-beautify';
// later in the component
html_beautify(localHtmlContent, {indent_size: 2});
which makes my html go ...
0
votes
1
answer
312
views
Get Errors on HTML Content Using Jsoup for Java
I am building an application that receives HTML content as strings. I need to verify that these HTML strings are well-formed, meaning I want to parse them and detect lines with errors.
During my ...
0
votes
1
answer
164
views
I'm having trouble scraping a table from Baseball Reference
As the title suggests, I'm having trouble scraping a table from Baseball Reference. I want to scrape the first 2 tables from here. To be clear, the ones titled "Team Standard Batting" and "...
-1
votes
1
answer
643
views
Web scraping 2nd table player stats from fbref.com [duplicate]
Was hoping for help here. I'm trying to web scrape this second table of player goal and shot creation stats on FB Ref for the MLS, but my script is bringing in the first table of team statistics ...
0
votes
1
answer
134
views
How does HTML parser interact with Speculative parser
The WHATWG spec describes the concept of speculative HTML parsing.
There are many places in the spec that use the term "active speculative parser". The spec says that an HTML parser that owns an instance of ...
1
vote
1
answer
443
views
How to create vnodes from a string with html tags in vue 3
Research
So I've found this answer on how to create a vnode list from a simple SVG with one path layer and how to transform that in Vue2.
I could not find any good solutions for Vue 3, so I scaffolded ...
0
votes
1
answer
41
views
Regex to parse comment blocks & parse their contents
I want a regex that will look at a string like this, get the "card" value from each of these comment blocks and also, TRUE if there is a "disabled":true or "hide":true ...
1
vote
1
answer
247
views
Problem: How to scrape dynamically loaded data table in Python?
Python novice here. I have been learning how to scrape from various baseball sites (Fangraphs, Statcast, Rotowire). I have had success with a few different methods, but the Park Factors table on ...
0
votes
1
answer
1k
views
Can we style html tags with flutter_widget_from_html package
I receive HTML code and display it as in the image below using the flutter_widget_from_html package.
But now I need to style it the way it appears on the website.
I tried to find a guide to do this but had no luck.
I just ...