Newest 'html-parsing' Questions

0 votes

1 answer

35 views

BeautifulSoup - Extracting content blocks after specific subheadings within a larger section, ignoring document introduction

I am scraping the Dead by Daylight Fandom wiki (specifically TOME pages, e.g., https://deadbydaylight.fandom.com/wiki/Tome_1_-_Awakening) to extract memory logs. The goal is to extract the Memory ...

zeromiedo

1

asked 5 hours ago

0 votes

2 answers

202 views

Beautiful Soup, children are clearly inside but can't get it

From the below structure I only want value of href attribute. But rec_block is returning h5 element without its children so basically <h5 class="series">Recommendations</h5>. <...

Emby

1

asked Nov 25 at 18:27

0 votes

0 answers

56 views

Issue With Jsoup Document Selector

I'm using Java Spring Boot and jsoup, and I recently upgraded jsoup to version 1.21.1. My code creates a search query and searches for it in the document: Elements targetElements = document.select(...

user613

243

asked Nov 18 at 9:04

Advice

1 vote

0 replies

96 views

Parsing with Python html.parser: accessing and using raw tags

I'm not a Python specialist, so bear with me. I'm trying to replace a Perl HTML::TokeParser based parser that I use for template foreign language translation to use Python html.parser. Here's the ...

Hugh Barnard

352

asked Oct 29 at 10:50

3 votes

1 answer

61 views

Beautiful Soup; splitting a paragraph only by <br> where stripped_strings is not working

I'm rather new to using Beautiful Soup and I'm having some issues splitting some html correctly by only looking at html breaks and ignoring other html elements such as changes in font color etc. The ...

James Brian

33

asked Aug 30 at 17:29

0 votes

1 answer

47 views

Can we combine '[class="a"]' and '[id="c d"]' in the same command?

I have a html where I want to get elements with class="a" and id="c d". If I have only one of them, I can use soup.select('[class="a"]') and soup.select('[id="c d&...

Akira

2,820

asked May 25 at 13:27

0 votes

0 answers

99 views

parse marked customize for list

I've seen the docs https://marked.js.org/using_pro#renderer and it has no example for the list i want to customize more detail https://github.com/markedjs/marked/blob/master/src/Tokens.ts#L137 as the ...

zummon

996

asked Apr 4 at 0:57

4 votes

5 answers

184 views

How to extract links from an html page

I have an html page that has data like so: <td><a href="test-2025-03-24_17-05.log">test-2025-03-24_17-05.log</a></td> <td><a href="PASS_report_test_2025-...

Archie

389

asked Mar 26 at 20:23
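
For the link-extraction question above, one possible approach is to walk the markup with Python's standard-library html.parser and collect every href found on an <a> tag. This is only a sketch: the names LinkCollector and extract_links are hypothetical, the sample string is the first row quoted in the excerpt, and it is not necessarily what any of the posted answers recommend.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return all href values found in html_text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# Sample row taken from the question excerpt.
sample = '<td><a href="test-2025-03-24_17-05.log">test-2025-03-24_17-05.log</a></td>'
print(extract_links(sample))
```

A parser-based approach like this tends to be more robust than a regular expression when attribute order or whitespace varies between rows.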

3 votes

1 answer

93 views

Why isn't the end tag included in an ASIDE.OuterHTML

My intent was to give advice on the question Delete everything between two strings (inclusive): use the HTMLDocument parser instead of a text-based replace command. But somehow the OuterHTML ...

iRon

24.4k

asked Mar 3 at 9:46

-1 votes

1 answer

51 views

why is my html parser not outputting wanted number

My programming teacher had us write a Python calculator for fuel consumption in L/100 km, and I decided to go further and have it also calculate the price per 100 km, but here's the ...

VXV

1

asked Feb 12 at 22:26

1 vote

0 answers

31 views

Passing CSRF token through Dart html parsing

I'm making an app where students can log in to their portal website and see their data. However, I'm having trouble authenticating users. When I did this project on another website I used ...

abtlb

11

asked Feb 12 at 9:54

1 vote

2 answers

92 views

Extracting text from Wikisource using BeautifulSoup returns empty result

I'm trying to extract the text of a book from a Wikisource page using BeautifulSoup, but the result is always empty. The page I'm working on is Le Père Goriot by Balzac. Here's the code I'm using: ...

Hugo Durif

13

asked Jan 30 at 21:31

-1 votes

2 answers

84 views

Parser on python returns an empty list (i guess its an HTML class selection issue)

The idea is: I want to collect the name of each flat and its price as a list for every flat on the website. I've made a simple parser in Python, but it looks like I can't get any values, since it returns an ...

Danny Mxxre

1

asked Jan 18 at 16:45

1 vote

1 answer

150 views

How can I scrape a table from baseball reference using pandas and beautiful soup? [duplicate]

I am trying to scrape the pitching stats on this url and then save the dataframe to a csv file. https://www.baseball-reference.com/boxes/ARI/ARI202204070.shtml My current code is below (Python 3.9.7) ...

Preston Albury

13

asked Jan 14 at 6:11

0 votes

1 answer

52 views

Duplicate extra data when webscraping fbref.com

I am trying to webscrape the league table for the EPL, but when I do that I am getting duplicate links as well as links to the teams that are not even in the premier league which makes no sense. Here ...

Vignesh

27

asked Dec 26, 2024 at 22:39

-1 votes

2 answers

117 views

I'm trying to scrape the website payscale.com to get some data there using BeautifulSoup, but I can't manage to get it no matter what I do [closed]

Here are my codes: `import pandas as pd import requests from bs4 import BeautifulSoup url = "https://www.payscale.com/college-salary-report/majors-that-pay-you-back/bachelors/" response = ...

Dave

33

asked Dec 25, 2024 at 16:43

1 vote

1 answer

168 views

How do I get to the root directory in C++?

I am building a web-server. I am trying to build a function handler that parses the index.html file in the root directory. It works but when I go to the website on my localhost 127.0.0.1:8080 I get ...

Codemon

11

asked Dec 21, 2024 at 17:49

1 vote

1 answer

65 views

How do I properly import preformatted text from a web site into Excel and it still look like preformatted text?

Where I work uses the Fire Weather Forecast product from the National Weather Service to produce a product for fire management officers that has the fire weather specific to their area. We have been ...

Giric Red Wolf

13

asked Dec 19, 2024 at 20:01

0 votes

1 answer

31 views

Code will not scroll down playlist to parse song names

Using beautifulsoup and selenium in python, I am trying to scroll down a list of songs in a playlist to parse the song names. The code however will not get past the first 30 songs and scroll down ...

BouckleyBoy

11

asked Dec 9, 2024 at 4:57

1 vote

1 answer

116 views

Internal Error while using angular compiler to parse html

I am creating an Angular schematics project to propose suggestions to my Angular project. I am trying to use the built-in Angular compiler to parse the code because libraries such as parse5 and ...

Jonathan

461

asked Dec 1, 2024 at 23:58

-7 votes

1 answer

124 views

Replace the querystring of an href declaration in an <a> tag

I want to replace the following hyperlinks dynamically from <a href="/xsearch2?q=some search/21">21</a> to <a href="/xsearch2?q=some search&page=21">21</a&...

KTH Clips

1

asked Dec 1, 2024 at 2:40

0 votes

0 answers

60 views

Python regular expression not working as expected [duplicate]

I am having trouble parsing an HTML page on Wikipedia. I want to get all text between two headings. I can get all text in the HTML wiki separated by newlines by executing the following in Python: ...

MattJ

149

asked Nov 21, 2024 at 18:41

-1 votes

1 answer

30 views

Divs not being detected with BeautifulSoup

I am trying to parse https://rateyourmusic.com/release/album/tyler-the-creator/igor/reviews/1/ I can access the divs that have class_=review_body if I download the html files locally on to my system. ...

Nate

1

asked Nov 21, 2024 at 7:04

1 vote

1 answer

93 views

Perl's HTML::TableExtract does not see all the tables on Pro Football Reference pages

I am trying to extract data from an HTML table with Perl, using HTML::TableExtract. Specifically I am trying to grab some rushing stats for the 2024 Baltimore Ravens from Pro Football Reference. The ...

JimZipCode

111

asked Sep 27, 2024 at 3:59

0 votes

0 answers

78 views

jsoup converting '&' to '&amp' when I set the Element

I am trying to parse an html input using jsoup (v1.18.1), extract elements, extract each attribute value and replace as follows: > with &gt < with &lt The method I'm feeding this code ...

Pallavi

1

asked Sep 12, 2024 at 20:10

1 vote

0 answers

56 views

How to extract URL from HTML response in Android?

I have @GET endpoint in response to which I receive an html code with URLs. I need to reach the URL that comes after 200 code. Do you have any idea how to do it in Android? I already tried to use the ...

Alex20280

385

asked Sep 2, 2024 at 18:29

0 votes

0 answers

24 views

PHP Simple HTML DOM Parser not Returning Anything [duplicate]

I'm trying to use the PHP Simple HTML DOM Parser for the first time from here - https://simplehtmldom.sourceforge.io/docs/1.9/index.html Unfortunately, I'm having an issue where it's not returning ...

Lewis Hardisty

121

asked Aug 29, 2024 at 18:45

1 vote

1 answer

2k views

Getting a HTTP Response with nodriver

I'm using nodriver, which doesn't directly support network methods. But it does support several CDP objects (network: https://ultrafunkamsterdam.github.io/nodriver/nodriver/cdp/network.html) and ...

Aca

67

asked Aug 29, 2024 at 13:47

-2 votes

1 answer

54 views

JavaScript for bookmarklet data extraction from an html monthly calendar schedule

I have a bookmarklet and JavaScript with which I am extracting data from an html table from a website. For the most part the script works fine however it parses the date wrong. The date, in the HTML ...

SystemWorks

181

asked Aug 28, 2024 at 8:47

0 votes

1 answer

73 views

Parse WhatsApp message read status [closed]

My question is more about HTML layout and parsing of dynamic content. My task: parse the contacts who read a particular message of mine in the group. I tried to inspect the DOM structure for the DIV block that holds that ...

Jeffrey Rasmussen

393

asked Aug 13, 2024 at 17:22

0 votes

1 answer

62 views

Xml Parse code working fine at my end, does not work at client region over same Html

I have written Apps Script code for Html Parsing using XmlParse. It works fine at my end, my browser and system language both are English as well as my Google Account's. But when I shared the same ...

Amna Irfan

3

asked Aug 10, 2024 at 18:07

0 votes

1 answer

36 views

How to get the page content of the link in right-click context menu

Hello everyone. I've never done any JS coding before, but I needed a certain extension that I couldn't find in the shop, so I decided to make my own. Here is the logic: when you right-click the link you get ...

aleksds1

3

asked Aug 5, 2024 at 7:20

2 votes

1 answer

133 views

How to handle self-closing tags without end-slash in html.parser.HTMLParser

By default it seems that html.parser.HTMLParser cannot handle self closing tags correctly, if they are not terminated using /. E.g. it handles <img src="asfd"/> fine, but it ...

flawr

11.7k

asked Aug 4, 2024 at 12:41
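
As a rough illustration of the html.parser.HTMLParser question above: the parser reports `<img .../>` through handle_startendtag, but `<img ...>` only through handle_starttag. One possible workaround, sketched below, is to keep a set of void element names and treat a bare start tag for any of them as self-closing. The class name VoidAwareParser, the events list, and the choice to ignore stray end tags for void elements are all assumptions made for this example, not part of the question or its answer.

```python
from html.parser import HTMLParser

# Void elements as listed in the HTML Living Standard.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class VoidAwareParser(HTMLParser):
    """Records tag events, treating void elements as self-closing (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called for <img ...> (no trailing slash); reclassify void elements.
        kind = "startend" if tag in VOID_ELEMENTS else "start"
        self.events.append((kind, tag))

    def handle_startendtag(self, tag, attrs):
        # Called for <img .../>; overriding it stops the default
        # fallback to handle_starttag + handle_endtag.
        self.events.append(("startend", tag))

    def handle_endtag(self, tag):
        # Assumption: ignore stray end tags such as </br> for void elements.
        if tag not in VOID_ELEMENTS:
            self.events.append(("end", tag))


parser = VoidAwareParser()
parser.feed('<p><img src="asfd"><img src="asfd"/></p>')
parser.close()
print(parser.events)
```

With this sketch both spellings of the img tag produce the same kind of event, so downstream code that tracks nesting depth does not wait for an end tag that never arrives.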

4 votes

1 answer

148 views

Treatment of superfluous closing tags depends on tag name

Unlike XHTML, HTML does not allow separate closing tags for empty-content elements like br and hr. The HTML validator gives an error end tag for element "..." which is not open in such ...

Heiko Theißen

18.2k

asked Jul 19, 2024 at 13:23

-1 votes

2 answers

36 views

code not running when webscraping weather data

I am trying to scrape earthquake weather data from USGS and my code runs up to the print(soup) line but nothing after that import requests from bs4 import BeautifulSoup url="https://earthquake....

Lumko Mtengwane

1

asked Jul 18, 2024 at 16:44

1 vote

1 answer

40 views

My Beautiful Soup library is not extracting out the all the anchor elements from a listed display

Hi, I am very new to web scraping and am trying out the basics. I wanted to extract links from a root website (coventry.gov.uk). The problem, however, was that I could not get the ...

Gs can't

23

asked Jul 7, 2024 at 18:02

0 votes

1 answer

60 views

Populating Spreadsheet(s) from email html table

I am not a programmer but I've been digging through the weeds to figure something out on my own and I'm stuck. I have a google spreadsheet with multiple sheets that I need to populate with content ...

notobella designs

1

asked Jul 5, 2024 at 15:56

0 votes

1 answer

112 views

Selenium doesn't find the popup button

I have this webpage (https://goldapple.ru/) on which I want to parse some data about cosmetics. However, when I open the webpage, the popup button appears, and I want to click the left "Да, верно&...

Alexei Rozhenko

1

asked Jun 30, 2024 at 16:36

1 vote

0 answers

57 views

Error: peg$SyntaxError: Expected Character but "&" Found While Parsing SVG Path Data in JavaScript

I am working with an SVG file and converting it to JSON using svgson library. Additionally, I am using the svg-path-to-polygons library to decode the d attribute in the path element. However, I am ...

HEMAL

430

asked Jun 26, 2024 at 9:53

1 vote

1 answer

65 views

How to return or parse chart data properly in React from google apps script?

In my implementation of adding charts to a react frontend, from gsheets, using an apps script backend, there seems to be some sort of an issue where my constructed base64 png string fails to be parsed ...

mayank

378

asked Jun 14, 2024 at 11:30

0 votes

0 answers

538 views

"unstructured" and langchain's "HTMLHeaderTextSplitter" ignores "pre" and/or "code" HTML tags

I want to read a webpage and split it into chunks to feed a vector database in a RAG pipeline. This webpage has python code examples on it, but I cannot create chunks with that code text, it is ...

Abraham Martín Expósito

29

asked May 29, 2024 at 11:02

-1 votes

1 answer

110 views

JavaScript/HTML beautifier - remove newlines around `html` tags

I'm using js-beautify to beautify my HTML like this: import { html_beautify } from 'js-beautify'; // later in the component html_beautify(localHtmlContent, {indent_size: 2}); which makes my html go ...

Filip Savic

3,363

asked May 28, 2024 at 8:32

0 votes

1 answer

312 views

Get Errors on HTML Content Using Jsoup for Java

I am building an application that receives HTML content as strings. I need to verify that these HTML strings are well-formed, meaning I want to parse them and detect lines with errors. During my ...

Juan Rojas

75

asked May 26, 2024 at 0:44

0 votes

1 answer

164 views

I'm having trouble scraping a table from Baseball Reference

As the title suggests, I'm having trouble scraping a table from Baseball Reference. I want to scrape the first 2 tables from here. To be clear, the ones titled "Team Standard Batting" and &...

pcm1113

1

asked May 21, 2024 at 2:50

-1 votes

1 answer

643 views

Web scraping 2nd table player stats from fbref.com [duplicate]

Was hoping for help here. I'm trying to web scrape this second table of player goal and shot creation stats on FB Ref for the MLS, but my script is bringing in the first table of team statistics ...

user15039720

1

asked May 13, 2024 at 23:22

0 votes

1 answer

134 views

How does HTML parser interact with Speculative parser

The WHATWG spec describes the concept of speculative HTML parsing, and many places in the spec use the term active speculative parser. The spec says that an HTML parser that owns an instance of ...

MaximPro

556

asked May 12, 2024 at 4:32

1 vote

1 answer

443 views

How to create vnodes from a string with html tags in vue 3

Research So I've found this answer on how to create a vnode list from a simple SVG with one path layer and how to transform that in Vue2. I could not find any good solutions for Vue 3, so I scaffolded ...

Nebulosar

1,897

asked Apr 15, 2024 at 15:12

0 votes

1 answer

41 views

Regex to parse comment blocks & parse their contents

I want a regex that will look at a string like this, get the "card" value from each of these comment blocks and also, TRUE if there is a "disabled":true or "hide":true ...

user18102663

57

asked Apr 11, 2024 at 23:16

1 vote

1 answer

247 views

Problem: How to scrape dynamically loaded data table in Python?

Python novice here. I have been learning how to scrape from various baseball sites (Fangraphs, Statcast, Rotowire). I have had success with a few different methods, but the Park Factors table on ...

gredow1979

11

asked Apr 10, 2024 at 20:58

0 votes

1 answer

1k views

Can we style html tags with flutter_widget_from_html package

I am getting html code and showing like below image using flutter_widget_from_html package. But now I need to style like it on the website. I tried to find a guide to do this but had no luck. I just ...

Kavinda Lochana

125

asked Mar 29, 2024 at 17:14
