UPDATE: Using lxml instead of html.parser helped solve the problem, as Freddier suggested in the answer below!

I am trying to scrape some information from this page: https://www.ticketmonster.co.kr/deal/952393926.

I get an error when I run soup(thispage, 'html.parser'), but the error only happens for this specific page. Does anyone know why this is happening?

The code I have so far is very simple:

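from urllib.request import urlopen
from bs4 import BeautifulSoup as soup

url = 'https://www.ticketmonster.co.kr/deal/952393926'

openU = urlopen(url)
thispage = openU.read()
openU.close()  # close the response object (not the built-in open)

pageS = soup(thispage, 'html.parser')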

The error I get is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Kathy\AppData\Local\Programs\Python\Python36\lib\site-packages\bs4\__init__.py", line 228, in __init__
    self._feed()
  File "C:\Users\Kathy\AppData\Local\Programs\Python\Python36\lib\site-packages\bs4\__init__.py", line 289, in _feed
    self.builder.feed(self.markup)
  File "C:\Users\Kathy\AppData\Local\Programs\Python\Python36\lib\site-packages\bs4\builder\_htmlparser.py", line 215, in feed
    parser.feed(markup)
  File "C:\Users\Kathy\AppData\Local\Programs\Python\Python36\lib\html\parser.py", line 111, in feed
    self.goahead(0)
  File "C:\Users\Kathy\AppData\Local\Programs\Python\Python36\lib\html\parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "C:\Users\Kathy\AppData\Local\Programs\Python\Python36\lib\html\parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "C:\Users\Kathy\AppData\Local\Programs\Python\Python36\lib\_markupbase.py", line 149, in parse_marked_section
    sectName, j = self._scan_name( i+3, i )
  File "C:\Users\Kathy\AppData\Local\Programs\Python\Python36\lib\_markupbase.py", line 391, in _scan_name
    % rawdata[declstartpos:declstartpos+20])
  File "C:\Users\Kathy\AppData\Local\Programs\Python\Python36\lib\_markupbase.py", line 34, in error
    "subclasses of ParserBase must override error()")
NotImplementedError: subclasses of ParserBase must override error()

Please help!

1 Answer

Try using

pageS = soup(thispage, 'lxml')

instead of

pageS = soup(thispage, 'html.parser')

It looks like it may be a character-encoding problem when using "html.parser".
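If switching parsers by hand is inconvenient, the same idea can be sketched as a small fallback helper. The traceback in the question ends in parse_marked_section, which suggests html.parser gave up on a declaration in the page's markup, so trying another parser is a reasonable workaround. The helper name parse_with_fallback and its factory parameter are hypothetical, introduced only for this illustration; it assumes bs4 is installed, and lxml as well if 'lxml' is listed.

```python
def parse_with_fallback(markup, parsers=("lxml", "html.parser"), factory=None):
    """Return (tree, parser_name) from the first parser that accepts the markup.

    `factory` is a hypothetical hook: any callable taking (markup, parser_name).
    It defaults to bs4.BeautifulSoup.
    """
    if factory is None:
        # Imported lazily so the helper can also be driven by a stand-in factory.
        from bs4 import BeautifulSoup
        factory = BeautifulSoup

    errors = []
    for name in parsers:
        try:
            return factory(markup, name), name
        except Exception as exc:
            # Deliberately broad: a parser may fail with NotImplementedError
            # (as in the traceback above) or bs4.FeatureNotFound if it is
            # not installed. Record the failure and try the next parser.
            errors.append(f"{name}: {exc!r}")

    raise ValueError("no parser could handle the markup: " + "; ".join(errors))
```

With the question's variables, this would be called as pageS, used = parse_with_fallback(thispage), where used reports which parser succeeded. Note that different parsers can build slightly different trees from broken HTML, so it is worth checking which one was used.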

3 Comments

Please do not post images of code or data. Copy it from your editor/IDE and paste it as text formatted as code.
"As you are in Python3, is preferable to use mechanicalsoup" How do you figure that? bs4 is widely used and I've never even heard of mechanicalsoup.
Sorry @wwii, I wanted to show the result of the code. I'll edit the answer to add the code.
