I am trying to open and convert my HTML file into a CSV so I can use it as a dataframe.

import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'file:///C:/Users/jessi/OneDrive/Documents/posts.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

print(soup)

I got this error: InvalidSchema: No connection adapters were found for 'file://C://Users//jessi//OneDrive//Documents//posts.html'

  • There's no web server here, so you don't need requests at all. Just open the file, read its contents into a string and pass it to BeautifulSoup. Commented Apr 27, 2023 at 0:31
  • The problem is that it's a file URI, not http or https. Commented Apr 27, 2023 at 0:51
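To illustrate the point in the comments above, here is a minimal sketch of turning a file URI back into an ordinary path and reading it directly. It uses only the standard library. The temporary `posts.html` is a hypothetical stand-in so the sketch runs anywhere, not the asker's actual file.

```python
import shutil
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname

# Hypothetical stand-in for posts.html so the sketch runs on any machine.
tmp_dir = tempfile.mkdtemp()
sample = Path(tmp_dir) / "posts.html"
sample.write_text("<p>hello</p>", encoding="utf-8")

uri = sample.as_uri()      # a file:// URI, like the one in the question
parts = urlparse(uri)
scheme = parts.scheme      # 'file', not 'http' or 'https'

# Convert the URI back to a local path and read the file directly.
local_path = Path(url2pathname(parts.path))
content = local_path.read_text(encoding="utf-8")

shutil.rmtree(tmp_dir)     # clean up the temporary directory
```

The resulting `content` string can then be passed to `BeautifulSoup(content, 'html.parser')` with no HTTP request involved.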

1 Answer

There's no server to request here. You have a simple file. Just read it.

from bs4 import BeautifulSoup
import pandas as pd
filename = 'C:/Users/jessi/OneDrive/Documents/posts.html'
# Read the local file directly; the with-block closes it afterwards.
with open(filename) as f:
    soup = BeautifulSoup(f.read(), 'html.parser')
print(soup)

5 Comments

Okay, I understand, but now I have this error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 17024: character maps to <undefined>
That means the file is not in the encoding Python assumed when opening it. You need to know which character set the file uses in order to read it. We can't tell you that without seeing the file. You can specify the character set in the open call.
Try open(filename, "rb") to see if the html parser can read the encoding.
@TimRoberts can I show you on GitHub what I am trying to open? Because I'm doing a project and I am a beginner.
It would have been quicker just to post the link rather than ask permission. It can't be a typical Windows CP1252 file, because 0x8D is not defined there.
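The two suggestions in these comments (naming the encoding in the open call, or opening with "rb") can be sketched as follows. The sample file is hypothetical and is written as UTF-8 purely for illustration; the real file's encoding is unknown, so "utf-8" here is an assumption to test, not a diagnosis.

```python
import os
import tempfile

# Hypothetical sample: "č" (U+010D) is encoded as bytes C4 8D in UTF-8,
# so this file contains the same 0x8d byte named in the error message.
html = "<html><body><p>\u010daj</p></body></html>"
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write(html.encode("utf-8"))
    filename = tmp.name

# Decoding as cp1252 fails, because 0x8d is undefined in that code page.
try:
    with open(filename, encoding="cp1252") as f:
        f.read()
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Option 1: name the encoding explicitly in the open call.
with open(filename, encoding="utf-8") as f:
    text = f.read()

# Option 2: read raw bytes and leave the decoding to the parser.
with open(filename, "rb") as f:
    raw = f.read()

os.remove(filename)  # clean up the temporary file
```

Either result can be handed to the parser: `BeautifulSoup(text, 'html.parser')` for the decoded string, or `BeautifulSoup(raw, 'html.parser')` for the bytes, in which case BeautifulSoup attempts to detect the encoding itself.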
