I'm new to HTML and Beautiful Soup. I am trying to read a locally saved HTML file in Python, and I tested the following code:
from bs4 import BeautifulSoup

with open(file_path) as fp:
    soup = BeautifulSoup(fp)
print(soup)
The output looks weird. Here is part of it:
<html><body><p>ÿþh t m l >
h e a d >
m e t a h t t p - e q u i v = C o n t e n t - T y p e c o n t e n t = " t e x t / h t m l ; c h a r s e t = u n i c o d e " >
m e t a n a m e = G e n e r a t o r c o n t e n t = " M i c r o s o f t W o r d 1 5 ( f i l t e r e d ) " >
s t y l e >
! - -
/ * F o n t D e f i n i t i o n s * /
The original HTML looks something like this:
<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=unicode">
<meta name=Generator content="Microsoft Word 15 (filtered)">
<style>
<!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
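One thing I noticed: the output starts with ÿþ, which as far as I can tell is how the bytes 0xFF 0xFE (the UTF-16 little-endian byte order mark) show up when read as Latin-1, and the space between every character would fit a file saved as UTF-16. So my guess is that the file is not in the encoding Python assumes by default. Below is a minimal sketch of what I am thinking of trying, assuming that guess is right. The helper names detect_bom_encoding and read_html_text are my own, not part of any library.

```python
import codecs


def detect_bom_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Guess a codec name from a leading byte order mark (BOM), if any."""
    # UTF-32 is checked first because its little-endian BOM begins with
    # the same two bytes as the UTF-16 little-endian BOM.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # No BOM found: fall back to the caller's default (an assumption).
    return default


def read_html_text(path: str) -> str:
    """Read a file as bytes and decode it using the BOM-based guess."""
    with open(path, "rb") as fp:
        raw = fp.read()
    # The "utf-16", "utf-32" and "utf-8-sig" codecs consume the BOM.
    return raw.decode(detect_bom_encoding(raw))
```

The decoded text could then be passed on as BeautifulSoup(read_html_text(file_path), "html.parser"). If the encoding is known in advance, I believe open(file_path, encoding="utf-16") would be the shorter route.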
Can anyone help me or share some thoughts?
Thank you!