I am extracting a table from my customer website and I need to parse this HTML into a Pandas dataframe. However, on the table I want to store all the HREFs into my dataframe. My HTML has the following schema:
<table>
<tr>
<th>Col_1</th>
<th>Col_2</th>
<th>Col_3</th>
<th>Col_4</th>
<th>Col_5</th>
<th>Col_6</th>
<th>Col_7</th>
<th>Col_8</th>
<th>Col_9</th>
</tr>
<tr>
<td>Office</td>
<td>Office2</td>
<td>Customer</td>
<td></td>
<td><a href="test12345_163">New Doc</a><br><a href="test12345_163">my_work.jpg</a></td>
<td><a href="test12345_163">Person_2</a><br><a href="test12345_163">Person_3</a><br><a href="test12345_163">Person 3</a></td>
<td><a href="test12345_163">Person_1</a><br><a href="test12345_163">Person_1</a><br><a href="test12345_163">Person_1</a><br><a href="test12345_163">Person_1</a></td>
<td>STATUS</td>
<td>9030303</td>
</tr>
</table>
I have this code:
soup = BeautifulSoup(page.content, "html.parser")
html_table = soup.find('table')
df = pd.read_html(str(html_table), header=0)[0]
df['Link'] = [link.get('href') for link in html_table.find_all('a')]
I am just trying to create a column with all the links from each index (if has more than one, then group it). But when I run this code I got:
Length of values (1102) does not match length of index (435)
What I am doing wrong?
Thanks!

