I'm building a web crawler using Python 3.8. What I want to do is turn the table below into a pandas DataFrame using Selenium, pandas, and bs4.
<table class="no_border_top" width="95%" align="center">
<tbody>
<tr>
<th align="left" valign="top" rowspan="100" width="8%">1.2.12</th>
<th align="left" colspan="4">Outras cotas de Fundos de Investimento</th>
</tr>
<tr>
<td width="40%"><b>Fundo</b></td>
<td width="22%"><b>CNPJ</b></td>
<td width="15%"><b>Quantidade</b></td>
</tr>
<tr>
<td>Itaú Soberano RF Simples LP FICFI</td>
<td>06.175.696/0001-73</td>
<td>247.719,87</td>
<td>11.996.245,91</td>
</tr>
<tr>
<td>Itaú TOP RF Referenciado DI FICFI</td>
<td>05.902.521/0001-58</td>
<td>77.085,90</td>
<td>372.686,27</td>
</tr>
</tbody>
</table>
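To make the goal concrete, here is a minimal sketch of the parsing step applied to the snippet above. It is only an illustration: it uses the standard library's `html.parser` so it runs without a browser, and the names `TableRows` and `to_float` are hypothetical helpers, not part of Selenium, pandas, or bs4.

```python
from html.parser import HTMLParser

# The table from the question, pasted as a string (no network access needed).
TABLE_HTML = """
<table class="no_border_top" width="95%" align="center">
<tbody>
<tr>
<th align="left" valign="top" rowspan="100" width="8%">1.2.12</th>
<th align="left" colspan="4">Outras cotas de Fundos de Investimento</th>
</tr>
<tr>
<td width="40%"><b>Fundo</b></td>
<td width="22%"><b>CNPJ</b></td>
<td width="15%"><b>Quantidade</b></td>
</tr>
<tr>
<td>Itaú Soberano RF Simples LP FICFI</td>
<td>06.175.696/0001-73</td>
<td>247.719,87</td>
<td>11.996.245,91</td>
</tr>
<tr>
<td>Itaú TOP RF Referenciado DI FICFI</td>
<td>05.902.521/0001-58</td>
<td>77.085,90</td><td>372.686,27</td>
</tr>
</tbody>
</table>
"""


class TableRows(HTMLParser):
    """Hypothetical helper: collect the text of each <td>/<th>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def to_float(text):
    """Convert a Brazilian-formatted number such as '247.719,87' to a float."""
    return float(text.replace(".", "").replace(",", "."))


parser = TableRows()
parser.feed(TABLE_HTML)
rows = parser.rows  # title row, header row, then one list per fund
```

The data rows (`rows[2:]`) could then be handed to `pandas.DataFrame`. Note that the header row in the snippet names only three columns while each data row has four cells, so the fourth column name would have to be chosen by hand.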
Btw, the link I'm trying to scrape data from is the `url` used in the code below.
The Problem
If you were able to open the link (if it doesn't work, I'll edit this question to add some screenshots of the page's HTML), you'll see that the table I'm interested in is one of many tables inside an HTML document that is nested inside the HTML of the page I'm scraping.
All the tables inside this nested HTML have the same class: they are all <table class="no_border_top" width="95%" align="center">...</table>. Selenium provides tools to locate elements by id, class, XPath, and so on.
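Since the class is identical for every table, one way to tell them apart is by the heading text in the `<th>` cell rather than by class or position. The sketch below illustrates that idea offline with `xml.etree.ElementTree`, assuming the markup is well-formed enough to parse as XML (real pages usually need bs4 or lxml). The first table in `PAGE` and the helper `find_table_by_heading` are made up for the example, and the XPath string is untested against the live page.

```python
import xml.etree.ElementTree as ET

# Two tables with the same class; the first one is a hypothetical stand-in
# for the other tables on the page.
PAGE = """<body>
<table class="no_border_top" width="95%" align="center">
<tbody><tr><th>0.0.0</th><th>Hypothetical other section</th></tr></tbody>
</table>
<table class="no_border_top" width="95%" align="center">
<tbody>
<tr><th>1.2.12</th><th>Outras cotas de Fundos de Investimento</th></tr>
<tr><td>Itaú TOP RF Referenciado DI FICFI</td><td>05.902.521/0001-58</td></tr>
</tbody>
</table>
</body>"""

HEADING = "Outras cotas de Fundos de Investimento"

# An XPath expressing the same selection, which could in principle be passed
# to find_element_by_xpath (assumption: not verified on the real page).
XPATH = "//table[@class='no_border_top'][.//th[contains(., '%s')]]" % HEADING


def find_table_by_heading(root, heading):
    """Hypothetical helper: first 'no_border_top' table with a <th> containing heading."""
    for table in root.iter("table"):
        if table.get("class") != "no_border_top":
            continue
        for th in table.iter("th"):
            if heading in "".join(th.itertext()):
                return table
    return None


root = ET.fromstring(PAGE)
matched = find_table_by_heading(root, HEADING)
heading_cells = ["".join(th.itertext()) for th in matched.iter("th")]
```

If the nested document turns out to be an iframe, Selenium would also need `browser.switch_to.frame(...)` before any locator can see the tables inside it; that is an assumption about the page, not something confirmed here.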
I've tried to grab this table using find_element_by_xpath() (passing the full XPath copied from Chrome DevTools) and find_element_by_partial_link_text(), but neither of them worked. This is the code I'm using:
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options


def make_selenium_browser():
    # Start Firefox in headless mode
    options = Options()
    options.headless = True
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    browser = Firefox(options=options)
    return browser


url = 'https://fnet.bmfbovespa.com.br/fnet/publico/visualizarDocumento?id=111845&cvm=true'
browser = make_selenium_browser()
browser.get(url)

# not working: full XPath copied from Chrome DevTools
data = browser.find_element_by_xpath('/html/body/table[28]')

# this is also not working: it raises the NoSuchElementException shown below
browser.find_element_by_partial_link_text('Outras cotas de Fundos de Investimento')
The error
  File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 462, in find_element_by_partial_link_text
    return self.find_element(by=By.PARTIAL_LINK_TEXT, value=link_text)
  File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 976, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: Outras cotas de Fundos de Investimento
How can I make Selenium find the table I'm interested in?