I'm building a web crawler using Python 3.8. I want to turn the table below into a pandas DataFrame using Selenium, pandas, and bs4.

<table class="no_border_top" width="95%" align="center">
    <tbody>
        <tr>
            <th align="left" valign="top" rowspan="100" width="8%">1.2.12</th>
            <th align="left" colspan="4">Outras cotas de Fundos de Investimento</th>          
        </tr>
        <tr>
            <td width="40%"><b>Fundo</b></td>
            <td width="22%"><b>CNPJ</b></td>
            <td width="15%"><b>Quantidade</b></td>           
       </tr>
        <tr>
            <td>Itaú Soberano RF Simples LP FICFI</td>
            <td>06.175.696/0001-73</td>
            <td>247.719,87</td>
            <td>11.996.245,91</td>            
        </tr>
        <tr>
            <td>Itaú TOP RF Referenciado DI FICFI</td>
            <td>05.902.521/0001-58</td>
            <td>77.085,90</td><td>372.686,27</td>
        </tr>
    </tbody>
</table>

Btw, this is the link from which I'm trying to scrape data:

The Problem

If you can open the link (if not, I'll edit this question and post some screenshots of the page's HTML), you'll see that the table I'm interested in is one of many tables in an HTML document nested inside the page I'm scraping.

All the tables inside this nested HTML have the same class; they are all <table class="no_border_top" width="95%" align="center">...</table>. Selenium provides tools to find elements by id, class, XPath, and so on.

I've tried to grab this table using find_element_by_xpath() (passing the full XPath copied from Chrome DevTools) and find_element_by_partial_link_text(), but neither worked. This is the code I'm using:

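from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options

def make_selenium_browser():
    # Configure a headless Firefox session.
    options = Options()
    options.headless = True
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')

    browser = Firefox(options=options)

    return browser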

url = 'https://fnet.bmfbovespa.com.br/fnet/publico/visualizarDocumento?id=111845&cvm=true'
browser = make_selenium_browser()
browser.get(url)

#not working
data = browser.find_element_by_xpath('/html/body/table[28]')

#this is also not working
browser.find_element_by_partial_link_text('Outras cotas de Fundos de Investimento')

The error

 File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 462, in find_element_by_partial_link_text
    return self.find_element(by=By.PARTIAL_LINK_TEXT, value=link_text)
  File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 976, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: Outras cotas de Fundos de Investimento

How can I make Selenium find the table I'm interested in?

1 Answer

Your target element is inside a frame:

<iframe src="exibirDocumento?id=111845&amp;cvm=true&amp;#toolbar=0" style="width: 100%; height: 800px">
  #document
    <html>
    ...
    ...
    ...
      <table class="no_border_top" width="95%" align="center">
        ...
        ...

Selenium only searches the document it is currently focused on, so you need to switch to the iframe first using browser.switch_to.frame(iframe_reference), like this:

url = 'https://fnet.bmfbovespa.com.br/fnet/publico/visualizarDocumento?id=111845&cvm=true'
browser = make_selenium_browser()
browser.get(url)
browser.switch_to.frame(browser.find_element_by_css_selector('iframe[src*=exibirDocumento]'))
data = browser.find_element_by_xpath('/html/body/table[28]')
print(data.text)

You can also locate the table by the text it contains, using the following XPath:

data = browser.find_element_by_xpath('//table[contains(., "Outras cotas de Fundos de Investimento")]')
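
Once the table is located, its markup still has to be turned into rows before it can become a DataFrame. Below is a minimal sketch of that step, assuming you already have the table's HTML as a string (for example from data.get_attribute('outerHTML')). It uses only the standard library's html.parser and the sample markup from the question; TableRowParser and table_rows are hypothetical helper names, not part of Selenium or pandas.

```python
from html.parser import HTMLParser

# Sample markup copied from the question.
SAMPLE_HTML = """
<table class="no_border_top" width="95%" align="center">
    <tbody>
        <tr>
            <th align="left" valign="top" rowspan="100" width="8%">1.2.12</th>
            <th align="left" colspan="4">Outras cotas de Fundos de Investimento</th>
        </tr>
        <tr>
            <td width="40%"><b>Fundo</b></td>
            <td width="22%"><b>CNPJ</b></td>
            <td width="15%"><b>Quantidade</b></td>
        </tr>
        <tr>
            <td>Itaú Soberano RF Simples LP FICFI</td>
            <td>06.175.696/0001-73</td>
            <td>247.719,87</td>
            <td>11.996.245,91</td>
        </tr>
        <tr>
            <td>Itaú TOP RF Referenciado DI FICFI</td>
            <td>05.902.521/0001-58</td>
            <td>77.085,90</td><td>372.686,27</td>
        </tr>
    </tbody>
</table>
"""


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text fragments of the cell being read, None outside <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows made only of <th> cells end up empty and are skipped
                self.rows.append(self._row)
            self._row = None


def table_rows(html):
    """Return the <td> rows of the given table markup as lists of strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = table_rows(SAMPLE_HTML)
header, body = rows[0], rows[1:]
print(header)
for row in body:
    print(row)
```

Note that in this sample the header row has three labels while each data row has four cells, so you would need to supply a name for the fourth column yourself before building the frame, for instance by passing body to pandas.DataFrame and assigning the column names afterwards.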

2 Comments

Thank you so much. I did not know that you could search for patterns inside XPath. Where can I learn more about that?
Hi @Occhima, glad it worked. One reference you can look at is w3schools.com/xml/xpath_axes.asp
