Finding elements that have the same HTML attributes in Selenium

Question

I'm building a web crawler using Python 3.8. I want to turn the table below into a pandas DataFrame using Selenium, pandas, and bs4.

<table class="no_border_top" width="95%" align="center">
    <tbody>
        <tr>
            <th align="left" valign="top" rowspan="100" width="8%">1.2.12</th>
            <th align="left" colspan="4">Outras cotas de Fundos de Investimento</th>          
        </tr>
        <tr>
            <td width="40%"><b>Fundo</b></td>
            <td width="22%"><b>CNPJ</b></td>
            <td width="15%"><b>Quantidade</b></td>
        </tr>
        <tr>
            <td>Itaú Soberano RF Simples LP FICFI</td>
            <td>06.175.696/0001-73</td>
            <td>247.719,87</td>
            <td>11.996.245,91</td>            
        </tr>
        <tr>
            <td>Itaú TOP RF Referenciado DI FICFI</td>
            <td>05.902.521/0001-58</td>
            <td>77.085,90</td>
            <td>372.686,27</td>
        </tr>
    </tbody>
</table>

This is the page I'm trying to scrape:

https://fnet.bmfbovespa.com.br/fnet/publico/visualizarDocumento?id=111845&cvm=true

The Problem

If you open the link (if it doesn't work, I'll edit this question and post screenshots of the page's HTML), you'll see that the table I'm interested in is one of many tables inside an HTML document nested within the page I'm scraping.

All the tables inside this nested HTML have the same class: they are all <table class="no_border_top" width="95%" align="center">...</table>. Selenium provides tools to locate elements by id, class, XPath, and so on.

I've tried to grab this table using find_element_by_xpath() (passing the full XPath copied from Chrome DevTools) and find_element_by_partial_link_text(), but neither worked. This is the code I'm using:

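from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options


def make_selenium_browser():
    # Run Firefox without a visible window.
    options = Options()
    options.headless = True
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')

    browser = Firefox(options=options)

    return browser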

url = 'https://fnet.bmfbovespa.com.br/fnet/publico/visualizarDocumento?id=111845&cvm=true'
browser = make_selenium_browser()
browser.get(url)

# not working
data = browser.find_element_by_xpath('/html/body/table[28]')

# this is also not working
browser.find_element_by_partial_link_text('Outras cotas de Fundos de Investimento')

The error

  File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 462, in find_element_by_partial_link_text
    return self.find_element(by=By.PARTIAL_LINK_TEXT, value=link_text)
  File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 976, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/occhima/miniconda3/envs/wdev/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: Outras cotas de Fundos de Investimento

How can I make Selenium find the table I'm interested in?

frianH · Accepted Answer · 2020-10-22 04:16:42Z


Your target element is inside a frame:

<iframe src="exibirDocumento?id=111845&amp;cvm=true&amp;#toolbar=0" style="width: 100%; height: 800px">
  #document
    <html>
    ...
    ...
    ...
      <table class="no_border_top" width="95%" align="center">
        ...
        ...

You need to switch to it first, using browser.switch_to.frame(iframe_reference), like this:

url = 'https://fnet.bmfbovespa.com.br/fnet/publico/visualizarDocumento?id=111845&cvm=true'
browser = make_selenium_browser()
browser.get(url)
browser.switch_to.frame(browser.find_element_by_css_selector('iframe[src*=exibirDocumento]'))
data = browser.find_element_by_xpath('/html/body/table[28]')
print(data.text)
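
The frame switch above can also be wrapped in a small helper that always returns to the top-level document afterwards. The following is a minimal sketch, not part of the original answer: `inside_frame` and `table_text` are hypothetical names, and the generic `find_element(by, value)` call is used because newer Selenium 4 releases no longer ship the `find_element_by_*` helpers. The strategy strings are the values of Selenium's `By.CSS_SELECTOR` and `By.XPATH`, written out so the sketch needs nothing beyond the standard library.

```python
from contextlib import contextmanager

# Locator strategy strings; these are the values of selenium's
# By.CSS_SELECTOR and By.XPATH constants.
CSS_SELECTOR = "css selector"
XPATH = "xpath"


@contextmanager
def inside_frame(browser, frame_css='iframe[src*="exibirDocumento"]'):
    """Switch into the first iframe matching frame_css, then switch back.

    Hypothetical helper; `browser` is any Selenium WebDriver instance.
    """
    frame = browser.find_element(CSS_SELECTOR, frame_css)
    browser.switch_to.frame(frame)
    try:
        yield browser
    finally:
        # Return to the top-level document even if a lookup fails.
        browser.switch_to.default_content()


def table_text(browser, table_xpath="/html/body/table[28]"):
    """Return the text of a table that lives inside the document iframe."""
    with inside_frame(browser):
        return browser.find_element(XPATH, table_xpath).text
```

With a real driver, `table_text(browser)` would be called after `browser.get(url)`. If the iframe loads slowly, an explicit wait before the lookup may be needed.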

You can also use the following XPath, which selects the table by the text it contains instead of by its position in the document:

data = browser.find_element_by_xpath('//table[contains(., "Outras cotas de Fundos de Investimento")]')
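
Since the goal in the question is a DataFrame, here is a rough sketch of how the text-based XPath could be combined with row extraction. It is an illustration under assumptions, not the answer's method: `table_containing`, `RowCollector`, and `table_rows` are hypothetical names, the browser is assumed to be already switched into the iframe, and the standard library's `html.parser` stands in for bs4 so the example has no third-party imports. Nested tables are not handled.

```python
from html.parser import HTMLParser

XPATH = "xpath"  # same value as selenium's By.XPATH


def table_containing(text):
    """Build an XPath matching any <table> whose text contains `text`."""
    if '"' in text:
        raise ValueError("this sketch does not escape double quotes")
    return f'//table[contains(., "{text}")]'


class RowCollector(HTMLParser):
    """Collect the cell texts of every <tr> into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text pieces of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def table_rows(browser, text):
    """Locate the table by its text and return its rows as lists of strings.

    Assumes the browser has already been switched into the iframe.
    """
    table = browser.find_element(XPATH, table_containing(text))
    parser = RowCollector()
    parser.feed(table.get_attribute("outerHTML"))
    parser.close()
    return parser.rows
```

The resulting rows could then be passed to `pandas.DataFrame`. Note that in the table shown in the question the header row lists three labels while the data rows have four cells, so the column names would have to be chosen by hand.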

edited Oct 22, 2020 at 4:16

answered Oct 22, 2020 at 4:08


2 Comments

Occhima Over a year ago

Thank you so much. I did not know that you could search for patterns inside XPath. Where can I learn more about that?

frianH Over a year ago

Hi @Occhima, glad it worked. One reference you can look at is w3schools.com/xml/xpath_axes.asp
