
I am trying to extract the information from a link on a page that is structured like this:

...

<td align="left" bgcolor="#FFFFFF">$725,000</td>

<td align="left" bgcolor="#FFFFFF"> Available</td>

*<td align="left" bgcolor="#FFFFFF">
    <a href="/washington">


 Washington Street Studios
<br>1410 Washington Street SW<br>Albany, Oregon, 97321
</a>
</td>*

<td align="center" bgcolor="#FFFFFF">15</td>

<td align="center" bgcolor="#FFFFFF">8.49%</td>

<td align="center" bgcolor="#FFFFFF">$48,333</td>

</tr>

I tried targeting elements with the attribute align="left" and iterating over them, but that didn't work out. If anybody could help me locate elements like <a href="/washington"> (there are multiple tags like this within the same page) with Selenium, I would appreciate it.
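For reference, this is roughly what I tried (driver is my WebDriver instance; the XPath is just my guess at targeting those cells):

for cell in driver.find_elements_by_xpath("//td[@align='left']"):
    print(cell.text)   # this prints the price/status cells too, not just the cells with links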

  • Can you post more tr rows so we can get a clear picture of where the desired links are located? Thanks. Commented Sep 10, 2015 at 12:04

2 Answers


I would use lxml instead, if it is just to process HTML...

It would help if you were more specific, but if you are just traversing the links in a webpage, you can try this:

from lxml.html import parse

pdoc = parse(url_of_webpage)   # parse the page straight from its URL
doc = pdoc.getroot()
list_of_links = [i[2] for i in doc.iterlinks()]

list_of_links will look like ['/en/images/logo_com.gif', 'http://www.brand.com/', '/en/images/logo.gif']

doc.iterlinks() looks for every link in the document (form, img and a tags, among others) and yields a tuple for each one containing the Element object, the attribute the link came from, the URL itself and its position, so the line

list_of_links = [i[2] for i in doc.iterlinks()]

simply grabs the URL from each tuple and returns them as a separate list.
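
If you only care about the a tags (which seems to be your case), you can also filter on the element itself; a rough sketch:

# iterlinks() also yields links from <img>, <form>, <link> etc.,
# so keep only the URLs that come from the href attribute of <a> tags
anchor_urls = [link for el, attr, link, pos in doc.iterlinks()
               if el.tag == 'a' and attr == 'href']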

Note that the retrieved URLs can be relative, i.e. you will see URLs like

'/en/images/logo_com.gif'

instead of

'http://somedomain.com/en/images/logo_com.gif'

If you want the latter kind of URL, add one more line to the code:

from lxml.html import parse

pdoc = parse(url_of_webpage)
doc = pdoc.getroot()
doc.make_links_absolute()     # add this line
list_of_links = [i[2] for i in doc.iterlinks()]

If you are processing the URLs one by one, then simply modify the code to something like

for i in doc.iterlinks():
    url = i[2]
    # some processing here with url...

Finally, if for some reason you need Selenium to come in and fetch the webpage content, then simply replace the parsing step at the beginning with the following:

from selenium import webdriver
from StringIO import StringIO   # Python 2; on Python 3 use: from io import StringIO

browser = webdriver.Firefox()
browser.get(url)
doc = parse(StringIO(browser.page_source)).getroot()
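
One caveat, as far as I can tell: when you parse from a string like this, the document has no base URL, so if you also want absolute links here you have to pass the address in yourself, e.g.

doc.make_links_absolute(browser.current_url)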

From what you have provided so far, there is a table and the desired links are in a specific column. There are no "data-oriented" attributes to rely on, but using the column index to locate the links looks good enough:

for row in driver.find_elements_by_css_selector("table#myid tr"):
    cells = row.find_elements_by_tag_name("td")

    print(cells[2].text)  # put a correct index here
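
If what you actually need is the link itself rather than the cell text, you can (assuming the anchor is the only a tag inside that cell) get it with something like:

link = cells[2].find_element_by_tag_name("a")   # same column index as above
print(link.get_attribute("href"))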

