Scraping each element from each row from an HTML table

Question

Link to website: http://www.tennisabstract.com/cgi-bin/player-classic.cgi?p=RafaelNadal

I am trying to write code that goes through each row in a table and extracts each element from that row. I am aiming for output in the following layout:

Row1Element1, Row1Element2, Row1Element3 
Row2Element1, Row2Element2, Row2Element3
Row3Element1, Row3Element2, Row3Element3

I have had two major attempts at coding this.

Attempt 1:

rows = driver.find_elements_by_xpath('//table//body//tr')
elements = rows.find_elements_by_xpath('//td')
# this gets all rows in the table, but then gets all elements on the page, not just the table

Attempt 2:

driver.find_elements_by_xpath('//table//body//tr//td')
# this gets all the elements that I want, but makes no distinction as to which row each element belongs to

Any help is appreciated

Can you provide a link to the site you're scraping from?

– Nick Reed, Commented Oct 8, 2019 at 2:44
Sure, there you go

– Michael, Commented Oct 8, 2019 at 2:56
There are many tables on the page; which table do you mean?

– frianH, Commented Oct 8, 2019 at 3:31
The large one at the bottom

– Michael, Commented Oct 8, 2019 at 3:39

Sers · Accepted Answer · 2019-10-08 07:10:47Z

1

You can read the table headers and use their indexes to pick the right cells from each row's data.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://www.tennisabstract.com/cgi-bin/player-classic.cgi?p=RafaelNadal")

# The find_elements_by_* helpers were removed in Selenium 4.3; use find_elements(By..., ...) instead.
# Header texts, used to look up column positions by name
table_headers = [th.text.strip() for th in driver.find_elements(By.CSS_SELECTOR, "#matchheader th")]
rows = driver.find_elements(By.CSS_SELECTOR, "#matches tbody > tr")

date_index = table_headers.index("Date")
tournament_index = table_headers.index("Tournament")
score_index = table_headers.index("Score")

for row in rows:
    # Calling find_elements on the row limits the search to that row's cells
    table_data = row.find_elements(By.TAG_NAME, "td")
    print(table_data[date_index].text, table_data[tournament_index].text, table_data[score_index].text)

driver.quit()
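
If the same lookup is needed for more columns, the index logic can be kept separate from the Selenium calls. The following is only a sketch of one way to do that: `pick_columns` and `scrape_selected` are hypothetical helper names (not part of Selenium), and it assumes Selenium 4 and that the page still uses the `#matchheader` and `#matches` ids from the answer above.

```python
def pick_columns(headers, cells, wanted):
    """Return the cell texts for the wanted header names, in the order given.

    headers and cells are plain lists of strings. A row that has fewer cells
    than there are headers yields None for the missing positions.
    """
    picked = []
    for name in wanted:
        idx = headers.index(name)  # raises ValueError if the header is absent
        picked.append(cells[idx] if idx < len(cells) else None)
    return picked


def scrape_selected(driver, wanted):
    """Collect the wanted columns from every row of the matches table."""
    # Imported here so pick_columns stays usable without Selenium installed.
    from selenium.webdriver.common.by import By

    headers = [th.text.strip()
               for th in driver.find_elements(By.CSS_SELECTOR, "#matchheader th")]
    result = []
    for row in driver.find_elements(By.CSS_SELECTOR, "#matches tbody > tr"):
        cells = [td.text.strip() for td in row.find_elements(By.TAG_NAME, "td")]
        result.append(pick_columns(headers, cells, wanted))
    return result
```

With a driver already on the page, `scrape_selected(driver, ["Date", "Tournament", "Score"])` would return one list per row. Returning `None` for short rows is a design choice for this sketch; skipping such rows would work equally well.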

answered Oct 8, 2019 at 7:10


Sers Over a year ago

@Michael Happy to help. If this answer or any other one solved your issue, please mark it as accepted.

frianH · Accepted Answer · 2019-10-08 05:24:57Z

0

This XPath locates each row of the table you mean: //table[@id="matches"]//tbody//tr

First, add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

To get each row:

driver.get('http://www.tennisabstract.com/cgi-bin/player-classic.cgi?p=RafaelNadal')

rows = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//table[@id="matches"]//tbody//tr')))

for row in rows:
    print(row.text)

Or to get each cell:

for row in rows:
    # find_elements_by_tag_name was removed in Selenium 4.3; By is imported above
    cols = row.find_elements(By.TAG_NAME, 'td')
    for col in cols:
        print(col.text)
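
To get the layout asked for in the question (one line per row, cells separated by commas), the cell texts can be joined per row. The sketch below also illustrates why Attempt 1 matched cells outside the table: an XPath that starts with `//` searches the whole document even when it is called on a row element, whereas `.//td` is relative to that row. `format_row` and `print_table` are hypothetical names chosen for illustration, and Selenium 4 is assumed.

```python
def format_row(cell_texts):
    """Join one row's cell texts into 'a, b, c', trimming whitespace."""
    return ", ".join(text.strip() for text in cell_texts)


def print_table(driver):
    """Print every row of the matches table as one comma-separated line."""
    # Imported here so format_row can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    rows = driver.find_elements(By.XPATH, '//table[@id="matches"]//tbody//tr')
    for row in rows:
        # The leading dot keeps the search inside this row; '//td' would
        # match every cell on the page.
        cells = row.find_elements(By.XPATH, ".//td")
        print(format_row(cell.text for cell in cells))
```

Calling `print_table(driver)` after `driver.get(...)` should print one line per table row, assuming the table has already been rendered by then.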

edited Oct 8, 2019 at 5:24

answered Oct 8, 2019 at 5:08

