
I am trying to scrape a table consisting of 45 columns and 7 rows. The table is loaded via AJAX and I can't access the API, so I need to use Selenium in Python. I am close to getting what I want, but I don't know how to turn my Selenium `find_elements` results into a pandas DataFrame. So far, my code looks like this:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time

driver = webdriver.Chrome()
url = "http://www.hctiming.com/myphp/resources/login/browse_results.php?live_action=yes&smartphone_action=no" #a redirect to a login page occurs
driver.get(url)
driver.find_element(By.ID, "open").click()

user = driver.find_element(By.NAME, "username")
password = driver.find_element(By.NAME, "password")
user.clear()
user.send_keys("MyUserNameWhichIWillNotShare")
password.clear()
password.send_keys("MyPasswordWhichIWillNotShare")
driver.find_element(By.NAME, "submit").click()

try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "Results Services")) # I must first click in this line
    )
    element.click()

    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "View Live")) # Then I must click in this link. Now I have access to the result database
    )
    element.click()

except Exception:
    driver.quit()
    raise

time.sleep(5) # I have set a sleep of 5 seconds. There must be a better way to accomplish this; I just want to make sure that the table is loaded before I try to scrape it

columns = len(driver.find_elements(By.XPATH, "/html/body/div[2]/div/form[3]/div[2]/div[1]/div/div/div/div[2]/div[4]/section[1]/div[2]/div/div/table/thead/tr[2]/th"))
rows = len(driver.find_elements(By.XPATH, "/html/body/div[2]/div/form[3]/div[2]/div[1]/div/div/div/div[2]/div[4]/section[1]/div[2]/div/div/table/tbody/tr"))
print(columns, rows)

The last line prints 45 and 7, so this seems to work. However, I don't understand how I can make a DataFrame out of it. Thank you.

1 Answer
It's hard to tell without seeing the data structure, but if the table is simple, you can try parsing it directly with pandas `read_html`:

df = pd.read_html(driver.page_source)[0]
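To show what this call returns without a live browser, here is a sketch with a made-up HTML snippet standing in for `driver.page_source`. `read_html` returns one DataFrame per `<table>` in the markup; newer pandas versions also expect literal HTML to be wrapped in a `StringIO`:

```python
from io import StringIO

import pandas as pd

# Hypothetical markup standing in for driver.page_source
html = """
<table>
  <thead><tr><th>Name</th><th>Time</th></tr></thead>
  <tbody>
    <tr><td>Skier A</td><td>1:23.4</td></tr>
    <tr><td>Skier B</td><td>1:25.0</td></tr>
  </tbody>
</table>
"""

# read_html parses every <table> it finds and returns a list of DataFrames;
# [0] picks the first one
df = pd.read_html(StringIO(html))[0]
```

The `<th>` cells become the column labels, so `df` here has columns `Name` and `Time` and two data rows.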

You can also create the DataFrame by iterating through the table cells, adjusting the XPath row and column indices as you go:

data = []
for i in range(rows):
    cells = []
    for c in range(columns):
        # find_element returns a single element; .text extracts the cell value
        cell = driver.find_element(By.XPATH, f"/html/body/div[2]/div/form[3]/div[2]/div[1]/div/div/div/div[2]/div[4]/section[1]/div[2]/div/div/table/tbody/tr[{i+1}]/td[{c+1}]")
        cells.append(cell.text)
    data.append(cells)
df = pd.DataFrame(data)
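The `find_element` calls need a running browser, but the surrounding list-of-lists pattern can be sketched on its own with dummy cell values in place of the scraped text:

```python
import pandas as pd

rows, columns = 7, 45

# Dummy stand-ins for the cell texts that driver.find_element(...).text
# would return for each (row, column) position
data = [[f"r{i}c{c}" for c in range(columns)] for i in range(rows)]

# One inner list per table row; pandas turns this directly into a 7x45 frame
df = pd.DataFrame(data)
```

Building the full list first and calling `pd.DataFrame` once is also the recommended replacement for the row-by-row `df.append`, which was removed in pandas 2.0.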

2 Comments

Interesting. So if I understand correctly, df = pd.read_html(driver.page_source)[0] somehow browses through all the tables on a page? Because when I changed the index to [1], [2], and so forth, it gave me different tables on the page.
Exactly. It just looks for <table> elements and parses all of them. To minimize the computing effort, you can pass the attrs that the parser should look for. Just read the docs :)
