
With the help of @JaSON, here's code that lets me get the data from the table in a local HTML file using Selenium:

from selenium import webdriver

driver = webdriver.Chrome("C:/chromedriver.exe")
driver.get('file:///C:/Users/Future/Desktop/local.html')

# each div with id="Section3" marks the start of a new row
counter = len(driver.find_elements_by_id("Section3"))
# selects the divs that sit between the i-th and (i+1)-th "Section3" markers
xpath = "//div[@id='Section3']/following-sibling::div[count(preceding-sibling::div[@id='Section3'])={0} and count(following-sibling::div[@id='Section3'])={1}]"
print(counter)

for i in range(counter):
    print('\nRow #{} \n'.format(i + 1))
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements_by_xpath(_xpath)
    for cell in cells:
        value = cell.find_element_by_xpath(".//td").text
        print(value)

How can these rows be converted into a proper table that I can export to a CSV file? Here's the local HTML: https://pastebin.com/raw/hEq8K75C

** @Paul Brennan: After changing counter to counter-1 I got 17 rows, which temporarily skips the error on row 18. I got the filename.txt output; a snapshot of it is attached.

Comments:
  • stackoverflow.com/questions/45394374/… this will answer your problem. I could not tailor it to your solution as we cannot see your local HTML.
  • I have updated the post and attached the HTML link.

2 Answers


I have modified your code to produce a simple output. This is not very pythonic, since it does not build the DataFrame in a vectorized way, but here is how it works: first import pandas, then create an empty DataFrame (we don't know the columns yet), set the columns on the first pass through the loop (this will cause problems if rows have variable lengths), and finally write the values into the DataFrame.

import pandas as pd
from selenium import webdriver

driver = webdriver.Chrome("C:/chromedriver.exe")
driver.get('file:///C:/Users/Future/Desktop/local.html')
counter = len(driver.find_elements_by_id("Section3"))
xpath = "//div[@id='Section3']/following-sibling::div[count(preceding-sibling::div[@id='Section3'])={0} and count(following-sibling::div[@id='Section3'])={1}]"
print(counter)

df = pd.DataFrame()

for i in range(counter):
    print('\nRow #{} \n'.format(i + 1))
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements_by_xpath(_xpath)
    if i == 0:
        # use the first row's cell texts as the column names
        df = pd.DataFrame(columns=[c.text for c in cells])
    for cell in cells:
        value = cell.find_element_by_xpath(".//td").text
        #print(value)
        if value:  # only keep non-empty strings
            # always putting the value in the first column for now
            df.at[i, 0] = value  # put the value in the frame

df.to_csv('filename.txt')  # write the dataframe out to a file

How this could be made better is to collect each row's items into a dictionary and put those dictionaries into the DataFrame, but I am writing this on my phone so I cannot test that.
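A rough sketch of that dictionary idea, reusing the driver, counter and xpath variables set up above (untested against the real page; the col_0, col_1, … names are placeholders, since the real header names are not known here):

rows = []
for i in range(counter):
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements_by_xpath(_xpath)
    row = {}
    for j, cell in enumerate(cells):
        # find_elements (plural) returns an empty list instead of raising
        # when a cell has no <td>, which sidesteps the missing-element error
        tds = cell.find_elements_by_xpath(".//td")
        row["col_{}".format(j)] = tds[0].text if tds else ""
    rows.append(row)

df = pd.DataFrame(rows)  # build the whole frame in one call from a list of dicts
df.to_csv('filename.csv', index=False)

Building the frame once from a list of dicts also avoids the per-cell df.at assignments.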


Comments

Thanks a lot for the great help. After printing the data from row 18, I got the error Message: no such element: Unable to locate element: {"method":"xpath","selector":".//td"}. The error refers to the line value = cell.find_element_by_xpath(".//td").text, and no file is exported. As for the number of columns, there are 10 (you can have a look at the local HTML file in the browser).
I have skipped row 18 to get the filename output. A snapshot is attached in the main post.
@QHarr I am sure you have experience in this field.
How about we skip the blank lines? That will get the stray commas out.
I have used try..except with break after the except; this fixes the error and now I can get all the data in the same row order. How can I get the data into a DataFrame laid out like the table on the webpage, since I noticed the values are not in order? And I need to skip blank lines too.

With the great help of @Paul Brennan, I was able to modify the code to get the final desired output:

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome("C:/chromedriver.exe")
driver.get('file:///C:/Users/Future/Desktop/local.html')
counter = len(driver.find_elements_by_id("Section3"))
xpath = "//div[@id='Section3']/following-sibling::div[count(preceding-sibling::div[@id='Section3'])={0} and count(following-sibling::div[@id='Section3'])={1}]"
finallist = []

for i in range(counter):
    rowlist = []
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements_by_xpath(_xpath)
    for cell in cells:
        try:
            value = cell.find_element_by_xpath(".//td").text
            rowlist.append(value)
        except NoSuchElementException:
            break
    finallist.append(rowlist)

df = pd.DataFrame(finallist)
df = df[df.columns[[2, 0, 1, 7, 9, 8, 3, 5, 6, 4]]]  # reorder columns to match the table on the page
df.to_csv('filename.csv', index=False)  # export the table to CSV

The code works well now but it is too slow. Is there a way to make it faster?
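One possible direction, offered as an assumption rather than something tested against this page: most of the time goes into the many WebDriver round trips (one per row plus one per cell), so grabbing driver.page_source once and evaluating the same XPath locally with lxml should be much faster. A minimal sketch:

import pandas as pd
from lxml import html

# parse the page once instead of querying the browser per row/cell
tree = html.fromstring(driver.page_source)
counter = len(tree.xpath("//div[@id='Section3']"))

xpath = ("//div[@id='Section3']/following-sibling::div"
         "[count(preceding-sibling::div[@id='Section3'])={0}"
         " and count(following-sibling::div[@id='Section3'])={1}]")

finallist = []
for i in range(counter):
    cells = tree.xpath(xpath.format(i + 1, counter - (i + 1)))
    rowlist = [cell.xpath(".//td")[0].text_content().strip()
               for cell in cells
               if cell.xpath(".//td")]  # skip cells without a <td>
    finallist.append(rowlist)

df = pd.DataFrame(finallist)
df = df[df.columns[[2, 0, 1, 7, 9, 8, 3, 5, 6, 4]]]
df.to_csv('filename.csv', index=False)

The column reorder and CSV export are carried over unchanged from the code above; only the extraction step is swapped out.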
