
With the help of @JaSON, here's code that lets me get the data from the table in a local HTML file using Selenium:

from selenium import webdriver

driver = webdriver.Chrome("C:/chromedriver.exe")
driver.get('file:///C:/Users/Future/Desktop/local.html')

# each div with id="Section3" marks the start of a new row
counter = len(driver.find_elements_by_id("Section3"))
# selects the divs that sit between the i-th and (i+1)-th "Section3" markers
xpath = "//div[@id='Section3']/following-sibling::div[count(preceding-sibling::div[@id='Section3'])={0} and count(following-sibling::div[@id='Section3'])={1}]"
print(counter)

for i in range(counter):
    print('\nRow #{} \n'.format(i + 1))
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements_by_xpath(_xpath)
    for cell in cells:
        value = cell.find_element_by_xpath(".//td").text
        print(value)

How can these rows be converted into a proper table that I can export to a CSV file? Here's the local HTML: https://pastebin.com/raw/hEq8K75C

** @Paul Brennan: After changing counter to counter-1 I got 17 rows, which temporarily skips the error on row 18. I got the filename.txt output; a snapshot of it is attached.

Comments:
  • stackoverflow.com/questions/45394374/… this will answer your problem. I could not tailor it to your solution as we cannot see your local HTML.
  • I have updated the post and attached the HTML link.

2 Answers


I have modified your code to produce a simple output. This is not very pythonic, since it does not build the DataFrame in a vectorized way, but here is how it works: first import pandas, then create an empty DataFrame (we don't know the columns yet), set the columns on the first pass through the loop (this will cause problems if rows have variable lengths), and finally write the values into the DataFrame.

import pandas as pd
from selenium import webdriver

driver = webdriver.Chrome("C:/chromedriver.exe")
driver.get('file:///C:/Users/Future/Desktop/local.html')
counter = len(driver.find_elements_by_id("Section3"))
xpath = "//div[@id='Section3']/following-sibling::div[count(preceding-sibling::div[@id='Section3'])={0} and count(following-sibling::div[@id='Section3'])={1}]"
print(counter)

df = pd.DataFrame()

for i in range(counter):
    print('\nRow #{} \n'.format(i + 1))
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements_by_xpath(_xpath)
    if i == 0:
        # use the first row's cell texts as the column names
        df = pd.DataFrame(columns=[c.text for c in cells])
    for cell in cells:
        value = cell.find_element_by_xpath(".//td").text
        #print(value)
        if value:  # only keep non-empty strings
            # always putting the value in the first column for now
            df.at[i, 0] = value  # put the value in the frame

df.to_csv('filename.txt')  # write the dataframe out to a file

How this could be made better is to collect each row's items into a dictionary and put those dictionaries into the DataFrame, but I am writing this on my phone so I cannot test that.
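A rough sketch of that dictionary idea, reusing the driver, counter and xpath variables set up above (untested against the real page; the col_0, col_1, … names are placeholders, since the real header names are not known here):

rows = []
for i in range(counter):
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements_by_xpath(_xpath)
    row = {}
    for j, cell in enumerate(cells):
        # find_elements (plural) returns an empty list instead of raising
        # when a cell has no <td>, which sidesteps the missing-element error
        tds = cell.find_elements_by_xpath(".//td")
        row["col_{}".format(j)] = tds[0].text if tds else ""
    rows.append(row)

df = pd.DataFrame(rows)  # build the whole frame in one call from a list of dicts
df.to_csv('filename.csv', index=False)

Building the frame once from a list of dicts also avoids the per-cell df.at assignments.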


Comments

Thanks a lot for the great help. After printing the data from row 18, I got the error Message: no such element: Unable to locate element: {"method":"xpath","selector":".//td"}. The error refers to the line value = cell.find_element_by_xpath(".//td").text, and no file is exported. As for the number of columns, there are 10 (you can have a look at the local HTML file in the browser).
I have skipped row 18 to get the filename output. A snapshot is attached in the main post.
@QHarr I am sure you have experience in this field.
How about we skip the blank lines? That will get the stray commas out.
I have used try..except with break after the except; this fixes the error and now I can get all the data in the same row order. How can I get the data into a DataFrame laid out like the table on the webpage, since I noticed the values are not in order? And I need to skip blank lines too.

With the great help of @Paul Brennan, I was able to modify the code to get the final desired output:

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome("C:/chromedriver.exe")
driver.get('file:///C:/Users/Future/Desktop/local.html')
counter = len(driver.find_elements_by_id("Section3"))
xpath = "//div[@id='Section3']/following-sibling::div[count(preceding-sibling::div[@id='Section3'])={0} and count(following-sibling::div[@id='Section3'])={1}]"
finallist = []

for i in range(counter):
    rowlist = []
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements_by_xpath(_xpath)
    for cell in cells:
        try:
            value = cell.find_element_by_xpath(".//td").text
            rowlist.append(value)
        except NoSuchElementException:
            break
    finallist.append(rowlist)

df = pd.DataFrame(finallist)
df = df[df.columns[[2, 0, 1, 7, 9, 8, 3, 5, 6, 4]]]  # reorder columns to match the table on the page
df.to_csv('filename.csv', index=False)  # export the table to CSV

The code works well now but it is too slow. Is there a way to make it faster?
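One possible direction, offered as an assumption rather than something tested against this page: most of the time goes into the many WebDriver round trips (one per row plus one per cell), so grabbing driver.page_source once and evaluating the same XPath locally with lxml should be much faster. A minimal sketch:

import pandas as pd
from lxml import html

# parse the page once instead of querying the browser per row/cell
tree = html.fromstring(driver.page_source)
counter = len(tree.xpath("//div[@id='Section3']"))

xpath = ("//div[@id='Section3']/following-sibling::div"
         "[count(preceding-sibling::div[@id='Section3'])={0}"
         " and count(following-sibling::div[@id='Section3'])={1}]")

finallist = []
for i in range(counter):
    cells = tree.xpath(xpath.format(i + 1, counter - (i + 1)))
    rowlist = [cell.xpath(".//td")[0].text_content().strip()
               for cell in cells
               if cell.xpath(".//td")]  # skip cells without a <td>
    finallist.append(rowlist)

df = pd.DataFrame(finallist)
df = df[df.columns[[2, 0, 1, 7, 9, 8, 3, 5, 6, 4]]]
df.to_csv('filename.csv', index=False)

The column reorder and CSV export are carried over unchanged from the code above; only the extraction step is swapped out.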
