
I am having trouble capturing all of the XPath hits. I loop over the elements from 0 to j (j = 20, the length of the container), each of which has a hit for //*[@id='tabs-1']/div[3]/table/tbody/tr[2]/td and for //*[@id='tabs-1']/div[3]/table/tbody/tr[1]/td[3]. However, when it cycles through j, only the very last one seems to be written to the CSV file. Is this a problem with the way the csvWriter is coded? I want each j to produce its own row in the CSV file, with the hits for the two XPath queries spread across 2 columns.

Also, how would I code it so that the CSV adds to the already existing rows when it moves to the next page (for i in range(0, num_page)) and repeats the process? Thanks for your help!

import sys
import csv
from selenium import webdriver
import time
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC 


 
# default path to file to store data
path_to_file = "/Users/D/Desktop/reviews.csv"

# default number of scraped pages
num_page = 3

# default tripadvisor website of hotel or things to do (attraction/monument) 
url = "https://www.tripadvisor.com/Attraction_Review-g187791-d192285-Reviews-Colosseum-Rome_Lazio.html"

# if you pass the inputs in the command line
if (len(sys.argv) == 4):
    path_to_file = sys.argv[1]
    num_page = int(sys.argv[2])
    url = sys.argv[3]

# import the webdriver
driver = webdriver.Safari()
driver.get(url)

# open the file to save the review
csvFile = open(path_to_file, 'a', encoding="utf-8")
csvWriter = csv.writer(csvFile)

# change the value inside the range to save more or less reviews

for i in range(0, num_page):
    name = []
    start=[]
    # expand the review
    time.sleep(2)
    container = driver.find_elements_by_xpath("//*[@id='tabs-1']/div[3]/table/tbody")
    
    for j in range(len(container)):
        name = container[j].find_element_by_xpath(".//tr[2]/td").text
        start = container[j].find_element_by_xpath(".//tr[1]/td[3]").text
        
# name of csv file  
        filename = path_to_file
    
# writing to csv file  
        with open(filename, 'w') as csvfile:  
    # creating a csv writer object  
            csvwriter = csv.writer(csvfile)   
    # writing the data rows  
            csvwriter.writerow([name, start])

        driver.find_element_by_xpath("//*[@id='tabs-1']/div[2]/a[@accesskey='n']").click()
     

driver.quit()

1 Answer

In each iteration you are overwriting the old contents of the file, which is why only the row from the last iteration survives.

This line

with open(filename, 'w') as csvfile:

opens the file and truncates (removes) its content.

To append, use 'a' instead of 'w'.

See https://docs.python.org/3/library/functions.html#open

Alternatively, for better performance, open the file once, outside of the loop:
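The difference between the two modes can be seen without Selenium. The following is a minimal sketch, assuming made-up rows in place of the scraped text; `write_each_row`, `read_rows`, and the file names are hypothetical helpers for illustration only. It reopens the file for every row, as the question's loop does, once with 'w' and once with 'a'.

```python
import csv
import os
import tempfile

# Made-up rows standing in for the scraped (name, start) pairs.
rows = [["name-0", "start-0"], ["name-1", "start-1"], ["name-2", "start-2"]]


def write_each_row(path, mode):
    """Reopen the file for every row, as the loop in the question does."""
    for row in rows:
        with open(path, mode, newline="", encoding="utf-8") as f:
            csv.writer(f).writerow(row)


def read_rows(path):
    """Read the whole CSV file back as a list of rows."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


tmp_dir = tempfile.mkdtemp()
w_path = os.path.join(tmp_dir, "w_mode.csv")
a_path = os.path.join(tmp_dir, "a_mode.csv")

write_each_row(w_path, "w")  # every open truncates: only the last row is left
write_each_row(a_path, "a")  # every open appends: all rows are kept

w_rows = read_rows(w_path)
a_rows = read_rows(a_path)
print(len(w_rows), len(a_rows))
```

With 'w' the file ends up holding only the final row, which matches the behavior described in the question; with 'a' every row is kept.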

with open(filename, 'w') as csvfile:  
    csvwriter = csv.writer(csvfile)   
    for j in range(len(container)):
        ...
        csvwriter.writerow([name, start])

This might not matter much, because Selenium is likely far slower than multiple opens, but it is always nice for your system if you use open sparingly.
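For the second part of the question (keeping rows from earlier pages), one option is to open the file a single time before the page loop and keep writing to the same writer. The sketch below assumes hard-coded data in place of the Selenium lookups; `pages`, `save_all_pages`, and the sample values are hypothetical. Passing newline="" is what the csv module documentation recommends when opening files for a csv writer.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for the scraped data: one list of (name, start) per page.
pages = [
    [("Tour A", "09:00"), ("Tour B", "10:30")],
    [("Tour C", "12:00")],
    [("Tour D", "14:00"), ("Tour E", "15:30")],
]


def save_all_pages(path, pages):
    """Open the file once, then write one row per hit across all pages."""
    # 'a' keeps rows from earlier runs; use 'w' to start from an empty file.
    with open(path, "a", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        for page in pages:            # plays the role of the loop over i
            for name, start in page:  # plays the role of the loop over j
                writer.writerow([name, start])
            # In the real script, the click on the "next page" link would
            # presumably go here, once per page, after the inner loop.


out_path = os.path.join(tempfile.mkdtemp(), "reviews.csv")
save_all_pages(out_path, pages)

with open(out_path, newline="", encoding="utf-8") as f:
    saved = list(csv.reader(f))
print(len(saved))
```

Because the file stays open for the whole run, rows from every page land in the same file, in order, with no reopening per row.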


3 Comments

Thanks for your answer - what is a multiple open? How can I make it faster?
By "multiple opens" I mean that you repeatedly open (and close) the same file to add a line. open is expensive. You should open once, write all the lines, and then close. Think of open like opening your garage to take a tool out: you do not open and close it for every single tool. Instead, you open the garage once, keep it open until you are done working, and only then close it.
Thanks for your help. I changed it based on your advice, but am still having some additional issues. I've posted a new issue here if you have some time - thanks in advance: stackoverflow.com/questions/65461944/…
